Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-2451: Add BYTE_STREAM_SPLIT support for FIXED_LEN_BYTE_ARRAY, INT32 and INT64 #1291

Merged
merged 3 commits into from
Apr 26, 2024

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Mar 5, 2024

Implement the format additions described in PARQUET-2414.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines
    from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Style

  • My contribution adheres to the code style guidelines and Spotless passes.
    • To apply the necessary changes, run mvn spotless:apply -Pvector-plugins

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@pitrou pitrou force-pushed the byte-stream-split-extended branch from c1e1aa5 to b3031ad Compare March 5, 2024 18:42
@pitrou
Copy link
Member Author

pitrou commented Mar 5, 2024

@wgtmac @etseidl @gszadovszky This is a draft PR. It works fine on the integration test but still needs unit tests. Since I'm quite new to this codebase (and to Java in general), I welcome any early comments.

@pitrou pitrou force-pushed the byte-stream-split-extended branch from b3031ad to 1f5d488 Compare March 5, 2024 18:50
@pitrou pitrou marked this pull request as ready for review March 7, 2024 09:02
@pitrou
Copy link
Member Author

pitrou commented Mar 7, 2024

This is ready for review now. The only failing test is the interop test, since the required file is not in parquet-testing yet (I have checked it passes locally).

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look after I am back from vacation.

@gszadovszky Could you help review this, too?

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to remove DO NOT MERGE: from the title.

@pitrou pitrou changed the title DO NOT MERGE: Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 Mar 18, 2024
@pitrou pitrou force-pushed the byte-stream-split-extended branch from c52e48a to 76ca342 Compare March 18, 2024 15:20
@pitrou pitrou changed the title Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 PARQUET-2451: Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 Mar 18, 2024
@pitrou pitrou changed the title PARQUET-2451: Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 PARQUET-2451: Add BYTE_STREAM_SPLIT support for FIXED_LEN_BYTE_ARRAY, INT32 and INT64 Mar 18, 2024
@pitrou pitrou force-pushed the byte-stream-split-extended branch from 76ca342 to f4a7ce3 Compare March 18, 2024 15:25
@pitrou
Copy link
Member Author

pitrou commented Mar 18, 2024

For the record, I must still address some of the review comments.

@pitrou pitrou force-pushed the byte-stream-split-extended branch 2 times, most recently from f8634b3 to 1ad366f Compare March 18, 2024 17:41
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on my side.

@pitrou pitrou force-pushed the byte-stream-split-extended branch from 1ad366f to 0f7a043 Compare April 3, 2024 14:24
@pitrou pitrou force-pushed the byte-stream-split-extended branch from 884cb22 to 431a025 Compare April 3, 2024 15:12
@pitrou
Copy link
Member Author

pitrou commented Apr 3, 2024

@gszadovszky I think I addressed your comments, could you take another look?

@wgtmac
Copy link
Member

wgtmac commented Apr 23, 2024

@pitrou Do you want to have this in the 1.14.0 release?

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response, @pitrou. Thanks for fixing my comments.

@wgtmac wgtmac merged commit 09445b5 into apache:master Apr 26, 2024
9 checks passed
@pitrou pitrou deleted the byte-stream-split-extended branch May 2, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants