Add null pages and boundary order (Fixes #92) #94

Merged: 2 commits merged into main from bug/add-null_pages-and-boundary_order on Jul 13, 2023

@wilwade (Member) commented Jul 11, 2023

Problem

The Parquet format requires every column index to include null_pages and boundary_order, but files generated by parquetjs were missing both fields.

https://github.com/apache/parquet-format/blob/1603152f8991809e8ad29659dffa224b4284f31b/src/main/thrift/parquet.thrift#L955
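For readers unfamiliar with the two fields, the following is a minimal sketch of how they could be derived from per-page statistics. It is not the code from this PR: the `PageStats` shape and the helper names `computeNullPages` and `computeBoundaryOrder` are hypothetical, and values are assumed to be plain numbers rather than encoded binary. Only the field semantics and the `BoundaryOrder` values (UNORDERED = 0, ASCENDING = 1, DESCENDING = 2) come from parquet.thrift.

```typescript
// Values mirror the BoundaryOrder enum in parquet.thrift.
const BoundaryOrder = { UNORDERED: 0, ASCENDING: 1, DESCENDING: 2 } as const;
type BoundaryOrder = (typeof BoundaryOrder)[keyof typeof BoundaryOrder];

// Hypothetical per-page statistics; NOT the parquetjs internal type.
interface PageStats {
  min: number | null; // null when the page holds only nulls
  max: number | null; // null when the page holds only nulls
  numValues: number;  // total values in the page, nulls included
  nullCount: number;  // how many of those values are null
}

// null_pages: one boolean per page, true when every value in the page is null.
function computeNullPages(pages: PageStats[]): boolean[] {
  return pages.map((p) => p.nullCount === p.numValues);
}

// True when the sequence never moves against the requested direction.
function isSorted(values: number[], ascending: boolean): boolean {
  let prev: number | undefined;
  for (const curr of values) {
    if (prev !== undefined && (ascending ? prev > curr : prev < curr)) {
      return false;
    }
    prev = curr;
  }
  return true;
}

// boundary_order: ASCENDING or DESCENDING only if both the min and the max
// sequences are ordered that way; pages containing only nulls are skipped.
function computeBoundaryOrder(pages: PageStats[]): BoundaryOrder {
  const mins: number[] = [];
  const maxs: number[] = [];
  for (const p of pages) {
    if (p.min !== null && p.max !== null) {
      mins.push(p.min);
      maxs.push(p.max);
    }
  }
  if (isSorted(mins, true) && isSorted(maxs, true)) {
    return BoundaryOrder.ASCENDING;
  }
  if (isSorted(mins, false) && isSorted(maxs, false)) {
    return BoundaryOrder.DESCENDING;
  }
  return BoundaryOrder.UNORDERED;
}
```

In this sketch, three pages with ranges 1..5, all nulls, and 6..9 would give null_pages of [false, true, false] and a boundary order of ASCENDING. When every page has the same min and max, both orders hold and the sketch arbitrarily prefers ASCENDING.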

Closes #92

Solution

Note: Although the format specification marks these fields as required, not every library treats their absence as a hard error.

Steps to Verify:

  1. Check out the branch
  2. npm i && npm run build && npm pack
  3. Install the parquet CLI tools (macOS Homebrew: brew install parquet-cli)
  4. Check out the bug repo from issue #92 (Missing column index information in generated parquet file): https://github.com/noxify/parquetjs_bug/
  5. cd parquetjs_bug/parquetjs && npm i
  6. node index.js && parquet column-index ../generated_files/parquetjs/change.parque (this will FAIL)
  7. npm i ../parquetjs/dsnp-parquetjs-0.0.0.tgz
  8. node index.js && parquet column-index ../generated_files/parquetjs/change.parque (this will PASS!)

@enddynayn (Collaborator) left a comment

👍 great!

@noxify commented Jul 11, 2023

awesome - will test it tomorrow.

@shannonwells (Collaborator) left a comment

Works for me!

@wilwade merged commit 43732c5 into main on Jul 13, 2023
1 check passed
@wilwade deleted the bug/add-null_pages-and-boundary_order branch on July 13, 2023 at 16:34