
PARQUET-84: Avoid reading rowgroup metadata in memory on the client side.#45

Closed
julienledem wants to merge 12 commits into apache:master from julienledem:skip_reading_row_groups

Conversation

@julienledem
Member

This will improve reading of big datasets with a large schema (thousands of columns).
Instead of loading all row group metadata on the client, the row group metadata can be read in the tasks, where each task reads only the metadata of the file it is reading.
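To make the client/task division concrete, here is a minimal sketch of client-side planning, assuming a split is nothing more than a file path plus a half-open byte range [start, end). The class, field, and method names are hypothetical and are not parquet-mr's API. The point is that the client needs only each file's length, which a directory listing already provides, so no footer (and therefore no row group metadata) is read before the tasks start.

```java
import java.util.ArrayList;
import java.util.List;

class ClientSideSplitPlanner {
    // Hypothetical split descriptor: a file plus a half-open byte range
    // [start, end). No row group metadata is attached on the client.
    static final class FileSplit {
        final String path;
        final long start;
        final long end;

        FileSplit(String path, long start, long end) {
            this.path = path;
            this.start = start;
            this.end = end;
        }
    }

    // Cut a file into fixed-size byte ranges using only its length,
    // which is known from a file listing without opening the footer.
    static List<FileSplit> planSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<FileSplit> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long end = Math.min(start + splitSize, fileLength);
            splits.add(new FileSplit(path, start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        for (FileSplit s : planSplits("part-00000.parquet", 250, 100)) {
            System.out.println(s.path + " [" + s.start + ", " + s.end + ")");
        }
    }
}
```

Each task would then open the footer of its own file only and keep the row groups that belong to its range, so the cost of parsing metadata for thousands of columns is spread across tasks instead of paid once per file on the client.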

Contributor


fix comment: [ startOffset, endOffset )
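The half-open notation [startOffset, endOffset) matters because adjacent splits share a boundary offset, and that offset must belong to exactly one of them. Below is a minimal sketch of task-side row group selection, assuming each row group is assigned to the split whose range contains the row group's midpoint. The class and method names are hypothetical, and the midpoint rule is an illustrative assumption rather than a statement of what this PR implements.

```java
import java.util.ArrayList;
import java.util.List;

class RowGroupRangeFilter {
    // Hypothetical stand-in for row group metadata: the file offset where the
    // row group's first column chunk starts, and its total compressed size.
    static final class RowGroup {
        final long startOffset;
        final long totalSize;

        RowGroup(long startOffset, long totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
    }

    // True if offset lies in the half-open range [startOffset, endOffset).
    static boolean contains(long startOffset, long endOffset, long offset) {
        return offset >= startOffset && offset < endOffset;
    }

    // Keep the row groups whose midpoint falls inside this split's range.
    // When the splits tile the file without gaps, each shared boundary is
    // included in only one split, so every row group is selected exactly once.
    static List<RowGroup> filter(List<RowGroup> rowGroups, long startOffset, long endOffset) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup rg : rowGroups) {
            long midpoint = rg.startOffset + rg.totalSize / 2;
            if (contains(startOffset, endOffset, midpoint)) {
                selected.add(rg);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // The first row group starts after the 4-byte "PAR1" magic number.
        List<RowGroup> rowGroups = List.of(
            new RowGroup(4, 100),    // midpoint 54
            new RowGroup(104, 100),  // midpoint 154
            new RowGroup(204, 100)); // midpoint 254
        // Two adjacent splits that share the boundary offset 150.
        System.out.println(filter(rowGroups, 0, 150).size());   // prints 1
        System.out.println(filter(rowGroups, 150, 400).size()); // prints 2
    }
}
```

With a closed range [startOffset, endOffset] instead, a row group whose midpoint landed exactly on a shared boundary would be read by both neighboring tasks.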

Member Author


thx

@julienledem julienledem changed the title Avoid reading rowgroup metadata in memory on the client side. PARQUET-84: Avoid reading rowgroup metadata in memory on the client side. Sep 3, 2014
@julienledem julienledem force-pushed the skip_reading_row_groups branch from 7957a92 to 5b6bd1b on September 3, 2014 00:32
Contributor


comment endOffset )

@tsdeng
Contributor

tsdeng commented Sep 4, 2014

LGTM!

@julienledem
Member Author

@tsdeng and the build is green!

@asfgit asfgit closed this in 5dafd12 Sep 5, 2014