PARQUET-84: Avoid reading rowgroup metadata in memory on the client side. by julienledem · Pull Request #45 · apache/parquet-java

julienledem · 2014-08-27T18:56:00Z

This will improve reading big datasets with a large schema (thousands of columns).
Instead of loading all rowgroup metadata in memory on the client side, the metadata can be read in the tasks, where each task reads only the metadata of the file it is reading.
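
To make the task-side idea concrete, here is a minimal sketch of how a task might keep only the row groups that belong to its split once it has read the footer of its own file. It is not the code from this PR: the class `TaskSideRowGroupFilter`, the `RowGroup` stand-in, and the method `selectForSplit` are hypothetical names, and the rule of matching on a row group's starting position is an assumption. The only detail taken from this thread is the half-open range `[ startOffset, endOffset )` mentioned in the review comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from parquet-java. */
public class TaskSideRowGroupFilter {

    /** Minimal stand-in for one row group entry in a file footer. */
    public static final class RowGroup {
        public final long startingPos; // byte offset where the row group starts

        public RowGroup(long startingPos) {
            this.startingPos = startingPos;
        }
    }

    /**
     * Keeps the row groups whose starting position falls in the half-open
     * range [startOffset, endOffset). Because the end is exclusive, two
     * adjacent splits never both select the same row group.
     * Assumption: ownership is decided by the starting position.
     */
    public static List<RowGroup> selectForSplit(
            List<RowGroup> footerRowGroups, long startOffset, long endOffset) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup rowGroup : footerRowGroups) {
            if (rowGroup.startingPos >= startOffset && rowGroup.startingPos < endOffset) {
                selected.add(rowGroup);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up offsets, for illustration only.
        List<RowGroup> footer = Arrays.asList(
                new RowGroup(4L), new RowGroup(100L), new RowGroup(200L));
        System.out.println(selectForSplit(footer, 0L, 100L).size());
        System.out.println(selectForSplit(footer, 100L, 300L).size());
    }
}
```

With this shape, the client only needs to hand each task a file path and a byte range; the footer is read and filtered inside the task.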

tsdeng · 2014-08-29T00:05:00Z

fix comment: [ startOffset, endOffset )

tsdeng · 2014-09-03T21:16:28Z

comment endOffset )

tsdeng · 2014-09-04T18:39:23Z

LGTM!

julienledem · 2014-09-05T03:39:04Z

@tsdeng and the build is green!

tsdeng reviewed Aug 29, 2014

julienledem mentioned this pull request Aug 30, 2014

PARQUET-79: add a streaming Thrift API, to enable processing the metadata as we read it and skipping unnecessary fields. apache/parquet-format#8

Closed

julienledem changed the title ~~Avoid reading rowgroup metadata in memory on the client side.~~ PARQUET-84: Avoid reading rowgroup metadata in memory on the client side. Sep 3, 2014

julienledem added 9 commits September 2, 2014 17:32

first stab at integrating skipping row groups

9bb8059

implement task side metadata

4d16df3

cleanup

ab95a45

fix read summary

3da37d8

cleanup readFooters methods

2c20b46

fix backward compatibility check

fb11f02

review feedback

f599259

add unit tests

323d254

more tests

5b6bd1b

julienledem force-pushed the skip_reading_row_groups branch from 7957a92 to 5b6bd1b Compare September 3, 2014 00:32

tsdeng reviewed Sep 3, 2014

address review feedback

3d7e35a

julienledem added 2 commits September 4, 2014 13:59

Merge branch 'master' into skip_reading_row_groups

24a2050

fix parquet-hive

ccdd08c

asfgit closed this in 5dafd12 Sep 5, 2014


julienledem wants to merge 12 commits into apache:master from julienledem:skip_reading_row_groups
