Skip to content
This repository was archived by the owner on May 10, 2024. It is now read-only.

PARQUET-1193: [CPP] Implement ColumnOrder to support min_value and max_value#430

Closed
majetideepak wants to merge 6 commits intoapache:masterfrom
majetideepak:PARQUET-1193
Closed

PARQUET-1193: [CPP] Implement ColumnOrder to support min_value and max_value#430
majetideepak wants to merge 6 commits intoapache:masterfrom
majetideepak:PARQUET-1193

Conversation

@majetideepak
Copy link
Copy Markdown

@majetideepak majetideepak commented Jan 17, 2018

Changes:

  1. Update parquet.thrift format
  2. Add ColumnOrder Implementation
  3. Make Int96 sort order UNKNOWN

auto column_chunk5 = ColumnChunkMetaData::Make(
reinterpret_cast<const uint8_t*>(&col_chunk), schema.Column(4), &version);
ASSERT_TRUE(column_chunk5->is_stats_set());
ASSERT_FALSE(column_chunk5->is_stats_set());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed needed here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column_chunk5 is defined for LogicalType::INTERVAL, which has sort order UNKNOWN.
I modified this PR to not set stats for all UNKNOWN sort orders.

};

class ColumnOrder {
public:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From where stems the need to have a full class for the column order? From looking at the code, it seems to me like using the enum should be sufficient.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, enum is sufficient for this PR. I added the full class for further extensions. The Java implementation did something similar:
https://github.com/apache/parquet-mr/pull/435/files#diff-37183c50b2eaab3ca048f9c5954e5b52R26

Copy link
Copy Markdown
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks for the comments. Everything makes sense!

@majetideepak majetideepak force-pushed the PARQUET-1193 branch 5 times, most recently from ff53935 to 7d4a888 Compare January 20, 2018 12:41
@xhochy xhochy closed this in 62de4b1 Jan 24, 2018
xhochy pushed a commit to xhochy/parquet-cpp that referenced this pull request Feb 11, 2018
…x_value

Changes:

1. Update parquet.thrift format
2. Add ColumnOrder Implementation
3. Make Int96 sort order UNKNOWN

Author: Deepak Majeti <deepak.majeti@hpe.com>

Closes apache#430 from majetideepak/PARQUET-1193 and squashes the following commits:

d31df36 [Deepak Majeti] Fix unused function Warning
4ed405f [Deepak Majeti] Add comments
dec58ca [Deepak Majeti] clang-format
2cd9f11 [Deepak Majeti] Make Int96 sort order UNKNOWN
ff41b3c [Deepak Majeti] Add ColumnOrder Implementation
6221cba [Deepak Majeti] Pull updated parquet.thrift format
@majetideepak majetideepak deleted the PARQUET-1193 branch February 28, 2018 17:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants