Skip to content

PARQUET-1494: [C++] Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards#3441

Closed
zilder wants to merge 2 commits intoapache:masterfrom
zilder:fix_parquet_1494
Closed

PARQUET-1494: [C++] Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards#3441
zilder wants to merge 2 commits intoapache:masterfrom
zilder:fix_parquet_1494

Conversation

@zilder
Copy link
Contributor

@zilder zilder commented Jan 21, 2019

Fixes the issue with min-max statistics built by parquet-mr 1.10+:
https://issues.apache.org/jira/browse/PARQUET-1494

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be parquet-mr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, sorry

Copy link
Contributor

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Can you add the file generated by parquet-mr to the data folder in the parquet-testing submodule https://github.com/apache/arrow/tree/master/cpp/submodules and add a test as well?

@zilder
Copy link
Contributor Author

zilder commented Jan 22, 2019

Added a test case.

@zilder
Copy link
Contributor Author

zilder commented Jan 22, 2019

Btw, thanks to @sumerman for helping to investigate the issue and @alkagin for providing the sample parquet file for the test case.

@majetideepak
Copy link
Contributor

@zilder can you amend your last commit and push again? I committed the binary file. Travis CI should pick that up now.

@zilder zilder force-pushed the fix_parquet_1494 branch 2 times, most recently from 27095da to 66b39cb Compare January 22, 2019 15:22
@majetideepak
Copy link
Contributor

@zilder looks like you need to run clang-format

/home/travis/build/apache/arrow/cpp/src/parquet/statistics-test.cc had clang-format style issues
--- /home/travis/build/apache/arrow/cpp/src/parquet/statistics-test.cc
+++ /home/travis/build/apache/arrow/cpp/src/parquet/statistics-test.cc (after clang format)
@@ -777,7 +777,8 @@
 TEST(TestStatisticsMinMax, Unsigned) {
   std::string dir_string(test::get_data_dir());
   std::stringstream ss;
-  ss << dir_string << "/" << "binary.parquet";
+  ss << dir_string << "/"
+     << "binary.parquet";

@zilder
Copy link
Contributor Author

zilder commented Jan 22, 2019

just did and pushed new version

Copy link
Contributor

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM. I will merge after appveyor finishes. Thanks for this contribution!

@wesm wesm force-pushed the fix_parquet_1494 branch from 66b39cb to c5ba5fc Compare January 23, 2019 00:25
@wesm
Copy link
Member

wesm commented Jan 23, 2019

@majetideepak has someone flipped your commit bit for arrow? Apologies if not, Jacques may have to do it

@wesm wesm changed the title PARQUET-1494: Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards PARQUET-1494: [C++] Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards Jan 23, 2019
@majetideepak
Copy link
Contributor

@wesm I do not have the commit bit for Arrow. I will send an email to Jacques. Can you take care of merging this? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants