PARQUET-545: Improve API to support decimal type #65
majetideepak wants to merge 15 commits into apache:master
bff1f4b to 844a3ab
@wesm I don't see much to do for
TEST(FLBAEncodeDecode, TestEncodeDecode) {
  schema::NodePtr node;
  node = schema::PrimitiveNode::MakeFLBA("name", Repetition::OPTIONAL,
      flba_length, LogicalType::UTF8);
We should have a single API to create Primitive nodes.
I just looked at
I commented on https://issues.apache.org/jira/browse/PARQUET-545. Impala has a lot of decimal-related code, but it is mostly used for computations in the runtime. I'm not sure whether having container types for decimal data (and stuff like coercing to/from double, which would be useful IMHO) would be helpful overall. Being able to print decimals would be nice. It's a can of worms, though.
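To make the "print decimals" idea concrete, here is a minimal sketch of formatting a decimal from its unscaled integer value and scale, using the Parquet definition value = unscaled * 10^(-scale). `FormatDecimal` is a hypothetical helper, not part of parquet-cpp, and it assumes the unscaled value fits in an `int64_t`; decimals stored as fixed-length byte arrays would first need their big-endian bytes decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a parquet-cpp API): render an unscaled integer
// with `scale` digits after the decimal point.
std::string FormatDecimal(int64_t unscaled, int scale) {
  assert(scale >= 0);
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  const std::size_t frac = static_cast<std::size_t>(scale);
  // Left-pad with zeros so at least one digit precedes the decimal point.
  if (digits.size() <= frac) {
    digits = std::string(frac - digits.size() + 1, '0') + digits;
  }
  if (frac > 0) {
    digits.insert(digits.size() - frac, ".");
  }
  return negative ? "-" + digits : digits;
}
```

With a scale of 2, as in the scanner test in this patch, `FormatDecimal(12345, 2)` yields `123.45`.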
@wesm I agree. I just commented on https://issues.apache.org/jira/browse/PARQUET-545 too. In this patch, I want to extend the ColumnDescriptor API to be able to extract the
src/parquet/schema/descriptor.h
  bool is_required() const {
-   return max_definition_level_ == 0;
+   return primitive_node_->is_required();
We should be using the Type values set in the PrimitiveNode.
src/parquet/schema/descriptor.h
  bool is_repeated() const {
-   return max_repetition_level_ > 0;
+   return primitive_node_->is_repeated();
  }
These changes have fundamentally changed the nature of their results -- they can't be used like they were in the scanner any more, because you could have required types with definition level > 0:
optional group bag
  repeated group list
    required int32 item
I suggest using the max repetition/definition levels in the scanner to avoid this issue
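The point about the schema above can be sketched in a few lines. The types below (`PathNode`, `Levels`, `ComputeLevels`, `LeafIsRequired`) are hypothetical stand-ins for illustration, not the parquet-cpp classes. They follow the usual Parquet rule that every optional or repeated node on the path adds one definition level and every repeated node adds one repetition level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the parquet-cpp classes.
enum class Repetition { REQUIRED, OPTIONAL, REPEATED };

struct PathNode {
  std::string name;
  Repetition repetition;
};

struct Levels {
  int max_definition_level = 0;
  int max_repetition_level = 0;
};

// Walk the path from the outermost group down to the leaf column.
Levels ComputeLevels(const std::vector<PathNode>& path) {
  Levels levels;
  for (const PathNode& node : path) {
    // OPTIONAL and REPEATED nodes each add a definition level.
    if (node.repetition != Repetition::REQUIRED) ++levels.max_definition_level;
    // Only REPEATED nodes add a repetition level.
    if (node.repetition == Repetition::REPEATED) ++levels.max_repetition_level;
  }
  return levels;
}

// The leaf's own repetition, which is what a node-based is_required() reports.
bool LeafIsRequired(const std::vector<PathNode>& path) {
  assert(!path.empty());
  return path.back().repetition == Repetition::REQUIRED;
}
```

For the path bag, list, item, this gives a max definition level of 2 and a max repetition level of 1 even though the leaf itself is required, so a check on the leaf node and a check on the levels answer different questions.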
Makes sense. I did not think about the groups. Will revert this.
Since this didn't cause test failures, it may be worth trying to add a failing test case
We don't have any group tests yet. This code works for all flat schemas.
I will add some GroupNode tests in the descriptor tests
e7b7ad2 to 0f0e559
  ASSERT_THROW(descr_.Init(node), ParquetException);
}

TEST_F(TestSchemaDescriptor, DescriptorRepetitionValues) {
Tests for repetition values of schemas with GroupNodes
The changes are complete. I will wait for the test coverage report.
Cool, I can review in a little bit.
      reinterpret_cast<TypedScanner<FLBAType::type_num>*>(scanner_.get());
  ASSERT_EQ(10, scanner->descr()->type_precision());
  ASSERT_EQ(2, scanner->descr()->type_scale());
  ASSERT_EQ(FLBA_LENGTH, scanner->descr()->type_length());
Test for ColumnDescriptor API extensions in this patch.
These should probably go in their own test case.
Looks like we're blocked on Travis builds again for the next few hours. Let me know on these relatively minor issues and I'll look again a bit later. I'm going to close out the remaining clean-up patches as quickly as I can.
Also, the PR title needs to be changed to
I am making these changes now.
+1, thank you!
@wesm Thanks!
Yes, that would be a good idea -- I only tested the lowest tier of encoding. Can you open a JIRA?
This issue title needs to start with
+1
This PR also