[HUDI-5759] Supports add column on mor table with log #7915
Conversation
Could you please add some UTs?
@alexeykudinkin @xiarixiaoyao @leesf Could you please help review this PR?
@stream2000 Thanks for providing the UT test case. As per our communication, a UT is added which would fail before but passes now.
Force-pushed from c520ca4 to 3609b74.
Commits squashed.
@@ -204,4 +204,48 @@ class TestUpdateTable extends HoodieSparkSqlTestBase {
      }
    })
  }

  test("Test Add Column and Update Table") {
    withTempDir { tmp =>
@qidian99 thanks for your contribution. I ran this UT directly on the master branch; I expected it to fail, but it succeeded. Could you please check your UT? Thanks.
Thanks for the timely reply. I changed the UT to manually set partition pruning to true. @stream2000 and I both tested on the master branch, and the test fails.
@qidian99 can you please paste the whole stacktrace? I would like to understand better what exactly is failing.
I see you pasted the stacktrace of the failure when you query your data via the server. Can you please paste the stacktrace of this particular test failing? I want to better understand which operation is failing in this test.
@qidian99 do only non-partitioned tables have this problem?
Force-pushed from 3609b74 to 7f2456c.
@qidian99
        } else {
          fieldBuilder.noDefault()
        }

        fieldsAssembler.name(f.name).`type`(fieldAvroType).noDefault()
We need to remove this line now, right?
@@ -202,6 +202,13 @@ private[sql] object SchemaConverters {
    st.foreach { f =>
      val fieldAvroType =
        toAvroType(f.dataType, f.nullable, f.name, childNameSpace)
      val fieldBuilder = fieldsAssembler.name(f.name).`type`(fieldAvroType)
This code is actually borrowed from Spark, and we try to avoid any changes to such code to make sure we're not diverging from Spark
I don't think I understand why you believe this is an appropriate fix for the issue you're observing:
- Spark's schemas don't have defaults at all
- The fact that an Avro schema's field is nullable doesn't entail that it should have null as its default value
From what I understand so far, the issue is not in the conversion, but in the fact that we're not handling schema evolution properly in HoodieAvroDataBlock: whenever we decode a record from an existing data block, we should make sure that any nullable field actually has null as its default value, so that the Avro reader is able to decode the data in case this particular field is not present.
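To make the point above concrete, here is a minimal sketch (Python with hypothetical record schemas, standing in for the actual Hudi/Avro code) of why a column added to the reader schema needs an explicit null default before old log blocks, written without that column, can be decoded:

```python
# Old writer schema: the column did not exist when the log block was written.
writer_schema = {
    "type": "record", "name": "rec",
    "fields": [{"name": "id", "type": "int"}],
}

# Reader schema after adding a nullable column, but without a default:
# per the Avro spec, old records cannot be resolved against this field.
reader_missing_default = {
    "type": "record", "name": "rec",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "new_col", "type": ["null", "string"]},
    ],
}

# Reader schema with "default": null ("null" listed first in the union):
# old records decode with new_col = null, as the comment above describes.
reader_with_default = {
    "type": "record", "name": "rec",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "new_col", "type": ["null", "string"], "default": None},
    ],
}

def added_field_is_resolvable(field):
    # Avro's schema-resolution rule: a field present only in the reader
    # schema must declare a default value, otherwise decoding fails.
    return "default" in field

print(added_field_is_resolvable(reader_missing_default["fields"][1]))  # False
print(added_field_is_resolvable(reader_with_default["fields"][1]))     # True
```

The schemas and the `added_field_is_resolvable` helper are illustrative only; the rule they encode is Avro's standard schema-resolution behavior, not code from this PR.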
I agree with @alexeykudinkin; we should not change the code of SchemaConverters.scala. This is a bug in the log scanner.
@qidian99 can you please paste the whole stacktrace? Would like to understand better what exactly is failing
Here's the stacktrace when I tried to add a column named
Force-pushed from 7f2456c to 6b3cafb.
Force-pushed from 6b3cafb to 52ff32a.
@qidian99
Hi, I have updated the test cases so that we can reproduce the error on the master branch. The key change is producing the log before adding the column.
@xiarixiaoyao Could we close this PR since #8026 is merged?
Yes.
Change Logs
Hudi does not support adding a column on a MOR table with log files when extractPartitionValuesFromPartitionPath is turned on. If partition pruning is enabled, default values are pruned out of the Avro schema, due to incorrect conversion logic in SchemaConverters::toAvroType. This PR fixes the issue by adding a default null value when converting a StructType field to an Avro field type.
Thus, HoodieBaseRelation::convertToAvroSchema has the correct behavior and the MOR table is queryable.
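As a rough sketch of the proposed conversion change (Python standing in for the Scala in SchemaConverters::toAvroType; `to_avro_field` is a made-up helper, and note the reviewers ultimately preferred fixing the log scanner instead, per the merged #8026):

```python
def to_avro_field(name, avro_type, nullable):
    # Made-up helper mirroring the PR's proposed change in SchemaConverters:
    # nullable fields get an explicit null default (the Scala equivalent of
    # fieldBuilder.withDefault(null) instead of noDefault()), so readers can
    # decode log blocks written before the column existed.
    field = {"name": name, "type": avro_type}
    if nullable:
        field["default"] = None
    return field

# A nullable string column becomes a union carrying a null default:
print(to_avro_field("new_col", ["null", "string"], nullable=True))
# A non-nullable column keeps no default:
print(to_avro_field("id", "int", nullable=False))
```

This only illustrates the shape of the schema the PR aimed to produce; the actual change is in Scala against Avro's SchemaBuilder API.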
Impact
None.
Risk level (write none, low, medium or high below)
Low.
Documentation Update
None.
Contributor's checklist