[HUDI-6474] Added support for reading tables evolved using comprehensive schema evolution on Flink #9133
Conversation
…ive schema evolution on Flink
Co-authored-by: hbgstc123 <hbgstc123@gmail.com>
@danny0405 Can you please help to review this? Background:
This PR is basically supplementing this feature to reduce the feature gap between what Spark<>Flink can support. Thanks!
      default:
    }
-   return null;
+   throw new IllegalArgumentException(String.format("Unsupported conversion for %s => %s", fromType, toType));
Do not throw a RuntimeException in a nested calling code path; it makes it very hard for the invoker to anticipate the exception. Either throw a checked exception or return null as before.
Modified them to be checked exceptions; not sure if I did it correctly, please take a look, thank you.
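For illustration, a checked-exception version of the conversion lookup could look like the sketch below. The class and method names here are hypothetical stand-ins, not the actual Hudi code; the point is only that the failure mode becomes visible in the method signature instead of surfacing as a RuntimeException deep in the call stack.

```java
// Hypothetical sketch: surface unsupported casts as a checked exception
// instead of returning null or throwing an unchecked exception.
public class CastSketch {

  // Checked exception: callers must handle or declare it.
  public static class UnsupportedCastException extends Exception {
    public UnsupportedCastException(String fromType, String toType) {
      super(String.format("Unsupported conversion for %s => %s", fromType, toType));
    }
  }

  // Returns a description of the conversion, or throws if none exists.
  public static String getConversion(String fromType, String toType)
      throws UnsupportedCastException {
    switch (fromType) {
      case "INT":
        if (toType.equals("LONG")) {
          return "INT => LONG";
        }
        break;
      default:
        break;
    }
    throw new UnsupportedCastException(fromType, toType);
  }
}
```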
    Object[] objects = new Object[array.size()];
    for (int i = 0; i < array.size(); i++) {
      Object fromObject = ArrayData.createElementGetter(fromType).getElementOrNull(array, i);
      // need to handle nulls to prevent NullPointerException in #getConversion()
No need to create the element getter on each iteration of the for-loop?
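The fix the reviewer is pointing at is to construct the getter once, before the loop. Flink's real `ArrayData.createElementGetter(LogicalType)` returns a reusable `ElementGetter`; the self-contained sketch below imitates that shape with hypothetical stand-in types so the hoisting pattern is visible on its own.

```java
import java.util.List;

// Self-contained sketch of hoisting a per-element getter out of a loop.
// ElementGetter is a stand-in for Flink's ArrayData.ElementGetter.
public class GetterHoisting {

  interface ElementGetter {
    Object getElementOrNull(List<Object> array, int pos);
  }

  // Imitates ArrayData.createElementGetter(fromType): potentially costly,
  // so it should be called once per array, not once per element.
  static ElementGetter createElementGetter(String type) {
    return (array, pos) -> array.get(pos);
  }

  static Object[] convertArray(List<Object> array, String fromType) {
    Object[] objects = new Object[array.size()];
    // Hoisted: the getter is constructed once, before the loop.
    ElementGetter getter = createElementGetter(fromType);
    for (int i = 0; i < array.size(); i++) {
      // Nulls are propagated as-is to avoid a NullPointerException downstream.
      objects[i] = getter.getElementOrNull(array, i);
    }
    return objects;
  }
}
```

The same hoisting applies to the map branch below, where both the key getter and the value getter can be created once before iterating.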
    final Map<Object, Object> result = new HashMap<>();
    for (int i = 0; i < map.size(); i++) {
      Object keyObject = ArrayData.createElementGetter(keyType).getElementOrNull(map.keyArray(), i);
      Object fromObject = ArrayData.createElementGetter(fromValueType).getElementOrNull(map.valueArray(), i);
    // note: InternalSchema.merge guarantees that the schema to be read fromType is orientated in the same order as toType
    // hence, we can match types by position as it is guaranteed that it is referencing the same field
    List<LogicalType> fromChildren = fromType.getChildren();
    List<LogicalType> toChildren = toType.getChildren();
    GenericRowData rowData = new GenericRowData(toType.getChildren().size());
    for (int i = 0; i < toChildren.size(); i++) {
      Object fromVal = RowData.createFieldGetter(fromChildren.get(i), i).getFieldOrNull(row);
      Object toVal;
Caution for the performance: you are constructing the field getter for every row-to-row conversion.
Any suggestions on how we can work around this? I don't think this is avoidable as of now.
Maybe add a wrapper class, like what we do in RowDataToAvroConverters for the row data and Avro conversion.
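For illustration, the suggested pattern mirrors Flink's RowDataToAvroConverters: build a converter object once per schema pair, capturing all per-field conversion logic at construction time, then apply it to every row. The sketch below uses self-contained hypothetical stand-in types (plain `Object[]` rows and string type names), not the actual Hudi or Flink classes.

```java
// Sketch of the "build once per schema, apply per row" converter pattern.
public class ConverterSketch {

  @FunctionalInterface
  interface FieldConverter {
    Object convert(Object value);
  }

  // Captures all per-field converters at construction time, so no
  // getter or converter objects are allocated on the per-row path.
  static final class RowConverter {
    private final FieldConverter[] fieldConverters;

    RowConverter(FieldConverter[] fieldConverters) {
      this.fieldConverters = fieldConverters;
    }

    Object[] convert(Object[] row) {
      Object[] out = new Object[fieldConverters.length];
      for (int i = 0; i < fieldConverters.length; i++) {
        Object v = row[i];
        // Nulls pass through untouched; converters see only non-null values.
        out[i] = (v == null) ? null : fieldConverters[i].convert(v);
      }
      return out;
    }
  }

  // Imitates building the converter tree from a (fromType, toType) pair once.
  static RowConverter createConverter(String[] fromTypes, String[] toTypes) {
    FieldConverter[] converters = new FieldConverter[fromTypes.length];
    for (int i = 0; i < fromTypes.length; i++) {
      if (fromTypes[i].equals("INT") && toTypes[i].equals("LONG")) {
        converters[i] = v -> ((Integer) v).longValue(); // widening cast
      } else {
        converters[i] = v -> v; // identity for matching types
      }
    }
    return new RowConverter(converters);
  }
}
```

The converter is created once per schema pair and reused for every row, which lifts the getter construction out of the per-row path as the review asks.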
Done, please take a look.
I also added a check in the testCastNestedRow case in TestCastMap to test INT -> INT NOT NULL conversions, as I noticed we were not handling such cases in the past.
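For context, an INT -> INT NOT NULL cast changes only the nullability of the type, so the value conversion is an identity that must additionally reject nulls. A minimal sketch of that check (a hypothetical helper, not the actual TestCastMap or CastMap code):

```java
public class NotNullCastSketch {
  // Identity cast from INT to INT NOT NULL: the value is unchanged,
  // but a null input is an error because the target type forbids it.
  static Integer castToNotNullInt(Integer value) {
    if (value == null) {
      throw new IllegalStateException("null value for INT NOT NULL");
    }
    return value;
  }
}
```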
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java
We might need to update the checkstyle plugin from 3.0.0 to 3.1.0 due to this bug: https://issues.apache.org/jira/browse/MCHECKSTYLE-347. I will submit a PR for this.
@danny0405 Can you please help to take a look at this, thank you!
6474.patch.zip |
@danny0405 Done! |
…volution on Flink
PR is co-authored by @hbgstc123.
The current Hudi comprehensive schema evolution for Flink reads does not support complex types for:
This PR is basically supplementing this feature to reduce the feature gap between what Spark<>Flink can support.
Change Logs
Added support for reading tables that were evolved using Hudi's comprehensive/full schema evolution
Impact
None
Risk level (write none, low medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
Contributor's checklist