[SUPPORT] hudi [0.13.1] on flink [1.16.2]: after bulk_insert & bucket_index, get INT96 exception when Flink triggers compaction #9804
Comments
INT96 is a legacy Parquet type used for timestamps with precision greater than 6, such as timestamp(9). Did you check your table for timestamp data types?
I did use decimal(24,12). If I insert the offline data by streaming upsert, there is no exception.
I wonder if there is something wrong with the parquet schema selection used by bulk_insert.
Currently I use the pom with groupId org.apache.hudi. Did this patch really work in 0.13.1, or are we not describing the same problem?
Hmm, the write works here; it is the offline compactor that throws the exception. The option can be set up with parquet.avro.readInt96AsFixed.
Now I have changed the pom to version 0.14.0, but how can I use this option (parquet.avro.readInt96AsFixed) in Flink online compaction?
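For illustration only: one way this could be attempted is to forward the Parquet read option into the Hadoop configuration through the table options. The hadoop.-prefixed option below is an assumption about how the Hudi Flink connector forwards Hadoop/Parquet properties; it is not confirmed in this thread.
-- Minimal sketch; the 'hadoop.' prefix for forwarding Parquet properties is an assumption.
CREATE TABLE ratio_path_company_sketch (
  id DECIMAL(20, 0),
  op_ts TIMESTAMP(0),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'obs://hadoop-obs/ods_hudi/ratio_path_company004',
  'table.type' = 'MERGE_ON_READ',
  -- forward the Parquet-Avro flag so INT96 columns are read back as 12-byte fixed arrays
  'hadoop.parquet.avro.readInt96AsFixed' = 'true'
);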
Did you check your ...
It should be like this:
Here it is:
Yeah, I checked the code for schema resolving. Here is the code snippet: hudi/hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java, Line 193 in b77286f
I debugged runMerge() of BaseMergeHelper.java: the writer schema is timestamp-millis, but the reader schema is int96; the schema in hoodie.properties is timestamp-millis.
I guess the parquet file produced by bulk_insert contains int96. How can I write timestamp-millis instead of int96?
INT96 comes from the decimal type, not timestamp(3). Can you remove the option?
I never changed the schema, but if I remove the option, it goes back to the exception java.lang.IllegalArgumentException: INT96 is deprecated. As interim enable READ_INT96_AS_FIXED flag to read as byte array. Where is the debug code point?
If I do not use bulk_insert, the compaction works fine.
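As a point of comparison (sketch only, based on this observation and the earlier comment that streaming upsert does not hit the exception), the offline load could declare the default upsert operation instead of bulk_insert. The table below is a trimmed, hypothetical variant of the reproduction DDL further down:
-- Sketch: same bucket-index MOR table, but loaded with upsert instead of bulk_insert,
-- which per the discussion above does not produce INT96 parquet files.
create table ratio_path_company_upsert (
  id DECIMAL(20, 0),
  op_ts TIMESTAMP(0),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'obs://hadoop-obs/ods_hudi/ratio_path_company_upsert',
  'table.type' = 'MERGE_ON_READ',
  'index.type' = 'bucket',
  'hoodie.bucket.index.num.buckets' = '256',
  'write.operation' = 'upsert',   -- default operation; replaces 'bulk_insert'
  'precombine.field' = 'op_ts'
);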
Your first error stack trace indicates that you were doing an offline compaction job, and the exception is thrown when the compactor resolves the table Avro schema from an existing parquet data file (see the stack trace at the end of this thread).
I set:
I went back to flink-1.15.4 and hudi-0.12.3 and got a new exception:
I think the timestamp(0) type is the culprit. In PR https://github.com/apache/hudi/pull/8418/files we adapted timestamp(3) and timestamp(6), but any precision other than that is written as INT96, which causes the error, so we need to support that separately. cc @voonhous, can you support that?
I will try timestamp(3).
Does 0.12.3 have this PR?
0.12.3 with timestamp(3) worked!!!
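For reference, the confirmed workaround boils down to declaring the timestamp columns with a supported precision (3 or 6, per the maintainer's comment above) and casting in the INSERT. A minimal sketch against the reproduction DDL below; the trimmed column list is only for illustration:
-- Sketch of the confirmed workaround: TIMESTAMP(3) instead of TIMESTAMP(0),
-- so bulk_insert writes timestamp-millis rather than legacy INT96.
create table ratio_path_company(
  id DECIMAL(20, 0),
  create_time TIMESTAMP(3),   -- was TIMESTAMP(0)
  update_time TIMESTAMP(3),   -- was TIMESTAMP(0)
  op_ts TIMESTAMP(3),         -- was TIMESTAMP(0)
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'obs://hadoop-obs/ods_hudi/ratio_path_company004',
  'table.type' = 'MERGE_ON_READ',
  'index.type' = 'bucket',
  'hoodie.bucket.index.num.buckets' = '256',
  'write.operation' = 'bulk_insert',
  'precombine.field' = 'op_ts'
);

-- Casting in the INSERT keeps the MySQL source definition unchanged:
insert into ratio_path_company
select id,
       CAST(create_time AS TIMESTAMP(3)),
       CAST(update_time AS TIMESTAMP(3)),
       CAST(op_ts AS TIMESTAMP(3))
from source_table;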
[offline insert]
CREATE TABLE source_table(
id DECIMAL(20, 0),
company_id BIGINT,
shareholder_id STRING,
shareholder_entity_type SMALLINT,
shareholder_name_id BIGINT,
investment_ratio_total DECIMAL(24, 12),
is_controller SMALLINT,
is_ultimate SMALLINT,
is_big_shareholder SMALLINT,
is_controlling_shareholder SMALLINT,
equity_holding_path STRING,
create_time TIMESTAMP(0),
update_time TIMESTAMP(0),
is_deleted SMALLINT,
op_ts as CAST(NOW() AS TIMESTAMP(0)),
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://xx.xx.xx.xx:3306/xxxx',
'table-name' = 'xxxx',
'username' = 'xxxx',
'password' = 'xxxx',
'scan.partition.column' = 'id',
'scan.partition.num' = '40000',
'scan.partition.lower-bound' = '1',
'scan.partition.upper-bound' = '400000000',
'scan.fetch-size' = '1024'
);
create table ratio_path_company(
id DECIMAL(20, 0),
company_id BIGINT,
shareholder_id STRING,
shareholder_entity_type SMALLINT,
shareholder_name_id BIGINT,
investment_ratio_total DECIMAL(24, 12),
is_controller SMALLINT,
is_ultimate SMALLINT,
is_big_shareholder SMALLINT,
is_controlling_shareholder SMALLINT,
equity_holding_path STRING,
create_time TIMESTAMP(0),
update_time TIMESTAMP(0),
is_deleted SMALLINT,
op_ts TIMESTAMP(0),
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'hudi',
'path' = 'obs://hadoop-obs/ods_hudi/ratio_path_company004',
'table.type' = 'MERGE_ON_READ',
-- cdc
'changelog.enabled' = 'true',
-- index
'index.type' = 'bucket',
'hoodie.bucket.index.num.buckets' = '256',
-- write
'write.operation' = 'bulk_insert',
'write.bulk_insert.shuffle_input' = 'false',
'write.bulk_insert.sort_input' = 'false',
'write.tasks' = '128',
'write.precombine' = 'true',
'precombine.field' = 'op_ts',
-- compaction
'compaction.schedule.enabled' = 'false',
'compaction.async.enabled' = 'false',
-- clean
'clean.async.enabled' = 'false'
);
insert into ratio_path_company select * from source_table;
[online insert]
CREATE TABLE source_table (
id DECIMAL(20, 0),
company_id BIGINT,
shareholder_id STRING,
shareholder_entity_type SMALLINT,
shareholder_name_id BIGINT,
investment_ratio_total DECIMAL(24, 12),
is_controller SMALLINT,
is_ultimate SMALLINT,
is_big_shareholder SMALLINT,
is_controlling_shareholder SMALLINT,
equity_holding_path STRING,
create_time TIMESTAMP(0),
update_time TIMESTAMP(0),
is_deleted SMALLINT,
op_ts TIMESTAMP(3) METADATA FROM 'value.ingestion-timestamp' VIRTUAL,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'kafka',
'topic' = 'e1d4c.json.prism_shareholder_path.ratio_path_company',
'properties.bootstrap.servers' = 'xxxx:9092',
'properties.group.id' = 'demo-job',
'scan.startup.mode' = 'earliest-offset',
-- canal
'format' = 'canal-json',
'canal-json.ignore-parse-errors' = 'true',
'canal-json.encode.decimal-as-plain-number' = 'true'
);
create table ratio_path_company(
id DECIMAL(20, 0),
company_id BIGINT,
shareholder_id STRING,
shareholder_entity_type SMALLINT,
shareholder_name_id BIGINT,
investment_ratio_total DECIMAL(24, 12),
is_controller SMALLINT,
is_ultimate SMALLINT,
is_big_shareholder SMALLINT,
is_controlling_shareholder SMALLINT,
equity_holding_path STRING,
create_time TIMESTAMP(0),
update_time TIMESTAMP(0),
is_deleted SMALLINT,
op_ts TIMESTAMP(0),
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'hudi',
'path' = 'obs://hadoop-obs/ods_hudi/ratio_path_company004',
'table.type' = 'MERGE_ON_READ',
-- cdc
'changelog.enabled' = 'true',
-- index
'index.type' = 'bucket',
'hoodie.bucket.index.num.buckets' = '256',
-- write
'write.tasks' = '8',
'write.task.max.size' = '512',
'write.batch.size' = '12',
'write.merge.max_memory' = '28',
'write.log_block.size' = '128',
'write.precombine' = 'true',
'precombine.field' = 'op_ts',
-- compaction
'compaction.tasks' = '8',
'compaction.schedule.enabled' = 'true',
'compaction.async.enabled' = 'false',
'compaction.max_memory' = '128',
'compaction.delta_commits' = '3',
-- clean
'clean.async.enabled' = 'true',
'clean.policy' = 'KEEP_LATEST_BY_HOURS',
'clean.retain_hours' = '72'
);
insert into ratio_path_company select * from source_table;
got exception:
Caused by: java.lang.IllegalArgumentException: INT96 is deprecated. As interim enable READ_INT96_AS_FIXED flag to read as byte array.
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:316) ~[hudi-1.0.jar:?]
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:298) ~[hudi-1.0.jar:?]
at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:341) ~[hudi-1.0.jar:?]
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:297) ~[hudi-1.0.jar:?]
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:275) ~[hudi-1.0.jar:?]
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:264) ~[hudi-1.0.jar:?]
at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:293) ~[hudi-1.0.jar:?]
at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:116) ~[hudi-1.0.jar:?]
at org.apache.hudi.util.CompactionUtil.inferChangelogMode(CompactionUtil.java:145) ~[hudi-1.0.jar:?]
at org.apache.hudi.sink.compact.HoodieFlinkCompactor$AsyncCompactionService.<init>(HoodieFlinkCompactor.java:180) ~[hudi-1.0.jar:?]
at org.apache.hudi.sink.compact.HoodieFlinkCompactor.main(HoodieFlinkCompactor.java:75) ~[hudi-1.0.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_302]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_302]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_302]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_302]
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355) ~[hudi-1.0.jar:?]
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[hudi-1.0.jar:?]
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:98) ~[hudi-1.0.jar:?]
at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:301) ~[hudi-1.0.jar:?]