[SUPPORT] Flink cant read metafield '_hoodie_commit_time' #8371

Closed
LinMingQiang opened this issue Apr 3, 2023 · 9 comments
Assignees
Labels
flink Issues related to flink streaming

Comments

@LinMingQiang
Contributor

LinMingQiang commented Apr 3, 2023

Environment
Flink: 1.15.1
Hudi: master

CREATE TABLE ITTestMetaField(
_hoodie_commit_time STRING,
id STRING PRIMARY KEY NOT ENFORCED
)
WITH (
'index.type'='BUCKET',
'payload.class'='org.apache.hudi.common.model.PartialUpdateAvroPayload',
'precombine.field'='ts',
'table.type' = 'MERGE_ON_READ'
)

Stacktrace

Caused by: java.lang.NullPointerException
	at org.apache.flink.formats.parquet.vector.reader.AbstractColumnReader.readToVector(AbstractColumnReader.java:160)
	at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextBatch(ParquetColumnarRowSplitReader.java:312)
	at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.ensureBatch(ParquetColumnarRowSplitReader.java:288)
	at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.reachedEnd(ParquetColumnarRowSplitReader.java:267)
	at org.apache.hudi.table.format.ParquetSplitRecordIterator.hasNext(ParquetSplitRecordIterator.java:42)
	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat$BaseFileOnlyFilteringIterator.hasNext(MergeOnReadInputFormat.java:563)
	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:264)
	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:89)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)

Querying the other metafields works without any problem.

@LinMingQiang
Contributor Author

Screenshot 2023-04-03 22 14 53

@LinMingQiang
Contributor Author

Screenshot 2023-04-03 22 00 36

@danny0405
Contributor

This looks like a bug. Could you file a PR to fix it?

@voonhous
Member

voonhous commented Apr 5, 2023

We've recently run into similar issues around this code.

We can't reproduce your issue, though. Could you provide a minimal example of your table, so I can trigger the bug locally and check whether it is the same issue we are seeing?

For example, could you share your Hudi table files?

@LinMingQiang
Contributor Author

LinMingQiang commented Apr 5, 2023

CREATE TABLE HUDI_6032(
id STRING PRIMARY KEY NOT ENFORCED,
name STRING,
age bigint,
ts string,
`par` STRING 
) PARTITIONED BY (`par`) 
 WITH (
'index.type'='BUCKET',
'payload.class'='org.apache.hudi.common.model.PartialUpdateAvroPayload',
'precombine.field'='ts',
'changelog.enabled'='false',
'compaction.delta_commits'='1',
'compaction.async.enabled'='true',
'write.tasks'='1',
'hoodie.bucket.index.num.buckets'='1',
'compaction.schedule.enable'='true',
'table.type' = 'MERGE_ON_READ',
'hoodie.datasource.write.hive_style_partitioning'='true',
'hive_sync.partition_extractor_class'='org.apache.hudi.hive.HiveStylePartitionValueExtractor',
'path' = 'file:///Users/hunter/workspace/hudipr/HUDI-6032/hudi-debug/hudi-debug-flink/target/HUDI_6032',
'connector' = 'hudi'
)


insert into HUDI_6032(id, name,age, ts, par)  values('id1','name1',1, 'ts1','par1'),('id1','name2',2, 'ts2','par1')


CREATE TABLE HUDI_6032(
_hoodie_commit_time STRING,
id STRING PRIMARY KEY NOT ENFORCED 
) PARTITIONED BY (id) 
 WITH (
...
)

@danny0405 danny0405 added flink Issues related to flink streaming labels Apr 13, 2023
@danny0405 danny0405 self-assigned this Apr 13, 2023
@Coco0201

Can _hoodie_commit_time be read now? Reading Hudi 0.13.1 with Flink 1.13 still fails with: Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: SQL parse failed. Encountered "_hoodie_commit_time" @LinMingQiang

@danny0405
Contributor

Did you declare the _hoodie_commit_time as a schema field in your table?

@Coco0201

Did you declare the _hoodie_commit_time as a schema field in your table?

Yes. My Java code is as follows:
tabEnv.executeSql("create table cdc_hudi(_hoodie_commit_time string) with (...)")

@Coco0201

Did you declare the _hoodie_commit_time as a schema field in your table?

I found that I had forgotten a comma in the DDL of my Flink table. With that fixed, there is no problem reading the metafields.
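
A dropped comma like this is easy to miss when the DDL is written as one long Java string. Below is a minimal sketch of one way to avoid it, assuming the statement is assembled in Java before being passed to `tabEnv.executeSql`. The column definitions are kept in a list and joined with `String.join`, so the separators are generated rather than typed by hand. The class `MetaFieldDdl`, the helper `buildDdl`, and the `id` column are hypothetical and not taken from this thread; the `WITH` options stay elided as in the comment above.

```java
import java.util.List;

final class MetaFieldDdl {

    // Builds a CREATE TABLE statement. String.join inserts the commas between
    // column definitions, so a separator cannot be forgotten by hand.
    static String buildDdl(String table, List<String> columns, String withOptions) {
        return "CREATE TABLE " + table + " (\n  "
                + String.join(",\n  ", columns)
                + "\n) WITH (" + withOptions + ")";
    }

    public static void main(String[] args) {
        String ddl = buildDdl(
                "cdc_hudi",
                List.of(
                        // The metafield is declared like any other schema field.
                        "_hoodie_commit_time STRING",
                        // Assumed key column, borrowed from the issue description.
                        "id STRING PRIMARY KEY NOT ENFORCED"),
                "..."); // connector options elided; fill in before executing

        System.out.println(ddl);
        // With a Flink TableEnvironment in scope, the string would then be
        // handed over as: tabEnv.executeSql(ddl);
    }
}
```

Printing the statement before executing it also makes a parse error such as `Encountered "_hoodie_commit_time"` easier to trace back to the exact text Flink received.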
