
Spark Structured Streaming stopped writing after upgrade from Hudi 0.11 to 0.13 #16381

@hudi-bot

We have a Spark Structured Streaming job that writes data in Hudi format. After upgrading from Hudi 0.11.0 to Hudi 0.13.0, the streaming app no longer writes data to the existing Hudi table. The app starts successfully and triggers a listing job, but it never triggers any further jobs (compaction, cleaning, writing data, etc.). There are no errors in the Spark UI or in the stdout/stderr logs. When the same streaming application writes to a new S3 location (a new Hudi table), everything works fine. We use append output mode and a processing-time trigger of 30 seconds.

Here are the Hudi configurations used (some values are redacted as xxx):

'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
'hoodie.datasource.write.precombine.field': 'xxx',
'hoodie.datasource.write.partitionpath.field': 'xxx:SIMPLE',
'hoodie.embed.timeline.server': False,
'hoodie.index.type': 'BLOOM',
'hoodie.parquet.compression.codec': 'snappy',
'hoodie.clean.async': True,
'hoodie.clean.max.commits': 5,
'hoodie.parquet.max.file.size': 125829120,
'hoodie.parquet.small.file.limit': 104857600,
'hoodie.parquet.block.size': 125829120,
'hoodie.metadata.enable': True,
'hoodie.metadata.validate': True,
'hoodie.datasource.write.hive_style_partitioning': True,
'hoodie.datasource.hive_sync.support_timestamp': True,
'hoodie.datasource.hive_sync.jdbcurl': "xxx",
'hoodie.datasource.hive_sync.username': 'xxx',
'hoodie.datasource.hive_sync.password': 'xxx',
'hoodie.datasource.hive_sync.partition_fields': 'xxx',
'hoodie.datasource.hive_sync.enable': True,
'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
'hoodie.avro.schema.external.transformation': True,
'hoodie.avro.schema.validate': True,
'hoodie.table.name': 'xxx',
'hoodie.datasource.write.table.name': 'xxx',
'hoodie.datasource.write.recordkey.field': 'xxx',
'hoodie.datasource.hive_sync.database': 'xxx',
'hoodie.datasource.hive_sync.table': 'xxx',
'hoodie.datasource.write.operation': 'upsert'
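
For context, here is roughly how a configuration like the one above is usually wired into a PySpark streaming writer with append output mode and a 30-second trigger. This is a minimal sketch, not the actual job code: the function names `to_spark_options` and `start_hudi_stream`, and the `table_path` and `checkpoint_path` parameters, are hypothetical, and the streaming DataFrame is assumed to come from whatever source the job reads.

```python
def to_spark_options(conf):
    """Render Python values as the strings that Spark options carry.

    PySpark normally stringifies option values itself; doing it here
    just makes the values that reach Hudi explicit.
    """
    options = {}
    for key, value in conf.items():
        if isinstance(value, bool):
            # Python True/False become "true"/"false"
            options[key] = "true" if value else "false"
        else:
            options[key] = str(value)
    return options


def start_hudi_stream(df, hudi_conf, table_path, checkpoint_path):
    """Start a streaming write of `df` to a Hudi table (hypothetical helper).

    `df` is assumed to be a streaming DataFrame; `table_path` and
    `checkpoint_path` are placeholders for the real S3 locations.
    """
    return (
        df.writeStream.format("hudi")
        .options(**to_spark_options(hudi_conf))
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .trigger(processingTime="30 seconds")
        .start(table_path)
    )
```

Calling `start_hudi_stream(df, hudi_conf, table_path, checkpoint_path)` with the dictionary above returns the streaming query handle, which can then be awaited with `awaitTermination()`.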
