
Relax need of ordering/precombine field for tables with autogenerated record keys for DeltaStreamer #16346

@hudi-bot

Description


https://github.com//issues/10233


NOW=$(date '+%Y%m%dt%H%M%S')
${SPARK_HOME}/bin/spark-submit \
--jars ${path_prefix}/jars/${SPARK_V}/hudi-spark${SPARK_VERSION}-bundle_2.12-${HUDI_VERSION}.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
${path_prefix}/jars/${SPARK_V}/hudi-utilities-slim-bundle_2.12-${HUDI_VERSION}.jar \
--target-base-path ${path_prefix}/testcases/stocks/data/target/${NOW} \
--target-table stocks${NOW} \
--table-type COPY_ON_WRITE \
--base-file-format PARQUET \
--props ${path_prefix}/testcases/stocks/configs/hoodie.properties \
--source-class org.apache.hudi.utilities.sources.JsonDFSSource \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc \
--hoodie-conf hoodie.deltastreamer.schemaprovider.target.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc \
--op UPSERT \
--spark-master yarn \
--hoodie-conf hoodie.deltastreamer.source.dfs.root=${path_prefix}/testcases/stocks/data/source_without_ts \
--hoodie-conf hoodie.datasource.write.partitionpath.field=date \
--hoodie-conf hoodie.datasource.write.keygenerator.type=SIMPLE \
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=false \
--hoodie-conf hoodie.metadata.enable=true

Comments

29/May/24 22:44, shivnarayan: For auto record key generation, you need to set the operation type to "INSERT". Can you give that a try?
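Following that suggestion, a retry of the reproduction above with the operation switched from UPSERT to INSERT might look like the sketch below. This is not a confirmed fix, just the same invocation adjusted per the comment; it assumes the referenced hoodie.properties file does not set hoodie.datasource.write.recordkey.field (so Hudi's automatic record key generation applies), and it drops the explicit SIMPLE key generator config, which expects a record key field to be configured.

```shell
# Sketch only: same DeltaStreamer invocation as above, with --op INSERT
# per the comment. Assumes the props file configures no record key field,
# so record keys are auto-generated; the SIMPLE keygenerator conf is
# omitted for that reason. Variables (${SPARK_HOME}, ${path_prefix},
# ${SPARK_V}, ${SPARK_VERSION}, ${HUDI_VERSION}, ${NOW}) are the ones
# defined in the reproduction above.
${SPARK_HOME}/bin/spark-submit \
--jars ${path_prefix}/jars/${SPARK_V}/hudi-spark${SPARK_VERSION}-bundle_2.12-${HUDI_VERSION}.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
${path_prefix}/jars/${SPARK_V}/hudi-utilities-slim-bundle_2.12-${HUDI_VERSION}.jar \
--target-base-path ${path_prefix}/testcases/stocks/data/target/${NOW} \
--target-table stocks${NOW} \
--table-type COPY_ON_WRITE \
--base-file-format PARQUET \
--props ${path_prefix}/testcases/stocks/configs/hoodie.properties \
--source-class org.apache.hudi.utilities.sources.JsonDFSSource \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc \
--hoodie-conf hoodie.deltastreamer.schemaprovider.target.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc \
--op INSERT \
--spark-master yarn \
--hoodie-conf hoodie.deltastreamer.source.dfs.root=${path_prefix}/testcases/stocks/data/source_without_ts \
--hoodie-conf hoodie.datasource.write.partitionpath.field=date \
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=false \
--hoodie-conf hoodie.metadata.enable=true
```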

Metadata

Labels

area:writer (Write client and core write operations)
from-jira
priority:critical (Production degraded; pipelines stalled)
type:bug (Bug reports and fixes)
