Description
What do the two properties below mean, and what are their possible values?
hoodie.deltastreamer.schemaprovider.registry.baseUrl
hoodie.deltastreamer.schemaprovider.registry.urlSuffix
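For context, these properties typically point at a Confluent Schema Registry; a hypothetical example of how they might be set (the host, port, and suffix below are assumptions for illustration, not values from this setup):

```properties
# Assumed: base URL of a Confluent Schema Registry subjects endpoint
hoodie.deltastreamer.schemaprovider.registry.baseUrl=http://localhost:8081/subjects/
# Assumed: suffix appended per table/topic to resolve the subject's schema
hoodie.deltastreamer.schemaprovider.registry.urlSuffix=-value/versions/latest
```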
EMR Version: emr-5.32.0
Hudi Version: 0.6.0
Spark Version: Spark 2.4.7
Spark submit command:
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
  --packages org.apache.spark:spark-avro_2.11:2.4.7 \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --props s3://config-private-qa/datalake/hudi-properties/kafka-source.properties \
  --config-folder s3://config-private-qa/datalake/hudi-properties/table_ingestion/ \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --source-ordering-field impresssiontime \
  --base-path-prefix s3://aws-dms-qa/s3-raw-data-dms/icici/ \
  --target-table icici \
  --op BULK_INSERT \
  --table-type COPY_ON_WRITE
Exception Raised:
Exception in thread "main" java.lang.IllegalArgumentException: Property hoodie.deltastreamer.schemaprovider.registry.baseUrl not found
at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:42)
at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:47)
at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateSchemaProviderProps(HoodieMultiTableDeltaStreamer.java:149)
at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateTableExecutionContextList(HoodieMultiTableDeltaStreamer.java:128)
at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.<init>(HoodieMultiTableDeltaStreamer.java:78)
at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:201)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
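The stack trace shows the failure comes from populateSchemaProviderProps, which reads these two properties to build a per-table schema registry URL. A minimal sketch of what that composition might look like (the concatenation order and names here are assumptions for illustration, not the exact Hudi 0.6.0 implementation):

```java
// Illustrative sketch: how a per-table registry URL could be assembled from
// baseUrl + <subject> + urlSuffix. Names and layout are assumed, not copied
// from HoodieMultiTableDeltaStreamer.
public class RegistryUrlSketch {
    static String buildRegistryUrl(String baseUrl, String subject, String urlSuffix) {
        // Concatenate the shared base, the table-specific subject, and the suffix
        return baseUrl + subject + urlSuffix;
    }

    public static void main(String[] args) {
        String url = buildRegistryUrl(
            "http://localhost:8081/subjects/", // assumed baseUrl
            "icici-topic",                     // assumed Kafka topic / subject
            "-value/versions/latest");         // assumed urlSuffix
        System.out.println(url);
    }
}
```

Because the streamer reads both properties unconditionally, leaving either one out of the properties file produces the IllegalArgumentException above.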
One more question in addition to the above: for multi-table ingestion, is it necessary to use Kafka? In our use case we are not using Kafka anywhere, yet the job still asks for Kafka details when it runs.