Getting an exception: Property hoodie.deltastreamer.schemaprovider.registry.baseUrl not found #2829

@manishbol

Description

What do the two properties below mean, and what are their possible values?

hoodie.deltastreamer.schemaprovider.registry.baseUrl
hoodie.deltastreamer.schemaprovider.registry.urlSuffix
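
For context on what I have tried to piece together so far: these appear to point at a Confluent-style schema registry, with the full per-table schema URL formed as baseUrl + <kafka topic> + urlSuffix. If that reading is right, values would presumably look something like the following (registry host, port, and topic are placeholders, not from my setup):

	# Illustrative values only; host and port are hypothetical.
	hoodie.deltastreamer.schemaprovider.registry.baseUrl=http://localhost:8081/subjects/
	hoodie.deltastreamer.schemaprovider.registry.urlSuffix=-value/versions/latest
	# With a topic named "icici-topic", the derived schema URL would then be:
	# http://localhost:8081/subjects/icici-topic-value/versions/latest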
EMR Version: emr-5.32.0
Hudi Version: 0.6.0
Spark Version: 2.4.7

Spark submit command:
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
  --packages org.apache.spark:spark-avro_2.11:2.4.7 \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --props s3://config-private-qa/datalake/hudi-properties/kafka-source.properties \
  --config-folder s3://config-private-qa/datalake/hudi-properties/table_ingestion/ \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --source-ordering-field impresssiontime \
  --base-path-prefix s3://aws-dms-qa/s3-raw-data-dms/icici/ \
  --target-table icici \
  --op BULK_INSERT \
  --table-type COPY_ON_WRITE

Exception Raised:

Exception in thread "main" java.lang.IllegalArgumentException: Property hoodie.deltastreamer.schemaprovider.registry.baseUrl not found
	at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:42)
	at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:47)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateSchemaProviderProps(HoodieMultiTableDeltaStreamer.java:149)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateTableExecutionContextList(HoodieMultiTableDeltaStreamer.java:128)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.<init>(HoodieMultiTableDeltaStreamer.java:78)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:201)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
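
To show where I think the failure comes from: the failing frame is populateSchemaProviderProps, and my paraphrase of what the 0.6.0 source appears to do there is sketched below as a small runnable program (this is my reading, not verbatim Hudi code; the topic name and registry host are made up):

	import java.util.Properties;

	// Minimal sketch of how the per-table registry URL seems to be assembled
	// in HoodieMultiTableDeltaStreamer 0.6.0 (paraphrased, not Hudi source).
	public class RegistryUrlSketch {
	  public static void main(String[] args) {
	    Properties props = new Properties();
	    props.setProperty("hoodie.deltastreamer.schemaprovider.registry.baseUrl",
	        "http://localhost:8081/subjects/");            // hypothetical registry host
	    props.setProperty("hoodie.deltastreamer.schemaprovider.registry.urlSuffix",
	        "-value/versions/latest");
	    props.setProperty("hoodie.deltastreamer.source.kafka.topic",
	        "icici-topic");                                 // hypothetical topic

	    // baseUrl and urlSuffix appear to be read unconditionally whenever
	    // SchemaRegistryProvider is configured -- a missing baseUrl is what
	    // produces the IllegalArgumentException in the stack trace above.
	    String url = props.getProperty("hoodie.deltastreamer.schemaprovider.registry.baseUrl")
	        + props.getProperty("hoodie.deltastreamer.source.kafka.topic")
	        + props.getProperty("hoodie.deltastreamer.schemaprovider.registry.urlSuffix");

	    // Expected output:
	    // http://localhost:8081/subjects/icici-topic-value/versions/latest
	    System.out.println(url);
	  }
	}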

One more question in addition to the above: for multi-table ingestion, is it necessary to use Kafka? In our use case we are not using Kafka anywhere, but it still asks for Kafka details at runtime.
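
From my reading of the same code path, each table's properties seem to be keyed off hoodie.deltastreamer.source.kafka.topic, and that same topic value is spliced into the schema registry URL, which would explain why Kafka details are demanded even when the source is not Kafka. If so, something like the following appears to be expected per table (topic name hypothetical):

	# Hypothetical per-table setting; the topic value appears to be required
	# because it is used to build the schema registry URL for the table.
	hoodie.deltastreamer.source.kafka.topic=icici-topic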
