Description
We are attempting to test DeltaStreamer on local files and apply some transformations before writing the data into the destination table, but we keep encountering the SQL-related error mentioned in the title.
I am running HoodieDeltaStreamer with the SqlQueryBasedTransformer on Spark 3.3.2 and Apache Hudi 0.13.0. When running the spark-submit command, I encounter the error shown in the Stacktrace section below.
To Reproduce
Steps to reproduce the behavior:
- Download the sample data from https://drive.google.com/drive/folders/1BwNEK649hErbsWcYLZhqCWnaXFX3mIsg and the utilities bundle from https://mvnrepository.com/artifact/org.apache.hudi/hudi-utilities-bundle_2.12/0.13.0
- Create the required folders
- Run the following spark-submit command:
spark-submit \
  --conf spark.jars=/home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf mapreduce.fileoutputcommitter.marksuccessfuljobs=false \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar \
  --enable-sync \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-table invoice \
  --target-base-path /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=invoiceid \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files \
  --hoodie-conf hoodie.datasource.write.precombine.field=replicadmstimestamp \
  --hoodie-conf hoodie.database.name=metastore \
  --hoodie-conf hoodie.datasource.hive_sync.enable=true \
  --hoodie-conf hoodie.datasource.hive_sync.table=invoice \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --hoodie-conf hoodie.deltastreamer.transformer.sql="SELECT * ,extract(year from replicadmstimestamp) as year, extract(month from replicadmstimestamp) as month, extract(day from replicadmstimestamp) as day FROM <SRC> a;" \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=year,month,day \
  --hoodie-conf hoodie.datasource.hive_sync.partition_fields=year,month,day
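For context on what the transformer does with this query: judging by the log line "Registering tmp table : HOODIE_SRC_TMP_TABLE_…" and the stack trace below (SqlQueryBasedTransformer.apply calling Dataset.createOrReplaceTempView), the transformer registers the incoming batch as a temp view and substitutes it for the <SRC> placeholder before running the SQL. A minimal sketch of that behavior, inferred from the log and trace rather than the verbatim Hudi source:

```scala
import java.util.UUID
import org.apache.spark.sql.{DataFrame, SparkSession}

// Rough sketch of what SqlQueryBasedTransformer appears to do, based on the log
// and stack trace in this report; the real code lives in
// org.apache.hudi.utilities.transform.SqlQueryBasedTransformer.
def applySqlTransform(spark: SparkSession, batch: DataFrame, sql: String): DataFrame = {
  // Register the batch under a unique temp view name, e.g.
  // HOODIE_SRC_TMP_TABLE_a1962c9c_edde_45ce_88db_a73c519c000d as seen in the log.
  val tmpTable = "HOODIE_SRC_TMP_TABLE_" + UUID.randomUUID.toString.replace("-", "_")
  batch.createOrReplaceTempView(tmpTable) // the call that fails in the stack trace below
  // Replace the <SRC> placeholder with the view name and run the query.
  spark.sql(sql.replace("<SRC>", tmpTable))
}
```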
Environment Description
- Hudi version : 0.13.0
- Spark version : 3.3.2
- Hive version : 3.1.3
- Hadoop version : 3.3.2
- Storage (HDFS/S3/GCS..) : Local
- Running on Docker? (yes/no) : no
Additional context
Added the required jars in the spark-defaults.conf file:
spark.jars.packages io.delta:delta-core_2.12:2.3.0,org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.2
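Note that with this setup two Hudi bundles end up on the classpath: hudi-utilities-bundle via spark.jars and hudi-spark3.3-bundle via spark.jars.packages. For triage, a small hypothetical spark-shell check (class name taken from the stack trace below) can show which jar actually serves the failing rule:

```scala
// Hypothetical diagnostic: print the jar that HoodiePruneFileSourcePartitions
// (the class named in the NoSuchMethodError below) is loaded from, to see which
// of the two Hudi bundles wins on the classpath.
val cls = Class.forName("org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions")
println(cls.getProtectionDomain.getCodeSource.getLocation) // assumes the class comes from a jar
```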
Stacktrace
:: loading settings :: url = jar:file:/home/kumar/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/kumar/.ivy2/cache
The jars for the packages stored in: /home/kumar/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
org.apache.hudi#hudi-spark3.3-bundle_2.12 added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-91f79469-e32d-4f8c-8dc0-3be403ce410d;1.0
confs: [default]
found io.delta#delta-core_2.12;2.3.0 in central
found io.delta#delta-storage;2.3.0 in central
found org.antlr#antlr4-runtime;4.8 in central
found org.apache.hudi#hudi-spark3.3-bundle_2.12;0.13.0 in central
found org.apache.spark#spark-avro_2.12;3.3.2 in central
found org.tukaani#xz;1.9 in central
found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 616ms :: artifacts dl 29ms
:: modules in use:
io.delta#delta-core_2.12;2.3.0 from central in [default]
io.delta#delta-storage;2.3.0 from central in [default]
org.antlr#antlr4-runtime;4.8 from central in [default]
org.apache.hudi#hudi-spark3.3-bundle_2.12;0.13.0 from central in [default]
org.apache.spark#spark-avro_2.12;3.3.2 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.tukaani#xz;1.9 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 7 | 0 | 0 | 0 || 7 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-91f79469-e32d-4f8c-8dc0-3be403ce410d
confs: [default]
0 artifacts copied, 7 already retrieved (0kB/23ms)
23/05/01 13:23:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/01 13:23:10 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
23/05/01 13:23:10 INFO SparkContext: Running Spark version 3.3.2
23/05/01 13:23:10 INFO ResourceUtils: ==============================================================
23/05/01 13:23:10 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/01 13:23:10 INFO ResourceUtils: ==============================================================
23/05/01 13:23:10 INFO SparkContext: Submitted application: delta-streamer-invoice
23/05/01 13:23:11 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/01 13:23:11 INFO ResourceProfile: Limiting resource is cpu
23/05/01 13:23:11 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/01 13:23:11 INFO SecurityManager: Changing view acls to: kumar
23/05/01 13:23:11 INFO SecurityManager: Changing modify acls to: kumar
23/05/01 13:23:11 INFO SecurityManager: Changing view acls groups to:
23/05/01 13:23:11 INFO SecurityManager: Changing modify acls groups to:
23/05/01 13:23:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kumar); groups with view permissions: Set(); users with modify permissions: Set(kumar); groups with modify permissions: Set()
23/05/01 13:23:11 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
23/05/01 13:23:11 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
23/05/01 13:23:11 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
23/05/01 13:23:11 INFO Utils: Successfully started service 'sparkDriver' on port 43729.
23/05/01 13:23:11 INFO SparkEnv: Registering MapOutputTracker
23/05/01 13:23:11 INFO SparkEnv: Registering BlockManagerMaster
23/05/01 13:23:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/01 13:23:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/01 13:23:12 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/01 13:23:12 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-227cf5db-3cef-4bff-9c4b-17f81aa8aa1b
23/05/01 13:23:12 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
23/05/01 13:23:12 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/01 13:23:12 INFO Utils: Successfully started service 'SparkUI' on port 8090.
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/io.delta_delta-core_2.12-2.3.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-core_2.12-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/io.delta_delta-storage-2.3.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-storage-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.antlr_antlr4-runtime-4.8.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.tukaani_xz-1.9.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.tukaani_xz-1.9.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: The JAR file:/home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar has been added already. Overwriting of added jar is not supported in the current version.
23/05/01 13:23:12 INFO Executor: Starting executor ID driver on host hudi-vm.internal.cloudapp.net
23/05/01 13:23:12 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/05/01 13:23:12 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-storage-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO TransportClientFactory: Successfully created connection to hudi-vm.internal.cloudapp.net/10.1.0.4:43729 after 49 ms (0 ms spent in bootstraps)
23/05/01 13:23:12 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-storage-2.3.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp6875321353190243857.tmp
23/05/01 13:23:12 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/io.delta_delta-storage-2.3.0.jar to class loader
23/05/01 13:23:12 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp7911131019630374440.tmp
23/05/01 13:23:14 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar to class loader
23/05/01 13:23:14 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:14 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp3321144361109611039.tmp
23/05/01 13:23:16 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/hudi-utilities-bundle_2.12-0.13.0.jar to class loader
23/05/01 13:23:16 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-core_2.12-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:16 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-core_2.12-2.3.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp738084884534076528.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/io.delta_delta-core_2.12-2.3.0.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp6464200383284256273.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.apache.spark_spark-avro_2.12-3.3.2.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.antlr_antlr4-runtime-4.8.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.antlr_antlr4-runtime-4.8.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp5710292961021659934.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.antlr_antlr4-runtime-4.8.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.tukaani_xz-1.9.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.tukaani_xz-1.9.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp5943086360298064670.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.tukaani_xz-1.9.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.spark-project.spark_unused-1.0.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp2308988093976298818.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.spark-project.spark_unused-1.0.0.jar to class loader
23/05/01 13:23:17 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40465.
23/05/01 13:23:17 INFO NettyBlockTransferService: Server created on hudi-vm.internal.cloudapp.net:40465
23/05/01 13:23:17 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/01 13:23:17 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:17 INFO BlockManagerMasterEndpoint: Registering block manager hudi-vm.internal.cloudapp.net:40465 with 366.3 MiB RAM, BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:17 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:17 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:18 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/05/01 13:23:18 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
23/05/01 13:23:18 INFO UtilHelpers: Adding overridden properties to file properties.
23/05/01 13:23:18 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
23/05/01 13:23:18 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:18 INFO HoodieTableConfig: Loading table properties from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data/.hoodie/hoodie.properties
23/05/01 13:23:18 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:18 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
23/05/01 13:23:18 INFO SharedState: Warehouse path is 'file:/home/kumar/hudi_project/deltastreamer_file_transformer/spark-warehouse'.
23/05/01 13:23:21 INFO HoodieDeltaStreamer: Creating delta streamer with configs:
hoodie.auto.adjust.lock.configs: true
hoodie.database.name: metastore
hoodie.datasource.hive_sync.enable: true
hoodie.datasource.hive_sync.partition_fields: year,month,day
hoodie.datasource.hive_sync.table: invoice
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.partitionpath.field: year,month,day
hoodie.datasource.write.precombine.field: replicadmstimestamp
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: invoiceid
hoodie.deltastreamer.source.dfs.root: file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files
hoodie.deltastreamer.transformer.sql: SELECT * ,extract(year from replicadmstimestamp) as year, extract(month from replicadmstimestamp) as month, extract(day from replicadmstimestamp) as day FROM <SRC> a;
23/05/01 13:23:21 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieTableConfig: Loading table properties from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data/.hoodie/hoodie.properties
23/05/01 13:23:21 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
23/05/01 13:23:21 INFO DFSPathSelector: Using path selector org.apache.hudi.utilities.sources.helpers.DFSPathSelector
23/05/01 13:23:21 INFO HoodieDeltaStreamer: Delta Streamer running only single round
23/05/01 13:23:21 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieTableConfig: Loading table properties from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data/.hoodie/hoodie.properties
23/05/01 13:23:21 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
23/05/01 13:23:21 INFO DeltaSync: Checkpoint to resume from : Optional.empty
23/05/01 13:23:21 INFO DFSPathSelector: Root path => file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files source limit => 9223372036854775807
23/05/01 13:23:21 INFO InMemoryFileIndex: It took 64 ms to list leaf files for 2 paths.
23/05/01 13:23:23 INFO SparkContext: Starting job: parquet at ParquetDFSSource.java:55
23/05/01 13:23:23 INFO DAGScheduler: Got job 0 (parquet at ParquetDFSSource.java:55) with 1 output partitions
23/05/01 13:23:23 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at ParquetDFSSource.java:55)
23/05/01 13:23:23 INFO DAGScheduler: Parents of final stage: List()
23/05/01 13:23:23 INFO DAGScheduler: Missing parents: List()
23/05/01 13:23:23 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at ParquetDFSSource.java:55), which has no missing parents
23/05/01 13:23:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 106.0 KiB, free 366.2 MiB)
23/05/01 13:23:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 38.2 KiB, free 366.2 MiB)
23/05/01 13:23:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hudi-vm.internal.cloudapp.net:40465 (size: 38.2 KiB, free: 366.3 MiB)
23/05/01 13:23:23 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1513
23/05/01 13:23:24 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at ParquetDFSSource.java:55) (first 15 tasks are for partitions Vector(0))
23/05/01 13:23:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
23/05/01 13:23:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (hudi-vm.internal.cloudapp.net, executor driver, partition 0, PROCESS_LOCAL, 4449 bytes) taskResourceAssignments Map()
23/05/01 13:23:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
23/05/01 13:23:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1606 bytes result sent to driver
23/05/01 13:23:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1696 ms on hudi-vm.internal.cloudapp.net (executor driver) (1/1)
23/05/01 13:23:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
23/05/01 13:23:25 INFO DAGScheduler: ResultStage 0 (parquet at ParquetDFSSource.java:55) finished in 2.676 s
23/05/01 13:23:25 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
23/05/01 13:23:25 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
23/05/01 13:23:25 INFO DAGScheduler: Job 0 finished: parquet at ParquetDFSSource.java:55, took 2.834363 s
23/05/01 13:23:26 INFO BlockManagerInfo: Removed broadcast_0_piece0 on hudi-vm.internal.cloudapp.net:40465 in memory (size: 38.2 KiB, free: 366.3 MiB)
23/05/01 13:23:30 INFO SqlQueryBasedTransformer: Registering tmp table : HOODIE_SRC_TMP_TABLE_a1962c9c_edde_45ce_88db_a73c519c000d
23/05/01 13:23:30 INFO DeltaSync: Shutting down embedded timeline server
23/05/01 13:23:30 INFO HoodieDeltaStreamer: Shut down delta streamer
23/05/01 13:23:30 INFO SparkUI: Stopped Spark web UI at http://hudi-vm.internal.cloudapp.net:8090/
23/05/01 13:23:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/05/01 13:23:31 INFO MemoryStore: MemoryStore cleared
23/05/01 13:23:31 INFO BlockManager: BlockManager stopped
23/05/01 13:23:31 INFO BlockManagerMaster: BlockManagerMaster stopped
23/05/01 13:23:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/05/01 13:23:31 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(Lscala/PartialFunction;)Lorg/apache/spark/sql/catalyst/plans/logical/LogicalPlan;
at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:42)
at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:40)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:92)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3887)
at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3455)
at org.apache.hudi.utilities.transform.SqlQueryBasedTransformer.apply(SqlQueryBasedTransformer.java:63)
at org.apache.hudi.utilities.transform.ChainedTransformer.apply(ChainedTransformer.java:50)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$fetchFromSource$0(DeltaSync.java:495)
at org.apache.hudi.common.util.Option.map(Option.java:108)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:495)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:206)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:204)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/05/01 13:23:31 INFO ShutdownHookManager: Shutdown hook called
23/05/01 13:23:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135
23/05/01 13:23:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-7c0ada38-e349-47fb-bf8d-9f7c288f4e24