Closed as not planned
- Build the SparkSession:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

val spark = SparkSession
  .builder()
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
  .config("spark.sql.catalog.spark_catalog.type", "hive")
  .config(SQLConf.PARTITION_OVERWRITE_MODE.key, "dynamic")
  .config("spark.hadoop.hive.metastore.uris", "thrift://ip:port")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .appName("test-iceberg")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()
```
- Use the Hive catalog (SparkSessionCatalog):

```scala
val v2Catalog = spark.sessionState.catalogManager.v2SessionCatalog
  .asInstanceOf[SparkSessionCatalog[SparkCatalog]]
v2Catalog.createTable(id, schema, Spark3Util.toTransforms(spec.build()), immutableMap)

df.writeStream
  .outputMode("append")
  .format("iceberg")
  .option("checkpointLocation", checkpointPath)
  .option("path", locationPath)
  .start()
```
Error log:

```
Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /path/metadata/version-hint.text
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
```

Creating the table through SparkSessionCatalog does sync it to the Hive metastore, but the streaming write then fails with the error above.
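The `version-hint.text` file is only maintained for path-based (Hadoop) tables, so the error suggests the sink is resolving the raw `path` option through `HadoopTables` instead of through the Hive catalog. A sketch of one thing worth trying (assumes Spark 3.1+ for `toTable`; the table name `db.events` is a hypothetical placeholder for a table created through the Hive-backed SparkSessionCatalog):

```scala
// Sketch: address the table by catalog name rather than by HDFS path,
// so Spark resolves it through the Hive-backed session catalog instead
// of falling back to a path-based HadoopTables lookup.
val query = df.writeStream
  .outputMode("append")
  .format("iceberg")
  .option("checkpointLocation", checkpointPath)
  .toTable("db.events") // hypothetical database.table created via the catalog
```

With a name-based sink there is no `path` to resolve, so no `version-hint.text` lookup should be involved.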
- Use the Hadoop catalog:

```scala
val tables = new HadoopTables(spark.sparkContext.hadoopConfiguration)
tables.create(schema, spec, sort, immutableMap, path)

df.writeStream
  .outputMode("append")
  .format("iceberg")
  .option("checkpointLocation", checkpointPath)
  .option("path", locationPath)
  .start()
```
With the Hadoop catalog, Spark Structured Streaming works normally, but I still need to synchronize the table metadata to the Hive metastore. Is there any way to do this?
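One possibility, as a hedged sketch: Iceberg 0.14+ ships a `system.register_table` stored procedure that registers an existing table's metadata file with a catalog, which can be used to expose a Hadoop-catalog table through the Hive metastore. The table name and metadata path below are hypothetical placeholders; the exact metadata file (`vN.metadata.json`) must be the current one under the table's `metadata/` directory:

```scala
// Sketch, assuming Iceberg 0.14+ and the Hive-backed spark_catalog
// configured above. Registers an existing Hadoop-table metadata file
// with the Hive metastore; names and paths are placeholders.
spark.sql(
  """CALL spark_catalog.system.register_table(
    |  table => 'db.events',
    |  metadata_file => 'hdfs://namenode/warehouse/db/events/metadata/v3.metadata.json'
    |)""".stripMargin)
```

Note that registration points the metastore at a snapshot of the metadata; if the streaming job keeps committing through the Hadoop catalog, the Hive entry will not advance on its own.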