Description
Hello,
I am following the steps here to load data from an existing Hudi table using the spark-sql shell: https://hudi.apache.org/docs/0.11.0/quick-start-guide#create-table
Specifically, the section "Create Table for an existing Hudi Table", which includes the following tip:
"You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations."
To Reproduce
Steps to reproduce the behavior:
- Start the spark-sql shell:
spark-sql --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.0 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
- Execute: set hoodie.schema.on.read.enable=true;
- Run: create table if not exists myTable location 's3://uri/*/*/*';
This fails with the following error:
2023-08-18 22:13:20,417 [WARN] (main) org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
2023-08-18 22:13:20,445 [WARN] (main) org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Error in query: Unable to infer the schema. The schema specification is required to create the table `default`.`myTableThatDoesntExist`.
Based on the documentation, I would expect the above command to create the table successfully.
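For reference, this is the shape of statement I expected to work per the quick-start tip (a sketch only: s3://bucket/path/to/table is a placeholder for the table's base path, i.e. the directory containing the .hoodie folder, and USING hudi is spelled out since the log above shows Spark falling back to a Hive serde table when no provider is given):

```sql
-- Sketch: register an existing Hudi table without specifying a schema.
-- LOCATION should be the table's base path (the directory holding
-- .hoodie), not a partition glob like 's3://uri/*/*/*'.
-- 's3://bucket/path/to/table' is a placeholder, not a real path.
CREATE TABLE IF NOT EXISTS myTable
USING hudi
LOCATION 's3://bucket/path/to/table';
```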
Further, if I specify my schema and properties like so:
create table table (
col1 string,
col2 string,
col3 string,
col4 double,
col5 string
) using hudi
tblproperties (
type = 'cow',
primaryKey = 'col1,col2,col3',
preCombineField = 'col4'
)
partitioned by (col4, col5)
location 's3://uri/*/*/*';
I receive this error:
Error in query: Specified schema in create table statement is not equal to the table schema.You should not specify the schema for an exist table:
I'm wondering what the exact steps are to load an existing table in the spark-sql shell with Hudi 0.11.0 on Spark 3.2.1. Thank you.
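For the record, the 0.11.0 quick start's pattern for an existing partitioned table lists only the partition columns in PARTITIONED BY and omits the column list and table properties entirely (a sketch, not a verified fix: the base path is a placeholder, and the partition columns are taken from the statement above):

```sql
-- Sketch of the quick-start pattern for an existing partitioned
-- table: no column list, no tblproperties; only partition columns.
-- LOCATION is a placeholder for the table's base path.
CREATE TABLE IF NOT EXISTS myTable
USING hudi
PARTITIONED BY (col4, col5)
LOCATION 's3://bucket/path/to/table';
```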
Environment Description
- Hudi version : 0.11.0
- Spark version : 3.2.1
- Hive version : 3.1.3
- Hadoop version : 3.2.1
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : no