[SUPPORT] Can not create a Path from an empty string on unpartitioned table #2797

Closed
vansimonsen opened this issue Apr 9, 2021 · 6 comments

Comments

vansimonsen commented Apr 9, 2021

Describe the problem you faced

  • Issue when trying to create unpartitioned tables in the Hive metastore (AWS Glue Data Catalog) using Hudi (tested on 0.6.0, 0.7.0 and 0.8.0)

  • Using Hudi on AWS EMR with PySpark

  • A previous fix for this is included in the newer versions, but the error still occurs

  • Hudi config for unpartitioned tables:

hudiConfig = {
   "hoodie.datasource.write.precombine.field": <column>,
   "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
   "hoodie.datasource.write.keygenerator.class": 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
   "hoodie.datasource.hive_sync.partition_extractor_class": 'org.apache.hudi.hive.NonPartitionedExtractor',
   "hoodie.datasource.write.hive_style_partitioning": "true",
   "className": "org.apache.hudi",
   "hoodie.datasource.hive_sync.use_jdbc": "false",
   "hoodie.consistency.check.enabled": "true",
   "hoodie.datasource.hive_sync.database": DB_NAME,
   "hoodie.datasource.hive_sync.enable": "true",
   "hoodie.datasource.hive_sync.support_timestamp": "true",
}
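
For context, a minimal PySpark write using the config above might look like the sketch below; the source DataFrame, table name, and S3 target path are placeholders rather than values from the original report:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://my-bucket/input/")  # placeholder source data

# Assumes hudiConfig from above, with its <column>, _PRIMARY_KEY_COLUMN and
# DB_NAME placeholders filled in.
(
    df.write.format("hudi")
    .options(**hudiConfig)
    .option("hoodie.table.name", "my_unpartitioned_table")                  # placeholder table name
    .option("hoodie.datasource.hive_sync.table", "my_unpartitioned_table")
    .mode("append")
    .save("s3://my-bucket/hudi/my_unpartitioned_table/")                    # placeholder base path
)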

To Reproduce

Steps to reproduce the behavior:

  1. Run hudi with hive integration
  2. Try to create an unpartitioned table with the config specified above

Expected behavior

The table should be created without throwing the exception, and without any partition or default partition path.

Environment Description

  • Hudi version : 0.6.0, 0.7.0 and 0.8.0

  • Spark version : 2.4.7

  • Hive version : Aws glue data catalog integration on EMR

  • Hadoop version : Amazon Hadoop distribution

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Stacktrace

org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210407181606
   at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
   at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
   at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
   at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
   at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
   at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   at py4j.Gateway.invoke(Gateway.java:282)
   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   at py4j.commands.CallCommand.execute(CallCommand.java:79)
   at py4j.GatewayConnection.run(GatewayConnection.java:238)
   at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:168)
   at org.apache.hadoop.fs.Path.<init>(Path.java:180)
   at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
   at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
   at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
   at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
   at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:62)
   at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:552)
   at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:400)
   at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:385)
   at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:494)
   ... 46 more

aditiwari01 commented Apr 10, 2021

Issue (#2801) might be a duplicate.

However, while creating an unpartitioned table, my dataframe.write succeeds, but I am not able to query the data via Hive. Spark reads are working fine for me, though. (Testing via spark-shell, and I am using JDBC to connect to Hive.)

n3nash commented Apr 13, 2021

@vansimonsen Can you check the issue that @aditiwari01 is pointing to, and verify that you are using the correct KeyGenerator as well as PartitionValueExtractor (see https://hudi.apache.org/docs/configurations.html#HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY)?

Additionally, this looks like the basePath might not have been correctly registered with Glue. Let me know after you check these configs; if they don't work, this may be a legitimate bug.

@ismailsimsek

It might be related to a missing Glue database S3 path; the field is named "Amazon S3 path" (Lake Formation) or "Location" (Glue) in the AWS console.

As far as I can see, at one point the code tries to construct a path like getDatabasePath + tableName.
In my case it was creating s3://MyBucketMytable because of the missing / at the end of the database Location.
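
A standalone sketch of the failure mode described above (illustration only, not the actual Hive metastore code in Warehouse.getTablePath): an empty database Location produces the "empty string" Path error, and a Location without a trailing / yields a malformed table path when the table name is appended directly.

def table_path(database_location: str, table_name: str) -> str:
    # Mirrors the reported error when the Glue database has no Location set.
    if not database_location:
        raise ValueError("Can not create a Path from an empty string")
    # Naive concatenation, as described in the comment above.
    return database_location + table_name

print(table_path("s3://MyBucket", "Mytable"))   # s3://MyBucketMytable (missing separator)
print(table_path("s3://MyBucket/", "Mytable"))  # s3://MyBucket/Mytable (as intended)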

n3nash commented Apr 15, 2021

@ismailsimsek Are you saying it worked after you fixed the databasePath / location in your Glue metastore to include the trailing /? Is the / always expected at the end of the path? If yes, we can probably put that fix into the Hudi hive sync.

@vansimonsen Can you check if this is the root cause for you?
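
If the missing trailing slash turns out to be the root cause, the guard suggested above could amount to normalizing the database location before the table name is appended; a minimal sketch of that idea (a standalone helper, not actual Hudi hive-sync code):

def normalized_table_path(database_location: str, table_name: str) -> str:
    # Ensure exactly one "/" between the database location and the table name.
    return database_location.rstrip("/") + "/" + table_name

assert normalized_table_path("s3://MyBucket", "Mytable") == "s3://MyBucket/Mytable"
assert normalized_table_path("s3://MyBucket/", "Mytable") == "s3://MyBucket/Mytable"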

@n3nash n3nash added this to In progress in GI Tracker Board Apr 22, 2021
n3nash commented Jun 4, 2021

@ismailsimsek @vansimonsen Closing this due to inactivity, please re-open it or open a new one if you need further assistance.

@n3nash n3nash closed this as completed Jun 4, 2021
GI Tracker Board automation moved this from In progress to Done Jun 4, 2021
@pranotishanbhag

I am facing the same issue. Can you please share the fix? I am using Hudi version 0.8.
