[SUPPORT] "Failed to get update last commit time synced to 20200804071144" #1909

Closed
mingujotemp opened this issue Aug 4, 2020 · 5 comments

Comments


mingujotemp commented Aug 4, 2020

Describe the problem you faced

Hudi 0.5.0 (running on EMR)

I encounter org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144 when I try to write a non-partitioned table to Glue (S3) using Hudi.

To Reproduce

Steps to reproduce the behavior:

  1. create a pyspark dataframe
  2. Write a new df by running it with the following options:
hudi_options = {
  'hoodie.table.name': tableName,
  'hoodie.datasource.write.recordkey.field': 'id',
  'hoodie.index.type': 'BLOOM',
  'hoodie.datasource.write.partitionpath.field': '',
  'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.NonpartitionedKeyGenerator',
  'hoodie.datasource.write.table.name': tableName,
  'hoodie.datasource.write.operation': 'upsert',
  'hoodie.datasource.write.precombine.field': 'updated_at',
  'hoodie.upsert.shuffle.parallelism': 2, 
  'hoodie.insert.shuffle.parallelism': 2,
  'hoodie.bulkinsert.shuffle.parallelism': 10,
  'hoodie.datasource.hive_sync.database': databaseName,
  'hoodie.datasource.hive_sync.table': tableName,
  'hoodie.datasource.hive_sync.enable': 'true',
  'hoodie.datasource.hive_sync.assume_date_partitioning': 'false',
  'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
  'hoodie.datasource.hive_sync.partition_fields': '',
}
df.write.format("org.apache.hudi"). \
  options(**hudi_options). \
  mode("overwrite"). \
  save(basePath)

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 0.5.0

  • Spark version : 2.4.4

  • Hive version : 3.1.2 (Using Glue)

  • Hadoop version : 3.2.1-amzn-0

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Additional context

Using the following jars:
hudi-spark-bundle-0.5.0-incubating-amzn-1.jar
hudi-hive-bundle-0.5.0-incubating-amzn-1.jar
hudi-hadoop-mr-bundle-0.5.0-incubating-amzn-1.jar
spark-avro_2.12-2.4.4.jar
installed on EMR 6.0.0

Stacktrace

20/08/04 07:11:50 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 738, in save
    self._jwrite.save(path)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o273.save.
: org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144
	at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:667)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:109)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172)
	at org.apache.hadoop.fs.Path.<init>(Path.java:184)
	at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
	at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
	at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
	at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:75)
	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:538)
	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:374)
	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:359)
	at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:665)
	... 35 more

bvaradar commented Aug 4, 2020

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 738, in save
    self._jwrite.save(path)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o273.save.

It looks like hive-conf is not set correctly.

@mingujotemp (Author)

@bvaradar could you elaborate more? Which part of hive-conf are you describing? Is it hive-conf.xml on EMR or the Hive configuration for Hudi?


bvaradar commented Aug 6, 2020

It appears that hive-site.xml may not be set correctly. The Hive metastore client is not able to find hive.server2.thrift.url in the config.
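
For illustration, a minimal sketch of pointing the Hive sync at an explicit HiveServer2 endpoint instead of relying on hive-site.xml being picked up; the hoodie.datasource.hive_sync.jdbcurl / username / password option keys and the placeholder endpoint are assumptions about the 0.5.x datasource options, not settings confirmed in this thread:

# Sketch: pass the HiveServer2 JDBC URL to Hudi's Hive sync explicitly so it
# does not depend on hive.server2.thrift.url being resolvable from hive-site.xml.
# The endpoint is a placeholder; on EMR, HiveServer2 usually listens on port
# 10000 of the master node.
hudi_options.update({
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.jdbcurl': 'jdbc:hive2://<emr-master-dns>:10000',  # assumed option key
    'hoodie.datasource.hive_sync.username': 'hive',  # assumed credentials
    'hoodie.datasource.hive_sync.password': 'hive',
})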

@bvaradar (Contributor)

Closing this ticket. Please reopen if this issue is specific to Hudi.

@ismailsimsek

Probably related to #2797 (comment).
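
The Caused by frame above (Warehouse.getDatabasePath throwing "Can not create a Path from an empty string") suggests the Glue database may have no location set. A minimal sketch of checking and setting one with boto3, assuming that is the cause; the database name and S3 URI are placeholders, not values from this issue:

import boto3

# Sketch: give the Glue database a LocationUri so the Hive metastore Warehouse
# code has a non-empty base path when it resolves the table path during
# alter_table (the call that fails in the stack trace above).
glue = boto3.client('glue')
db = glue.get_database(Name='my_database')['Database']
if not db.get('LocationUri'):
    glue.update_database(
        Name='my_database',
        DatabaseInput={
            'Name': 'my_database',
            'LocationUri': 's3://my-bucket/warehouse/my_database/',
        },
    )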
