[SUPPORT] Recreate deleted metadata table #7533

@szingerpeter

Hi,

I'm using Hudi CLI version 1.0, Hudi version 0.11.0, Spark version 3.2.1-amzn-0, and Hive version 3.1.3-amzn-0.
After rolling back a table, I ran into the issue described in #4747:

Caused by: java.io.FileNotFoundException: No such file or directory 's3://...

Following the recommendation in #4747, I then manually deleted the metadata folder at s3://<table_path>/.hoodie/metadata, which solved the problem.

After upserting into the table, the metadata folder s3://<table_path>/.hoodie/metadata gets recreated. However, queries via Spark and Beeline then return only the entries upserted in the last operation (~40M rows) and none of the previous data (~2B rows). If I delete s3://<table_path>/.hoodie/metadata again, both Spark and Beeline return all the historical data as well as the newly inserted data.
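One way to check whether the recreated metadata table is what hides the older rows, without deleting the folder each time, might be to read with metadata-based file listing switched off. This is a minimal sketch, assuming the `hoodie.metadata.enable` option is honored by the Spark reader in this Hudi version; the helper name `hudi_read_options` is hypothetical, and the table path is left elided as above.

```python
def hudi_read_options(use_metadata_table: bool) -> dict:
    """Build Hudi reader options that toggle metadata-table-based file listing."""
    return {
        # `hoodie.metadata.enable` is a Hudi config key; that the reader
        # honors it in this setup is an assumption, not a confirmed fix.
        "hoodie.metadata.enable": str(use_metadata_table).lower(),
    }


# Options for a read that should fall back to listing files on S3 directly.
opts = hudi_read_options(use_metadata_table=False)
print(opts)

# With an existing SparkSession named `spark` (not created here), the read
# would look roughly like this:
# df = spark.read.format("hudi").options(**opts).load("s3://<table_path>")
# df.count()
```

If the count with the option set to false matches the full historical row count, that would point at an incomplete metadata table rather than at missing data files.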

I tried using the Hudi CLI's metadata create command, but it fails with:

29990 [Spring Shell] ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.
java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterMetricsRequestProto cannot be cast to com.google.protobuf.Message
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy46.getClusterMetrics(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:271)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy47.getClusterMetrics(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:631)
        at org.apache.spark.deploy.yarn.Client.$anonfun$submitApplication$1(Client.scala:181)
        at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
        at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
        at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:65)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:181)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:582)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at org.apache.hudi.cli.utils.SparkUtil.initJavaSparkConf(SparkUtil.java:117)
        at org.apache.hudi.cli.commands.MetadataCommand.initJavaSparkContext(MetadataCommand.java:367)
        at org.apache.hudi.cli.commands.MetadataCommand.create(MetadataCommand.java:128)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
        at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
        at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
        at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
        at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
        at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
        at java.lang.Thread.run(Thread.java:750)

Is there a way to recreate the metadata table of an existing Hudi table so that it references the historical data as well?
