Hi,
I'm using Hudi CLI version 1.0, Hudi version 0.11.0, Spark version 3.2.1-amzn-0, and Hive version 3.1.3-amzn-0.
After rolling back a table, I ran into the issue described in #4747:
Caused by: java.io.FileNotFoundException: No such file or directory 's3://...
Following the recommendation in #4747, I manually deleted the metadata folder under s3://<table_path>/.hoodie/metadata, which solved the problem.
After upserting into the table, the metadata folder s3://<table_path>/.hoodie/metadata gets recreated. However, queries via Spark and Beeline then return only the entries upserted in the last operation (~40M rows) and none of the previous data (~2B rows). If I delete s3://<table_path>/.hoodie/metadata again, both Spark and Beeline return all the historical data as well as the newly inserted data.
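The comparison described above could probably also be made without deleting the folder, by reading the table once with metadata-based file listing and once without it. The following is only a minimal sketch, assuming a PySpark session that already has the Hudi bundle on its classpath. The helper names are hypothetical, and it is an assumption that the read path in 0.11.0 honors `hoodie.metadata.enable` the way it is used here.

```python
def metadata_read_options(use_metadata):
    """Build Hudi read options that toggle metadata-table-based file listing.

    `hoodie.metadata.enable` is the Hudi config key for the metadata table;
    that it changes read-side listing in this setup is an assumption.
    """
    return {"hoodie.metadata.enable": "true" if use_metadata else "false"}


def count_rows(spark, table_path, use_metadata):
    """Count the rows of the Hudi table at table_path for one listing mode."""
    reader = spark.read.format("hudi").options(**metadata_read_options(use_metadata))
    return reader.load(table_path).count()


def compare_counts(spark, table_path):
    """Return the row counts with and without metadata-based file listing."""
    return {
        "with_metadata": count_rows(spark, table_path, True),
        "without_metadata": count_rows(spark, table_path, False),
    }
```

Calling `compare_counts(spark, table_path)` with the table's base path would return both counts. A large gap between them would point at the recreated metadata table's file listing rather than at the data files themselves.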
I tried using the Hudi CLI's metadata create command, but it fails with:
29990 [Spring Shell] ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterMetricsRequestProto cannot be cast to com.google.protobuf.Message
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy46.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:271)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy47.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:631)
at org.apache.spark.deploy.yarn.Client.$anonfun$submitApplication$1(Client.scala:181)
at org.apache.spark.internal.Logging.logInfo(Logging.scala:57)
at org.apache.spark.internal.Logging.logInfo$(Logging.scala:56)
at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:65)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:181)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:582)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at org.apache.hudi.cli.utils.SparkUtil.initJavaSparkConf(SparkUtil.java:117)
at org.apache.hudi.cli.commands.MetadataCommand.initJavaSparkContext(MetadataCommand.java:367)
at org.apache.hudi.cli.commands.MetadataCommand.create(MetadataCommand.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
at java.lang.Thread.run(Thread.java:750)
Is there a way to recreate the metadata table of an existing Hudi table so that it references the historical data as well?