
[SUPPORT] Hoodie table not found in path Unable to find a hudi table for the user provided paths. #2282

Closed
wosow opened this issue Nov 26, 2020 · 5 comments

Comments


wosow commented Nov 26, 2020

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

An error occurred when I used Hudi 0.6.0 with Spark 2.4.4 to write data to Hudi and sync it to Hive. The log is as follows:

20/11/26 14:22:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
20/11/26 14:22:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bd93bc3{/SQL/execution/json,null,AVAILABLE,@spark}
20/11/26 14:22:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
20/11/26 14:22:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e67cfe1{/static/sql,null,AVAILABLE,@spark}
20/11/26 14:22:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.228.86.12:42864) with ID 3
20/11/26 14:22:52 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/11/26 14:22:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager lake03:40372 with 8.4 GB RAM, BlockManagerId(3, lake03, 40372, None)
20/11/26 14:22:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, spark_hadoop_conf.xml, file:/opt/modules/spark-2.4.4/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1481461246_1, ugi=root (auth:SIMPLE)]]]
20/11/26 14:22:52 INFO hudi.DataSourceUtils: Getting table path..
20/11/26 14:22:52 INFO util.TablePathUtils: Getting table path from path : hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/837b6714-40b3-4a00-bcf5-97a6f33d2af7.parquet
Exception in thread "main" org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path Unable to find a hudi table for the user provided paths.
at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:120)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:72)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at com.ws.hudi.wdt.cow.StockOutOrder$.stockOutOrderIncUpdate(StockOutOrder.scala:104)
at com.ws.hudi.wdt.cow.StockOutOrder$.main(StockOutOrder.scala:41)
at com.ws.hudi.wdt.cow.StockOutOrder.main(StockOutOrder.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/11/26 14:22:52 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/11/26 14:22:52 INFO server.AbstractConnector: Stopped Spark@76b224cd{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/11/26 14:22:52 INFO ui.SparkUI: Stopped Spark web UI at http://lake03:4040
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
20/11/26 14:22:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/11/26 14:22:52 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Stopped
20/11/26 14:22:55 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/11/26 14:22:55 INFO memory.MemoryStore: MemoryStore cleared
20/11/26 14:22:55 INFO storage.BlockManager: BlockManager stopped
20/11/26 14:22:55 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/11/26 14:22:55 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/11/26 14:22:55 INFO spark.SparkContext: Successfully stopped SparkContext
20/11/26 14:22:55 INFO util.ShutdownHookManager: Shutdown hook called


Environment Description

  • Hudi version :
    hudi-0.6.0
  • Spark version :
    spark-2.4.4
  • Hive version :
    hive-2.3.1
  • Hadoop version :
    hadoop-2.7.5
  • Storage (HDFS/S3/GCS..) :
    HDFS
  • Running on Docker? (yes/no) :
    no
wosow changed the title from [SUPPORT] to [SUPPORT] Hoodie table not found in path Unable to find a hudi table for the user provided paths. Nov 26, 2020
bvaradar (Contributor) commented Nov 30, 2020

It looks like the error occurs while loading the data at hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/837b6714-40b3-4a00-bcf5-97a6f33d2af7.parquet.

Can you check whether hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/ is a Hudi table? Do you see a hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/.hoodie folder?

Can you list the entire folder hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125 and attach the output?
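For context: Hudi locates a table by searching from the provided path for a `.hoodie` metadata folder, and throws the `TableNotFoundException` seen above when none exists. The sketch below is a simplified Python illustration of that discovery logic, not Hudi's actual `TablePathUtils` Java implementation; the file and directory names are hypothetical.

```python
import tempfile
from pathlib import Path

def find_hudi_table_path(path):
    """Walk up from `path` looking for a directory that contains a
    `.hoodie` metadata folder; return that directory, or None.
    (Simplified illustration of Hudi's table-path discovery.)"""
    p = Path(path)
    if p.is_file():
        p = p.parent
    while True:
        if (p / ".hoodie").is_dir():
            return p
        if p == p.parent:  # reached the filesystem root
            return None
        p = p.parent

# Demo: a Sqoop output folder holding only a parquet file has no
# .hoodie directory, so discovery fails -- this is what produces the
# TableNotFoundException in the log above.
root = Path(tempfile.mkdtemp())
sqoop_dir = root / "stockout_order_20201125"
sqoop_dir.mkdir()
(sqoop_dir / "231939a9.parquet").touch()
print(find_hudi_table_path(sqoop_dir / "231939a9.parquet"))  # None

# A real Hudi table directory carries .hoodie metadata:
hudi_dir = root / "hudi_table"
(hudi_dir / ".hoodie").mkdir(parents=True)
print(find_hudi_table_path(hudi_dir))  # the hudi_table path
```

The takeaway: a path is only readable with the hudi format if a `.hoodie` folder is present, which is why the listing requested above matters.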

wosow (Author) commented Dec 8, 2020

The contents of the folder (hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125) are as follows:

Permission  Owner  Group       Size      Last Modified          Replication  Block Size  Name
drwxr-xr-x  root   supergroup  0 B       2020/11/25 4:18:26 PM  0            0 B         .metadata
drwxr-xr-x  root   supergroup  0 B       2020/11/25 4:19:01 PM  0            0 B         .signals
-rw-r--r--  root   supergroup  10.27 MB  2020/11/25 4:19:00 PM  1            128 MB      231939a9-ebe4-4a2b-9338-badf75ee9f49.parquet

The issue is that this works fine with 0.5.3, but not with 0.6.0.

hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125 is the destination of the Sqoop import, not the Hudi table directory.

bvaradar (Contributor) commented Dec 8, 2020

@wosow : If this is a plain parquet dataset, you should read it with spark.read.parquet("hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/*") rather than the hudi format.

wosow (Author) commented Dec 9, 2020

> @wosow : If this is a plain parquet dataset, you should be reading like spark.read.parquet("hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/*") and not use hudi format.

Thank you, I will try that.

bvaradar (Contributor) commented:

@wosow : Please reopen if you are still stuck.
