[FIXED] Presto cannot query Hudi table #1329
Comments
@bhasudha can you please help out here?
@popart The stack trace you posted suggests that a partition metafile (".hoodie_partition_metadata") was somehow created in the table path. If this file is present, Hudi tries to read the partition depth from it by searching for the key "partitionDepth" in the metafile; the exception "Could not find partitionDepth in partition metafile" is thrown when that key cannot be found. Can you quickly check whether this partition metafile is present in your table base path? If it is, we need to dig into why it was created even though you chose a non-partitioned table. Also, just wanted to check: did you mean Presto version 0.227? And from the stack trace you posted (specific to Presto), it looks like this is coming from the earlier Hudi version 0.5.0-incubating. Can you confirm whether that is true, for further debugging? Do you mind creating a Jira issue with these details?
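For context, the partition metafile is a plain Java-properties-style text file, and the failing lookup is essentially a key search. Below is a minimal, self-contained Python sketch of that lookup; read_partition_depth is a hypothetical name for illustration, not Hudi's actual API:

```python
# Sketch of the lookup Hudi performs on .hoodie_partition_metadata.
# The metafile is a Java-properties-style text file; Hudi reads the
# "partitionDepth" key and raises "Could not find partitionDepth in
# partition metafile" when that key is absent. Function name is hypothetical.

def read_partition_depth(metafile_text: str) -> int:
    for line in metafile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments, as java.util.Properties does
        if "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "partitionDepth":
                return int(value.strip())
    raise ValueError("Could not find partitionDepth in partition metafile")

sample = "#partition metadata\ncommitTime=20200203012345\npartitionDepth=0\n"
print(read_partition_depth(sample))  # → 0
```

A metafile with the key missing (or an empty/corrupt file, e.g. one mangled by mismatched client-side encryption settings) would hit the error path shown in the stack trace.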
Hi Bhavani! Thank you for taking a look. I filed https://issues.apache.org/jira/browse/HUDI-614. Correct, the Presto version is 0.227. I tried running the spark-shell with both the Hudi 0.5.0 and 0.5.1 jars, but got the same result. The EMR release has Hudi 0.5.0 installed, and I didn't specify anything different when running presto-cli, so I'd assume Presto is using the 0.5.0 version. I do see the .hoodie_partition_metadata file in my S3 table path.
@bhasudha let's look at this more closely and confirm what's going on here. The stack trace just indicates that ipf.getSplits() is being called, so this is generic code. We do have tests around querying non-partitioned tables, so we need to reproduce this in the docker setup or something similar and go from there.
Update: This problem does not occur in the Docker environment. In the Docker demo environment, I was able to create a non-partitioned table in Spark (saved to HDFS), use run_sync_tool.sh to sync it to Hive, and then query it successfully from Presto. (It still created the .hoodie_partition_metadata file, though.)
I found the problem. We had client-side encryption configured for Spark & Hive using EMRFS, but not for Presto. |
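For anyone hitting the same symptom, the fix amounted to giving Presto the same EMRFS client-side encryption settings that Spark and Hive already had, so that Presto could actually decrypt the objects (including the partition metafile) it read from S3. A hypothetical sketch of an EMR configuration classification is below; the exact keys and the provider class depend on your EMR release and encryption setup, and com.example.MyEncryptionMaterialsProvider is a placeholder:

```json
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.cse.enabled": "true",
      "fs.s3.cse.encryptionMaterialsProvider": "com.example.MyEncryptionMaterialsProvider"
    }
  }
]
```

The underlying failure mode is consistent with the stack trace: an undecrypted (garbled) .hoodie_partition_metadata file would not contain a readable "partitionDepth" key.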
Describe the problem you faced
I made a non-partitioned Hudi table using Spark. I was able to query it with Spark and Hive, but when I tried querying it with Presto, I received the error "Could not find partitionDepth in partition metafile".

To Reproduce

Steps to reproduce the behavior:
Expected behavior
Presto should return a count of all the rows. Other Presto queries should succeed.
Environment Description
EMR version: emr-5.28.0
Hudi version: 0.5.1-incubating, 0.5.0-incubating
Spark version: 2.4.4
Hive version: 2.3.6
Hadoop version: 2.8.5
Presto version: 0.227
Storage (HDFS/S3/GCS..): S3
Running on Docker? (yes/no): no
Stacktrace
Included in "Steps to reproduce".
Additional Info
When I used one of the columns as a partition column, I was able to query the table in Spark using spark.read.format("org.apache.hudi").load(basePath + "/*"). However, querying it in Hive resulted in an error, and querying it in presto-cli returned 0 rows.