Hudi table not queryable by SQL on Databricks Spark #15720

@hudi-bot

Description

Customer: I’ve tried this with 0.12.2 and still receive the same error. Does the table format version also need to be updated? That is, we’re writing with Hudi 0.11.1 on EMR but reading from Databricks with Hudi 0.12.2 and Spark 3.3.
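One way to approach the table-version question is to look at what the table itself records. The sketch below assumes the table keeps its configuration in `.hoodie/hoodie.properties` under the base path and that the version is stored under the `hoodie.table.version` key; the helper name `read_table_version` and the sample file contents are hypothetical, made up for illustration only.

```python
def read_table_version(properties_text):
    """Return the value of hoodie.table.version as an int, or None if absent.

    `properties_text` is the text of a Java-style .properties file, e.g. the
    contents of <base path>/.hoodie/hoodie.properties (assumed location).
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and .properties comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hoodie.table.version":
            return int(value.strip())
    return None


# Hypothetical sample contents, not taken from the customer's table.
sample = """# sample hoodie.properties
hoodie.table.name=validated_sales
hoodie.table.type=COPY_ON_WRITE
hoodie.table.version=4
"""

print(read_table_version(sample))
```

Comparing this value before and after a write from the newer Hudi release would show whether the writer upgraded the table, which is separate from which Hudi bundle the reader uses.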

What has been tried so far on 0.12.2:

1. ❌ Spark SQL

Just tried Spark SQL and it doesn’t work (a different issue):
SET hoodie.file.index.enable=false;
select count(*) from validated_sales;
This returns a count of 0 but no errors.
2. ✅ When running via pyspark with a wildcard path
%python
df = spark.read.format('hudi').load('s3:///validated_sales///')
df.count()
All is good with Hudi 0.12.2 and Databricks 11.3 (Spark 3.3).
3. ❌ Without the wildcard in pyspark
%python
df = spark.read.format('hudi').load('s3:///validated_sales')
df.count()
count = 0
4. ✅ Without the wildcard but with the recursive option set in pyspark
%python
df = spark.read.format('hudi').option("recursiveFileLookup", "true").load('s3:///validated_sales')
df.count()
count = 250k
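
To make the DataFrame-API attempts above easier to re-run side by side, here is a minimal sketch that wraps them in one helper. The function names `count_hudi_rows` and `compare_reads` and the `glob_suffix` parameter are hypothetical; the bucket and the exact wildcard pattern are not known from this thread, so both are left for the caller to supply. It only uses `spark.read.format(...)`, `.option(...)`, `.load(...)` and `.count()`, and expects an existing `spark` session such as the one a Databricks notebook provides.

```python
def count_hudi_rows(spark, path, recursive=False):
    """Count rows of a Hudi table read through the DataFrame API."""
    reader = spark.read.format("hudi")
    if recursive:
        # Generic Spark file-source option, as used in attempt 4.
        reader = reader.option("recursiveFileLookup", "true")
    return reader.load(path).count()


def compare_reads(spark, base_path, glob_suffix):
    """Run the three path variants and return their row counts.

    base_path   -- table base path, e.g. the validated_sales location
    glob_suffix -- wildcard suffix appended for the first variant
                   (assumption: the caller knows the partition depth)
    """
    return {
        "wildcard": count_hudi_rows(spark, base_path + glob_suffix),
        "base_path": count_hudi_rows(spark, base_path),
        "base_path_recursive": count_hudi_rows(spark, base_path, recursive=True),
    }


# Usage in a notebook cell (placeholder arguments, not real values):
# print(compare_reads(spark, "<table base path>", "<wildcard suffix>"))
```

If the plain base-path read returns 0 while the other two return the full count, that matches the behavior reported in attempts 2 to 4.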
