Description
Customer: I’ve tried this with 0.12.2 and still receive the same error. Does the table format version also need to be updated? That is, we’re writing with Hudi 0.11.1 on EMR but reading from Databricks with Hudi 0.12.2 and Spark 3.3.
What has been tried so far on 0.12.2:
1. ❌ SparkSQL
Just tried Spark SQL and it doesn’t work (different issue):
SET hoodie.file.index.enable=false
select count(*) from validated_sales;
Returns a count of 0, but no errors.
2. ✅ when running via pyspark
%python
df = (spark.read.format('hudi')
      .load('s3:///validated_sales///'))
df.count()
All is good with Hudi 0.12.2 and Databricks 11.3 (Spark 3.3).
3. ❌ without the wildcard in pyspark
%python
df = (spark.read.format('hudi')
      .load('s3:///validated_sales'))
df.count()
count = 0
4. ✅ without wildcard but with recursive option set in pyspark
%python
df = (spark.read.format('hudi')
      .option("recursiveFileLookup", "true")
      .load('s3:///validated_sales'))
df.count()
count = 250k
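For anyone trying to reproduce this, the three pyspark reads above can be run side by side with a small helper. This is only a sketch: `read_variants` and `count_variants` are hypothetical names, not part of Hudi or Spark, and `base_path` and `glob_suffix` are placeholders for the table location and partition wildcard, which are elided in the report. It assumes a notebook where `spark` is already defined and the Hudi bundle is on the classpath.

```python
def read_variants(base_path, glob_suffix):
    """Return (label, load_path, reader_options) for each read tried in the report."""
    return [
        # Item 2: path with the partition wildcard appended.
        ("wildcard", base_path + glob_suffix, {}),
        # Item 3: bare table path, no extra options.
        ("no wildcard", base_path, {}),
        # Item 4: bare table path plus Spark's recursiveFileLookup option.
        ("no wildcard + recursiveFileLookup", base_path,
         {"recursiveFileLookup": "true"}),
    ]


def count_variants(spark, base_path, glob_suffix):
    """Load the table once per variant and return {label: row count}."""
    counts = {}
    for label, path, options in read_variants(base_path, glob_suffix):
        reader = spark.read.format("hudi")
        for key, value in options.items():
            reader = reader.option(key, value)
        counts[label] = reader.load(path).count()
    return counts
```

Calling `count_variants(spark, base_path, glob_suffix)` returns one count per variant, which makes it easy to see whether the bare-path read still comes back empty after an upgrade. It only compares the symptoms; it says nothing about the root cause.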
JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-5609
- Type: Bug
- Fix version(s): 1.1.0