[SUPPORT] Exception on snapshot query on MOR table (hudi 0.6.0) #2285
Comments
but the array doubleDictionaryContent only has 3000 elements, which caused the problem.
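For context, the failing read is a snapshot query on a MOR table. A minimal sketch of such a query from spark-shell, using Hudi 0.6.0's datasource option names (the S3 path is a hypothetical placeholder):

```scala
// Snapshot query on a MOR table (the query type that raises the exception).
val df = spark.read
  .format("hudi")
  .option("hoodie.datasource.query.type", "snapshot")
  .load("s3://my-bucket/path/to/mor_table") // hypothetical table path

df.count() // triggers the scan that hits the dictionary error
```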
Wondering if the Parquet version has anything to do here. Can you check if the Hadoop installation has 1.10.1 Parquet bundles?
@bvaradar
We also tried to see if there is any difference between the official jar and these Amazon-built jars. FYI: by using a read-optimized query, we can get all values.
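The read-optimized view scans only the compacted Parquet base files and skips merging the log files, which is consistent with it not hitting the error. A sketch of that query, under the same assumptions (hypothetical path, Hudi 0.6.0 option names):

```scala
// Read-optimized query on the same MOR table: base files only, no log merge.
val roDf = spark.read
  .format("hudi")
  .option("hoodie.datasource.query.type", "read_optimized")
  .load("s3://my-bucket/path/to/mor_table") // hypothetical table path
```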
cc @umehrot2 : Wondering why there is parquet-hadoop-bundle-1.6.0.jar along with parquet-hadoop-1.10.1-spark-amzn-1.jar. Wouldn't they cause a conflict?
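One way to check which Parquet classes are actually being picked up is to ask the JVM where each entry point is loaded from (a sketch to run in spark-shell; it uses only standard reflection). Note that the 1.6.0 bundle predates Parquet's package rename, so its classes live under `parquet.*` while 1.10.x lives under `org.apache.parquet.*`:

```scala
// Print the jar each Parquet reader class resolves to, if present.
for (cls <- Seq(
       "org.apache.parquet.hadoop.ParquetFileReader", // 1.7.0+ packages
       "parquet.hadoop.ParquetFileReader"))           // pre-1.7.0 bundle packages
  try {
    val loc = Class.forName(cls).getProtectionDomain.getCodeSource.getLocation
    println(s"$cls -> $loc")
  } catch {
    case _: ClassNotFoundException => println(s"$cls -> not on classpath")
  }
```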
@bvaradar
@n3nash : Can you look at this? @zherenyu831 : As the integration tests are passing for compaction, I suspect this still has to do with a Parquet version mismatch. Would it be possible to replicate this using the Docker setup: https://hudi.apache.org/docs/docker_demo.html ?
I have the same sporadic issue using the standard Spark 2.4.7 distribution and Hudi 0.6; the only workaround we found is to disable the VectorizedReader (see the sketch below):
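A minimal way to do that is the standard Spark SQL setting, either in-session or at submit time:

```scala
// Fall back from Spark's vectorized Parquet reader to the row-based
// parquet-mr read path.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

// or at submit time:
//   spark-submit --conf spark.sql.parquet.enableVectorizedReader=false ...
```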
@zherenyu831 : Can you please respond with any updates on your end?
cc @garyli1019 as well
@bvaradar |
@n3nash : Would you be able to help here?
@zherenyu831 : A few quick questions as we triage the issue.
We faced the problem using Hudi 0.6.0; we didn't try with Hudi 0.7.0.
Not exactly, because it only happens when reading during compaction of a huge table.
@vinothchandar @n3nash @bvaradar : One of the customers mentioned that disabling the vectorized reader fixed the issue for them. Hopefully that should be fine? And do we need to make a note of this in the FAQ or somewhere?
I see a lot of general Spark issues reported like this. Making this
@zherenyu831 : We made some fixes to the spillable map used in the compaction path, which should help with large datasets. Can you give it a try? Otherwise, feel free to close this out if it's not an issue anymore. Thanks!
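For anyone hitting memory pressure in that spillable map on large datasets, a hedged sketch of the relevant knobs (the config keys are my understanding of Hudi's HoodieMemoryConfig; the values are illustrative, and the usual required write options such as record key, precombine field, and table name are omitted for brevity):

```scala
// Raise the memory budget for the spillable map used during compaction/merge.
df.write.format("hudi")
  .option("hoodie.memory.compaction.max.size", (2L * 1024 * 1024 * 1024).toString) // bytes
  .option("hoodie.memory.merge.max.size", (1L * 1024 * 1024 * 1024).toString)      // bytes
  .mode("append")
  .save("s3://my-bucket/path/to/mor_table") // hypothetical table path
```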
Closing, since the fix has landed.
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
A clear and concise description of the problem.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version : 0.6.0
Spark version : 2.4.4
Hive version : not using
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : s3
Running on Docker? (yes/no) : no
Additional context
Add any other context about the problem here.
Stacktrace