[SUPPORT] Invalid number of file groups for partition:column_stats #7657
Comments
I came across the same problem on version 0.12.0. I set these configs to false (hoodie.metadata.index.bloom.filter.enable=false), and that let me bypass the error.
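For anyone who wants to try the same workaround, here is a minimal sketch of how the option might be passed on the command line. The helper `to_hoodie_conf_args` is hypothetical (it is not part of Hudi); it only formats key/value pairs as the `--hoodie-conf key=value` arguments used elsewhere in this thread. Only the config key quoted in the comment above is included.

```python
# Hypothetical helper (not part of Hudi): format a dict of options as
# "--hoodie-conf key=value" command-line arguments.
def to_hoodie_conf_args(options):
    args = []
    for key, value in sorted(options.items()):
        args.extend(["--hoodie-conf", f"{key}={value}"])
    return args


# Workaround quoted in the comment above: disable the bloom filter index
# partition of the metadata table.
workaround_options = {
    "hoodie.metadata.index.bloom.filter.enable": "false",
}

workaround_args = to_hoodie_conf_args(workaround_options)
print(" ".join(workaround_args))
```

The resulting arguments would be appended to the existing DeltaStreamer command; whether this bypasses the error on a given table is not guaranteed.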
Thanks for the workaround, @BalaMahesh. Will try this out.
Hi @alexeykudinkin, it seems the metadata table does not bootstrap correctly for the column_stats partition. Can you check whether this is fixed in master?
@nsivabalan, can you please take a look at this one?
There are some issues with some of the metadata configs. Some of the metadata table configs are not meant to be overridden, such as --hoodie-conf hoodie.metadata.compact.max.delta.commits=10. Please do not override any of these configs. I have created a follow-up ticket to stop exposing them. Can you restart the metadata table from scratch with the right set of configs and let us know how it goes?
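A quick way to check an existing launch command for such overrides could look like the sketch below. It is an illustration only: `DO_NOT_OVERRIDE` and `find_forbidden_overrides` are hypothetical names, and the set contains just the one key named in this thread, since the full list of internal metadata table configs is not given here.

```python
# Hypothetical check (not part of Hudi). Only this one key is named in the
# thread; add others only after confirming them with the maintainers.
DO_NOT_OVERRIDE = {"hoodie.metadata.compact.max.delta.commits"}


def find_forbidden_overrides(argv):
    """Return the keys in DO_NOT_OVERRIDE that are set via --hoodie-conf."""
    found = []
    # Walk adjacent pairs so each "--hoodie-conf" is matched with its value.
    for flag, value in zip(argv, argv[1:]):
        if flag == "--hoodie-conf":
            key = value.split("=", 1)[0].strip()
            if key in DO_NOT_OVERRIDE:
                found.append(key)
    return found


example_argv = [
    "--hoodie-conf", "hoodie.metadata.enable=true",
    "--hoodie-conf", "hoodie.metadata.compact.max.delta.commits=10",
]
flagged = find_forbidden_overrides(example_argv)
print(flagged)
```

Any key reported by the check would be removed from the command before restarting the metadata table from scratch.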
@nsivabalan, regarding metadata indexing: we can enable this as an async job within the DeltaStreamer job with the config below, right? hoodie.metadata.enable=true
@BalaMahesh: No. Those async indexes do not run alongside DeltaStreamer yet. As of the latest master, DeltaStreamer continuous mode only supports async compaction and async clustering.
@BalaMahesh: Any updates here, please?
@nsivabalan, we disabled metadata altogether to avoid the issues.
We need to try to reproduce this on master with metadata enabled to see whether it is still an issue.
I also faced this issue with Hudi 0.13.1.
@njalan Thanks. Can you please post your table configurations?
@ad1happy2go Is it because I just upgraded from 0.7 to 0.13.1? I am using the default config.
@njalan That could be the cause. When did you hit this error? Was it while writing after upgrading to 0.13.1?
@ad1happy2go I just enabled column stats after the Hudi upgrade. I tested two tables: one works fine and the other gets this error.
@njalan Sorry for the delay here. If you remember, can you tell us more about the issue with the table that failed?
Thanks!
@ocean-zhc @pushpavanthar We were able to reproduce this issue with the 0.12.X and 0.13.X versions, thanks to Jessie. With the 0.14.X release, this issue is fixed.
Problem:
When a DeltaStreamer job running in continuous mode is killed and then resumed, the error below is thrown.
Steps to reproduce the behavior:
Expected behavior
The job is supposed to resume when restarted without any problem.
Environment Description
Hudi version : 0.11.1
Spark version : 3.1.1
Hive version : 3.1.2
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No
Let me know if you need more info.
Thanks in advance.