Don't group files with different partition specs in BaseRewriteManifests #480
The downside is that this could potentially result in many small manifests. For example, we have a bucketed table with 10k buckets, which is additionally partitioned by date and hour. So for a week of data we'd have 10k * 24 * 7 = 1.68 million manifests. What we do with RewriteManifests today is combine files into groups of four buckets and ignore date and hour, which results in 10k / 4 = 2.5k manifests for a week of data. Performance with file pruning is good even though the manifests have multiple partition specs.
@bryanck By partition spec I mean the partitioning strategy in general, which seems to be the same for all files in your case. What will happen in case of partition spec evolution? Manifests are supposed to contain only files written with the same partition spec.
I also raised a question similar to what you describe in #481.
Right, I was thinking of partitions, not partition specs. In that case, I agree it makes sense.
I agree with this. A manifest can only be written for one partition spec, so this is definitely a bug!
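To make the one-spec-per-manifest invariant concrete, here is a minimal Java sketch of a guard that refuses to put files written with different partition specs into a single manifest. `FileEntry` and `requireSingleSpec` are hypothetical names for illustration only, not Iceberg's `DataFile` or `ManifestWriter` API; the sketch assumes each file carries the integer ID of the spec it was written with, and it needs Java 16+ for records.

```java
import java.util.List;

public class SingleSpecCheck {
    // Hypothetical stand-in for a data file entry (NOT Iceberg's DataFile):
    // just a path and the ID of the partition spec the file was written with.
    record FileEntry(String path, int specId) {}

    // Returns the spec ID shared by every file in the group.
    // Throws IllegalArgumentException if the group is empty or mixes specs,
    // because a manifest can only be written for one partition spec.
    static int requireSingleSpec(List<FileEntry> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("No files to write");
        }
        int specId = files.get(0).specId();
        for (FileEntry file : files) {
            if (file.specId() != specId) {
                throw new IllegalArgumentException(
                    "Cannot write file " + file.path() + " (spec " + file.specId()
                        + ") into a manifest for spec " + specId);
            }
        }
        return specId;
    }

    public static void main(String[] args) {
        List<FileEntry> sameSpec =
            List.of(new FileEntry("a.parquet", 0), new FileEntry("b.parquet", 0));
        System.out.println("spec " + requireSingleSpec(sameSpec));

        List<FileEntry> mixed =
            List.of(new FileEntry("a.parquet", 0), new FileEntry("c.parquet", 1));
        try {
            requireSingleSpec(mixed);
        } catch (IllegalArgumentException e) {
            // Mixed specs are rejected instead of silently producing a bad manifest.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```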
@aokolnychyi I'm trying to understand the problem and thinking out loud about a solution. Would the fix be to group manifests by partition spec before cluster by
Grouping manifest files by partition spec is done in One more interesting point, though I'm not sure whether my understanding is correct: is that also part of the problem?
@manishmalhotrawork, I think you understood the problem correctly. We need to take into account that a table might have multiple partition specs. We can get a map of partition specs from
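The proposed fix (group files by partition spec first, then cluster within each spec) can be sketched in Java as follows. This is a hypothetical illustration, not the actual `BaseRewriteManifests` change: `FileEntry`, `plan`, and the `bucket / 4` cluster key (modeled on the four-buckets-per-manifest strategy described earlier in this thread) are assumptions. Each innermost list would become one manifest, so no manifest mixes specs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class GroupBySpecThenCluster {
    // Hypothetical file entry (NOT Iceberg's DataFile): path, spec ID, bucket number.
    record FileEntry(String path, int specId, int bucket) {}

    // Groups files by spec ID first, then by the caller-supplied cluster key
    // inside each spec. TreeMaps keep the output order deterministic.
    static <K extends Comparable<K>> Map<Integer, Map<K, List<FileEntry>>> plan(
            List<FileEntry> files, Function<FileEntry, K> clusterBy) {
        Map<Integer, Map<K, List<FileEntry>>> groups = new TreeMap<>();
        for (FileEntry file : files) {
            groups.computeIfAbsent(file.specId(), id -> new TreeMap<>())
                .computeIfAbsent(clusterBy.apply(file), key -> new ArrayList<>())
                .add(file);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
            new FileEntry("a.parquet", 0, 0),
            new FileEntry("b.parquet", 0, 3),
            new FileEntry("c.parquet", 0, 5),
            new FileEntry("d.parquet", 1, 1));
        // Hypothetical cluster key: four consecutive buckets per manifest.
        Map<Integer, Map<Integer, List<FileEntry>>> groups = plan(files, f -> f.bucket() / 4);
        groups.forEach((specId, clusters) ->
            clusters.forEach((key, group) ->
                System.out.println("spec " + specId + ", cluster " + key + ": "
                    + group.size() + " file(s)")));
    }
}
```

With the sample input, `d.parquet` maps to the same cluster key as `a.parquet` and `b.parquet` (bucket 1 / 4 = 0), but it still lands in its own group because it was written with spec 1. Clustering on the bucket alone would have put files from two specs into one manifest.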
@manishmalhotrawork, do you want to submit a PR?
@aokolnychyi Yes, I'll submit a PR. Thanks.
I think we should not combine files with potentially different partition specs into the same manifest in BaseRewriteManifests, just as we don't in MergingSnapshotProducer.