Describe the problem you faced
There might be an issue with the syncPartitions flow in HiveSyncTool. The isDropPartition flag is derived from the latest commit's metadata only, and that flag is later used to drop all the partitions held in the variable partitionStoragePartitions. Consider the scenario where the actions below are performed in consecutive commits without syncing to the Hive metastore in between:
upsert
drop_partition
This would result in dropping all the partitions touched by both of the above commits, including the ones written by the upsert, which is not the desired behaviour.
Expected behavior
Partitions to drop should be determined by checking the metadata of each individual commit since the last sync with the metastore, not by checking only the latest commit's metadata.
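To make the difference concrete, here is a minimal, self-contained sketch of the two approaches. It does not use Hudi's classes: `CommitInfo`, `PartitionSyncSketch`, and both method names are hypothetical stand-ins, and the real HiveSyncTool code is structured differently. It only models "latest commit decides for all partitions" versus "each commit decides for its own partitions".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one commit's metadata; this is not a Hudi class.
final class CommitInfo {
    final String instantTime;
    final boolean isDropPartition;   // true if the commit was a drop_partition
    final List<String> partitions;   // partitions written or dropped by the commit

    CommitInfo(String instantTime, boolean isDropPartition, List<String> partitions) {
        this.instantTime = instantTime;
        this.isDropPartition = isDropPartition;
        this.partitions = partitions;
    }
}

final class PartitionSyncSketch {
    // Behaviour described in the report: the latest commit alone sets the flag,
    // and the flag is applied to every partition touched since the last sync.
    static List<String> partitionsToDropLatestOnly(List<CommitInfo> commitsSinceLastSync) {
        List<String> toDrop = new ArrayList<>();
        if (commitsSinceLastSync.isEmpty()) {
            return toDrop;
        }
        CommitInfo latest = commitsSinceLastSync.get(commitsSinceLastSync.size() - 1);
        if (!latest.isDropPartition) {
            return toDrop;
        }
        for (CommitInfo commit : commitsSinceLastSync) {
            for (String partition : commit.partitions) {
                if (!toDrop.contains(partition)) {
                    toDrop.add(partition);
                }
            }
        }
        return toDrop;
    }

    // Expected behaviour: replay the commits in order (oldest first) and let the
    // most recent operation on each partition decide whether it is dropped.
    static List<String> partitionsToDropPerCommit(List<CommitInfo> commitsSinceLastSync) {
        Map<String, Boolean> droppedByPartition = new LinkedHashMap<>();
        for (CommitInfo commit : commitsSinceLastSync) {
            for (String partition : commit.partitions) {
                droppedByPartition.put(partition, commit.isDropPartition);
            }
        }
        List<String> toDrop = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : droppedByPartition.entrySet()) {
            if (entry.getValue()) {
                toDrop.add(entry.getKey());
            }
        }
        return toDrop;
    }
}
```

With an upsert touching two partitions followed by a drop_partition on a third, the first method returns all three partitions, while the second returns only the partition that was actually dropped.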
I think you are right, and I found another problem: the active timeline is used to find the changed partitions. If archiving happens before the sync, could some partition changes be missed?
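One way to reason about the archiving concern is a coverage check before trusting the active timeline. The sketch below is only an illustration under stated assumptions: `TimelineCoverageSketch` and its method are hypothetical, instant times are assumed to be same-format timestamp strings that sort lexicographically, and archiving is assumed to remove the oldest instants first. If the check fails, a sync would need some fallback, such as comparing the full partition listing against the metastore.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Hudi.
final class TimelineCoverageSketch {
    // Returns true when the active timeline still reaches back to the last synced
    // instant, so no commit made after that sync can have been archived.
    static boolean activeTimelineCoversLastSync(String lastSyncedInstant,
                                                List<String> activeInstants) {
        if (activeInstants.isEmpty()) {
            // Nothing to inspect, so be conservative and report "not covered".
            return false;
        }
        String earliestActive = Collections.min(activeInstants);
        return earliestActive.compareTo(lastSyncedInstant) <= 0;
    }
}
```

The check is deliberately conservative: it can report "not covered" even when nothing relevant was archived, which costs a slower sync but never misses a change.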
@pratyakshsharma @Zouxxyy This is an issue that should be fixed. Thanks for pointing it out. I created HUDI-4832 to track it. We will get this fixed in the upcoming release (0.12.1).
Tips before filing an issue
Have you gone through our FAQs? yes
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.