[SUPPORT]: Potential bug with HiveSyncTool #6578

Closed
pratyakshsharma opened this issue Sep 3, 2022 · 4 comments
Labels
meta-sync priority:critical production down; pipelines stalled; Need help asap.

Comments

@pratyakshsharma
Contributor

Tips before filing an issue

  • Have you gone through our FAQs? yes

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

There may be an issue in the syncPartitions flow of HiveSyncTool. The isDropPartition flag is derived from the latest commit's metadata only, and that single flag is later used to drop every partition in the variable partitionStoragePartitions. Consider the scenario where the following actions are performed in consecutive commits without syncing to the Hive metastore in between:

  • upsert
  • drop_partition

In this case, the sync drops all partitions affected by both commits, including those written by the upsert, which is not the desired behaviour.
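To make the scenario concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model and not Hudi's actual Java code: the `Commit` class, the function name, and the partition paths are all made up for illustration, and the operation names are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    operation: str      # "upsert" or "drop_partition", as named in this issue
    partitions: tuple   # partition paths touched by this commit (illustrative)


def partitions_to_drop_latest_only(commits_since_last_sync):
    """Model of the reported behaviour: a single flag taken from the latest
    commit is applied to every partition touched since the last sync."""
    if not commits_since_last_sync:
        return set()
    # All partitions written by any commit since the last sync.
    touched = {p for c in commits_since_last_sync for p in c.partitions}
    # The flag looks at the latest commit only.
    is_drop_partition = commits_since_last_sync[-1].operation == "drop_partition"
    return touched if is_drop_partition else set()


commits = [
    Commit("upsert", ("2022/09/01",)),
    Commit("drop_partition", ("2022/08/01",)),
]
# The upserted partition 2022/09/01 is dropped along with 2022/08/01.
print(sorted(partitions_to_drop_latest_only(commits)))
# prints ['2022/08/01', '2022/09/01']
```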

Expected behavior

The partitions to drop should be determined by checking the metadata of each individual commit since the last sync with the metastore, not only the metadata of the latest commit.
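One possible way to express this expectation, again as a simplified Python model rather than Hudi code: walk the commits since the last sync in timeline order and let the last operation that touched each partition decide its fate. The function name and the `(operation, partitions)` tuple shape are assumptions for illustration, and this is not necessarily how an eventual fix would be implemented.

```python
def classify_partitions(commits_since_last_sync):
    """Return (to_drop, to_add_or_update), deciding per commit in timeline order.

    Each commit is a hypothetical (operation, partitions) tuple.
    """
    to_drop = set()
    to_add_or_update = set()
    for operation, partitions in commits_since_last_sync:
        for partition in partitions:
            if operation == "drop_partition":
                to_drop.add(partition)
                to_add_or_update.discard(partition)
            else:
                # A later write to a previously dropped partition revives it.
                to_add_or_update.add(partition)
                to_drop.discard(partition)
    return to_drop, to_add_or_update


commits = [
    ("upsert", ("2022/09/01",)),
    ("drop_partition", ("2022/08/01",)),
]
to_drop, to_add_or_update = classify_partitions(commits)
print(sorted(to_drop))           # prints ['2022/08/01']
print(sorted(to_add_or_update))  # prints ['2022/09/01']
```

With this approach only the partition named in the drop_partition commit is removed from the metastore, and the upserted partition is added or updated as usual.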

@pratyakshsharma
Contributor Author

@nsivabalan @codope

Please confirm if I am missing anything here.

@Zouxxyy
Contributor

Zouxxyy commented Sep 4, 2022

I think you are right. I also found another problem: the active timeline is used to find the changed partitions. If archival happens before the sync, will some changed partitions be missed?
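A rough Python sketch of the concern, under assumptions: the timeline is modelled as a sorted list of `(instant_time, partitions)` tuples, instant times are equal-length strings that compare lexicographically, and `list_all_partitions` is a hypothetical callback that lists every partition on storage. None of these names come from Hudi. The idea is that if the last synced instant is older than the earliest instant still on the active timeline, an incremental lookup cannot be trusted and a full listing is the safe fallback.

```python
def changed_partitions_since(last_synced, active_timeline, list_all_partitions):
    """Return partitions changed after last_synced.

    Falls back to a full listing when commits after last_synced may already
    have been archived and are no longer visible on the active timeline.
    """
    earliest_active = active_timeline[0][0] if active_timeline else None
    if (
        last_synced is None
        or earliest_active is None
        or last_synced < earliest_active
    ):
        # Some commits since the last sync may be archived: do not trust
        # the active timeline alone.
        return set(list_all_partitions())
    return {
        partition
        for instant, partitions in active_timeline
        if instant > last_synced
        for partition in partitions
    }


# Instants "001" and "002" have been archived; only "003" and "004" are active.
active = [("003", ("c",)), ("004", ("d",))]
all_partitions = lambda: ["a", "b", "c", "d"]

print(sorted(changed_partitions_since("003", active, all_partitions)))
# prints ['d']
print(sorted(changed_partitions_since("001", active, all_partitions)))
# prints ['a', 'b', 'c', 'd']
```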

nsivabalan added the priority:critical (production down; pipelines stalled; Need help asap.) and meta-sync labels on Sep 4, 2022
@codope
Member

codope commented Sep 12, 2022

@pratyakshsharma @Zouxxyy This is an issue that should be fixed. Thanks for pointing it out. I created HUDI-4832 to track it. We will get this fixed in the upcoming release (0.12.1).

@codope
Member

codope commented Sep 13, 2022

Folks, I'm closing the issue. I've triaged and fixed it in #6662
Can you please review the patch? Let's take the discussion there.

codope closed this as completed on Sep 13, 2022