
Fix repeated key misses slowing down Version Controlled datasets #1458

Merged
AbhinavTuli merged 6 commits into main from vc/fix on Jan 27, 2022


AbhinavTuli
Contributor

🚀 🚀 Pull Request

Checklist:

  • My code follows the style guidelines of this project and the Contributing document
  • I have commented my code, particularly in hard-to-understand areas
  • I have kept the coverage-rate up
  • I have performed a self-review of my own code and resolved any problems
  • I have checked to ensure there aren't any other open Pull Requests for the same change
  • I have described and made corresponding changes to the relevant documentation
  • New and existing unit tests pass locally with my changes

Changes

Closes #1456
New commits were created without a CommitChunkSet, which was only written once data was added to the commit. Until then, every lookup of that commit's CommitChunkSet was a key miss, and these repeated key misses slowed down iteration. Commits that had data added to them were not affected.
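Why repeated misses are expensive can be illustrated with a minimal sketch, assuming the layer in front of remote storage caches only keys it actually found. `CountingStorage`, `HitOnlyCache` and the key names below are hypothetical, not Hub's actual classes or key layout, and whether Hub's cache behaves exactly this way is an assumption.

```python
from typing import Dict, Optional


class CountingStorage:
    """Hypothetical stand-in for a remote backend such as S3; counts every read."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.reads = 0

    def get(self, key: str) -> Optional[bytes]:
        self.reads += 1  # each call models one network round trip
        return self.data.get(key)


class HitOnlyCache:
    """Hypothetical cache: remembers values that were found, never remembers a miss."""

    def __init__(self, storage: CountingStorage) -> None:
        self.storage = storage
        self.cache: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            return self.cache[key]
        value = self.storage.get(key)
        if value is not None:
            self.cache[key] = value
        return value


storage = CountingStorage()
cache = HitOnlyCache(storage)

# A new commit with no chunk set yet: every iteration step misses and goes to storage again.
for _ in range(100):
    cache.get("versions/new_commit/chunk_set")  # hypothetical key layout
print(storage.reads)  # 100

# Once an (empty) chunk set exists, only the first lookup reaches storage.
storage.data["versions/fixed_commit/chunk_set"] = b""
for _ in range(100):
    cache.get("versions/fixed_commit/chunk_set")
print(storage.reads)  # 101
```

Under this assumption, writing even an empty value turns a miss paid on every access into a single read.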

This PR changes the logic to create a new, empty CommitChunkSet every time a new commit or branch is created, which avoids these repeated key misses. It also attempts to repair existing datasets whose commits lack a CommitChunkSet by adding one (if write access is available).
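The two parts of the change can be sketched as follows, assuming a plain dict as storage and one chunk set per tensor per commit. `chunk_set_key`, `create_commit`, `backfill_missing_chunk_sets`, the key layout and the JSON encoding are all hypothetical illustrations, not Hub's real API or on-disk format.

```python
import json
from typing import Dict, List

# Hypothetical serialized form of an empty chunk set (not Hub's real format).
EMPTY_CHUNK_SET = json.dumps([]).encode()


def chunk_set_key(commit_id: str, tensor: str) -> str:
    # Hypothetical key layout: one chunk set per tensor per commit (an assumption).
    return f"versions/{commit_id}/{tensor}/chunk_set"


def create_commit(storage: Dict[str, bytes], commit_id: str, tensors: List[str]) -> None:
    """Write an empty chunk set for every tensor as soon as the commit exists,
    so lookups hit even before any data is appended."""
    for tensor in tensors:
        storage[chunk_set_key(commit_id, tensor)] = EMPTY_CHUNK_SET


def backfill_missing_chunk_sets(
    storage: Dict[str, bytes], commit_ids: List[str], tensors: List[str], read_only: bool
) -> int:
    """Add empty chunk sets to older commits that lack them.
    Skipped entirely without write access. Returns the number of keys written."""
    if read_only:
        return 0
    written = 0
    for commit_id in commit_ids:
        for tensor in tensors:
            key = chunk_set_key(commit_id, tensor)
            if key not in storage:
                storage[key] = EMPTY_CHUNK_SET
                written += 1
    return written


# Usage: "c0" stands for a commit made before the fix, "c1" for one made after it.
storage: Dict[str, bytes] = {}
create_commit(storage, "c1", ["images", "labels"])
print(backfill_missing_chunk_sets(storage, ["c0", "c1"], ["images", "labels"], read_only=False))  # 2
```

The backfill only writes keys that are missing, so running it again on an already repaired dataset is a no-op.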

@hoshimura
Contributor

@AbhinavTuli it works! hope it gets released soon 👯

AbhinavTuli merged commit 07ded2c into main on Jan 27, 2022
AbhinavTuli deleted the vc/fix branch on January 27, 2022 07:00

Successfully merging this pull request may close these issues.

[BUG] 10x slowdown of dataset on s3 provider when using version control features
4 participants