Tensor/group deletion perf fix + corruption proofing #1404
Codecov Report
@@ Coverage Diff @@
## main #1404 +/- ##
==========================================
- Coverage 92.28% 92.20% -0.08%
==========================================
Files 175 175
Lines 13826 13825 -1
==========================================
- Hits 12759 12748 -11
- Misses 1067 1077 +10
Please add a Jira ticket to this PR.
Also add a follow-up Jira ticket to garbage collect dangling chunks.
hub/core/dataset/dataset.py
Outdated
@@ -369,16 +369,20 @@ def delete_tensor(self, name: str, large_ok: bool = False):
)
return

delete_tensor(name, self.storage, self.version_state)
initial_autoflush = self.storage.autoflush |
It would be good to wrap this in a try-finally, with self.storage.autoflush reset in the finally block.
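A minimal sketch of the suggested pattern, for illustration only. `FakeStorage` and `run_with_autoflush_off` are hypothetical names, not part of the PR, and the sketch assumes the method turns autoflush off while deleting; the hunk above only shows the original value being saved.

```python
class FakeStorage:
    """Hypothetical stand-in for the dataset's storage object."""

    def __init__(self, autoflush=True):
        self.autoflush = autoflush


def run_with_autoflush_off(storage, operation):
    """Run `operation` with autoflush disabled, then restore the saved value."""
    initial_autoflush = storage.autoflush
    storage.autoflush = False  # assumption: autoflush is disabled during deletion
    try:
        operation()
    finally:
        # Runs on success and on error, so the setting is never left changed.
        storage.autoflush = initial_autoflush
```

With this shape, an exception raised inside `operation` still propagates to the caller, but `storage.autoflush` is back to its initial value by the time it does.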
We do that in other places too. Should it be changed everywhere else?
Changed everywhere.
hub/core/dataset/dataset.py
Outdated
tensors = [
    posixpath.join(name, tensor) for tensor in self[name]._all_tensors_filtered
]

meta.groups = list(filter(lambda g: not g.startswith(name), meta.groups)) |
Nice refactor!
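For readers skimming the hunk, here is a small standalone sketch of what the two expressions compute on plain lists. The helper names `tensor_paths` and `remaining_groups` and the sample group names are hypothetical; only `posixpath.join` and the `startswith` filter come from the diff.

```python
import posixpath


def tensor_paths(name, tensor_names):
    """Prefix each tensor name with its group name, using '/' separators."""
    return [posixpath.join(name, tensor) for tensor in tensor_names]


def remaining_groups(groups, name):
    """Keep only the groups whose names do not start with `name`."""
    return list(filter(lambda g: not g.startswith(name), groups))


print(tensor_paths("images", ["left", "depth/raw"]))
print(remaining_groups(["images", "images/depth", "labels"], "images"))
```

Because the filter is a plain string-prefix check, it also drops any sibling group whose name merely begins with the same characters (for example `images_small` when deleting `images`). Whether that can occur depends on how groups are named in practice.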
hub/core/dataset/dataset.py
Outdated
@@ -420,30 +424,22 @@ def delete_group(self, name: str, large_ok: bool = False):
)
return

initial_autoflush = self.storage.autoflush |
Same comment re: resetting self.storage.autoflush
🚀 🚀 Pull Request
Checklist:
coverage-rate up
Changes