-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Storage Cleaner] Add unsharding to storage cleaner #381
Conversation
I'll break this into 2 (core unsharding functionality and legacy checkpoint hacks). Unless you review it before then. |
ec97838
to
6067be4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one suggestion
scripts/storage_cleaner.py
Outdated
result = subprocess.run( | ||
["python", str(unsharding_config.unshard_script_path), sharding_input_dir, sharding_output_dir], | ||
check=False, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unshard.py
script is simple enough that I would be okay with copying that logic into here instead of shelling out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided doing that in the fear that unshard.py
could change over time, but I guess now that the main logic is moved into checkpointing classes that should be less of a problem. I'll work on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a compromise, I 'modified' the unsharder so that I could call it directly (just renamed the unsharder's main
to unshard
), and then changed my code to call the unsharder directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm, python doesn't work that way so I'll do this as you said.
…rage-cleaner-unsharding-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
At long last, this PR adds unsharding capabilities to the storage cleaner. Setting the unsharding destination to a cloud location is supported, but Google storage upload is not implemented at this time.
PR Train: