
[Storage Cleaner] Add unsharding to storage cleaner #381

Merged: 2015aroras merged 17 commits into main from shanea/storage-cleaner-unsharding-2 on Dec 11, 2023


2015aroras (Collaborator) commented Nov 22, 2023

2015aroras (Collaborator, Author):

I'll break this into two PRs (core unsharding functionality and legacy checkpoint hacks), unless you review it before then.

epwalsh (Member) left a comment:

LGTM, just one suggestion

Comment on lines 822 to 825
result = subprocess.run(
["python", str(unsharding_config.unshard_script_path), sharding_input_dir, sharding_output_dir],
check=False,
)
epwalsh (Member):

The unshard.py script is simple enough that I would be okay with copying that logic into here instead of shelling out.
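To make the trade-off concrete, here is a minimal sketch contrasting the two approaches. The function names `unshard_via_subprocess` and `unshard_in_process`, the throwaway script, and the marker-file body of the in-process version are all hypothetical stand-ins; the actual unshard.py logic is not shown in this thread.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def unshard_via_subprocess(script_path: Path, input_dir: str, output_dir: str) -> bool:
    """Run an unsharding script as a child process (the approach in the diff).

    With check=False, a non-zero exit status does not raise
    CalledProcessError, so the caller must inspect returncode.
    """
    result = subprocess.run(
        # sys.executable reuses the current interpreter and its environment,
        # whereas a bare "python" depends on whatever is first on PATH.
        [sys.executable, str(script_path), input_dir, output_dir],
        check=False,
    )
    return result.returncode == 0


def unshard_in_process(input_dir: str, output_dir: str) -> bool:
    """Hypothetical in-process alternative with the script's logic copied in.

    The body is a stand-in for the real unsharding work: it only records the
    input directory in a marker file inside output_dir.
    """
    Path(output_dir, "UNSHARDED_FROM").write_text(input_dir)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A throwaway script standing in for unshard.py (hypothetical).
        script = Path(tmp, "fake_unshard.py")
        script.write_text("import sys\nsys.exit(0 if len(sys.argv) == 3 else 1)\n")
        out_dir = Path(tmp, "out")
        out_dir.mkdir()
        print("subprocess ok:", unshard_via_subprocess(script, "in", str(out_dir)))
        print("in-process ok:", unshard_in_process("in", str(out_dir)))
```

One design note: the sketch passes sys.executable rather than a bare "python" so the child runs under the same interpreter as the caller. The in-process version avoids process startup entirely and lets failures surface as ordinary Python exceptions instead of exit codes.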

2015aroras (Collaborator, Author):

I avoided doing that for fear that unshard.py could change over time, but now that the main logic has moved into the checkpointing classes, that should be less of a problem. I'll work on this.

2015aroras (Collaborator, Author):

As a compromise, I 'modified' the unsharder so that it can be called directly (I just renamed its main to unshard), and changed my code to call it instead of shelling out.

2015aroras (Collaborator, Author):

Nvm, Python doesn't work that way, so I'll do it as you suggested.

Base automatically changed from shanea/storage-cleaner-download-upload to main December 7, 2023 18:52
epwalsh (Member) left a comment:

LGTM!

2015aroras merged commit 1ede949 into main Dec 11, 2023
10 checks passed
2015aroras deleted the shanea/storage-cleaner-unsharding-2 branch December 11, 2023 22:47
2 participants