We can no longer run on a single CPU #118

Closed
dirkgr opened this issue Apr 27, 2023 · 2 comments

dirkgr commented Apr 27, 2023

We can't run in a debugger anymore.

@dirkgr dirkgr added project/model Related to modeling decisions and implementations severity/should Something that should be implemented/fixed difficulty/medium May take a day labels Apr 27, 2023

epwalsh commented Apr 27, 2023

What were you trying to debug? Sure, we could make training run on a single CPU (or GPU), but that adds complexity and new code paths. For example, we can't use FSDP for non-distributed training, so we'd need a separate checkpointing mechanism.
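
To make the "new code paths" concern concrete, here is a minimal sketch of the branch a single-process mode would introduce. It assumes the job is launched with `torchrun`, which sets the `WORLD_SIZE` environment variable. The function names and the path labels `"fsdp"` and `"single_process"` are hypothetical and not taken from this repository.

```python
import os
from typing import Mapping, Optional


def world_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the number of processes in the job, defaulting to 1.

    Assumption: the launcher (e.g. torchrun) exports WORLD_SIZE.
    """
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1"))


def choose_training_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a code path label (hypothetical names, for illustration only).

    A distributed launch would keep the FSDP-wrapped model and its
    checkpointing; a single-process launch would need its own handling.
    """
    return "fsdp" if world_size(env) > 1 else "single_process"


if __name__ == "__main__":
    print(choose_training_path())
```

Every place that currently assumes FSDP (model wrapping, checkpoint save and load) would need a branch like this, which is the extra complexity described above.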

dumitrac commented

Marking the items prior to Feb 29th as "closed".
