
Request to add 4-bit AdamW and 4-bit SGD #775

Open

LiutongZhou opened this issue Sep 17, 2023 · 4 comments

Comments


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@LiutongZhou (Author)

raise

@NicolasMejiaPetit commented Feb 19, 2024

Raise.

Edit: I am currently nowhere near good enough at programming to do this myself, but it would be great to combine the bitsandbytes paged 8-bit AdamW code with the 4-bit AdamW code into a paged 4-bit AdamW. It would lower training requirements even further, down to cards with less VRAM than the current implementation supports.

With a 4-bit optimizer and gradient accumulation, it could be possible to fully fine-tune a 7B model on a 24 GB card, based on the chart below.
[IMG_3013: memory-usage chart]
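Back-of-the-envelope arithmetic (a sketch, not from the thread or the chart): AdamW keeps two state tensors per parameter (`exp_avg` and `exp_avg_sq`), so quantizing those states from 32-bit down to 4-bit shrinks the optimizer footprint by 8x. The figures below ignore quantization block metadata, weights, gradients, and activations, so they are lower bounds on actual memory use.

```python
# Rough optimizer-state memory estimate for a 7B-parameter model.
# AdamW stores two state tensors (exp_avg, exp_avg_sq) per parameter;
# quantizing those states shrinks the footprint proportionally.
# Illustrative only: ignores quantization constants/block metadata
# and all non-optimizer memory (weights, gradients, activations).

def adamw_state_gib(num_params: int, bits_per_state: int) -> float:
    """Memory for the two AdamW state tensors, in GiB."""
    total_bytes = num_params * 2 * bits_per_state / 8
    return total_bytes / 2**30

params = 7_000_000_000
for bits in (32, 8, 4):
    print(f"{bits:>2}-bit states: {adamw_state_gib(params, bits):.1f} GiB")
```

On these assumptions, fp32 AdamW states alone (~52 GiB) overflow a 24 GB card, 8-bit states take ~13 GiB, and 4-bit states take ~6.5 GiB, which is why a 4-bit (ideally paged) optimizer is what makes the 24 GB scenario plausible at all.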

@younesbelkada younesbelkada reopened this Feb 19, 2024
@NicolasMejiaPetit

If there happens to be a branch, or PR for this, I’d love to see it! Could you share a link?

No branches or pull requests

3 participants