
Pytorch qlinear #116

Merged 6 commits on May 30, 2023
Conversation

qwopqwop200
Collaborator

Changed to automatically switch to the PyTorch backend when trainable is enabled in the CUDA backend.
This trains at a roughly 1.3x slower pace than Triton (70 minutes --> 90 minutes), but has the advantage of not requiring Triton.
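The dispatch this PR describes can be sketched as a small selection function: if the CUDA backend is requested with trainable enabled, fall back to a pure-PyTorch QuantLinear so gradients flow without Triton. The function name `select_quant_linear` and its parameters below are illustrative assumptions, not the actual AutoGPTQ API.

```python
def select_quant_linear(use_triton: bool, trainable: bool) -> str:
    """Hypothetical sketch: pick a QuantLinear backend.

    - "triton":  fused Triton kernels (fast, but Triton is Linux-only)
    - "pytorch": pure-PyTorch ops, autograd-friendly, works on Windows
    - "cuda":    custom CUDA extension kernels, inference-oriented
    """
    if use_triton:
        return "triton"
    if trainable:
        # The behavior added by this PR: training with the CUDA backend
        # silently switches to the PyTorch implementation.
        return "pytorch"
    return "cuda"

# e.g. select_quant_linear(use_triton=False, trainable=True) -> "pytorch"
```

With this selection, `use_triton=False` plus `trainable=True` no longer fails (or requires Triton); it trades roughly 1.3x training speed for portability.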

Collaborator

@PanQiWei PanQiWei left a comment


Great! With this, users can train quantized models on Windows; the slowdown is acceptable, I think 👍

@PanQiWei PanQiWei merged commit 93698e0 into peft_integration May 30, 2023
@qwopqwop200 qwopqwop200 deleted the pytorch-qlinear branch May 30, 2023 16:13
@TheBloke
Contributor

TheBloke commented May 30, 2023

So does this mean that before this PR, all quantisation was using Triton even with use_triton=False?

Or was this only the case for the new PEFT?

Anyway, great work if it's adding more support for Windows users!
