
Pytorch qlinear #116

Merged 6 commits on May 30, 2023
Conversation

qwopqwop200
Collaborator

Changed to automatically switch to the PyTorch backend when trainable is enabled in the CUDA backend.
This trains at a roughly 1.3x slower pace than Triton (70 minutes --> 90 minutes), but has the advantage of not requiring Triton.
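The dispatch this PR describes can be sketched as a small selection function: if the CUDA backend is requested with trainable enabled, fall back to a pure-PyTorch QuantLinear so gradients flow without Triton. The function name `select_quant_linear` and its parameters below are illustrative assumptions, not the actual AutoGPTQ API.

```python
def select_quant_linear(use_triton: bool, trainable: bool) -> str:
    """Hypothetical sketch: pick a QuantLinear backend.

    - "triton":  fused Triton kernels (fast, but Triton is Linux-only)
    - "pytorch": pure-PyTorch ops, autograd-friendly, works on Windows
    - "cuda":    custom CUDA extension kernels, inference-oriented
    """
    if use_triton:
        return "triton"
    if trainable:
        # The behavior added by this PR: training with the CUDA backend
        # silently switches to the PyTorch implementation.
        return "pytorch"
    return "cuda"

# e.g. select_quant_linear(use_triton=False, trainable=True) -> "pytorch"
```

With this selection, `use_triton=False` plus `trainable=True` no longer fails (or requires Triton); it trades roughly 1.3x training speed for portability.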

Collaborator

@PanQiWei PanQiWei left a comment


Great! With this, users can train quantized models on Windows; the slowdown is acceptable, I think 👍

@PanQiWei PanQiWei merged commit 93698e0 into peft_integration May 30, 2023
@qwopqwop200 qwopqwop200 deleted the pytorch-qlinear branch May 30, 2023 16:13
@TheBloke
Contributor

TheBloke commented May 30, 2023

So does this mean that before this PR, all quantisation was using Triton even with use_triton=False?

Or was this only the case for the new PEFT?

Anyway, great work if it's adding more support for Windows users!
