Conversation

@Xia-Weiwen

We didn't use torch.compile for these two functions because of some issues in PyTorch. Those issues are now fixed. Using torch.compile brings a performance gain and also avoids some issues during finetuning.

@Titus-von-Koeller
Collaborator

Cool! Is this ready? Because it's still marked as draft.

@Xia-Weiwen
Author

> Cool! Is this ready? Because it's still marked as draft.

Thanks. I think @jiqing-feng will collect some data before we mark it ready.

@jiqing-feng
Contributor

jiqing-feng commented Jun 6, 2024

Hi @Titus-von-Koeller

For quantize_4bit_impl, it is only called during model loading, so there is no difference in nf4/fp4 performance before and after applying torch.compile.

For double_quant_impl, I have tested it on llama-7b inference and opt-6.7b finetuning and found no difference in performance.

We planned to apply torch.compile to the quantization ops but didn't integrate it into the first version because of the issue noted in the code comment: "Don't use torch.compile for now due to PyTorch issue https://github.com/pytorch/pytorch/issues/124382". Now that the issue has been fixed, we can apply torch.compile again. torch.compile is also being optimized constantly in PyTorch :)
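For readers unfamiliar with the pattern being discussed: applying torch.compile to a quantization op just means wrapping the eager function so PyTorch traces and optimizes it. The sketch below is illustrative only (quantize_demo is a made-up helper, not the actual bitsandbytes quantize_4bit_impl/double_quant_impl), and it uses the "eager" debugging backend so it runs without a compiler toolchain; the real integration would use the default inductor backend.

```python
import torch

def quantize_demo(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Hypothetical toy quantizer: snap values to a uniform grid.
    # Stands in for the real (much more involved) 4-bit quantization op.
    return torch.round(x / scale) * scale

# Wrap the op with torch.compile; backend="eager" skips codegen and is
# only used here so the sketch runs anywhere.
quantize_compiled = torch.compile(quantize_demo, backend="eager")

x = torch.tensor([0.12, 0.37, -0.51])
y = quantize_compiled(x, 0.25)
```

In the PR itself, the win is that the previously hand-written eager ops can now be fused and optimized by PyTorch, since the compiler bug tracked in pytorch/pytorch#124382 no longer blocks them.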

Thx!

@Titus-von-Koeller
Collaborator

Thanks @jiqing-feng for explaining. Then it's ready to merge, @Xia-Weiwen, right?

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review June 6, 2024 08:56
@Xia-Weiwen
Author

> Thanks @jiqing-feng for explaining. Then it's ready to merge, @Xia-Weiwen, right?

Yes. Please. Thanks.

@Titus-von-Koeller Titus-von-Koeller merged commit 517eaf2 into bitsandbytes-foundation:multi-backend-refactor Jun 6, 2024