Conversation

@Xia-Weiwen

We didn't use torch.compile for these two functions because of some issues in PyTorch. Those issues are now fixed. Using torch.compile brings a performance gain and also avoids some issues during finetuning.

@Titus-von-Koeller
Collaborator

Cool! Is this ready? Because it's still marked as draft.

@Xia-Weiwen
Author

> Cool! Is this ready? Because it's still marked as draft.

Thanks. I think @jiqing-feng will collect some data before we mark it ready.

@jiqing-feng
Contributor

jiqing-feng commented Jun 6, 2024

Hi @Titus-von-Koeller

For quantize_4bit_impl, it is only called during model loading, so there is no difference in nf4/fp4 performance before and after applying torch.compile.

For double_quant_impl, I have tested it on llama-7b inference and opt-6.7b finetuning and found no difference in performance.

We planned to apply torch.compile to the quantization ops but didn't integrate it into the first version because of the issue noted in the code comment: "Don't use torch.compile for now due to PyTorch issue https://github.com/pytorch/pytorch/issues/124382". Now that the issue has been fixed, we can apply torch.compile again. torch.compile is also being optimized constantly in PyTorch :)
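For readers unfamiliar with the pattern being discussed: applying torch.compile to a quantization op just means wrapping the eager function so PyTorch traces and optimizes it. The sketch below is illustrative only (quantize_demo is a made-up helper, not the actual bitsandbytes quantize_4bit_impl/double_quant_impl), and it uses the "eager" debugging backend so it runs without a compiler toolchain; the real integration would use the default inductor backend.

```python
import torch

def quantize_demo(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Hypothetical toy quantizer: snap values to a uniform grid.
    # Stands in for the real (much more involved) 4-bit quantization op.
    return torch.round(x / scale) * scale

# Wrap the op with torch.compile; backend="eager" skips codegen and is
# only used here so the sketch runs anywhere.
quantize_compiled = torch.compile(quantize_demo, backend="eager")

x = torch.tensor([0.12, 0.37, -0.51])
y = quantize_compiled(x, 0.25)
```

In the PR itself, the win is that the previously hand-written eager ops can now be fused and optimized by PyTorch, since the compiler bug tracked in pytorch/pytorch#124382 no longer blocks them.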

Thx!

@Titus-von-Koeller
Collaborator

Thanks @jiqing-feng for explaining. Then it's ready to merge, @Xia-Weiwen, right?

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review June 6, 2024 08:56
@Xia-Weiwen
Author

> Thanks @jiqing-feng for explaining. Then it's ready to merge, @Xia-Weiwen, right?

Yes. Please. Thanks.

@Titus-von-Koeller Titus-von-Koeller merged commit 517eaf2 into bitsandbytes-foundation:multi-backend-refactor Jun 6, 2024