Regarding Hugging Face bitsandbytes issue in inference #123866

QLoRA for summarization can benefit from quantization libraries like bitsandbytes. You are also right that typical fine-tuning with quantization isn't recommended. Here are some sources to help you get started:

  1. LLM By Examples — Use Bitsandbytes for Quantization (medium.com), for the Hugging Face integration; if you are using Hugging Face Transformers, the documentation on huggingface.co can also help.
  2. Quantization - Hugging Face (huggingface.co), for general quantization techniques.

You can also explore post-training quantization: after training, use bitsandbytes to quantize the weights of the trained model for deployment. This reduces model size and improves inference efficiency.
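A sketch of that post-training path, assuming `transformers` and `bitsandbytes` are installed: the fine-tuned checkpoint is reloaded with its linear layers quantized to 8-bit at load time (the checkpoint path in the comment is a placeholder).

```python
from transformers import BitsAndBytesConfig

# 8-bit post-training quantization: bitsandbytes quantizes the trained
# weights as the checkpoint is loaded, shrinking the deployed model.
int8_config = BitsAndBytesConfig(load_in_8bit=True)

# Reload the trained checkpoint quantized for deployment (placeholder path):
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     "path/to/finetuned-checkpoint",
#     quantization_config=int8_config,
#     device_map="auto",
# )
```

Note that the quantization happens on load rather than in a separate conversion step, so the saved checkpoint itself stays in full precision.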

Answer selected by Shorya22