Regarding Hugging Face bitsandbytes issue in inference #123866

QLoRA for summarization can benefit from quantization libraries like bitsandbytes. You are also right that typical fine-tuning with quantization isn't recommended. Here are some sources to help you get started:

  1. LLM By Examples — Use Bitsandbytes for Quantization (medium.com), for the Hugging Face integration; if you are using Hugging Face Transformers, the documentation on huggingface.co can also help.
  2. Quantization - Hugging Face (huggingface.co), for general quantization techniques.

You can also explore post-training quantization: after training, use bitsandbytes to quantize the weights of the trained model for deployment. This reduces model size and improves inference efficiency.
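A sketch of that post-training path, assuming `transformers` and `bitsandbytes` are installed: the fine-tuned checkpoint is reloaded with its linear layers quantized to 8-bit at load time (the checkpoint path in the comment is a placeholder).

```python
from transformers import BitsAndBytesConfig

# 8-bit post-training quantization: bitsandbytes quantizes the trained
# weights as the checkpoint is loaded, shrinking the deployed model.
int8_config = BitsAndBytesConfig(load_in_8bit=True)

# Reload the trained checkpoint quantized for deployment (placeholder path):
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     "path/to/finetuned-checkpoint",
#     quantization_config=int8_config,
#     device_map="auto",
# )
```

Note that the quantization happens on load rather than in a separate conversion step, so the saved checkpoint itself stays in full precision.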

Answer selected by Shorya22