<center><h2>ALTeGraD 2023<br>Lab Session 4: NLP Frameworks</h2> 07 / 11 / 2023<br> Dr. G. Shang, H. Abdine<br><br>


<b>Student name:</b> Balthazar Neveu

</center>
<font color="gray">

# <b>Part 2: Finetuning $BLOOM-560m$ using HuggingFace's Transformers</b>
In this part, we will finetune [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m) on a question/answer dataset.

We will also use LoRA and quantization during the finetuning.

## <b>Preparing the environment and installing libraries:</b>
</font>

In [2]:
import json
import os
from pprint import pprint

import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset

from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel,
    get_peft_model,
    prepare_model_for_kbit_training,
)
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/user/mambaforge/envs/llm_bloom/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/user/mambaforge/envs/llm_bloom/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/user/mambaforge/envs/llm_bloom/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...


  warn("The installed version of bitsandbytes was compiled without GPU support. "
  from .autonotebook import tqdm as notebook_tqdm


<font color="gray">

## <b>Loading the model and the tokenizer:</b>
In this section, we will load the BLOOM model while using the BitsAndBytes library for quantization.

</font>

[Bloom 560m](https://huggingface.co/bigscience/bloom-560m)

## Task 6: BitsAndBytes configuration [4 bits quantization](https://huggingface.co/blog/4bit-transformers-bitsandbytes)

In [None]:
MODEL_NAME = "bigscience/bloom-560m"
# Task 6: 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the computations
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Use the end-of-sequence token for padding.
tokenizer.pad_token = tokenizer.eos_token
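
To get a feel for what 4-bit quantization buys us, here is a rough back-of-the-envelope sketch of the memory needed to store the weights at different precisions. The parameter count is the nominal figure suggested by the model name (an assumption, not a measured value), and `weight_memory_mb` is a hypothetical helper. The 4-bit figure is a lower bound: not every layer is necessarily quantized, and the quantization constants take some extra space.

```python
def weight_memory_mb(n_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in megabytes (10^6 bytes)."""
    return n_params * bits_per_param / 8 / 1e6


# Assumption: nominal size taken from the model name "bloom-560m".
N_PARAMS = 560_000_000

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_mb(N_PARAMS, bits):,.0f} MB")
```

Activations, gradients of the trainable parameters and optimizer states come on top of this, so the numbers only describe the frozen weights.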


## Task 7: Trainable parameters

In [None]:

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        # Only parameters with requires_grad=True are updated during training.
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)
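
As a sanity check on the count printed above, the number of parameters LoRA adds can be worked out by hand. For a linear layer of shape `in_features x out_features`, an adapter of rank `r` adds two small matrices, `A` (`r x in_features`) and `B` (`out_features x r`). The sketch below assumes a hidden size of 1024 and 24 transformer blocks for BLOOM-560m (values to be checked against the model's `config.json`) and that `query_key_value` maps the hidden size to three times the hidden size. `lora_param_count` is a hypothetical helper, not part of `peft`.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


# Assumptions about BLOOM-560m, to be verified against the model config.
HIDDEN_SIZE = 1024
NUM_LAYERS = 24
R = 16            # same rank as in LoraConfig above
LORA_ALPHA = 32   # same alpha as in LoraConfig above

per_layer = lora_param_count(HIDDEN_SIZE, 3 * HIDDEN_SIZE, R)
total = NUM_LAYERS * per_layer
print(per_layer, total)        # 65536 1572864
print(LORA_ALPHA / R)          # scaling applied to the low-rank update: 2.0
```

Under these assumptions the adapters account for roughly 1.6 million trainable parameters, a small fraction of the full model.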

<font color="gray">

## <b>Test the model before finetuning:</b>

</font>


[Hugging Face Transformers: Generation configuration](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)


In [None]:
# Prompt of the format "<human>: question?  \n <assistant>: ", with an empty response from the assistant.
prompt = "<human>: Comment je peux créer un compte?  \n <assistant>: "
print(prompt)


generation_config = model.generation_config
generation_config.max_new_tokens = 200
# Note: temperature and top_p only take effect when sampling is enabled (do_sample=True).
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
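
To make the roles of `temperature` and `top_p` more concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy distribution. It only illustrates the idea; the function names are hypothetical and the actual implementation in Transformers may differ in its details (tie handling, minimum number of kept tokens, and so on).

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # made-up logits for a 3-token vocabulary
probs = softmax_with_temperature(toy_logits, temperature=0.7)
candidates = top_p_filter(probs, top_p=0.7)
print(probs)
print(candidates)
```

With these toy values the most likely token already carries more than 70% of the probability mass, so it is the only candidate left after filtering.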


In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode(): # Inference, not training.
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

<font color="gray">

# Q/A dataset from Hugging Face

</font>

In [3]:
data = load_dataset("OpenLLM-France/Tutoriel", data_files="ecommerce-faq-fr.json")
pd.DataFrame(data["train"])

Downloading data: 100%|██████████| 24.9k/24.9k [00:00<00:00, 592kB/s]
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 20.57it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 499.44it/s]
Generating train split: 79 examples [00:00, 10029.36 examples/s]


| | answer | question |
|---|---|---|
| 0 | Pour créer un compte, cliquez sur le bouton "S... | Comment puis-je créer un compte ? |
| 1 | Nous acceptons les principales cartes de crédi... | Quels sont les modes de paiement acceptés ? |
| 2 | Vous pouvez suivre votre commande en vous conn... | Comment puis-je suivre ma commande ? |
| 3 | Notre politique de retour vous permet de renvo... | Quelle est votre politique de retour ? |
| 4 | Vous pouvez annuler votre commande si elle n'a... | Puis-je annuler ma commande ? |
| ... | ... | ... |
| 74 | Si un produit est listé comme "épuisé" mais di... | Puis-je commander un produit s'il est listé co... |
| 75 | Oui, vous pouvez retourner un produit acheté a... | Puis-je retourner un produit acheté avec une c... |
| 76 | Si un produit n'est pas disponible dans la cou... | Puis-je demander un produit s'il n'est pas dis... |
| 77 | Si un produit est listé comme "bientôt disponi... | Puis-je commander un produit s'il est listé co... |


## Task 8: Generate prompts
[Hugging Face Transformers generation](https://huggingface.co/docs/transformers/v4.35.0/en/main_classes/text_generation#transformers.GenerationMixin.generate)

In [None]:
def generate_prompt(data_point):
    # Transform a record into a prompt of the format "<human>: question?  \n <assistant>: response"
    return f"<human>: {data_point['question']}  \n <assistant>: {data_point['answer']}"


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

# Shuffle the training split, then add the tokenized prompt to every record.
data = data["train"].shuffle().map(generate_and_tokenize_prompt)
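
The point of this prompt format is that the text used at inference time is exactly the beginning of a training sample: the model only has to continue after the assistant marker. A small sketch, using a hypothetical helper `build_prompt` and a made-up record (not taken from the dataset), illustrates this:

```python
def build_prompt(question: str, answer: str = "") -> str:
    """Hypothetical helper: format a Q/A pair as '<human>: ...  \\n <assistant>: ...'."""
    return f"<human>: {question}  \n <assistant>: {answer}"


# Made-up record, only for illustration.
record = {
    "question": "How do I reset my password?",
    "answer": "Use the reset link on the login page.",
}

training_text = build_prompt(record["question"], record["answer"])
inference_text = build_prompt(record["question"])  # empty answer, to be completed by the model

print(repr(training_text))
print(training_text.startswith(inference_text))
```

Keeping the whitespace identical between training and inference matters, since even a missing space changes the token sequence the model sees.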

In [None]:
OUTPUT_DIR = "experiments"

training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir=OUTPUT_DIR,
    max_steps=80,  # a positive max_steps overrides num_train_epochs
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    report_to="tensorboard",
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False
trainer.train()
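
To interpret these training arguments, it helps to work out how much data the run actually sees and what the learning rate looks like over time. The sketch below assumes a single device and uses the 79 examples reported when the dataset was loaded. `lr_at_step` is a hypothetical re-implementation of a linear warmup followed by a cosine decay, written for illustration; it is not the scheduler code from Transformers.

```python
import math


def lr_at_step(step, base_lr, total_steps, warmup_steps):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4
MAX_STEPS = 80
WARMUP_RATIO = 0.05
BASE_LR = 2e-4
N_EXAMPLES = 79  # size of the train split reported by load_dataset

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # assuming a single device
samples_seen = effective_batch * MAX_STEPS
epochs = samples_seen / N_EXAMPLES
warmup_steps = math.ceil(MAX_STEPS * WARMUP_RATIO)

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen} (about {epochs:.2f} passes over the data)")
print(f"warmup steps: {warmup_steps}")
for step in (0, warmup_steps, MAX_STEPS // 2, MAX_STEPS):
    print(step, lr_at_step(step, BASE_LR, MAX_STEPS, warmup_steps))
```

With only 79 examples, 80 optimizer steps correspond to roughly four passes over the dataset, so some memorization of the FAQ answers is to be expected.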

## <b>Test the model after the finetuning:</b>

In [4]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

NameError: name 'tokenizer' is not defined

In [None]:
def generate_response(question: str) -> str:
    # Prompt of the format "<human>: question?  \n <assistant>: " with an empty response
    prompt = f"<human>: {question}  \n <assistant>: "
    encoding = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            generation_config=generation_config,
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Keep only the text generated after the assistant marker.
    assistant_start = "<assistant>:"
    response_start = response.find(assistant_start)
    return response[response_start + len(assistant_start) :].strip()
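
One edge case of the extraction step is worth noting: `str.find` returns `-1` when the marker is absent, so the slice would silently drop the first characters of the response. The sketch below shows a possible, slightly more defensive variant using `str.partition`. The helper `extract_answer` is hypothetical, and cutting the text at a following `<human>:` turn is an extra design choice, not something the lab requires.

```python
ASSISTANT_MARKER = "<assistant>:"
HUMAN_MARKER = "<human>:"


def extract_answer(decoded: str) -> str:
    """Return the assistant's first answer, or the whole text if no marker is found."""
    _, marker, tail = decoded.partition(ASSISTANT_MARKER)
    if not marker:
        return decoded.strip()
    # Optional: stop at the next human turn if the model keeps generating a dialogue.
    answer, _, _ = tail.partition(HUMAN_MARKER)
    return answer.strip()


# Made-up decoded strings, only for illustration.
print(extract_answer("<human>: Hi  \n <assistant>: Hello there "))
print(extract_answer("<human>: Q  \n <assistant>: A1 \n<human>: Q2"))
print(extract_answer("no marker here"))
```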

In [None]:
prompt = "Puis-je retourner un produit s'il s'agit d'un article en liquidation ou en vente finale ?"
print('-', prompt,'\n')
print(generate_response(prompt))

prompt = "Que se passe-t-il lorsque je retourne un article en déstockage ?"
print('\n\n\n-', prompt, '\n')
print(generate_response(prompt))

# Assign the new question before printing it (the original order printed the previous question again).
prompt = "Comment puis-je savoir quand je recevrai ma commande ?"
print('\n\n\n-', prompt, '\n')
print(generate_response(prompt))