# **Intelligent ChatBot with LLM**

This notebook documents the development of an interactive chatbot built by fine-tuning a GPTQ-quantized Llama 2 13B Chat model with LoRA on the AlpacaFarm dataset.

Link to the **alpaca_farm dataset**: https://huggingface.co/datasets/tatsu-lab/alpaca_farm

##### **Installations**

In [1]:
! pip install accelerate peft bitsandbytes git+https://github.com/huggingface/transformers trl py7zr auto-gptq optimum


Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-ohezghd2
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-ohezghd2
  Resolved https://github.com/huggingface/transformers to commit e0c3cee17085914bbe505c159beeb8ae39bc37dd
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [None]:
pip install datasets



##### **Credentials**

In [None]:
from huggingface_hub import notebook_login
notebook_login()



##### **Imports**

In [None]:
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
import os


##### **Import Dataset**

Note: the dataset is converted to a pandas DataFrame so that the prompt column is easier to build.

In [None]:
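# Load the validation split and convert it to a pandas DataFrame.
data = load_dataset("tatsu-lab/alpaca_farm", split="val")
data_df = data.to_pandas()
# Keep at most the first 5000 rows.
data_df = data_df[:5000]
# Build one training string per row: "###Human: <instruction> <input> ###Assistant: <output>".
data_df["text"] = data_df[["input", "instruction", "output"]].apply(lambda x: "###Human: " + x["instruction"] + " " + x["input"] + " ###Assistant: " + x["output"], axis=1)
data = Dataset.from_pandas(data_df)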


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.



{'preference': /root/.cache/huggingface/datasets/downloads/664f5dee9b356eb0291ebcca509e6b250f32d2e392375b486fe6808b41d53bce (origin=https://huggingface.co/datasets/tatsu-lab/alpaca_farm/resolve/main/./alpaca_instructions/preference.json), 'sft': /root/.cache/huggingface/datasets/downloads/58ccba05b04503b0da817bca463806c4d3d8fead73424fbbc2b94342e8bf0c64 (origin=https://huggingface.co/datasets/tatsu-lab/alpaca_farm/resolve/main/./alpaca_instructions/sft.json), 'unlabeled': /root/.cache/huggingface/datasets/downloads/2fc6ec6cb7124b6a9e443d688e1d763fd90d7d4afccdc3ae61d812af4df08919 (origin=https://huggingface.co/datasets/tatsu-lab/alpaca_farm/resolve/main/./alpaca_instructions/unlabeled.json), 'val': /root/.cache/huggingface/datasets/downloads/4620d9bef08bea6d3f61d25df51a993c6ccf1cb0f2f4a8f3e335c9d5af2aa9fc (origin=https://huggingface.co/datasets/tatsu-lab/alpaca_farm/resolve/main/./alpaca_instructions/val.json)}



In [None]:
data[0]

{'instruction': 'Given the following input, construct a creative story.',
 'input': 'A magic bow and arrow',
 'output': "Once upon a time, there lived a young girl named Alexa who was gifted with an incredible magical bow and arrow. Whenever she pulled back her bow and let an arrow fly, wherever it landed, something extraordinary happened. Flowers bloomed, oceans calmed, and the sun shone brighter. Alexa's bow and arrow were so powerful, that it could make even the most impossible things possible. One day, during a great storm, Alexa used her magical bow and arrow to bring calm and harmony to her hometown. She was praised and celebrated by the whole town, and she soon became a symbol of hope and peace.",
 'text': "###Human: Given the following input, construct a creative story. A magic bow and arrow ###Assistant: Once upon a time, there lived a young girl named Alexa who was gifted with an incredible magical bow and arrow. Whenever she pulled back her bow and let an arrow fly, wherever
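
The prompt template used for the `text` column can also be written as a small standalone function, which makes it easier to reuse the same format at inference time. The sketch below is an illustration rather than the notebook's own code: `build_prompt` is a hypothetical helper name, and unlike the lambda above it skips an empty `input` field so that no doubled space appears in the prompt.

```python
def build_prompt(instruction: str, input_text: str = "", output: str = "") -> str:
    """Format one Alpaca-style record with the ###Human / ###Assistant template."""
    # Join instruction and input, skipping the input when it is empty so the
    # prompt does not end up with a doubled space.
    parts = [part.strip() for part in (instruction, input_text) if part and part.strip()]
    human_turn = " ".join(parts)
    return f"###Human: {human_turn} ###Assistant: {output}"


# Training-style example: instruction, input, and (shortened) output.
print(build_prompt(
    "Given the following input, construct a creative story.",
    "A magic bow and arrow",
    "Once upon a time, ...",
))

# Inference-style example: no output, so the string ends right after the marker.
print(build_prompt("Name all the highest mountains in the world"))
```

Leaving `output` empty yields a prompt that ends with `###Assistant: `, which is the shape of the prompts used in the testing cells of this notebook.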

##### **Finetuning Llama**

In [None]:
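# Load the tokenizer; no pad token is set by default, so reuse EOS for padding.
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-Chat-GPTQ")
tokenizer.pad_token = tokenizer.eos_token


# Load the 4-bit GPTQ checkpoint with the exllama kernels disabled.
# Note: `disable_exllama` is deprecated in favor of `use_exllama` (see the warning in the output).
quantization_config_loading = GPTQConfig(bits=4, disable_exllama=True, tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
                            "TheBloke/Llama-2-13B-Chat-GPTQ",
                            quantization_config=quantization_config_loading,
                            device_map="auto"
                        )


# Prepare the quantized model for training: no KV cache, gradient checkpointing on.
model.config.use_cache=False
model.config.pretraining_tp=1
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)


# Attach LoRA adapters (rank 16) to the attention query and value projections only.
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"]
)
model = get_peft_model(model, peft_config)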


training_arguments = TrainingArguments(
        output_dir="Llama-finetuned-Farmalpaca",
        per_device_train_batch_size=8,
        gradient_accumulation_steps=1,
        optim="paged_adamw_32bit",
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        save_strategy="epoch",
        logging_steps=100,
        num_train_epochs=1,
        max_steps=250,  # takes precedence over num_train_epochs
        fp16=True,
        push_to_hub=True  # requires the Hugging Face login from the Credentials cell
)


# Supervised fine-tuning on the "text" column, one example per sequence (no packing).
trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        peft_config=peft_config,
        dataset_text_field="text",
        args=training_arguments,
        tokenizer=tokenizer,
        packing=False,
        max_seq_length=512
)


trainer.train()


You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`.The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.



The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class



Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


Step,Training Loss
100,1.3794
200,1.2389




TrainOutput(global_step=250, training_loss=1.2871568450927735, metrics={'train_runtime': 2243.2952, 'train_samples_per_second': 0.892, 'train_steps_per_second': 0.111, 'total_flos': 430231775477760.0, 'train_loss': 1.2871568450927735, 'epoch': 1.0})
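
Two back-of-the-envelope checks help when reading the training configuration and its output: how many weights the LoRA adapters add, and how many optimizer steps make up one epoch. The sketch below is only an illustration. The function names are hypothetical, and the model dimensions (hidden size 5120, 40 decoder layers, square `q_proj` and `v_proj` matrices) are assumptions about Llama 2 13B that the notebook itself does not state. The example count of 2000 is taken from the `Map` line in the training log.

```python
import math


def lora_trainable_params(hidden_size: int, num_layers: int, rank: int, modules_per_layer: int) -> int:
    """Count LoRA weights for square (hidden_size x hidden_size) target projections."""
    # Each adapted projection gets A (rank x hidden_size) and B (hidden_size x rank).
    per_module = rank * hidden_size + hidden_size * rank
    return per_module * modules_per_layer * num_layers


def steps_per_epoch(num_examples: int, per_device_batch_size: int,
                    grad_accum_steps: int = 1, num_devices: int = 1) -> int:
    """Optimizer steps needed to see every example once."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return math.ceil(num_examples / effective_batch)


# Assumed Llama 2 13B dimensions (not stated in the notebook): hidden size 5120, 40 layers.
# r=16 and two target modules (q_proj, v_proj) come from the LoraConfig in this notebook.
print(lora_trainable_params(hidden_size=5120, num_layers=40, rank=16, modules_per_layer=2))  # 13107200

# 2000 examples with batch size 8 and no gradient accumulation on one device.
print(steps_per_epoch(num_examples=2000, per_device_batch_size=8))  # 250
```

Under these assumptions, 250 steps correspond to one pass over the data, which is consistent with `'epoch': 1.0` in the `TrainOutput` above.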

##### **Testing Model**

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
! cp -r /content/Llama-finetuned-Farmalpaca /content/drive/MyDrive/

cp: cannot stat '/content/Llama-finetuned-Farmalpaca': No such file or directory


In [4]:
from peft import AutoPeftModelForCausalLM
from transformers import GenerationConfig
from transformers import AutoTokenizer
import torch

In [5]:
tokenizer = AutoTokenizer.from_pretrained("/content/drive/MyDrive/Llama-finetuned-Farmalpaca")



In [6]:
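# Load the LoRA adapter together with its base model in half precision.
model = AutoPeftModelForCausalLM.from_pretrained(
    "/content/drive/MyDrive/Llama-finetuned-Farmalpaca",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda")

# With top_k=1 only the most likely token is kept, so decoding is effectively
# greedy even though do_sample=True.
generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.1,
    max_new_tokens=500,
    pad_token_id=tokenizer.eos_token_id
)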
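
The combination `do_sample=True`, `top_k=1` deserves a brief illustration, because with a single candidate token the sampling step has nothing left to choose from. The following is a minimal pure-Python sketch of top-k sampling with temperature, not the `transformers` implementation; `sample_top_k` and the toy logits are made up for the example.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng):
    """Sample a token index from the top_k highest logits after temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in kept]
    # Softmax weights over the kept logits (subtract the max for numerical stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [0.3, 2.5, 1.1, -0.7]  # toy values; index 1 is the argmax
rng = random.Random(0)

# With top_k=1 every draw returns the argmax, whatever the temperature.
draws = {sample_top_k(logits, top_k=1, temperature=0.1, rng=rng) for _ in range(100)}
print(draws)  # {1}
```

In other words, the settings in this notebook should behave like greedy decoding, and the low temperature has no additional effect once `top_k=1` is applied.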



In [7]:
inputs = tokenizer("""###Human: Write a script for a YouTube video exploring the history and cultural significance of Pandas. ###Assistant: """, return_tensors="pt").to("cuda")

In [8]:
import time
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(time.time()-st_time)


###Human: Write a script for a YouTube video exploring the history and cultural significance of Pandas. ###Assistant: 

Hello and welcome to this video exploring the history and cultural significance of Pandas. 

Pandas are one of the most iconic animals in the world, and their unique black and white markings have made them a beloved symbol of China. But what do we really know about these fascinating creatures? 

In this video, we'll take a closer look at the history and cultural significance of Pandas, and explore how they have become an integral part of Chinese culture. From their origins in the mountains of China to their modern-day conservation efforts, we'll delve into the fascinating world of Pandas and uncover the secrets behind their enduring popularity. 

So, let's get started and explore the history and cultural significance of Pandas! 

(Insert video footage of Pandas)

As you can see, Pandas are truly one of the most fascinating animals in the world. But what makes them so 
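
The decoded text echoes the prompt, and the printed time is the total wall-clock duration. A small post-processing step could separate the reply from the prompt and turn the timing into a throughput figure. The helpers below are a hypothetical sketch (`extract_reply` and `tokens_per_second` are not part of the notebook), and the numbers in the example call are invented purely for illustration.

```python
def extract_reply(decoded: str, marker: str = "###Assistant:") -> str:
    """Return the text after the last assistant marker, or the whole string if absent."""
    _, found, reply = decoded.rpartition(marker)
    return reply.strip() if found else decoded.strip()


def tokens_per_second(prompt_tokens: int, total_tokens: int, elapsed_seconds: float) -> float:
    """Throughput counted over newly generated tokens only."""
    return (total_tokens - prompt_tokens) / elapsed_seconds


decoded = "###Human: Name a mountain ###Assistant: Mount Everest"
print(extract_reply(decoded))  # Mount Everest

# Invented numbers: 20 prompt tokens, 520 tokens in total, 50 seconds elapsed.
print(tokens_per_second(prompt_tokens=20, total_tokens=520, elapsed_seconds=50.0))  # 10.0
```

In the notebook, the token counts could be read from `inputs["input_ids"].shape[1]` and `outputs.shape[1]`, and the elapsed time from `time.time() - st_time`.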

In [11]:
inputs = tokenizer("""###Human: Name all the highest mountains in the world ###Assistant: """, return_tensors="pt").to("cuda")

In [10]:
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(time.time()-st_time)

###Human: Name all the highest mountains in the world ###Assistant: 1. Mount Everest (Nepal/China)
2. K2 (Pakistan/China)
3. Kangchenjunga (Nepal/India)
4. Lhotse (Nepal/China)
5. Makalu (Nepal/China)
6. Cho Oyu (Nepal/China)
7. Dhaulagiri (Nepal)
8. Manaslu (Nepal)
9. Nanga Parbat (Pakistan)
10. Annapurna (Nepal)
11. Gasherbrum (Pakistan/China)
12. Shishapangma (China)
13. Xixiabangma (China)
14. Yala Peak (Nepal)
15. Masherbrum (Pakistan)
16. Rakaposhi (Pakistan)
17. Diran (Pakistan)
18. Spantik (Pakistan)
19. Skil Brum (Pakistan)
20. Momhil Sar (Pakistan)
21. Batura Sar (Pakistan)
22. Rupal Sar (Pakistan)
23. Sia Kangri (Pakistan)
24. Masherbrum (Pakistan)
25. Nanga Parbat (Pakistan)
26. Rupal Peak (Pakistan)
27. Sia Kangri (Pakistan)
28. Masherbrum (Pakistan)
29. Nanga Parbat (Pakistan)
30. Rupal Peak (Pakistan)
31. Sia Kangri (Pakistan)
32. Masherbrum (Pakistan)
33. Nanga Parbat (Pakistan)
34. Rupal Peak (Pakistan)
35. Sia Kangri (Pakistan)
36. Masherbrum (Pakistan)
37. Nanga Parb
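
The list above starts to cycle through the same few names after roughly twenty entries. One way to quantify this kind of looping is to measure how many list items repeat an earlier item. The function below is a rough sketch with a hypothetical name, shown on a short made-up sample rather than on the full output.

```python
import re


def repeated_item_fraction(text: str) -> float:
    """Fraction of numbered list items whose text already appeared earlier in the list."""
    # Match lines such as "12. Shishapangma (China)" and capture the item text.
    pattern = re.compile(r"^[ \t]*\d+\.[ \t]+(.+)$", flags=re.MULTILINE)
    items = [match.group(1).strip() for match in pattern.finditer(text)]
    if not items:
        return 0.0
    seen = set()
    repeats = 0
    for item in items:
        if item in seen:
            repeats += 1
        seen.add(item)
    return repeats / len(items)


sample = "\n".join([
    "1. Mount Everest (Nepal/China)",
    "2. K2 (Pakistan/China)",
    "3. Masherbrum (Pakistan)",
    "4. Masherbrum (Pakistan)",
])
print(repeated_item_fraction(sample))  # 0.25
```

If the fraction is high, generation options such as `repetition_penalty` or `no_repeat_ngram_size` in `GenerationConfig` might reduce the looping, although their effect on this particular model has not been tested here.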

###### **German Language**

In [14]:
# German prompt, in English: "Name all the best F1 players in Germany"
inputs = tokenizer("""###Human: Nenne alle besten F1-Spieler in Deutschland ###Assistant: """, return_tensors="pt").to("cuda")

In [15]:
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(time.time()-st_time)

###Human: Nenne alle besten F1-Spieler in Deutschland ###Assistant: 1. Sebastian Vettel
2. Lewis Hamilton
3. Nico Rosberg
4. Daniel Ricciardo
5. Max Verstappen
6. Kimi Räikkönen
7. Sergio Perez
8. Carlos Sainz Jr.
9. Romain Grosjean
10. Felipe Massa
11. Marcus Ericsson
12. Charles Leclerc
13. Pierre Gasly
14. Lance Stroll
15. Esteban Ocon
16. Kevin Magnussen
17. Jolyon Palmer
18. Daniil Kvyat
19. Marcus Armstrong
20. Mick Schumacher
21. Antonio Giovinazzi
22. Alexander Albon
23. George Russell
24. Robert Kubica
25. Lando Norris
26. Jack Aitken
27. Nicholas Latifi
28. Callum Ilott
29. Jake Dennis
30. Jamie Chadwick
31. Sophia Floersch
32. Tatiana Calderón
33. Beitske Visser
34. Vicky Piria
35. Doriane Pin
36. Nina Hertz
37. Ayla Ågren
38. Tess Hofer
39. Nina Watts
40. Sarah Bovy
41. Mia Stellberg
42. Tiffany Valtier
43. Tara Coughlan
44. Tilly Ramsay
45. Sophie Longin
46. Lily Jones
47. Emily Linscott
48. Lily Webb
49. Emily Flynn
50. Lily Hodgkinson
51. Emily Watts
52. Lily Brennan
53.