## **Generation-based Chatbot with Llama**

In [1]:
import pandas as pd
import re
from tqdm.notebook import tqdm
import tensorflow as tf
from sklearn.model_selection import train_test_split

### **Fine-tuning Llama-2-7B**

In [3]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


### **Dataset**

In [4]:
!gdown --id 1OrtWVYzMEcCauJgP06kbaFuqRAL4UVEd

Downloading...
From: https://drive.google.com/uc?id=1OrtWVYzMEcCauJgP06kbaFuqRAL4UVEd
To: /content/reddit_conversation.csv
100% 7.96M/7.96M [00:00<00:00, 124MB/s]


In [5]:
df = pd.read_csv('reddit_conversation.csv')

In [6]:
df.head()

Unnamed: 0.1,Unnamed: 0,0,1,2
0,0,What kind of phone(s) do you guys have?,I have a pixel. It's pretty great. Much better...,Does it really charge all the way in 15 min?
1,1,I have a pixel. It's pretty great. Much better...,Does it really charge all the way in 15 min?,"Pretty fast. I've never timed it, but it's und..."
2,2,Does it really charge all the way in 15 min?,"Pretty fast. I've never timed it, but it's und...","cool. I've been thinking of getting one, my ph..."
3,3,What kind of phone(s) do you guys have?,Samsung Galaxy J1. It's my first cell phone an...,What do you think of it? Anything you don't like?
4,4,Samsung Galaxy J1. It's my first cell phone an...,What do you think of it? Anything you don't like?,I love it. I can't think of anything I don't l...


In [7]:
Questions = list()
Answers = list()
for i in tqdm(range(len(df))):
  Q = df['0'][i]
  A = df['1'][i]

  Q = Q.lower()
  A = A.lower()

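  # Raw strings avoid the invalid "\/" escape warning. Inside the class, ";-_" is a
  # range (';' through '_'), so a literal '-' is kept while '=', '[' and ']' are removed.
  Q = re.sub(r'[\/:;-_+@&!?$()<>.,@#%^&*"]', "", Q)
  A = re.sub(r'[\/:;-_+@&!?$()<>.,@#%^&*"]', "", A)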

  Questions.append(Q)
  Answers.append(A)

In [8]:
print(len(Questions))
print(len(Answers))

56297
56297
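The cleaning step above relies on a regular-expression character class in which `;-_` is read as a range rather than three literal characters. The sketch below reproduces that pattern in isolation and contrasts it with a variant that escapes the hyphen; `clean`, `clean_strict`, and the sample strings are illustrative names and inputs, not part of the notebook.

```python
import re

# Same character class as the cleaning cell; ";-_" spans ';' (0x3B) to '_' (0x5F).
PATTERN = re.compile(r'[\/:;-_+@&!?$()<>.,@#%^&*"]')

# Hypothetical stricter variant: the hyphen is escaped, so ';', '-' and '_' are literals.
STRICT = re.compile(r'[\/:;\-_+@&!?$()<>.,#%^*"]')


def clean(text: str) -> str:
    """Lowercase the text and drop every character matched by PATTERN."""
    return PATTERN.sub("", text.lower())


def clean_strict(text: str) -> str:
    """Lowercase the text and drop every character matched by STRICT."""
    return STRICT.sub("", text.lower())


print(repr(clean("Non-word? YES!")))         # hyphen survives the range-based class
print(repr(clean_strict("Non-word? YES!")))  # hyphen removed by the escaped variant
print(repr(clean("a_b [c] = d")))            # '_', '[', ']' and '=' fall inside the range
```

Which behaviour is preferable depends on the data; the range-based class is what produced the outputs shown in this notebook.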


In [9]:
Questions[:10]

['what kind of phones do you guys have',
 "i have a pixel it's pretty great much better than what i had before ",
 'does it really charge all the way in 15 min',
 'what kind of phones do you guys have',
 "samsung galaxy j1 it's my first cell phone and i've had it for 7 months",
 "what do you think of it anything you don't like",
 'what kind of phones do you guys have',
 "lg optimus v i know it's old",
 'my friend told me to kill myself ',
 "don't kill yourself op"]

In [10]:
Answers[:10]

["i have a pixel it's pretty great much better than what i had before ",
 'does it really charge all the way in 15 min',
 "pretty fast i've never timed it but it's under half an hour ",
 "samsung galaxy j1 it's my first cell phone and i've had it for 7 months",
 "what do you think of it anything you don't like",
 "i love it i can't think of anything i don't like about it",
 "lg optimus v i know it's old",
 "if it does it's job it's good enough",
 "don't kill yourself op",
 "i won't give them the satisfaction "]

In [16]:
data = []
for i in tqdm(range(len(Questions))):
  data.append('Human: ' + str(Questions[i]) + ' ### Assistant: ' + str(Answers[i]))

In [17]:
print(len(data))

56297


In [18]:
data[0]

"Human: what kind of phones do you guys have ### Assistant: i have a pixel it's pretty great much better than what i had before "

In [19]:
dataframe = pd.DataFrame({'text': data})

In [20]:
dataframe.shape

(56297, 1)

In [21]:
dataframe.head()

Unnamed: 0,text
0,Human: what kind of phones do you guys have ##...
1,Human: i have a pixel it's pretty great much b...
2,Human: does it really charge all the way in 15...
3,Human: what kind of phones do you guys have ##...
4,Human: samsung galaxy j1 it's my first cell ph...
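Each row of the `text` column follows a single template, `Human: <question> ### Assistant: <answer>`. A minimal sketch of that template as a pair of helpers is shown here; `build_record` and `split_record` are hypothetical names introduced only for illustration, and the split assumes the separator does not occur inside the question itself.

```python
SEPARATOR = " ### Assistant: "
PREFIX = "Human: "


def build_record(question: str, answer: str) -> str:
    """Join one question/answer pair into the training template."""
    return f"{PREFIX}{question}{SEPARATOR}{answer}"


def split_record(record: str) -> tuple[str, str]:
    """Recover (question, answer) from a record; assumes SEPARATOR appears once."""
    human, _, assistant = record.partition(SEPARATOR)
    return human.removeprefix(PREFIX), assistant


record = build_record("what kind of phones do you guys have", "lg optimus v i know it's old")
print(record)
print(split_record(record))
```

Keeping the template in one place makes it easier to reuse the exact same format later when prompting the fine-tuned model.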


In [22]:
train, test = train_test_split(dataframe, test_size=0.3, random_state=42)

In [23]:
print(train.shape)
print(test.shape)

(39407, 1)
(16890, 1)
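The two shapes above follow from how a fractional `test_size` is turned into a row count. As a sketch, assuming the test count is the ceiling of `test_size * n_samples` and the training set receives the remainder:

```python
import math

n_samples = 56297   # rows in the dataframe
test_size = 0.3     # fraction passed to train_test_split

# Assumed rounding rule: ceil for the test split, remainder for the train split.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Under that assumption the numbers match the reported shapes of 39407 training rows and 16890 test rows.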


**Convert to a Hugging Face Dataset**

In [24]:
from datasets import Dataset, DatasetDict

In [25]:
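# from_pandas is the dedicated constructor for DataFrames; preserve_index=False
# keeps the shuffled pandas index out of the dataset columns.
train_dataset = Dataset.from_pandas(train, preserve_index=False)
test_dataset = Dataset.from_pandas(test, preserve_index=False)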
raw_dataset = DatasetDict({'train': train_dataset, 'test': test_dataset})

In [26]:
raw_dataset

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 39407
    })
    test: Dataset({
        features: ['text'],
        num_rows: 16890
    })
})

In [27]:
raw_dataset['train']

Dataset({
    features: ['text'],
    num_rows: 39407
})

In [28]:
for sample in raw_dataset['train']:
  print(sample)
  break

{'text': 'Human: my sunday is over\n\ne ### Assistant: rip'}


In [29]:
raw_dataset['train'][1]

{'text': "Human: favorite non-word ### Assistant: y'all if that counts and noice reminds me of that key and peele sketch"}

### **Loading the model**

In [30]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TinyPixel/Llama-2-7B-bf16-sharded"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False
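Loading with `load_in_4bit=True` is mainly a memory decision. The rough estimate below compares weight storage at different precisions; it assumes a round figure of 7 billion parameters and ignores quantization constants, activations, and optimizer state, so treat the numbers as an order-of-magnitude guide only. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # assumption: round parameter count for a 7B model

for label, bits in [("float32", 32), ("float16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

At 4 bits per weight the model needs roughly a quarter of the float16 footprint, which is what makes a 7B model practical on a single consumer or Colab GPU.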

In [31]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

In [32]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)
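To get a feel for what `r=64` and `lora_alpha=16` imply, the sketch below counts the trainable LoRA parameters and computes the scaling factor. It assumes the adapter targets only the `q_proj` and `v_proj` projections (to my knowledge the PEFT default for Llama when `target_modules` is not set) and uses the Llama-2-7B dimensions of 4096 hidden units and 32 layers; `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


lora_r = 64
lora_alpha = 16
hidden_size = 4096         # Llama-2-7B hidden dimension
num_layers = 32            # Llama-2-7B decoder layers
targets_per_layer = 2      # assumption: q_proj and v_proj only

per_module = lora_param_count(hidden_size, hidden_size, lora_r)
total = per_module * targets_per_layer * num_layers
scaling = lora_alpha / lora_r  # factor applied to the low-rank update

print(f"per module: {per_module:,}")
print(f"total trainable: {total:,}")
print(f"scaling (alpha / r): {scaling}")
```

Under these assumptions only about 34 million parameters are trained, well under one percent of the base model, and each update is scaled by 0.25.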

### **Loading the trainer**

In [33]:
from huggingface_hub import login

# Paste a Hugging Face access token at the prompt; do not store it in the notebook.
login()

In [35]:
from transformers import TrainingArguments, Trainer

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 100
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 100
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    push_to_hub=True,
    hub_model_id="generation-chatbot-Llama2-7B"  # replaces the deprecated push_to_hub_model_id
)



**Finally, pass everything to the trainer**

In [37]:
from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=raw_dataset['train'],
    eval_dataset=raw_dataset['test'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)



/content/./results is already a clone of https://huggingface.co/danfarh2000/generation-chatbot-Llama2-7B. Make sure you pull the latest changes with `repo.git_pull()`.


We also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [38]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        # nn.Module.to() casts parameters in place, so no reassignment is needed
        module.to(torch.float32)

### **Train the model**

`trainer.train()` reports to Weights & Biases and prompts for a W&B API key; paste it at the prompt rather than storing it in the notebook.

In [39]:
trainer.train()

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
10,5.9923
20,6.0551
30,4.8998
40,4.463
50,4.4616
60,7.2714
70,3.7763
80,3.7304
90,4.1271
100,3.3423


TrainOutput(global_step=100, training_loss=4.811935997009277, metrics={'train_runtime': 622.3685, 'train_samples_per_second': 2.571, 'train_steps_per_second': 0.161, 'total_flos': 999226193510400.0, 'train_loss': 4.811935997009277, 'epoch': 0.04})
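The `epoch` and throughput figures in `TrainOutput` can be reproduced from the training arguments. This is a back-of-the-envelope check that assumes a single GPU, so the effective batch size is just the per-device batch size times the gradient accumulation steps.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 100
n_train = 39407            # rows in the training split
train_runtime = 622.3685   # seconds, from TrainOutput

# Assumption: one device, so no extra multiplier for data parallelism.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
samples_seen = effective_batch * max_steps

print("effective batch size:", effective_batch)
print("samples seen:", samples_seen)
print("epoch fraction:", round(samples_seen / n_train, 2))
print("samples per second:", round(samples_seen / train_runtime, 3))
print("steps per second:", round(max_steps / train_runtime, 3))
```

The run therefore covers only about 4% of the training split, which is worth keeping in mind when reading the noisy loss values and the test output.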

In [52]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training
model_to_save.save_pretrained("outputs")

In [53]:
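# get_peft_model builds an adapter from the config only; it does not load the saved
# adapter weights (PeftModel.from_pretrained(model, 'outputs') would).
lora_config = LoraConfig.from_pretrained('outputs')
model = get_peft_model(model, lora_config)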

In [56]:
model.save_pretrained("/content/generation-chatbot-Llama2-7B/")

In [55]:
# save to huggingface
model.push_to_hub("generation-chatbot-Llama2-7B")

CommitInfo(commit_url='https://huggingface.co/danfarh2000/generation-chatbot-Llama2-7B/commit/6dc8bae12602aede5222455a46a77119b41f520e', commit_message='Upload model', commit_description='', oid='6dc8bae12602aede5222455a46a77119b41f520e', pr_url=None, pr_revision=None, pr_num=None)

### **Test the Model**

In [62]:
def run_model(question):
  device = "cuda:0"
  inputs = tokenizer(question, return_tensors="pt").to(device)
  response = model.generate(**inputs, max_new_tokens=10)
  result = tokenizer.decode(response[0], skip_special_tokens=True)
  return result
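`run_model` sends the raw question to the model, while the training records use the `Human: ... ### Assistant: ...` template. A possible refinement, sketched here with hypothetical helpers `build_prompt` and `extract_reply`, is to wrap the question in the same template and then strip the echoed prompt from the decoded text, since `generate` returns the prompt followed by the continuation for a causal model.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the template used for the training records."""
    return f"Human: {question} ### Assistant:"


def extract_reply(generated: str, prompt: str) -> str:
    """Drop the echoed prompt and cut the text at the next 'Human:' turn, if any."""
    # Fallback: decoding may not reproduce the prompt exactly, so only strip on a match.
    reply = generated[len(prompt):] if generated.startswith(prompt) else generated
    return reply.split("Human:")[0].strip()


prompt = build_prompt("hello how are you today")
fake_generation = prompt + " i am fine thanks Human: good to hear"  # stand-in for model output
print(extract_reply(fake_generation, prompt))
```

With the notebook's function this could be used as `extract_reply(run_model(prompt), prompt)`; whether it improves the replies after only 100 training steps is untested here.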

In [63]:
question = "hello, how are you today?"
run_model(question)

"hello, how are you today?\n nobody is perfect, but i'm not"