<a href="https://colab.research.google.com/github/NischalSuresh/mentalHealth_chatbot/blob/main/chatbot_finetune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install necessary libraries

In [1]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb

  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m118.0/118.0 kB[0m [31m3.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.7/7.7 MB[0m [31m27.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m258.1/258.1 kB[0m [31m22.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m519.6/519.6 kB[0m [31m30.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m295.0/295.0 kB[0m [31m29.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.8/3.8 MB[0m [31m49.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m59.6 MB/s[0m eta [36m

## Import necesssary packages

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from datasets import load_dataset

## Load the dataset Counsel_chat from huggingface (https://github.com/nbertagnolli/counsel-chat)

In [3]:
dataset = load_dataset("nbertagnolli/counsel-chat")
dataset = dataset['train'].train_test_split(test_size=0.1)

Downloading readme:   0%|          | 0.00/4.92k [00:00<?, ?B/s]

Repo card metadata block was not found. Setting CardData to empty.


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/4.13M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

In [4]:
dataset

DatasetDict({
    train: Dataset({
        features: ['questionID', 'questionTitle', 'questionText', 'questionLink', 'topic', 'therapistInfo', 'therapistURL', 'answerText', 'upvotes', 'views'],
        num_rows: 2497
    })
    test: Dataset({
        features: ['questionID', 'questionTitle', 'questionText', 'questionLink', 'topic', 'therapistInfo', 'therapistURL', 'answerText', 'upvotes', 'views'],
        num_rows: 278
    })
})

In [5]:
len(dataset['train'])

2497

## Prompt template used in Llama-2 original model :
[INST] \<\<SYS\>\>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
\<\</SYS\>\>
{prompt}[/INST] <br>
 Source:
 - https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGML
 - https://github.com/huggingface/blog/blob/main/llama2.md#fine-tuning-with-peft

 Refer chat_completion fn from generation.py for finetuning template - https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L45

## Format the text prompt to required format

In [8]:
def formatting_prompts_func(counsel_data):
  output_texts = []
  for i in range(len(counsel_data)):
    if(counsel_data['answerText'][i]) is None:
      pass
    else:
      question = (counsel_data['questionText'][i]) if (counsel_data['questionText'][i]) is not None else (counsel_data['questionTitle'][i])
      text = f"<s>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.\n### Question: {question.strip()}\n### Answer: {(counsel_data['answerText'][i]).strip()}</s>"
      output_texts.append(text)
  return output_texts

## Check data for error (missing entries)

In [None]:
print(len(dataset['train']))
for i in range(len(dataset['train'])):
  try:
    question = (dataset['train']['questionText'][i]) if (dataset['train']['questionText'][i]) is not None else (dataset['train']['questionTitle'][i])
    answer = (dataset['train']['answerText'][i])
  except:
    print("error",i)


In [8]:
out = formatting_prompts_func(dataset['train'])

## check data output

In [9]:
print(out[1])

</s>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
### Question: I have been thinking a lot about certain situations and having my worries about what others may think or say.
### Answer: I think it's a great idea that you are asking this question now while it's early on. The first thing I wonder is what age you are.  At different transitions and stages in life, it's really common for anxieties to come out of the woodwork.  First time this takes place at a greater level is during adolescence, because a teen's "job" is to figure out who they are within the context of other people.  Spikes in anxiety also can also occur after high school when you are leaving the nest and heading to college, or the late twenties when it feels like you are expected to have your life figured out and compare to where others are in their lives.  That same scenario can show up in mid-life as well for different reasons depending on the person.  Aside

## checking Llama format from llama-recipes

In [None]:
!pip install llama-recipes

In [None]:
from llama_recipes.utils.dataset_utils import get_preprocessed_dataset
from llama_recipes.configs.datasets import samsum_dataset

temp_dataset = get_preprocessed_dataset(tokenizer, samsum_dataset, 'train')

In [None]:
temp_dataset

In [None]:
tokenizer.decode(temp_dataset[1]['input_ids'])

## Finetune

In [10]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [11]:
model_name = "meta-llama/Llama-2-7b-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = 'nf4',
    bnb_4bit_compute_dtype = torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    trust_remote_code = True
)
model.config.use_cache = False

Downloading (…)lve/main/config.json:   0%|          | 0.00/609 [00:00<?, ?B/s]

Downloading (…)fetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

In [12]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Downloading (…)okenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [13]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)

In [15]:
from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 200
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)

In [16]:
from trl import SFTTrainer
max_seq_length = 512
trainer = SFTTrainer(
    model = model,
    train_dataset = dataset['train'],
    peft_config = peft_config,
    formatting_func = formatting_prompts_func,
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = training_arguments
)



Map:   0%|          | 0/2497 [00:00<?, ? examples/s]

In [17]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module = module.to(torch.float32)

In [18]:
trainer.train()

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
10,2.3766
20,2.0092
30,1.7324
40,1.5139
50,1.2377
60,0.9297
70,0.6235
80,0.3923
90,0.2367
100,0.1116


TrainOutput(global_step=200, training_loss=0.5738980039954186, metrics={'train_runtime': 3481.2428, 'train_samples_per_second': 0.919, 'train_steps_per_second': 0.057, 'total_flos': 3.683028775624704e+16, 'train_loss': 0.5738980039954186, 'epoch': 100.0})

In [27]:
sample_question = "My husband left me after 10 years of marriage. What should I do?"
text = f"<s>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.\n### Question: {sample_question}\n### Answer: "
print(text)
device = "cuda:0"

inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
raw_output = (tokenizer.decode(outputs[0], skip_special_tokens=True))
print(raw_output.strip().split('###')[1:3])

<s>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
### Question: My husband left me after 10 years of marriage. What should I do?
### Answer: 




[' Question: My husband left me after 10 years of marriage. What should I do?\n', " Answer: 3 Things to Do First:1. Take care of yourself (and be patient)2. Don't make any decisions yet (except to stick around and keep breathing!)3. Find a way to explore this space with someone you can"]


In [26]:
print(final_output)

[' Question: My husband left me after 10 years of marriage. What should I do?\n', ' Answer: 3 Things to Do When Your Husband Leaves YouAfter your husband has been a witness to you being a person who is trying to be a healthy helpful person. 3. Find another person who will help you describe the type of ways you would like to be supported. There are lots of ways to get support when there has been a long time in a way of life that has been challenging your health. If you go alone it may just be a scary silent journey. Not only']


## Save the adapter to HF Hub

In [24]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [28]:
trainer.push_to_hub("llama2_finetuned_counselChat")

'https://huggingface.co/nischal95/results/tree/main/'