## Check GPU Availability

In [1]:
!nvidia-smi

Fri Nov 24 05:54:02 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
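
The same check can be done programmatically, so later cells can branch on the result. The sketch below is an illustration rather than part of the original notebook: the helper name `query_gpus` is hypothetical, and it relies only on the `--query-gpu` and `--format` options of `nvidia-smi`.

```python
import shutil
import subprocess

def query_gpus():
    """Return one 'name, total memory' string per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = query_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; switch the runtime to a GPU instance.")
```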

## Install required libraries

In [2]:
!pip install trl transformers accelerate git+https://github.com/huggingface/peft.git -Uqqq
!pip install datasets bitsandbytes einops wandb -Uqqq


## Importing libraries

In [3]:
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          BitsAndBytesConfig,
                          TrainingArguments,
                          GenerationConfig,
                          LlamaTokenizer,
                          HfArgumentParser,
                          pipeline,
                          logging)
from peft import (
    LoraConfig,
    PeftModel,
    get_peft_model,
    PeftConfig,
    prepare_model_for_kbit_training
)
from trl import SFTTrainer
import warnings
warnings.filterwarnings("ignore")



In [4]:
from huggingface_hub import notebook_login
notebook_login()
# Paste your Hugging Face access token at the prompt; do not store it in the notebook.


In [None]:
# authenticate WandB for logging metrics
import wandb
wandb.login()
# Paste your W&B API key at the prompt; do not store it in the notebook.

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True
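
Both logins can also run without an interactive prompt if the credentials are supplied through environment variables. The sketch below assumes the variables `HF_TOKEN` and `WANDB_API_KEY` have been set outside the notebook. The helper `read_secret` is a hypothetical name introduced for illustration.

```python
import os

def read_secret(var_name):
    """Return the stripped value of an environment variable, or None if unset or empty."""
    value = os.environ.get(var_name, "").strip()
    return value or None

hf_token = read_secret("HF_TOKEN")
wandb_key = read_secret("WANDB_API_KEY")

# With the values in hand, the logins could be made non-interactive, for example:
#   from huggingface_hub import login; login(token=hf_token)
#   import wandb; wandb.login(key=wandb_key)
print("HF token set:", hf_token is not None)
print("W&B key set:", wandb_key is not None)
```

Keeping the secrets in the environment means they never appear in the saved notebook or its outputs.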

## Load custom mental health conversation dataset

In [None]:
dataset = load_dataset("Amod/mental_health_counseling_conversations", split = "train")


In [None]:
print(dataset['Context'][0])
print(dataset['Response'][0])

I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.
   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.
   How can I change my feeling of being worthless to everyone?
If everyone thinks you're worthless, then maybe you need to find new people to hang out with.Seriously, the social context in which a person lives is a big influence in self-esteem.Otherwise, you can go round and round trying to understand why you're not worthless, then go back to the same crowd and be knocked down again.There are many inspirational messages you can find in social media.  Maybe read some of the ones which state that no person is worthless, and that everyone has a good purpose to their life.Also, since our culture is so saturated with the belief that if someone doesn't feel good about themselves that this is somehow terrible.Bad feelings are part 

## Load the Quantized Base Model

In [None]:
model_name = "HuggingFaceH4/zephyr-7b-beta" # Zephyr 7B beta (Mistral architecture)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,            # load model weights in 4-bit precision
    bnb_4bit_quant_type="nf4",    # quantize the pre-trained weights to 4-bit NormalFloat (NF4)
    bnb_4bit_use_double_quant=True, # double quantization, as described in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16, # weights are dequantized to BF16 for computation; the T4 shown above lacks native BF16 support, so torch.float16 may be the safer choice there
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config, # use the bitsandbytes config
    device_map="auto",  # let HF Accelerate decide which device each layer is placed on
    trust_remote_code=True, # only needed for models that ship custom code; Zephyr uses the built-in Mistral classes
)

ValueError: ignored

In [None]:
model


MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRM
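
The printed shapes are enough for a rough estimate of how much memory the 4-bit weights occupy. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes 0.5 bytes per weight for the `Linear4bit` layers listed above and ignores quantization constants, embeddings, norms, and the output head.

```python
# Shapes copied from the printed module tree: (in_features, out_features)
LINEAR_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # "(0-31): 32 x MistralDecoderLayer"

params_per_layer = sum(i * o for i, o in LINEAR_SHAPES.values())
linear4bit_params = NUM_LAYERS * params_per_layer

# Assumption: 4 bits (half a byte) per weight, no overhead counted
approx_bytes = linear4bit_params // 2
approx_gib = approx_bytes / 2**30

print(f"Linear4bit parameters: {linear4bit_params:,}")
print(f"Approximate 4-bit weight memory: {approx_gib:.2f} GiB")
```

At roughly 3.25 GiB for the decoder weights, the model leaves headroom on the 15360 MiB reported by `nvidia-smi` for activations, LoRA parameters, and optimizer state.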

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) # Set trust_remote_code=True
tokenizer.pad_token = tokenizer.eos_token # Setting pad_token same as eos_token

NameError: ignored

## Dataset Preparation

In [None]:
import textwrap
import os

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Run text generation pipeline with our base model
# pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=1024)

In [None]:
SYSTEM_PROMPT = """\
    You are a therapist. Below is a description of a person's difficulties.
    Provide a helpful response to the person who is describing their difficulties to you."""

# Remove any common leading whitespace from every line in text.
SYSTEM_PROMPT = textwrap.dedent(SYSTEM_PROMPT).strip()

def format_prompt(message, system_prompt):
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

    prompt = f"{B_INST} {B_SYS}{system_prompt}{E_SYS} "

    message = str(message).strip()
    prompt += f"{message} {E_INST}"
    prompt = '<s> ' + prompt
    return prompt

In [None]:
prompt = format_prompt(message = dataset['Context'][0], system_prompt = SYSTEM_PROMPT)
prompt

"<s> [INST] <<SYS>>\nYou are a therapist. Below is a description of a person's difficulties.\nProvide a helpful response to the person who is describing their difficulties to you.\n<</SYS>>\n\n I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n   How can I change my feeling of being worthless to everyone? [/INST]"

In [None]:
def format_prompt(user_msg, asst_msg, system_prompt):
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

    prompt = f"{B_INST} {B_SYS}{system_prompt}{E_SYS} "
    user_msg = str(user_msg).strip().replace("\xa0", "")
    asst_msg = str(asst_msg).strip().replace("\xa0", "")
    prompt = '<s> ' + prompt + f"{user_msg} {E_INST} {asst_msg} </s> "

    return prompt

In [None]:
user_msg = dataset['Context'][0]
asst_msg = dataset['Response'][0]
prompt = format_prompt(user_msg, asst_msg, SYSTEM_PROMPT)
prompt

"<s> [INST] <<SYS>>\nYou are a therapist. Below is a description of a person's difficulties.\nProvide a helpful response to the person who is describing their difficulties to you.\n<</SYS>>\n\n I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n   How can I change my feeling of being worthless to everyone? [/INST] If everyone thinks you're worthless, then maybe you need to find new people to hang out with.Seriously, the social context in which a person lives is a big influence in self-esteem.Otherwise, you can go round and round trying to understand why you're not worthless, then go back to the same crowd and be knocked down again.There are many inspirational messages you can find in social media. Maybe read some of the ones which state that no person is worthless, and tha

In [None]:
def transform_to_single_column(example):

  user_msg = example['Context']
  asst_msg = example['Response']
  formatted_text = format_prompt(user_msg, asst_msg, SYSTEM_PROMPT)

  return {'text': formatted_text}

In [None]:
# Apply the transformation
transformed_dataset = dataset.map(transform_to_single_column)


Map:   0%|          | 0/3512 [00:00<?, ? examples/s]

In [None]:
print(transformed_dataset['text'][0])

<s> [INST] <<SYS>>
You are a therapist. Below is a description of a person's difficulties.
Provide a helpful response to the person who is describing their difficulties to you.
<</SYS>>

 I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.
   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.
   How can I change my feeling of being worthless to everyone? [/INST] If everyone thinks you're worthless, then maybe you need to find new people to hang out with.Seriously, the social context in which a person lives is a big influence in self-esteem.Otherwise, you can go round and round trying to understand why you're not worthless, then go back to the same crowd and be knocked down again.There are many inspirational messages you can find in social media. Maybe read some of the ones which state that no person is worthless, and that everyo
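
Before training, it can help to confirm that every formatted example has the expected structure. The checker below is a minimal sketch. `validate_training_text` is a hypothetical helper, and the sample string is a shortened, made-up example in the same layout as the output above.

```python
def validate_training_text(text):
    """Return a list of structural problems found in one formatted training example."""
    problems = []
    if not text.startswith("<s> [INST] <<SYS>>\n"):
        problems.append("missing '<s> [INST] <<SYS>>' header")
    if text.count("<</SYS>>") != 1:
        problems.append("expected exactly one '<</SYS>>' marker")
    if text.count("[/INST]") != 1:
        problems.append("expected exactly one '[/INST]' marker")
    if not text.rstrip().endswith("</s>"):
        problems.append("missing closing '</s>' token")
    return problems

# Made-up sample in the same layout as the formatted dataset rows
sample = "<s> [INST] <<SYS>>\nYou are a therapist.\n<</SYS>>\n\n I feel stuck. [/INST] That sounds hard. </s> "
print(validate_training_text(sample))
```

In the notebook, the same function could be applied to each entry of `transformed_dataset['text']` to flag malformed rows.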

## Model Training

In [None]:
# model = prepare_model_for_kbit_training(model)

lora_alpha = 32 # scaling factor for the weight matrices
lora_dropout = 0.05 # dropout probability of the LoRA layers
lora_rank = 32 # dimension of the low-rank matrices

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_rank,
    bias="none",  # setting to 'none' for only training weight params instead of biases
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"]
)


peft_model = get_peft_model(model, peft_config)

AttributeError: ignored
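
To get a feel for how small the adapter is, the number of trainable LoRA parameters can be worked out by hand. The sketch below assumes the `q_proj` and `v_proj` shapes from the model printout earlier in the notebook and the standard LoRA layout of two matrices per target module, `A` (r x in_features) and `B` (out_features x r).

```python
lora_rank = 32
lora_alpha = 32
num_layers = 32

# (in_features, out_features) for the targeted projections, taken from the model printout
target_shapes = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}

def lora_params(in_features, out_features, r):
    """Parameters in one LoRA adapter: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r

per_layer = sum(lora_params(i, o, lora_rank) for i, o in target_shapes.values())
trainable_params = num_layers * per_layer
scaling = lora_alpha / lora_rank  # factor applied to the low-rank update

print(f"Trainable LoRA parameters: {trainable_params:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

About 13.6 million trainable parameters is a small fraction of the roughly 7 billion frozen weights, which is what makes fine-tuning feasible on a single T4.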

In [None]:
output_dir = "./Zephy7b-Beta-sharded-bf16-finetuned-mental-health-conversational-Amod"
per_device_train_batch_size = 4 # reduce batch size by 2x if out-of-memory error
gradient_accumulation_steps = 4  # increase gradient accumulation steps by 2x if batch size is reduced
optim = "paged_adamw_32bit" # activates the paging for better memory management
save_strategy = "steps" # checkpoint save strategy to adopt during training
save_steps = 10 # number of update steps between two checkpoint saves
logging_steps = 10  # number of update steps between two logs if logging_strategy="steps"
learning_rate = 2e-4  # learning rate for AdamW optimizer
max_grad_norm = 0.3 # maximum gradient norm (for gradient clipping)
max_steps = 320        # training will happen for 320 steps
warmup_ratio = 0.03 # fraction of total training steps used for a linear warmup from 0 to learning_rate
lr_scheduler_type = "cosine"  # learning rate scheduler

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_strategy=save_strategy,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    push_to_hub=True,
)
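
The batch settings above determine how much data each optimizer step consumes and how many passes over the dataset 320 steps amount to. The arithmetic below is a sketch that assumes a single GPU and the 3512 training examples reported by the `Map` progress bar.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 320
num_examples = 3512  # size of the training split shown by the Map progress bar
num_devices = 1      # assumption: one T4, as in the nvidia-smi output

# Examples consumed per optimizer update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

def approx_epoch(step):
    """Approximate fraction of the dataset seen after a given number of optimizer steps."""
    return step * effective_batch_size / num_examples

print(f"Effective batch size: {effective_batch_size}")
print(f"Approximate epochs after {max_steps} steps: {approx_epoch(max_steps):.2f}")
```

Halving the batch size while doubling the accumulation steps keeps the effective batch size at 16, which is why the two comments above recommend changing them together.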

In [None]:
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=transformed_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=128,
    tokenizer=tokenizer,
    args=training_arguments,
)

Map:   0%|          | 0/3512 [00:00<?, ? examples/s]

In [None]:
# upcast the layer norms to torch.float32 for more stable training
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)  # nn.Module.to() converts the module in place

In [None]:
peft_model.config.use_cache = False
trainer.train()

{'loss': 2.9172, 'learning_rate': 0.0002, 'epoch': 0.05}
{'loss': 2.55, 'learning_rate': 0.00019948693233918952, 'epoch': 0.09}
{'loss': 2.525, 'learning_rate': 0.00019795299412524945, 'epoch': 0.14}
{'loss': 2.4766, 'learning_rate': 0.00019541392564000488, 'epoch': 0.18}
{'loss': 2.3797, 'learning_rate': 0.00019189578116202307, 'epoch': 0.23}
{'loss': 2.4078, 'learning_rate': 0.00018743466161445823, 'epoch': 0.27}
{'loss': 2.3813, 'learning_rate': 0.00018207634412072764, 'epoch': 0.32}
{'loss': 2.3734, 'learning_rate': 0.0001758758122692791, 'epoch': 0.36}
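
The `learning_rate` values in the log follow from the warmup and cosine settings chosen earlier. The sketch below reimplements a linear-warmup, cosine-decay schedule in plain Python. It assumes the warmup length is `ceil(max_steps * warmup_ratio)` and a single half-cosine down to zero. The function name `lr_at` is hypothetical.

```python
import math

def lr_at(step, base_lr=2e-4, max_steps=320, warmup_ratio=0.03):
    """Learning rate after `step` optimizer updates: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(max_steps * warmup_ratio)  # 10 steps for these settings
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (10, 20, 30, 80):
    print(step, lr_at(step))
```

With these assumptions, the value at step 10 is the peak rate of 2e-4, and the later values agree with the logged rates to within floating-point rounding.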


In [None]:
trainer.push_to_hub()

## Inference Pipeline

In [11]:
# Loading original model
model_name = "TinyPixel/Llama-2-7B-bf16-sharded"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

ValueError: ignored

In [9]:
# Loading PEFT model
PEFT_MODEL = "caffeinatedwoof/Llama-2-7B-bf16-sharded-mental-health-conversational"

config = PeftConfig.from_pretrained(PEFT_MODEL)
peft_base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

peft_model = PeftModel.from_pretrained(peft_base_model, PEFT_MODEL)

peft_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
peft_tokenizer.pad_token = peft_tokenizer.eos_token


In [12]:
# Function to generate responses from both original model and PEFT model and compare their answers.
def generate_answer(query):
  system_prompt = """Answer the following question truthfully.
  If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
  If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'."""

  user_prompt = f"""<HUMAN>: {query}
  <ASSISTANT>: """

  final_prompt = system_prompt + "\n" + user_prompt

  device = "cuda:0"
  dashline = "-" * 49

  # Note: temperature and top_p only take effect when do_sample=True is also set;
  # with the default do_sample=False, generate() decodes greedily with the repetition penalty applied.
  encoding = tokenizer(final_prompt, return_tensors="pt").to(device)
  generation_config = GenerationConfig(max_new_tokens=256, pad_token_id=tokenizer.eos_token_id,
                                       eos_token_id=tokenizer.eos_token_id,
                                       temperature=0.4, top_p=0.6, repetition_penalty=1.3, num_return_sequences=1)
  # attention_mask is an input to generate(), not a GenerationConfig field
  outputs = model.generate(input_ids=encoding.input_ids, attention_mask=encoding.attention_mask,
                           generation_config=generation_config)
  text_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

  print(dashline)
  print(f'ORIGINAL MODEL RESPONSE:\n{text_output}')
  print(dashline)

  peft_encoding = peft_tokenizer(final_prompt, return_tensors="pt").to(device)
  peft_generation_config = GenerationConfig(max_new_tokens=256, pad_token_id=peft_tokenizer.eos_token_id,
                                            eos_token_id=peft_tokenizer.eos_token_id,
                                            temperature=0.4, top_p=0.6, repetition_penalty=1.3, num_return_sequences=1)
  peft_outputs = peft_model.generate(input_ids=peft_encoding.input_ids, attention_mask=peft_encoding.attention_mask,
                                     generation_config=peft_generation_config)
  peft_text_output = peft_tokenizer.decode(peft_outputs[0], skip_special_tokens=True)

  print(f'PEFT MODEL RESPONSE:\n{peft_text_output}')
  print(dashline)
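
Because the decoded output includes the prompt, each response starts by echoing the system and user text. One way to show only the model's answer is to cut everything up to the `<ASSISTANT>:` marker. The helper below is a minimal string-based sketch, and `extract_response` is a hypothetical name.

```python
def extract_response(decoded_text, marker="<ASSISTANT>:"):
    """Return the text after the first occurrence of marker, or the whole text if it is absent."""
    _, found, tail = decoded_text.partition(marker)
    return tail.strip() if found else decoded_text.strip()

# Made-up decoded output in the same layout as the prompt built in generate_answer
decoded = "Answer the following question truthfully.\n<HUMAN>: How are you?\n  <ASSISTANT>: I am fine."
print(extract_response(decoded))
```

An alternative that avoids string matching is to decode only the newly generated tokens by slicing the output tensor at the prompt length before calling `decode`.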

## Compare responses between Original model and PEFT model

In [13]:
query = "How can I prevent anxiety and depression?"
generate_answer(query)

-------------------------------------------------
ORIGINAL MODEL RESPONSE:
Answer the following question truthfully.
  If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
  If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'.
<HUMAN>: How can I prevent anxiety and depression?
  <ASSISTANT>: 1) Eat healthy foods like fruits & vegetables; avoid junk/fast-food items as much possible.  2) Exercise regularly (30 minutes of brisk walking every day).   3) Get adequate sleep at night by going to bed early enough so that your body gets sufficient rest before waking up in morning hours.    4) Avoid smoking or drinking alcoholic beverages because they cause stress on our bodies which leads us towards mental illnesses such as Anxiety Disorders etc., but if someone does smoke then he should quit immediately after reading these lines otherwise it will only make things worse than ever imagined!
5) Try meditation technique

In [14]:
query = "How to take care of mental health?"
generate_answer(query)

-------------------------------------------------
ORIGINAL MODEL RESPONSE:
Answer the following question truthfully.
  If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
  If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'.
<HUMAN>: How to take care of mental health?
  <ASSISTANT>: 1) Eat well and exercise regularly; 2) Get enough sleep every night (7-8 hours); 3) Avoid stressful situations as much as possible; 4) Spend time with friends or family members who make us feel good about ourselves; 5) Don’t drink alcohol excessively because it can lead to depression in some people!
-------------------------------------------------
PEFT MODEL RESPONSE:
Answer the following question truthfully.
  If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
  If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'.
<HUMAN>: How to take care of men

In [None]:
query = "What is the warning sign of depression?"
generate_answer(query)

-------------------------------------------------
ORIGINAL MODEL RESPONSE:
Answer the following question truthfully. 
  If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
  If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'.
  <ASSISTANT>: <HUMAN> is feeling sad, hopeless, or empty. <HUMAN> is sleeping more or less than usual, has little interest or pleasure in doing things, and has lost weight or gained weight without trying. <HUMAN> is having trouble concentrating, remembering things, making decisions, or thinking about death or suicide. <HUMAN> is feeling tired, agitated, or restless. <HUMAN> is drinking too much or using drugs. <HUMAN> is having pain, headaches, cramps, or digestive problems. <HUMAN> is having trouble sleeping or sleeping too much. <HUMAN> is feeling guilty or worthless. <HUMAN> is having thoughts of death or suicide. <HUMAN> is having trouble eating or sleeping. <HUMAN> is talking ab