## Install and Update Dependencies

In [1]:
!pip install bitsandbytes
!pip install accelerate
!pip install --upgrade transformers
!pip install --upgrade peft
!pip install --upgrade datasets

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl.metadata (2.9 kB)
Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl (69.1 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.45.0
Collecting accelerate
  Downloading accelerate-1.2.0-py3-none-any.whl.metadata (19 kB)
Collecting huggingface-hub>=0.21.0 (from accelerate)
  Using cached huggingface_hub-0.26.3-py3-none-any.whl.metadata (13 kB)
Collecting safetensors>=0.4.3 (from accelerate)
  Using cached safetensors-0.4.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Downloading accelerate-1.2.0-py3-none-any.whl (336 kB)
Using cache

## Import the Dependencies

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

## Loading the Model and Tokenizer

The following cells set up a lightweight language model (TinyLlama) for natural language processing tasks. The focus is on memory efficiency: the model is loaded with 8-bit quantization, and device_map="auto" distributes it across the available devices when more than one GPU is used. Together, these settings let limited hardware generate text and fine-tune the model.


In [3]:
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T", padding_side="right",)
tokenizer.pad_token = tokenizer.eos_token
bnb_config = BitsAndBytesConfig(
  load_in_8bit=True,
)
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T", device_map="auto", quantization_config=bnb_config)

tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/560 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/4.40G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/129 [00:00<?, ?B/s]
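
To make the memory argument concrete, the sketch below gives a back-of-the-envelope estimate of the weight storage at different precisions. It assumes a base model of 1,100,048,384 parameters (the "all params" total reported later in this notebook minus the LoRA parameters) and that every weight is stored at the stated width. In practice, 8-bit loading keeps some layers in higher precision, so the real footprint is likely somewhat larger than this estimate.

```python
# Rough weight-memory estimate. The parameter count is an assumption derived
# from the totals reported later in this notebook (all params minus LoRA params).
BASE_PARAMS = 1_101_174_784 - 1_126_400

def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Return the storage needed for the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8)]:
    print(f"{label}: {weight_memory_gb(BASE_PARAMS, bits):.2f} GB")
```

The fp32 estimate of about 4.40 GB is consistent with the size of model.safetensors in the download output above, and the 8-bit estimate is roughly a quarter of that.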

In [4]:
txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT:"""
tokens = tokenizer(txt, return_tensors="pt")['input_ids'].to("cuda")
op = model.generate(tokens, max_new_tokens=200)
print(tokenizer.decode(op[0]))

<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental disorder and a mental illness?

###PROMPT: What is the difference between a mental disorder and a mental illness?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental disorder and a mental illness?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental disorder and a mental illness?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental disorder and a mental illness?



## Prepare the PEFT Model

The following cells prepare the pre-trained language model for parameter-efficient fine-tuning (PEFT) with LoRA (Low-Rank Adaptation). LoRA trains only a small set of additional parameters while the original weights stay frozen, which keeps fine-tuning efficient without sacrificing much accuracy.

In [5]:
from peft import get_peft_model, LoraConfig, TaskType, prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

# r is the LoRA rank: it sets the size of the adapter matrices and therefore the number of trainable parameters
peft_config = LoraConfig(inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1, task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, peft_config)

# print_trainable_parameters() prints the summary itself and returns None
model.print_trainable_parameters()

trainable params: 1,126,400 || all params: 1,101,174,784 || trainable%: 0.1023
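
The reported number of trainable parameters can be reproduced with a little arithmetic. The sketch below is illustrative only: the architecture values (hidden size 2048, 22 layers, key/value projection width 256) and the adapted layers (q_proj and v_proj, which we assume are the PEFT defaults for Llama-style models) are assumptions that are not stated in this notebook.

```python
# Assumed TinyLlama-1.1B architecture values (not stated in this notebook).
HIDDEN_SIZE = 2048   # model width
NUM_LAYERS = 22      # transformer blocks
KV_DIM = 256         # 4 key/value heads x 64 dims (grouped-query attention)
RANK = 8             # r from the LoraConfig above

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters of one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank

# Assumed adapted layers per transformer block: q_proj and v_proj.
per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)  # q_proj
    + lora_params(HIDDEN_SIZE, KV_DIM, RANK)     # v_proj
)
trainable = per_layer * NUM_LAYERS
total = 1_101_174_784  # "all params" as reported in the output above

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Under these assumptions the sketch arrives at 1,126,400 trainable parameters, about 0.1023% of the total, matching the output above. Doubling r would double this count, which is why the rank is the main lever for adapter size.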


In [6]:
def format_dataset(data_point):
    prompt = f"""###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: {data_point['Questions']}

###PROMPT: {data_point['Answers']}
"""
    tokens = tokenizer(prompt,
        truncation=True,
        max_length=512,
        padding="max_length",)
    tokens["labels"] = tokens['input_ids'].copy()
    return tokens
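
Because every sample is padded to 512 tokens, the copied labels also cover the padding positions. To the best of our understanding, the DataCollatorForLanguageModeling used later (with mlm=False) rebuilds the labels from input_ids and replaces padding positions with -100 so that they do not contribute to the loss. The sketch below illustrates that idea in plain Python; the token IDs and the pad ID of 2 are hypothetical example values.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def mask_padding(input_ids: list[int], pad_token_id: int) -> list[int]:
    """Copy input_ids into labels, replacing padding positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

# Hypothetical example: pad id 2, sequence padded on the right to length 8.
example_ids = [1, 835, 14816, 1254, 2, 2, 2, 2]
print(mask_padding(example_ids, pad_token_id=2))
```

Since the pad token is set to the EOS token in this notebook, any genuine end-of-sequence tokens in a sample would be masked in the same way.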

## Prepare and Load the Dataset

This section loads the dataset (mentalhealth.csv), which is stored in the same folder as the notebook. The code below also tokenizes the dataset and removes the columns that are not needed for training.

In [None]:
# make sure to cd to the correct directory
%cd mental-health-chatbot/

/home/jovyan/mental-health-chatbot


In [35]:
import pandas as pd

df = pd.read_csv('mentalhealth.csv')

In [37]:
from sklearn.model_selection import train_test_split

train_dataset, test_dataset = train_test_split(df, test_size=0.2)

In [38]:
from datasets import Dataset

# Convert pandas DataFrame to Hugging Face Dataset
train_d = Dataset.from_pandas(train_dataset)
test_d = Dataset.from_pandas(test_dataset)

train_d = train_d.map(format_dataset)
test_d = test_d.map(format_dataset)
print(train_d, test_d)

Map:   0%|          | 0/77 [00:00<?, ? examples/s]

Map:   0%|          | 0/20 [00:00<?, ? examples/s]

Dataset({
    features: ['Question_ID', 'Questions', 'Answers', '__index_level_0__', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 77
}) Dataset({
    features: ['Question_ID', 'Questions', 'Answers', '__index_level_0__', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 20
})
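
The row counts above follow directly from test_size=0.2. A minimal sketch of the arithmetic, assuming the test share is rounded up (which matches the 77/20 split shown above for 97 rows):

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(97, 0.2)  # 97 = 77 + 20 rows shown above
print(n_train, n_test)
```

Note that train_test_split shuffles the rows randomly, so the split changes between runs unless a fixed random_state is passed.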


In [40]:
print(tokenizer.decode(train_d[0]['input_ids']))

<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What’s the difference between dissociative identity disorder (multiple personality disorder) and schizophrenia?

###PROMPT: Sometimes, people confuse dissociative identity disorder, formerly known as multiple personality disorder, and schizophrenia. Schizophrenia does mean “split mind,” but the name was meant to describe the ‘split’ from reality that you experience during an episode of psychosis, as well as changes in thoughts, emotions, and other functions. Dissociative identity disorder, on the other hand, does cause a split or fragmented understanding of a person’s sense of themselves. 
 Dissociative identity disorder is really more about fragmented identities than many different personalities that develop on their own. Most people see different parts of their being as part of the whole person. For people who experience DID, identity fragments may have very different characteristics, including th

In [41]:
train_d = train_d.remove_columns(['Questions', "Answers"])
test_d = test_d.remove_columns(['Questions', "Answers"])
print(train_d, test_d)

Dataset({
    features: ['Question_ID', '__index_level_0__', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 77
}) Dataset({
    features: ['Question_ID', '__index_level_0__', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 20
})


In [42]:
train_d[0]

{'Question_ID': 7984793,
 '__index_level_0__': 85,
 'input_ids': [1,
  835,
  14816,
  1254,
  12665,
  29901,
  16564,
  373,
  2672,
  12336,
  3611,
  5706,
  278,
  9508,
  363,
  1176,
  1230,
  1904,
  13,
  13,
  2277,
  29937,
  1177,
  12336,
  29901,
  1724,
  30010,
  29879,
  278,
  4328,
  1546,
  766,
  2839,
  1230,
  10110,
  766,
  2098,
  313,
  20787,
  2022,
  2877,
  766,
  2098,
  29897,
  322,
  1364,
  466,
  459,
  13608,
  423,
  29973,
  13,
  13,
  2277,
  29937,
  29925,
  3491,
  7982,
  29901,
  18512,
  29892,
  2305,
  1970,
  1509,
  766,
  2839,
  1230,
  10110,
  766,
  2098,
  29892,
  21510,
  2998,
  408,
  2999,
  2022,
  2877,
  766,
  2098,
  29892,
  322,
  1364,
  466,
  459,
  13608,
  423,
  29889,
  1102,
  466,
  459,
  13608,
  423,
  947,
  2099,
  1346,
  5451,
  3458,
  3995,
  541,
  278,
  1024,
  471,
  6839,
  304,
  8453,
  278,
  5129,
  5451,
  30010,
  515,
  16832,
  393,
  366,
  7271,
  2645,
  385,
  12720,
  310,
  11643,

## Setup the Training Model

Key Terms:
    tokenizer - converts raw text into token IDs that the model can understand

    data_collator - prepares batches of data for training by combining the tokenized samples into tensors and handling any padding or label preparation that a batch needs

    remove_unused_columns=False - tells the Trainer not to drop dataset columns automatically; the dataset was prepared by hand, so this clean-up step is not needed

    per_device_train_batch_size=12 - the batch size used per device (GPU or CPU) during training; it determines how many samples each device processes in one forward and backward pass

    gradient_checkpointing=True - recomputes activations during the backward pass instead of storing them, which drastically reduces memory usage and allows larger models or larger batch sizes on smaller devices. In this notebook it is enabled directly on the model with model.gradient_checkpointing_enable()

    gradient_accumulation_steps=4 - accumulates gradients over several batches before each weight update, simulating a larger batch size without using more GPU memory (not set in the cell below, which uses the default)

    max_steps=20 - the total number of optimizer update steps; this is not the same as the number of epochs, and when set it overrides it. With 77 training samples and a batch size of 12, 20 steps correspond to about 2.86 epochs

    learning_rate=2.5e-5 - learning rate for the optimizer (not set in the cell below, which uses the default)

    logging_steps=5 - controls how often training statistics such as the loss are logged

    fp16=True - enables mixed-precision training with half-precision floating point (FP16), which can significantly speed up training and reduce memory usage

    optim="paged_adamw_8bit" - the optimizer to use during training (not set in the cell below, which uses the default)

    save_strategy="steps" - saves checkpoints at regular step intervals, so that training can be resumed after an interruption

    save_steps=50 - number of steps between checkpoint saves

    report_to="none" - disables reporting to external monitoring tools, which is useful when they are not needed or when computational resources should be saved
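
The relationship between batch size, steps, and epochs can be checked with a short calculation. The sketch below uses the values from this notebook (77 training samples, per_device_train_batch_size=12, max_steps=20) and assumes a single device, no gradient accumulation, and that the final incomplete batch of an epoch still counts as one step. The function name and its parameters are illustrative, not part of the Trainer API.

```python
import math

def training_schedule(n_samples, per_device_batch, n_devices=1,
                      grad_accum_steps=1, max_steps=20):
    """Return (effective batch size, steps per epoch, epochs covered by max_steps)."""
    effective_batch = per_device_batch * n_devices * grad_accum_steps
    steps_per_epoch = math.ceil(n_samples / effective_batch)
    return effective_batch, steps_per_epoch, max_steps / steps_per_epoch

batch, steps_per_epoch, epochs = training_schedule(n_samples=77, per_device_batch=12)
print(batch, steps_per_epoch, round(epochs, 3))
```

With these values, one epoch takes 7 steps, so 20 steps cover roughly 2.857 epochs. Raising gradient_accumulation_steps or the number of devices increases the effective batch size and reduces the number of steps per epoch accordingly.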

In [14]:
import torch
if torch.cuda.device_count() > 1: 
    model.is_parallelizable = True
    model.model_parallel = True

In [48]:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Causal language modeling, so masked language modeling (mlm) is disabled
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./training",
    remove_unused_columns=False,
    per_device_train_batch_size=12,
    max_steps=20,
    logging_steps=5,
    fp16=True,
    save_strategy="steps",
    save_steps=50,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_d,
    eval_dataset=test_d,
    tokenizer=tokenizer,
    data_collator=data_collator,
)



## Train the Model

The cell below fine-tunes the language model with Hugging Face's Trainer, using the parameters chosen above to train the model effectively and efficiently.

The model was successfully trained on multiple systems: a Kaggle Notebook with the T4 x2 GPU option, a Linux system with an RTX 2060 GPU, and HSLU's JupyterHub.

In [49]:
trainer.train()



Step,Training Loss
5,1.4498
10,1.5195
15,1.3987
20,1.416


TrainOutput(global_step=20, training_loss=1.4459938049316405, metrics={'train_runtime': 156.087, 'train_samples_per_second': 1.538, 'train_steps_per_second': 0.128, 'total_flos': 719015009845248.0, 'train_loss': 1.4459938049316405, 'epoch': 2.857142857142857})

## Generate Text Responses

The code below sends the chatbot a mental health question, using the same prompt format as in training. The chatbot should then return a helpful, grammatically correct response that the user can understand and learn from.

Bandi et al. (2023) highlight how important user feedback is in evaluating the performance of generative AI chatbots, especially those focused on mental health. While technical and numerical metrics help measure how good the chatbot's responses are on paper, user-centric evaluations, such as how helpful the chatbot is, whether it answers questions clearly, and how satisfied users feel, are more practical metrics.

Since chatbot conversations can change depending on the context, listening to user feedback is key to improving its responses and making sure it meets real-world needs effectively.

https://www.mdpi.com/1999-5903/15/8/260

In [50]:
txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT:"""
tokens = tokenizer(txt, return_tensors="pt")['input_ids'].to("cuda")
op = model.generate(tokens, max_new_tokens=200)
print(tokenizer.decode(op[0]))



<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT: A mental illness is a problem with your mind, your thoughts, your emotions, or your behaviour.

Many people think of mental illness as a problem with your brain. But your brain is just one part of your body. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions of cells. Your brain is made up of billions


## Saving the Model

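The code below saves the fine-tuned model so that it can be shared and reloaded for future use. Because the model is a PEFT model, save_pretrained stores the LoRA adapter weights and configuration rather than a full copy of the base model.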

In [53]:
model.save_pretrained("tinyllama_peft_mentalhealth", safe_serialization=False, )

## BLEU score

In the following section, the BLEU score (as explained in the first Jupyter Notebook) for this model is calculated.

In [73]:
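pred_answers = []

# Generate an answer for every question in the test set.
# Note: the indentation inside the triple-quoted string becomes part of the prompt,
# so it differs slightly from the template used during training.
for question in test_dataset['Questions']:
    txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

    ###INPUT: {input}

    ###PROMPT:""".format(input=question)
    tokens = tokenizer(txt, return_tensors="pt")['input_ids'].to("cuda")
    op = model.generate(tokens, max_new_tokens=200)
    # The decoded text contains the prompt followed by the generated continuation
    pred_answers.append(tokenizer.decode(op[0]))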



In [74]:
test_dataset.head(1)

Unnamed: 0,Question_ID,Questions,Answers
58,6851366,What's the difference between a psychiatrist a...,A psychiatrist is a medical doctor with extra ...


In [None]:
pred_answers[0]

"<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model\n\n    ###INPUT: What's the difference between a psychiatrist and a registered psychologist?\n\n    ###PROMPT: A psychiatrist is a doctor who specializes in mental health. A registered psychologist is a doctor who has completed a post-graduate program in psychology and is licensed to practice in BC.\n\n    ###PROMPT: A registered psychologist is a doctor who has completed a post-graduate program in psychology and is licensed to practice in BC.\n\n    ###SOURCE: http://www.bccareers.bc.ca/health-professions/psychology/\n\n### Psychology: What is Psychotherapy?\n\nPsychotherapy is a form of treatment that uses talk therapy to help people deal with problems or difficulties. It can be used to help people deal with problems or difficulties that are causing problems in their lives. It can also be used to help people deal with problems or difficulties that are not causing problems in their lives. Psychotherapy can b

In [94]:
expected_answers = []
for _, row in test_dataset.iterrows():
    txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

    ###INPUT: {input}

    ###PROMPT: {prompt}""".format(input=row['Questions'], prompt=row['Answers'])

    expected_answers.append([txt])

In [95]:
expected_answers[0]

["###SYSTEM: Based on INPUT title generate the prompt for generative model\n\n    ###INPUT: What's the difference between a psychiatrist and a registered psychologist?\n\n    ###PROMPT: A psychiatrist is a medical doctor with extra training in mental health who can choose to prescribe medications. Some use psychotherapy (‘talk therapies’) approaches like cognitive-behavioural therapy to treat mental health problems. Many psychiatrists work at hospitals, clinics, or health centres, and some have a private office. As they are specialist doctors, you will almost always need another doctor’s referral to see a psychiatrist, and fees are covered by MSP. If you have a valid BC Services or CareCard, you do not need to pay to see a psychiatrist. \n A registered psychologist focuses on different talk therapy or counselling approaches to treatments, but they don’t prescribe medication. They have graduate degrees in psychology. There are two different ways to access registered psychologists: the p

In [78]:
!pip install evaluate

import evaluate

bleu = evaluate.load("bleu")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting evaluate
  Using cached evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Using cached evaluate-0.4.3-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.3


In [96]:
results = bleu.compute(
    predictions=pred_answers, references=expected_answers
)

This third model achieves a BLEU score of 0.119, far better than the 0.006 of the second model. The score is not particularly impressive, but it is satisfactory given the limited number of samples used.

In [97]:
results

{'bleu': 0.11899107417616166,
 'precisions': [0.4032216160041569,
  0.21337163750326457,
  0.17931215542137044,
  0.1689100026392188],
 'brevity_penalty': 0.5266571518373007,
 'length_ratio': 0.609308215925281,
 'translation_length': 3849,
 'reference_length': 6317}
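
The overall score can be recomputed from the components in this output. The sketch below uses the standard BLEU definition, the brevity penalty multiplied by the geometric mean of the 1- to 4-gram precisions. The function name is our own, and the input values are copied from the results above.

```python
import math

def bleu_from_parts(precisions, translation_length, reference_length):
    """Combine n-gram precisions and the brevity penalty into a BLEU score."""
    if translation_length >= reference_length:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - reference_length / translation_length)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    return brevity_penalty * geo_mean, brevity_penalty

# Values copied from the results dictionary above.
precisions = [0.4032216160041569, 0.21337163750326457,
              0.17931215542137044, 0.1689100026392188]
score, bp = bleu_from_parts(precisions, translation_length=3849, reference_length=6317)
print(round(score, 4), round(bp, 4))
```

The brevity penalty of about 0.53 shows that the predictions are considerably shorter than the references (length ratio of about 0.61), which roughly halves the final score.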

## Conclusions

When comparing results, the TinyLlama model delivers much better responses than both the original and the improved generative-based models shown in the first notebook. Compared with a QLoRA Falcon model, the TinyLlama model uses far fewer hardware resources and runs easily on a free Kaggle or Google Colab notebook, whereas the QLoRA Falcon model requires, for example, a state-of-the-art NVIDIA A100 GPU.

This fine-tuned model based on TinyLlama shows that it is possible to create an effective mental health chatbot with only a small amount of data. Although Llama 2 has about 6 times as many parameters as TinyLlama, its performance is similar (Nguyen, 2024). Furthermore, innovations like LoRA make it easier to design custom, high-performing chatbots for mental health support by reducing the number of trainable parameters.

## Future Improvements

Future improvements could focus on incorporating a larger dataset, which should reduce overfitting and allow a wider range of topics, such as healthcare or other specialized domains, to be covered. Expanding the dataset would not only broaden the model's scope but also improve its ability to generalize across use cases. Additionally, further fine-tuning could help reduce the training loss.