## Install and Update Dependencies

In [1]:
!pip install bitsandbytes
!pip install accelerate
!pip install --upgrade transformers
!pip install --upgrade peft
!pip install --upgrade datasets



## Import the Dependencies

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch



## Loading the Model and Tokenizer

The following cells set up a lightweight language model (TinyLlama) for natural language processing tasks. The model is loaded with 8-bit quantization for memory efficiency, and `device_map="auto"` distributes it across the available devices when more than one GPU is present. Together, these choices let limited hardware generate text with the model and fine-tune it.


In [3]:
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T", padding_side="right",)
tokenizer.pad_token = tokenizer.eos_token
bnb_config = BitsAndBytesConfig(
  load_in_8bit=True,
)
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T", device_map="auto", quantization_config=bnb_config)
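
To make the memory argument concrete, the sketch below gives a rough back-of-envelope estimate of the space needed for the weights alone at different precisions. The parameter count is an assumption derived from the totals printed later in this notebook (all params minus the LoRA params). The estimate ignores activations, optimizer state, and the layers that 8-bit loading typically leaves unquantized, so real usage will be higher.

```python
# Back-of-envelope weight memory for TinyLlama at different precisions.
# Assumption: 1,100,048,384 base parameters (the notebook's reported total
# of 1,101,174,784 minus the 1,126,400 LoRA parameters added later).
BASE_PARAMS = 1_100_048_384

# Bytes used to store a single parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


estimates = {
    name: weight_memory_gib(BASE_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}
for name, gib in estimates.items():
    print(f"{name}: ~{gib:.2f} GiB")
```

Under these assumptions, 8-bit weights take about half the space of FP16 weights and a quarter of FP32 weights.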

In [4]:
txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT:"""
tokens = tokenizer(txt, return_tensors="pt")['input_ids'].to("cuda")
op = model.generate(tokens, max_new_tokens=200)
print(tokenizer.decode(op[0]))

<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental disorder and a mental illness?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental illness and a mental disorder?

###PROMPT: What is the difference between a mental illness and a mental disorder?



## Prepare the PEFT Model

The following cells prepare the pre-trained language model for parameter-efficient fine-tuning (PEFT) with LoRA (Low-Rank Adaptation). LoRA freezes the original weights and trains only a small set of added low-rank matrices, which reduces the memory and compute needed for fine-tuning while aiming to avoid sacrificing accuracy.

In [5]:
from peft import get_peft_model, LoraConfig, TaskType, prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

# r is the rank of the LoRA update matrices; a higher r adds more trainable parameters
peft_config = LoraConfig(inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1, task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, peft_config)

# print_trainable_parameters() prints its summary itself and returns None
model.print_trainable_parameters()

trainable params: 1,126,400 || all params: 1,101,174,784 || trainable%: 0.1023
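
The trainable-parameter count above can be reproduced by hand, which shows how `r` controls the number of fine-tuned parameters. The sketch below is a worked calculation, not PEFT's own code. The TinyLlama dimensions and the assumption that LoRA is applied to `q_proj` and `v_proj` (PEFT's default for Llama-style models) are not stated in this notebook.

```python
# Assumed TinyLlama-1.1B dimensions (not stated in this notebook).
NUM_LAYERS = 22
HIDDEN_SIZE = 2048
KV_DIM = 256  # 4 key/value heads x 64 dims (grouped-query attention)
RANK = 8      # r in LoraConfig


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to a frozen weight."""
    return rank * in_features + out_features * rank


# Assumption: LoRA is applied to q_proj and v_proj in every layer.
per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)  # q_proj: 2048 -> 2048
    + lora_params(HIDDEN_SIZE, KV_DIM, RANK)     # v_proj: 2048 -> 256
)
trainable = NUM_LAYERS * per_layer
total = 1_101_174_784  # "all params" from the notebook output

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Because every term scales linearly with `RANK`, doubling `r` to 16 would double the number of trainable parameters under the same assumptions.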


In [6]:
def format_dataset(data_point):
    prompt = f"""###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: {data_point['Questions']}

###PROMPT: {data_point['Answers']}
"""
    tokens = tokenizer(prompt,
        truncation=True,
        max_length=512,
        padding="max_length",)
    tokens["labels"] = tokens['input_ids'].copy()
    return tokens

## Prepare and Load the Dataset

This section loads the chosen dataset, a CSV file stored in the same folder as the notebook, and converts it to a Hugging Face `Dataset`. The code below also tokenizes each row with the prompt template and removes the raw text columns before training.

In [7]:
import pandas as pd

# Load the dataset CSV; adjust the path to where the file is stored on your system
df = pd.read_csv('/home/arvi/mental-health-chatbot/mentalhealth.csv')

In [8]:
from datasets import Dataset

# Convert pandas DataFrame to Hugging Face Dataset
dataset = Dataset.from_pandas(df)
print(dataset)
dataset = dataset.map(format_dataset)
print(dataset)

Dataset({
    features: ['Question_ID', 'Questions', 'Answers'],
    num_rows: 97
})


Map: 100%|██████████| 97/97 [00:00<00:00, 425.49 examples/s]

Dataset({
    features: ['Question_ID', 'Questions', 'Answers', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 97
})





In [9]:
print(tokenizer.decode(dataset[0]['input_ids']))

<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT: Mental illnesses are health conditions that disrupt a person's thoughts, emotions, relationships, and daily functioning.
</s></s></s> [the "</s>" padding token repeats up to max_length=512]
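
The padding visible in this decoded sample matters for the loss. A causal-LM collator such as `DataCollatorForLanguageModeling(tokenizer, mlm=False)` rebuilds the labels from `input_ids` and sets padded positions to `-100` so that they are ignored. The sketch below imitates that step in plain Python; the helper name and the token IDs are hypothetical and only illustrate the idea.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding(input_ids: list[int], pad_token_id: int) -> list[int]:
    """Copy input_ids into labels, replacing padding positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical token IDs: 1 stands for <s>, 2 for </s> (used as padding here).
example_ids = [1, 835, 14816, 1254, 2, 2, 2]
labels = mask_padding(example_ids, pad_token_id=2)
print(labels)  # [1, 835, 14816, 1254, -100, -100, -100]
```

One consequence of setting `pad_token` to `eos_token` is that a genuine end-of-sequence token would be masked in the same way, so the model gets no training signal for when to stop.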

In [10]:
dataset = dataset.remove_columns(['Questions', "Answers"])
print(dataset)

Dataset({
    features: ['Question_ID', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 97
})


## Setup the Training Model

Key Terms:
    tokenizer - Converts raw text into token IDs that the model can process.

    data_collator - Assembles batches for training, padding sequences to the same length where needed. For causal language modeling (mlm=False) it builds the labels from the input IDs and excludes padding positions from the loss.

    remove_unused_columns=False - Stops the Trainer from automatically dropping dataset columns that the model's forward method does not accept. We prepared the dataset manually, so this automatic cleanup is not needed.

    per_device_train_batch_size=2 - The number of samples processed per device (GPU or CPU) in each forward and backward pass.

    gradient_checkpointing=True - Reduces memory usage by recomputing activations during the backward pass instead of storing them, at the cost of extra compute. This allows you to train larger models on smaller devices or to use larger batch sizes.

    gradient_accumulation_steps=4 - Accumulates gradients over 4 batches before each weight update, simulating a larger batch size (2 x 4 = 8 samples per update per device) without the extra GPU memory.

    max_steps=400 - The total number of weight-update steps to run. This is not the same as epochs; when max_steps is set, it overrides num_train_epochs.

    learning_rate=2.5e-5 - The initial learning rate for the optimizer.

    logging_steps=5 - Controls how often training statistics (e.g., loss) are logged.

    fp16=True - Enables mixed-precision training with half-precision floating point (FP16), which can significantly speed up training and reduce memory usage.

    optim="paged_adamw_8bit" - The optimizer to use during training, here the paged 8-bit AdamW from bitsandbytes, which lowers optimizer memory usage.

    save_strategy="steps" - Saves checkpoints at regular step intervals so that training can be resumed after an interruption.

    save_steps=50 - The number of steps between checkpoint saves.

    report_to="none" - Disables reporting to external monitoring tools, for when you don't need them or want to save computational resources.
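
To see how `max_steps`, the batch size, and gradient accumulation relate to epochs for this 97-row dataset, the arithmetic can be worked through directly. This is a sketch assuming a single training process; it is not the Trainer's own code, although it does reproduce the epoch value that the training output reports.

```python
import math

num_rows = 97                      # dataset size reported by print(dataset)
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 400
num_processes = 1                  # assumption: a single training process

# Samples that contribute to each weight update.
effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)

# Batches per pass over the data, then weight updates per pass.
batches_per_epoch = math.ceil(num_rows / (per_device_train_batch_size * num_processes))
steps_per_epoch = batches_per_epoch / gradient_accumulation_steps

epochs = max_steps / steps_per_epoch

print(f"effective batch size: {effective_batch}")
print(f"update steps per epoch: {steps_per_epoch}")
print(f"epochs covered by max_steps: {epochs:.2f}")
```

With these settings, 400 steps correspond to roughly 32.65 passes over the dataset, so each row is seen many times during training.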

In [11]:
import torch
if torch.cuda.device_count() > 1: 
    model.is_parallelizable = True
    model.model_parallel = True

In [12]:
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    args=TrainingArguments(
        output_dir="./training",
        remove_unused_columns=False,
        per_device_train_batch_size=2,
        gradient_checkpointing=True,
        gradient_accumulation_steps=4,
        max_steps=400,
        learning_rate=2.5e-5,
        logging_steps=5,
        fp16=True,
        optim="paged_adamw_8bit",
        save_strategy="steps",
        save_steps=50,
        report_to="none",
    ),
)

max_steps is given, it will override any value given in num_train_epochs


## Train the Model

The code below fine-tunes the language model by calling `train()` on the Hugging Face Trainer, using the user-chosen parameters configured in the previous section.

We trained the model both in a Kaggle Notebook with the T4 X2 GPU and on a Linux system with an RTX 2060 GPU.

In [None]:
#Kaggle Notebook with T4 X2 GPU
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
5,2.3353
10,2.1929
15,2.1647
20,2.3562
25,2.1364
30,2.1796
35,2.1469
40,2.2284
45,2.1053
50,2.0361




TrainOutput(global_step=400, training_loss=1.7365498733520508, metrics={'train_runtime': 2143.7545, 'train_samples_per_second': 1.493, 'train_steps_per_second': 0.187, 'total_flos': 1.01807435022336e+16, 'train_loss': 1.7365498733520508, 'epoch': 32.6530612244898})

In [13]:
#Linux System with an RTX 2060 GPU
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
5,2.168
10,2.3027
15,2.687
20,2.3546
25,2.5016
30,2.1117
35,2.1921
40,2.39
45,2.1824
50,2.2808




TrainOutput(global_step=400, training_loss=1.8420527970790863, metrics={'train_runtime': 4211.4417, 'train_samples_per_second': 0.76, 'train_steps_per_second': 0.095, 'total_flos': 1.0078936067211264e+16, 'train_loss': 1.8420527970790863, 'epoch': 32.6530612244898})
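
The two `TrainOutput` records can be compared directly. The sketch below recomputes the throughput figures from the reported runtimes and derives the relative speed of the two systems. The value of 8 samples per step is an assumption (batch size 2 with 4 accumulation steps in a single process), chosen because it reproduces the reported `train_samples_per_second`.

```python
# Figures copied from the two TrainOutput records in this notebook.
runs = {
    "Kaggle T4 X2": {"train_runtime": 2143.7545, "global_step": 400},
    "RTX 2060": {"train_runtime": 4211.4417, "global_step": 400},
}
SAMPLES_PER_STEP = 8  # assumption: batch size 2 x 4 accumulation steps, one process

for name, run in runs.items():
    run["steps_per_second"] = run["global_step"] / run["train_runtime"]
    run["samples_per_second"] = run["steps_per_second"] * SAMPLES_PER_STEP
    print(
        f"{name}: {run['steps_per_second']:.3f} steps/s, "
        f"{run['samples_per_second']:.3f} samples/s"
    )

# Ratio of wall-clock training time between the two systems.
speedup = runs["RTX 2060"]["train_runtime"] / runs["Kaggle T4 X2"]["train_runtime"]
print(f"Kaggle run finished about {speedup:.2f}x faster")
```

For the same 400 steps, the Kaggle run took a little over half the wall-clock time of the RTX 2060 run.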

## Generate Text Responses

The code below sends a sample mental health question to the AI chatbot. The chatbot then outputs a helpful, grammatically correct response that the user can understand and learn from. As with training, we ran it both on a Kaggle Notebook with the T4 X2 GPU and on a Linux system with an RTX 2060 GPU.

Bandi et al. (2023) highlight how important user feedback is in evaluating the performance of generative AI chatbots, especially those focused on mental health. While technical and numerical metrics help measure how well the chatbot's responses perform on paper, user-centric evaluations, such as how helpful the chatbot is, whether it answers questions clearly, and how satisfied users feel, are much more practical metrics.

Since chatbot conversations can change depending on the context, listening to user feedback is key to improving the chatbot's responses and making sure it meets real-world needs effectively.

https://www.mdpi.com/1999-5903/15/8/260

In [None]:
#Kaggle Notebook with T4 X2 GPU
txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT:"""
tokens = tokenizer(txt, return_tensors="pt")['input_ids'].to("cuda")
op = model.generate(tokens, max_new_tokens=200)
print(tokenizer.decode(op[0]))



<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT: Mental illness is a term that describes a range of conditions that affect the way people think, feel, and behave. These conditions can be physical, emotional, or mental.

Some mental illnesses are more common than others. For example, depression is more common than anxiety. Other mental illnesses are more common in certain groups of people, such as people who are homeless or people who are incarcerated.

Mental illnesses can be caused by a variety of factors, including:

* Stress
* Lack of sleep
* Lack of exercise
* Lack of social support
* Lack of access to health care
* Lack of access to mental health services
* Lack of access to mental health education
* Lack of access to mental health support
* Lack of access to mental health services
* Lack of access to mental health education
* Lack of access to


In [14]:
#Linux System with an RTX 2060 GPU
txt = """###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT:"""
tokens = tokenizer(txt, return_tensors="pt")['input_ids'].to("cuda")
op = model.generate(tokens, max_new_tokens=200)
print(tokenizer.decode(op[0]))



<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model

###INPUT: What does it mean to have a mental illness?

###PROMPT: A mental illness is a condition that affects your thoughts, feelings, and behaviour. It can be a mental health condition, such as depression, anxiety, or bipolar disorder, or a psychological disorder, such as schizophrenia or bipolar disorder.

Mental illnesses can be caused by a number of factors, including genetics, stress, and trauma. Some mental illnesses are more common than others, and some people are more likely to develop a mental illness than others.

Mental illnesses can be treated with medication, therapy, or both. Some people find that medication helps them manage their symptoms, while others find that therapy helps them cope with their symptoms.

Mental illnesses can be difficult to manage, and they can have a huge impact on your life. It’s important to talk to your doctor or mental health professional
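
The decoded output above repeats the whole prompt and includes the `<s>` token, which a chatbot would not want to show to the user. The sketch below is one possible way to keep only the generated answer. The helper `extract_response` and the sample string are hypothetical and are not part of the notebook.

```python
PROMPT_MARKER = "###PROMPT:"  # marker used in the notebook's prompt template


def extract_response(decoded: str) -> str:
    """Return only the text generated after the first PROMPT marker."""
    _, _, answer = decoded.partition(PROMPT_MARKER)
    # Drop special tokens that tokenizer.decode leaves in the string.
    for special in ("<s>", "</s>"):
        answer = answer.replace(special, "")
    # If the model starts a new "###" section on its own, cut the text there.
    answer = answer.split("###", 1)[0]
    return answer.strip()


# Hypothetical decoded string in the same layout as the notebook's outputs.
decoded = (
    "<s> ###SYSTEM: Based on INPUT title generate the prompt for generative model\n\n"
    "###INPUT: What is stress?\n\n"
    "###PROMPT: Stress is a response to pressure.</s>"
)
print(extract_response(decoded))  # Stress is a response to pressure.
```

Passing `skip_special_tokens=True` to `tokenizer.decode` is another way to drop the special tokens before this kind of post-processing.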


## Save the Model

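The code below saves the fine-tuned model so that the user can share it and reload it for future use. Because the model is wrapped with PEFT, `save_pretrained` stores the LoRA adapter weights and configuration rather than a full copy of the base model.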

In [15]:
model.save_pretrained("tinyllama_peft_mentalheatlh", safe_serialization=False, )

## Comparison and Conclusion

Comparing the results with the original generative-based model and the improved generative-based model from another system shows that the TinyLlama and PEFT model produced significantly better responses. When comparing the results of this model with the QLoRA Falcon model using the same dataset, the results were similar. However, the TinyLlama model fine-tuned with PEFT and LoRA requires significantly fewer hardware resources and can be run on a free Kaggle Notebook. The QLoRA Falcon model, by contrast, is recommended for use with the A100 GPU on Google Colab Pro and was unable to run on the free version of Google Colab.

Another aspect addressed in this study is the use of a small dataset (97 rows), which was still enough to produce a clear improvement in the responses. This suggests that scaling to a larger dataset would not only increase the model's usefulness but also expand the range of topics it can discuss with the user.

In conclusion, the TinyLlama model, in conjunction with PEFT and LoRA, allows users to efficiently fine-tune the model on a text dataset without the need for high-end GPUs like the A100, which on Google Colab are only available with Colab Pro.

## Future Improvements

This study could benefit from future improvements, such as the use of a larger dataset, which would allow for more data splits and the inclusion of additional topics, both within mental health and in other areas. The model could also benefit from further fine-tuning to reduce training loss.