# Project Name: EmpathAI - AI for Mental Wellness

Team Members: Debankitha Basu, Shreevidhya Shambanna, Ziyang Song

**Project Overview:**

Our project aims to develop a mental health chatbot that assists users by providing support and information on common mental health issues. Utilizing natural language processing (NLP) and machine learning (ML), the chatbot will interact with users in a conversational manner, understanding their concerns and offering guidance or resources.


**Data Sources:** [MentalHealthChat Dataset by Hizardev](https://huggingface.co/datasets/hizardev/MentalHealthChat)

**Other Data Sources:** [Mental health datasets on Hugging Face](https://huggingface.co/datasets?sort=trending&search=mental)

### BERT Model for Intent Classification

In [6]:
tokenized_dataset_copy

| Unnamed: 0 | text | preprocessed_text | input_ids | token_type_ids | attention_mask | intent_id |
|---|---|---|---|---|---|---|
| 0 | `\nsince i was a kid, i’ve been afraid of pictu...` | since i was a kid i ve been afraid of pictures... | [ 101 2144 1045 2001 1037 4845 1010 10... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 5 |
| 1 | I tried to slit my wrist. I couldn’t put enoug... | i tried to slit my wrist i couldn t put enough... | [ 101 1045 2699 2000 18036 2026 7223 10... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 2 |
| 2 | I hope you feel better. I can relate to the co... | i hope you feel better i can relate to the con... | [ 101 1045 3246 2017 2514 2488 1012 10... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 2 |
| 3 | This summer after I graduated my parents told ... | this summer after i graduated my parents told ... | [ 101 2023 2621 2044 1045 3852 2026 30... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 5 |
| 4 | I've had it pretty bad this year. I've been in... | i ve had it pretty bad this year i ve been in ... | [ 101 1045 1005 2310 2018 2009 3492 29... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 6 |
| ... | ... | ... | ... | ... | ... | ... |
| 9995 | About 1 1/2 years ago I had such a strong urge... | about 1 1 2 years ago i had such a strong urge... | [ 101 2055 1015 1015 1013 1016 2086 32... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 5 |
| 9996 | I often get mentally stuck in remembering thin... | i often get mentally stuck in remembering thin... | [ 101 1045 2411 2131 10597 5881 1999 103... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 0 |
| 9997 | `<s>[INST] <<SYS>>\nYou are a helpful and joyou...` | s inst sys you are a helpful and joyous mental... | [ 101 1026 1055 1028 1031 16021 2102 10... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 7 |
| 9998 | Everytime my boyfriend says he was hanging out... | everytime my boyfriend says he was hanging out... | [ 101 2296 7292 2026 6898 2758 2002 20... | [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... | [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... | 8 |
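The three encoding columns in this output follow BERT's usual input layout. As a rough illustration only (this is a toy sketch, not the tokenizer call used in the notebook), the snippet below pads already-tokenized ID lists to a common length and derives the matching masks. The helper name `pad_batch` is hypothetical, and the IDs 2144 and 1045 are copied from the first row; 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` IDs in the `bert-base-uncased` vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token IDs in bert-base-uncased


def pad_batch(sequences, max_length):
    """Pad token-ID lists to max_length and build matching masks (hypothetical helper)."""
    batch = {"input_ids": [], "attention_mask": [], "token_type_ids": []}
    for ids in sequences:
        if len(ids) > max_length:
            raise ValueError("sequence longer than max_length; truncate it first")
        n_pad = max_length - len(ids)
        batch["input_ids"].append(ids + [PAD_ID] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Single-sentence inputs use segment 0 everywhere
        batch["token_type_ids"].append([0] * max_length)
    return batch


example = pad_batch([[CLS_ID, 2144, 1045, SEP_ID], [CLS_ID, 1045, SEP_ID]], max_length=6)
```

This matches what the table shows: every `input_ids` row starts with 101, and `token_type_ids` is all zeros because each example is a single text segment.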


In [10]:

from torch.utils.data import DataLoader

# IntentClassificationDataset is assumed to be defined in an earlier cell (not shown here)
encodings = {
    'input_ids': tokenized_dataset_copy['input_ids'].tolist(),
    'attention_mask': tokenized_dataset_copy['attention_mask'].tolist(),
    'token_type_ids': tokenized_dataset_copy['token_type_ids'].tolist(),
}
labels = tokenized_dataset_copy['intent_id'].tolist()

# Split the dataset: the first 80% of rows for training, the rest for validation
train_size = int(0.8 * len(labels))
val_size = len(labels) - train_size

train_dataset = IntentClassificationDataset({k: v[:train_size] for k, v in encodings.items()}, labels[:train_size])
val_dataset = IntentClassificationDataset({k: v[train_size:] for k, v in encodings.items()}, labels[train_size:])

# Shuffle only the training batches; validation order does not matter
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4)
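The split above takes the first 80% of rows for training in their stored order. If the rows happen to be grouped by source or topic (not verified here), the validation set may not be representative. One possible alternative is a seeded shuffle of the row indices before slicing; the helper name `shuffled_split` and the seed value are assumptions made for this sketch.

```python
import random


def shuffled_split(n_rows, train_fraction=0.8, seed=42):
    """Return (train_indices, val_indices) after a seeded shuffle (hypothetical helper)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, leaves the global random state alone
    train_size = int(train_fraction * n_rows)
    return indices[:train_size], indices[train_size:]


train_idx, val_idx = shuffled_split(10)
```

The index lists could then select rows from each encoding column, for example `[v[i] for i in train_idx]`, in place of the `v[:train_size]` slices.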

In [13]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import EvalPrediction

def compute_metrics(eval_pred: EvalPrediction):
    """Compute metrics for intent classification."""
    # Extract predictions and labels from the evaluation prediction
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)

    # Calculate accuracy
    accuracy = accuracy_score(labels, predictions)

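    # Calculate precision, recall, and F1 score, weighted by each class's support;
    # zero_division=0 avoids warnings when a class is never predicted
    precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average='weighted', zero_division=0)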

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1,
    }
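To make `average='weighted'` concrete, the sketch below recomputes the three scores by hand on a four-example toy case: each class's score is weighted by its share of the true labels. The function name `weighted_prf` and the toy labels are made up for illustration; this is not a replacement for the scikit-learn call.

```python
from collections import Counter


def weighted_prf(labels, predictions):
    """Support-weighted precision, recall and F1, computed by hand for illustration."""
    support = Counter(labels)            # true examples per class
    predicted = Counter(predictions)     # predicted examples per class
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    total = len(labels)
    precision = recall = f1 = 0.0
    for cls, n_true in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / n_true
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n_true / total          # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


labels = [0, 0, 0, 1]
predictions = [0, 0, 1, 1]
precision, recall, f1 = weighted_prf(labels, predictions)
```

Here precision is 0.875, recall is 0.75 and F1 is about 0.767. Weighted recall always equals plain accuracy (3 of 4 correct here), so the `recall` and `accuracy` entries returned by `compute_metrics` carry the same number.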

In [14]:
num_train_epochs = 3  # Define the number of epochs


In [17]:
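from torch.optim import AdamW  # the AdamW class in transformers is deprecated in favor of PyTorch's
from transformers import get_linear_schedule_with_warmup

# Build the optimizer from the parameters of the model instance that will actually be trained.
# weight_decay=0.0 keeps the default of the transformers optimizer this replaces.
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)

# Linear warmup for 500 steps, then linear decay to zero over the remaining steps
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=500, num_training_steps=len(train_loader) * num_train_epochs)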
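For intuition about what the scheduler does to the learning rate, here is a small pure-Python sketch of a linear warmup followed by linear decay, the shape that `get_linear_schedule_with_warmup` is documented to produce. The function name `linear_warmup_factor` is hypothetical, and the figure of 8,000 training rows is an assumption used only to make the arithmetic concrete.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)  # ramp up from 0 to 1
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))  # decay to 0


# Illustrative numbers: 8,000 training rows (an assumption), batch_size=4, 3 epochs
steps_per_epoch = 8000 // 4
total_steps = steps_per_epoch * 3
lr_at_250 = 5e-5 * linear_warmup_factor(250, 500, total_steps)
```

Under these assumptions there are 6,000 optimizer steps in total, and halfway through warmup (step 250) the learning rate is 2.5e-5. The multiplier only advances when `scheduler.step()` is called, normally once after each `optimizer.step()`.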





In [None]:
import numpy as np
import torch
from sklearn.metrics import accuracy_score
from torch.optim import AdamW
from tqdm.auto import tqdm
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')


model_path = 'bert-base-uncased'
model = BertForSequenceClassification.from_pretrained(model_path, num_labels=num_clusters)
model.to(device)

# The model was just (re)created, so build the optimizer and scheduler from its parameters;
# an optimizer built before this point would still hold the parameters of the previous model object.
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=500, num_training_steps=len(train_loader) * num_train_epochs)

for epoch in range(num_train_epochs):
    model.train()
    train_progress_bar = tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_train_epochs} [Training]')
    for batch in train_progress_bar:
        # Explicitly move each tensor within the batch to the specified device
        batch = {k: v.to(device) for k, v in batch.items()}

        # Forward pass (the batch includes 'labels', so the model returns the loss)
        outputs = model(**batch)
        loss = outputs.loss

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the learning-rate schedule once per optimizer step

    # Evaluation phase
    model.eval()
    predictions, true_labels = [], []
    val_progress_bar = tqdm(val_loader, desc=f'Epoch {epoch+1}/{num_train_epochs} [Validation]')
    for batch in val_progress_bar:
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)

        logits = outputs.logits
        predictions.append(logits.argmax(-1).cpu().numpy())
        true_labels.append(batch['labels'].cpu().numpy())

    predictions = np.concatenate(predictions)
    true_labels = np.concatenate(true_labels)
    accuracy = accuracy_score(true_labels, predictions)
    print(f'Validation Accuracy: {accuracy}')


In [None]:
predictions

array([3, 3, 8, ..., 3, 3, 3])
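The truncated array hides most of the values, so it cannot show by itself how the predictions are spread across classes. A quick tally, sketched below with a hypothetical helper `class_shares` and toy input values, would reveal whether the classifier concentrates on one or two intent IDs.

```python
from collections import Counter


def class_shares(predicted_ids):
    """Return {class_id: share of predictions}, largest share first (hypothetical helper)."""
    counts = Counter(int(i) for i in predicted_ids)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


shares = class_shares([3, 3, 8, 3, 3, 3])  # toy values, not the notebook's full output
```

Calling `class_shares(predictions)` on the NumPy array from the validation loop should work the same way, since each element is converted with `int`.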

In [None]:
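from transformers import BertTokenizer, BertForSequenceClassification
import torch
import openai

# Load the tokenizer and model, then place the model on the appropriate device
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# NOTE: loading 'bert-base-uncased' here creates a fresh, randomly initialized classification head
# (see the warning printed below) and replaces the model trained above. To classify with the
# fine-tuned weights, set model_path to a directory saved with model.save_pretrained(...),
# or skip this reload and keep using the trained model object.
model_path = 'bert-base-uncased'
model = BertForSequenceClassification.from_pretrained(model_path, num_labels=num_clusters)  # num_labels must match training
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)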

def classify_intent(input_text):
    """Classify the intent of the input text using the fine-tuned BERT model."""
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}  # Move inputs to the same device as the model
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predicted_intent_id = logits.argmax(axis=-1).item()
    return predicted_intent_id




Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
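`classify_intent` returns only the arg-max class, with no indication of how sure the model is. A possible extension, which is not part of the notebook, is to turn the logits into probabilities and decline to pick an intent below a confidence threshold. The names `softmax` and `pick_intent`, the 0.5 threshold, and the toy logits are all assumptions made for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_intent(logits, min_confidence=0.5):
    """Return (intent_id, confidence), or (None, confidence) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None, probs[best]
    return best, probs[best]


confident = pick_intent([0.1, 4.0, 0.2])   # one class clearly dominates
unsure = pick_intent([1.0, 1.1, 0.9])      # scores are nearly tied
```

In the notebook, the input could come from `outputs.logits[0].tolist()` inside `classify_intent`. With a classifier head that has not been trained, the probabilities are expected to be close to uniform, so most inputs would fall below any reasonable threshold.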
