# 🧠 Mental Health Chatbot with Empathy, ABSA, and RAG

This project focuses on building a **mental health support chatbot** that combines  
- **Emotion & Sentiment Detection**  
- **Aspect-Based Sentiment Analysis (ABSA)**  
- **Retrieval-Augmented Generation (RAG)**  
- **Empathy & Variety Management**  
- **Safety Filtering & Crisis Detection**  
- **Text-to-Speech Interaction**  

The chatbot is designed to provide **empathetic, safe, and supportive conversations** while maintaining variety and avoiding repetitive responses.

---

### 📌 Project Details
- **Name:** Mitul Srivastava  
- **Student ID:** C00313606  
- **Course:** MSc Data Science  
- **University:** South East Technological University (SETU)  
- **Module:** Dissertation  
- **Supervisor:** Dr. Joseph Kehoe  

---


### Mount Google Drive in Colab

In order to save or load files directly from **Google Drive** while using Google Colab,  
we need to **mount** the drive so that it becomes accessible in the notebook's file system.

In [1]:
# Import the drive module from google.colab library
from google.colab import drive

# Mount Google Drive to the /content/drive directory
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Install Required Python Libraries

This command installs all the necessary libraries for building, training, and enhancing our chatbot with advanced NLP,  
sentiment analysis, and retrieval capabilities.

**Breakdown of packages:**

- **transformers==4.21.0** → Hugging Face library for state-of-the-art NLP models (e.g., GPT, BERT, etc.).  
  We specify version `4.21.0` for compatibility with our code.
- **torch**, **torchvision**, **torchaudio** → PyTorch ecosystem for deep learning, with `torchvision` for image tasks and `torchaudio` for audio tasks.
- **accelerate** → Simplifies multi-GPU and mixed precision training with Hugging Face models.
- **peft** → Parameter-Efficient Fine-Tuning for large language models (saves memory & speeds training).
- **bitsandbytes** → Optimized low-bit (8-bit, 4-bit) model loading for reduced VRAM usage.
- **sentence-transformers** → Library for semantic embeddings, similarity search, and retrieval-augmented generation (RAG).
- **textblob** → Simple NLP library for basic sentiment analysis and text preprocessing.
- **vaderSentiment** → Pre-trained lexicon-based sentiment analysis (especially for social media text).
- **scikit-learn** → Machine learning toolkit for model evaluation, feature processing, and metrics.
- **pandas** → Data handling & analysis library (works with CSV/Excel/JSON datasets).
- **numpy** → Numerical computations & array operations.
- **matplotlib**, **seaborn** → Data visualization libraries for graphs and analysis plots.
- **faiss** → Facebook AI’s library for fast similarity search (used in RAG pipelines).

**Why we do this first:**
- Installing dependencies at the start ensures the environment is ready for the rest of the notebook.
- Helps avoid compatibility errors later during model training or inference.


In [None]:
# Install specified versions of libraries for machine learning, NLP, and data analysis
!pip install transformers==4.21.0 torch torchvision torchaudio accelerate peft bitsandbytes sentence-transformers textblob vaderSentiment scikit-learn pandas numpy matplotlib seaborn faiss

Collecting transformers==4.21.0
  Downloading transformers-4.21.0-py3-none-any.whl.metadata (81 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/82.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m82.0/82.0 kB[0m [31m7.5 MB/s[0m eta [36m0:00:00[0m
Collecting bitsandbytes
  Downloading bitsandbytes-0.46.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl.metadata (572 bytes)
[31mERROR: Could not find a version that satisfies the requirement faiss (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for faiss[0m[31m
[0m

### Install Compatible Libraries
- Uninstall current `datasets` to avoid version conflicts.
- Install `datasets==3.6.0` (stable for this project).
- Upgrade `fsspec` (file system handling) and `markdown` (text rendering).


In [None]:
# Uninstall the existing datasets library to avoid version conflicts
!pip uninstall datasets

# Install datasets library version 3.6.0 for specific compatibility
!pip install datasets==3.6.0

# Update fsspec library to the latest version for file system operations
!pip install -U fsspec

Found existing installation: datasets 4.0.0
Uninstalling datasets-4.0.0:
  Would remove:
    /usr/local/bin/datasets-cli
    /usr/local/lib/python3.11/dist-packages/datasets-4.0.0.dist-info/*
    /usr/local/lib/python3.11/dist-packages/datasets/*
Proceed (Y/n)? Y
  Successfully uninstalled datasets-4.0.0
Collecting datasets==3.6.0
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m491.5/491.5 kB[0m [31m25.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: datasets
Successfully installed datasets-3.6.0


Collecting fsspec
  Downloading fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB)
Downloading fsspec-2025.7.0-py3-none-any.whl (199 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m199.6/199.6 kB[0m [31m14.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: fsspec
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.0
    Uninstalling fsspec-2025.3.0:
      Successfully uninstalled fsspec-2025.3.0
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 3.6.0 requires fsspec[http]<=2025.3.0,>=2023.1.0, but you have fsspec 2025.7.0 which is incompatible.
torch 2.6.0+cu124 requires nvidia-cublas-cu12==12.4.5.8; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cublas-cu12 12.5.3.2 which is incompatible.
torch 2.6.0+cu124 requires nvidia-cuda-cupti-cu12==12.4.127; plat

### Load, Clean, and Combine Multiple Dialogue Datasets

This cell:
1. **Imports necessary libraries**  
   - `os`, `pandas`, `re`, `html` for file handling, data processing, and text cleaning.  
   - `datasets` from Hugging Face to load public dialogue datasets.  
   - `BeautifulSoup` for HTML tag removal.  

2. **Defines helper functions** to load and standardize multiple datasets:
   - **`clean_html(text)`** → Removes HTML tags and returns plain text.
   - **`load_empathetic()`** → Loads *EmpatheticDialogues*, extracts context, response, and emotion.
   - **`load_daily_dialog()`** → Loads *DailyDialog*, structures alternating user–bot turns.
   - **`load_hope_dataset()`** → Loads HOPE CSVs (mental health therapy conversations), cleans HTML, pairs Client–Therapist turns.
   - **`load_counselchat_dataset()`** → Loads *CounselChat*, cleans HTML, limits text to 3 sentences, maps question–answer into `input`/`response`.

3. **Data cleaning and filtering** (`clean_and_filter_data(df)`):
   - Drops rows with empty inputs or responses.
   - Enforces length constraints (avoids very short or excessively long text).
   - Removes harmful or unsafe phrases from responses.

4. **Main loading logic**:
   - If a combined CSV already exists on Google Drive, load it directly.
   - Otherwise, load each dataset, merge them, clean/filter, then save to CSV.

5. **Verification step**:  
   Prints a random sample of 3 rows from the combined dataset to check correctness.

**Techniques used**:
- Multi-source dataset integration.
- HTML and whitespace cleaning.
- Filtering based on length and safety.
- Fallback-safe dictionary `.get()` access to avoid missing key errors.
- Saving preprocessed data for faster future runs.


In [None]:
# Import required libraries for file handling, data processing, and dataset loading
import os
import pandas as pd
import re
import html
from datasets import load_dataset, Dataset
from bs4 import BeautifulSoup

# Path to save combined dataset CSV on Google Drive
dataset_path = "/content/drive/MyDrive/AIchatbotmodels/combined_dataset.csv"

# Helper: Clean HTML tags (used in HOPE dataset)
def clean_html(text):
    return BeautifulSoup(text, "html.parser").get_text(separator=" ").strip()

# Load EmpatheticDialogues dataset safely
def load_empathetic():
    empathetic = load_dataset("empathetic_dialogues")
    emp_data = []
    for split in ['train', 'validation']:
        for item in empathetic[split]:
            # Use .get() with fallback to avoid KeyError
            input_text = item.get('context', '') or item.get('prompt', '')
            response_text = item.get('utterance', '')
            emotion = item.get('emotion', 'neutral')

            emp_data.append({
                'input': input_text,
                'response': response_text,
                'emotion': emotion,
                'source': 'empathetic'
            })
    return emp_data

# Load DailyDialog dataset
def load_daily_dialog():
    daily = load_dataset("daily_dialog")
    daily_data = []
    for split in ['train', 'validation']:
        for item in daily[split]:
            if len(item['dialog']) >= 2:
                for i in range(0, len(item['dialog']) - 1, 2):
                    if i + 1 < len(item['dialog']):
                        daily_data.append({
                            'input': item['dialog'][i],
                            'response': item['dialog'][i + 1],
                            'emotion': 'neutral',
                            'source': 'daily'
                        })
    return daily_data

# Load HOPE dataset from CSV URLs
def load_hope_dataset():
    csv_urls = [
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/163.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/204.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/206.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/27.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/48.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/67.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/75.csv",
        "https://raw.githubusercontent.com/Mitul060299/Mental-Health-Chatbot/main/Hope_data/97.csv"
    ]
    hope_data = []
    for url in csv_urls:
        try:
            df = pd.read_csv(url)
            df['Speaker'] = df['Type'].map({'T': 'Therapist', 'P': 'Client'})
            df['Content'] = df['Utterance'].astype(str)

            for i in range(1, len(df)):
                if df.loc[i - 1, 'Speaker'] == "Client" and df.loc[i, 'Speaker'] == "Therapist":
                    user = clean_html(df.loc[i - 1, 'Content'])
                    bot = clean_html(df.loc[i, 'Content'])

                    hope_data.append({
                        'input': user,
                        'response': bot,
                        'emotion': 'neutral',
                        'source': 'hope'
                    })
        except Exception as e:
            print(f"Failed to process {url}: {e}")
    return hope_data

# Load CounselChat dataset from URL
def load_counselchat_dataset():
    url = "https://raw.githubusercontent.com/nbertagnolli/counsel-chat/master/data/counselchat-data.csv"
    try:
        df = pd.read_csv(url)
        df = df.dropna(subset=["questionText", "answerText"])

        # Clean and limit sentences (up to 3) for each question and answer
        def clean_and_limit_sentences(text, max_sentences=3):
            text = re.sub(r'<[^>]+>', '', text)  # remove HTML tags
            text = html.unescape(text)
            text = re.sub(r'\s+', ' ', text).strip()

            sentences = re.split(r'\.(\s|$)', text)
            full_sentences = []
            for i in range(0, len(sentences) - 1, 2):
                full_sentences.append(sentences[i].strip() + '.')
                if len(full_sentences) == max_sentences:
                    break
            return ' '.join(full_sentences)

        df["questionText"] = df["questionText"].apply(lambda x: clean_and_limit_sentences(x, max_sentences=3))
        df["answerText"] = df["answerText"].apply(lambda x: clean_and_limit_sentences(x, max_sentences=3))

        df["input"] = df["questionText"]
        df["response"] = df["answerText"]
        df["emotion"] = "neutral"
        df["source"] = "counselchat"

        counsel_data = df[["input", "response", "emotion", "source"]].to_dict(orient='records')
        return counsel_data
    except Exception as e:
        print(f"Failed to load CounselChat dataset: {e}")
        return []

# Clean and filter the combined DataFrame
def clean_and_filter_data(df):
    print("Cleaning and filtering combined dataset...")
    # Drop empty inputs or responses
    df = df.dropna(subset=['input', 'response'])
    df = df[(df['input'].str.strip() != '') & (df['response'].str.strip() != '')]

    # Length filtering
    df = df[df['response'].str.len().between(10, 500)]
    df = df[df['input'].str.len().between(5, 300)]

    # Filter out harmful keywords in responses
    harmful_keywords = ['suicide', 'kill yourself', 'end it all', 'worthless']
    pattern = '|'.join(harmful_keywords)
    df = df[~df['response'].str.contains(pattern, case=False, regex=True)]

    print(f"Filtered dataset size: {len(df)}")
    return df

# Main loading logic with saving/loading CSV
if os.path.exists(dataset_path):
    print(f"Loading combined dataset from {dataset_path}")
    df = pd.read_csv(dataset_path)
    print(f"Loaded {len(df)} samples")
else:
    print("Combined dataset CSV not found, loading datasets fresh...")

    emp_data = load_empathetic()
    daily_data = load_daily_dialog()
    hope_data = load_hope_dataset()
    counsel_data = load_counselchat_dataset()

    combined_data = emp_data + daily_data + hope_data + counsel_data
    df = pd.DataFrame(combined_data)
    df = clean_and_filter_data(df)

    df.to_csv(dataset_path, index=False)
    print(f"Saved combined dataset CSV with {len(df)} samples")

# Display some samples to verify
print(df.sample(3))


Loading combined dataset from /content/drive/MyDrive/AIchatbotmodels/combined_dataset.csv
Loaded 130367 samples
                                                    input  \
15438                                         embarrassed   
73548                                            prepared   
100777   For us , karaoke is becoming a popular entert...   

                                                 response  emotion      source  
15438   I was in a hurry and they did look similar. Ho...  neutral  empathetic  
73548   Sometimes it's the extra step someone is willi...  neutral  empathetic  
100777   Yep . If you are a good singer , your audienc...  neutral       daily  


### Train–Test Split of Combined Dataset

This cell:
1. **Loads the combined preprocessed dataset** from Google Drive (`combined_dataset.csv`).
2. **Splits the data** into:
   - **Training set** → 80%
   - **Testing set** → 20%  
   using `train_test_split` from scikit-learn with a fixed `random_state=42` for reproducibility.
3. **Saves the resulting datasets** as separate CSV files in Google Drive for later use.
4. **Prints confirmation messages** showing saved file paths.

**Techniques used**:
- Controlled random splitting for consistent results across runs.
- Persistent storage of split datasets for efficient model training and evaluation.


In [None]:
# Import libraries for data handling and dataset splitting
import pandas as pd
from sklearn.model_selection import train_test_split

# Path to the combined dataset
dataset_path = "/content/drive/MyDrive/AIchatbotmodels/combined_dataset.csv"

# Load dataset into a pandas DataFrame
df = pd.read_csv(dataset_path)

# Split dataset into training (80%) and testing (20%) sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Define paths to save the training and testing datasets
train_path = "/content/drive/MyDrive/AIchatbotmodels/train_dataset.csv"
test_path = "/content/drive/MyDrive/AIchatbotmodels/test_dataset.csv"

# Save the training and testing datasets to CSV files
train_df.to_csv(train_path, index=False)
test_df.to_csv(test_path, index=False)

# Print confirmation of saved files
print(f"Train dataset saved to {train_path}")
print(f"Test dataset saved to {test_path}")

Train dataset saved to /content/drive/MyDrive/AIchatbotmodels/train_dataset.csv
Test dataset saved to /content/drive/MyDrive/AIchatbotmodels/test_dataset.csv


### Tokenizing, Splitting, and Saving the Dataset (Idempotent)

This script checks if tokenized train/test datasets already exist in Google Drive.  
If they do, it loads them directly to save time.  
Otherwise, it:

1. Loads the combined CSV dataset.
2. Converts it into a Hugging Face `Dataset`.
3. Loads a tokenizer (DialoGPT by default) and ensures it has a `pad_token`.
4. Tokenizes the dataset using the `input` and `response` columns.
5. Splits it into train/test sets with a fixed random seed.
6. Saves the tokenized datasets and tokenizer to Drive for future use.
7. Previews dataset stats and decodes one sample for sanity checking.


In [None]:
# Tokenize, split, and save dataset (designed to be idempotent)
import os
import pandas as pd
from datasets import Dataset, load_from_disk
from transformers import AutoTokenizer

# --------- CONFIG ---------
# Define paths and parameters for dataset, tokenizer, and model
dataset_path = "/content/drive/MyDrive/AIchatbotmodels/combined_dataset.csv"
train_path = "/content/drive/MyDrive/AIchatbotmodels/train_dataset_tokenized"
test_path  = "/content/drive/MyDrive/AIchatbotmodels/test_dataset_tokenized"
tokenizer_save_path = "/content/drive/MyDrive/AIchatbotmodels/tokenizer"
model_name = "microsoft/DialoGPT-medium"   # Model for tokenizer (can be changed)
max_length = 512  # Maximum token length for truncation/padding
test_size = 0.10  # Test set proportion
seed = 42  # Random seed for reproducibility
# --------------------------

# Check if tokenized datasets already exist to avoid reprocessing
print("Checking for existing tokenized datasets...")
if os.path.exists(train_path) and os.path.exists(test_path):
    print("✅ Found saved tokenized datasets. Loading them from Drive...")
    train_dataset = load_from_disk(train_path)
    test_dataset  = load_from_disk(test_path)
    print(f"Loaded: train={len(train_dataset)}  test={len(test_dataset)}")
else:
    print("No saved tokenized datasets found — creating them now (this will take some time)...")

    # 1) Load the combined dataset from CSV
    df = pd.read_csv(dataset_path)
    print(f"Loaded combined csv: {len(df)} rows. Columns: {list(df.columns)}")

    # 2) Convert pandas DataFrame to Hugging Face Dataset
    ds = Dataset.from_pandas(df)

    # 3) Set up tokenizer from pretrained model
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    # Ensure a pad token exists for DialoGPT
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        print("No pad_token found — set pad_token = eos_token.")

    # 4) Define tokenization function for input and response columns
    def tokenize_fn(examples):
        return tokenizer(
            examples["input"],
            text_pair=examples["response"],
            truncation=True,
            padding="max_length",
            max_length=max_length
        )

    # 5) Tokenize dataset in batches and remove original text columns
    print("Tokenizing dataset (batched) — this may take a while...")
    tokenized = ds.map(tokenize_fn, batched=True, remove_columns=ds.column_names)

    # 6) Split tokenized dataset into train and test sets
    print(f"Splitting tokenized dataset into train/test with test_size={test_size} (seed={seed})...")
    split = tokenized.train_test_split(test_size=test_size, seed=seed)
    train_dataset = split["train"]
    test_dataset  = split["test"]

    # 7) Save tokenized datasets and tokenizer to Google Drive
    print("Saving tokenized train/test datasets to Drive...")
    train_dataset.save_to_disk(train_path)
    test_dataset.save_to_disk(test_path)
    tokenizer.save_pretrained(tokenizer_save_path)
    print("Saved tokenized datasets and tokenizer.")

# Quick inspections of the datasets
print("\n=== Preview ===")
print(f"Train samples: {len(train_dataset)}")
print(f"Test  samples: {len(test_dataset)}")
print("Train dataset columns:", train_dataset.column_names)

# Verify tokenization by decoding one sample
sample = train_dataset[0]
if "input_ids" in sample:
    # Load tokenizer from saved path or model if not saved
    tok = AutoTokenizer.from_pretrained(tokenizer_save_path) if os.path.exists(tokenizer_save_path) else AutoTokenizer.from_pretrained(model_name)
    decoded = tok.decode(sample["input_ids"], skip_special_tokens=True)
    print("\nDecoded example (first train sample):")
    print(decoded[:1000])
else:
    print("Note: 'input_ids' not found in tokenized dataset — something went wrong.")

Checking for existing tokenized datasets...
No saved tokenized datasets found — creating them now (this will take some time)...
Loaded combined csv: 130367 rows. Columns: ['input', 'response', 'emotion', 'source']
No pad_token found — set pad_token = eos_token.
Tokenizing dataset (batched) — this may take a while...


Map:   0%|          | 0/130367 [00:00<?, ? examples/s]

Splitting tokenized dataset into train/test with test_size=0.1 (seed=42)...
Saving tokenized train/test datasets to Drive...


Saving the dataset (0/1 shards):   0%|          | 0/117330 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/13037 [00:00<?, ? examples/s]

Saved tokenized datasets and tokenizer.

=== Preview ===
Train samples: 117330
Test  samples: 13037
Train dataset columns: ['input_ids', 'attention_mask']

Decoded example (first train sample):
 We have quite a variety of shirts here.  What kind of material is this? 


### Load the Main DialoGPT Model onto GPU/CPU

This cell:
1. Detects available device (`cuda` if GPU, else `cpu`).
2. Loads the pre-trained DialoGPT model.
3. Moves the model to the selected device for faster inference.


In [None]:
# Import AutoModelForCausalLM for loading the pretrained model
from transformers import AutoModelForCausalLM

# Set device to CUDA if GPU is available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Load the DialoGPT-medium model for causal language modeling
main_model = AutoModelForCausalLM.from_pretrained(model_name)
main_model.to(device)
print("Model loaded and moved to device")


Using device: cuda


config.json:   0%|          | 0.00/642 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/863M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/863M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Model loaded and moved to device


### Fine-Tuning DialoGPT on Mental Health Dataset

This cell:
1. **Loads training and evaluation datasets** from disk.
2. **Defines training arguments** such as:
   - Output directory for model checkpoints.
   - Number of epochs, batch sizes, warmup steps.
   - Logging, saving, and evaluation intervals.
   - Restriction on saved checkpoint count to save space.
3. **Implements a custom data collator** to prepare inputs and labels for causal language modeling.
4. **Initializes the `Trainer`** with:
   - Model and tokenizer.
   - Training and evaluation datasets.
   - Training configuration and collator.
5. **Trains the model** and saves the final fine-tuned model and tokenizer to Google Drive.


In [None]:
# Import Trainer and TrainingArguments for model fine-tuning, and load_from_disk for datasets
from transformers import Trainer, TrainingArguments
from datasets import load_from_disk

# Load tokenized training and evaluation datasets from Google Drive
train_subset = load_from_disk("/content/drive/MyDrive/AIchatbotmodels/train_dataset")
eval_subset = load_from_disk("/content/drive/MyDrive/AIchatbotmodels/test_dataset")

# Define training arguments for fine-tuning the model
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/AIchatbotmodels/mental_health_model",
    overwrite_output_dir=True,  # Overwrite output directory if it exists
    num_train_epochs=3,  # Number of training epochs
    per_device_train_batch_size=4,  # Batch size for training
    per_device_eval_batch_size=4,  # Batch size for evaluation
    warmup_steps=500,  # Number of warmup steps for learning rate scheduler
    logging_steps=100,  # Log training metrics every 100 steps
    save_steps=1000,  # Save model checkpoint every 1000 steps
    eval_strategy="steps",  # Evaluate model during training at specified steps
    eval_steps=1000,  # Evaluate every 1000 steps
    save_total_limit=2,  # Keep only the last 2 checkpoints
    prediction_loss_only=True,  # Compute only the loss during evaluation
    report_to="none"  # Disable external logging (e.g., wandb)
)

# Define data collator to prepare inputs and labels for causal language modeling
def data_collator(features):
    input_ids = torch.stack([f["input_ids"] for f in features])
    attention_mask = torch.stack([f["attention_mask"] for f in features])
    labels = input_ids.clone()  # Labels are input_ids, shifted internally by the model

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }

# Initialize Trainer with model, arguments, datasets, collator, and tokenizer
trainer = Trainer(
    model=main_model,
    args=training_args,
    train_dataset=train_subset,
    eval_dataset=eval_subset,
    data_collator=data_collator,
    tokenizer=main_tokenizer,
)

# Start the fine-tuning process
print("Starting training... (this may take some time)")
trainer.train()

# Confirm training completion
print("Training completed!")

# Save the fine-tuned model and tokenizer to Google Drive
trainer.save_model("/content/drive/MyDrive/AIchatbotmodels/mental_health_model")
main_tokenizer.save_pretrained("/content/drive/MyDrive/AIchatbotmodels/mental_health_model")
print("Model and tokenizer saved!")


  trainer = Trainer(


Starting training... (this may take some time)


`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss,Validation Loss
1000,0.43,0.641655
2000,0.3817,0.617315
3000,0.3841,0.612183
4000,0.3664,0.606826
5000,0.371,0.603034
6000,0.3869,0.599515
7000,0.3735,0.594842
8000,0.3577,0.594487
9000,0.3681,0.592354
10000,0.3592,0.587957


Step,Training Loss,Validation Loss
1000,0.43,0.641655
2000,0.3817,0.617315
3000,0.3841,0.612183
4000,0.3664,0.606826
5000,0.371,0.603034
6000,0.3869,0.599515
7000,0.3735,0.594842
8000,0.3577,0.594487
9000,0.3681,0.592354
10000,0.3592,0.587957


### Fine-tuning DialoGPT-medium on Mental Health Dataset (with Resume from Checkpoint)

This script loads tokenized train and evaluation datasets from disk, sets up the `Trainer` API with a custom data collator for causal language modeling, and fine-tunes `microsoft/DialoGPT-medium`.  
If a previous training checkpoint exists, training resumes from it; otherwise, it starts from scratch.  
The final model and tokenizer are saved for future use.


In [None]:
# Import libraries for model training, dataset handling, and tokenization
import os
import torch
from datasets import load_from_disk
from transformers import Trainer, TrainingArguments, AutoModelForCausalLM, AutoTokenizer

# Define paths for datasets, tokenizer, model output, and checkpoint
train_path = "/content/drive/MyDrive/AIchatbotmodels/train_dataset_tokenized"
test_path  = "/content/drive/MyDrive/AIchatbotmodels/test_dataset_tokenized"
tokenizer_path = "/content/drive/MyDrive/AIchatbotmodels/tokenizer"
output_dir = "/content/drive/MyDrive/AIchatbotmodels/mental_health_model"
checkpoint_path = os.path.join(output_dir, "checkpoint-15000")
model_name = "microsoft/DialoGPT-medium"  # Consistent with tokenization step

# Load tokenized training and evaluation datasets
print("Loading tokenized datasets...")
train_dataset = load_from_disk(train_path)
eval_dataset = load_from_disk(test_path)
print(f"Train samples: {len(train_dataset)} | Eval samples: {len(eval_dataset)}")

# Load tokenizer from saved path or pretrained model
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path if os.path.exists(tokenizer_path) else model_name)
# Ensure pad token exists for DialoGPT
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the pretrained DialoGPT model and adjust token embeddings
model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tokenizer))

# Define training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=False,  # Preserve existing checkpoints
    num_train_epochs=3,  # Number of training epochs
    per_device_train_batch_size=4,  # Batch size for training
    per_device_eval_batch_size=4,  # Batch size for evaluation
    warmup_steps=500,  # Warmup steps for learning rate scheduler
    logging_steps=100,  # Log training metrics every 100 steps
    save_steps=1000,  # Save checkpoint every 1000 steps
    eval_strategy="steps",  # Evaluate during training at specified steps
    eval_steps=1000,  # Evaluate every 1000 steps
    save_total_limit=2,  # Keep only the last 2 checkpoints
    report_to="none"  # Disable external logging (e.g., wandb, tensorboard)
)

# Define data collator for causal language modeling
def data_collator(features):
    input_ids = torch.tensor([f["input_ids"] for f in features], dtype=torch.long)
    attention_mask = torch.tensor([f["attention_mask"] for f in features], dtype=torch.long)
    labels = input_ids.clone()  # Labels are same as input_ids for causal LM
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

# Initialize Trainer with model, arguments, datasets, collator, and tokenizer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

# Check for checkpoint and resume training if available, otherwise start fresh
if os.path.exists(checkpoint_path):
    print(f"Resuming training from checkpoint: {checkpoint_path}")
    trainer.train(resume_from_checkpoint=checkpoint_path)
else:
    print("Starting training from scratch...")
    trainer.train()

# Save the final fine-tuned model and tokenizer
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"✅ Model and tokenizer saved at: {output_dir}")

Loading tokenized datasets...
Train samples: 117330 | Eval samples: 13037


  trainer = Trainer(


Resuming training from checkpoint: /content/drive/MyDrive/AIchatbotmodels/mental_health_model/checkpoint-15000


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss,Validation Loss
16000,0.1179,0.11539
17000,0.1171,0.115117
18000,0.1206,0.115117


Step,Training Loss,Validation Loss
16000,0.1179,0.11539
17000,0.1171,0.115117
18000,0.1206,0.115117


In [2]:
# Install faiss-gpu-cu12 for GPU-accelerated similarity search
!pip install faiss-gpu-cu12

# Install gTTS for text-to-speech functionality
!pip install gtts

# Install mpg123 for audio playback support
!apt-get install -y mpg123

# Install SpeechRecognition and pydub for speech recognition and audio processing
!pip install SpeechRecognition pydub

Collecting faiss-gpu-cu12
  Downloading faiss_gpu_cu12-1.11.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting numpy<2 (from faiss-gpu-cu12)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.0/61.0 kB[0m [31m6.8 MB/s[0m eta [36m0:00:00[0m
Downloading faiss_gpu_cu12-1.11.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (48.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m48.0/48.0 MB[0m [31m14.7 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.3/18.3 MB[0m [31m68.6 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: numpy, faiss-gpu-cu12
  Attempting uninstall: numpy
    Found existing installation: numpy 2.0.2
    Uninstalling numpy-

### Setup Models, RAG, and Prompt Templates (Improved Balance & Variety)

This cell initializes and configures all the core components for our mental health chatbot:

1. **Device Setup**  
   Detects whether to run on GPU (`cuda`) or CPU, improving performance when available.

2. **Model Loading**
   - **Fine-tuned Chat Model** – Loads a previously fine-tuned `DialoGPT-medium` model for conversational responses.
   - **Emotion Detection Model** – Uses `j-hartmann/emotion-english-distilroberta-base` to classify emotions (e.g., sadness, joy, anger).
   - **Sentiment Analysis Model** – Uses `cardiffnlp/twitter-roberta-base-sentiment-latest` to detect overall sentiment (positive, neutral, negative).
   - **ABSA Model** – Loads an aspect-based sentiment analysis model (`yangheng/deberta-v3-base-absa-v1.1`) to detect opinions toward specific topics.
   - **Embedding Model** – Loads `all-MiniLM-L6-v2` from SentenceTransformers for semantic similarity and retrieval.

3. **Speech Output**  
   - Imports **gTTS** and `IPython.display` for converting chatbot replies into spoken audio (a `speak()` helper can be defined if needed).

4. **Unsafe Keyword Detection**  
   - Defines a set of crisis-related phrases (e.g., *"suicide"*, *"want to die"*) and encodes them for similarity checks to trigger appropriate safeguards.

5. **Knowledge Base for Retrieval-Augmented Generation (RAG)**  
   - Loads a knowledge base from an Excel file containing empathetic and supportive advice.
   - Encodes documents into normalized embeddings and stores them in a **FAISS index** for fast semantic search.
   - Provides `retrieve_docs()` to fetch top-k relevant knowledge snippets for a given query.

6. **Prompt Construction with Style Variety**  
   - Defines **response styles**: *empathetic*, *informative*, *reflective*, and *closing*.
   - Avoids repeating the same style consecutively and rotates empathy phrases to keep interactions natural.
   - Incorporates retrieved context, recent chat history, and detected emotion into the final prompt passed to the chat model.

This setup ensures the chatbot:
- Can interpret emotional tone
- Retrieves relevant and supportive context
- Avoids repetitive responses
- Maintains a mix of empathy, practical guidance, and engagement


In [4]:
# Import libraries for model loading, embeddings, search, and audio processing
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModelForCausalLM, pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pandas as pd
import random
from gtts import gTTS
import IPython.display as ipd

# Set device to CUDA if GPU is available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# --- Load Trained DialoGPT Model ---
chat_model_path = "/content/drive/MyDrive/AIchatbotmodels/mental_health_model/checkpoint-18000"
chat_tokenizer = AutoTokenizer.from_pretrained(chat_model_path)
chat_model = AutoModelForCausalLM.from_pretrained(chat_model_path).to(device)

# --- Load Emotion Detection Model ---
emotion_model_name = "j-hartmann/emotion-english-distilroberta-base"
emotion_tokenizer = AutoTokenizer.from_pretrained(emotion_model_name)
emotion_model = AutoModelForSequenceClassification.from_pretrained(emotion_model_name).to(device)
emotion_pipeline = pipeline(
    "text-classification",
    model=emotion_model,
    tokenizer=emotion_tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_all_scores=True
)

# --- Load Sentiment Model ---
sentiment_model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
sentiment_tokenizer = AutoTokenizer.from_pretrained(sentiment_model_name)
sentiment_model = AutoModelForSequenceClassification.from_pretrained(sentiment_model_name).to(device)
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model=sentiment_model,
    tokenizer=sentiment_tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

# --- Load ABSA Model ---
absa_model_name = "yangheng/deberta-v3-base-absa-v1.1"
absa_tokenizer = AutoTokenizer.from_pretrained(absa_model_name)
absa_model = AutoModelForSequenceClassification.from_pretrained(absa_model_name).to(device)
absa_pipeline = pipeline(
    "text-classification",
    model=absa_model,
    tokenizer=absa_tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

# --- Embedding Model ---
embedding_model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

# --- Unsafe Keyword List ---
unsafe_keywords = [
    "suicide", "kill myself", "self harm", "hurt myself",
    "end my life", "overdose", "cutting", "hang myself",
    "can't go on", "want to die", "give up on life",
    "life is pointless", "I see no future", "end it all"
]
# Encode unsafe keywords for similarity detection
unsafe_embeddings = embedding_model.encode(unsafe_keywords, convert_to_tensor=True)

# --- Expanded Knowledge Base RAG ---
# Load knowledge base from Excel file
rag_df = pd.read_excel("/content/drive/MyDrive/AIchatbotmodels/RAG_Knowledge_Base_WithID.xlsx")
documents = rag_df["Knowledge Entry"].tolist()

# Encode and normalize document embeddings for RAG
doc_embeddings = embedding_model.encode(
    documents,
    convert_to_numpy=True,
    normalize_embeddings=True
)

# Create FAISS index for efficient similarity search
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(doc_embeddings)

# Function to retrieve top-k relevant documents for a query
def retrieve_docs(query, top_k=3):
    q_emb = embedding_model.encode(
        [query],
        convert_to_numpy=True,
        normalize_embeddings=True
    )
    distances, indices = index.search(q_emb, top_k)
    return [documents[idx] for idx in indices[0]]

# --- Prompt Template with Response-Type & Empathy Phrase Control ---
previous_response_type = None
previous_prompt_text = None
last_empathy_phrase = None

# Define empathy phrases for different emotions and intensities
empathy_phrases = {
    "sadness": {
        "very_high": [
            "This must feel overwhelming to carry.",
            "I’m so sorry you’re facing such heavy emotions.",
            "This sounds incredibly painful for you.",
            "My heart aches for what you’re going through."
        ],
        "high": [
            "I can see how difficult this must be for you.",
            "That sounds really tough to handle.",
            "It’s understandable to feel so down about this.",
            "This must be weighing heavily on you."
        ],
        "low": [
            "I hear you’re feeling a bit low.",
            "That sounds like it’s been tough.",
            "It’s okay to feel down right now.",
            "That must be a bit challenging."
        ]
    },
    "anger": {
        "very_high": [
            "This must be absolutely infuriating for you.",
            "I can feel how intense this frustration is.",
            "That sounds incredibly aggravating to deal with.",
            "Your anger is so understandable right now."
        ],
        "high": [
            "I get how frustrating this must be.",
            "That sounds really irritating.",
            "It’s understandable to feel upset about this.",
            "This must be tough to deal with."
        ],
        "low": [
            "I hear you’re a bit annoyed.",
            "That sounds a little frustrating.",
            "It’s okay to feel irritated.",
            "That must be mildly upsetting."
        ]
    },
    "fear": {
        "very_high": [
            "This must feel incredibly scary for you.",
            "I can see how overwhelming this worry is.",
            "That sounds deeply unsettling.",
            "Your fear is completely understandable."
        ],
        "high": [
            "That must be really worrying for you.",
            "I can see why this feels scary.",
            "It’s understandable to feel anxious.",
            "This must be tough to face."
        ],
        "low": [
            "That sounds a bit unsettling.",
            "I hear you’re feeling a little anxious.",
            "It’s okay to feel a bit worried.",
            "That must be slightly nerve-wracking."
        ]
    },
    "joy": {
        "very_high": [
            "That’s absolutely fantastic to hear!",
            "I’m so thrilled for you!",
            "What an incredible moment!",
            "This must feel amazing!"
        ],
        "high": [
            "That’s wonderful to hear!",
            "I’m really happy for you!",
            "What a great moment!",
            "This must feel so good!"
        ],
        "low": [
            "That’s nice to hear!",
            "I’m glad you’re feeling good.",
            "Sounds like a happy moment.",
            "That’s great to know!"
        ]
    },
    "love": {
        "very_high": [
            "That’s so beautiful and heartwarming!",
            "Your love is truly special!",
            "What an incredible feeling to share!",
            "This must feel so warm and wonderful!"
        ],
        "high": [
            "That’s really sweet to hear!",
            "I’m happy for your connection!",
            "What a lovely feeling!",
            "This must feel so special!"
        ],
        "low": [
            "That’s sweet to hear.",
            "I’m glad you’re feeling this warmth.",
            "Sounds like a nice moment.",
            "That’s lovely to know."
        ]
    },
    "surprise": {
        "very_high": [
            "Wow, that must be a huge shock!",
            "I can’t believe how surprising this is!",
            "That sounds like an incredible twist!",
            "What a wild moment for you!"
        ],
        "high": [
            "That’s really surprising!",
            "I can see why that caught you off guard!",
            "What an unexpected turn!",
            "That must be quite a shock!"
        ],
        "low": [
            "That’s a bit surprising!",
            "Sounds like that was unexpected.",
            "I hear it caught you off guard.",
            "That must be a little startling."
        ]
    },
    "neutral": {
        "very_high": ["I hear you."],
        "high": ["I understand."],
        "low": ["Got it."]
    }
}

# Function to select a unique empathy phrase based on emotion and intensity
def get_unique_empathy_phrase(emotion, intensity="low"):
    if emotion not in empathy_phrases:
        emotion = "neutral"
    phrases = empathy_phrases[emotion][intensity]
    available_phrases = [p for p in phrases if p != last_empathy_phrase]
    if not available_phrases:
        available_phrases = phrases
    phrase = random.choice(available_phrases)
    global last_empathy_phrase
    last_empathy_phrase = phrase
    return phrase

# Function to build a prompt for the chatbot
def build_prompt(user_query, retrieved_docs, emotion=None, emotion_score=0.0, prev_user_messages=None, aspects=None, intent=None):
    """
    Build a concise prompt focusing on history, input, and primary aspect.
    - Uses short style/empathy instructions.
    - Adapts to emotion intensity and intent (e.g., closing).
    """
    global previous_response_type, previous_prompt_text

    # Define possible response styles
    response_styles = ["empathetic", "informative", "reflective", "closing"]
    if intent == "closing":
        chosen_style = "closing"
    else:
        possible_styles = [s for s in response_styles if s != previous_response_type and s != "closing"]
        chosen_style = random.choice(possible_styles)
    previous_response_type = chosen_style

    # Build prompt components
    history_line = (
        f"History:\n" + "\n".join([f"User: {msg}" for msg in prev_user_messages[-3:]]) + "\n"
        if prev_user_messages else "No history.\n"
    )
    primary_aspect = aspects[0]['aspect'] if aspects else "situation"
    aspect_line = f"Primary aspect: {primary_aspect} ({aspects[0]['sentiment'] if aspects else 'neutral'})."
    emotion_line = f"Emotion: {emotion} (intensity: {emotion_score:.2f})."

    # Determine empathy intensity
    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"

    # Style-specific instructions
    if chosen_style == "empathetic":
        empathy_phrase = get_unique_empathy_phrase(emotion, intensity)
        style_instruction = f"Start with '{empathy_phrase}' Address the primary aspect in 1-2 sentences using history."
    elif chosen_style == "informative":
        style_instruction = f"Give a 1-2 sentence tip about the primary aspect using history."
    elif chosen_style == "closing":
        style_instruction = f"Acknowledge the user's intent to close or accept advice in 1 sentence."
    else:  # reflective
        style_instruction = f"Ask a 1-sentence question or summarize the primary aspect using history."

    prompt = (
        f"{history_line}{emotion_line}\n{aspect_line}\nInput: {user_query}\n"
        f"Instructions: {style_instruction}\nAssistant:"
    )

    # Avoid duplicate prompts
    if prompt == previous_prompt_text:
        prompt += " Provide a unique response."
    previous_prompt_text = prompt

    return prompt

tokenizer_config.json:   0%|          | 0.00/294 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/329M [00:00<?, ?B/s]

Device set to use cuda:0


config.json:   0%|          | 0.00/929 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/329M [00:00<?, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/501M [00:00<?, ?B/s]

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


tokenizer_config.json:   0%|          | 0.00/372 [00:00<?, ?B/s]

spm.model:   0%|          | 0.00/2.46M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/501M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/18.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/156 [00:00<?, ?B/s]



config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/738M [00:00<?, ?B/s]

Device set to use cuda:0


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

### Response Generation & Safety Filtering (ABSA-aware + gTTS TTS)

This cell defines the chatbot’s **core reply engine**, combining language generation, emotion detection, safety checks, retrieval, and natural-sounding delivery.

**Key features:**
1. **Emotion & Sentiment Detection** – Pipelines classify the user’s emotional tone and overall sentiment, then reconcile them for consistency.  
2. **Aspect-Based Sentiment Analysis (ABSA)** – Detects fine-grained opinions toward specific topics.  
3. **Safety Filtering** – Uses semantic similarity to detect unsafe or crisis-related input and trigger supportive fallback responses.  
4. **Empathy & Variety** –  
   - Inserts empathetic phrases aligned with detected emotion.  
   - Prevents reuse of identical empathy/advice lines.  
   - Alternates between offering advice and asking reflective questions.  
5. **Prompt Construction & RAG** – Retrieves top-k relevant knowledge snippets and builds a context-rich prompt for the model.  
6. **Text Generation** – Generates replies with the main chat model, maintaining short-term conversation history.  
7. **Duplicate Response Avoidance** – Detects and rephrases overly similar outputs to keep interactions fresh.  
8. **Text-to-Speech Output** – Converts the final reply to audio with **gTTS** and plays it inline in Colab.  

This setup produces responses that are **empathetic, varied, context-aware, and optionally spoken aloud**.


In [9]:
# Import libraries for string manipulation, similarity checking, and embeddings
import random
import re
from difflib import SequenceMatcher
import torch
from sentence_transformers import util

# Global variables for tracking conversation state
_last_empathy_line = None
_last_quote_turn = -3
_turn_counter = 0
_last_advice_line = None
_last_question_line = None
_last_response_category = None
_last_category_turn = -5
_last_full_response = None
_last_structure = None
_last_hear_you_turn = -5

# --- Helper Detectors ---
# Detect emotion in user input, considering context for longer inputs
def detect_emotion(text, context=None, min_confidence=0.4):
    # Use only current input for short texts to avoid history bias
    input_text = text if len(text.split()) < 10 else f"{context} {text}" if context else text
    try:
        results = emotion_pipeline(input_text)
        print(f"DEBUG: Emotion pipeline output: {results}")
        if isinstance(results, list) and results and isinstance(results[0], list):
            top_emotion = max(results[0], key=lambda x: x['score'])
            if top_emotion['score'] < min_confidence:
                return "neutral", round(top_emotion['score'], 2)
            return top_emotion['label'], round(top_emotion['score'], 2)
        else:
            print("⚠️ Unexpected emotion pipeline output format")
            return "neutral", 0.0
    except Exception as e:
        print(f"⚠️ Emotion detection error: {e}")
        return "neutral", 0.0

# Detect sentiment in user input
def detect_sentiment(text):
    try:
        result = sentiment_pipeline(text)[0]
        print(f"DEBUG: Sentiment pipeline output: {result}")
        return result['label'].lower(), round(result['score'], 2)
    except Exception as e:
        print(f"⚠️ Sentiment detection error: {e}")
        return "neutral", 0.0

# Detect if user intends to close the conversation
def detect_intent(text):
    """Detect if the user intends to close or acknowledge advice."""
    closure_cues = r"\b(thanks|thank you|sure|okay|ok|will do|got it|appreciate|alright|gotcha|great)\b"
    if re.search(closure_cues, text.lower()):
        return "closing"
    return None

# Perform aspect-based sentiment analysis (ABSA)
def detect_absa(text, context=None):
    try:
        def normalize_aspect(aspect):
            synonym_map = {
                "test": "exam",
                "marks": "exam",
                "grades": "exam",
                "score": "exam",
                "studies": "study",
                "sickness": "sick",
                "illness": "sick",
                "disease": "sick"
            }
            return synonym_map.get(aspect.lower(), aspect.lower())

        def extract_aspects(text):
            words = text.lower().split()
            potential_aspects = []
            aspect_keywords = [
                'girlfriend', 'boyfriend', 'partner', 'husband', 'wife',
                'relationship', 'marriage', 'breakup', 'divorce',
                'family', 'mother', 'father', 'parent', 'sibling', 'brother', 'sister',
                'friend', 'friendship',
                'job', 'career', 'work', 'boss', 'manager', 'colleague',
                'study', 'school', 'college', 'university', 'exam', 'test', 'marks', 'grades',
                'depression', 'depressed', 'anxiety', 'stressed', 'stress',
                'fear', 'worry', 'lonely', 'loneliness', 'sad', 'sadness',
                'angry', 'anger', 'frustrated', 'confused', 'hopeless',
                'health', 'illness', 'sick', 'tired', 'fatigue',
                'loss', 'grief', 'death', 'trauma', 'change', 'moving', 'new place'
            ]
            for word in words:
                normalized = normalize_aspect(word)
                if normalized in aspect_keywords:
                    potential_aspects.append(normalized)
            text_lower = text.lower()
            if 'break up' in text_lower:
                potential_aspects.append('breakup')
            if 'lost job' in text_lower:
                potential_aspects.append('job loss')
            if 'best friend' in text_lower:
                potential_aspects.append('friend')
            if 'family issue' in text_lower:
                potential_aspects.append('family')
            if 'love life' in text_lower:
                potential_aspects.append('relationship')
            if 'work stress' in text_lower:
                potential_aspects.append('stress')
            if 'career change' in text_lower:
                potential_aspects.append('career')
            if 'mental health' in text_lower:
                potential_aspects.append('mental health')
            if not potential_aspects:
                potential_aspects = ['situation']
            prioritized = [a for a in potential_aspects if a not in ['sad', 'sadness', 'anger', 'fear', 'joy', 'love', 'surprise']]
            return list(set(prioritized if prioritized else potential_aspects[:1]))

        potential_aspects = extract_aspects(text)
        results = []
        for aspect in potential_aspects:
            absa_input = f"[CLS] {aspect} [SEP] {text} [SEP]"  # Use only current text
            pipeline_output = absa_pipeline(absa_input)
            print(f"DEBUG: ABSA input for '{aspect}': {absa_input}")
            print(f"DEBUG: ABSA pipeline output: {pipeline_output}")
            if isinstance(pipeline_output, list) and pipeline_output and isinstance(pipeline_output[0], dict):
                result = pipeline_output[0]
                sentiment = result.get('label', 'neutral').lower()
                confidence = round(result.get('score', 0.0), 2)
                if confidence >= 0.5:  # Filter low-confidence aspects
                    results.append({
                        'aspect': aspect,
                        'sentiment': sentiment,
                        'confidence': confidence
                    })
            else:
                results.append({
                    'aspect': aspect,
                    'sentiment': 'neutral',
                    'confidence': 0.0
                })
        # Sort by confidence, preferring specific aspects and negative sentiment for sadness
        results = sorted(
            results,
            key=lambda x: (x['confidence'], x['aspect'] != 'situation', x['sentiment'] == 'negative'),
            reverse=True
        )
        print(f"DEBUG: ABSA final results: {results}")
        return results if results else [{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.0}]
    except Exception as e:
        print(f"⚠️ ABSA error: {e}")
        return [{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.0}]

# Reconcile emotion and sentiment for consistency
def reconcile_emotion_sentiment(emotion, sentiment, text):
    # Prioritize intent for short texts or specific cues
    if len(text.split()) < 10:
        if re.search(r"\b(thanks|thank you|sure|okay|ok|will do|got it|appreciate|alright|gotcha|great)\b", text.lower()):
            return "joy" if sentiment == "positive" else "neutral"
        if re.search(r"\b(what should I do|how can I|how do I|any tips|suggest|recommend)\b", text.lower()):
            return "neutral"
    if re.search(r"\b(miss|long for|yearn|depressed|depression)\b", text.lower()):
        return "sadness"
    if sentiment == "negative" and emotion in ["joy", "love", "surprise"]:
        return "sadness"
    if sentiment == "positive" and emotion in ["sadness", "fear", "anger"]:
        return "joy"
    return emotion

# Check if input contains unsafe content
def is_safe(text, threshold=0.65):
    try:
        text_emb = embedding_model.encode(text, convert_to_tensor=True)
        cos_sim = util.pytorch_cos_sim(text_emb, unsafe_embeddings)
        return torch.max(cos_sim).item() < threshold
    except Exception:
        return True

# Paraphrase a snippet of text for inclusion in responses
def paraphrase_snippet(text, max_len=10):
    words = text.split()
    snippet = " ".join(words[:max_len])
    if len(words) > max_len:
        snippet += "..."
    snippet = snippet.replace("I ", "you ").replace("my ", "your ")
    return snippet

# Detect if user input is requesting advice
def is_advice_question(text):
    """Detect if the user input is requesting advice."""
    advice_cues = r"\b(what should I do|how can I|how do I|any tips|suggest|recommend|advice|help me)\b"
    return bool(re.search(advice_cues, text.lower()))

# Post-process response to match desired style
def post_process_style(response, response_style, primary_aspect, emotion, emotion_score, prev_user_messages=None, intent=None):
    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"
    context = prev_user_messages[-1] if prev_user_messages else ""

    if intent == "closing":
        closing_templates = {
            "exam": f"Glad I could help with your {primary_aspect} concerns!",
            "study": f"Great to support your {primary_aspect} efforts!",
            "breakup": f"Glad I could support you through your {primary_aspect} concerns!",
            "default": "Glad I could help you out!"
        }
        return closing_templates.get(primary_aspect, closing_templates["default"])

    if response_style == "empathetic":
        if not any(response.startswith(p) for p in empathy_phrases.get(emotion, {}).get(intensity, [])):
            empathy_phrase = get_unique_empathy_phrase(emotion, intensity)
            if primary_aspect in ["exam", "marks", "test", "grades"]:
                return f"{empathy_phrase} It sounds tough to face your {primary_aspect} challenges."
            elif primary_aspect == "breakup":
                return f"{empathy_phrase} It’s really tough dealing with this {primary_aspect}."
            return f"{empathy_phrase} It’s hard dealing with this {primary_aspect}."
    elif response_style == "informative":
        if not re.search(r"(try|consider|might help)", response.lower()):
            advice_templates = {
                "exam": "Try a study schedule to boost your exam confidence.",
                "marks": "Review key concepts to improve your grades.",
                "test": "Focus on one topic at a time for your test.",
                "study": "Break study sessions into small, manageable chunks.",
                "breakup": "Talking to a trusted friend might help you process your feelings.",
                "default": "A short break might lift your mood."
            }
            return advice_templates.get(primary_aspect, advice_templates["default"])
    elif response_style == "reflective":
        if not (re.search(r"\?$", response) or re.search(r"(feeling|seems|sounds)", response.lower())):
            question_templates = {
                "exam": f"How are you feeling about your {primary_aspect} now?",
                "marks": f"What’s been toughest about your {primary_aspect}?",
                "test": f"How’s your confidence after this {primary_aspect}?",
                "study": f"What’s challenging about your {primary_aspect}?",
                "breakup": f"How are you coping with your {primary_aspect} right now?",
                "default": f"How are you managing with this {primary_aspect}?"
            }
            return question_templates.get(primary_aspect, question_templates["default"])
    return response

# Add empathy to responses based on emotion and style
def add_empathy_filter(response, user_emotion, sentiment_score, emotion_score, user_input=None, response_style="empathetic", intent=None):
    global _last_empathy_line, _last_quote_turn, _turn_counter, _last_hear_you_turn
    if response_style != "empathetic" or not user_emotion or intent == "closing":
        return response
    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"
    empathy_lines = empathy_phrases.get(user_emotion.lower(), {"very_high": ["I understand."], "high": ["I understand."], "low": ["I understand."]})[intensity]
    available_lines = [e for e in empathy_lines if e != _last_empathy_line] or empathy_lines
    empathy_line = random.choice(available_lines)
    _last_empathy_line = empathy_line
    if user_input and random.random() < 0.2 and (_turn_counter - _last_quote_turn >= 3):
        empathy_line += f" You mentioned: \"{paraphrase_snippet(user_input)}\"."
        _last_quote_turn = _turn_counter
    return f"{empathy_line} {response}".strip()

# Enrich responses with advice or questions based on context
def enrich_with_advice_and_questions(bot_reply, user_emotion, sentiment_label, advice_priority=False, user_input=None, response_style="empathetic", aspects=None, intent=None):
    global _last_advice_line, _last_question_line, _last_response_category, _turn_counter, _last_category_turn, _last_structure
    if intent == "closing":
        return bot_reply
    advice_templates = {
        "sadness": {
            "exam": ["Try a study schedule to boost your exam confidence.", "A short break might help you reset."],
            "marks": ["Review key concepts to improve your grades.", "Try a study schedule for better results."],
            "test": ["Focus on one topic at a time for your test.", "A tutor might help with tough concepts."],
            "study": ["Break study sessions into small, manageable chunks.", "Set small study goals to stay motivated."],
            "breakup": ["Talking to a trusted friend might help you process your feelings.", "Journaling your feelings can offer relief."],
            "default": ["A favorite activity might lift your mood.", "Journaling can help process your feelings."]
        },
        "fear": {
            "exam": ["Practice deep breathing to reduce exam anxiety.", "Break study sessions into smaller chunks."],
            "study": ["Try short study sessions to ease anxiety.", "Use flashcards to build confidence."],
            "default": ["Try the 5-4-3-2-1 grounding technique.", "Writing down worries can help."]
        },
        "anger": {
            "work": ["Take a short break to release work tension.", "Write a letter you don’t send to vent."],
            "default": ["A short walk can release tension.", "Try a creative activity to channel energy."]
        }
    }
    question_templates = {
        "sadness": {
            "exam": ["How are you feeling about your exam now?", "What study strategies have you tried?"],
            "marks": ["What’s been hardest about your grades?", "Have you found any study methods that work?"],
            "test": ["How’s your confidence after this test?", "What’s been toughest about preparing?"],
            "study": ["What’s challenging about your study routine?", "What study methods have you tried?"],
            "breakup": ["How are you coping with your breakup right now?", "What’s been helping you through this time?"],
            "default": ["What’s been challenging lately?", "What helps you feel better?"]
        },
        "fear": {
            "exam": ["What’s worrying you most about your exams?", "Have you tried any study strategies?"],
            "study": ["What’s making studying feel overwhelming?", "What helps you stay calm while studying?"],
            "default": ["What’s the main source of your worry?", "What might help you feel calmer?"]
        },
        "anger": {
            "work": ["What’s frustrating you most at work?", "What helps you unwind after a tough day?"],
            "default": ["What’s the main source of your frustration?", "What helps you calm down?"]
        },
        "default": {
            "default": ["What’s been challenging lately?", "What helps you cope?"]
        }
    }
    if response_style == "empathetic":
        return bot_reply
    primary_aspect = aspects[0]['aspect'] if aspects else "default"
    category = "advice" if response_style == "informative" or advice_priority else "question"
    if category == "advice":
        aspect_templates = advice_templates.get(user_emotion, advice_templates["sadness"]).get(primary_aspect, advice_templates["sadness"]["default"])
        possible_advice = [a for a in aspect_templates if a != _last_advice_line] or aspect_templates
        advice_line = random.choice(possible_advice)
        bot_reply = advice_line
        _last_advice_line = advice_line
    elif category == "question":
        aspect_templates = question_templates.get(user_emotion, question_templates["default"]).get(primary_aspect, question_templates["default"]["default"])
        possible_questions = [q for q in aspect_templates if q != _last_question_line] or aspect_templates
        question_line = random.choice(possible_questions)
        bot_reply = question_line
        _last_question_line = question_line
    _last_structure = f"{response_style}+{_last_category_turn}"
    _last_response_category = category
    _last_category_turn = _turn_counter
    return bot_reply

# Main function to generate chatbot responses
def generate_response(user_input, chat_history_ids=None, prev_user_messages=None, max_history_tokens=512):
    global _turn_counter, _last_full_response
    _turn_counter += 1
    context = "\n".join(prev_user_messages[-3:]) if prev_user_messages else ""
    emotion_label, emotion_score = detect_emotion(user_input, context=context)
    sentiment_label, sentiment_score = detect_sentiment(user_input)
    intent = detect_intent(user_input)
    absa_results = detect_absa(user_input)  # Use only current input
    emotion_label = reconcile_emotion_sentiment(emotion_label, sentiment_label, user_input)
    if not is_safe(user_input) or (prev_user_messages and any(not is_safe(msg) for msg in prev_user_messages)):
        safety_msg = (
            "I’m concerned about your safety. Please consider talking to a mental health professional "
            "or contacting a helpline for immediate support."
        )
        return safety_msg, chat_history_ids, emotion_label, emotion_score, sentiment_label, sentiment_score, absa_results
    try:
        retrieved_docs = retrieve_docs(user_input)
    except Exception:
        retrieved_docs = []
    final_prompt = build_prompt(
        user_query=user_input,
        retrieved_docs=retrieved_docs,
        emotion=emotion_label,
        emotion_score=emotion_score,
        prev_user_messages=prev_user_messages,
        aspects=absa_results,
        intent=intent
    )
    input_encodings = chat_tokenizer(final_prompt + chat_tokenizer.eos_token, return_tensors="pt", truncation=True, padding=True).to(device)
    bot_input_ids = input_encodings["input_ids"]
    attention_mask = input_encodings["attention_mask"]
    if chat_history_ids is not None:
        bot_input_ids = torch.cat([chat_history_ids, bot_input_ids], dim=-1)
        attention_mask = torch.cat([chat_history_ids.new_ones(chat_history_ids.shape), attention_mask], dim=-1)
        if bot_input_ids.shape[-1] > max_history_tokens:
            bot_input_ids = bot_input_ids[:, -max_history_tokens:]
            attention_mask = attention_mask[:, -max_history_tokens:]
    chat_history_ids = chat_model.generate(
        bot_input_ids,
        attention_mask=attention_mask,
        max_new_tokens=100,  # Increased for more detailed responses
        pad_token_id=chat_tokenizer.pad_token_id,
        do_sample=True,
        top_p=0.9,
        top_k=50,
        temperature=0.85,
        no_repeat_ngram_size=3
    )
    bot_reply = chat_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True).strip()
    # Strict fallback for truly unusable outputs
    primary_aspect = absa_results[0]['aspect'] if absa_results else "situation"
    if len(bot_reply.split()) < 2 or (len(bot_reply.split()) < 4 and not re.search(r"[.!?]$", bot_reply)):
        intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"
        context_snippet = paraphrase_snippet(prev_user_messages[-1] if prev_user_messages else user_input)
        if intent == "closing":
            closing_templates = {
                "exam": f"Glad I could help with your {primary_aspect} concerns!",
                "study": f"Great to support your {primary_aspect} efforts!",
                "breakup": f"Glad I could support you through your {primary_aspect} concerns!",
                "default": "Glad I could help you out!"
            }
            bot_reply = closing_templates.get(primary_aspect, closing_templates["default"])
        elif previous_response_type == "empathetic":
            empathy_phrase = get_unique_empathy_phrase(emotion_label, intensity)
            if primary_aspect in ["exam", "marks", "test", "grades"]:
                bot_reply = f"{empathy_phrase} Your {primary_aspect} challenges sound tough."
            elif primary_aspect == "breakup":
                bot_reply = f"{empathy_phrase} It’s really tough dealing with this {primary_aspect}."
            else:
                bot_reply = f"{empathy_phrase} It’s hard dealing with {primary_aspect}."
        elif previous_response_type == "informative":
            advice_templates = {
                "exam": "Try a study schedule to boost exam confidence.",
                "marks": "Review key concepts to improve your grades.",
                "test": "Focus on one topic at a time for your test.",
                "study": "Break study sessions into small, manageable chunks.",
                "breakup": "Talking to a trusted friend might help you process your feelings.",
                "default": f"A short break might lift your mood."
            }
            bot_reply = advice_templates.get(primary_aspect, advice_templates["default"])
        else:  # reflective
            question_templates = {
                "exam": f"How are you feeling about your {primary_aspect} now?",
                "marks": f"What’s been toughest about your {primary_aspect}?",
                "test": f"How’s your confidence after this {primary_aspect}?",
                "study": f"What’s challenging about your {primary_aspect}?",
                "breakup": f"How are you coping with your {primary_aspect} right now?",
                "default": f"How are you managing with {context_snippet}?"
            }
            bot_reply = question_templates.get(primary_aspect, question_templates["default"])
    # Apply empathy and style post-processing
    bot_reply = add_empathy_filter(
        bot_reply, emotion_label, sentiment_score, emotion_score, user_input=user_input, response_style=previous_response_type, intent=intent
    )
    bot_reply = enrich_with_advice_and_questions(
        bot_reply, emotion_label, sentiment_label, advice_priority=is_advice_question(user_input),
        user_input=user_input, response_style=previous_response_type, aspects=absa_results, intent=intent
    )
    bot_reply = post_process_style(
        bot_reply, previous_response_type, primary_aspect, emotion_label, emotion_score, prev_user_messages, intent
    )
    if _last_full_response and SequenceMatcher(None, bot_reply, _last_full_response).ratio() > 0.85:
        bot_reply = f"Here’s another thought: {bot_reply}"
    _last_full_response = bot_reply
    return bot_reply, chat_history_ids, emotion_label, emotion_score, sentiment_label, sentiment_score, absa_results


### Evaluation Framework: Core NLP Metrics & System Monitoring

This cell defines a **comprehensive evaluation system** for assessing the chatbot’s linguistic quality, emotional awareness, and performance.  

**Key components:**

1. **Modern NLP Metrics**  
   - **Perplexity** – Evaluates fluency of generated responses using the fine-tuned model.  
   - **Semantic Diversity** – Uses embeddings to measure variety across responses.  
   - **Response Appropriateness** – Measures how well replies align with user inputs.  
   - **Conversational Depth** – Detects reasoning, nuance, and reflective complexity.  
   - **Emotional Intelligence** – Scores emotional awareness and empathy in responses.  

2. **Performance Metrics**  
   - **Response Time** – Tracks average latency per reply.  
   - **Response Length** – Monitors verbosity of generated answers.  
   - **Memory Usage** – Reports system memory consumption during evaluation.  

3. **Evaluator Class (`ImprovedMentalHealthEvaluator`)**  
   - Stores and manages conversation data.  
   - Provides `add_conversation()` to log user/bot turns.  
   - Runs `evaluate_comprehensive()` to calculate all metrics.  
   - Outputs results with interpretation guidelines via `display_results()`.  

**Interpretation Guidelines:**  
- **Perplexity**: <50 excellent, <100 good, <200 fair  
- **Semantic Diversity**: ≥0.30 indicates low repetition  
- **Appropriateness**: ≥0.70 suggests good contextual fit  
- **Conversational Depth**: ≥0.50 good, ≥0.70 excellent  
- **Emotional Intelligence**: ≥0.30 good, ≥0.50 excellent  

This framework ensures the chatbot is evaluated not only for **linguistic quality** but also for **empathy, depth, and resource efficiency**.  


In [7]:
# Import libraries for model evaluation, embeddings, and system monitoring
import torch
import numpy as np
import time
import psutil
import re
from transformers import AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer, util
import warnings
warnings.filterwarnings('ignore')

# === MODERN NLP EVALUATION METRICS ===

# Calculate perplexity for generated responses
def calculate_perplexity(responses, model_path="/content/drive/MyDrive/AIchatbotmodels/mental_health_model/checkpoint-18000"):
    """Calculate perplexity for generated responses using a pretrained language model"""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)
        model.eval()

        if not responses:
            return 0.0

        perplexities = []
        for response in responses:
            if not response.strip():
                continue
            inputs = tokenizer(response, return_tensors="pt", truncation=True, max_length=512)
            input_ids = inputs["input_ids"]

            with torch.no_grad():
                outputs = model(input_ids, labels=input_ids)
                loss = outputs.loss

            perplexity = torch.exp(loss).item()
            perplexities.append(perplexity)

        if not perplexities:
            return 0.0

        return np.mean(perplexities)
    except Exception as e:
        print(f"⚠️ Perplexity calculation failed: {e}")
        return 0.0

# Calculate semantic diversity using sentence embeddings
def calculate_semantic_similarity(generated_responses, model):
    """Calculate semantic diversity using sentence embeddings"""
    if not generated_responses or len(generated_responses) < 2:
        return 0.0

    try:
        embeddings = model.encode(generated_responses)
        similarities = []

        for i in range(len(embeddings)):
            for j in range(i+1, len(embeddings)):
                sim = util.pytorch_cos_sim(embeddings[i], embeddings[j]).item()
                similarities.append(sim)

        avg_similarity = np.mean(similarities)
        diversity_score = 1 - avg_similarity
        return max(0, diversity_score)

    except Exception as e:
        print(f"⚠️ Semantic similarity calculation failed: {e}")
        return 0.0

# Measure appropriateness of responses to user inputs
def calculate_response_appropriateness(user_inputs, bot_responses, model):
    """Measure how appropriate responses are to user inputs"""
    if not user_inputs or not bot_responses or len(user_inputs) != len(bot_responses):
        return 0.0

    try:
        appropriateness_scores = []

        for user_input, bot_response in zip(user_inputs, bot_responses):
            user_emb = model.encode([user_input])
            bot_emb = model.encode([bot_response])

            similarity = util.pytorch_cos_sim(user_emb, bot_emb).item()
            appropriateness_scores.append(similarity)

        return np.mean(appropriateness_scores)

    except Exception as e:
        print(f"⚠️ Response appropriateness calculation failed: {e}")
        return 0.0

# Evaluate conversational depth and complexity
def calculate_conversational_depth(responses):
    """Evaluate conversational depth and complexity"""
    if not responses:
        return 0.0

    depth_indicators = [
        r'\b(because|since|therefore|however|although|while)\b',
        r'\b(feel|think|believe|consider|experience)\b',
        r'\b(might|could|perhaps|possibly|sometimes)\b',
        r'\b(relationship|pattern|connection|underlying)\b'
    ]

    scores = []
    for response in responses:
        response_lower = response.lower()
        depth_score = sum(1 for pattern in depth_indicators
                         if re.search(pattern, response_lower))

        word_count = len(response.split())
        normalized_score = min(1.0, depth_score / max(1, word_count / 15))
        scores.append(normalized_score)

    return np.mean(scores)

# Evaluate emotional intelligence in responses
def calculate_emotional_intelligence(responses):
    """Evaluate emotional intelligence in responses"""
    if not responses:
        return 0.0

    ei_indicators = [
        r'\b(feel|feeling|emotion|emotional|mood)\b',
        r'\b(angry|sad|happy|anxious|frustrated|excited|worried)\b',
        r'\b(makes you|causes you|leads you|helps you)\b.*\bfeel\b',
        r'\b(cope|manage|handle|deal with)\b.*\b(feeling|emotion)\b'
    ]

    scores = []
    for response in responses:
        response_lower = response.lower()
        ei_score = sum(1 for pattern in ei_indicators
                      if re.search(pattern, response_lower))

        word_count = len(response.split())
        normalized_score = min(1.0, ei_score / max(1, word_count / 12))
        scores.append(normalized_score)

    return np.mean(scores)

# === COMPREHENSIVE EVALUATION SYSTEM ===

# Class to manage and evaluate chatbot conversations
class ImprovedMentalHealthEvaluator:
    def __init__(self):
        try:
            self.semantic_model = SentenceTransformer("all-MiniLM-L6-v2")
            print("✅ Semantic similarity model loaded")
        except Exception as e:
            print(f"⚠️ Could not load semantic model: {e}")
            self.semantic_model = None

        self.reset_metrics()

    # Reset stored conversation data
    def reset_metrics(self):
        self.user_inputs = []
        self.generated_responses = []
        self.reference_responses = []
        self.response_times = []

    # Add a conversation turn for evaluation
    def add_conversation(self, user_input, bot_response, reference_response=None,
                        response_time=0.0):
        """Add a conversation turn for evaluation"""
        self.user_inputs.append(user_input)
        self.generated_responses.append(bot_response)
        self.reference_responses.append(reference_response or "")
        self.response_times.append(response_time)

    # Run comprehensive evaluation
    def evaluate_comprehensive(self):
        """Run comprehensive evaluation with core and performance metrics"""
        if not self.generated_responses:
            print("❌ No conversations to evaluate!")
            return {}

        print(f"📊 Evaluating {len(self.generated_responses)} conversations...")

        results = {}

        # === CORE METRICS ===
        print("🔄 Calculating core metrics...")

        results['perplexity'] = calculate_perplexity(self.generated_responses)

        if self.semantic_model:
            results['semantic_diversity'] = calculate_semantic_similarity(
                self.generated_responses, self.semantic_model
            )
            results['response_appropriateness'] = calculate_response_appropriateness(
                self.user_inputs, self.generated_responses, self.semantic_model
            )
        else:
            results['semantic_diversity'] = 0.0
            results['response_appropriateness'] = 0.0

        results['conversational_depth'] = calculate_conversational_depth(
            self.generated_responses
        )
        results['emotional_intelligence'] = calculate_emotional_intelligence(
            self.generated_responses
        )

        # === PERFORMANCE METRICS ===
        print("🔄 Calculating performance metrics...")
        results['avg_response_time'] = np.mean(self.response_times) if self.response_times else 0.0
        results['avg_response_length'] = np.mean([len(resp.split()) for resp in self.generated_responses])

        process = psutil.Process()
        results['memory_usage_gb'] = process.memory_info().rss / 1024 / 1024 / 1024

        return results

    # Display evaluation results with interpretation
    def display_results(self, results):
        """Display evaluation results with proper interpretation"""
        print("\n" + "="*60)
        print("📋 MENTAL HEALTH CHATBOT EVALUATION")
        print("="*60)

        print("\n🎯 SEMANTIC & LINGUISTIC QUALITY:")
        print(f"   Perplexity (lower is better):     {results.get('perplexity', 0.0):.2f}")
        print(f"   Semantic Diversity:               {results.get('semantic_diversity', 0.0):.4f}")
        print(f"   Response Appropriateness:         {results.get('response_appropriateness', 0.0):.4f}")
        print(f"   Conversational Depth:             {results.get('conversational_depth', 0.0):.4f}")
        print(f"   Emotional Intelligence:           {results.get('emotional_intelligence', 0.0):.4f}")

        print("\n⚡ PERFORMANCE:")
        print(f"   Average Response Time (s):        {results.get('avg_response_time', 0.0):.3f}")
        print(f"   Average Response Length (words):  {results.get('avg_response_length', 0.0):.1f}")
        print(f"   Memory Usage (GB):                {results.get('memory_usage_gb', 0.0):.2f}")

        print("\n📊 SCORE INTERPRETATION:")
        print("   • Perplexity: <50 excellent, <100 good, <200 fair")
        print("   • Semantic Diversity: 0.30+ good (not too repetitive)")
        print("   • Response Appropriateness: 0.70+ good contextual fit")
        print("   • Conversational Depth: 0.50+ good, 0.70+ excellent")
        print("   • Emotional Intelligence: 0.30+ good, 0.50+ excellent")

        return results

### Interactive Chat Loop (Voice + Text Input, ABSA Context, Evaluation & TTS)

This cell implements the **main conversational loop** for the mental health chatbot, integrating **typed and voice input**, **diagnostic outputs**, and **evaluation logging**.

**Key features:**

1. **Input Modes**  
   - **Typed Input** – User can type messages directly.  
   - **Voice Input** – Uses browser mic recording (via Colab JS + `speech_recognition`) for hands-free interaction.  

2. **Speech Processing**  
   - **Voice Capture** – Records audio in Colab and saves as `.wav`.  
   - **Speech-to-Text** – Transcribes voice input using Google Web Speech API.  
   - **Text-to-Speech** – Converts chatbot replies into natural audio via **gTTS**.  

3. **Response Generation**  
   - Calls the previously defined `generate_response()` function.  
   - Includes **emotion, sentiment, and ABSA context** in diagnostics.  
   - Tracks chat history to maintain conversational flow.  

4. **Evaluation Integration**  
   - Logs each interaction with `ImprovedMentalHealthEvaluator`.  
   - Captures:  
     - Response time  
     - User query  
     - Bot reply  
     - ABSA insights (topic → sentiment → confidence)  
   - Prepares data for **comprehensive evaluation** at session end.  

5. **Exit & Safeguards**  
   - Supports `quit` / `exit` commands.  
   - Handles transcription or generation errors gracefully with fallback messages.  

6. **Session Summary**  
   - At the end of the chat, prompts user to run `run_comprehensive_evaluation()` (or automatically offers).  
   - Reports total number of interactions collected.  

This loop provides a **hands-on, multimodal experience** where users can:  
- Speak or type their input  
- Hear empathetic, context-aware replies  
- See diagnostic signals (emotion, sentiment, ABSA)  
- Log conversations for **NLP evaluation and system performance review**  


In [10]:
# Import libraries for timing, file handling, audio processing, and Colab integration
import time
import tempfile
import os
import base64
import IPython.display as ipd
from gtts import gTTS
import speech_recognition as sr
from google.colab import output

# === CHAT LOOP WITH INTEGRATED EVALUATION CALLS ===
print("Chat ready. Type 't' to type, 'v' for voice input, 'quit' to exit, or enter your message directly.\n")

# Initialize the evaluator (assumes ImprovedMentalHealthEvaluator is already defined)
eval_metrics = ImprovedMentalHealthEvaluator()

# Initialize chat history and user message tracking
chat_history_ids = None
prev_user_messages = []
exit_commands = ["quit", "exit"]

# Browser recorder helpers
def save_audio(b64data, filename):
    """Callback helper: save base64 audio data from browser to a local file."""
    audio_bytes = base64.b64decode(b64data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(audio_bytes)
    print(f"✅ Audio saved to {filename}")

def _record_js(duration_seconds):
    """JS snippet that records from mic for duration_seconds and sends base64 to Python callback."""
    return f"""
    const sleep = time => new Promise(resolve => setTimeout(resolve, time));
    const b2text = blob => new Promise(resolve => {{
      const reader = new FileReader();
      reader.onloadend = e => resolve(e.target.result);
      reader.readAsDataURL(blob);
    }});
    async function record(sec) {{
      const stream = await navigator.mediaDevices.getUserMedia({{ audio: true }});
      const recorder = new MediaRecorder(stream);
      const data = [];
      recorder.ondataavailable = e => data.push(e.data);
      recorder.start();
      await sleep(sec * 1000);
      recorder.stop();
      await new Promise(resolve => recorder.onstop = resolve);
      const blob = new Blob(data, {{ type: 'audio/wav' }});
      const b64data = await b2text(blob);
      google.colab.kernel.invokeFunction('notebook.save_audio', [b64data], {{ }});
    }}
    record({duration_seconds});
    """

def record_audio_colab(filename="voice_input.wav", duration=5):
    """Trigger browser recording and save to filename. duration in seconds."""
    output.register_callback('notebook.save_audio', lambda b64: save_audio(b64, filename))
    display(ipd.Javascript(_record_js(duration)))
    print(f"🎤 Recording for {duration} seconds... Speak now.")

# Speech-to-text transcription using Google Web Speech API
def transcribe_audio(filename="voice_input.wav"):
    """Transcribe WAV file using Google Web Speech API (via SpeechRecognition)."""
    r = sr.Recognizer()
    try:
        with sr.AudioFile(filename) as source:
            audio = r.record(source)
        text = r.recognize_google(audio)
        print(f"🗣 Recognized: {text}")
        return text
    except sr.UnknownValueError:
        print("❌ Could not understand audio.")
        return None
    except sr.RequestError as e:
        print(f"❌ Speech recognition request failed: {e}")
        return None
    except FileNotFoundError:
        print("❌ Audio file not found (recording may have failed).")
        return None

# Generate and play text-to-speech response
def speak_response(text, lang='en', autoplay=True):
    """Generate TTS mp3 (gTTS) and process inline in Colab."""
    try:
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tf:
            tmp_name = tf.name
        tts = gTTS(text=text, lang=lang)
        tts.save(tmp_name)
        display(ipd.Audio(tmp_name, autoplay=autoplay))
        time.sleep(0.5)
        try:
            os.remove(tmp_name)
        except Exception:
            pass
    except Exception as e:
        print(f"[TTS Error] {e}")

# Run comprehensive evaluation using the evaluator
def run_comprehensive_evaluation():
    """Run the comprehensive evaluation and display results"""
    try:
        results = eval_metrics.evaluate_comprehensive()
        eval_metrics.display_results(results)
    except Exception as e:
        print(f"⚠️ Evaluation failed: {e}")

# Main chat loop
while True:
    user_input = input("\nChoose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: ").strip().lower()
    if user_input in exit_commands:
        print("Bot: Goodbye. Take care of yourself! 💛")
        break
    mode = user_input if user_input in ['t', 'v'] else 't'
    if mode == 't':
        if user_input not in ['t', 'v']:
            user_message = user_input
        else:
            user_message = input("You (type): ").strip()
    elif mode == 'v':
        filename = "voice_input.wav"
        record_audio_colab(filename=filename, duration=5)
        print("⌛ Waiting for recording to finish...")
        time.sleep(6)
        user_message = transcribe_audio(filename=filename)
        if not user_message:
            print("No transcribed text — try again or type instead.")
            continue
    else:
        print("Invalid mode. Assuming typed input.")
        user_message = user_input
    if any(cmd in user_message.lower() for cmd in exit_commands):
        print("Bot: Goodbye. Take care of yourself! 💛")
        break
    try:
        start_time = time.time()
        # Generate response using the previously defined generate_response function
        bot_reply, chat_history_ids, emotion_label, emotion_score, sentiment_label, sentiment_score, aspects = generate_response(
            user_input=user_message,
            chat_history_ids=chat_history_ids,
            prev_user_messages=prev_user_messages,
            max_history_tokens=512
        )
        # Store user input in history (limit to last 3 messages)
        prev_user_messages.append(user_message)
        if len(prev_user_messages) > 3:
            prev_user_messages.pop(0)
        # Prepare ABSA summary for display
        parsed_aspects = []
        for a in aspects:
            if isinstance(a, dict) and 'aspect' in a and 'sentiment' in a and 'confidence' in a:
                parsed_aspects.append({
                    'aspect': a['aspect'],
                    'sentiment': a['sentiment'],
                    'confidence': a['confidence']
                })
            else:
                print(f"DEBUG: Invalid ABSA entry: {a}")
                parsed_aspects.append({
                    'aspect': 'situation',
                    'sentiment': 'neutral',
                    'confidence': 0.0
                })
        absa_context = "; ".join([f"{p['aspect']} → {p['sentiment']} ({p['confidence']:.2f})" for p in parsed_aspects]) if parsed_aspects else "No specific aspects detected."
        # Calculate response time
        response_time = time.time() - start_time
        # Show diagnostics and bot reply
        print(f"\n[Emotion: {emotion_label} ({emotion_score:.2f})] | [Sentiment: {sentiment_label} ({sentiment_score:.2f})]")
        print(f"[ABSA: {absa_context}]")
        print(f"Bot: {bot_reply}\n")
        # Add to evaluation metrics
        try:
            eval_metrics.add_conversation(
                user_input=user_message,
                bot_response=bot_reply,
                reference_response=None,
                response_time=response_time
            )
            print(f"📊 Evaluation sample #{len(eval_metrics.generated_responses)} recorded")
        except NameError:
            print("⚠️ Evaluation metrics not available (run evaluation setup cell first)")
        except Exception as e:
            print(f"⚠️ Error recording evaluation metrics: {e}")
        # Speak the bot reply using text-to-speech
        speak_response(bot_reply)
    except Exception as e:
        error_message = f"Sorry, something went wrong: {e}"
        print(f"Bot: {error_message}")
        speak_response(error_message)
        try:
            response_time = time.time() - start_time if 'start_time' in locals() else 0
            eval_metrics.add_conversation(
                user_input=user_message,
                bot_response=error_message,
                reference_response=None,
                response_time=response_time
            )
        except NameError:
            print("⚠️ Evaluation metrics not available (run evaluation setup cell first)")
        except Exception as e:
            print(f"⚠️ Error recording evaluation metrics: {e}")
# End of chat loop
print("\n🔄 Chat session ended. Run 'run_comprehensive_evaluation()' to see your metrics!")
# Automatically run evaluation if data exists
if eval_metrics.generated_responses:
    print(f"\n📊 You had {len(eval_metrics.generated_responses)} interactions.")
    user_choice = input("Run evaluation now? (y/n): ").strip().lower()
    if user_choice in ['y', 'yes']:
        run_comprehensive_evaluation()


Chat ready. Type 't' to type, 'v' for voice input, 'quit' to exit, or enter your message directly.

✅ Semantic similarity model loaded

Choose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: t
You (type): I am very depressed because of my recent breakup.
DEBUG: Emotion pipeline output: [[{'label': 'anger', 'score': 0.0006501637399196625}, {'label': 'disgust', 'score': 0.002027425216510892}, {'label': 'fear', 'score': 0.0020103484857827425}, {'label': 'joy', 'score': 0.0012133674463257194}, {'label': 'neutral', 'score': 0.0037288435269147158}, {'label': 'sadness', 'score': 0.9889687299728394}, {'label': 'surprise', 'score': 0.0014010182349011302}]]
DEBUG: Sentiment pipeline output: {'label': 'negative', 'score': 0.9186303615570068}
DEBUG: ABSA input for 'depressed': [CLS] depressed [SEP] I am very depressed because of my recent breakup. [SEP]
DEBUG: ABSA pipeline output: [{'label': 'Negative', 'score': 0.8693626523017883}]
DEBUG: ABSA final results: [{'aspect': 'd


Choose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: t
You (type): I am also sad because of my less marks in test.
DEBUG: Emotion pipeline output: [[{'label': 'anger', 'score': 0.0005896289367228746}, {'label': 'disgust', 'score': 0.0016037849709391594}, {'label': 'fear', 'score': 0.0010667084716260433}, {'label': 'joy', 'score': 0.0013854911085218191}, {'label': 'neutral', 'score': 0.003941924311220646}, {'label': 'sadness', 'score': 0.9895510077476501}, {'label': 'surprise', 'score': 0.0018614899599924684}]]
DEBUG: Sentiment pipeline output: {'label': 'negative', 'score': 0.8925520777702332}
DEBUG: ABSA input for 'exam': [CLS] exam [SEP] I am also sad because of my less marks in test. [SEP]
DEBUG: ABSA pipeline output: [{'label': 'Neutral', 'score': 0.7134143710136414}]
DEBUG: ABSA final results: [{'aspect': 'exam', 'sentiment': 'neutral', 'confidence': 0.71}]

[Emotion: sadness (0.99)] | [Sentiment: negative (0.89)]
[ABSA: exam → neutral (0.71)]
Bot: My hea


Choose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: t
You (type): Not yet but I doubt they will understand.
DEBUG: Emotion pipeline output: [[{'label': 'anger', 'score': 0.010856837034225464}, {'label': 'disgust', 'score': 0.021299295127391815}, {'label': 'fear', 'score': 0.010373285040259361}, {'label': 'joy', 'score': 0.00292309676297009}, {'label': 'neutral', 'score': 0.704594075679779}, {'label': 'sadness', 'score': 0.014001020230352879}, {'label': 'surprise', 'score': 0.23595239222049713}]]
DEBUG: Sentiment pipeline output: {'label': 'negative', 'score': 0.7318919897079468}
DEBUG: ABSA input for 'situation': [CLS] situation [SEP] Not yet but I doubt they will understand. [SEP]
DEBUG: ABSA pipeline output: [{'label': 'Negative', 'score': 0.9202898144721985}]
DEBUG: ABSA final results: [{'aspect': 'situation', 'sentiment': 'negative', 'confidence': 0.92}]

[Emotion: neutral (0.70)] | [Sentiment: negative (0.73)]
[ABSA: situation → negative (0.92)]
Bot: A s


Choose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: t
You (type): What else can I do to lift up my mood?
DEBUG: Emotion pipeline output: [[{'label': 'anger', 'score': 0.0005839711520820856}, {'label': 'disgust', 'score': 0.0016199953388422728}, {'label': 'fear', 'score': 0.0014280350878834724}, {'label': 'joy', 'score': 0.0013554085744544864}, {'label': 'neutral', 'score': 0.004843020346015692}, {'label': 'sadness', 'score': 0.9876049757003784}, {'label': 'surprise', 'score': 0.002564603928476572}]]
DEBUG: Sentiment pipeline output: {'label': 'neutral', 'score': 0.7176641225814819}
DEBUG: ABSA input for 'situation': [CLS] situation [SEP] What else can I do to lift up my mood? [SEP]
DEBUG: ABSA pipeline output: [{'label': 'Positive', 'score': 0.8130991458892822}]
DEBUG: ABSA final results: [{'aspect': 'situation', 'sentiment': 'positive', 'confidence': 0.81}]

[Emotion: sadness (0.99)] | [Sentiment: neutral (0.72)]
[ABSA: situation → positive (0.81)]
Bot: What


Choose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: GYmming usually helps me relax.
DEBUG: Emotion pipeline output: [[{'label': 'anger', 'score': 0.003215518081560731}, {'label': 'disgust', 'score': 0.005189365707337856}, {'label': 'fear', 'score': 0.0013472363352775574}, {'label': 'joy', 'score': 0.7578313946723938}, {'label': 'neutral', 'score': 0.17665114998817444}, {'label': 'sadness', 'score': 0.04769810661673546}, {'label': 'surprise', 'score': 0.00806723814457655}]]
DEBUG: Sentiment pipeline output: {'label': 'positive', 'score': 0.822674036026001}
DEBUG: ABSA input for 'situation': [CLS] situation [SEP] gymming usually helps me relax. [SEP]
DEBUG: ABSA pipeline output: [{'label': 'Neutral', 'score': 0.6047804951667786}]
DEBUG: ABSA final results: [{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.6}]

[Emotion: joy (0.76)] | [Sentiment: positive (0.82)]
[ABSA: situation → neutral (0.60)]
Bot: A short break might lift your mood.

📊 Evalua


Choose mode — 't' = type, 'v' = voice, 'quit' to exit, or enter your message: quit
Bot: Goodbye. Take care of yourself! 💛

🔄 Chat session ended. Run 'run_comprehensive_evaluation()' to see your metrics!

📊 You had 5 interactions.
Run evaluation now? (y/n): y
📊 Evaluating 5 conversations...
🔄 Calculating core metrics...


`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


🔄 Calculating performance metrics...

📋 MENTAL HEALTH CHATBOT EVALUATION

🎯 SEMANTIC & LINGUISTIC QUALITY:
   Perplexity (lower is better):     199.38
   Semantic Diversity:               0.5186
   Response Appropriateness:         0.2956
   Conversational Depth:             0.6000
   Emotional Intelligence:           0.6000

⚡ PERFORMANCE:
   Average Response Time (s):        0.564
   Average Response Length (words):  8.6
   Memory Usage (GB):                2.50

📊 SCORE INTERPRETATION:
   • Perplexity: <50 excellent, <100 good, <200 fair
   • Semantic Diversity: 0.30+ good (not too repetitive)
   • Response Appropriateness: 0.70+ good contextual fit
   • Conversational Depth: 0.50+ good, 0.70+ excellent
   • Emotional Intelligence: 0.30+ good, 0.50+ excellent


## 📈 Evaluation & Diagnostics

This section evaluates chatbot performance on a **held-out `test_dataset`** rather than only ad-hoc chat logs.

Metrics computed:
- **Perplexity** – Measures fluency of generated responses.  
- **Diversity (Distinct-n)** – Checks lexical variety across outputs.  
- **Average Response Length** – Monitors verbosity vs. conciseness.  
- **Safety & Variety Checks** – Ensures responses remain supportive, non-repetitive, and context-appropriate.  

### Usage
After training or fine-tuning, run:
```python
run_comprehensive_evaluation(test_dataset)


In [20]:
# Import libraries for data handling, model evaluation, and chatbot pipeline
import pandas as pd
import torch
import numpy as np
import time
import psutil
import re
import random
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModelForCausalLM, pipeline
from sentence_transformers import SentenceTransformer, util
import faiss
from difflib import SequenceMatcher
import warnings
warnings.filterwarnings('ignore')

# === MODERN NLP EVALUATION METRICS ===
# (No changes from previous)
def calculate_perplexity(responses, model_path="/content/drive/MyDrive/AIchatbotmodels/mental_health_model/checkpoint-18000"):
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)
        model.eval()

        if not responses:
            return 0.0

        perplexities = []
        for response in responses:
            if not response.strip():
                continue
            inputs = tokenizer(response, return_tensors="pt", truncation=True, max_length=512)
            input_ids = inputs["input_ids"]

            with torch.no_grad():
                outputs = model(input_ids, labels=input_ids)
                loss = outputs.loss

            perplexity = torch.exp(loss).item()
            perplexities.append(perplexity)

        if not perplexities:
            return 0.0

        return np.mean(perplexities)
    except Exception as e:
        print(f"⚠️ Perplexity calculation failed: {e}")
        return 0.0

def calculate_semantic_similarity(generated_responses, model):
    if not generated_responses or len(generated_responses) < 2:
        return 0.0

    try:
        embeddings = model.encode(generated_responses)
        similarities = []

        for i in range(len(embeddings)):
            for j in range(i+1, len(embeddings)):
                sim = util.pytorch_cos_sim(embeddings[i], embeddings[j]).item()
                similarities.append(sim)

        avg_similarity = np.mean(similarities)
        diversity_score = 1 - avg_similarity
        return max(0, diversity_score)

    except Exception as e:
        print(f"⚠️ Semantic similarity calculation failed: {e}")
        return 0.0

def calculate_response_appropriateness(user_inputs, bot_responses, model):
    if not user_inputs or not bot_responses or len(user_inputs) != len(bot_responses):
        return 0.0

    try:
        appropriateness_scores = []

        for user_input, bot_response in zip(user_inputs, bot_responses):
            user_emb = model.encode([user_input])
            bot_emb = model.encode([bot_response])

            similarity = util.pytorch_cos_sim(user_emb, bot_emb).item()
            appropriateness_scores.append(similarity)

        return np.mean(appropriateness_scores)

    except Exception as e:
        print(f"⚠️ Response appropriateness calculation failed: {e}")
        return 0.0

def calculate_conversational_depth(responses):
    if not responses:
        return 0.0

    depth_indicators = [
        r'\b(because|since|therefore|however|although|while)\b',
        r'\b(feel|think|believe|consider|experience)\b',
        r'\b(might|could|perhaps|possibly|sometimes)\b',
        r'\b(relationship|pattern|connection|underlying)\b'
    ]

    scores = []
    for response in responses:
        response_lower = response.lower()
        depth_score = sum(1 for pattern in depth_indicators
                         if re.search(pattern, response_lower))

        word_count = len(response.split())
        normalized_score = min(1.0, depth_score / max(1, word_count / 15))
        scores.append(normalized_score)

    return np.mean(scores)

def calculate_emotional_intelligence(responses):
    if not responses:
        return 0.0

    ei_indicators = [
        r'\b(feel|feeling|emotion|emotional|mood)\b',
        r'\b(angry|sad|happy|anxious|frustrated|excited|worried)\b',
        r'\b(makes you|causes you|leads you|helps you)\b.*\bfeel\b',
        r'\b(cope|manage|handle|deal with)\b.*\b(feeling|emotion)\b'
    ]

    scores = []
    for response in responses:
        response_lower = response.lower()
        ei_score = sum(1 for pattern in ei_indicators
                      if re.search(pattern, response_lower))

        word_count = len(response.split())
        normalized_score = min(1.0, ei_score / max(1, word_count / 12))
        scores.append(normalized_score)

    return np.mean(scores)

# === COMPREHENSIVE EVALUATION SYSTEM ===
# (No changes from previous)
class ImprovedMentalHealthEvaluator:
    def __init__(self):
        try:
            self.semantic_model = SentenceTransformer("all-MiniLM-L6-v2")
            print("✅ Semantic similarity model loaded")
        except Exception as e:
            print(f"⚠️ Could not load semantic model: {e}")
            self.semantic_model = None

        self.reset_metrics()

    def reset_metrics(self):
        self.user_inputs = []
        self.generated_responses = []
        self.reference_responses = []
        self.response_times = []

    def add_conversation(self, user_input, bot_response, reference_response=None,
                        response_time=0.0):
        self.user_inputs.append(user_input)
        self.generated_responses.append(bot_response)
        self.reference_responses.append(reference_response or "")
        self.response_times.append(response_time)

    def evaluate_comprehensive(self):
        if not self.generated_responses:
            print("❌ No conversations to evaluate!")
            return {}

        print(f"📊 Evaluating {len(self.generated_responses)} conversations...")

        results = {}

        print("🔄 Calculating core metrics...")
        results['perplexity'] = calculate_perplexity(self.generated_responses)

        if self.semantic_model:
            results['semantic_diversity'] = calculate_semantic_similarity(
                self.generated_responses, self.semantic_model
            )
            results['response_appropriateness'] = calculate_response_appropriateness(
                self.user_inputs, self.generated_responses, self.semantic_model
            )
        else:
            results['semantic_diversity'] = 0.0
            results['response_appropriateness'] = 0.0

        results['conversational_depth'] = calculate_conversational_depth(
            self.generated_responses
        )
        results['emotional_intelligence'] = calculate_emotional_intelligence(
            self.generated_responses
        )

        print("🔄 Calculating performance metrics...")
        results['avg_response_time'] = np.mean(self.response_times) if self.response_times else 0.0
        results['avg_response_length'] = np.mean([len(resp.split()) for resp in self.generated_responses])

        process = psutil.Process()
        results['memory_usage_gb'] = process.memory_info().rss / 1024 / 1024 / 1024

        return results

    def display_results(self, results):
        print("\n" + "="*60)
        print("📋 MENTAL HEALTH CHATBOT EVALUATION")
        print("="*60)

        print("\n🎯 SEMANTIC & LINGUISTIC QUALITY:")
        print(f"   Perplexity (lower is better):     {results.get('perplexity', 0.0):.2f}")
        print(f"   Semantic Diversity:               {results.get('semantic_diversity', 0.0):.4f}")
        print(f"   Response Appropriateness:         {results.get('response_appropriateness', 0.0):.4f}")
        print(f"   Conversational Depth:             {results.get('conversational_depth', 0.0):.4f}")
        print(f"   Emotional Intelligence:           {results.get('emotional_intelligence', 0.0):.4f}")

        print("\n⚡ PERFORMANCE:")
        print(f"   Average Response Time (s):        {results.get('avg_response_time', 0.0):.3f}")
        print(f"   Average Response Length (words):  {results.get('avg_response_length', 0.0):.1f}")
        print(f"   Memory Usage (GB):                {results.get('memory_usage_gb', 0.0):.2f}")

        print("\n📊 SCORE INTERPRETATION:")
        print("   • Perplexity: <50 excellent, <100 good, <200 fair")
        print("   • Semantic Diversity: 0.30+ good (not too repetitive)")
        print("   • Response Appropriateness: 0.70+ good contextual fit")
        print("   • Conversational Depth: 0.50+ good, 0.70+ excellent")
        print("   • Emotional Intelligence: 0.30+ good, 0.50+ excellent")

        return results

# === CHATBOT PIPELINE ===
# Device setup
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# --- Load Trained DialoGPT Model ---
chat_model_path = "/content/drive/MyDrive/AIchatbotmodels/mental_health_model/checkpoint-18000"
chat_tokenizer = AutoTokenizer.from_pretrained(chat_model_path)
chat_model = AutoModelForCausalLM.from_pretrained(chat_model_path).to(device)

# --- Load Emotion Detection Model ---
emotion_model_name = "j-hartmann/emotion-english-distilroberta-base"
emotion_tokenizer = AutoTokenizer.from_pretrained(emotion_model_name)
emotion_model = AutoModelForSequenceClassification.from_pretrained(emotion_model_name).to(device)
emotion_pipeline = pipeline(
    "text-classification",
    model=emotion_model,
    tokenizer=emotion_tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_all_scores=True
)

# --- Load Sentiment Model ---
sentiment_model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
sentiment_tokenizer = AutoTokenizer.from_pretrained(sentiment_model_name)
sentiment_model = AutoModelForSequenceClassification.from_pretrained(sentiment_model_name).to(device)
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model=sentiment_model,
    tokenizer=sentiment_tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

# --- Load ABSA Model ---
absa_model_name = "yangheng/deberta-v3-base-absa-v1.1"
absa_tokenizer = AutoTokenizer.from_pretrained(absa_model_name)
absa_model = AutoModelForSequenceClassification.from_pretrained(absa_model_name).to(device)
absa_pipeline = pipeline(
    "text-classification",
    model=absa_model,
    tokenizer=absa_tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

# --- Embedding Model ---
embedding_model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

# --- Unsafe Keyword List ---
unsafe_keywords = [
    "suicide", "kill myself", "self harm", "hurt myself",
    "end my life", "overdose", "cutting", "hang myself",
    "can't go on", "want to die", "give up on life",
    "life is pointless", "I see no future", "end it all"
]
unsafe_embeddings = embedding_model.encode(unsafe_keywords, convert_to_tensor=True)

# --- Knowledge Base RAG ---
rag_df = pd.read_excel("/content/drive/MyDrive/AIchatbotmodels/RAG_Knowledge_Base_WithID.xlsx")
documents = rag_df["Knowledge Entry"].tolist()

# Encode & normalize document embeddings
doc_embeddings = embedding_model.encode(
    documents,
    convert_to_numpy=True,
    normalize_embeddings=True
)

# Create FAISS index
dimension = doc_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(doc_embeddings)

def retrieve_docs(query, top_k=3):
    q_emb = embedding_model.encode(
        [query],
        convert_to_numpy=True,
        normalize_embeddings=True
    )
    distances, indices = index.search(q_emb, top_k)
    retrieved = [(documents[idx], dist) for idx, dist in zip(indices[0], distances[0]) if dist > 0.65]

    if not retrieved:
        query_lower = query.lower()
        keyword_matches = []
        for i, doc in enumerate(documents):
            if any(keyword in doc.lower() for keyword in query_lower.split()):
                keyword_matches.append((doc, 0.5))
        keyword_matches = sorted(keyword_matches, key=lambda x: x[1], reverse=True)[:top_k]
        retrieved = keyword_matches

    print(f"RAG Retrieved: {[doc for doc, dist in retrieved]} (Scores: {[dist for doc, dist in retrieved]})")
    return [doc for doc, dist in retrieved]

# --- Prompt Template with Enhanced Context ---
previous_response_type = None
previous_prompt_text = None
last_empathy_phrase = None
_last_full_response = None

empathy_phrases = {
    "sadness": {
        "very_high": [
            "This must feel overwhelming to carry.",
            "I’m so sorry you’re facing such heavy emotions.",
            "This sounds incredibly painful for you.",
            "My heart aches for what you’re going through."
        ],
        "high": [
            "I can see how difficult this must be for you.",
            "That sounds really tough to handle.",
            "It’s understandable to feel so down about this.",
            "This must be weighing heavily on you."
        ],
        "low": [
            "I hear you’re feeling a bit low.",
            "That sounds like it’s been tough.",
            "It’s okay to feel down right now.",
            "That must be a bit challenging."
        ]
    },
    "anger": {
        "very_high": [
            "This must be absolutely infuriating for you.",
            "I can feel how intense this frustration is.",
            "That sounds incredibly aggravating to deal with.",
            "Your anger is so understandable right now."
        ],
        "high": [
            "I get how frustrating this must be.",
            "That sounds really irritating.",
            "It’s understandable to feel upset about this.",
            "This must be tough to deal with."
        ],
        "low": [
            "I hear you’re a bit annoyed.",
            "That sounds a little frustrating.",
            "It’s okay to feel irritated.",
            "That must be mildly upsetting."
        ]
    },
    "fear": {
        "very_high": [
            "This must feel incredibly scary for you.",
            "I can see how overwhelming this worry is.",
            "That sounds deeply unsettling.",
            "Your fear is completely understandable."
        ],
        "high": [
            "That must be really worrying for you.",
            "I can see why this feels scary.",
            "It’s understandable to feel anxious.",
            "This must be tough to face."
        ],
        "low": [
            "That sounds a bit unsettling.",
            "I hear you’re feeling a little anxious.",
            "It’s okay to feel a bit worried.",
            "That must be slightly nerve-wracking."
        ]
    },
    "joy": {
        "very_high": [
            "That’s absolutely fantastic to hear!",
            "I’m so thrilled for you!",
            "What an incredible moment!",
            "This must feel amazing!"
        ],
        "high": [
            "That’s wonderful to hear!",
            "I’m really happy for you!",
            "What a great moment!",
            "This must feel so good!"
        ],
        "low": [
            "That’s nice to hear!",
            "I’m glad you’re feeling good.",
            "Sounds like a happy moment.",
            "That’s great to know!"
        ]
    },
    "love": {
        "very_high": [
            "That’s so beautiful and heartwarming!",
            "Your love is truly special!",
            "What an incredible feeling to share!",
            "This must feel so warm and wonderful!"
        ],
        "high": [
            "That’s really sweet to hear!",
            "I’m happy for your connection!",
            "What a lovely feeling!",
            "This must feel so special!"
        ],
        "low": [
            "That’s sweet to hear.",
            "I’m glad you’re feeling this warmth.",
            "Sounds like a nice moment.",
            "That’s lovely to know."
        ]
    },
    "surprise": {
        "very_high": [
            "Wow, that must be a huge shock!",
            "I can’t believe how surprising this is!",
            "That sounds like an incredible twist!",
            "What a wild moment for you!"
        ],
        "high": [
            "That’s really surprising!",
            "I can see why that caught you off guard!",
            "What an unexpected turn!",
            "That must be quite a shock!"
        ],
        "low": [
            "That’s a bit surprising!",
            "Sounds like that was unexpected.",
            "I hear it caught you off guard.",
            "That must be a little startling."
        ]
    },
    "neutral": {
        "very_high": ["I hear you."],
        "high": ["I understand."],
        "low": ["Got it."]
    },
    "disgust": {
        "very_high": [
            "That sounds incredibly upsetting to experience.",
            "I can see how this feels so distasteful.",
            "This must be really hard to stomach.",
            "Your reaction is completely understandable."
        ],
        "high": [
            "That sounds really unpleasant to deal with.",
            "I get how this feels off-putting.",
            "It’s understandable to feel this way.",
            "This must be tough to handle."
        ],
        "low": [
            "That sounds a bit unsettling.",
            "I hear you’re feeling uneasy.",
            "It’s okay to feel bothered.",
            "That must be slightly off-putting."
        ]
    }
}

def get_unique_empathy_phrase(emotion, intensity="low"):
    if emotion not in empathy_phrases:
        emotion = "neutral"
    phrases = empathy_phrases[emotion][intensity]
    available_phrases = [p for p in phrases if p != last_empathy_phrase]
    if not available_phrases:
        available_phrases = phrases
    phrase = random.choice(available_phrases)
    global last_empathy_phrase
    last_empathy_phrase = phrase
    return phrase

def is_advice_question(user_input):
    return bool(re.search(r"\b(what should I do|how can I|how do I|any tips|suggest|recommend|what can I|advice|help me|what to do)\b", user_input.lower()))

def detect_intent(text):
    closure_cues = r"\b(thanks|thank you|sure|okay|ok|will do|got it|appreciate|alright|gotcha|great|bye|goodbye)\b"
    if re.search(closure_cues, text.lower()):
        return "closing"
    question_cues = r"\b(what|how|why|when|where|can you|could you|do you)\b"
    if re.search(question_cues, text.lower()) and not is_advice_question(text):
        return "question"
    return None

def build_prompt(user_query, emotion=None, emotion_score=0.0, aspects=None, intent=None, prev_user_messages=None, retrieved_docs=None):
    global previous_response_type, previous_prompt_text

    response_styles = ["empathetic", "informative", "reflective", "closing"]
    if intent == "closing":
        chosen_style = "closing"
    elif intent == "question":
        chosen_style = "reflective"
    elif is_advice_question(user_query):
        chosen_style = "informative"
    else:
        possible_styles = [s for s in response_styles if s != previous_response_type and s != "closing"]
        chosen_style = random.choice(possible_styles)
    previous_response_type = chosen_style

    primary_aspect = aspects[0]['aspect'] if aspects and aspects[0]['confidence'] >= 0.7 else "situation"
    emotion_line = f"Emotion: {emotion} (score: {emotion_score:.2f})."
    history_line = (
        f"Previous: {prev_user_messages[-1]}" + "\n"
        if prev_user_messages and len(user_query.split()) > 3 else ""
    )
    rag_line = (
        f"Context: {retrieved_docs[0][:100].strip()}..." + "\n"
        if retrieved_docs else ""
    )
    key_terms = " ".join([word for word in user_query.split() if len(word) > 3]) if len(user_query.split()) <= 3 else ""
    key_terms_line = f"Key terms: {key_terms}\n" if key_terms else ""

    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"

    system_instruction = "Respond in 8–12 words, addressing user’s input, emotion, and context."
    if chosen_style == "empathetic":
        empathy_phrase = get_unique_empathy_phrase(emotion, intensity)
        style_instruction = f"Start with '{empathy_phrase}' Address {primary_aspect}."
    elif chosen_style == "informative":
        style_instruction = f"Provide actionable advice about {primary_aspect}. Use context if available."
    elif chosen_style == "closing":
        style_instruction = "Acknowledge user’s intent briefly."
    else:
        style_instruction = f"Ask a question about {primary_aspect} to deepen conversation."

    prompt = (
        f"{system_instruction}\n{history_line}{emotion_line}{rag_line}{key_terms_line}User: {user_query}\n"
        f"Instruction: {style_instruction}\nAssistant:"
    )

    if prompt == previous_prompt_text:
        prompt += " Provide a unique response."
    previous_prompt_text = prompt

    return prompt

# --- RESPONSE GENERATION LOGIC ---
# (No changes from previous)
_last_empathy_line = None
_last_quote_turn = -3
_turn_counter = 0
_last_advice_line = None
_last_question_line = None
_last_response_category = None
_last_category_turn = -5
_last_full_response = None
_last_structure = None
_last_hear_you_turn = -5

def detect_emotion(text, min_confidence=0.4):
    try:
        results = emotion_pipeline(text)
        if isinstance(results, list) and results and isinstance(results[0], list):
            top_emotion = max(results[0], key=lambda x: x['score'])
            if top_emotion['score'] < min_confidence:
                return "neutral", round(top_emotion['score'], 2)
            return top_emotion['label'], round(top_emotion['score'], 2)
        return "neutral", 0.0
    except Exception as e:
        print(f"⚠️ Emotion detection error: {e}")
        return "neutral", 0.0

def detect_sentiment(text):
    try:
        result = sentiment_pipeline(text)[0]
        return result['label'].lower(), round(result['score'], 2)
    except Exception as e:
        print(f"⚠️ Sentiment detection error: {e}")
        return "neutral", 0.0

def detect_absa(text):
    try:
        def normalize_aspect(aspect):
            synonym_map = {
                "test": "exam",
                "marks": "exam",
                "grades": "exam",
                "score": "exam",
                "studies": "study",
                "sickness": "sick",
                "illness": "sick",
                "disease": "sick",
                "edd": "workshops",
                "employment": "workshops"
            }
            return synonym_map.get(aspect.lower(), aspect.lower())

        def extract_aspects(text):
            words = text.lower().split()
            potential_aspects = []
            aspect_keywords = [
                'girlfriend', 'boyfriend', 'partner', 'husband', 'wife',
                'relationship', 'marriage', 'breakup', 'divorce',
                'family', 'mother', 'father', 'parent', 'sibling', 'brother', 'sister',
                'friend', 'friendship',
                'job', 'career', 'work', 'boss', 'manager', 'colleague',
                'study', 'school', 'college', 'university', 'exam', 'test', 'marks', 'grades',
                'depression', 'depressed', 'anxiety', 'stressed', 'stress',
                'fear', 'worry', 'lonely', 'loneliness', 'sad', 'sadness',
                'angry', 'anger', 'frustrated', 'confused', 'hopeless',
                'health', 'illness', 'sick', 'tired', 'fatigue',
                'loss', 'grief', 'death', 'trauma', 'change', 'moving', 'new place',
                'workshops', 'edd', 'employment'
            ]
            for word in words:
                normalized = normalize_aspect(word)
                if normalized in aspect_keywords:
                    potential_aspects.append(normalized)
            text_lower = text.lower()
            if 'break up' in text_lower:
                potential_aspects.append('breakup')
            if 'lost job' in text_lower:
                potential_aspects.append('job loss')
            if 'best friend' in text_lower:
                potential_aspects.append('friend')
            if 'family issue' in text_lower:
                potential_aspects.append('family')
            if 'love life' in text_lower:
                potential_aspects.append('relationship')
            if 'work stress' in text_lower:
                potential_aspects.append('stress')
            if 'career change' in text_lower:
                potential_aspects.append('career')
            if 'mental health' in text_lower:
                potential_aspects.append('mental health')
            if 'workshop' in text_lower or 'edd' in text_lower:
                potential_aspects.append('workshops')
            if not potential_aspects:
                potential_aspects = ['situation']
            prioritized = [a for a in potential_aspects if a not in ['sad', 'sadness', 'anger', 'fear', 'joy', 'love', 'surprise']]
            return list(set(prioritized if prioritized else potential_aspects[:1]))

        potential_aspects = extract_aspects(text)
        results = []
        for aspect in potential_aspects:
            absa_input = f"[CLS] {aspect} [SEP] {text} [SEP]"
            pipeline_output = absa_pipeline(absa_input)
            if isinstance(pipeline_output, list) and pipeline_output and isinstance(pipeline_output[0], dict):
                result = pipeline_output[0]
                sentiment = result.get('label', 'neutral').lower()
                confidence = round(result.get('score', 0.0), 2)
                if confidence >= 0.7:
                    results.append({
                        'aspect': aspect,
                        'sentiment': sentiment,
                        'confidence': confidence
                    })
            else:
                results.append({
                    'aspect': aspect,
                    'sentiment': 'neutral',
                    'confidence': 0.0
                })
        results = sorted(
            results,
            key=lambda x: (x['confidence'], x['aspect'] != 'situation', x['sentiment'] == 'negative'),
            reverse=True
        )
        return results if results else [{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.0}]
    except Exception as e:
        print(f"⚠️ ABSA error: {e}")
        return [{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.0}]

def reconcile_emotion_sentiment(emotion, sentiment, text, emotion_score):
    if emotion_score > 0.7:
        return emotion
    if len(text.split()) < 10:
        if re.search(r"\b(thanks|thank you|sure|okay|ok|will do|got it|appreciate|alright|gotcha|great)\b", text.lower()):
            return "joy" if sentiment == "positive" else "neutral"
        if re.search(r"\b(what should I do|how can I|how do I|any tips|suggest|recommend|what can I|advice|help me|what to do)\b", text.lower()):
            return "neutral"
    if re.search(r"\b(miss|long for|yearn|depressed|depression)\b", text.lower()):
        return "sadness"
    if re.search(r"\b(embarrassed|disgusted|ashamed)\b", text.lower()):
        return "disgust"
    if sentiment == "negative" and emotion in ["joy", "love", "surprise"]:
        return "sadness"
    if sentiment == "positive" and emotion in ["sadness", "fear", "anger"]:
        return "joy"
    return emotion

def is_safe(text, threshold=0.65):
    try:
        text_emb = embedding_model.encode(text, convert_to_tensor=True)
        cos_sim = util.pytorch_cos_sim(text_emb, unsafe_embeddings)
        return torch.max(cos_sim).item() < threshold
    except Exception:
        return True

def paraphrase_snippet(text, max_len=10):
    words = text.split()
    snippet = " ".join(words[:max_len])
    if len(words) > max_len:
        snippet += "..."
    snippet = snippet.replace("I ", "you ").replace("my ", "your ")
    return snippet

# --- Enhanced Fallback Response ---
def generate_fallback_response(user_input, retrieved_docs, primary_aspect, emotion, emotion_score, intent):
    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"
    if intent == "closing":
        closing_templates = {
            "exam": f"Glad I could help with your {primary_aspect} concerns!",
            "study": f"Great to support your {primary_aspect} efforts!",
            "workshops": "Happy to help with your EDD workshop questions!",
            "default": "Glad I could help you out!"
        }
        return closing_templates.get(primary_aspect, closing_templates["default"])

    empathy_phrase = get_unique_empathy_phrase(emotion, intensity)

    if retrieved_docs and is_advice_question(user_input):
        doc_snippet = retrieved_docs[0][:100].strip() + "..." if len(retrieved_docs[0]) > 100 else retrieved_docs[0]
        return f"{empathy_phrase} Try this: {doc_snippet}"

    if intent == "question" and "workshop" in user_input.lower() or "edd" in user_input.lower():
        return f"{empathy_phrase} EDD offers workshops on resumes, job search, and interviews."

    advice_templates = {
        "exam": [
            "Try a study schedule to boost exam confidence.",
            "Focus on key concepts to prepare effectively.",
            "Break study sessions into smaller chunks."
        ],
        "study": [
            "Break study sessions into manageable chunks.",
            "Set small, achievable study goals.",
            "Use flashcards to reinforce key concepts."
        ],
        "loneliness": [
            "Try connecting with a friend or loved one.",
            "Join a club or group to meet new people.",
            "Journaling can help process feelings of loneliness."
        ],
        "stress": [
            "Practice deep breathing to manage stress.",
            "Take short breaks to reset your mind.",
            "Try mindfulness exercises to stay calm."
        ],
        "workshops": [
            "Explore EDD’s resume and interview workshops.",
            "Check CalJOBS for local EDD workshop schedules.",
            "EDD offers virtual job search workshops."
        ],
        "default": [
            "A short walk might lift your mood.",
            "Try a favorite activity to feel better.",
            "Writing down thoughts can help."
        ]
    }
    question_templates = {
        "exam": [
            "What’s making studying for exams tough?",
            "Have you tried any study strategies?",
            "How can I support your exam prep?"
        ],
        "study": [
            "What’s challenging about your study routine?",
            "Which subjects are toughest for you?",
            "What study methods have you tried?"
        ],
        "loneliness": [
            "What’s been making you feel lonely lately?",
            "Have you connected with anyone recently?",
            "What helps you feel less alone?"
        ],
        "stress": [
            "What’s causing your stress right now?",
            "What helps you unwind when stressed?",
            "Have you tried any relaxation techniques?"
        ],
        "workshops": [
            "Which EDD workshops interest you most?",
            "Have you checked CalJOBS for workshop details?",
            "What job skills are you looking to improve?"
        ],
        "default": [
            "What’s been on your mind lately?",
            "How can I support you right now?",
            "What’s making things feel tough?"
        ]
    }

    if len(user_input.split()) <= 3 and emotion != "neutral":
        templates = question_templates.get(primary_aspect, question_templates["default"])
        return f"{empathy_phrase} {random.choice(templates)}"
    elif is_advice_question(user_input):
        templates = advice_templates.get(primary_aspect, advice_templates["default"])
        return f"{empathy_phrase} {random.choice(templates)}"
    elif intent == "question":
        templates = question_templates.get(primary_aspect, question_templates["default"])
        return f"{empathy_phrase} {random.choice(templates)}"
    return f"{empathy_phrase} Can you share more about what’s going on?"

# --- Style Post-Processor ---
def post_process_style(response, response_style, primary_aspect, emotion, emotion_score, intent=None):
    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"

    if intent == "closing":
        closing_templates = {
            "exam": f"Glad I could help with your {primary_aspect} concerns!",
            "study": f"Great to support your {primary_aspect} efforts!",
            "workshops": "Happy to help with your EDD workshop questions!",
            "default": "Glad I could help you out!"
        }
        return closing_templates.get(primary_aspect, closing_templates["default"])

    empathy_phrase = get_unique_empathy_phrase(emotion, intensity)

    if response_style == "empathetic" and re.search(r"(feel|sounds|understand|empathize)", response.lower()):
        return response
    elif response_style == "informative" and re.search(r"(try|consider|might help|suggest)", response.lower()):
        return response
    elif response_style == "reflective" and re.search(r"\?$|how|what|why", response.lower()):
        return response

    if response_style == "empathetic":
        if primary_aspect in ["exam", "marks", "test", "grades"]:
            return f"{empathy_phrase} It sounds tough to face your {primary_aspect} challenges."
        return f"{empathy_phrase} It’s hard dealing with this {primary_aspect}."
    elif response_style == "informative":
        advice_templates = {
            "exam": "Try a study schedule to boost exam confidence.",
            "marks": "Review key concepts to improve your grades.",
            "test": "Focus on one topic at a time for your test.",
            "study": "Break study sessions into small, manageable chunks.",
            "workshops": "Explore EDD’s resume and interview workshops.",
            "default": "A short break might lift your mood."
        }
        return f"{empathy_phrase} {advice_templates.get(primary_aspect, advice_templates['default'])}"
    elif response_style == "reflective":
        question_templates = {
            "exam": f"What’s been toughest about your {primary_aspect}?",
            "marks": f"How are you feeling about your {primary_aspect} now?",
            "test": f"How’s your confidence after this {primary_aspect}?",
            "study": f"What’s challenging about your {primary_aspect}?",
            "workshops": f"Which EDD workshops interest you most?",
            "default": f"What’s been on your mind lately?"
        }
        return f"{empathy_phrase} {question_templates.get(primary_aspect, question_templates['default'])}"
    return response

# --- Empathy / Advice Augmenters ---
def add_empathy_filter(response, user_emotion, sentiment_score, emotion_score, user_input=None, response_style="empathetic", intent=None):
    global _last_empathy_line, _last_quote_turn, _turn_counter, _last_hear_you_turn
    if intent == "closing":
        return response
    intensity = "very_high" if emotion_score > 0.9 else "high" if emotion_score > 0.7 else "low"
    empathy_lines = empathy_phrases.get(user_emotion.lower(), {"very_high": ["I understand."], "high": ["I understand."], "low": ["I understand."]})[intensity]
    available_lines = [e for e in empathy_lines if e != _last_empathy_line] or empathy_lines
    empathy_line = random.choice(available_lines)
    _last_empathy_line = empathy_line
    if user_input and random.random() < 0.3 and (_turn_counter - _last_quote_turn >= 2):
        empathy_line += f" You mentioned: \"{paraphrase_snippet(user_input)}\"."
        _last_quote_turn = _turn_counter
    if response_style == "reflective":
        response = f"{response} Perhaps we can explore why this feels so."
    return f"{empathy_line} {response}".strip()

def enrich_with_advice_and_questions(bot_reply, user_emotion, sentiment_label, advice_priority=False, user_input=None, response_style="empathetic", aspects=None, intent=None):
    global _last_advice_line, _last_question_line, _last_response_category, _turn_counter, _last_category_turn, _last_structure
    if intent == "closing":
        return bot_reply
    advice_templates = {
        "sadness": {
            "exam": ["Try a study schedule to boost your exam confidence.", "A short break might help you reset."],
            "marks": ["Review key concepts to improve your grades.", "Try a study schedule for better results."],
            "test": ["Focus on one topic at a time for your test.", "A tutor might help with tough concepts."],
            "study": ["Break study sessions into small, manageable chunks.", "Set small study goals to stay motivated."],
            "loneliness": ["Try connecting with a friend or loved one.", "Join a club to meet new people."],
            "workshops": ["Explore EDD’s resume and interview workshops.", "Check CalJOBS for EDD workshop schedules."],
            "default": ["A favorite activity might lift your mood.", "Journaling can help process your feelings."]
        },
        "fear": {
            "exam": ["Practice deep breathing to reduce exam anxiety.", "Break study sessions into smaller chunks."],
            "study": ["Try short study sessions to ease anxiety.", "Use flashcards to build confidence."],
            "workshops": ["EDD’s job search workshops might ease your concerns.", "Check CalJOBS for workshop details."],
            "default": ["Try the 5-4-3-2-1 grounding technique.", "Writing down worries can help."]
        },
        "anger": {
            "work": ["Take a short break to release work tension.", "Write a letter you don’t send to vent."],
            "workshops": ["EDD’s workshops might help channel your energy.", "Explore CalJOBS for job search support."],
            "default": ["A short walk can release tension.", "Try a creative activity to channel energy."]
        },
        "disgust": {
            "default": ["Take a moment to breathe and refocus.", "Try a calming activity to shift your mood."]
        }
    }
    question_templates = {
        "sadness": {
            "exam": ["What’s been hardest about your exams?", "Have you found any study methods that work?"],
            "marks": ["What’s been toughest about your grades?", "How are you feeling about your progress?"],
            "test": ["How’s your confidence after this test?", "What’s been toughest about preparing?"],
            "study": ["What’s challenging about your study routine?", "What study methods have you tried?"],
            "loneliness": ["What’s been making you feel lonely lately?", "Have you connected with anyone recently?"],
            "workshops": ["Which EDD workshops interest you most?", "What job skills do you want to improve?"],
            "default": ["What’s been challenging lately?", "What helps you feel better?"]
        },
        "fear": {
            "exam": ["What’s worrying you most about your exams?", "Have you tried any study strategies?"],
            "study": ["What’s making studying feel overwhelming?", "What helps you stay calm while studying?"],
            "workshops": ["What job concerns do EDD workshops address for you?", "Which skills do you want to learn?"],
            "default": ["What’s the main source of your worry?", "What might help you feel calmer?"]
        },
        "anger": {
            "work": ["What’s frustrating you most at work?", "What helps you unwind after a tough day?"],
            "workshops": ["Which EDD workshops might help your job search?", "What skills are you focusing on?"],
            "default": ["What’s the main source of your frustration?", "What helps you calm down?"]
        },
        "disgust": {
            "default": ["What’s been upsetting you lately?", "What helps you feel more at ease?"]
        },
        "default": {
            "default": ["What’s been challenging lately?", "What helps you cope?"]
        }
    }
    if response_style == "empathetic":
        return bot_reply
    primary_aspect = aspects[0]['aspect'] if aspects and aspects[0]['confidence'] >= 0.7 else "default"
    category = "advice" if response_style == "informative" or advice_priority else "question"
    if category == "advice":
        aspect_templates = advice_templates.get(user_emotion, advice_templates["sadness"]).get(primary_aspect, advice_templates["sadness"]["default"])
        possible_advice = [a for a in aspect_templates if a != _last_advice_line] or aspect_templates
        advice_line = random.choice(possible_advice)
        bot_reply = f"{advice_line} This might help because it focuses your efforts."
        _last_advice_line = advice_line
    elif category == "question":
        aspect_templates = question_templates.get(user_emotion, question_templates["default"]).get(primary_aspect, question_templates["default"]["default"])
        possible_questions = [q for q in aspect_templates if q != _last_question_line] or aspect_templates
        question_line = random.choice(possible_questions)
        bot_reply = f"{question_line} This could help us understand your needs."
        _last_question_line = question_line
    _last_structure = f"{response_style}+{_last_category_turn}"
    _last_response_category = category
    _last_category_turn = _turn_counter
    return bot_reply

# --- Main Generation Function ---
def generate_response(user_input, chat_history_ids=None, prev_user_messages=None, max_history_tokens=512):
    global _turn_counter, _last_full_response
    _turn_counter += 1

    try:
        if not user_input or not isinstance(user_input, str):
            raise ValueError("Invalid or empty user input")

        emotion_label, emotion_score = detect_emotion(user_input)
        sentiment_label, sentiment_score = detect_sentiment(user_input)
        intent = detect_intent(user_input)
        absa_results = detect_absa(user_input)
        emotion_label = reconcile_emotion_sentiment(emotion_label, sentiment_label, user_input, emotion_score)

        if not is_safe(user_input) or (prev_user_messages and any(not is_safe(msg) for msg in prev_user_messages)):
            safety_msg = (
                "I’m concerned about your safety. Please consider talking to a mental health professional "
                "or contacting a helpline for immediate support."
            )
            return safety_msg, chat_history_ids, emotion_label, emotion_score, sentiment_label, sentiment_score, absa_results

        retrieved_docs = retrieve_docs(user_input) if is_advice_question(user_input) or len(user_input.split()) > 5 or "edd" in user_input.lower() or "workshop" in user_input.lower() else []

        final_prompt = build_prompt(
            user_query=user_input,
            emotion=emotion_label,
            emotion_score=emotion_score,
            aspects=absa_results,
            intent=intent,
            prev_user_messages=prev_user_messages,
            retrieved_docs=retrieved_docs
        )

        input_encodings = chat_tokenizer(final_prompt + chat_tokenizer.eos_token, return_tensors="pt", truncation=True).to(device)
        if chat_history_ids is not None:
            bot_input_ids = torch.cat([chat_history_ids, input_encodings["input_ids"]], dim=-1)
            if bot_input_ids.shape[-1] > max_history_tokens:
                bot_input_ids = bot_input_ids[:, -max_history_tokens:]
        else:
            bot_input_ids = input_encodings["input_ids"]

        chat_history_ids = chat_model.generate(
            bot_input_ids,
            max_new_tokens=40,
            pad_token_id=chat_tokenizer.eos_token_id,
            do_sample=True,
            top_p=0.95,
            top_k=50,
            temperature=1.0,
            no_repeat_ngram_size=3
        )
        bot_reply = chat_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True).strip()

        primary_aspect = absa_results[0]['aspect'] if absa_results and absa_results[0]['confidence'] >= 0.7 else "situation"
        if len(bot_reply.split()) < 4 or not re.search(r"[.!?]$", bot_reply):
            bot_reply = generate_fallback_response(
                user_input, retrieved_docs, primary_aspect, emotion_label, emotion_score, intent
            )

        bot_reply = add_empathy_filter(
            bot_reply, emotion_label, sentiment_score, emotion_score, user_input=user_input, response_style=previous_response_type, intent=intent
        )
        bot_reply = enrich_with_advice_and_questions(
            bot_reply, emotion_label, sentiment_label, advice_priority=is_advice_question(user_input),
            user_input=user_input, response_style=previous_response_type, aspects=absa_results, intent=intent
        )
        bot_reply = post_process_style(
            bot_reply, previous_response_type, primary_aspect, emotion_label, emotion_score, intent
        )

        if _last_full_response and bot_reply and SequenceMatcher(None, bot_reply, _last_full_response).ratio() > 0.75:
            bot_reply = f"Here’s another thought: {bot_reply}"

        _last_full_response = bot_reply or "I understand, let’s explore this further."

        return bot_reply, chat_history_ids, emotion_label, emotion_score, sentiment_label, sentiment_score, absa_results

    except Exception as e:
        print(f"⚠️ Response generation failed for input '{user_input}': {e}")
        fallback_reply = generate_fallback_response(
            user_input, [], "situation", "neutral", 0.0, intent
        )
        _last_full_response = fallback_reply
        return fallback_reply, chat_history_ids, "neutral", 0.0, "neutral", 0.0, [{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.0}]

# --- EVALUATE TEST DATASET WITH PIPELINE ---
def batch_process_inputs(inputs, batch_size=32):
    for i in range(0, len(inputs), batch_size):
        yield inputs[i:i + batch_size]

def evaluate_test_dataset_with_pipeline(dataset_path, model_path="/content/drive/MyDrive/AIchatbotmodels/mental_health_model/checkpoint-18000", max_conversations=1000, max_history_tokens=512):
    try:
        evaluator = ImprovedMentalHealthEvaluator()

        df = pd.read_csv(dataset_path)

        required_columns = ['input']
        if not all(col in df.columns for col in required_columns):
            raise ValueError("Dataset must contain 'input' column")

        df = df.head(max_conversations)

        has_reference = 'reference_response' in df.columns

        chat_history_ids = None
        prev_user_messages = []

        inputs = [str(row['input']) for _, row in df.iterrows() if row['input'] and isinstance(row['input'], str)]
        if not inputs:
            raise ValueError("No valid inputs found in dataset")

        for batch_inputs in batch_process_inputs(inputs, batch_size=32):
            batch_emotions = []
            batch_sentiments = []
            batch_intents = []
            batch_absa = []
            batch_docs = []
            for inp in batch_inputs:
                try:
                    batch_emotions.append(detect_emotion(inp))
                    batch_sentiments.append(detect_sentiment(inp))
                    batch_intents.append(detect_intent(inp))
                    batch_absa.append(detect_absa(inp))
                    batch_docs.append(retrieve_docs(inp) if is_advice_question(inp) or len(inp.split()) > 5 or "edd" in inp.lower() or "workshop" in inp.lower() else [])
                except Exception as e:
                    print(f"⚠️ Preprocessing failed for input '{inp}': {e}")
                    batch_emotions.append(("neutral", 0.0))
                    batch_sentiments.append(("neutral", 0.0))
                    batch_intents.append(None)
                    batch_absa.append([{'aspect': 'situation', 'sentiment': 'neutral', 'confidence': 0.0}])
                    batch_docs.append([])

            for user_input, (emotion_label, emotion_score), (sentiment_label, sentiment_score), intent, absa_results, retrieved_docs in zip(
                batch_inputs, batch_emotions, batch_sentiments, batch_intents, batch_absa, batch_docs
            ):
                start_time = time.time()

                try:
                    emotion_label = reconcile_emotion_sentiment(emotion_label, sentiment_label, user_input, emotion_score)

                    if not is_safe(user_input) or (prev_user_messages and any(not is_safe(msg) for msg in prev_user_messages)):
                        bot_reply = (
                            "I’m concerned about your safety. Please consider talking to a mental health professional "
                            "or contacting a helpline for immediate support."
                        )
                        chat_history_ids = None
                    else:
                        final_prompt = build_prompt(
                            user_query=user_input,
                            emotion=emotion_label,
                            emotion_score=emotion_score,
                            aspects=absa_results,
                            intent=intent,
                            prev_user_messages=prev_user_messages,
                            retrieved_docs=retrieved_docs
                        )

                        print(f"Input: {user_input}")
                        print(f"Prompt: {final_prompt}")
                        print(f"RAG Docs: {[doc[:100] + '...' for doc in retrieved_docs] if retrieved_docs else []}")
                        print(f"ABSA: {absa_results}")

                        input_encodings = chat_tokenizer(final_prompt + chat_tokenizer.eos_token, return_tensors="pt", truncation=True).to(device)
                        if chat_history_ids is not None:
                            bot_input_ids = torch.cat([chat_history_ids, input_encodings["input_ids"]], dim=-1)
                            if bot_input_ids.shape[-1] > max_history_tokens:
                                bot_input_ids = bot_input_ids[:, -max_history_tokens:]
                        else:
                            bot_input_ids = input_encodings["input_ids"]

                        chat_history_ids = chat_model.generate(
                            bot_input_ids,
                            max_new_tokens=40,
                            pad_token_id=chat_tokenizer.eos_token_id,
                            do_sample=True,
                            top_p=0.95,
                            top_k=50,
                            temperature=1.0,
                            no_repeat_ngram_size=3
                        )
                        bot_reply = chat_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True).strip()

                        primary_aspect = absa_results[0]['aspect'] if absa_results and absa_results[0]['confidence'] >= 0.7 else "situation"
                        if len(bot_reply.split()) < 4 or not re.search(r"[.!?]$", bot_reply):
                            bot_reply = generate_fallback_response(
                                user_input, retrieved_docs, primary_aspect, emotion_label, emotion_score, intent
                            )

                        bot_reply = add_empathy_filter(
                            bot_reply, emotion_label, sentiment_score, emotion_score, user_input=user_input, response_style=previous_response_type, intent=intent
                        )
                        bot_reply = enrich_with_advice_and_questions(
                            bot_reply, emotion_label, sentiment_label, advice_priority=is_advice_question(user_input),
                            user_input=user_input, response_style=previous_response_type, aspects=absa_results, intent=intent
                        )
                        bot_reply = post_process_style(
                            bot_reply, previous_response_type, primary_aspect, emotion_label, emotion_score, intent
                        )

                        if _last_full_response and bot_reply and SequenceMatcher(None, bot_reply, _last_full_response).ratio() > 0.75:
                            bot_reply = f"Here’s another thought: {bot_reply}"

                        _last_full_response = bot_reply or "I understand, let’s explore this further."

                        print(f"Response: {bot_reply}")
                        print(f"Emotion: {emotion_label} ({emotion_score})")

                except Exception as e:
                    print(f"⚠️ Processing failed for input '{user_input}': {e}")
                    bot_reply = generate_fallback_response(
                        user_input, retrieved_docs, primary_aspect, emotion_label, emotion_score, intent
                    )
                    _last_full_response = bot_reply
                    print(f"Fallback Response: {bot_reply}")

                response_time = time.time() - start_time

                prev_user_messages.append(user_input)
                if len(prev_user_messages) > 3:
                    prev_user_messages.pop(0)

                reference_response = df.iloc[inputs.index(user_input)]['reference_response'] if has_reference else ""

                evaluator.add_conversation(
                    user_input=user_input,
                    bot_response=bot_reply,
                    reference_response=reference_response,
                    response_time=response_time
                )

        results = evaluator.evaluate_comprehensive()
        evaluator.display_results(results)

        return results

    except Exception as e:
        print(f"⚠️ Evaluation failed: {e}")
        return {}

# Run evaluation
if __name__ == "__main__":
    dataset_path = "/content/drive/MyDrive/AIchatbotmodels/test_dataset.csv"
    results = evaluate_test_dataset_with_pipeline(dataset_path, max_conversations=1000, max_history_tokens=512)
    print("\n✅ Evaluation complete. Results:", results)


Device set to use cuda:0
Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0
Device set to use cuda:0


[1;30;43mStreaming output truncated to the last 5000 lines.[0m
User: apprehensive
Instruction: Provide actionable advice about situation. Use context if available.
Assistant:
RAG Docs: []
ABSA: [{'aspect': 'situation', 'sentiment': 'negative', 'confidence': 0.89}]
Response: A favorite activity might lift your mood. This might help because it focuses your efforts.
Emotion: fear (0.59)
Input: confident
Prompt: Respond in 8–12 words, addressing user’s input, emotion, and context.
Emotion: neutral (score: 0.81).Key terms: confident
User: confident
Instruction: Start with 'I understand.' Address situation.
Assistant:
RAG Docs: []
ABSA: [{'aspect': 'situation', 'sentiment': 'positive', 'confidence': 0.97}]
Response: I understand. I understand. Can you share more about what’s going on?
Emotion: neutral (0.81)
Input:  Our class doesn't have enough money . We already checked into it . 
Prompt: Respond in 8–12 words, addressing user’s input, emotion, and context.
Previous: confident
Emotion: n