<a href="https://colab.research.google.com/github/ArpanCharola/Emotion-Aware-Therapy-System/blob/main/RealTimeMoodDetection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Step 1: Install Hugging Face Transformers and Datasets
!pip install transformers datasets accelerate -q
print("Libraries installed successfully!")

Libraries installed successfully!


In [2]:
# Step 2: Load the Tokenizer and Model
from transformers import pipeline

# We select a compact DistilRoBERTa model fine-tuned for 7-class emotion detection
# (anger, disgust, fear, joy, neutral, sadness, surprise).
# This model balances accuracy with a low memory footprint.
MODEL_NAME = "j-hartmann/emotion-english-distilroberta-base"

# Create a Hugging Face pipeline for text classification
# The pipeline handles all pre-processing and post-processing for us.
emotion_classifier = pipeline(
    "text-classification",
    model=MODEL_NAME,
    top_k=None # Instructs the pipeline to return all class scores
)

print(f"Model '{MODEL_NAME}' loaded successfully.")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


Model 'j-hartmann/emotion-english-distilroberta-base' loaded successfully.


In [4]:
# Step 3: Test the Model with Therapy-Relevant Input

test_inputs = [
    "I'm so frustrated that I can't seem to make any progress on my goals this week.",
    "I finally had a good conversation with my sister and feel a huge sense of relief.",
    "I don't really have anything to talk about today. Everything is just... fine.",
    "The thought of going to that appointment makes my stomach clench with dread."
]

print("--- Model Predictions ---")
for text in test_inputs:
    # The emotion_classifier pipeline makes the prediction
    results = emotion_classifier(text)[0]

    # Sort the results to show the top 3 emotions most clearly
    top_3 = sorted(results, key=lambda x: x['score'], reverse=True)[:3]

    # Format the output to be readable
    print(f"\n[Input]: {text}")
    print("  [Top 3 Emotions]:")

    for rank, emotion in enumerate(top_3):
        # Format score as percentage
        score_percent = f"{emotion['score'] * 100:.1f}%"
        print(f"    {rank + 1}. {emotion['label'].capitalize():<10} Score: {score_percent}")
    print("-" * 20)

--- Model Predictions ---

[Input]: I'm so frustrated that I can't seem to make any progress on my goals this week.
  [Top 3 Emotions]:
    1. Anger      Score: 96.8%
    2. Sadness    Score: 1.7%
    3. Neutral    Score: 0.5%
--------------------

[Input]: I finally had a good conversation with my sister and feel a huge sense of relief.
  [Top 3 Emotions]:
    1. Joy        Score: 99.1%
    2. Sadness    Score: 0.3%
    3. Neutral    Score: 0.2%
--------------------

[Input]: I don't really have anything to talk about today. Everything is just... fine.
  [Top 3 Emotions]:
    1. Neutral    Score: 83.4%
    2. Joy        Score: 4.6%
    3. Sadness    Score: 3.9%
--------------------

[Input]: The thought of going to that appointment makes my stomach clench with dread.
  [Top 3 Emotions]:
    1. Fear       Score: 98.1%
    2. Neutral    Score: 0.6%
    3. Disgust    Score: 0.5%
--------------------
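
The sort-and-slice logic from Step 3 can be pulled into a small helper so it is reusable and checkable without downloading the model. This is a minimal sketch: `top_k_emotions` is a hypothetical name, and `sample_scores` is made-up data shaped like the pipeline's `top_k=None` output (a list of `{'label', 'score'}` dicts), not real model output.

```python
from typing import Dict, List, Union

EmotionScore = Dict[str, Union[str, float]]


def top_k_emotions(scores: List[EmotionScore], k: int = 3) -> List[EmotionScore]:
    """Return the k highest-scoring entries, best first."""
    return sorted(scores, key=lambda item: item["score"], reverse=True)[:k]


# Hypothetical scores, for illustration only (not real model output)
sample_scores = [
    {"label": "neutral", "score": 0.10},
    {"label": "anger", "score": 0.70},
    {"label": "sadness", "score": 0.15},
    {"label": "joy", "score": 0.05},
]

for rank, emotion in enumerate(top_k_emotions(sample_scores, k=2), start=1):
    print(f"{rank}. {emotion['label'].capitalize():<10} Score: {emotion['score'] * 100:.1f}%")
```

With the real classifier, the same helper would be called as `top_k_emotions(emotion_classifier(text)[0])`.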


In [5]:
# Step 4: Prepare the foundation for the Personalized Adaptive Therapy (PAT) Model

# Import the necessary deep learning libraries (TensorFlow/Keras)
# These are often pre-installed in Colab, but we import them here to confirm.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
import numpy as np

# --- 1. Define the Problem State ---
# This is the kind of data the PAT model will learn from.
# In a real app, this data would come from the Mood Detector and User Activity.

# Example Data Structure (Time-Series of a single user over several days)
# Features: [Mood_Score, Activity_Level, Previous_Action_Taken]
# Target: [Outcome_Mood_Change]

# Simplified Example: the first 4 days of user data
# (Mood_Score: sadness probability in [0, 1]; Activity_Level: fraction in [0, 1];
#  Action: 0=None, 1=Journal, 2=Meditation)
input_data = np.array([
    [0.7, 0.5, 1], # Day 1: Mood(70% Sad), Activity(50%), Action(Journal)
    [0.5, 0.6, 2], # Day 2: Mood(50% Sad), Activity(60%), Action(Meditation)
    [0.4, 0.7, 1], # Day 3: Mood(40% Sad), Activity(70%), Action(Journal)
    [0.3, 0.8, 0], # Day 4: ...
    # ... In a real scenario, we'd have hundreds of these for many users
])

# --- 2. Define the Sequential Model Architecture (LSTM) ---
# This model is designed to remember the sequence of events.
pat_model = Sequential([
    # The LSTM layer remembers patterns over time (the sequence of days).
    # input_shape=(None, n_features) accepts sequences of any length;
    # return_sequences=False emits only the final hidden state.
    LSTM(64, activation='relu', input_shape=(None, input_data.shape[1]), return_sequences=False),
    Dropout(0.2),
    # Dense output layer: predicts one of the possible therapeutic actions (e.g., 5 total actions)
    Dense(5, activation='softmax')
])

pat_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

print("TensorFlow/Keras framework loaded.")
print("The foundational PAT (Sequential LSTM) model architecture is defined.")
pat_model.summary()

TensorFlow/Keras framework loaded.
The foundational PAT (Sequential LSTM) model architecture is defined.


  super().__init__(**kwargs)
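
The LSTM above expects input shaped `(samples, timesteps, features)`, while `input_data` in Step 4 is a flat `(days, features)` table. One common way to bridge the two is a sliding window over consecutive days. The sketch below assumes that approach; `make_windows` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def make_windows(daily_features: np.ndarray, window: int) -> np.ndarray:
    """Stack every run of `window` consecutive days into one LSTM sample."""
    n_days = daily_features.shape[0]
    if window < 1 or window > n_days:
        raise ValueError("window must be between 1 and the number of days")
    return np.stack([daily_features[i:i + window] for i in range(n_days - window + 1)])


# Same four rows as input_data in Step 4
days = np.array([
    [0.7, 0.5, 1],
    [0.5, 0.6, 2],
    [0.4, 0.7, 1],
    [0.3, 0.8, 0],
])

windows = make_windows(days, window=3)
print(windows.shape)  # (2, 3, 3): 2 samples, 3 timesteps, 3 features
```

Four days with a window of three yield two overlapping samples, each of which could be passed to `pat_model` as one sequence.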


In [6]:
# Step 5: Integrate Speech-to-Text (STT) and Text-to-Speech (TTS)
# This prepares the environment for the audio-to-audio requirement.

# 1. Speech-to-Text (STT) - User's Voice Input
# We use a small, efficient speech model such as Whisper Tiny (English-only)
# to convert the user's spoken words into the text needed by our Mood Detector (DistilRoBERTa).
# Only the model name is recorded here; the STT pipeline itself is not created yet, as this model is slightly larger.
try:
    from transformers import pipeline as hf_pipeline

    # Placeholder for a lightweight STT model setup (Whisper Tiny, English-only)
    stt_pipeline_placeholder = "openai/whisper-tiny.en"  # Tiny is very fast and efficient
    print(f"STT Model Placeholder defined: {stt_pipeline_placeholder}")

except ImportError:
    print("Transformers library not fully loaded. Please re-run Step 1 if needed.")


# 2. Text-to-Speech (TTS) - System's Audio Response
# We need a TTS model to convert the PAT model's *text recommendation* back into speech.
# We will use a dedicated TTS library, as the full Transformer TTS models are too big for free Colab.
!pip install pyttsx3 -q

# We will use the pyttsx3 library as a local, CPU-based TTS solution for prototyping.
# This avoids needing a massive cloud-based TTS model and keeps the demo runnable on Colab's CPU.
import pyttsx3

print("TTS Prototyping library (pyttsx3) installed.")
print("\nEnvironment is now configured for both text-based ML and multimodal audio processing!")

STT Model Placeholder defined: openai/whisper-tiny.en
TTS Prototyping library (pyttsx3) installed.

Environment is now configured for both text-based ML and multimodal audio processing!


In [1]:
# Step 5: Training the LSTM and Making a First Prediction

# Import necessary libraries (Ensuring they are available in this cell)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical
import numpy as np

# --- 1. Define Model and Data ---
# Features (per timestep): [Mood_Score, Activity_Level, Previous_Action_Taken]
X_train = np.array([
    # Simplified sequences of 3 steps (days/interactions)
    [[0.7, 0.5, 1], [0.5, 0.6, 2], [0.4, 0.7, 1]],
    [[0.2, 0.9, 0], [0.1, 0.9, 2], [0.3, 0.8, 2]],
    [[0.9, 0.2, 1], [0.8, 0.3, 1], [0.7, 0.4, 0]],
])

# Target Actions: 5 possible therapeutic actions
# [1, 4, 3] corresponds to the ideal action for each training sequence
Y_train = to_categorical([1, 4, 3], num_classes=5)

# Define the Sequential Model Architecture (The Therapy Guide)
pat_model = Sequential([
    # The LSTM layer learns patterns in the sequence of steps
    LSTM(64, activation='relu', input_shape=(None, X_train.shape[2]), return_sequences=False),
    Dropout(0.2),
    # Dense output predicts one of the 5 therapeutic actions
    Dense(5, activation='softmax')
])
pat_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# --- 2. Train the Model ---
print("Simulating quick training of the Sequential LSTM model...")
pat_model.fit(X_train, Y_train, epochs=10, verbose=0)
print("Training simulation complete. Model has learned from the dummy data.")

# --- 3. Make a Personalized Prediction ---
ACTION_MAP = {
    0: "Suggest a brief moment of silent reflection.",
    1: "Suggest a guided journaling prompt.",
    2: "Suggest a 5-minute compassion meditation.",
    3: "Suggest challenging a negative thought pattern (CBT technique).",
    4: "Suggest a 2-minute box breathing exercise."
}

# Simulate a new 3-step sequence for a current user
current_user_state = np.array([[[0.8, 0.4, 1], [0.7, 0.5, 0], [0.6, 0.4, 1]]])

# Get the model's prediction (a probability distribution over the 5 actions)
prediction = pat_model.predict(current_user_state, verbose=0)[0]
predicted_action_index = np.argmax(prediction)
predicted_action_text = ACTION_MAP[predicted_action_index]

print("\n--- Personalized Adaptive Therapy (PAT) Prediction ---")
print(f"Input State Size: {current_user_state.shape}")
print(f"Recommended Therapeutic Action: {predicted_action_text}")

  super().__init__(**kwargs)


Simulating quick training of the Sequential LSTM model...
Training simulation complete. Model has learned from the dummy data.

--- Personalized Adaptive Therapy (PAT) Prediction ---
Input State Size: (1, 3, 3)
Recommended Therapeutic Action: Suggest a 2-minute box breathing exercise.
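
`np.argmax` reports only the winning action. With three training sequences and ten epochs, the softmax output may well be close to uniform, so it can help to look at the whole distribution. The sketch below ranks all actions by probability; `rank_actions`, `example_probs`, and `example_actions` are hypothetical stand-ins, and the probabilities are invented for illustration.

```python
import numpy as np


def rank_actions(probabilities, action_map):
    """Return (action_text, probability) pairs sorted from most to least likely."""
    probabilities = np.asarray(probabilities, dtype=float)
    order = np.argsort(probabilities)[::-1]  # Indices from highest to lowest probability
    return [(action_map[int(i)], float(probabilities[i])) for i in order]


# Hypothetical softmax output over 5 actions (not produced by the model above)
example_probs = [0.05, 0.20, 0.10, 0.15, 0.50]
example_actions = {
    0: "reflection",
    1: "journaling",
    2: "meditation",
    3: "CBT reframing",
    4: "box breathing",
}

ranked = rank_actions(example_probs, example_actions)
for action, prob in ranked:
    print(f"{action:<15} {prob * 100:.1f}%")
```

In the notebook, the call would be `rank_actions(prediction, ACTION_MAP)`.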


In [2]:
#stt and tts
# Step 6: Finalizing the Multimodal Architecture

# Import core libraries (Ensuring they are available for this final integration)
import numpy as np
import requests
import json
import base64
import soundfile as sf
import io
import torch
from transformers import pipeline as hf_pipeline
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical
from IPython.display import Audio as DisplayAudio

# --- 1. Load All Models and Data Structures ---

# Mood Detector (DistilRoBERTa)
MODEL_NAME = "j-hartmann/emotion-english-distilroberta-base"
emotion_classifier = hf_pipeline("text-classification", model=MODEL_NAME, top_k=None)

# Therapy Guide (LSTM) Architecture (Redefined for self-contained execution)
X_train = np.array([
    [[0.7, 0.5, 1], [0.5, 0.6, 2], [0.4, 0.7, 1]],
    [[0.2, 0.9, 0], [0.1, 0.9, 2], [0.3, 0.8, 2]],
    [[0.9, 0.2, 1], [0.8, 0.3, 1], [0.7, 0.4, 0]],
])
pat_model = Sequential([
    LSTM(64, activation='relu', input_shape=(None, X_train.shape[2]), return_sequences=False),
    Dropout(0.2),
    Dense(5, activation='softmax')
])
pat_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
pat_model.fit(X_train, to_categorical([1, 4, 3], num_classes=5), epochs=10, verbose=0)

# Action Map
ACTION_MAP = {
    0: "Suggest a brief moment of silent reflection.",
    1: "Suggest a guided journaling prompt.",
    2: "Suggest a 5-minute compassion meditation.",
    3: "Suggest challenging a negative thought pattern (CBT technique).",
    4: "Suggest a 2-minute box breathing exercise."
}

# --- 2. Audio Functions (Conceptual & API-Driven) ---

# STT (Audio to Text) - Conceptual Function for deployment
def transcribe_audio_input(audio_file_data):
    """
    Placeholder for a robust STT service call (e.g., Whisper API or self-hosted).

    In this prototype, we return a fixed simulation text due to Colab environment blocks.
    In production, this would use the Whisper-Tiny model.
    """
    print("\n[STT Simulation]: Using a fixed text input due to Colab audio I/O failure.")
    return "I feel really exhausted and sad today, and all my work feels pointless."

# TTS (Text to Audio) - Production API Call
def generate_tts_audio_api(text_to_speak, speaker_name="Kore"):
    """Converts text recommendation into playable audio via the Gemini TTS API."""
    API_KEY = ""
    TTS_MODEL = "gemini-2.5-flash-preview-tts"
    TTS_API_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{TTS_MODEL}:generateContent?key={API_KEY}"

    payload = {
        "contents": [{"parts": [{"text": text_to_speak}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": speaker_name}}}
        }
    }

    try:
        response = requests.post(TTS_API_URL, headers={'Content-Type': 'application/json'}, data=json.dumps(payload))
        response.raise_for_status()

        result = response.json()
        audio_data_base64 = result['candidates'][0]['content']['parts'][0]['inlineData']['data']
        pcm_data = base64.b64decode(audio_data_base64)
        sample_rate = 24000
        pcm_16bit = np.frombuffer(pcm_data, dtype=np.int16)

        wav_io = io.BytesIO()
        sf.write(wav_io, pcm_16bit, sample_rate, format='WAV', subtype='PCM_16')
        wav_io.seek(0)
        return wav_io.read()
    except Exception as e:
        # Most likely cause here: API_KEY is empty, so the API rejects the request (e.g. 403 Forbidden).
        # Return None so the caller can fall back to text-only output.
        return None

# --- 3. Full Multimodal Pipeline Function ---

def run_therapy_cycle(history_sequence):
    """Executes the full chain: Audio-In -> Mood -> PAT -> Audio-Out"""

    # 1. AUDIO-TO-TEXT (STT)
    # We pass None as audio data since we are using the simulation placeholder
    transcribed_text = transcribe_audio_input(None)

    # 2. TEXT-TO-EMOTION (DistilRoBERTa)
    mood_prediction = emotion_classifier(transcribed_text)[0]
    sadness_score = next((e['score'] for e in mood_prediction if e['label'] == 'sadness'), 0.0)
    top_emotion = sorted(mood_prediction, key=lambda x: x['score'], reverse=True)[0]

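    # 3. EMOTION-TO-ACTION (LSTM PAT)
    # Simulate a current activity level (e.g., 0.5) and read the previous action from the history
    current_activity = 0.5
    # Flatten the history to (timesteps, features) so any leading batch axes are ignored
    history = np.asarray(history_sequence, dtype=float).reshape(-1, np.shape(history_sequence)[-1])
    previous_action = history[-1, 2]  # Third feature of the last step is the action taken
    new_state_input = np.array([sadness_score, current_activity, previous_action])
    # Slide the window: drop the oldest step, append the new one, restore the batch axis
    pat_input = np.vstack([history[1:], new_state_input])[np.newaxis, ...]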

    prediction_vector = pat_model.predict(pat_input, verbose=0)[0]
    predicted_action_index = np.argmax(prediction_vector)
    predicted_action_text = ACTION_MAP[predicted_action_index]

    # 4. TEXT-TO-AUDIO (TTS)
    audio_output = generate_tts_audio_api(predicted_action_text)

    return transcribed_text, top_emotion, predicted_action_text, audio_output

# --- 4. EXECUTION ---

# Initialize a dummy 3-step history sequence with shape (1, 3, 3): (batch, timesteps, features)
DUMMY_HISTORY_SEQUENCE = np.array([
    [[0.7, 0.5, 1], [0.5, 0.6, 2], [0.4, 0.7, 1]]
])

print("==============================================================")
print("     FINAL ARCHITECTURE TEST: MULTIMODAL THERAPY CYCLE")
print("==============================================================")

stt_text, top_emotion, pat_recommendation, therapy_audio = run_therapy_cycle(DUMMY_HISTORY_SEQUENCE)

print(f"\n[1] User Said (STT Conceptual): {stt_text}")
print(f"[2] Emotion Detected (DistilBERT): {top_emotion['label']} ({top_emotion['score']*100:.1f}%)")
print(f"[3] Recommended Action (LSTM PAT): {pat_recommendation}")
print("-" * 60)


if therapy_audio:
    print("STATUS: Audio Output Successful!")
    display(DisplayAudio(therapy_audio, rate=24000))
else:
    print("STATUS: Audio Output Failed (Expected in Colab - API call blocked).")
    print("The system successfully formulated the response, but could not convert it to audio here.")

print("==============================================================")


Device set to use cpu
  super().__init__(**kwargs)


     FINAL ARCHITECTURE TEST: MULTIMODAL THERAPY CYCLE

[STT Simulation]: Using a fixed text input due to Colab audio I/O failure.

[1] User Said (STT Conceptual): I feel really exhausted and sad today, and all my work feels pointless.
[2] Emotion Detected (DistilBERT): sadness (98.9%)
[3] Recommended Action (LSTM PAT): Suggest a 2-minute box breathing exercise.
------------------------------------------------------------
STATUS: Audio Output Failed (Expected in Colab - API call blocked).
The system successfully formulated the response, but could not convert it to audio here.
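
The PCM-to-WAV step inside `generate_tts_audio_api` can be exercised on its own, without an API key. The sketch below uses the standard-library `wave` module in place of `soundfile` and feeds it a synthetic tone instead of decoded API audio. `pcm16_to_wav_bytes` is a hypothetical helper; the 24 kHz rate and 16-bit sample width come from the code above, while mono output is an assumption.

```python
import io
import wave

import numpy as np


def pcm16_to_wav_bytes(pcm_bytes: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)  # Assumption: mono audio
        wav_file.setsampwidth(2)         # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# Stand-in for decoded API audio: 0.1 s of a 440 Hz tone at 24 kHz
t = np.arange(2400) / 24000
tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype("<i2")

wav_bytes = pcm16_to_wav_bytes(tone.tobytes())
print(len(wav_bytes), "bytes of WAV data")
```

The resulting bytes can be passed to `IPython.display.Audio(wav_bytes)` in the same way as `therapy_audio`.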
