# CBT Therapist AI Model Development Report


# 1.1 Project Introduction
What we're doing: Setting up the development environment and installing all necessary libraries for building a Cognitive Behavioral Therapy (CBT) AI assistant.

Why we're doing this: Creating an AI therapist requires specialized NLP libraries for emotion detection, language understanding, and response generation. We need transformers for the base model, PEFT for efficient fine-tuning, and various supporting libraries for emotional intelligence.

In [1]:
print(" LOADING ULTIMATE CBT THERAPIST MODEL - COMPETITION EDITION")
print("="*80)

# Enhanced Library Installation
print("--- Step 1: Installing Enhanced Libraries ---")
!pip install -q -U transformers accelerate peft bitsandbytes trl datasets
!pip install -q textstat vaderSentiment sentence-transformers
!pip install -q scikit-learn numpy pandas matplotlib seaborn
# RAG ADDITION: Install chromadb for vector DB
!pip install -q chromadb
print("✅ All Libraries Installed!")

 LOADING ULTIMATE CBT THERAPIST MODEL - COMPETITION EDITION
--- Step 1: Installing Enhanced Libraries ---
✅ All Libraries Installed!


# 2.1 Library Imports and Configuration
What we're doing: Importing all necessary libraries, including the Hugging Face `login` helper and the Google Colab `drive` module that authentication and storage rely on.

Why we're doing this: We need access to pre-trained models from Hugging Face and storage capabilities from Google Drive. The imports are organized by functionality for better code organization.



In [2]:
# Advanced Imports
import torch
import transformers
import numpy as np
import pandas as pd
import json
import re
import random
from collections import defaultdict, deque
from datetime import datetime
import matplotlib.pyplot as plt
import os # Import os module

from google.colab import drive
from datasets import load_dataset, Dataset
from huggingface_hub import login
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig,
    TrainerCallback, TrainingArguments, EarlyStoppingCallback
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sentence_transformers import SentenceTransformer
import textstat

# RAG ADDITIONS: chromadb imports
import chromadb
from chromadb.utils import embedding_functions

# 3. Advanced Rule-Based Heuristic Engine with Intensity Scoring

This `EmotionalIntelligenceEngine` is an enhanced implementation that performs a multi-layered analysis of user input using a rule-based, heuristic approach.

*   **Graded Emotional Analysis:** The engine moves beyond binary keyword matching. For each emotion it counts how many distinct base `keywords` and `intensity_markers` appear in the text, then grades the emotion as 'mild' (one keyword), 'moderate' (two keywords), or 'severe' (more than two keywords, or any intensity marker).
*   **Complexity Metric:** A heuristic `complexity_score` is the number of detected emotions plus the number of detected cognitive distortions. It serves as a rough proxy for the user's cognitive load.
*   **Cognitive Distortion Mapping:** Continues to use pattern matching to identify common CBT cognitive distortions from a predefined dictionary.
*   **Rich Structured Output:** The method returns a more detailed dictionary containing the new `emotion_intensities` and `complexity_score` fields, enabling more nuanced downstream decision-making.
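
The grading rule described above can be written as a small standalone function. This is a minimal sketch that mirrors the thresholds used in `analyze_emotional_state`; the name `grade_intensity` is illustrative and is not part of the report's code.

```python
def grade_intensity(keyword_matches: int, intensity_matches: int):
    """Return 'mild', 'moderate', 'severe', or None when no keyword matched."""
    if keyword_matches == 0:
        # Intensity markers alone do not register an emotion
        return None
    if intensity_matches > 0 or keyword_matches > 2:
        return "severe"
    if keyword_matches > 1:
        return "moderate"
    return "mild"


# (keyword_matches, intensity_matches) pairs covering each branch
for kw, im in [(1, 0), (2, 0), (3, 0), (1, 1), (0, 2)]:
    print(kw, im, grade_intensity(kw, im))
```

Note that a single intensity marker is enough to raise a one-keyword match straight to 'severe', skipping 'moderate'.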


In [3]:
# Advanced Emotional Intelligence Engine
class EmotionalIntelligenceEngine:
    def __init__(self):
        self.sentiment_analyzer = SentimentIntensityAnalyzer()
        self.emotion_patterns = {
            'anxiety': {
                'keywords': ['worried', 'anxious', 'scared', 'panic', 'nervous', 'fear', 'stress'],
                'intensity_markers': ["can't stop", 'constantly', 'overwhelming', 'paralyzed']
            },
            'depression': {
                'keywords': ['sad', 'hopeless', 'empty', 'worthless', 'tired', 'meaningless'],
                'intensity_markers': ['always', 'never', 'nothing matters', 'no point']
            },
            'anger': {
                'keywords': ['angry', 'frustrated', 'furious', 'annoyed', 'irritated', 'rage'],
                'intensity_markers': ["so angry", "can't stand", 'hate', 'sick of']
            },
            'grief': {
                'keywords': ['loss', 'died', 'miss', 'gone', 'funeral', 'bereaved'],
                'intensity_markers': ["devastating", "can't cope", 'unbearable', 'lost everything']
            },
            'trauma': {
                'keywords': ['flashback', 'nightmare', 'triggered', 'ptsd', 'abuse', 'accident'],
                'intensity_markers': ["haunted", "can't forget", 'reliving', 'terrified']
            },
            'relationships': {
                'keywords': ['relationship', 'partner', 'marriage', 'divorce', 'breakup', 'lonely'],
                'intensity_markers': ["falling apart", "can't trust", 'abandoned', 'isolated']
            }
        }

        self.cognitive_distortions = {
            'all_or_nothing': ['always', 'never', 'completely', 'totally', 'everything', 'nothing'],
            'catastrophizing': ['disaster', 'terrible', 'awful', 'end of world', 'ruined'],
            'mind_reading': ['they think', 'everyone believes', 'people assume'],
            'fortune_telling': ['will never', 'going to fail', "won't work", 'bound to'],
            'emotional_reasoning': ['feel like', 'seems like', 'must be because I feel'],
            'should_statements': ['should', 'must', 'ought to', 'have to'],
            # Patterns are lowercase because they are matched against text.lower()
            'labeling': ['i am', 'he is', 'she is'] + ['stupid', 'failure', 'loser', 'worthless'],
            'personalization': ['my fault', 'because of me', 'i caused', "i'm responsible"]
        }

    def analyze_emotional_state(self, text):
        text_lower = text.lower()

        # Sentiment analysis
        sentiment_scores = self.sentiment_analyzer.polarity_scores(text)

        # Emotion detection
        detected_emotions = []
        emotion_intensities = {}

        for emotion, patterns in self.emotion_patterns.items():
            keyword_matches = sum(1 for keyword in patterns['keywords'] if keyword in text_lower)
            intensity_matches = sum(1 for marker in patterns['intensity_markers'] if marker in text_lower)

            if keyword_matches > 0:
                intensity = 'mild'
                if intensity_matches > 0 or keyword_matches > 2:
                    intensity = 'severe'
                elif keyword_matches > 1:
                    intensity = 'moderate'

                detected_emotions.append(emotion)
                emotion_intensities[emotion] = intensity

        # Cognitive distortion detection
        detected_distortions = []
        for distortion, patterns in self.cognitive_distortions.items():
            if any(pattern in text_lower for pattern in patterns):
                detected_distortions.append(distortion)

        # Crisis indicators
        crisis_keywords = ['suicide', 'kill myself', 'end it all', 'want to die', 'hurt myself', 'self harm']
        crisis_level = 'high' if any(keyword in text_lower for keyword in crisis_keywords) else 'low'

        return {
            'primary_emotions': detected_emotions[:2],  # First 2 detected, in emotion_patterns order (not ranked)
            'emotion_intensities': emotion_intensities,
            'sentiment': sentiment_scores,
            'cognitive_distortions': detected_distortions,
            'crisis_level': crisis_level,
            'complexity_score': len(detected_emotions) + len(detected_distortions)
        }
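
The checks above use `keyword in text_lower`, which is a substring test, so 'sad' also matches inside 'crusade' and 'miss' inside 'dismissed'. One possible refinement, sketched here on the assumption that whole-word matching is preferred, wraps each pattern in regex word boundaries. The helper name `count_matches` is hypothetical and not part of the engine.

```python
import re


def count_matches(text: str, patterns: list[str]) -> int:
    """Count how many patterns occur in text as whole words or phrases."""
    hits = 0
    for pattern in patterns:
        # \b anchors stop 'sad' from matching inside 'crusade'
        regex = r"\b" + re.escape(pattern) + r"\b"
        if re.search(regex, text, flags=re.IGNORECASE):
            hits += 1
    return hits


keywords = ["sad", "miss", "fear"]
print(count_matches("We dismissed the crusade", keywords))   # 0
print(count_matches("I miss her and I feel sad", keywords))  # 2
```

The trade-off is that inflected forms are no longer caught: with word boundaries, 'fear' does not match 'fearful', so the keyword lists would need to name those forms explicitly.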

# 4. Implementing a Rule-Based Strategy Selection Engine

**Purpose:** To dynamically select the most appropriate, clinically-informed therapeutic strategy based on the multi-faceted output of the `EmotionalIntelligenceEngine`.

**Design Rationale:**

A simple, rule-based engine was deliberately chosen for this critical task due to its **safety, predictability, and interpretability**. Different emotional states require different therapeutic approaches, and this engine enforces that logic reliably.

1.  **Prioritized Triage Logic:** The core of the engine is a decision tree (`if/elif/else` structure) that mimics clinical triage. It first checks for the most urgent condition—a high `crisis_level`. This ensures that safety and stabilization are **always** the absolute top priority, overriding all other emotional indicators. This deterministic approach is crucial for safety.

2.  **From Analysis to Action:** This engine acts as the bridge between analysis and generation. It consumes the rich, structured data from the `EmotionalIntelligenceEngine` (emotions, distortions) and translates it into a single, actionable strategy. This ensures the AI's response is not just a random string of text but a purposeful intervention.

3.  **Modular and Extensible Strategy Library:** All therapeutic approaches are defined in a simple dictionary (`self.therapy_approaches`). This design is highly modular:
    *   It **decouples the decision logic** (the `select_strategy` method) **from the strategy definitions**.
    *   Adding a new therapeutic strategy (e.g., `'mindfulness_focused'`) then takes one new dictionary entry plus one new branch in `select_strategy`, leaving the existing strategy definitions untouched.

4.  **Safe Fallback Mechanism:** The engine includes a default return value (`'cognitive_restructuring'`). This ensures that even if the user's input is ambiguous and doesn't trigger any other rule, the system can always select a safe, generally helpful, and core CBT strategy.


In [4]:
# Advanced CBT Response Strategy Engine
class CBTResponseStrategyEngine:
    def __init__(self):
        self.therapy_approaches = {
            'crisis_intervention': {
                'priority': 'immediate safety and stabilization',
                'techniques': ['grounding', 'safety planning', 'crisis resources'],
                'tone': 'calm, directive, supportive'
            },
            'anxiety_focused': {
                'priority': 'worry reduction and coping strategies',
                'techniques': ['breathing exercises', 'cognitive restructuring', 'exposure concepts'],
                'tone': 'gentle, reassuring, educational'
            },
            'depression_focused': {
                'priority': 'behavioral activation and mood improvement',
                'techniques': ['activity scheduling', 'thought records', 'self-compassion'],
                'tone': 'warm, encouraging, patient'
            },
            'trauma_informed': {
                'priority': 'safety, stabilization, processing',
                'techniques': ['grounding', 'window of tolerance', 'narrative therapy'],
                'tone': 'careful, validating, empowering'
            },
            'relationship_focused': {
                'priority': 'communication and boundary setting',
                'techniques': ['interpersonal skills', 'boundary setting', 'attachment'],
                'tone': 'balanced, insightful, practical'
            },
            'cognitive_restructuring': {
                'priority': 'identifying and challenging thoughts',
                'techniques': ['thought challenging', 'evidence examination', 'balanced thinking'],
                'tone': 'collaborative, curious, logical'
            }
        }

    def select_strategy(self, emotional_analysis):
        if emotional_analysis['crisis_level'] == 'high':
            return 'crisis_intervention'

        primary_emotions = emotional_analysis['primary_emotions']

        if 'trauma' in primary_emotions:
            return 'trauma_informed'
        elif 'anxiety' in primary_emotions:
            return 'anxiety_focused'
        elif 'depression' in primary_emotions:
            return 'depression_focused'
        elif 'relationships' in primary_emotions:
            return 'relationship_focused'
        # Detected distortions and the no-match case both use the default approach
        return 'cognitive_restructuring'
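
The same triage order can also be expressed as data rather than as an `if/elif` chain. The following is a sketch of that alternative, not the report's implementation: `STRATEGY_PRIORITY` and this standalone `select_strategy` function are illustrative names, and the strategy labels are copied from `therapy_approaches`.

```python
# Ordered (emotion, strategy) pairs; earlier rows win when several emotions match.
STRATEGY_PRIORITY = [
    ("trauma", "trauma_informed"),
    ("anxiety", "anxiety_focused"),
    ("depression", "depression_focused"),
    ("relationships", "relationship_focused"),
]


def select_strategy(analysis: dict) -> str:
    # Safety check always runs first, before any emotion-based rule
    if analysis.get("crisis_level") == "high":
        return "crisis_intervention"
    emotions = analysis.get("primary_emotions", [])
    for emotion, strategy in STRATEGY_PRIORITY:
        if emotion in emotions:
            return strategy
    # Default approach when no rule applies
    return "cognitive_restructuring"


print(select_strategy({"crisis_level": "low", "primary_emotions": ["anxiety", "trauma"]}))
```

With this layout, a new strategy becomes one extra row in the list, and the priority order is visible at a glance.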

# 5. Implementing a Local RAG Pipeline for Knowledge Augmentation

**Purpose:** To enhance the Large Language Model's responses by grounding them in a reliable, external knowledge base. This helps mitigate model hallucinations, lets the knowledge be updated without retraining, and allows for specialized domain knowledge.

**Design Rationale:**

1.  **Choice of Vector Database (`ChromaDB`):**
    *   **Reasoning:** `ChromaDB` was selected as the vector store for its simplicity and efficiency in a development environment. The `PersistentClient` allows the database to be saved to disk, meaning the knowledge base is preserved between sessions without requiring a dedicated server. This makes the entire application self-contained and portable.

2.  **Choice of Embedding Model (`all-MiniLM-L6-v2`):**
    *   **Reasoning:** This `SentenceTransformer` model offers a good balance of retrieval quality and size. It captures the semantic meaning of text and converts it into 384-dimensional dense vectors for similarity search. Its small footprint keeps the embedding process fast without requiring significant computational resources.

3.  **Robust Initialization and Connection:**
    *   **Reasoning:** The `__init__` method is designed to be resilient. The `get_or_create_collection` command ensures that the engine can seamlessly connect to an existing knowledge base or initialize a new one if it doesn't exist. The additional `try...except` block provides a fallback mechanism, making the startup process more robust by attempting to connect to any available collection if the primary one is empty.

4.  **Decoupled Retrieval Logic:**
    *   **Reasoning:** The `retrieve_relevant_knowledge` method encapsulates the core retrieval logic. It takes a simple text query and returns a clean, structured list of dictionaries. This clean interface decouples the complexities of vector search from the main generation engine, which simply needs to request and receive knowledge snippets. The error handling ensures that if the query fails for any reason, it returns an empty list instead of crashing the application.
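
To make the retrieval step concrete, here is a toy, dependency-free sketch of what a vector query does conceptually: rank stored vectors by similarity to the query vector and return the top `k` in the same list-of-dictionaries shape. The three-dimensional vectors and snippet texts are made up for illustration, and cosine similarity is an assumption; the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """store holds (vector, content, metadata) tuples; return the k best as dicts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [{"content": content, "metadata": meta} for _, content, meta in ranked[:k]]


# Toy "embeddings" standing in for real 384-dimensional sentence vectors
store = [
    ([1.0, 0.0, 0.0], "Grounding: the 5-4-3-2-1 technique", {"topic": "anxiety"}),
    ([0.0, 1.0, 0.0], "Activity scheduling basics", {"topic": "depression"}),
    ([0.9, 0.1, 0.0], "Paced breathing", {"topic": "anxiety"}),
]
results = top_k([1.0, 0.05, 0.0], store, k=2)
for item in results:
    print(item["content"], item["metadata"])
```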


In [5]:
# RAG ADDITION: A simple ChromaDB-based RAG engine
class RAGKnowledgeEngine:
    def __init__(self, persist_path="rag_chroma_db", collection_name="cbt_knowledge",
                 embedding_model="all-MiniLM-L6-v2"):
        self.persist_path = persist_path
        self.collection_name = collection_name
        self.embedding_model = embedding_model
        self.client = chromadb.PersistentClient(path=self.persist_path)
        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=self.embedding_model)
        # Try to get or create the target collection
        self.collection = self.client.get_or_create_collection(name=self.collection_name, embedding_function=self.embedding_fn)
        # If the target collection is empty, fall back to another existing
        # collection that already holds documents.
        try:
            if self.collection.count() == 0:
                for col in self.client.list_collections():
                    # list_collections() yields names or Collection objects,
                    # depending on the chromadb version.
                    name = getattr(col, "name", col)
                    if name == self.collection_name:
                        continue  # skip the empty target collection itself
                    candidate = self.client.get_collection(name, embedding_function=self.embedding_fn)
                    if candidate.count() > 0:
                        self.collection = candidate
                        break
        except Exception:
            pass  # keep the (empty) target collection

    def retrieve_relevant_knowledge(self, query_text, k=4):
        try:
            res = self.collection.query(query_texts=[query_text], n_results=k)
            docs = res.get("documents", [[]])[0] if res else []
            metas = res.get("metadatas", [[]])[0] if res else []
            # Return a list of dictionaries for consistency
            return [{'content': doc, 'metadata': meta} for doc, meta in zip(docs, metas)]
        except Exception:
            return []
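
Because `retrieve_relevant_knowledge` returns plain dictionaries, turning the snippets into a context block for the prompt is straightforward. The helper below is a hedged sketch: the name `format_knowledge_for_prompt`, the `max_chars` cap, and the `source` metadata key are all assumptions made for illustration, not part of the report's code.

```python
def format_knowledge_for_prompt(snippets, max_chars=600):
    """Join retrieved snippets into a numbered context block, capped in length."""
    lines = []
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        content = (snippet.get("content") or "").strip()
        if not content:
            continue  # skip empty documents
        # 'source' is a hypothetical metadata key; metadata may also be None
        source = (snippet.get("metadata") or {}).get("source", "unknown")
        line = f"[{i}] ({source}) {content}"
        if used + len(line) > max_chars:
            break  # stop before the context block grows past the cap
        lines.append(line)
        used += len(line)
    return "\n".join(lines)


example = [
    {"content": "Grounding helps.", "metadata": None},
    {"content": "Breathe slowly.", "metadata": {"source": "handout"}},
]
print(format_knowledge_for_prompt(example))
```

An empty retrieval result yields an empty string, so the caller can decide whether to include a knowledge section in the prompt at all.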

# 6. Implementing an Advanced Instruction-Based Dataset Creator

**Purpose:** To transform a raw conversational dataset into a high-quality, instruction-formatted dataset that explicitly teaches the model the underlying therapeutic reasoning process.

**Design Rationale:**

This class is the cornerstone of the model's performance. It moves beyond simple imitation and implements a form of **"process-supervised" learning**, where the model is taught the steps to arrive at a good answer.

1.  **Instruction Fine-Tuning Format (`<<SYS>>` and `[INST]`):**
    *   **Reasoning:** The `create_ultimate_prompt` method is the core innovation. It uses a formal instruction-following format popularized by the Llama 2 chat models (Mistral's instruct models use the same `[INST]` tags without a `<<SYS>>` block). By placing the entire clinical assessment and persona instructions inside a **System Prompt (`<<SYS>>...<</SYS>>`)**, we create a clear separation between the model's "internal thoughts" and the user's turn. This reduces the risk of the model getting confused and outputting its own instructions during a real conversation.

2.  **Simulating Chain-of-Thought (CoT):**
    *   **Reasoning:** The system prompt doesn't just tell the model to be a therapist; it provides the *exact analysis* for that specific example (emotions, distortions, strategy). We are showing the model the **chain of thought** that a human therapist would follow: `Analysis -> Strategy -> Tone -> Response`. The aim is that, after training on many such examples, the model learns to follow this chain of thought for new, unseen inputs at inference time.

3.  **High-Quality Seed Data (`master_examples`):**
    *   **Reasoning:** The quality of a fine-tuned model is capped by the quality of its best training examples. These hand-crafted `master_examples` act as powerful anchors, demonstrating the ideal tone, empathy, and structure. They ensure the model has a strong foundation of "what great looks like."

4.  **Heuristic-Driven Data Augmentation:**
    *   **Reasoning:** When an input has two primary emotions, an `alt_prompt` is created with their order reversed in the clinical assessment. This simple augmentation presents the same response under a different emotional emphasis, which is intended to improve flexibility and generalization for users expressing multiple concerns.

5.  **Output Standardization (`enhance_output_quality`):**
    *   **Reasoning:** This method ensures that all target responses in the dataset are structurally sound and adhere to best practices (e.g., ending with a collaborative question). This consistency in the target data leads to more reliable and predictable behavior from the trained model.
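
The prompt layout used for training also determines what the prompt must look like at inference time: identical up to `[/INST]`, with the response left for the model to generate. The function below is a minimal sketch of both cases. `build_prompt` is an illustrative name; the tag layout is copied from `create_ultimate_prompt`.

```python
def build_prompt(system_prompt: str, user_message: str, response=None) -> str:
    """Return a training prompt when response is given, else an inference prompt."""
    prompt = (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]\n"
    )
    if response is not None:
        # Training example: append the target text and the end-of-sequence tag
        prompt += f"{response} </s>"
    return prompt


system = "You are a CBT therapist."
print(build_prompt(system, "I feel anxious.", "That sounds difficult."))
print(build_prompt(system, "I feel anxious."))
```

Keeping one builder for both cases helps ensure that the inference prompt is always an exact prefix of the training format.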


In [6]:
# Advanced Dataset Creator (IMPROVED)
class AdvancedDatasetCreator:
    def __init__(self):
        self.ei_engine = EmotionalIntelligenceEngine()
        self.strategy_engine = CBTResponseStrategyEngine()
        # Hand-written master therapist examples used as high-quality seed data
        self.master_examples = [
            {
                'input': "I'm feeling very sad and don't know what to do",
                'output': "I can hear the sadness in your words, and I want you to know that it takes real courage to reach out when you're feeling this way. Sadness can feel so heavy and overwhelming, especially when we don't know where to turn. Your feelings are completely valid and important. What you're experiencing right now - this sadness - it's telling us that something matters deeply to you. Can you help me understand a little more about what's been weighing on your heart lately?"
            },
            {
                'input': "nobody cares about me and I'm all alone",
                'output': "Oh, that sounds so incredibly lonely and painful. I can feel the ache in your words, and I want you to know that this feeling of being uncared for is absolutely heartbreaking to experience. When we're in this kind of emotional pain, our mind often becomes really good at noticing evidence that supports these lonely thoughts while filtering out signs of care that might exist. It's like wearing glasses that only let us see certain things. Right now, even in sharing this with me, you're reaching out - and that tells me something important about your strength and your desire for connection. Has there been anyone in your life, maybe even from the past, who showed you care in their own way?"
            },
            {
                'input': "I'm having panic attacks and I can't control them",
                'output': "Panic attacks can feel absolutely terrifying and overwhelming - like your body is completely out of control. I want you to know that what you're experiencing is very real, and you're not going crazy. These attacks, as frightening as they are, are actually your body's alarm system working overtime, trying to protect you from danger that isn't actually there. Let me share something that can help in the moment: try the 5-4-3-2-1 grounding technique. Notice 5 things you can see, 4 things you can touch, 3 things you can hear, 2 things you can smell, and 1 thing you can taste. This helps tell your nervous system that you're safe right here, right now. Can you tell me what tends to happen right before these attacks start?"
            },
            {
                'input': "I feel worthless and like nothing I do matters",
                'output': "I can feel the deep pain in those words, and my heart truly goes out to you. Feeling worthless is one of the most agonizing experiences a person can have, and I want you to know that this feeling, while very real, is not the truth about who you are. When we're struggling emotionally, our mind can become like a harsh inner critic, telling us terrible things about ourselves that we would never say to someone we love. The thought 'I'm worthless' and 'nothing I do matters' - these are actually symptoms of emotional pain, not facts about your life or your value. Even in this moment, sharing your pain with me, you're doing something that matters. You're being honest, vulnerable, and reaching for help - that takes tremendous courage. What's been the hardest part about carrying these feelings?"
            },
            {
                'input': "everything feels hopeless and I can't see a way forward",
                'output': "I can hear the exhaustion and despair in your words, and I want you to know that feeling this hopeless is incredibly difficult to bear. When everything feels hopeless, it's like being in a dark tunnel where you can't see any light ahead - and that is truly frightening and isolating. What you're feeling right now is very real, and I don't want to minimize that pain. At the same time, I want you to know that hopelessness is often a symptom of depression or overwhelming stress, and it can make us see our situation through a very dark filter. The fact that you're here, talking to me about this, tells me that somewhere inside, even a tiny part of you is still reaching for something different. That takes incredible strength. Right now, we don't need to figure out the whole path forward - we just need to focus on this moment, and then the next one. What feels like the heaviest weight you're carrying right now?"
            },
            {
                'input': "I can't stop worrying about everything and it's exhausting",
                'output': "That sounds absolutely exhausting - having your mind constantly racing with worries is like having a car engine revving all the time. Your brain is trying so hard to solve and prevent problems, but instead of feeling safer, you end up feeling more overwhelmed and tired. This kind of persistent worry is actually very common, and it often comes from a part of us that's trying to feel in control when life feels uncertain. The challenge is that worry tricks us into thinking we're doing something productive when we're actually just suffering through problems that may never happen. One thing that can help is setting aside 'worry time' - maybe 15 minutes a day where you let yourself worry fully, and when worries pop up otherwise, you gently remind yourself to save them for worry time. What kinds of situations tend to trigger the most worry for you?"
            },
            {
                'input': "I hate myself and think I'm a terrible person",
                'output': "I hear such deep pain in those words, and my heart truly aches for what you're going through right now. Self-hatred is one of the most brutal experiences we can have - it's like having a cruel enemy living inside your own head, constantly attacking you. I want you to know that you are not a terrible person. The fact that you're even concerned about being a good person tells me something important about your character and your heart. When we hate ourselves, we often focus intensely on our mistakes or shortcomings while completely ignoring our good qualities, our efforts, and our humanity. It's like having a spotlight that only shines on what we perceive as wrong with us. You deserve the same compassion and kindness that you would offer to a friend who was struggling. If someone you cared about came to you saying they hated themselves, what would you want them to know?"
            }
        ]

    def create_ultimate_prompt(self, example, analysis=None):
        """
        Build one training example in an instruction-following format.

        The system block carries the therapist persona and the clinical assessment,
        kept separate from the user's message so the model is less likely to echo
        the instructional text in its replies.
        """
        if analysis is None:
            analysis = self.ei_engine.analyze_emotional_state(example['input'])

        strategy = self.strategy_engine.select_strategy(analysis)
        approach_info = self.strategy_engine.therapy_approaches[strategy]

        # System prompt: Contains the context and instructions for the model's persona.
        system_prompt = f"""You are a master CBT therapist with 25+ years of experience, specializing in {strategy.replace('_', ' ')}. Your therapeutic approach is guided by the following clinical assessment of the user's message:
Primary Emotions: {', '.join(analysis['primary_emotions']) if analysis['primary_emotions'] else 'mixed presentation'}
Detected Cognitive Distortions: {', '.join(analysis['cognitive_distortions'][:3]) if analysis['cognitive_distortions'] else 'none identified'}
Therapeutic Focus: {approach_info['priority']}
Therapeutic Tone: {approach_info['tone']}
Your goal is to provide a response that is validating, insightful, and offers a clear, collaborative next step. Be empathetic and professional."""

        # Llama 2 chat-style layout: the system prompt sits in <<SYS>> tags inside the
        # [INST] block, and the desired output follows [/INST]. A base model such as
        # Phi-2 is not instruction-tuned on these tags, so it learns this layout
        # during fine-tuning.
        formatted_prompt = f"""<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{example['input']} [/INST]
{example['output']} </s>"""

        return formatted_prompt

    def augment_dataset(self, raw_data):
        print("🧠 Creating Ultimate CBT Dataset with Advanced Psychology...")

        enhanced_data = []

        # Add master examples first
        for master_ex in self.master_examples:
            enhanced_data.append({
                'text': self.create_ultimate_prompt(master_ex)
            })

        # Process original data with advanced analysis
        for idx, example in enumerate(raw_data):
            if idx % 50 == 0:
                print(f"Processing example {idx+1}/{len(raw_data)}")

            # Clean and enhance output
            enhanced_output = self.enhance_output_quality(example['output'])

            # Create analysis-driven example
            analysis = self.ei_engine.analyze_emotional_state(example['input'])

            enhanced_example = {
                'input': example['input'].strip(),
                'output': enhanced_output
            }

            prompt = self.create_ultimate_prompt(enhanced_example, analysis)
            enhanced_data.append({'text': prompt})

            # Data augmentation: add a variant that lists the primary emotions in
            # reverse order. select_strategy() uses membership checks, so the
            # selected strategy is unchanged; only the assessment text differs.
            if len(analysis['primary_emotions']) > 1:
                alt_analysis = analysis.copy()
                alt_analysis['primary_emotions'] = alt_analysis['primary_emotions'][::-1]

                alt_prompt = self.create_ultimate_prompt(enhanced_example, alt_analysis)
                enhanced_data.append({'text': alt_prompt})

        print(f"✅ Enhanced dataset created: {len(enhanced_data)} examples")
        return enhanced_data

    def enhance_output_quality(self, original_output):
        """Enhance the quality of therapist responses"""
        output = original_output.strip()

        # Ensure proper sentence ending
        if not output.endswith(('.', '!', '?')):
            output += '.'

        # Add collaborative ending if missing
        collaborative_endings = [
            "What are your thoughts on this perspective?",
            "How does this resonate with you?",
            "What would you like to explore further?",
            "What feels most helpful to focus on right now?"
        ]

        # Check if response is too short or lacks collaboration
        if len(output.split()) < 30 or not any(word in output.lower() for word in ['you', 'your', 'what', 'how']):
            output += f" {random.choice(collaborative_endings)}"

        return output
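
`enhance_output_quality` picks its ending with the module-level `random.choice`, so two runs over the same raw data can produce different datasets. If reproducibility matters, one option is to pass in a seeded `random.Random` instance. The sketch below assumes that approach; `add_collaborative_ending` and the seed value are illustrative, while the endings are copied from the method above.

```python
import random

COLLABORATIVE_ENDINGS = [
    "What are your thoughts on this perspective?",
    "How does this resonate with you?",
    "What would you like to explore further?",
    "What feels most helpful to focus on right now?",
]


def add_collaborative_ending(output: str, rng: random.Random) -> str:
    """Append one collaborative question chosen by the supplied generator."""
    return f"{output} {rng.choice(COLLABORATIVE_ENDINGS)}"


# Two generators with the same seed produce the same sequence of endings
rng_a = random.Random(42)
rng_b = random.Random(42)
first = [add_collaborative_ending("That sounds hard.", rng_a) for _ in range(3)]
second = [add_collaborative_ending("That sounds hard.", rng_b) for _ in range(3)]
print(first == second)
```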


# 7. Implementing an Advanced Custom Training Callback for Quality Control

**Purpose:** To gain fine-grained control over the training loop, implementing a more sophisticated stopping strategy than the default callbacks. This ensures we stop training at the optimal point, saving time and preventing overfitting.

**Design Rationale:**

A standard `EarlyStoppingCallback` is good, but it only looks at one metric (like evaluation loss). This custom callback introduces a more intelligent, multi-faceted approach to supervising the training process.

1.  **Heuristic-Based Quality Metric (`calculate_response_quality`):**
    *   **Reasoning:** Quantitative metrics like "loss" don't capture the full picture of what makes a therapeutic response *good*. This function attempts to create a **heuristic proxy metric** for response quality. It operationalizes clinical best practices by checking for the presence of language related to validation, collaboration, hope, and cognitive exploration. While not used for stopping in this version of the `on_log` function, it's a useful tool for evaluating generated text during or after training.

2.  **Advanced Early Stopping Logic (`on_log`):**
    *   **Reasoning:** This is the core of the callback. It implements a **dual-criterion stopping mechanism** to find the "sweet spot" of training.
        *   **Patience-Based Stopping:** This is a classic technique to prevent wasting compute resources. If the model's training loss stops improving for a set number of steps (`patience`), it indicates a learning plateau, and the callback halts training.
        *   **Absolute Performance Threshold:** The `if current_loss <= 0.3:` check is a crucial optimization. In fine-tuning, reaching an extremely low loss can be a sign that the model is beginning to overfit and simply memorize the training data. By setting an absolute "good enough" threshold, we can stop the training when the model has clearly mastered the dataset's patterns, preserving its ability to generalize to new, unseen inputs.

3.  **Comprehensive Logging and Reporting (`on_train_end`):**
    *   **Reasoning:** Good MLOps (Machine Learning Operations) requires good record-keeping. The `on_train_end` hook provides a clean, final summary of the most important metrics from the training run. This makes it much easier to compare different experiments and track the model's progress without having to manually parse through long log files.
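
The dual-criterion rule described above can be sketched without any training framework. The class below is a minimal illustration, not the notebook's callback: `DualCriterionStopper`, `StopDecision`, and `run` are hypothetical names, and the small `patience` value is chosen only to keep the demo short. The first loss sequence is taken from the training output recorded later in this report.

```python
from dataclasses import dataclass


@dataclass
class StopDecision:
    should_stop: bool
    reason: str


class DualCriterionStopper:
    """Framework-free sketch of the threshold-plus-patience rule."""

    def __init__(self, loss_threshold=0.3, patience=3):
        self.loss_threshold = loss_threshold
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_logs = 0  # consecutive logging events without a new best

    def update(self, loss):
        # Track the best loss and reset the counter on improvement.
        if loss < self.best_loss:
            self.best_loss = loss
            self.stale_logs = 0
        else:
            self.stale_logs += 1

        # The absolute threshold is checked first, then patience.
        if loss <= self.loss_threshold:
            return StopDecision(True, "threshold")
        if self.stale_logs >= self.patience:
            return StopDecision(True, "patience")
        return StopDecision(False, "continue")


def run(losses, **kwargs):
    """Feed logged losses in order; return (index, reason) of the stop."""
    stopper = DualCriterionStopper(**kwargs)
    for i, loss in enumerate(losses):
        decision = stopper.update(loss)
        if decision.should_stop:
            return i, decision.reason
    return None, "continue"


print(run([0.4678, 0.3797, 0.3267, 0.2978]))          # stops on the threshold
print(run([0.9, 0.8, 0.85, 0.82, 0.81], patience=3))  # stops on patience
```

Keeping the rule in a plain class like this makes it easy to unit-test the stopping behaviour on recorded loss curves before wiring it into a `TrainerCallback`.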


In [7]:
# Advanced Training Callbacks
class UltimateTrainingCallback(TrainerCallback):
    def __init__(self, quality_threshold=0.9, patience=15, min_steps=60): # Corrected __init__ method
        self.quality_threshold = quality_threshold
        self.patience = patience
        self.min_steps = min_steps
        self.best_loss = float('inf')
        self.patience_counter = 0
        self.quality_scores = []
        self.training_history = []

    def calculate_response_quality(self, response):
        """Advanced quality scoring system"""
        quality_metrics = {
            'validation_words': ['understand', 'hear you', 'makes sense', 'valid', 'normal', 'common'],
            'cognitive_words': ['thought', 'thinking', 'belief', 'pattern', 'assumption', 'perspective'],
            'therapeutic_words': ['explore', 'consider', 'notice', 'aware', 'recognize', 'identify'],
            'collaborative_words': ['together', 'we can', 'what if', 'how about', 'would you'],
            'hope_words': ['possible', 'can', 'able', 'strength', 'progress', 'growth'],
            'technique_words': ['exercise', 'practice', 'strategy', 'tool', 'technique', 'skill']
        }

        response_lower = response.lower()
        total_score = 0

        for category, words in quality_metrics.items():
            category_score = min(sum(1 for word in words if word in response_lower) * 0.1, 0.2)
            total_score += category_score

        # Bonus for appropriate length (50-200 words)
        word_count = len(response.split())
        if 50 <= word_count <= 200:
            total_score += 0.1

        # Penalty for too short responses
        if word_count < 30:
            total_score -= 0.2

        # Clamp to [0, 1]; the short-response penalty could otherwise push the score below zero
        return max(0.0, min(total_score, 1.0))

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is None or state.global_step < self.min_steps:
            return control

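        # Evaluation logs have no "loss" key, so they fall back to inf here
        # and count toward patience like any other non-improving log.
        current_loss = logs.get("train_loss", logs.get("loss", float('inf')))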

        self.training_history.append({
            'step': state.global_step,
            'loss': current_loss,
            'learning_rate': logs.get("learning_rate", 0)
        })

        # Improved loss tracking
        if current_loss < self.best_loss:
            self.best_loss = current_loss
            self.patience_counter = 0
            print(f"🎯 New best loss: {current_loss:.4f}")
        else:
            self.patience_counter += 1

        # Dynamic stopping criteria
        if current_loss <= 0.3:  # Very good loss
            print(f"🏆 Excellent loss achieved: {current_loss:.4f}")
            control.should_training_stop = True
        elif self.patience_counter >= self.patience:
            print(f"⏰ Early stopping: No improvement for {self.patience} steps")
            control.should_training_stop = True

        return control

    def on_train_end(self, args, state, control, **kwargs):
        print("\n📊 TRAINING SUMMARY")
        print(f"Best Loss: {self.best_loss:.4f}")
        print(f"Total Steps: {state.global_step}")
        print(f"Final Learning Rate: {self.training_history[-1]['learning_rate']:.2e}" if self.training_history else "N/A")
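
One detail of `calculate_response_quality` is worth illustrating: it matches keywords with Python's substring test (`word in response_lower`), so a short keyword such as `'can'` also matches inside `'cannot'`. The sketch below compares that behaviour with whole-word matching via a regular expression. The helper names `substring_hits` and `whole_word_hits` and the sample sentence are hypothetical; only the word list comes from the callback above.

```python
import re

# The 'hope_words' list from calculate_response_quality.
HOPE_WORDS = ['possible', 'can', 'able', 'strength', 'progress', 'growth']


def substring_hits(text, words):
    """Count listed words found anywhere in the text (the notebook's approach)."""
    lowered = text.lower()
    return sum(1 for word in words if word in lowered)


def whole_word_hits(text, words):
    """Count listed words that appear as whole words or phrases."""
    lowered = text.lower()
    return sum(
        1 for word in words
        if re.search(r"\b" + re.escape(word) + r"\b", lowered)
    )


sample = "I cannot cope and I feel unable to go on."
print(substring_hits(sample, HOPE_WORDS))   # 'can' and 'able' match inside longer words
print(whole_word_hits(sample, HOPE_WORDS))  # no whole-word matches
```

Whether whole-word matching is preferable depends on the vocabulary; for multi-word phrases like `'hear you'` both approaches behave the same.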



# 8. CBT Response Strategy Engine

This class is designed to act as the "brain" of our application. It takes an emotional analysis as input and uses a set of rules to select the most appropriate therapeutic strategy from a predefined library. This ensures that the response is tailored to the user's immediate needs, prioritizing crisis intervention above all else.

In [8]:
class CBTResponseStrategyEngine:
    def __init__(self): # Corrected __init__ method
        self.therapy_approaches = {
            'crisis_intervention': {
                'priority': 'immediate safety and stabilization',
                'techniques': ['grounding', 'safety planning', 'crisis resources'],
                'tone': 'calm, directive, supportive'
            },
            'anxiety_focused': {
                'priority': 'worry reduction and coping strategies',
                'techniques': ['breathing exercises', 'cognitive restructuring', 'exposure concepts'],
                'tone': 'gentle, reassuring, educational'
            },
            'depression_focused': {
                'priority': 'behavioral activation and mood improvement',
                'techniques': ['activity scheduling', 'thought records', 'self-compassion'],
                'tone': 'warm, encouraging, patient'
            },
            'trauma_informed': {
                'priority': 'safety, stabilization, processing',
                'techniques': ['grounding', 'window of tolerance', 'narrative therapy'],
                'tone': 'careful, validating, empowering'
            },
            'relationship_focused': {
                'priority': 'communication and boundary setting',
                'techniques': ['interpersonal skills', 'boundary setting', 'attachment'],
                'tone': 'balanced, insightful, practical'
            },
            'cognitive_restructuring': {
                'priority': 'identifying and challenging thoughts',
                'techniques': ['thought challenging', 'evidence examination', 'balanced thinking'],
                'tone': 'collaborative, curious, logical'
            }
        }

    def select_strategy(self, emotional_analysis):
        if emotional_analysis['crisis_level'] == 'high':
            return 'crisis_intervention'

        primary_emotions = emotional_analysis['primary_emotions']

        if 'trauma' in primary_emotions:
            return 'trauma_informed'
        elif 'anxiety' in primary_emotions:
            return 'anxiety_focused'
        elif 'depression' in primary_emotions:
            return 'depression_focused'
        elif 'relationships' in primary_emotions:
            return 'relationship_focused'
        else:
            # Default approach; also covers inputs with detected cognitive distortions
            return 'cognitive_restructuring'
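
The priority order used by `select_strategy` can also be written as a small rule table, which makes the precedence explicit and easy to test. This is a sketch of an equivalent standalone function, not the class above. It uses `.get()` so a missing key falls through to the default instead of raising `KeyError`, and the `'low'` crisis level in the examples is an assumed value, since only `'high'` appears in the original code.

```python
# Emotion-to-strategy rules in the same priority order as select_strategy.
EMOTION_RULES = [
    ('trauma', 'trauma_informed'),
    ('anxiety', 'anxiety_focused'),
    ('depression', 'depression_focused'),
    ('relationships', 'relationship_focused'),
]


def select_strategy(emotional_analysis):
    """Return a strategy key; crisis always wins, then the first matching emotion."""
    if emotional_analysis.get('crisis_level') == 'high':
        return 'crisis_intervention'
    emotions = emotional_analysis.get('primary_emotions', [])
    for emotion, strategy in EMOTION_RULES:
        if emotion in emotions:
            return strategy
    return 'cognitive_restructuring'  # default approach


examples = [
    {'crisis_level': 'high', 'primary_emotions': ['anxiety'], 'cognitive_distortions': []},
    {'crisis_level': 'low', 'primary_emotions': ['anxiety', 'trauma'], 'cognitive_distortions': []},
    {'crisis_level': 'low', 'primary_emotions': ['grief'], 'cognitive_distortions': []},
]
for analysis in examples:
    print(analysis['primary_emotions'], '->', select_strategy(analysis))
```

The second example shows that trauma takes precedence over anxiety when both are detected, and the third shows an emotion without a dedicated rule falling back to cognitive restructuring.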


# 9. The Master Response Generation Engine

This class is the central orchestrator of the entire system. It integrates all the other components—Emotional Intelligence, CBT Strategy, Conversation Memory, and the RAG Knowledge Engine—to produce a single, high-quality response. Its primary job is to assemble a rich, contextual prompt, generate text using the language model, and then clean and refine that text before it's sent to the user.


In [9]:
class UltimateGenerationEngine:
    def __init__(self, model, tokenizer): # Corrected __init__ method
        self.model = model
        self.tokenizer = tokenizer

        # Make sure to initialize these if they are not globally available
        self.ei_engine = EmotionalIntelligenceEngine()
        self.strategy_engine = CBTResponseStrategyEngine()
        self.conversation_memory = deque(maxlen=10)

        # RAG ADDITION: initialize RAG
        self.rag_engine = RAGKnowledgeEngine() # Using the renamed class
        self.rag_top_k = 4

    def _format_rag_knowledge(self, docs):
        if not docs:
            return ""
        snippets = []
        for d in docs:
            if not d or 'content' not in d: # Check if doc is valid and has 'content'
                continue
            # safe limit per document to keep prompt under control
            snippet = d['content'].strip().replace("\n", " ")
            if len(snippet) > 600:
                snippet = snippet[:600] + "..."
            snippets.append(f"- {snippet}")
        if not snippets:
            return ""
        return "Relevant Knowledge (use if helpful, otherwise ignore):\n" + "\n".join(snippets) + "\n"


    def post_process_response(self, response):
        """
        This is the POLISHING function. It cleans the raw output from the model.
        """
        # Step 1: Get the core response by splitting at the first sign of an artifact
        stop_tokens = ["<|", "</", "[/", "<&", "User:", "Therapist:", "CLINICAL ASSESSMENT", "QUALITY VERIFICATION", "###"]
        for token in stop_tokens:
            if token in response:
                response = response.split(token)[0].strip()

        # Step 2: Handle sentence structure and capitalization
        sentences = re.split(r'(?<=[.!?])\s+', response)
        clean_sentences = []
        for sentence in sentences:
            sentence = sentence.strip()
            if sentence:
                if not sentence[0].isupper():
                    sentence = sentence[0].upper() + sentence[1:]
                clean_sentences.append(sentence)
        response = ' '.join(clean_sentences)

        if not response:
            return "I'm not sure how to respond to that. Could you please tell me more?"

        # Step 3: Add a collaborative ending if missing
        collaborative_markers = ['what', 'how', 'would you', 'can we', 'together']
        if len(response.split()) > 25 and not any(marker in response.lower() for marker in collaborative_markers):
            endings = [ "How does that sound to you?", "What are your thoughts on this?" ]
            response += f" {random.choice(endings)}"

        return response

    def generate_master_response(self, user_input):
        """
        Generate a reply using conversation memory and retrieved knowledge as context,
        so the agent remembers what was said before and the conversation flows
        more naturally.
        """
        analysis = self.ei_engine.analyze_emotional_state(user_input)
        strategy = self.strategy_engine.select_strategy(analysis)

        # --- START OF NEW MEMORY LOGIC ---
        # 1. Build the conversation history from memory
        history = ""
        for turn in self.conversation_memory:
            history += f"User: {turn.get('user', '')}\nTherapist: {turn.get('assistant', '')}\n\n"

        # RAG ADDITION: get relevant knowledge from your Chroma DB
        rag_docs = self.rag_engine.retrieve_relevant_knowledge(user_input, k=self.rag_top_k) # Using the renamed method
        rag_context = self._format_rag_knowledge(rag_docs)

        # 2. Create the prompt with the history included + RAG context
        prompt = f"""You are a supportive CBT therapist. Continue the conversation naturally.
{rag_context}{history}User: {user_input}
Therapist:"""

        # --- END OF NEW MEMORY LOGIC ---
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)

        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=250,
                min_new_tokens=70,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                repetition_penalty=1.2,
                pad_token_id=self.tokenizer.eos_token_id,
            )

        raw_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract the part after the LAST "Therapist:"
        if "Therapist:" in raw_response:
            generated_text = raw_response.split("Therapist:")[-1].strip()
        else:
            generated_text = raw_response.replace(prompt.replace("Therapist:", ""), "").strip()

        # Polishing the response
        polished_response = self.post_process_response(generated_text)

        # Add the current turn to memory
        self.conversation_memory.append({'user': user_input, 'assistant': polished_response})

        return polished_response, analysis, strategy
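
The prompt assembly and reply extraction in `generate_master_response` can be tried in isolation, without a model. The sketch below mirrors the prompt template from the method; `build_prompt` is a hypothetical helper, the memory is limited to 2 turns (the engine uses `maxlen=10`) to keep the demo short, and the "model output" is simulated by appending text to the prompt.

```python
from collections import deque


def build_prompt(memory, rag_context, user_input):
    """Assemble the prompt from remembered turns, optional RAG text, and the new input."""
    history = "".join(
        f"User: {turn['user']}\nTherapist: {turn['assistant']}\n\n"
        for turn in memory
    )
    return (
        "You are a supportive CBT therapist. Continue the conversation naturally.\n"
        f"{rag_context}{history}User: {user_input}\nTherapist:"
    )


# A bounded deque silently drops the oldest turn once it is full.
memory = deque(maxlen=2)
for i in range(3):
    memory.append({'user': f"message {i}", 'assistant': f"reply {i}"})

prompt = build_prompt(memory, "", "message 3")
print(prompt)

# Simulated decoder output: the prompt followed by the newly generated text.
raw_response = prompt + " It sounds like a lot is going on."
generated_text = raw_response.split("Therapist:")[-1].strip()
print(generated_text)
```

Note that splitting on the last `"Therapist:"` assumes the model does not emit that marker itself; if it does, only the text after the final occurrence is kept.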


# 10. Initializing the Training Environment

This script handles the complete setup required to begin the fine-tuning process. It performs four key actions:
1.  **Authentication:** Logs into Hugging Face and connects to Google Drive for access to models and datasets.
2.  **Data Loading & Augmentation:** Loads the raw `cbt_500.jsonl` dataset and uses the `AdvancedDatasetCreator` to transform it into a high-quality, structured format ideal for training.
3.  **Model Selection:** Chooses `microsoft/phi-2` as the base model.
4.  **Configuration:** Configures the model to load with 4-bit NF4 quantization and double quantization, making it possible to train on consumer-grade GPUs with limited VRAM.
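
To see why 4-bit loading matters, a back-of-the-envelope estimate of weight memory is enough. The sketch below is simple arithmetic under stated assumptions: it uses the approximate 2.7 billion parameter count of `phi-2` and counts weights only, ignoring quantization constants, layers kept in higher precision, activations, gradients, and optimizer state. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PHI2_PARAMS = 2.7e9  # approximate parameter count of microsoft/phi-2

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```

Actual usage during training will be noticeably higher than the 4-bit figure, but the ratio between the rows shows where the savings come from.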


In [None]:
# Main Training Pipeline
print("🚀 INITIALIZING ULTIMATE CBT THERAPIST MODEL")
print("="*80)

# Setup
print("\n--- Step 1: Authentication & Setup ---")
HF_TOKEN = "#"
login(token=HF_TOKEN)
drive.mount('/content/drive')
print("✅ Setup Complete!")

# Load and process dataset
print("\n--- Step 2: Advanced Dataset Processing ---")
DATASET_PATH = "/content/drive/MyDrive/cbt_500.jsonl"
raw_data = load_dataset("json", data_files=DATASET_PATH, split="train")

# Create ultimate dataset
dataset_creator = AdvancedDatasetCreator()
enhanced_data = dataset_creator.augment_dataset(raw_data)

# Convert to dataset
dataset = Dataset.from_list(enhanced_data)
split = dataset.train_test_split(test_size=0.15, seed=42)
train_dataset = split['train']
eval_dataset = split['test']

print(f"✅ Ultimate Dataset Ready: {len(train_dataset)} train, {len(eval_dataset)} eval")
print(f"\nSample enhanced prompt preview:")
print(train_dataset[0]['text'][:500] + "...\n")

# Model setup
print("\n--- Step 3: Advanced Model Configuration ---")
MODEL_ID = "microsoft/phi-2"

# Enhanced quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True, # Double quantization: also quantizes the quantization constants to save memory
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

print("✅ Model Loaded with Enhanced Configuration!")

🚀 INITIALIZING ULTIMATE CBT THERAPIST MODEL

--- Step 1: Authentication & Setup ---
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Setup Complete!

--- Step 2: Advanced Dataset Processing ---
🧠 Creating Ultimate CBT Dataset with Advanced Psychology...
Processing example 1/500
Processing example 51/500
Processing example 101/500
Processing example 151/500
Processing example 201/500
Processing example 251/500
Processing example 301/500
Processing example 351/500
Processing example 401/500
Processing example 451/500
✅ Enhanced dataset created: 548 examples
✅ Ultimate Dataset Ready: 465 train, 83 eval

Sample enhanced prompt preview:
<s>[INST] <<SYS>>
You are a master CBT therapist with 25+ years of experience, specializing in cognitive restructuring. Your therapeutic approach is guided by the following clinical assessment of the user's message:
Primary Emotions: grief
Detected Cognitive Distortions: none i

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
`torch_dtype` is deprecated! Use `dtype` instead!


✅ Model Loaded with Enhanced Configuration!


# 11. Configuring the Advanced QLoRA Fine-Tuning Strategy

**Purpose:** To define the precise methodology for adapting the pre-trained model to our CBT therapy dataset, balancing performance, speed, and memory efficiency.

**Design Rationale:**

This configuration implements the **QLoRA (Quantized Low-Rank Adaptation)** technique, a highly efficient method for fine-tuning large models.

1.  **LoRA Configuration (`LoraConfig`):**
    *   **Reasoning:** Instead of retraining all 2.7 billion parameters of the model (which is computationally expensive), we freeze the original model and inject small, trainable "adapter" layers. This drastically reduces the number of parameters we need to update, saving memory and time.
    *   `r=32` & `lora_alpha=64`: These are crucial hyperparameters. `r` (rank) determines the size of the adapter matrices. A higher rank (like 32) gives the model more capacity to learn complex new information. `lora_alpha` is a scaling factor that magnifies the impact of these adapters. A common and effective practice is to set `alpha` to twice the `r`, which we follow here for strong adaptation.
    *   `target_modules=[...]`: This selects which layers receive adapters. We attach them to the linear layers inside each transformer block: the attention query, key, and value projections (`q_proj`, `k_proj`, `v_proj`), the attention output projection (`dense`), and the two feed-forward layers (`fc1`, `fc2`). Covering both attention and feed-forward layers focuses the training on the parts of the model that contribute most to understanding and generating language.

2.  **Training Arguments (`TrainingArguments`):**
    *   **Reasoning:** These arguments are meticulously tuned to create a stable and effective training loop.
    *   **Memory Management:** The combination of `per_device_train_batch_size=2` and `gradient_accumulation_steps=4` is a key technique for training with limited VRAM. The model processes only 2 examples at a time, but it accumulates the gradients over 4 steps before performing a weight update. This achieves the learning stability of a larger **effective batch size of 8** (`2 * 4`) while only requiring the memory for a batch size of 2.
    *   **Learning Schedule:** We use a `learning_rate` of `2e-5`, a conservative value commonly used when fine-tuning with the AdamW optimizer, trading some learning speed for stability. The `warmup_ratio=0.05` prevents the model from making drastic, damaging updates at the very start of training by gradually increasing the learning rate.
    *   **Robust Evaluation and Saving:** By setting `eval_strategy="steps"` and `load_best_model_at_end=True`, we create a robust training process. The trainer evaluates the model on the validation set every 25 steps and writes a checkpoint every 50 steps (`save_steps=50`). At the end of training it reloads the saved checkpoint with the lowest `eval_loss`. This helps us avoid keeping a worse model from a later, overfitted stage of training.
    *   `fp16=True`: Enables mixed-precision training, which uses 16-bit floating-point numbers for certain operations. This significantly speeds up training and further reduces memory consumption.

3.  **The Ultimate Trainer (`SFTTrainer`):**
    *   **Reasoning:** We use the `SFTTrainer` from the TRL library, which is specifically designed for supervised fine-tuning of language models.
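
The effect of `r` and `lora_alpha` can be made concrete with a parameter count. LoRA adds two matrices to each targeted linear layer, of shapes `r x in_features` and `out_features x r`, and scales their product by `lora_alpha / r`. The sketch below applies this to the six target modules. The layer dimensions (hidden size 2560, feed-forward size 10240, 32 layers) are assumptions about the `phi-2` configuration and should be checked against `model.config`. The count also excludes the full copies of `lm_head` and `embed_tokens` kept trainable through `modules_to_save`.

```python
def lora_params(in_features, out_features, r):
    """Trainable parameters LoRA adds to one linear layer: A (r x in) plus B (out x r)."""
    return r * (in_features + out_features)


# Assumed phi-2 dimensions; verify against model.config before relying on them.
HIDDEN = 2560
INTERMEDIATE = 10240
NUM_LAYERS = 32
R = 32
LORA_ALPHA = 64

per_layer = {
    "q_proj": lora_params(HIDDEN, HIDDEN, R),
    "k_proj": lora_params(HIDDEN, HIDDEN, R),
    "v_proj": lora_params(HIDDEN, HIDDEN, R),
    "dense": lora_params(HIDDEN, HIDDEN, R),
    "fc1": lora_params(HIDDEN, INTERMEDIATE, R),
    "fc2": lora_params(INTERMEDIATE, HIDDEN, R),
}

total = NUM_LAYERS * sum(per_layer.values())
scaling = LORA_ALPHA / R  # factor applied to the adapter output

print(f"LoRA parameters per layer: {sum(per_layer.values()):,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"Scaling factor alpha / r:  {scaling}")
```

Under these assumptions the adapters amount to a small percentage of the base model's parameters, which is where the memory and time savings come from.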

In [11]:
# Prepare for training
print("\n--- Step 4: Advanced Training Preparation ---")
model = prepare_model_for_kbit_training(model)

# Enhanced LoRA config
peft_config = LoraConfig(
    r=32, # Higher rank for better capacity
    lora_alpha=64, # Higher alpha for stronger adaptation
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    modules_to_save=["lm_head", "embed_tokens"] # Save additional modules
)

# Training arguments - competition optimized
training_args = TrainingArguments(
    output_dir="./ultimate-cbt-therapist",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=5, # More epochs for better learning
    learning_rate=2e-5, # Optimal learning rate
    warmup_ratio=0.05,
    weight_decay=0.01,
    logging_steps=5,
    save_steps=50,
    eval_strategy="steps",
    eval_steps=25,
    save_total_limit=5,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fp16=True,
    dataloader_pin_memory=False,
    remove_unused_columns=False,
    report_to="none", # The string "none" disables logging integrations such as W&B
    seed=42,
)
# Note: this list is not passed to the trainer below, which defines its own callbacks.
callbacks = [
    EarlyStoppingCallback(early_stopping_patience=10) # Stop if there is no improvement for 10 evaluations
]

# Create ultimate trainer
print("\n--- Step 5: Creating Ultimate Trainer ---")
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,

    # Custom training-loss callback plus eval-loss-based early stopping
    callbacks=[
        UltimateTrainingCallback(quality_threshold=0.9, patience=20, min_steps=60),
        EarlyStoppingCallback(early_stopping_patience=25, early_stopping_threshold=0.001)
    ]
)

print("✅ Ultimate Trainer Created!")


--- Step 4: Advanced Training Preparation ---

--- Step 5: Creating Ultimate Trainer ---


✅ Ultimate Trainer Created!


# 12. Initiating the Fine-Tuning Loop and Persisting the Model Artifacts

**Purpose:** To execute the configured training plan and then systematically save all resulting artifacts in a structured, portable, and reproducible manner.

**Design Rationale:**

1.  **Training Execution (`trainer.train()`):**
    *   **Reasoning:** This single command abstracts away the complexity of the training loop. The `SFTTrainer` handles everything defined in the `TrainingArguments`: batching the data, performing forward and backward passes, calculating gradients, updating the LoRA weights with the AdamW optimizer, running periodic evaluations, and managing checkpoints. Because we set `load_best_model_at_end=True`, the trainer reloads the best saved checkpoint (by `eval_loss`) when training finishes, so `trainer.model` should be the best checkpointed version and not simply the last one. This offers some protection against overfitting.

2.  **Efficient Artifact Persistence:**
    *   **Reasoning:** A trained model is more than just its weights. A robust saving strategy is crucial for deployment and future use.
    *   **Saving PEFT Adapters (`trainer.model.save_pretrained`):** This is a key advantage of the LoRA method. We are **not** saving another full copy of the 2.7 billion parameter model. Instead, we save only the trained adapter weights plus the modules listed in `modules_to_save`. The resulting files are much smaller than the full model, which makes them portable and fast to load. With `r=32` on six module types and full copies of `lm_head` and `embed_tokens`, they are larger than a typical LoRA adapter. The base `phi-2` model remains untouched.
    *   **Saving the Tokenizer (`tokenizer.save_pretrained`):** It is critical to save the exact tokenizer configuration used during training alongside the model. This guarantees that text is processed identically during inference as it was during training, preventing subtle bugs and performance degradation.
    *   **Model Versioning with Metadata:** Creating a `model_metadata.json` file is a core MLOps best practice. A model without context is difficult to manage. This file acts as a permanent record, answering key questions for future developers (or ourselves): When was this made? What data was it trained on? What was it designed to do? This is essential for reproducibility, tracking experiments, and ensuring models are used correctly in production.
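
A metadata file is most useful when it is checked on load as well as written on save. The sketch below shows one possible round trip under stated assumptions: `save_metadata`, `load_metadata`, and the `REQUIRED_KEYS` set are hypothetical, a temporary directory stands in for the Drive path, and the `dataset_size` of 465 is the training split size reported earlier.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Hypothetical minimum set of fields every metadata file should carry.
REQUIRED_KEYS = {"model_version", "training_date", "dataset_size"}


def save_metadata(directory, metadata):
    """Write model_metadata.json into the given directory and return its path."""
    path = Path(directory) / "model_metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


def load_metadata(directory):
    """Read model_metadata.json and fail loudly if required fields are missing."""
    path = Path(directory) / "model_metadata.json"
    metadata = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - metadata.keys()
    if missing:
        raise ValueError(f"metadata is missing keys: {sorted(missing)}")
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    save_metadata(tmp, {
        "model_version": "Ultimate CBT Therapist v1.0",
        "training_date": datetime.now().isoformat(),
        "dataset_size": 465,
    })
    loaded = load_metadata(tmp)
    print(loaded["model_version"], loaded["dataset_size"])
```

Recording hyperparameters and the base model ID in the same file would make later comparisons between runs easier.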


In [12]:
# Training
print("\n--- Step 6: Ultimate Training Phase ---")
print("🚀 Beginning advanced CBT therapist training...")

trainer.train()

print("\n🏆 TRAINING COMPLETED SUCCESSFULLY!")

# Save the ultimate model
print("\n--- Step 7: Saving Ultimate Model ---")
SAVE_PATH = "/content/drive/MyDrive/ultimate-cbt-therapist"
trainer.model.save_pretrained(SAVE_PATH)
tokenizer.save_pretrained(SAVE_PATH)

# Save training metadata
metadata = {
    'model_version': 'Ultimate CBT Therapist v1.0',
    'training_date': datetime.now().isoformat(),
    'dataset_size': len(train_dataset),
    'features': [
        'Advanced Emotional Intelligence',
        'Multi-Strategy Response Generation',
        'Crisis Detection & Response',
        'Cognitive Distortion Analysis',
        'Personalized Therapy Approaches',
        'Quality-Optimized Training'
    ]
}

with open(f"{SAVE_PATH}/model_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)

print("✅ Ultimate Model Saved!")


--- Step 6: Ultimate Training Phase ---
🚀 Beginning advanced CBT therapist training...



| Step | Training Loss | Validation Loss | Entropy | Num Tokens | Mean Token Accuracy |
|---|---|---|---|---|---|
| 25 | 2.2103 | 1.934774 | 2.242226 | 35085.0 | 0.574441 |
| 50 | 0.6664 | 0.534943 | 0.740626 | 70277.0 | 0.884483 |
| 75 | 0.2978 | 0.239553 | 0.365166 | 103965.0 | 0.940936 |



🎯 New best loss: 0.4678
🎯 New best loss: 0.3797
🎯 New best loss: 0.3267
🎯 New best loss: 0.2978
🏆 Excellent loss achieved: 0.2978

📊 TRAINING SUMMARY
Best Loss: 0.2978
Total Steps: 75
Final Learning Rate: 0.00e+00

🏆 TRAINING COMPLETED SUCCESSFULLY!

--- Step 7: Saving Ultimate Model ---
✅ Ultimate Model Saved!


# 13. Qualitative Validation and Model Deployment Packaging

**Purpose:** To perform a final, qualitative assessment of the model's conversational ability and to create a portable, easy-to-use script for future inference.

**Design Rationale:**

1.  **Step 10: Interactive Qualitative Validation:**
    *   **Reasoning:** While training metrics like `eval_loss` are essential for quantitative analysis, they don't tell the whole story. They can't measure empathy, coherence, or the natural flow of conversation. This interactive loop is the **qualitative validation** stage. It allows us to directly experience the model's behavior and assess its performance on a human level.
    *   **Interpretability through Analysis:** By printing the `analysis` and `strategy` before each response, we gain crucial insight into the model's "thought process." This is a powerful debugging and validation tool. It confirms that our `CBTResponseStrategyEngine` is correctly identifying user needs (e.g., selecting `'crisis_intervention'`) and that the entire system is working in harmony.

2.  **Step 11: Creating a Portable Inference Script:**
    *   **Reasoning:** A trained model is only useful if it can be easily loaded and used elsewhere. This step addresses that by creating a boilerplate script. This is a critical **MLOps (Machine Learning Operations)** practice for several reasons:
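        *   **Decoupling:** It separates the complex training environment from a much simpler inference environment. You don't need `SFTTrainer` or the dataset to load the final model; `torch`, `transformers`, and `peft` are enough, and `bitsandbytes` is needed only if you load the base model quantized. Running the full `UltimateGenerationEngine` additionally requires the class definitions and their dependencies.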
        *   **Reproducibility:** The script provides a clear, unambiguous example of how to load the PEFT adapters onto the base model. This is the standard procedure for using a LoRA-finetuned model and ensures anyone (including our future selves) can get it running quickly.
        *   **Deployment Readiness:** This script is the foundational block for deploying the model in an application, such as a backend API for a chatbot. It contains all the essential loading logic.


In [13]:
# Step 10: Final Testing (INTERACTIVE CONVERSATIONAL MODE)
print("\n--- Step 10: Ultimate Model Testing ---")

# Initialize the generation engine with the final, trained model
generation_engine = UltimateGenerationEngine(trainer.model, tokenizer)
print("\n" + "="*80)
print("🧠 ULTIMATE CBT THERAPIST IS READY")
print("="*80)
print(">> You can now have a conversation. Type 'quit' to exit.")

# This 'while True' loop will run forever until you type 'quit'
while True:
    print("\n" + "-"*80)

    # Take input from the user in real-time
    user_input = input("👤 YOU: ")

    # Check if the user wants to exit
    if user_input.lower() == 'quit':
        print("\n🧠 CBT THERAPIST: Take care. Remember to be kind to yourself.")
        break # This command breaks the loop

    # The generate_master_response function handles everything
    response, analysis, strategy = generation_engine.generate_master_response(user_input)

    print(f"🔍 ANALYSIS: Emotions: {', '.join(analysis['primary_emotions']) if analysis['primary_emotions'] else 'N/A'}, Strategy: {strategy}")
    print(f"🧠 CBT THERAPIST: {response}")

# Step 11: Save the Final Deliverables
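SAVE_PATH = "/content/drive/MyDrive/AI_Cognitive_Coach/phi-2-finetuned-final" # Make sure this path exists and holds the adapter and tokenizer (Step 7 saved them to /content/drive/MyDrive/ultimate-cbt-therapist)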

# Save a simple generation script for the deliverables
generation_code = f'''
# This is a sample script to run the saved model.
# Make sure to have the 'Source_Code' folder with all the class definitions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import os # Import os module

# Assuming your classes are in a file named 'your_file.py'
# from your_file import EmotionalIntelligenceEngine, CBTResponseStrategyEngine, RAGKnowledgeEngine, UltimateGenerationEngine # Import your class here

# --- CONFIGURATION ---
BASE_MODEL_ID = "microsoft/phi-2"
ADAPTER_PATH = "{SAVE_PATH}" # This will be the path to your saved model

# --- LOAD MODEL ---
# You would first load the base model and then apply the adapters
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_PATH)

# --- RUN INFERENCE ---
# Ensure all necessary classes are defined or imported before this
# engine = UltimateGenerationEngine(model, tokenizer)
# response, _, _ = engine.generate_master_response("Your test prompt here")
# print(response)
'''

# You might want to save this generation_code to a file as part of your deliverables
# with open(f"{SAVE_PATH}/run_model_script.py", "w") as f:
#     f.write(generation_code)

# print(f"✅ Generation script saved to {SAVE_PATH}/run_model_script.py")


--- Step 10: Ultimate Model Testing ---



🧠 ULTIMATE CBT THERAPIST IS READY
>> You can now have a conversation. Type 'quit' to exit.

--------------------------------------------------------------------------------
👤 YOU: i m so stressed about my exams 
🔍 ANALYSIS: Emotions: anxiety, Strategy: anxiety_focused
🧠 CBT THERAPIST: It sounds like you have some concerns or worries related to performing well on your exams. Can you tell me more about those thoughts and emotions? How has stress affected your studying habits or sleep patterns? Are there any specific subjects or topics that worry you most? We can work together to identify and challenge any negative thoughts or assumptions that may be contributing to your anxiety. Remember, it's normal to feel anxious before a big test, and with preparation and self-compassion, you can do your best. In the context of our previous dialogue, let’s say we focus specifically on exam-related fears and create a hypothetical situation where 4 students (Student A, Student B, Student C, and Studen

# Step 1: Ensuring a Reproducible and Conflict-Free Python Environment

**Purpose:** To create a stable, predictable, and clean environment for the training process, eliminating common sources of errors related to library version conflicts.

**Design Rationale:**

Cloud environments like Google Colab come with many pre-installed Python libraries. While convenient, this can lead to a problem known as **"dependency hell,"** where the pre-installed versions conflict with the specific versions required by our advanced fine-tuning scripts. This code block implements a robust, three-step solution:

1.  **Forceful Uninstallation (The "Nuke"):** The first command (`pip uninstall ... -y`) acts as a "hard reset." It removes any existing installations of the critical libraries from the environment. This is a defensive measure that guarantees we are starting from a known, clean state, rather than hoping the existing libraries are compatible.

2.  **Atomic Installation (The "Pave"):** We install all required dependencies in a single `pip install` command. This is significantly better than installing them one by one. When run together, `pip`'s dependency resolver can analyze all the requirements at once and find a set of package versions that are mutually compatible. This drastically reduces the chance of version mismatch errors. We also add `vaderSentiment`, `sentence-transformers`, and `chromadb` to support our advanced `EmotionalIntelligenceEngine` and `RAGKnowledgeEngine`.

3.  **Mandatory Runtime Restart:** This is the most critical step for the user. When libraries are installed in a running Jupyter/Colab session, modules that were already imported stay loaded in the kernel's memory, so running the install command alone does not make the notebook use the new versions. A **runtime restart** reloads the environment and forces the kernel to import the fresh libraries from disk. Skipping this step is a common cause of notebook failures.
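
One way to see why the restart matters is to compare the version a module reports from memory with the version recorded on disk. The following is a minimal sketch using only the standard library; the helper names (`on_disk_version`, `loaded_version`, `needs_restart`) are hypothetical, and it assumes the pip package name and the import name are the same, which holds for `torch`, `transformers`, and `peft`.

```python
import sys
from importlib import metadata

def on_disk_version(package):
    """Version recorded in the installed package metadata, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def loaded_version(module_name):
    """Version of a module already imported into this kernel, or None."""
    module = sys.modules.get(module_name)
    return getattr(module, "__version__", None) if module else None

def needs_restart(name):
    """True when an imported module's version differs from the one on disk."""
    loaded, on_disk = loaded_version(name), on_disk_version(name)
    if loaded is None or on_disk is None:
        return False
    # Ignore local build suffixes such as "+cu126" when comparing.
    return loaded.split("+")[0] != on_disk.split("+")[0]

for name in ["torch", "transformers", "peft"]:
    print(f"{name}: loaded={loaded_version(name)} "
          f"on_disk={on_disk_version(name)} restart={needs_restart(name)}")
```

Run after the install cell, a `restart=True` line would indicate that the kernel is still holding an older copy of that library.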


In [14]:
# === STEP 1: CLEAN INSTALLATION ===
print("🚀 Step 1: Uninstalling old libraries to prevent conflicts...")
!pip uninstall torch torchvision torchaudio transformers accelerate peft bitsandbytes trl datasets gradio -y

print("\n⚙️ Step 2: Installing a fresh, compatible set of libraries...")

# We install everything in one go to let pip resolve all dependencies correctly.
# RAG ADDITION: chromadb is part of the same command so it is resolved together with the rest.
!pip install -q -U torch torchvision torchaudio transformers accelerate peft bitsandbytes trl datasets gradio vaderSentiment sentence-transformers chromadb

print("\n✅ All libraries installed successfully!")
print("🔴 IMPORTANT: Please RESTART THE RUNTIME now before running the next cell! (Runtime -> Restart session)")



🚀 Step 1: Uninstalling old libraries to prevent conflicts...
Found existing installation: torch 2.8.0+cu126
Uninstalling torch-2.8.0+cu126:
  Successfully uninstalled torch-2.8.0+cu126
Found existing installation: torchvision 0.23.0+cu126
Uninstalling torchvision-0.23.0+cu126:
  Successfully uninstalled torchvision-0.23.0+cu126
Found existing installation: torchaudio 2.8.0+cu126
Uninstalling torchaudio-2.8.0+cu126:
  Successfully uninstalled torchaudio-2.8.0+cu126
Found existing installation: transformers 4.56.1
Uninstalling transformers-4.56.1:
  Successfully uninstalled transformers-4.56.1
Found existing installation: accelerate 1.10.1
Uninstalling accelerate-1.10.1:
  Successfully uninstalled accelerate-1.10.1
Found existing installation: peft 0.17.1
Uninstalling peft-0.17.1:
  Successfully uninstalled peft-0.17.1
Found existing installation: bitsandbytes 0.47.0
Uninstalling bitsandbytes-0.47.0:
  Successfully uninstalled bitsandbytes-0.47.0
Found existing installation: trl 0.23.0
U

# 15. Loading Core Dependencies and Frameworks

**Purpose:** To import all required modules into the Python session, making their functions and classes available for the rest of the script. This centralized import block ensures that all dependencies are declared upfront.

**Design Rationale:**

The application is built on a modular stack of specialized libraries, each chosen for a specific role:

1.  **The LLM & Transformers Ecosystem (`torch`, `transformers`, `peft`):**
    *   **Reasoning:** This is the core of our AI. `torch` provides the fundamental deep learning framework. `transformers` from Hugging Face is the industry standard for accessing pre-trained models like `phi-2`. `peft` (Parameter-Efficient Fine-Tuning) is critically important, as it allows us to load our small, efficient adapter weights on top of the base model.

2.  **The Retrieval-Augmented Generation (RAG) Stack (`chromadb`, `embedding_functions`):**
    *   **Reasoning:** To give the model external knowledge, we need a RAG system. `chromadb` was chosen as it is a lightweight, open-source vector database that runs directly in the notebook, making it perfect for development and demos. The `embedding_functions` will be used to convert user queries and knowledge snippets into numerical vectors for similarity searching.

3.  **The Natural Language Processing (NLP) Toolkit (`vaderSentiment`, `re`):**
    *   **Reasoning:** Beyond the LLM, we need specific NLP tools. `vaderSentiment` is a fast, rule-based sentiment analyzer that gives us a quick emotional reading without needing a separate AI model. The `re` (regular expressions) module is essential for the text cleaning and post-processing steps.

4.  **The Application & UI Framework (`gradio`):**
    *   **Reasoning:** To make the model interactive, `gradio` was selected. It is a high-level library that makes it incredibly easy to build a user-friendly web interface for any machine learning model directly within a Python script, which is ideal for creating demos.

5.  **Core Python Utilities (`deque`, `random`, `os`):**
    *   **Reasoning:** These are standard libraries for fundamental tasks. `deque` provides an efficient way to manage the conversation memory, `random` helps in adding variety to responses, and `os` is used for interacting with the file system.
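
As an illustration of the kind of text cleaning the `re` module is used for, here is a small sketch that splits text into sentences and upper-cases the first letter of each one. The function name `tidy_sentences` is hypothetical. It deliberately avoids `str.capitalize()`, which also lower-cases the rest of the string (`"i think CBT".capitalize()` gives `"I think cbt"`).

```python
import re

def tidy_sentences(text):
    """Split on sentence-ending punctuation and upper-case each first letter."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    fixed = [s[0].upper() + s[1:] for s in sentences if s]
    return " ".join(fixed)

print(tidy_sentences("it sounds hard. i think CBT can help!  what do you think?"))
# prints: It sounds hard. I think CBT can help! What do you think?
```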


In [15]:
# Cell 2: MAIN APPLICATION SCRIPT
print("🧠 Step 3: Importing libraries and defining AI brain...")

# Imports and Class Definitions
import torch
import re
import random
from collections import deque
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import gradio as gr
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# RAG ADDITIONS
import chromadb
from chromadb.utils import embedding_functions
import os # Import os module


🧠 Step 3: Importing libraries and defining AI brain...


In [16]:
class EmotionalIntelligenceEngine:
    def __init__(self): # Corrected __init__ method
        self.sentiment_analyzer = SentimentIntensityAnalyzer()
        self.emotion_patterns = {
            'anxiety': {'keywords': ['worried', 'anxious', 'scared', 'panic', 'nervous', 'fear', 'stress']},
            'depression': {'keywords': ['sad', 'hopeless', 'empty', 'worthless', 'tired', 'meaningless']},
            'anger': {'keywords': ['angry', 'frustrated', 'furious', 'annoyed', 'irritated', 'rage']},
            # Added missing emotion patterns for consistency
            'grief': {'keywords': ['loss', 'died', 'miss', 'gone', 'funeral', 'bereaved']},
            'trauma': {'keywords': ['flashback', 'nightmare', 'triggered', 'ptsd', 'abuse', 'accident']},
            'relationships': {'keywords': ['relationship', 'partner', 'marriage', 'divorce', 'breakup', 'lonely']}
        }
        self.cognitive_distortions = {
            'all_or_nothing': ['always', 'never', 'completely', 'totally', 'everything', 'nothing'],
            'catastrophizing': ['disaster', 'terrible', 'awful', 'end of world', 'ruined'],
            # Added missing cognitive distortion patterns for consistency
            'mind_reading': ['they think', 'everyone believes', 'people assume'],
            'fortune_telling': ['will never', 'going to fail', "won't work", 'bound to'],
            'emotional_reasoning': ['feel like', 'seems like', 'must be because I feel'],
            'should_statements': ['should', 'must', 'ought to', 'have to'],
            'labeling': ["I am", "he is", "she is"] + ['stupid', 'failure', 'loser', 'worthless'],
            'personalization': ["my fault", "because of me", "I caused", "I'm responsible"]
        }


    def analyze_emotional_state(self, text):
        text_lower = text.lower()
        sentiment_scores = self.sentiment_analyzer.polarity_scores(text)
        # Filter emotions based on keywords
        detected_emotions = [emotion for emotion, patterns in self.emotion_patterns.items() if any(kw in text_lower for kw in patterns['keywords'])]
        # Filter distortions based on patterns (lower-cased so entries such as "I am" can match text_lower)
        detected_distortions = [dist for dist, patterns in self.cognitive_distortions.items() if any(p.lower() in text_lower for p in patterns)]
        crisis_keywords = ['suicide', 'kill myself', 'end it all', 'want to die', 'hurt myself', 'self harm']
        crisis_level = 'high' if any(keyword in text_lower for keyword in crisis_keywords) else 'low'
        # Return only the top 2 detected emotions for primary_emotions
        return {
            'primary_emotions': detected_emotions[:2],
            'sentiment': sentiment_scores,
            'cognitive_distortions': detected_distortions,
            'crisis_level': crisis_level
        }

# 16. Implementing a Rule-Based Emotional Analysis Engine

**Purpose:** To provide a fast, reliable, and interpretable analysis of the user's emotional and cognitive state. This analysis serves as the primary input for the `CBTResponseStrategyEngine` to select an appropriate therapeutic approach.

**Design Rationale:**

A rule-based, heuristic engine was deliberately chosen over a complex machine learning model for this task due to several key advantages:

1.  **Speed and Efficiency:** The engine uses keyword matching and the lightweight `vaderSentiment` library. This analysis is nearly instantaneous and has a negligible computational cost compared with LLM generation, ensuring the application remains responsive.

2.  **Interpretability and Control:** The logic is completely transparent. If the engine detects 'anxiety', it's because a specific keyword from our predefined list was present. This makes the system easy to debug, test, and refine. We have full control over what it looks for.

3.  **Clinically-Informed Dictionaries:** The keyword lists for `emotion_patterns` and `cognitive_distortions` are not arbitrary. They are populated with terms directly relevant to Cognitive Behavioral Therapy (CBT). This ensures that the analysis is tailored specifically to the therapeutic context of the application.

4.  **Prioritized Safety:** The `crisis_keywords` check is a simple first-line safety mechanism. It runs on every message, and `select_strategy` tests `crisis_level` before any other condition, so a flagged message always maps to `crisis_intervention`. Because it matches explicit phrases only, it will miss indirect expressions of risk.

5.  **Structured Output:** The `analyze_emotional_state` method doesn't just return a single label. It produces a rich, structured dictionary (`primary_emotions`, `sentiment`, `cognitive_distortions`, `crisis_level`). This multi-faceted output provides a comprehensive "snapshot" of the user's state for more nuanced decision-making downstream.
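
Keyword matching by plain substring is easy to reason about, but it can also fire inside unrelated words. The sketch below contrasts substring matching with a word-prefix match built on `re`. The reduced `EMOTION_KEYWORDS` dictionary and both function names are illustrative assumptions, not the engine's actual code.

```python
import re

# Reduced, illustrative keyword lists (not the full dictionaries used by the engine).
EMOTION_KEYWORDS = {
    "anxiety": ["worried", "anxious", "stress"],
    "grief": ["loss", "miss", "gone"],
}

def detect_substring(text):
    """Flag an emotion if any keyword appears anywhere in the text."""
    low = text.lower()
    return [e for e, kws in EMOTION_KEYWORDS.items() if any(k in low for k in kws)]

def detect_word_prefix(text):
    """Flag an emotion only if a keyword starts a word (suffixes still allowed)."""
    low = text.lower()
    found = []
    for emotion, kws in EMOTION_KEYWORDS.items():
        # \b anchors the keyword at a word start; \w* keeps forms like "stressed".
        if any(re.search(rf"\b{re.escape(k)}\w*", low) for k in kws):
            found.append(emotion)
    return found

text = "My boss dismissed my idea and now I'm stressed"
print(detect_substring(text))    # ['anxiety', 'grief']  ("miss" inside "dismissed")
print(detect_word_prefix(text))  # ['anxiety']
```

The word-prefix variant still catches inflected forms such as "stressed" while avoiding the accidental hit on "dismissed".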


In [17]:
class CBTResponseStrategyEngine:
    def __init__(self): # Added __init__ method
        self.therapy_approaches = {
            'crisis_intervention': {'priority': 'immediate safety and stabilization', 'techniques': ['grounding', 'safety planning', 'crisis resources'], 'tone': 'calm, directive, supportive'},
            'anxiety_focused': {'priority': 'worry reduction and coping strategies', 'techniques': ['breathing exercises', 'cognitive restructuring', 'exposure concepts'], 'tone': 'gentle, reassuring, educational'},
            'depression_focused': {'priority': 'behavioral activation and mood improvement', 'techniques': ['activity scheduling', 'thought records', 'self-compassion'], 'tone': 'warm, encouraging, patient'},
            # Added missing approaches for consistency
            'trauma_informed': {'priority': 'safety, stabilization, processing', 'techniques': ['grounding', 'window of tolerance', 'narrative therapy'], 'tone': 'careful, validating, empowering'},
            'relationship_focused': {'priority': 'communication and boundary setting', 'techniques': ['interpersonal skills', 'boundary setting', 'attachment'], 'tone': 'balanced, insightful, practical'},
            'cognitive_restructuring': {'priority': 'identifying and challenging thoughts', 'techniques': ['thought challenging', 'evidence examination', 'balanced thinking'], 'tone': 'collaborative, curious, logical'}
        }

    def select_strategy(self, emotional_analysis):
        # Safety first: a crisis flag overrides every other signal
        if emotional_analysis['crisis_level'] == 'high':
            return 'crisis_intervention'
        # Emotion checks run in priority order; an empty list simply matches nothing
        emotions = emotional_analysis['primary_emotions']
        if 'trauma' in emotions:
            return 'trauma_informed'
        if 'anxiety' in emotions:
            return 'anxiety_focused'
        if 'depression' in emotions:
            return 'depression_focused'
        if 'relationships' in emotions:
            return 'relationship_focused'
        # Detected distortions and unmatched emotions (e.g. anger, grief) both use the default
        return 'cognitive_restructuring'
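
The priority order used by `select_strategy` can also be written as a small lookup table, which makes the ordering easy to read and test. This is a standalone sketch mirroring the checks in the cell above; `EMOTION_PRIORITY` and `pick_strategy` are hypothetical names.

```python
# Priority-ordered (emotion, strategy) pairs, mirroring the checks in select_strategy.
EMOTION_PRIORITY = [
    ("trauma", "trauma_informed"),
    ("anxiety", "anxiety_focused"),
    ("depression", "depression_focused"),
    ("relationships", "relationship_focused"),
]

def pick_strategy(analysis):
    """Return the strategy name for an analysis dict."""
    # A crisis flag wins over everything else.
    if analysis["crisis_level"] == "high":
        return "crisis_intervention"
    # The first emotion in priority order that was detected decides the strategy.
    for emotion, strategy in EMOTION_PRIORITY:
        if emotion in analysis["primary_emotions"]:
            return strategy
    # Default for distortions-only input and for emotions without a dedicated strategy.
    return "cognitive_restructuring"

print(pick_strategy({"crisis_level": "low", "primary_emotions": ["anxiety", "trauma"]}))
# prints: trauma_informed
```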


# 17. Implementing a Local RAG Pipeline for Knowledge Augmentation

**Purpose:** To enhance the Large Language Model's responses by grounding them in a reliable, external knowledge base. This mitigates model hallucinations, provides up-to-date information, and allows for specialized domain knowledge.

**Design Rationale:**

1.  **Choice of Vector Database (`ChromaDB`):**
    *   **Reasoning:** `ChromaDB` was selected as the vector store for its simplicity and efficiency in a development environment. The `PersistentClient` allows the database to be saved to disk, meaning the knowledge base is preserved between sessions without requiring a dedicated server. This makes the entire application self-contained and portable.

2.  **Choice of Embedding Model (`all-MiniLM-L6-v2`):**
    *   **Reasoning:** This `SentenceTransformer` model represents an excellent balance of performance and size. It is highly effective at capturing the semantic meaning of text and converting it into dense vectors for similarity search. Its small footprint ensures that the embedding process is fast and does not require significant computational resources.

3.  **Robust Initialization and Connection:**
    *   **Reasoning:** The `__init__` method is designed to be resilient. The `get_or_create_collection` command ensures that the engine can seamlessly connect to an existing knowledge base or initialize a new one if it doesn't exist. If the named collection turns out to be empty, the additional `try...except` block falls back to the first collection already stored in the database, and any error during that check is ignored so that startup still completes.

4.  **Decoupled Retrieval Logic:**
    *   **Reasoning:** The `retrieve_relevant_knowledge` method encapsulates the core retrieval logic. It takes a simple text query and returns a clean, structured list of dictionaries. This clean interface decouples the complexities of vector search from the main generation engine, which simply needs to request and receive knowledge snippets. The error handling ensures that if the query fails for any reason, it returns an empty list instead of crashing the application.
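
To make the idea of similarity search concrete, here is a toy sketch that ranks a few hand-written vectors by cosine similarity and returns the top `k` as a list of dictionaries. It is only an illustration: the three-dimensional vectors are invented, real embeddings from `all-MiniLM-L6-v2` have far more dimensions, and the distance metric ChromaDB actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=2):
    """store is a list of (text, vector) pairs; returns the k most similar as dicts."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"content": text, "score": score} for score, text in scored[:k]]

# Invented toy vectors standing in for real sentence embeddings.
store = [
    ("Box breathing can ease acute anxiety.", [0.9, 0.1, 0.0]),
    ("Activity scheduling counters low mood.", [0.1, 0.9, 0.0]),
    ("Thought records help test beliefs.", [0.2, 0.3, 0.9]),
]

for hit in top_k([1.0, 0.2, 0.0], store, k=2):
    print(f"{hit['score']:.3f}  {hit['content']}")
```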


In [18]:
# RAG ADDITION: Simple RAG Engine for the app
class RAGKnowledgeEngine: # Renamed to RAGKnowledgeEngine for consistency
    def __init__(self, persist_path="rag_chroma_db", collection_name="cbt_knowledge", embedding_model="all-MiniLM-L6-v2"): # Corrected __init__ method
        self.persist_path = persist_path
        self.collection_name = collection_name
        self.embedding_model = embedding_model
        self.client = chromadb.PersistentClient(path=self.persist_path)
        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=self.embedding_model)
        self.collection = self.client.get_or_create_collection(name=self.collection_name, embedding_function=self.embedding_fn)
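        try:
            # If the named collection is empty, fall back to the first collection already in the database
            if hasattr(self.collection, "count") and self.collection.count() == 0:
                cols = self.client.list_collections()
                if cols:
                    # Depending on the chromadb version, entries are Collection objects or plain names
                    fallback_name = getattr(cols[0], "name", cols[0])
                    try:
                        self.collection = self.client.get_collection(fallback_name, embedding_function=self.embedding_fn)
                    except Exception:
                        self.collection = self.client.get_or_create_collection(name=fallback_name, embedding_function=self.embedding_fn)
        except Exception:
            # Best-effort fallback: keep the collection obtained above if anything goes wrong
            pass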

    def retrieve_relevant_knowledge(self, query_text, k=4): # Renamed method for consistency
        try:
            res = self.collection.query(query_texts=[query_text], n_results=k)
            docs = res.get("documents", [[]])[0] if res else []
            metas = res.get("metadatas", [[]])[0] if res else []
            # Return a list of dictionaries for consistency
            return [{'content': doc, 'metadata': meta} for doc, meta in zip(docs, metas)]
        except Exception:
            return []

# 18. The Central Orchestration Pipeline for Inference

**Purpose:** To serve as the central, stateful engine that orchestrates all AI components (Analysis, RAG, Memory, and LLM) to manage a coherent, context-aware, and therapeutically-grounded conversation.

**Design Rationale:**

This class is the heart of the application at runtime, and its design reflects a sophisticated, multi-stage approach to response generation.

1.  **Modular Orchestration:** The engine doesn't do everything itself. Instead, it acts as a "conductor," calling upon specialized modules for their specific tasks (`EI` for analysis, `RAG` for knowledge). This makes the entire system cleaner, easier to debug, and allows individual components to be upgraded without breaking the whole application.

2.  **Multi-Context Prompt Engineering:** This is the engine's most critical function, because the quality of the LLM's output depends heavily on the quality of its prompt. This engine constructs a rich prompt by combining three distinct sources of context:
    *   **Knowledge Context (RAG):** Provides factual, grounding information to ensure the response is accurate and detailed.
    *   **Temporal Context (Memory):** Provides the recent conversation history, allowing the model to produce responses that are natural, relevant, and refer back to previous points.
    *   **Immediate Context (User Input):** The user's most recent message.

3.  **The Generation-Polishing Pipeline:** The process is deliberately split into two phases:
    *   **Generation:** The `model.generate()` call, with carefully tuned parameters (`temperature`, `top_p`, `repetition_penalty`), produces the raw creative output from the LLM.
    *   **Polishing (`post_process_response`):** This is a crucial "quality control" step. Raw LLM output can be messy: it can contain stop tokens, have awkward capitalization, or ramble. This function acts as a deterministic filter that truncates the text at the first stop marker, capitalizes the start of each sentence, and appends a collaborative closing question when a response longer than 25 words contains none of the collaborative markers.

4.  **Stateful Conversation Management:** The `conversation_memory` (a `deque`) is what transforms the chatbot from a simple question-answer machine into a conversational partner. By storing the last few turns, it gives the AI a "short-term memory," which is essential for building rapport and having a meaningful dialogue. The `deque` is highly efficient for this task, automatically discarding the oldest turns as new ones are added.
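
The interaction between the bounded memory and the prompt layout can be sketched in a few lines. The example below is a simplified stand-in for the engine's prompt construction: `build_prompt` is a hypothetical helper, the knowledge snippet is invented, and `maxlen=2` is used only to keep the demo short (the engine uses `maxlen=6`).

```python
from collections import deque

def build_prompt(rag_context, memory, user_input):
    """Concatenate knowledge, recent turns, and the new message into one prompt."""
    history = "".join(
        f"User: {turn['user']}\nTherapist: {turn['assistant']}\n\n" for turn in memory
    )
    return f"{rag_context}{history}User: {user_input}\nTherapist:"

# A deque with maxlen silently drops the oldest turn once it is full.
memory = deque(maxlen=2)
for i in range(3):
    memory.append({"user": f"message {i}", "assistant": f"reply {i}"})

prompt = build_prompt(
    "Relevant Knowledge:\n- Slow breathing can reduce arousal.\n",
    memory,
    "I feel tense",
)
print(prompt)
```

After three appends only turns 1 and 2 remain, so "message 0" no longer appears in the prompt, and the prompt ends with `Therapist:` so the model continues in the therapist's voice.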


In [19]:
class UltimateGenerationEngine:
    def __init__(self, model, tokenizer): # Corrected __init__ method
        self.model = model
        self.tokenizer = tokenizer
        self.ei_engine = EmotionalIntelligenceEngine()
        self.strategy_engine = CBTResponseStrategyEngine()
        self.conversation_memory = deque(maxlen=6)
        # RAG ADDITION
        self.rag_engine = RAGKnowledgeEngine() # Using the renamed class
        self.rag_top_k = 4

    def _format_rag_knowledge(self, docs):
        if not docs:
            return ""
        snippets = []
        for d in docs:
             if not d or 'content' not in d: # Check if doc is valid and has 'content' key
                continue
             snippet = d['content'].strip().replace("\n", " ")
             if len(snippet) > 600:
                snippet = snippet[:600] + "..."
             snippets.append(f"- {snippet}")
        if not snippets:
            return ""
        return "Relevant Knowledge (use if helpful, otherwise ignore):\n" + "\n".join(snippets) + "\n"


    def post_process_response(self, response):
        stop_tokens = ["<|", "</", "[/", "User:", "Therapist:", "###"]
        for token in stop_tokens:
            if token in response:
                response = response.split(token)[0].strip()
        sentences = re.split(r'(?<=[.!?])\s+', response)
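        # Upper-case only the first character; str.capitalize() would lower-case the rest (e.g. "CBT" -> "cbt")
        clean_sentences = [s.strip()[0].upper() + s.strip()[1:] for s in sentences if s.strip()]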
        response = ' '.join(clean_sentences)
        if not response:
            return "I'm not sure how to respond to that. Could you please tell me more?"
        collaborative_markers = ['what', 'how', 'would you', 'can we', 'together']
        if len(response.split()) > 25 and not any(marker in response.lower() for marker in collaborative_markers):
            endings = ["How does that sound to you?", "What are your thoughts on this?"]
            response += f" {random.choice(endings)}"
        return response


    def generate_master_response(self, user_input):
        analysis = self.ei_engine.analyze_emotional_state(user_input)
        strategy = self.strategy_engine.select_strategy(analysis)

        history = "".join(f"User: {turn.get('user', '')}\nTherapist: {turn.get('assistant', '')}\n\n" for turn in self.conversation_memory)

        # RAG ADDITION: fetch relevant knowledge
        rag_docs = self.rag_engine.retrieve_relevant_knowledge(user_input, k=self.rag_top_k) # Using the renamed method
        rag_context = self._format_rag_knowledge(rag_docs)

        prompt = f"{rag_context}{history}User: {user_input}\nTherapist:"
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs, max_new_tokens=250, min_new_tokens=50, do_sample=True,
                temperature=0.7, top_p=0.95, repetition_penalty=1.15,
                pad_token_id=self.tokenizer.eos_token_id,
            )
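        # Decode only the newly generated tokens so the prompt (and its "Therapist:" markers) is excluded
        prompt_length = inputs["input_ids"].shape[1]
        generated_text = self.tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()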
        polished_response = self.post_process_response(generated_text)
        self.conversation_memory.append({'user': user_input, 'assistant': polished_response})
        # We don't need analysis and strategy for the chat function, so we can just return the response
        return polished_response


# 19. Reconstructing the Fine-Tuned Model for Inference

**Purpose:** To efficiently load the fine-tuned model into memory for real-time use. This process involves composing the original base model with our trained adapter weights.

**Design Rationale:**

The key advantage of PEFT (Parameter-Efficient Fine-Tuning) with LoRA is that we don't save a massive, multi-billion parameter model. Instead, we only save the small, trained "adapters." This loading script demonstrates the standard procedure for reconstructing the full model in memory for inference:

1.  **Load the Quantized Base Model:**
    *   **Reasoning:** The original `phi-2` model is still the foundation. We must load it first. To make it fit within the limited VRAM of a Colab GPU, we use the same `BitsAndBytesConfig` as in training. This loads the model's weights in a highly compressed 4-bit format (`nf4`), drastically reducing its memory footprint. `device_map="auto"` ensures it's placed on the GPU correctly.

2.  **Load the Consistent Tokenizer:**
    *   **Reasoning:** It is a critical best practice to load the tokenizer from the **adapter's save directory (`ADAPTER_PATH`)**, not from the original model hub. This guarantees that we use the exact same tokenizer configuration (including any special tokens or settings) that was used during fine-tuning, preventing subtle errors during inference.

3.  **Apply the PEFT Adapters:**
    *   **Reasoning:** This is the most important step. The `PeftModel.from_pretrained()` command takes the large, frozen, and quantized `base_model` and attaches our small, trained adapter weights to it. This happens in memory: the adapters are added as extra low-rank layers alongside the base weights rather than being merged into them, and together they form the final, specialized model fine-tuned on our CBT dataset.

This entire process is highly efficient. It allows us to leverage the power of a large foundation model while only needing to store and load a few megabytes of adapter data, making our custom AI highly portable.
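
A rough back-of-the-envelope calculation shows why 4-bit loading matters on a Colab GPU. The sketch below assumes about 2.7 billion parameters for `phi-2` and counts weight storage only; it ignores quantization constants, activations, and the KV cache, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory needed for the weights alone, in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

PHI2_PARAMS = 2.7e9  # approximate parameter count of microsoft/phi-2

fp16_gb = weight_memory_gb(PHI2_PARAMS, 16)
nf4_gb = weight_memory_gb(PHI2_PARAMS, 4)
print(f"fp16 weights : ~{fp16_gb:.2f} GB")
print(f"4-bit weights: ~{nf4_gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 5.4 GB to roughly 1.35 GB, a fourfold reduction.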


In [20]:
print("✅ AI brain defined!")

# Load Your Trained Model from Drive
print("\n🛰️ Step 4: Loading your fine-tuned model from Google Drive...")
# IMPORTANT: Go to huggingface.co/settings/tokens to get your token
hf_token = os.environ.get("HF_TOKEN")  # <-- Put your token in the HF_TOKEN environment variable; never hard-code it in the notebook
login(token=hf_token)

# Define Paths
BASE_MODEL_ID = "microsoft/phi-2"

# IMPORTANT: This path must match your folder on Google Drive
ADAPTER_PATH = "/content/drive/MyDrive/ultimate-cbt-therapist"

# Load the base model in 4-bit for the GPU
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto", # "auto" will correctly use the T4 GPU
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

# Explicitly load the tokenizer from the local directory
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_PATH)
tokenizer.pad_token = tokenizer.eos_token

# Load the PEFT model by applying your adapters
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
print("✅ Your Ultimate CBT Therapist model is loaded and ready on the GPU!")

✅ AI brain defined!

🛰️ Step 4: Loading your fine-tuned model from Google Drive...


✅ Your Ultimate CBT Therapist model is loaded and ready on the GPU!


# 20. Deploying a Demo with a Gradio User Interface

**Purpose:** To create an interactive, user-friendly web application for the fine-tuned model, allowing for easy demonstration, testing, and sharing.

**Design Rationale:**

1.  **Choice of Framework (`gradio`):**
    *   **Reasoning:** Gradio was chosen because it is a widely used library for rapidly creating machine learning demos. It abstracts away the boilerplate of web development (HTML, CSS, JavaScript, backend APIs), allowing us to build a fully functional UI with just a few lines of Python.

2.  **The Controller Function (`chat_function`):**
    *   **Reasoning:** This function acts as a simple "bridge" or "adapter" between the Gradio frontend and our complex backend engine. Gradio's `ChatInterface` expects a function that takes `message` and `history` as input. Our `chat_function` wraps the call to `generation_engine.generate_master_response`, hiding all the internal complexity (RAG, memory, analysis, etc.) from the UI layer. The `history` argument is accepted but not used, because the engine tracks the conversation in its own `conversation_memory`.

3.  **Configuring a Rich User Experience:**
    *   **Reasoning:** The parameters within `gr.ChatInterface` are not just for function but also for user experience.
        *   `title`, `description`: These clearly set the user's expectations about what the model is and its capabilities.
        *   `examples`: These serve as an onboarding tool, guiding the user on how to interact with the model effectively and showcasing its strengths.
        *   `chatbot=gr.Chatbot(height=400)`: Fine-tuning the component's height prevents the UI from becoming too long or too short, improving usability.

4.  **Accessible and Shareable Deployment (`iface.launch`):**
    *   **Reasoning:** The `share=True` parameter is a key feature. It automatically creates a temporary, public URL for the application through a Gradio tunnel. This makes the project instantly shareable with colleagues or stakeholders for feedback or demonstration, without needing to manually deploy it to a server.
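
The bridge pattern described for `chat_function` can be tried without a GPU or a UI by substituting a stub for the real engine. In this sketch, `StubEngine` and `make_chat_function` are hypothetical names; the stub only imitates the `generate_master_response` method of `UltimateGenerationEngine`.

```python
class StubEngine:
    """Hypothetical stand-in for UltimateGenerationEngine; records calls and echoes input."""

    def __init__(self):
        self.calls = []

    def generate_master_response(self, user_input):
        self.calls.append(user_input)
        return f"(stub reply to: {user_input})"

def make_chat_function(engine):
    """Build a (message, history) callable of the shape a chat UI expects."""
    def chat_function(message, history):
        # `history` comes from the UI but is ignored: the engine keeps its own memory.
        return engine.generate_master_response(message)
    return chat_function

engine = StubEngine()
chat_function = make_chat_function(engine)
print(chat_function("I feel nervous about tomorrow", history=[]))
```

Because the conversation state lives in the engine rather than in `history`, a single engine instance would share one memory across everyone who opens the shared link. Creating one engine per session is one possible way to avoid that.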


In [21]:
# Create and Launch the Gradio App with Fixed Height
print("\n🚀 Step 5: Launching Gradio Interface...")
generation_engine = UltimateGenerationEngine(model, tokenizer)

def chat_function(message, history):
    """Gradio calls this function for every message."""
    response = generation_engine.generate_master_response(message)
    return response

iface = gr.ChatInterface(
    fn=chat_function,
    title="🧠 Ultimate CBT Therapist AI --> BE WHO YOU ARE!",
    description="This is a fine-tuned microsoft/phi-2 model, running on a T4 GPU, designed to provide support using CBT principles.",
    theme="soft",
    examples=[
        ["I'm so worried about my presentation tomorrow, I feel like I'm going to fail."],
        ["I've been feeling so down lately, nothing seems interesting anymore."],
    ],
    chatbot=gr.Chatbot(height=400),  # Set chatbot height
)

# Launch with inline display and specific height
iface.launch(
    share=True,
    inline=True,  # Display inline in notebook
    height=600,   # Total interface height
    width="100%"  # Full width
)


🚀 Step 5: Launching Gradio Interface...



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://25d2bcdfeb3174c95e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


