# Mental Health Analysis Tool Implementation


- **MentalHealthAssistant class**: Stores condition info, symptoms, and country data
- **Symptoms database**: Lists key symptoms for anxiety, depression, bipolar disorder, schizophrenia, and eating disorders
- **Feature engineering**: Creates one-hot country indicators, a Year feature, and GDP per capita (when GDP and Population columns are available)
- **ML models**: Trains Random Forest and Logistic Regression classifiers for each condition
- **Pattern matching**: Matches known symptom phrases in the user's text
- **Intent recognition**: Detects what the user is asking about
- **Response generation**: Provides condition info, statistics, and resources
- **Pipeline process**: Data → Features → Models → Insights → User responses
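
The intent-recognition and pattern-matching steps listed above come down to ordered regex checks plus substring lookups. Below is a minimal, self-contained sketch of that idea rather than the notebook's exact code: `INTENTS` and `SYMPTOMS` are hypothetical, trimmed-down stand-ins for the chatbot's `intents` and the assistant's symptom lists, and the `\b` word boundaries are an assumption added so that short keywords such as `hey` do not fire inside longer words such as `they`.

```python
import re

# Hypothetical, trimmed-down intent patterns. Order matters: first match wins.
INTENTS = {
    "greeting": r"\b(hi|hello|hey|greetings)\b",
    "symptoms": r"\b(symptoms of|signs of)\s+(anxiety|depression|bipolar)",
    "treatment": r"\b(treatment|therapy|medication|coping)\b",
}

# Hypothetical subset of the symptom lists stored on MentalHealthAssistant.
SYMPTOMS = {
    "anxiety": ["excessive worry", "restlessness", "panic attacks"],
    "depression": ["persistent sadness", "loss of interest", "hopelessness"],
}


def match_intent(text: str) -> str:
    """Return the first intent whose pattern matches, or 'unknown'."""
    for intent, pattern in INTENTS.items():
        if re.search(pattern, text, re.IGNORECASE):
            return intent
    return "unknown"


def match_symptoms(text: str) -> list[tuple[str, list[str]]]:
    """Return (condition, matched symptoms) pairs, most matches first."""
    text = text.lower()
    hits = []
    for condition, symptoms in SYMPTOMS.items():
        matched = [s for s in symptoms if s in text]
        if matched:
            hits.append((condition, matched))
    hits.sort(key=lambda pair: len(pair[1]), reverse=True)
    return hits


print(match_intent("What are the symptoms of anxiety?"))  # symptoms
print(match_intent("I think they are worried"))            # unknown
print(match_symptoms("Lately I feel hopelessness and persistent sadness"))
# [('depression', ['persistent sadness', 'hopelessness'])]
```

Checking intents in a fixed order keeps the logic easy to audit, but an earlier, broader pattern can shadow a later, more specific one, so the order of the dictionary matters.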

In [1]:
import re
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


class MentalHealthAssistant:
    def __init__(self):
        """
        Initialize the Mental Health Assistant with condition information and data sources
        """
        print("Initializing Mental Health Assistant...")
        
        # Define mental health conditions and their associated symptoms
        self.conditions = {
            'anxiety': {
                'column': 'Anxiety',
                'symptoms': [
                    'excessive worry', 'restlessness', 'fatigue', 'difficulty concentrating', 
                    'irritability', 'muscle tension', 'sleep problems', 'panic attacks',
                    'feeling on edge', 'sense of impending danger', 'increased heart rate'
                ]
            },
            'depression': {
                'column': 'Major depression',
                'symptoms': [
                    'persistent sadness', 'loss of interest', 'appetite changes', 'sleep changes',
                    'fatigue', 'worthlessness', 'difficulty concentrating', 'suicidal thoughts',
                    'feeling empty', 'hopelessness', 'loss of energy', 'moving slowly'
                ]
            },
            'bipolar': {
                'column': 'Bipolar',
                'symptoms': [
                    'mood swings', 'elevated mood', 'decreased need for sleep', 'racing thoughts',
                    'poor decision making', 'irritability', 'inflated self-esteem', 'depressive episodes',
                    'excessive talking', 'increased energy', 'risky behavior'
                ]
            },
            'schizophrenia': {
                'column': 'Schizophrenia',
                'symptoms': [
                    'hallucinations', 'delusions', 'disorganized thinking', 'reduced emotional expression',
                    'reduced motivation', 'social withdrawal', 'difficulty focusing', 'confused speech',
                    'paranoia', 'thought disorders', 'lack of emotion'
                ]
            },
            'eating disorders': {
                'column': 'Eating Disorders',
                'symptoms': [
                    'extreme weight loss', 'obsession with food', 'distorted body image', 'excessive exercise',
                    'eating in secret', 'purging behaviors', 'fear of gaining weight', 'food restrictions',
                    'binge eating', 'irregular eating patterns', 'self-induced vomiting'
                ]
            }
        }
        
        # Define countries for comparison
        self.countries = ["Australia", "United States", "United Kingdom", "Canada", "India", "Global"]
        
        # Initialize data storage
        self.disorders_df = None
        self.disorders_latest_year = None
        self.disorders_df_latest = None
        
        # Try to load data if available
        try:
            self.load_data()
        except Exception as e:
            print(f"Warning: Could not load disorder data: {e}")
    
    def load_data(self):
        """
        Load mental health disorder data from a data source
        This would normally load from a CSV, API, or database
        """
        try:
            # Placeholder for data loading
            # In a real implementation, this would load from a file or database
            # For now, we'll leave it empty and let the synthetic data function handle it
            pass
        except Exception as e:
            print(f"Error loading data: {e}")
            raise
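

# Sketch (hypothetical helper, not part of the original notebook): load_data()
# above is only a placeholder. One way to fill it, ASSUMING a CSV with one row per
# country-year, 'Entity' and 'Year' columns, and one prevalence column per
# condition named like the 'column' values in MentalHealthAssistant.conditions
# (e.g. 'Anxiety', 'Major depression'). The file layout is an assumption, e.g.:
#   self.disorders_df = load_disorders_csv(
#       'disorders.csv', [c['column'] for c in self.conditions.values()])
# where 'disorders.csv' is a placeholder path.
def load_disorders_csv(path, condition_columns):
    """Load a country-year prevalence table, keeping only the columns the pipeline uses."""
    import pandas as pd  # local import keeps this sketch self-contained

    df = pd.read_csv(path)
    required = ['Entity', 'Year']
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    # Keep only the condition columns that actually exist in the file
    present = [c for c in condition_columns if c in df.columns]
    df = df[required + present]
    if present:
        # Drop rows where every condition value is missing
        df = df.dropna(subset=present, how='all')
    return df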


class MentalHealthChatbot:
    def __init__(self, assistant=None):
        """
        Initialize the Mental Health Chatbot with an optional MentalHealthAssistant
        
        Args:
            assistant: An instance of MentalHealthAssistant (optional)
        """
        print("Initializing Mental Health Chatbot...")
        
        # Initialize or create assistant
        if assistant is None:
            self.assistant = MentalHealthAssistant()
        else:
            self.assistant = assistant
            
        # Run the ML pipeline (it builds its own MentalHealthAssistant and retrains the models on every chatbot init)
        try:
            self.pipeline_results = run_pipeline()
            self.has_ml_results = True
            print("ML pipeline results loaded successfully")
        except Exception as e:
            print(f"Warning: Could not load ML pipeline results: {e}")
            self.has_ml_results = False
            self.pipeline_results = None
            
        # Initialize NLP components
        self.vectorizer = CountVectorizer(lowercase=True, token_pattern=r'\b\w+\b')
        
        # Define chatbot responses and conversation flow
        self.greetings = ["hi", "hello", "hey", "greetings", "hi there", "hello there", "howdy"]
        self.goodbyes = ["bye", "goodbye", "see you", "farewell", "exit", "quit", "end"]
        
        # Create intent patterns
        self.intents = {
            'greeting': r'\b(hi|hello|hey|greetings)\b',
            'goodbye': r'\b(bye|goodbye|farewell|exit|quit)\b',
            'how_are_you': r'how are you|how\'s it going|how do you feel',
            'condition_info': r'(what is|tell me about|explain|info on|information about)\s(anxiety|depression|bipolar|schizophrenia|eating disorder)',
            'symptoms': r'(symptoms of|signs of|how to identify|how to know if|do i have)\s(anxiety|depression|bipolar|schizophrenia|eating disorder)',
            'stats': r'(statistics|numbers|data|stats|insights|analysis|prevalence|rates|information)',
            'countries': r'(country|countries|global|worldwide|international|nation)',
            'help_me': r'(help me|i need help|i\'m struggling|i am struggling|i feel|i\'m feeling|i am feeling|i think i have|i\'m worried|i am worried)',
            'treatment': r'(treatment|therapy|medication|how to treat|how to manage|coping|strategies|help for)',
            'resources': r'(resources|hotline|crisis|emergency|support|therapy|therapist|counselor|counseling)'
        }
        
        # Information about conditions
        self.condition_info = {
            'anxiety': {
                'description': "Anxiety disorders involve excessive worry, fear, or nervousness that can interfere with daily activities. They are the most common mental health concern worldwide.",
                'facts': [
                    "In the United States, an estimated 31% of adults experience an anxiety disorder at some point in their lives.",
                    "Physical symptoms can include rapid heartbeat, sweating, and shortness of breath.",
                    "Types include generalized anxiety disorder, social anxiety, panic disorder, and specific phobias."
                ]
            },
            'depression': {
                'description': "Depression (major depressive disorder) causes persistent feelings of sadness and loss of interest. It affects how you feel, think, and behave and can lead to various emotional and physical problems.",
                'facts': [
                    "Depression affects an estimated 5% of adults worldwide.",
                    "It's more common in women than in men.",
                    "Depression is a leading cause of disability worldwide."
                ]
            },
            'bipolar': {
                'description': "Bipolar disorder causes extreme mood swings that include emotional highs (mania or hypomania) and lows (depression).",
                'facts': [
                    "Bipolar disorder affects about 2% of the world's population.",
                    "The average age of onset is about 25 years.",
                    "There are several types: Bipolar I, Bipolar II, and Cyclothymic Disorder."
                ]
            },
            'schizophrenia': {
                'description': "Schizophrenia is a serious mental disorder in which people interpret reality abnormally. It may result in some combination of hallucinations, delusions, and extremely disordered thinking.",
                'facts': [
                    "Schizophrenia affects about 1 in 300 people (roughly 0.3%) worldwide, according to WHO estimates.",
                    "Typically appears in late adolescence or early adulthood.",
                    "It's often misunderstood and stigmatized despite being a treatable medical condition."
                ]
            },
            'eating disorders': {
                'description': "Eating disorders are behavioral conditions characterized by severe and persistent disturbance in eating behaviors and associated distressing thoughts and emotions.",
                'facts': [
                    "Common types include anorexia nervosa, bulimia nervosa, and binge-eating disorder.",
                    "Eating disorders, especially anorexia nervosa, have some of the highest mortality rates of any mental illness.",
                    "They affect people of all genders, ages, races, ethnicities, body shapes and weights."
                ]
            }
        }
        
        # Crisis resources
        self.crisis_resources = [
            "988 Suicide & Crisis Lifeline (US): call or text 988 (Available 24/7; the former number 1-800-273-8255 still connects)",
            "Crisis Text Line: Text HOME to 741741 (Available 24/7)",
            "SAMHSA's National Helpline: 1-800-662-HELP (4357) (Available 24/7)",
            "National Alliance on Mental Illness (NAMI) Helpline: 1-800-950-NAMI (6264)",
            "Please remember that in a serious emergency, you should call your local emergency services (like 911 in the US)"
        ]
        
        # Treatment information
        self.treatment_info = {
            'general': [
                "Common treatments for mental health conditions include therapy, medication, lifestyle changes, and support groups.",
                "Cognitive Behavioral Therapy (CBT) is one of the most effective therapeutic approaches for many conditions.",
                "Self-care practices like regular exercise, healthy eating, and good sleep habits can help manage symptoms."
            ],
            'anxiety': [
                "Cognitive Behavioral Therapy is highly effective for anxiety disorders.",
                "SSRIs are commonly prescribed; benzodiazepines are sometimes used, typically short-term.",
                "Mindfulness practices, breathing exercises, and regular physical activity can help manage symptoms."
            ],
            'depression': [
                "Treatments include psychotherapy (especially CBT and Interpersonal Therapy), medication (antidepressants), or a combination.",
                "Regular exercise has been shown to be effective for mild to moderate depression.",
                "In severe cases, treatments like electroconvulsive therapy (ECT) might be used."
            ],
            'bipolar': [
                "Mood stabilizers, antipsychotics, and antidepressants are common medications.",
                "Psychotherapy helps patients recognize triggers and develop coping strategies.",
                "Regular sleep patterns and stress management are crucial for stability."
            ],
            'schizophrenia': [
                "Antipsychotic medications are the foundation of treatment.",
                "Psychosocial interventions like therapy, family education and support are important.",
                "Coordinated specialty care combines medication, therapy, case management, and support."
            ],
            'eating disorders': [
                "Treatment typically includes nutritional counseling, psychotherapy, and sometimes medication.",
                "Family-based treatment is effective, especially for younger patients.",
                "Medical monitoring may be necessary in severe cases."
            ]
        }
    
    def preprocess_text(self, text):
        """Preprocess user input text"""
        text = text.lower().strip()
        return text
    
    def match_intent(self, text):
        """Match user input to intents"""
        for intent, pattern in self.intents.items():
            if re.search(pattern, text, re.IGNORECASE):
                return intent
        return 'unknown'
    
    def extract_condition(self, text):
        """Extract mental health condition from text"""
        for condition in self.assistant.conditions.keys():
            if condition in text or condition.rstrip('s') in text:
                return condition
            
            # Special case for eating disorders
            if 'eating' in text and ('disorder' in text or 'disorders' in text):
                return 'eating disorders'
        return None
    
    def identify_potential_symptoms(self, text):
        """Identify potential symptoms in user text"""
        potential_conditions = []
        
        for condition, info in self.assistant.conditions.items():
            symptoms = info['symptoms']
            matched_symptoms = []
            
            for symptom in symptoms:
                if symptom in text.lower():
                    matched_symptoms.append(symptom)
            
            if matched_symptoms:
                potential_conditions.append({
                    'condition': condition,
                    'matched_symptoms': matched_symptoms,
                    'matched_count': len(matched_symptoms)
                })
        
        # Sort by number of symptoms matched
        potential_conditions.sort(key=lambda x: x['matched_count'], reverse=True)
        return potential_conditions
    
    def get_text_similarity(self, text1, text2):
        """Calculate the cosine similarity of two texts' word-count vectors"""
        try:
            # Fit a throwaway vectorizer on just these two texts
            counts = CountVectorizer().fit_transform([text1, text2])
            return cosine_similarity(counts)[0][1]
        except ValueError:
            # Raised when neither text contains any tokens (empty vocabulary)
            return 0.0
    
    def get_condition_response(self, condition):
        """Get response about a condition"""
        if condition in self.condition_info:
            info = self.condition_info[condition]
            response = f"{info['description']}\n\nFacts about {condition}:\n"
            for fact in info['facts']:
                response += f"- {fact}\n"
            
            # Add symptoms
            if condition in self.assistant.conditions:
                response += f"\nCommon symptoms of {condition} include:\n"
                for symptom in self.assistant.conditions[condition]['symptoms']:
                    response += f"- {symptom}\n"
                    
            return response
        else:
            return f"I don't have detailed information about {condition}, but I'd be happy to discuss anxiety, depression, bipolar disorder, schizophrenia, or eating disorders."
    
    def get_symptoms_response(self, condition):
        """Get response about symptoms of a condition"""
        if condition in self.assistant.conditions:
            symptoms = self.assistant.conditions[condition]['symptoms']
            response = f"Common symptoms of {condition} include:\n"
            for symptom in symptoms:
                response += f"- {symptom}\n"
            
            response += "\nPlease note that experiencing some of these symptoms doesn't necessarily mean you have this condition. Only a qualified mental health professional can make a proper diagnosis."
            
            return response
        else:
            return f"I don't have information about symptoms of {condition}. I can provide information about anxiety, depression, bipolar disorder, schizophrenia, or eating disorders."
    
    def get_stats_response(self, condition=None):
        """Get statistical insights about conditions"""
        if not self.has_ml_results:
            return "I don't have up-to-date statistics available. For current statistics on mental health conditions, please consult resources like the WHO, NIMH, or other reputable health organizations."
        
        insights = self.pipeline_results['insights']
        
        if condition is None:
            # General stats across conditions
            response = "Based on our analysis:\n\n"
            
            # Model performance
            response += "Model Performance:\n"
            for cond, perf in insights['model_performance'].items():
                response += f"- {cond.title()}: Best model = {perf['best_model']}, F1 Score = {perf['f1_score']:.4f}\n"
            
            # Country comparisons
            if 'country_comparison' in insights and insights['country_comparison']:
                response += "\nGlobal Prevalence Comparison (from most recent data):\n"
                countries = list(insights['country_comparison'].keys())
                conditions = list(insights['country_comparison'][countries[0]].keys())
                
                for cond in conditions:
                    response += f"\n{cond.title()} prevalence:\n"
                    country_rates = []
                    for country in countries:
                        if country in insights['country_comparison']:
                            if cond in insights['country_comparison'][country]:
                                value = insights['country_comparison'][country][cond]
                                country_rates.append((country, value))
                    
                    # Sort by prevalence rate
                    country_rates.sort(key=lambda x: x[1], reverse=True)
                    
                    # Show top 3
                    for i in range(min(3, len(country_rates))):
                        country, value = country_rates[i]
                        response += f"  - {country}: {value:.2f}%\n"
            
            return response
        else:
            # Stats for specific condition
            if condition not in insights['top_predictors']:
                return f"I don't have specific statistics about {condition}."
            
            response = f"Insights about {condition}:\n\n"
            
            # Top predictors
            response += "Top predictors:\n"
            for pred in insights['top_predictors'][condition]:
                response += f"- {pred['feature']} (importance: {pred['importance']:.4f})\n"
            
            # Country comparison
            if 'country_comparison' in insights and insights['country_comparison']:
                response += f"\n{condition.title()} prevalence by country:\n"
                country_rates = []
                
                for country, cond_data in insights['country_comparison'].items():
                    if condition in cond_data:
                        country_rates.append((country, cond_data[condition]))
                
                # Sort by prevalence rate
                country_rates.sort(key=lambda x: x[1], reverse=True)
                
                for country, value in country_rates:
                    response += f"- {country}: {value:.2f}%\n"
            
            return response
    
    def get_country_response(self):
        """Get response about country comparisons"""
        if not self.has_ml_results:
            return "I don't have up-to-date country comparison data available."
        
        insights = self.pipeline_results['insights']
        
        if 'country_comparison' not in insights or not insights['country_comparison']:
            return "I don't have country comparison data available."
        
        countries = list(insights['country_comparison'].keys())
        conditions = list(insights['country_comparison'][countries[0]].keys())
        
        response = "Mental Health Condition Prevalence by Country:\n\n"
        
        # For each country, list prevalence of conditions
        for country in countries:
            if country == "Global":
                continue
                
            response += f"{country}:\n"
            country_data = []
            
            for condition in conditions:
                if condition in insights['country_comparison'][country]:
                    value = insights['country_comparison'][country][condition]
                    country_data.append((condition, value))
            
            # Sort by prevalence
            country_data.sort(key=lambda x: x[1], reverse=True)
            
            for condition, value in country_data:
                response += f"- {condition.title()}: {value:.2f}%\n"
            
            response += "\n"
        
        return response
    
    def get_help_response(self, text):
        """Generate response when user is asking for help"""
        # Identify potential conditions
        potential_conditions = self.identify_potential_symptoms(text)
        
        if not potential_conditions:
            return ("I'm not sure what you might be experiencing based on what you've shared. " 
                    "If you're struggling with your mental health, it's important to reach out to a healthcare professional. " 
                    "Would you like me to provide some general resources for mental health support?")
        
        # Get the condition with the most symptom matches
        top_condition = potential_conditions[0]['condition']
        matched_symptoms = potential_conditions[0]['matched_symptoms']
        
        response = f"Based on what you've shared, I notice you mentioned some experiences that can be associated with {top_condition}. "
        response += f"Specifically, you mentioned: {', '.join(matched_symptoms)}.\n\n"
        
        response += ("Please remember that I cannot diagnose conditions, and experiencing these symptoms doesn't necessarily mean "
                    f"you have {top_condition}. Only a qualified healthcare professional can provide a proper assessment.\n\n")
        
        # Add treatment suggestions
        if top_condition in self.treatment_info:
            response += f"If you're concerned about {top_condition}, here are some general approaches that help many people:\n"
            for item in self.treatment_info[top_condition]:
                response += f"- {item}\n"
        else:
            response += "If you're concerned about your mental health, here are some general approaches that help many people:\n"
            for item in self.treatment_info['general']:
                response += f"- {item}\n"
        
        response += "\nWould you like me to provide some resources for mental health support?"
        
        return response
    
    def get_treatment_response(self, condition=None):
        """Get treatment information for a condition"""
        if condition is None:
            # General treatment information
            response = "General approaches to treating mental health conditions:\n\n"
            for item in self.treatment_info['general']:
                response += f"- {item}\n"
        elif condition in self.treatment_info:
            response = f"Treatment approaches for {condition}:\n\n"
            for item in self.treatment_info[condition]:
                response += f"- {item}\n"
        else:
            response = ("I don't have specific treatment information for that condition. "
                       "Here are general approaches to treating mental health conditions:\n\n")
            for item in self.treatment_info['general']:
                response += f"- {item}\n"
        
        response += "\nRemember that treatment should always be guided by a qualified healthcare professional."
        return response
    
    def get_resources_response(self):
        """Provide mental health resources"""
        response = "Here are some mental health resources that might be helpful:\n\n"
        
        # Crisis resources
        response += "Crisis Resources:\n"
        for resource in self.crisis_resources:
            response += f"- {resource}\n"
        
        # General resources
        response += "\nGeneral Mental Health Resources:\n"
        response += "- National Alliance on Mental Illness (NAMI): www.nami.org\n"
        response += "- Mental Health America: www.mhanational.org\n"
        response += "- Psychology Today Therapist Finder: www.psychologytoday.com/us/therapists\n"
        response += "- National Institute of Mental Health: www.nimh.nih.gov\n"
        
        response += "\nRemember that in a serious emergency, you should call your local emergency services."
        
        return response
    
    def generate_response(self, user_input):
        """Generate chatbot response based on user input"""
        # Preprocess text
        text = self.preprocess_text(user_input)
        
        # Check for greeting
        if text in self.greetings:
            return "Hello! I'm a mental health chatbot. How can I help you today? I can provide information about mental health conditions, symptoms, treatments, and resources."
        
        # Check for goodbye
        if text in self.goodbyes:
            return "Take care! Remember that if you're struggling with your mental health, it's important to reach out to a healthcare professional."
        
        # Match intent
        intent = self.match_intent(text)
        condition = self.extract_condition(text)
        
        # Generate response based on intent
        if intent == 'greeting':
            return "Hello! I'm a mental health chatbot. How can I help you today? I can provide information about mental health conditions, symptoms, treatments, and resources."
        
        elif intent == 'goodbye':
            return "Take care! Remember that if you're struggling with your mental health, it's important to reach out to a healthcare professional."
        
        elif intent == 'how_are_you':
            return "I'm just a chatbot, but I'm here and ready to help you with mental health information. How are you feeling today?"
        
        elif intent == 'condition_info':
            if condition:
                return self.get_condition_response(condition)
            else:
                return "I can provide information about anxiety, depression, bipolar disorder, schizophrenia, or eating disorders. Which condition would you like to learn about?"
        
        elif intent == 'symptoms':
            if condition:
                return self.get_symptoms_response(condition)
            else:
                return "I can provide information about symptoms of anxiety, depression, bipolar disorder, schizophrenia, or eating disorders. Which condition are you interested in?"
        
        elif intent == 'stats':
            return self.get_stats_response(condition)
        
        elif intent == 'countries':
            return self.get_country_response()
        
        elif intent == 'help_me':
            return self.get_help_response(text)
        
        elif intent == 'treatment':
            return self.get_treatment_response(condition)
        
        elif intent == 'resources':
            return self.get_resources_response()
        
        else:
            # Fallback: try to identify if asking about symptoms
            potential_conditions = self.identify_potential_symptoms(text)
            
            if potential_conditions:
                return self.get_help_response(text)
            
            return "I'm not sure I understand. I can provide information about mental health conditions like anxiety, depression, bipolar disorder, schizophrenia, or eating disorders. I can also tell you about symptoms, treatments, and resources. How can I help you?"


def run_pipeline():
    """
    Run the mental health analysis pipeline:
    1. Load and initialize the assistant
    2. Preprocess the data
    3. Train models for each mental health condition
    4. Evaluate the models
    5. Generate insights
    
    Returns:
        dict: Dictionary containing results from the pipeline run
    """
    print("Starting mental health ML pipeline...")
    
    # Initialize the assistant
    assistant = MentalHealthAssistant()
    
    # Check if data is available
    if assistant.disorders_df is None:
        print("Warning: No disorders data available. Using synthetic data for demonstration.")
        # Create synthetic data for demonstration
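        assistant.disorders_df = create_synthetic_data()
        # Use the most recent year present in the data rather than hard-coding it
        assistant.disorders_latest_year = int(assistant.disorders_df['Year'].max())
        assistant.disorders_df_latest = assistant.disorders_df[
            assistant.disorders_df['Year'] == assistant.disorders_latest_year
        ]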
    
    # Step 1: Preprocess data
    print("\nStep 1: Preprocessing data...")
    X, y_dict = preprocess_data(assistant)
    
    # Step 2: Train and evaluate models for each condition
    print("\nStep 2: Training and evaluating models...")
    model_results = {}
    
    for condition in assistant.conditions:
        print(f"\nProcessing {condition}...")
        column_name = assistant.conditions[condition]['column']
        
        # Skip if no data available for this condition
        if column_name not in y_dict:
            print(f"No data available for {condition}. Skipping.")
            continue
            
        y = y_dict[column_name]
        
        # Split data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Scale features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        
        # Train Random Forest
        rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
        rf_model.fit(X_train_scaled, y_train)
        rf_preds = rf_model.predict(X_test_scaled)
        
        # Train Logistic Regression
        lr_model = LogisticRegression(max_iter=1000, random_state=42)
        lr_model.fit(X_train_scaled, y_train)
        lr_preds = lr_model.predict(X_test_scaled)
        
        # Evaluate models
        rf_metrics = {
            'accuracy': accuracy_score(y_test, rf_preds),
            'precision': precision_score(y_test, rf_preds, zero_division=0),
            'recall': recall_score(y_test, rf_preds, zero_division=0),
            'f1': f1_score(y_test, rf_preds, zero_division=0)
        }
        
        lr_metrics = {
            'accuracy': accuracy_score(y_test, lr_preds),
            'precision': precision_score(y_test, lr_preds, zero_division=0),
            'recall': recall_score(y_test, lr_preds, zero_division=0),
            'f1': f1_score(y_test, lr_preds, zero_division=0)
        }
        
        # Store results
        model_results[condition] = {
            'random_forest': rf_metrics,
            'logistic_regression': lr_metrics,
            'feature_importance': {
                'features': X.columns.tolist(),
                'importance': rf_model.feature_importances_.tolist()
            }
        }
    
    # Step 3: Generate insights
    print("\nStep 3: Generating insights...")
    insights = generate_insights(assistant, model_results)
    
    # Prepare results
    results = {
        'model_results': model_results,
        'insights': insights
    }
    
    return results


def preprocess_data(assistant):
    """
    Preprocess the data for ML modeling:
    1. One-hot encode the country column ('Entity')
    2. Add the year and, when Population and GDP are present, GDP per capita
    3. Create a binary target variable for each condition
    
    Args:
        assistant: Initialized MentalHealthAssistant instance
        
    Returns:
        tuple: (X, y_dict) containing features and target variables
    """
    df = assistant.disorders_df.copy()
    
    # Create feature dataframe
    features = []
    
    # Add demographic features
    features.append(pd.get_dummies(df['Entity'], prefix='country', drop_first=True))
    
    # Add year as a feature
    features.append(df[['Year']])
    
    # Add additional computed features
    if 'Population' in df.columns and 'GDP' in df.columns:
        df['GDP_per_capita'] = df['GDP'] / df['Population']
        features.append(df[['GDP_per_capita']])
    
    # Combine all features
    X = pd.concat(features, axis=1)
    
    # Create target variables for each condition
    y_dict = {}
    for condition, info in assistant.conditions.items():
        column = info['column']
        if column in df.columns:
            # Binarize the target: 1 = above-median prevalence across all rows, 0 = at or below the median
            threshold = df[column].median()
            y_dict[column] = (df[column] > threshold).astype(int)
    
    return X, y_dict


def create_synthetic_data():
    """Create synthetic data for demonstration when real data is not available"""
    import pandas as pd
    import numpy as np
    np.random.seed(42)
    
    countries = ["Australia", "United States", "United Kingdom", "Canada", "India", "Global"]
    years = list(range(2010, 2024))
    
    # Collect one dict per (country, year) row
    rows = []
    
    for country in countries:
        for year in years:
            # Baseline prevalences, redrawn independently for every (country, year) row
            base_anxiety = np.random.uniform(2, 10)
            base_depression = np.random.uniform(3, 12)
            base_bipolar = np.random.uniform(0.5, 3)
            base_schizophrenia = np.random.uniform(0.2, 1.5)
            base_eating = np.random.uniform(1, 5)
            
            # Add trend over years
            year_factor = (year - 2010) / 13  # Normalize to 0-1
            
            # Population and GDP
            population = np.random.randint(1000000, 1000000000)
            gdp = population * np.random.uniform(1000, 50000)
            
            row = {
                'Entity': country,
                'Year': year,
                'Population': population,
                'GDP': gdp,
                'Anxiety': base_anxiety + year_factor * np.random.uniform(0, 4),
                'Major depression': base_depression + year_factor * np.random.uniform(0, 3),
                'Bipolar': base_bipolar + year_factor * np.random.uniform(0, 1),
                'Schizophrenia': base_schizophrenia + year_factor * np.random.uniform(0, 0.5),
                'Eating Disorders': base_eating + year_factor * np.random.uniform(0, 2)
            }
            
            rows.append(row)
    
    df = pd.DataFrame(rows)
    return df


def generate_insights(assistant, model_results):
    """
    Generate insights from the trained models and data
    
    Args:
        assistant: Initialized MentalHealthAssistant instance
        model_results: Dictionary containing model evaluation results
        
    Returns:
        dict: Dictionary containing insights from the analysis
    """
    insights = {
        'top_predictors': {},
        'model_performance': {},
        'country_comparison': {}
    }
    
    # Find top predictors for each condition
    for condition, results in model_results.items():
        feature_importance = results['feature_importance']
        features = feature_importance['features']
        importance = feature_importance['importance']
        
        # Get top 5 features
        top_indices = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)[:5]
        insights['top_predictors'][condition] = [
            {'feature': features[i], 'importance': importance[i]} 
            for i in top_indices
        ]
        
    # Compare model performance
    for condition, results in model_results.items():
        rf_f1 = results['random_forest']['f1']
        lr_f1 = results['logistic_regression']['f1']
        
        insights['model_performance'][condition] = {
            'best_model': 'random_forest' if rf_f1 > lr_f1 else 'logistic_regression',
            'f1_score': max(rf_f1, lr_f1)
        }
    
    # Add country comparison if data allows
    if assistant.disorders_df_latest is not None:
        latest_data = assistant.disorders_df_latest
        
        # Collect each country's latest-year prevalence for every condition
        country_data = {}
        
        for country in assistant.countries:
            country_row = latest_data[latest_data['Entity'] == country]
            if len(country_row) > 0:
                condition_values = {}
                for condition, info in assistant.conditions.items():
                    column = info['column']
                    if column in country_row.columns:
                        condition_values[condition] = country_row[column].values[0]
                
                country_data[country] = condition_values
        
        insights['country_comparison'] = country_data
    
    return insights
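The training loop, `preprocess_data`, and `generate_insights` above can be condensed into one per-condition helper. The following is a minimal sketch, assuming a numeric feature frame `X` and a single prevalence Series; `evaluate_condition` is a hypothetical name. Two choices here are not in the original loop: the scaler is wrapped in a scikit-learn `Pipeline` so it is fit on the training split only, and the split is stratified. The random forest receives unscaled features, since tree splits do not depend on feature scale.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_condition(X, prevalence, top_k=3, random_state=42):
    """Median-binarize one prevalence Series, fit both classifiers, summarize the results."""
    # Same labelling rule as preprocess_data: 1 = above-median prevalence
    y = (prevalence > prevalence.median()).astype(int)

    # stratify=y keeps the 0/1 balance equal in both splits (not done in the original loop)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )

    models = {
        # Tree splits do not depend on feature scale, so the forest uses raw features
        'random_forest': RandomForestClassifier(n_estimators=100, random_state=random_state),
        # The pipeline fits the scaler on the training split only, then the classifier
        'logistic_regression': make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000, random_state=random_state)
        ),
    }

    f1 = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        f1[name] = f1_score(y_test, model.predict(X_test), zero_division=0)

    # Top-k features by impurity-based importance (argsort ascending, then reversed)
    importances = models['random_forest'].feature_importances_
    top = np.argsort(importances)[::-1][:top_k]

    return {
        'f1': f1,
        # Same tie-break as generate_insights: logistic regression wins ties
        'best_model': ('random_forest' if f1['random_forest'] > f1['logistic_regression']
                       else 'logistic_regression'),
        'top_predictors': [(X.columns[i], float(importances[i])) for i in top],
    }


# Usage on made-up data shaped like the synthetic frames (6 countries x 14 years = 84 rows)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'Year': np.repeat(np.arange(2010, 2024), 6),
    'GDP_per_capita': rng.uniform(2_000, 60_000, 84),
})
prevalence_demo = pd.Series(5 + 1e-4 * X_demo['GDP_per_capita'] + rng.normal(0, 0.5, 84))
summary = evaluate_condition(X_demo, prevalence_demo)
print(summary['best_model'], summary['top_predictors'])
```

With only two features, `top_predictors` returns both; in the notebook, `X` would be the frame returned by `preprocess_data` and `prevalence` one of the condition columns.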

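# Data Generation & Correlation Analysis Code Notes

- **Synthetic data creation**: Generates realistic mental health data with predefined correlations
- **Correlation structure**: Imposes a strong anxiety-depression correlation (0.7) and weaker positive correlations (0.2 to 0.5) among the other conditions
- **Country-specific effects**: Scales anxiety and depression up for the US and UK and down for India, and scales eating disorders up for India
- **Temporal trends**: Adds an increasing linear trend term to each condition; the ramp follows row order (country by country), so it varies more between countries than between years (2010-2023)
- **Economic factors**: Adds a latent economic factor with positive loadings on anxiety, depression, and eating disorders and negative loadings on bipolar disorder and schizophrenia; the GDP per capita column is generated separately from per-country baselines
- **Visualization functions**: Creates correlation heatmaps with and without an upper-triangle mask
- **Cholesky decomposition**: Used to generate correlated random variables
- **Population modeling**: Uses approximate real population values for each country, growing linearly by 1% of the 2010 value per year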
- **Data export**: Saves generated data to CSV for further analysis
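The Cholesky step works because if `z` holds independent standard normals and `L` is the lower-triangular factor with `L @ L.T` equal to the target matrix, then `L @ z` has exactly that covariance (and, with unit variances, that correlation). A minimal check, using the anxiety/depression/bipolar corner of the matrix from the next cell and a large sample so the empirical correlations land close to the targets:

```python
import numpy as np

# Target correlations (anxiety, depression, bipolar): a 3x3 corner of the 5x5 matrix used below
target = np.array([
    [1.0, 0.7, 0.4],
    [0.7, 1.0, 0.45],
    [0.4, 0.45, 1.0],
])

L = np.linalg.cholesky(target)         # lower-triangular factor with L @ L.T == target
rng = np.random.default_rng(42)
z = rng.standard_normal((3, 200_000))  # independent standard normals, one row per variable
x = L @ z                              # rows now have covariance L @ L.T == target

empirical = np.corrcoef(x)             # rows are treated as variables by default
print(np.round(empirical, 2))
```

`np.linalg.cholesky` raises `LinAlgError` if the matrix is not positive definite, so hand-written correlation matrices should be checked this way before use.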

In [2]:

# Set up plot style
plt.style.use('ggplot')
sns.set_palette("colorblind")
plt.rcParams.update({'font.size': 8})  # Reduce the default font size

# Create a directory for saving visualizations
os.makedirs('visualizations', exist_ok=True)

def create_realistic_mental_health_data():
    """
    Create synthetic mental health data with realistic correlations between variables
    """
    np.random.seed(42)
    
    countries = ["Australia", "United States", "United Kingdom", "Canada", "India", "Global"]
    years = list(range(2010, 2024))
    
    n_samples = len(countries) * len(years)
    
    # Create base variables with controlled correlations
    # 1. Create independent base variables
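    base_gdp = np.random.normal(size=n_samples)  # Latent economic factor (not the GDP_per_capita column built below)
    time_trend = np.linspace(0, 1, n_samples)  # Ramps over row order; rows are grouped by country, so this mostly varies between countries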
    
    # 2. Create correlated mental health conditions
    # We'll create base mental health variables with specific correlations
    
    # Correlation matrix (approximate) for mental health conditions
    # Order: anxiety, depression, bipolar, schizophrenia, eating_disorders
    corr_matrix = np.array([
        [1.0, 0.7, 0.4, 0.3, 0.35],  # anxiety
        [0.7, 1.0, 0.45, 0.35, 0.4],  # depression
        [0.4, 0.45, 1.0, 0.5, 0.25],  # bipolar
        [0.3, 0.35, 0.5, 1.0, 0.2],   # schizophrenia
        [0.35, 0.4, 0.25, 0.2, 1.0]   # eating disorders
    ])
    
    # Cholesky decomposition for generating correlated variables
    L = np.linalg.cholesky(corr_matrix)
    
    # Generate uncorrelated variables
    uncorrelated = np.random.normal(size=(5, n_samples))
    
    # Generate correlated variables according to the correlation matrix
    correlated = L @ uncorrelated
    
    # Extract the correlated variables
    anxiety_base = correlated[0]
    depression_base = correlated[1]
    bipolar_base = correlated[2]
    schizophrenia_base = correlated[3]
    eating_disorders_base = correlated[4]
    
    # Add the latent economic factor and the time trend to each condition
    # Positive correlations with time (increasing prevalence)
    # Mixed correlations with GDP (some positive, some negative)
    
    anxiety = 5 + 0.5 * time_trend + 0.3 * base_gdp + 2 * anxiety_base
    depression = 6 + 0.6 * time_trend + 0.2 * base_gdp + 2.5 * depression_base
    bipolar = 1.5 + 0.2 * time_trend - 0.1 * base_gdp + 0.8 * bipolar_base
    schizophrenia = 0.9 + 0.1 * time_trend - 0.05 * base_gdp + 0.4 * schizophrenia_base
    eating_disorders = 2.5 + 0.3 * time_trend + 0.15 * base_gdp + 1.2 * eating_disorders_base
    
    # Convert to reasonable ranges
    anxiety = np.clip(anxiety, 2, 15)
    depression = np.clip(depression, 3, 18)
    bipolar = np.clip(bipolar, 0.3, 5)
    schizophrenia = np.clip(schizophrenia, 0.1, 3)
    eating_disorders = np.clip(eating_disorders, 0.5, 8)
    
    # Create the DataFrame
    index = 0
    rows = []
    
    for country in countries:
        country_factor = np.random.normal(scale=0.2)  # Country-specific random effect (drawn but never applied below)
        
        for year in years:
            sample_index = index  # Row position in the pre-generated condition arrays
            
            # Add some country-specific effects
            country_anxiety_effect = 1.0
            country_depression_effect = 1.0
            country_bipolar_effect = 1.0
            country_schizophrenia_effect = 1.0
            country_eating_effect = 1.0
            
            # Adjust for specific country effects
            if country == "United States":
                country_anxiety_effect = 1.2
                country_depression_effect = 1.3
            elif country == "United Kingdom":
                country_anxiety_effect = 1.1
                country_depression_effect = 1.2
            elif country == "India":
                country_anxiety_effect = 0.8
                country_depression_effect = 0.7
                country_eating_effect = 1.3
            
            # Population and GDP
            base_population = 10**np.random.uniform(6, 9)  # Fallback between 1M and 1B; overwritten for every listed country below
            if country == "United States":
                base_population = 331000000
            elif country == "United Kingdom":
                base_population = 67000000
            elif country == "India":
                base_population = 1380000000
            elif country == "Canada":
                base_population = 38000000
            elif country == "Australia":
                base_population = 25000000
            elif country == "Global":
                base_population = 7800000000
                
            population = base_population * (1 + 0.01 * (year - 2010))  # Small growth over time
            
            # GDP varies by country
            if country == "United States":
                gdp_per_capita = 60000 * (1 + 0.02 * (year - 2010) + 0.01 * np.random.normal())
            elif country == "United Kingdom":
                gdp_per_capita = 40000 * (1 + 0.015 * (year - 2010) + 0.01 * np.random.normal())
            elif country == "India":
                gdp_per_capita = 2000 * (1 + 0.05 * (year - 2010) + 0.01 * np.random.normal())
            elif country == "Canada":
                gdp_per_capita = 45000 * (1 + 0.018 * (year - 2010) + 0.01 * np.random.normal())
            elif country == "Australia":
                gdp_per_capita = 50000 * (1 + 0.02 * (year - 2010) + 0.01 * np.random.normal())
            elif country == "Global":
                gdp_per_capita = 12000 * (1 + 0.025 * (year - 2010) + 0.01 * np.random.normal())
            
            gdp = population * gdp_per_capita
            
            # Apply country-specific effects
            anxiety_val = anxiety[sample_index] * country_anxiety_effect
            depression_val = depression[sample_index] * country_depression_effect
            bipolar_val = bipolar[sample_index] * country_bipolar_effect
            schizophrenia_val = schizophrenia[sample_index] * country_schizophrenia_effect
            eating_val = eating_disorders[sample_index] * country_eating_effect
            
            row = {
                'Entity': country,
                'Year': year,
                'Population': population,
                'GDP': gdp,
                'GDP_per_capita': gdp_per_capita,
                'Anxiety': anxiety_val,
                'Major depression': depression_val,
                'Bipolar': bipolar_val,
                'Schizophrenia': schizophrenia_val,
                'Eating Disorders': eating_val
            }
            
            rows.append(row)
            index += 1
    
    df = pd.DataFrame(rows)
    return df

def plot_correlation_analysis(df):
    """Create correlation heatmap for mental health data with realistic correlations"""
    # Select relevant columns
    columns_to_correlate = ['Year', 'GDP_per_capita', 
                           'Anxiety', 'Major depression', 'Bipolar', 
                           'Schizophrenia', 'Eating Disorders']
    
    # Create correlation matrix
    corr_matrix = df[columns_to_correlate].corr()
    
    # Print correlation matrix for reference
    print("Correlation Matrix:")
    print(corr_matrix.round(2))
    
    # Plot heatmap
    plt.figure(figsize=(8, 6))  # Slightly larger for readability
    
    # Create mask for upper triangle
    mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
    
    # Use a diverging color palette
    cmap = sns.diverging_palette(230, 20, as_cmap=True)
    
    # Create heatmap
    sns.heatmap(corr_matrix, mask=mask, cmap=cmap, vmax=1, vmin=-1, center=0,
                square=True, linewidths=.5, annot=True, fmt='.2f', annot_kws={"size": 8})
    
    plt.title('Correlation Between Economic Factors and Mental Health', fontsize=12)
    plt.tight_layout()
    plt.savefig('visualizations/correlation_heatmap_realistic.png', dpi=150)
    plt.close()
    
    # Return the correlation matrix for reference
    return corr_matrix

def plot_all_correlations(df):
    """Plot all correlations without mask for complete visibility"""
    # Select relevant columns
    columns_to_correlate = ['Year', 'GDP_per_capita', 
                           'Anxiety', 'Major depression', 'Bipolar', 
                           'Schizophrenia', 'Eating Disorders']
    
    # Create correlation matrix
    corr_matrix = df[columns_to_correlate].corr()
    
    # Plot heatmap without mask (showing full matrix)
    plt.figure(figsize=(8, 6))
    
    # Use a diverging color palette
    cmap = sns.diverging_palette(230, 20, as_cmap=True)
    
    # Create heatmap (no mask)
    sns.heatmap(corr_matrix, cmap=cmap, vmax=1, vmin=-1, center=0,
                square=True, linewidths=.5, annot=True, fmt='.2f', annot_kws={"size": 8})
    
    plt.title('Complete Correlation Matrix - Mental Health Factors', fontsize=12)
    plt.tight_layout()
    plt.savefig('visualizations/full_correlation_matrix.png', dpi=150)
    plt.close()

def main():
    # Create synthetic data with realistic correlations
    df = create_realistic_mental_health_data()
    
    # Save the synthetic data to CSV
    df.to_csv('mental_health_data_realistic.csv', index=False)
    print("Realistic mental health data created and saved to mental_health_data_realistic.csv")
    
    # Generate correlation plot with realistic correlations
    corr_matrix = plot_correlation_analysis(df)
    
    # Also create a full correlation matrix without masking
    plot_all_correlations(df)
    
    print("\nRealistic correlation matrix generated successfully!")

if __name__ == "__main__":
    main()

Realistic mental health data created and saved to mental_health_data_realistic.csv
Correlation Matrix:
                  Year  GDP_per_capita  Anxiety  Major depression  Bipolar  \
Year              1.00            0.12     0.11              0.18    -0.09   
GDP_per_capita    0.12            1.00     0.06              0.19    -0.23   
Anxiety           0.11            0.06     1.00              0.76     0.36   
Major depression  0.18            0.19     0.76              1.00     0.45   
Bipolar          -0.09           -0.23     0.36              0.45     1.00   
Schizophrenia     0.05           -0.15     0.44              0.48     0.56   
Eating Disorders -0.12           -0.35     0.14              0.07     0.26   

                  Schizophrenia  Eating Disorders  
Year                       0.05             -0.12  
GDP_per_capita            -0.15             -0.35  
Anxiety                    0.44              0.14  
Major depression           0.48              0.07  
Bipolar     
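In `create_realistic_mental_health_data`, `time_trend = np.linspace(0, 1, n_samples)` is indexed by row, and rows are generated country by country. The ramp therefore mostly separates countries: within one country it rises by only 13/83 (about 0.16), while the Global block starts near 0.84. A minimal sketch of a per-year alternative built with `np.tile` follows; `year_ramp` is a hypothetical name, and substituting it for `time_trend` would change the generated data and therefore the correlation matrix printed above.

```python
import numpy as np

countries = ["Australia", "United States", "United Kingdom", "Canada", "India", "Global"]
years = list(range(2010, 2024))
n_samples = len(countries) * len(years)

# As in create_realistic_mental_health_data: one ramp over all 84 rows,
# which are ordered country by country (row = country_index * 14 + year_index)
row_ramp = np.linspace(0, 1, n_samples)

# Per-year ramp: 0.0 for 2010, 1.0 for 2023, repeated for every country in the same row order
year_ramp = np.tile(np.linspace(0, 1, len(years)), len(countries))

# Value of each ramp in the 2010 row of every country block
print(row_ramp[::len(years)].round(2))
print(year_ramp[::len(years)])
```

Because `year_ramp` follows the same country-major row order, it can be indexed with the existing `sample_index` without other changes to the loop.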

In [3]:
# Set up the visualization directory
os.makedirs('visualizations', exist_ok=True)

# Set global matplotlib parameters for smaller figures
plt.rcParams.update({
    'figure.figsize': (5, 4),  # Smaller default figure size
    'font.size': 8,           # Smaller font size
    'axes.titlesize': 10,     # Smaller title
    'axes.labelsize': 8,      # Smaller axis labels
    'xtick.labelsize': 7,     # Smaller x tick labels
    'ytick.labelsize': 7,     # Smaller y tick labels
    'legend.fontsize': 7      # Smaller legend
})

# Hard-coded copy of the correlation matrix printed by the previous cell (rounded to 2 decimals)
correlation_data = [
    [1.00, 0.12, 0.11, 0.18, -0.09, 0.05, -0.12],
    [0.12, 1.00, 0.06, 0.19, -0.23, -0.15, -0.35],
    [0.11, 0.06, 1.00, 0.76, 0.36, 0.44, 0.14],
    [0.18, 0.19, 0.76, 1.00, 0.45, 0.48, 0.07],
    [-0.09, -0.23, 0.36, 0.45, 1.00, 0.56, 0.26],
    [0.05, -0.15, 0.44, 0.48, 0.56, 1.00, 0.18],
    [-0.12, -0.35, 0.14, 0.07, 0.26, 0.18, 1.00]
]

# Create a DataFrame with the correlation matrix
labels = ['Year', 'GDP_per_capita', 'Anxiety', 'Major depression', 'Bipolar', 'Schizophrenia', 'Eating Disorders']
corr_df = pd.DataFrame(correlation_data, columns=labels, index=labels)

# Create a heatmap of the correlation matrix
def create_correlation_heatmap():
    plt.figure(figsize=(5, 4))  # Smaller figure size
    
    # Create the heatmap
    sns.heatmap(corr_df, annot=True, fmt=".2f", cmap="RdBu_r", vmin=-1, vmax=1, 
                linewidths=0.5, cbar_kws={"shrink": 0.8}, annot_kws={"size": 6})
    
    plt.title('Mental Health Factors Correlation Matrix', fontsize=10)
    plt.tight_layout()
    plt.savefig('visualizations/full_correlation_heatmap.png', dpi=150)  # Lower DPI
    plt.close()
    
    print("Correlation heatmap created: visualizations/full_correlation_heatmap.png")

# Create a heatmap with triangular mask
def create_masked_heatmap():
    plt.figure(figsize=(5, 4))  # Smaller figure size
    
    # Create a mask for the upper triangle
    mask = np.triu(np.ones_like(corr_df, dtype=bool))
    
    # Create the heatmap with mask
    sns.heatmap(corr_df, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r", 
                vmin=-1, vmax=1, linewidths=0.5, cbar_kws={"shrink": 0.8}, annot_kws={"size": 6})
    
    plt.title('Mental Health Factors Correlation Matrix (Lower Triangle)', fontsize=10)
    plt.tight_layout()
    plt.savefig('visualizations/lower_triangle_heatmap.png', dpi=150)  # Lower DPI
    plt.close()
    
    print("Lower triangle heatmap created: visualizations/lower_triangle_heatmap.png")

# Create horizontal bar chart for Depression correlations
def create_depression_correlation_chart():
    # Extract correlations with Major depression
    depression_corrs = corr_df['Major depression'].drop('Major depression').sort_values(ascending=False)
    
    plt.figure(figsize=(5, 3))  # Smaller figure size
    
    # Color each bar by sign: red for negative, blue for positive
    colors = ['#d7191c' if x < 0 else '#2c7bb6' for x in depression_corrs]
    
    # Create the horizontal bar chart
    bars = plt.barh(depression_corrs.index, depression_corrs, color=colors)
    
    # Add a vertical line at x=0
    plt.axvline(x=0, color='black', linestyle='-', alpha=0.3)
    
    # Add labels with correlation values
    for i, bar in enumerate(bars):
        width = bar.get_width()
        label_x_pos = width + 0.02 if width > 0 else width - 0.08
        plt.text(label_x_pos, i, f'{width:.2f}', va='center', fontsize=6)
    
    plt.title('Correlation with Major Depression', fontsize=10)
    plt.xlabel('Correlation Coefficient', fontsize=8)
    plt.xlim(-0.5, 1.0)  # Set x-axis limits
    plt.grid(axis='x', alpha=0.3)
    plt.tight_layout()
    plt.savefig('visualizations/depression_correlations.png', dpi=150)  # Lower DPI
    plt.close()
    
    print("Depression correlations chart created: visualizations/depression_correlations.png")

# Create horizontal bar chart for Anxiety correlations
def create_anxiety_correlation_chart():
    # Extract correlations with Anxiety
    anxiety_corrs = corr_df['Anxiety'].drop('Anxiety').sort_values(ascending=False)
    
    plt.figure(figsize=(5, 3))  # Smaller figure size
    
    # Color each bar by sign: red for negative, blue for positive
    colors = ['#d7191c' if x < 0 else '#2c7bb6' for x in anxiety_corrs]
    
    # Create the horizontal bar chart
    bars = plt.barh(anxiety_corrs.index, anxiety_corrs, color=colors)
    
    # Add a vertical line at x=0
    plt.axvline(x=0, color='black', linestyle='-', alpha=0.3)
    
    # Add labels with correlation values
    for i, bar in enumerate(bars):
        width = bar.get_width()
        label_x_pos = width + 0.02 if width > 0 else width - 0.08
        plt.text(label_x_pos, i, f'{width:.2f}', va='center', fontsize=6)
    
    plt.title('Correlation with Anxiety', fontsize=10)
    plt.xlabel('Correlation Coefficient', fontsize=8)
    plt.xlim(-0.5, 1.0)  # Set x-axis limits
    plt.grid(axis='x', alpha=0.3)
    plt.tight_layout()
    plt.savefig('visualizations/anxiety_correlations.png', dpi=150)  # Lower DPI
    plt.close()
    
    print("Anxiety correlations chart created: visualizations/anxiety_correlations.png")

# Create a visual comparison of condition interrelationships
def create_condition_interrelationship_chart():
    # Extract just the mental health conditions
    conditions = ['Anxiety', 'Major depression', 'Bipolar', 'Schizophrenia', 'Eating Disorders']
    condition_corrs = corr_df.loc[conditions, conditions]
    
    plt.figure(figsize=(4.5, 4))  # Smaller figure size
    
    # Create a custom mask to show only the lower triangle
    mask = np.triu(np.ones_like(condition_corrs, dtype=bool))
    
    # Create the heatmap; the mask also hides the diagonal, so only off-diagonal values are annotated
    sns.heatmap(condition_corrs, mask=mask, annot=True, fmt=".2f", cmap="Blues", 
                vmin=0, vmax=1, linewidths=0.5, cbar_kws={"shrink": 0.8}, annot_kws={"size": 6})
    
    plt.title('Interrelationships Between Mental Health Conditions', fontsize=10)
    plt.tight_layout()
    plt.savefig('visualizations/condition_interrelationships.png', dpi=150)  # Lower DPI
    plt.close()
    
    print("Condition interrelationships chart created: visualizations/condition_interrelationships.png")

# Create a clustermap to show hierarchical relationships
def create_condition_clustermap():
    # Use just the mental health conditions for clustering
    conditions = ['Anxiety', 'Major depression', 'Bipolar', 'Schizophrenia', 'Eating Disorders']
    condition_corrs = corr_df.loc[conditions, conditions]
    
    # Create the clustermap with smaller size
    cluster = sns.clustermap(condition_corrs, cmap="Blues", vmin=0, vmax=1,
                          annot=True, fmt=".2f", figsize=(4.5, 4.5),
                          linewidths=0.5, cbar_kws={"shrink": 0.8}, 
                          annot_kws={"size": 6})
    
    # Adding title to clustermap
    plt.suptitle('Hierarchical Clustering of Mental Health Conditions', fontsize=10, y=0.95)
    plt.savefig('visualizations/condition_clustermap.png', dpi=150)  # Lower DPI
    plt.close()
    
    print("Condition clustermap created: visualizations/condition_clustermap.png")

# Main function to run all visualizations
def main():
    print("Creating Mental Health Correlation Visualizations (Smaller Sizes)...")
    
    # Create all the visualizations
    create_correlation_heatmap()
    create_masked_heatmap()
    create_depression_correlation_chart()
    create_anxiety_correlation_chart()
    create_condition_interrelationship_chart()
    create_condition_clustermap()
    
    print("\nAll visualizations have been created successfully with appropriate sizes!")

if __name__ == "__main__":
    main()

Creating Mental Health Correlation Visualizations (Smaller Sizes)...
Correlation heatmap created: visualizations/full_correlation_heatmap.png
Lower triangle heatmap created: visualizations/lower_triangle_heatmap.png
Depression correlations chart created: visualizations/depression_correlations.png
Anxiety correlations chart created: visualizations/anxiety_correlations.png
Condition interrelationships chart created: visualizations/condition_interrelationships.png
Condition clustermap created: visualizations/condition_clustermap.png

All visualizations have been created successfully with appropriate sizes!
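`create_depression_correlation_chart` and `create_anxiety_correlation_chart` differ only in the target column, title, and output path. A minimal sketch of one parameterized helper follows; `plot_target_correlations` is a hypothetical name, and it uses matplotlib's `fig`/`ax` interface instead of the `plt` state-machine calls. It places each label at the bar's vertical center, which for categorical bars is the same position as index `i` in the original.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_target_correlations(corr_df, target, path=None):
    """Horizontal bar chart of every other variable's correlation with `target`."""
    corrs = corr_df[target].drop(target).sort_values(ascending=False)
    colors = ['#d7191c' if v < 0 else '#2c7bb6' for v in corrs]  # red = negative, blue = positive

    fig, ax = plt.subplots(figsize=(5, 3))
    bars = ax.barh(corrs.index, corrs.values, color=colors)
    ax.axvline(0, color='black', alpha=0.3)
    for bar, value in zip(bars, corrs.values):
        # Put the label just past the end of each bar
        x = value + 0.02 if value > 0 else value - 0.08
        ax.text(x, bar.get_y() + bar.get_height() / 2, f'{value:.2f}', va='center', fontsize=6)

    ax.set_title(f'Correlation with {target}', fontsize=10)
    ax.set_xlabel('Correlation Coefficient', fontsize=8)
    ax.set_xlim(-0.5, 1.0)
    ax.grid(axis='x', alpha=0.3)
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150)
    plt.close(fig)
    return corrs


# Usage on a small excerpt of the matrix above
labels = ['Year', 'Anxiety', 'Major depression']
demo = pd.DataFrame([[1.00, 0.11, 0.18],
                     [0.11, 1.00, 0.76],
                     [0.18, 0.76, 1.00]], index=labels, columns=labels)
depression_corrs = plot_target_correlations(demo, 'Major depression')
print(depression_corrs)
```

In the notebook, `plot_target_correlations(corr_df, 'Anxiety', 'visualizations/anxiety_correlations.png')` would stand in for the anxiety chart function, and likewise for depression.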




In [4]:
def run_mental_health_assistant():
    """
    Run the Mental Health Assistant with symptom analysis capabilities.
    This version analyzes user symptoms to identify potential mental health conditions.
    """
    
    # Define mental health conditions and their associated symptoms
    conditions = {
        'anxiety': {
            'symptoms': [
                'excessive worry', 'restlessness', 'fatigue', 'difficulty concentrating', 
                'irritability', 'muscle tension', 'sleep problems', 'panic attacks',
                'feeling on edge', 'sense of impending danger', 'increased heart rate',
                'nervousness', 'feeling nervous', 'anxiety', 'anxious', 'worry', 'worried'
            ],
            'description': "Anxiety disorders involve excessive worry, fear, or nervousness that can interfere with daily activities. They are the most common mental health concern worldwide.",
            'statistics': {
                'Global': 3.8,
                'United States': 6.3,
                'United Kingdom': 5.9,
                'Canada': 4.9,
                'Australia': 5.2,
                'India': 5.2
            },
            'strategies': [
                "Practicing relaxation techniques such as deep breathing, meditation, or progressive muscle relaxation",
                "Regular exercise, which releases tension and stress-reducing endorphins",
                "Maintaining adequate sleep and a balanced diet",
                "Cognitive-behavioral therapy (CBT) to identify and change negative thought patterns",
                "In some cases, medication prescribed by a healthcare provider"
            ]
        },
        'depression': {
            'symptoms': [
                'persistent sadness', 'loss of interest', 'appetite changes', 'sleep changes',
                'fatigue', 'worthlessness', 'difficulty concentrating', 'suicidal thoughts',
                'feeling empty', 'hopelessness', 'loss of energy', 'moving slowly',
                'depression', 'depressed', 'sad', 'sadness', 'low mood', 'lack of motivation'
            ],
            'description': "Depression (major depressive disorder) causes persistent feelings of sadness and loss of interest. It affects how you feel, think, and behave and can lead to various emotional and physical problems.",
            'statistics': {
                'Global': 3.4,
                'United States': 7.1,
                'United Kingdom': 4.5,
                'Canada': 5.4,
                'Australia': 5.9,
                'India': 3.3
            },
            'strategies': [
                "Psychotherapy (especially CBT and Interpersonal Therapy)",
                "Medication (antidepressants) when prescribed by a healthcare provider",
                "Regular physical activity, which has been shown to reduce symptoms",
                "Maintaining social connections and talking about your feelings",
                "Establishing routines and setting achievable goals"
            ]
        },
        'bipolar': {
            'symptoms': [
                'mood swings', 'elevated mood', 'decreased need for sleep', 'racing thoughts',
                'poor decision making', 'irritability', 'inflated self-esteem', 'depressive episodes',
                'excessive talking', 'increased energy', 'risky behavior', 'high and low moods',
                'bipolar', 'mania', 'manic', 'hypomania'
            ],
            'description': "Bipolar disorder causes extreme mood swings that include emotional highs (mania or hypomania) and lows (depression).",
            'statistics': {
                'Global': 0.7,
                'United States': 2.8,
                'United Kingdom': 2.0,
                'Canada': 2.2,
                'Australia': 1.8,
                'India': 0.6
            },
            'strategies': [
                "Mood stabilizers, antipsychotics, and sometimes antidepressants prescribed by a psychiatrist",
                "Regular therapy to develop coping strategies and recognize warning signs",
                "Maintaining a consistent sleep-wake schedule",
                "Managing stress through mindfulness and relaxation techniques",
                "Avoiding alcohol and recreational drugs which can trigger episodes"
            ]
        },
        'ptsd': {
            'symptoms': [
                'flashbacks', 'nightmares', 'severe anxiety', 'uncontrollable thoughts',
                'avoidance', 'negative thoughts', 'emotional numbness', 'easily startled',
                'always on guard', 'self-destructive behavior', 'trouble concentrating',
                'trouble sleeping', 'trauma', 'traumatic', 'ptsd'
            ],
            'description': "Post-traumatic stress disorder (PTSD) is a mental health condition triggered by experiencing or witnessing a terrifying event. Symptoms may include flashbacks, nightmares, severe anxiety, and uncontrollable thoughts about the event.",
            'statistics': {
                'Global': 3.9,
                'United States': 6.8,
                'United Kingdom': 4.4,
                'Canada': 9.2,
                'Australia': 4.4,
                'India': 2.1
            },
            'strategies': [
                "Cognitive Processing Therapy (CPT) or Prolonged Exposure Therapy (PE)",
                "Eye Movement Desensitization and Reprocessing (EMDR) therapy",
                "Medication for symptom management when prescribed by a healthcare provider",
                "Stress management and relaxation techniques",
                "Support groups with others experiencing similar challenges"
            ]
        },
        'ocd': {
            'symptoms': [
                'intrusive thoughts', 'repetitive behaviors', 'excessive orderliness',
                'fear of contamination', 'unwanted thoughts', 'mental rituals', 'checking',
                'counting', 'arranging', 'hoarding', 'perfectionism', 'need for symmetry',
                'obsessions', 'compulsions', 'obsessive', 'compulsive', 'ocd'
            ],
            'description': "Obsessive-Compulsive Disorder (OCD) involves unwanted thoughts (obsessions) that trigger anxiety, followed by repetitive actions (compulsions) performed to reduce this anxiety. These obsessions and compulsions can significantly interfere with daily activities.",
            'statistics': {
                'Global': 1.3,
                'United States': 2.3,
                'United Kingdom': 1.6,
                'Canada': 1.8,
                'Australia': 2.0,
                'India': 0.6
            },
            'strategies': [
                "Exposure and Response Prevention (ERP) therapy",
                "Cognitive Behavioral Therapy (CBT) focused on OCD symptoms",
                "Medication such as SSRIs when prescribed by a healthcare provider",
                "Mindfulness practices to help manage intrusive thoughts",
                "Support groups and family therapy to help with coping strategies"
            ]
        }
    }
    
    # Define general mental health resources
    resources = {
        'Global': [
            "WHO Mental Health Website: www.who.int/mental_health",
            "International Association for Suicide Prevention: www.iasp.info"
        ],
        'United States': [
                "988 Suicide & Crisis Lifeline: call or text 988 (the former number 1-800-273-8255 still connects)",
            "Crisis Text Line: Text HOME to 741741",
            "SAMHSA's National Helpline: 1-800-662-HELP (4357)",
            "National Alliance on Mental Illness (NAMI): www.nami.org"
        ],
        'United Kingdom': [
            "Samaritans: 116 123",
            "Mind: www.mind.org.uk",
            "NHS Mental Health Services: www.nhs.uk/mental-health"
        ],
        'Canada': [
            "Crisis Services Canada: 1-833-456-4566",
            "Canadian Mental Health Association: www.cmha.ca"
        ],
        'Australia': [
            "Lifeline Australia: 13 11 14",
            "Beyond Blue: 1300 22 4636",
            "Headspace: www.headspace.org.au"
        ],
        'India': [
            "AASRA Suicide Prevention Helpline: 91-9820466726",
            "National Institute of Mental Health and Neurosciences (NIMHANS): www.nimhans.ac.in",
            "The Live Love Laugh Foundation: www.thelivelovelaughfoundation.org",
            "Manas Foundation: www.manasfoundation.in",
            "SCARF India (Schizophrenia Research Foundation): www.scarfindia.org",
            "iCall Psychosocial Helpline: 022-25521111",
            "Vandrevala Foundation Mental Health Helpline: 1860-2662-345"
        ]
    }
    
    # Helper function to identify potential conditions from symptoms
    def identify_conditions(symptoms_text):
        import re  # local import so the helper does not rely on an earlier cell
        symptoms_text = symptoms_text.lower()
        
        # Check each condition, matching symptom phrases as whole words so that,
        # for example, 'counting' does not fire inside 'accounting'
        matches = {}
        for condition, info in conditions.items():
            matched_symptoms = [
                symptom for symptom in info['symptoms']
                if re.search(r'\b' + re.escape(symptom) + r'\b', symptoms_text)
            ]
            
            if matched_symptoms:
                matches[condition] = {
                    'matched_symptoms': matched_symptoms,
                    'matched_count': len(matched_symptoms)
                }
        
        # Sort by number of symptoms matched (most matches first)
        return sorted(matches.items(), key=lambda x: x[1]['matched_count'], reverse=True)
    
    # Helper function to get condition information
    def get_condition_info(condition, country):
        info = conditions[condition]
        stats = info['statistics']
        global_value = stats['Global']
        
        # Acronym keys such as 'ptsd' and 'ocd' read better upper-cased
        name = condition.upper() if condition in ('ptsd', 'ocd') else condition
        title = name if name.isupper() else name.title()
        
        # Country-specific statistic and comparison with the global average.
        # For the 'Global' fallback, report the global figure once instead of
        # comparing it with itself.
        if country in stats and country != 'Global':
            country_value = stats[country]
            if country_value > global_value:
                relation = "higher than"
            elif country_value < global_value:
                relation = "lower than"
            else:
                relation = "the same as"
            country_stat = (f"\nIn {country}, approximately {country_value:.1f}% of the population experiences {name}."
                            f"\nThis is {relation} the global average of {global_value:.1f}%.")
        else:
            country_stat = f"\nGlobally, approximately {global_value:.1f}% of the population experiences {name}."
        
        # Construct the response
        response = f"Information about {title}:\n\n{info['description']}{country_stat}\n\nEvidence-based strategies for managing {name}:\n\n"
        
        # Add strategies
        for i, strategy in enumerate(info['strategies'], 1):
            response += f"{i}. {strategy}\n"
        
        response += "\nIt's important to work with healthcare professionals for personalized treatment."
        return response
    
    # Helper function to get resources for a country
    def get_resources(country):
        response = "Mental Health Resources:\n\n"
        
        # Add country-specific resources if available
        if country in resources:
            response += f"Resources in {country}:\n"
            for resource in resources[country]:
                response += f"- {resource}\n"
            response += "\n"
        
        # Always add global resources
        response += "Global Resources:\n"
        for resource in resources['Global']:
            response += f"- {resource}\n"
        
        response += "\nRemember that in a serious emergency, you should call your local emergency services."
        return response
    
    # Start the conversation
    print("Initializing Mental Health Assistant...")
    print("All available datasets loaded successfully.")
    print("Mental Health Assistant initialized successfully.")
    print("== Mental Health Assistant ==")
    print("Type 'exit' to end the conversation.")
    
    # Initial greeting
    print("\nChatbot: Hi! I'm your Mental Health Assistant, trained on global mental health data. I'd like to understand how you're feeling. On a scale of 1-10, how would you rate your mental wellbeing today? (1 being very poor, 10 being excellent) [Please enter a number between 1-10]")
    
    # Conversation state tracking
    state = {
        'step': 'wellbeing_rating',
        'wellbeing_score': None,
        'duration': None,
        'symptoms': "",
        'country': None,
        'condition': None
    }
    
    while True:
        user_input = input("\nYou: ").strip()
        
        if user_input.lower() in ['exit', 'quit', 'bye', 'goodbye']:
            print("\nChatbot: Thank you for using the Mental Health Assistant. Remember that this tool provides information based on global mental health data, but is not a substitute for professional care. If you're experiencing mental health difficulties, please consider speaking with a healthcare professional.")
            break
        
        # Process input based on conversation state
        if state['step'] == 'wellbeing_rating':
            try:
                rating = int(user_input)
            except ValueError:
                rating = None
            
            # Reject non-numeric input and numbers outside the 1-10 scale
            if rating is None or not 1 <= rating <= 10:
                print("\nChatbot: I didn't understand that rating. Please provide a number between 1 and 10.")
            else:
                state['wellbeing_score'] = rating
                state['step'] = 'duration'
                print("\nChatbot: I'm sorry to hear you're not feeling well." if rating <= 5 else "\nChatbot: I'm glad you're feeling relatively well.")
                print("How long have you been experiencing these feelings? (days, weeks, months?)")
        
        elif state['step'] == 'duration':
            state['duration'] = user_input
            state['step'] = 'symptoms'
            print("\nChatbot: Thank you for sharing. Could you describe the main symptoms or feelings you've been experiencing? [For example: anxiety, low mood, trouble sleeping, irritability, worry, panic attacks, etc.]")
        
        elif state['step'] == 'symptoms':
            state['symptoms'] = user_input.lower()
            
            # Try to identify potential conditions from symptoms
            potential_conditions = identify_conditions(user_input)
            if potential_conditions:
                state['condition'] = potential_conditions[0][0]  # Store the most likely condition
            
            state['step'] = 'country'
            print("\nChatbot: Thank you for sharing those details. Which country do you live in? This will help me provide statistics and coping strategies relevant to your region. [Example countries: India, United States, United Kingdom, Canada, Australia]")
        
        elif state['step'] == 'country':
            state['step'] = 'offer_info'
            
            # Match the country case-insensitively against every country that has
            # resources or statistics; fall back to 'Global' otherwise
            known_countries = set(resources)
            for info in conditions.values():
                known_countries.update(info['statistics'])
            canonical = {name.lower(): name for name in known_countries}
            country = canonical.get(user_input.lower())
            if country is None:
                print(f"\nChatbot: I don't have specific data for {user_input}, so I'll provide global information.")
                country = 'Global'
            state['country'] = country
            
            # Offer information about identified condition
            if state['condition']:
                print(f"\nChatbot: Based on what you've shared, some of your experiences might be associated with {state['condition']}. Would you like to learn more about {state['condition']} and coping strategies? [Please respond with: yes or no]")
            else:
                # If no condition identified, offer general support
                print("\nChatbot: Based on what you've shared, would you like to learn about general mental wellbeing strategies? [Please respond with: yes or no]")
        
        elif state['step'] == 'offer_info':
            if user_input.lower() in ['yes', 'y', 'sure', 'ok', 'okay']:
                state['step'] = 'provide_resources'
                
                # Provide information about the condition
                if state['condition']:
                    response = get_condition_info(state['condition'], state['country'])
                    print(f"\nChatbot: {response}")
                else:
                    # Provide general mental wellbeing advice
                    print("\nChatbot: Here are some evidence-based strategies for maintaining mental wellbeing:\n\n1. Regular physical activity\n2. Maintaining a healthy sleep schedule\n3. Connecting with supportive friends and family\n4. Practicing mindfulness and stress-reduction techniques\n5. Setting realistic goals and expectations\n\nIf you're experiencing persistent difficulties, consider speaking with a healthcare professional.")
            else:
                state['step'] = 'additional_help'
                print("\nChatbot: I understand. If you have other questions or concerns about mental health, feel free to ask. Would you like information about professional help resources? [Please respond with: yes or no]")
        
        elif state['step'] == 'provide_resources':
            state['step'] = 'additional_help'
            print("\nChatbot: If you have any other questions about mental health resources or would like to discuss something specific, feel free to ask. Would you like information about professional help resources in your region? [Please respond with: yes or no]")
        
        elif state['step'] == 'additional_help':
            if user_input.lower() in ['yes', 'y', 'sure', 'ok', 'okay']:
                resources_info = get_resources(state['country'])
                print(f"\nChatbot: {resources_info}")
                state['step'] = 'open_ended'
            else:
                state['step'] = 'open_ended'
                print("\nChatbot: Is there anything specific about mental health you'd like to know more about? I can provide information about various conditions, coping strategies, or treatment approaches. [Example queries: Tell me about anxiety, What are PTSD symptoms, How is OCD treated]")
        
        else:  # open_ended
            # Process open-ended queries
            # First, check if user is asking about a condition
            potential_conditions = identify_conditions(user_input)
            
            if potential_conditions:
                # User is asking about a condition
                condition = potential_conditions[0][0]
                response = get_condition_info(condition, state['country'])
                print(f"\nChatbot: {response}")
            elif "resource" in user_input.lower() or "help" in user_input.lower():
                # User is asking about resources
                resources_info = get_resources(state['country'])
                print(f"\nChatbot: {resources_info}")
            elif "thank" in user_input.lower():
                # User is thanking
                print("\nChatbot: You're welcome! I'm here to help. Is there anything else you'd like to know about mental health?")
            else:
                # General response for other queries
                print("\nChatbot: I'm here to provide mental health information and resources. You can ask about specific conditions like anxiety, depression, bipolar disorder, PTSD, or OCD. I can also provide information about coping strategies or resources in your region. How can I help you today?")

# Run the assistant
if __name__ == "__main__":
    run_mental_health_assistant()

Initializing Mental Health Assistant...
All available datasets loaded successfully.
Mental Health Assistant initialized successfully.
== Mental Health Assistant ==
Type 'exit' to end the conversation.

Chatbot: Hi! I'm your Mental Health Assistant, trained on global mental health data. I'd like to understand how you're feeling. On a scale of 1-10, how would you rate your mental wellbeing today? (1 being very poor, 10 being excellent) [Please enter a number between 1-10]

You:  2

Chatbot: I'm sorry to hear you're not feeling well.
How long have you been experiencing these feelings? (days, weeks, months?)

You:  3 weeks

Chatbot: Thank you for sharing. Could you describe the main symptoms or feelings you've been experiencing? [For example: anxiety, low mood, trouble sleeping, irritability, worry, panic attacks, etc.]

You:  low mood

Chatbot: Thank you for sharing those details. Which country do you live in? This will help me provide statistics and coping strategies relevant to your region. [Example countries: India, United States, United Kingdom, Canada, Australia]

You:  India

Chatbot: Based on what you've shared, some of your experiences might be associated with depression. Would you like to learn more about depression and coping strategies? [Please respond with: yes or no]

You:  yes

Chatbot: Information about Depression:

Depression (major depressive disorder) causes persistent feelings of sadness and loss of interest. It affects how you feel, think, and behave and can lead to various emotional and physical problems.
In India, approximately 3.3% of the population experiences depression.
This is lower than the global average of 3.4%.

Evidence-based strategies for managing depression:

1. Psychotherapy (especially CBT and Interpersonal Therapy)
2. Medication (antidepressants) when prescribed by a healthcare provider
3. Regular physical activity, which has been shown to reduce symptoms
4. Maintaining social connections and talking about your feelings
5. Establishing routines and setting achievable goals

It's important to work with healthcare professionals for personalized treatment.

You:  thank you

Chatbot: If you have any other questions about mental health resources or would like to discuss something specific, feel free to ask. Would you like information about professional help resources in your region? [Please respond with: yes or no]

You:  yes

Chatbot: Mental Health Resources:

Resources in India:
- AASRA Suicide Prevention Helpline: 91-9820466726
- National Institute of Mental Health and Neurosciences (NIMHANS): www.nimhans.ac.in
- The Live Love Laugh Foundation: www.thelivelovelaughfoundation.org
- Manas Foundation: www.manasfoundation.in
- SCARF India (Schizophrenia Research Foundation): www.scarfindia.org
- iCall Psychosocial Helpline: 022-25521111
- Vandrevala Foundation Mental Health Helpline: 1860-2662-345

Global Resources:
- WHO Mental Health Website: www.who.int/mental_health
- International Association for Suicide Prevention: www.iasp.info

Remember that in a serious emergency, you should call your local emergency services.

You:  exit

Chatbot: Thank you for using the Mental Health Assistant. Remember that this tool provides information based on global mental health data, but is not a substitute for professional care. If you're experiencing mental health difficulties, please consider speaking with a healthcare professional.
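
The `while True` loop works as a small finite-state machine. `state['step']` names the current stage, and each branch reads one answer, updates `state`, and moves to the next stage. Because the transitions are interleaved with `input()` and `print()`, the flow is awkward to test. The sketch below shows one possible refactor, which is not part of the notebook: it moves a single turn into a pure function and replays scripted answers through it. `advance`, `replay` and `YES_WORDS` are hypothetical names. Only the first five steps are modelled, and symptom detection is replaced by one hard-coded keyword check instead of `identify_conditions`.

```python
# Hypothetical refactor sketch: advance(), replay() and YES_WORDS are
# illustrative names, not functions defined in the notebook.

YES_WORDS = {'yes', 'y', 'sure', 'ok', 'okay'}


def advance(state, user_input):
    """Process one user turn and return (next_state, reply) without I/O."""
    text = user_input.strip()
    new = dict(state)  # copy so the caller's state is never mutated
    step = state['step']

    if step == 'wellbeing_rating':
        try:
            rating = int(text)
        except ValueError:
            rating = 0  # non-numeric input is treated as out of range
        if not 1 <= rating <= 10:
            return new, "Please provide a number between 1 and 10."
        new.update(wellbeing_score=rating, step='duration')
        return new, "How long have you been experiencing these feelings?"

    if step == 'duration':
        new.update(duration=text, step='symptoms')
        return new, "Could you describe the main symptoms you've been experiencing?"

    if step == 'symptoms':
        # Stand-in for identify_conditions(): one hard-coded keyword check
        condition = 'depression' if 'low mood' in text.lower() else None
        new.update(symptoms=text.lower(), condition=condition, step='country')
        return new, "Which country do you live in?"

    if step == 'country':
        new.update(country=text, step='offer_info')
        topic = state.get('condition') or 'general wellbeing strategies'
        return new, f"Would you like to learn more about {topic}?"

    if step == 'offer_info':
        if text.lower() in YES_WORDS:
            new['step'] = 'provide_resources'
            return new, "(condition information would be shown here)"
        new['step'] = 'additional_help'
        return new, "Would you like information about professional help resources?"

    # Later steps are omitted from this sketch
    return new, "(open-ended handling)"


def replay(inputs):
    """Feed scripted inputs through advance() and collect every reply."""
    state = {'step': 'wellbeing_rating', 'wellbeing_score': None,
             'duration': None, 'symptoms': "", 'country': None,
             'condition': None}
    replies = []
    for text in inputs:
        state, reply = advance(state, text)
        replies.append(reply)
    return state, replies


# Replay the same answers as the transcript above
final_state, replies = replay(["2", "3 weeks", "low mood", "India", "yes"])
for reply in replies:
    print(reply)
print(final_state['step'], final_state['condition'])
```

When the transcript's answers are replayed, the machine ends in the `provide_resources` step with `depression` stored as the condition, so the last printed line is `provide_resources depression`. Because `advance` has no I/O, the interactive loop would only need to read a line, call `advance`, and print the reply.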
