# Mental Health Survey Prediction Project

## Overview of the Notebook
The Mental Health Survey Prediction Project uses machine learning to predict mental health conditions such as depression, anxiety, and insomnia among university students from lifestyle and demographic factors 📊💡. This notebook trains a Random Forest model on the survey data to predict the `Depressed Mode` field and to identify key indicators of mental well-being 🌱.

### 1. Load and Clean the Data

```python
import pandas as pd

# Load the dataset
data_path = 'Results.csv'  # Replace with your CSV file path
df = pd.read_csv(data_path)

# Clean the column names by stripping any leading/trailing spaces
df.columns = df.columns.str.strip()

# Display the column names to check for any discrepancies
print(df.columns)
```

```
Index(['Name', 'Sex', 'Age', 'Hobby', 'University Department',
       'University Year', 'Sports', 'BMI', 'Weight Loss', 'Heart Rate',
       'Depressed Mode', 'Guilt', 'Insomnia', 'Appetite', 'Suicidal Thought',
       'Anxiety'],
      dtype='object')
```


### 2. Preprocess the Data

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Drop the 'Name' column as it is not useful for the model
df = df.drop('Name', axis=1)

# Encode every text (object-dtype) column, including the target, as integers,
# keeping each fitted encoder so codes can be mapped back to labels later
label_encoders = {}
for column in df.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column].astype(str))
    label_encoders[column] = le

# Separate the features (X) from the target (y), the 'Depressed Mode' column
X = df.drop('Depressed Mode', axis=1)
y = df['Depressed Mode']

# Hold out 20% of the rows as a test set; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


### 3. Train the Random Forest Model

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Train the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = rf_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')
```

```
Accuracy: 1.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         5

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
```
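The report above lists only class `0`, because all five test rows belong to that class. A minimal sketch of how `classification_report`'s `labels` and `zero_division` parameters make an absent class visible, using hypothetical all-zero labels that mimic that split (the `_demo` names are illustrative):

```python
from sklearn.metrics import classification_report

# Hypothetical single-class test set and predictions, like the 5-row split above
y_true_demo = [0, 0, 0, 0, 0]
y_pred_demo = [0, 0, 0, 0, 0]

# labels= forces a row for class 1 even though it never occurs;
# zero_division=0 reports its undefined precision/recall as 0 instead of warning
report_demo = classification_report(
    y_true_demo, y_pred_demo, labels=[0, 1], zero_division=0
)
print(report_demo)
```

A row for class `1` with a support of 0 makes it obvious that the evaluation never tested that class.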



## Step-by-Step Explanation

### 1. Load and Clean the Data

Load `Results.csv` with pandas and strip leading/trailing spaces from the column names, so that lookups such as `df['Depressed Mode']` match the headers exactly.
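A minimal sketch of why the strip matters, using a tiny in-memory CSV with padded headers (the rows are invented for illustration, not taken from `Results.csv`):

```python
import io

import pandas as pd

# Toy CSV whose header fields carry stray spaces (hypothetical data)
raw_csv = io.StringIO(
    "Name , Age,Depressed Mode \n"
    "A,20,Yes\n"
    "B,22,No\n"
)
df_demo = pd.read_csv(raw_csv)

# pandas keeps the whitespace in the header by default
print(list(df_demo.columns))  # ['Name ', ' Age', 'Depressed Mode ']

# Strip leading/trailing whitespace from every column name
df_demo.columns = df_demo.columns.str.strip()
print(list(df_demo.columns))  # ['Name', 'Age', 'Depressed Mode']

# Exact-name lookups now succeed instead of raising KeyError
print(df_demo["Depressed Mode"].tolist())  # ['Yes', 'No']
```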

### 2. Preprocess the Data

- Drop the `Name` column, as it is not useful for model training.
- Encode the categorical (text) columns, including the target, as integers with `LabelEncoder`, storing each fitted encoder in `label_encoders`.
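A small sketch of what `LabelEncoder` does to one column, using hypothetical Yes/No answers. The integer codes follow the sorted order of the labels, and the fitted encoder can decode predictions back to text:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical answers for one categorical survey column
answers = ["Yes", "No", "No", "Yes", "No"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(answers)

# Classes are sorted, so "No" -> 0 and "Yes" -> 1
print(le_demo.classes_.tolist())  # ['No', 'Yes']
print(codes.tolist())             # [1, 0, 0, 1, 0]

# A stored encoder maps model outputs back to the original labels
print(le_demo.inverse_transform([0, 1]).tolist())  # ['No', 'Yes']
```

scikit-learn documents `LabelEncoder` as intended for target labels; for feature columns, `OrdinalEncoder` or `OneHotEncoder` are the usual alternatives, although integer codes are often workable for tree-based models.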

### 3. Split the Data

Separate the features (`X`) from the target (`y`, the `Depressed Mode` column), then hold out 20% of the rows as a test set with `train_test_split` (`test_size=0.2`, `random_state=42`).
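The split in the notebook is not stratified, so a small test set can end up containing a single class. A sketch of `train_test_split`'s `stratify` parameter on a hypothetical imbalanced target (20 rows of class 0, 5 of class 1, with a placeholder feature):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 20 rows of class 0, 5 rows of class 1
y_demo = np.array([0] * 20 + [1] * 5)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature

# stratify=y_demo keeps the 4:1 class ratio in both train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(y_te), np.bincount(y_te).tolist())  # 5 [4, 1]
```

In the notebook this would mean passing `stratify=y`; scikit-learn raises a `ValueError` if any class has fewer than two members.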

### 4. Train the Model

Fit a `RandomForestClassifier` with 100 trees (`n_estimators=100`, `random_state=42`) on the training data.
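One common way to read "key indicators" off a fitted forest is its impurity-based `feature_importances_`. A sketch on synthetic data, where the `make_classification` dataset and the column names `f0` to `f3` are placeholders rather than the survey:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded survey features (2 informative, 2 noise columns)
X_syn, y_syn = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_syn = pd.DataFrame(X_syn, columns=["f0", "f1", "f2", "f3"])

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_syn, y_syn)

# One importance per column; the values are normalized to sum to 1
importances = pd.Series(rf.feature_importances_, index=X_syn.columns)
print(importances.sort_values(ascending=False))
```

For the notebook's model, the equivalent would be `pd.Series(rf_model.feature_importances_, index=X_train.columns)`. Impurity-based importances can favor features with many distinct values, so `sklearn.inspection.permutation_importance` is a common cross-check.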

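### 5. Evaluate the Model

Make predictions on the test set and evaluate them with `accuracy_score` and `classification_report`. The report lists only class `0`, with a support of 5: every test row belongs to one class, so the accuracy of 1.0 shows only that the model predicted that class for all five rows, not that it can tell the classes apart.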
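With only five test rows, a single split gives a noisy estimate. A sketch of stratified 5-fold cross-validation as an alternative, on synthetic data of roughly the survey's size (the data and `n_splits=5` are assumptions, not the notebook's setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical small dataset, about two dozen rows like the survey
X_syn, y_syn = make_classification(n_samples=25, n_features=6, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each fold serves once as the test set, and every fold contains both classes
scores = cross_val_score(rf, X_syn, y_syn, cv=cv, scoring="accuracy")
print(scores.round(2), f"mean={scores.mean():.2f} std={scores.std():.2f}")
```

Applied to the notebook, this would be `cross_val_score(rf_model, X, y, cv=cv)`; scikit-learn warns, or raises an error, when classes have fewer than `n_splits` members, so checking `y.value_counts()` first is worthwhile.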