# Depression Prediction Model

---

This is an ML model created by Keshav Ghai (an aspiring AI/ML dev).
It is a binary classifier that predicts whether a student is depressed based on behavioral and academic indicators. Unlike text-based models, this model works with **structured/tabular data**, combining numeric features (Age, CGPA, Sleep, etc.) and categorical features (Gender, Department). The training script **"trainer.py"** preprocesses the data, handles feature encoding and scaling, trains a dense neural network, and generates visualizations to assess model performance. The output layer uses a sigmoid activation, and training minimizes binary crossentropy loss.

## Imports:- 
---

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import json
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

## 1. Loading the Dataset (CSV in current directory)
---

> The dataset is loaded from **"dataset.csv"** using pandas. It contains student behavioral and academic data. The path in `DATA_PATH` is resolved relative to the directory the script is run from.

In [None]:
DATA_PATH = "./tensorflow/depression_predictor/dataset.csv"
df = pd.read_csv(DATA_PATH)

print("Dataset shape:", df.shape)
print("Columns:", df.columns.tolist())

## 2. Encode Label & Separate Features
---

> The target label (Depression: True/False) is encoded to binary values. Student_ID and Depression columns are removed from features.

In [None]:
# Label column (True/False)
label_encoder = LabelEncoder()
df["Depression"] = label_encoder.fit_transform(df["Depression"])

y = df["Depression"].values  # 0 or 1

# Drop Student_ID and label column from features
df_features = df.drop(columns=["Student_ID", "Depression"])
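
> The label encoding step can be illustrated in isolation. The sketch below uses a handful of made-up labels (not taken from the dataset) to show how `LabelEncoder` maps `False`/`True` to `0`/`1`, and how a quick class count can reveal imbalance before training.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the Depression column
labels = np.array([True, False, False, True, False])

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, so False maps to 0 and True maps to 1
mapping = {str(cls): int(idx) for idx, cls in enumerate(encoder.classes_)}

# Number of samples per encoded class (index 0 = not depressed, 1 = depressed)
class_counts = np.bincount(encoded)

print(mapping)
print(class_counts)
```

> If the counts are heavily skewed, accuracy alone may be a misleading metric for the trained model.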

## 3. Identify Feature Types (Numeric & Categorical)
---

> Features are classified into two types: categorical (Gender, Department) and numeric (Age, CGPA, Sleep Duration, etc.).

In [None]:
categorical_cols = ["Gender", "Department"]
numeric_cols = [
    "Age",
    "CGPA",
    "Sleep_Duration",
    "Study_Hours",
    "Social_Media_Hours",
    "Physical_Activity",
    "Stress_Level"
]

## 4. Encode Categorical Columns
---

> Categorical variables (Gender, Department) are converted to numeric labels using LabelEncoder and saved for later use.

In [None]:
cat_encoders = {}
for col in categorical_cols:
    encoder = LabelEncoder()
    df_features[col] = encoder.fit_transform(df_features[col])
    cat_encoders[col] = encoder

# Save categorical encoders
with open("./tensorflow/depression_predictor/categorical_encoders.pkl", "wb") as f:
    pickle.dump(cat_encoders, f)

print("Categorical encoders saved")
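
> One thing to keep in mind when reusing the saved encoders: `LabelEncoder.transform` raises an error for values it did not see during fitting. The sketch below uses invented department names and a hypothetical helper, `safe_transform`, to show one possible way to detect such values after the encoders are restored from pickle.

```python
import pickle
from sklearn.preprocessing import LabelEncoder

# Hypothetical department names; the real values come from dataset.csv
encoder = LabelEncoder().fit(["Arts", "Engineering", "Science"])

# Round-trip through pickle in memory, mirroring what the script does via a file
restored = pickle.loads(pickle.dumps({"Department": encoder}))["Department"]

def safe_transform(enc, value):
    """Return the integer code for value, or None if it was not seen during fit."""
    if value not in enc.classes_:
        return None
    return int(enc.transform([value])[0])

known = safe_transform(restored, "Science")
unknown = safe_transform(restored, "Medicine")

print(known, unknown)
```

> Returning `None` is only one option; rejecting the input with a clear message may suit an interactive tool better.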

## 5. Scale Numeric Columns
---

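> Numeric features are standardized to zero mean and unit variance using StandardScaler, which generally helps neural network training. (This rescales each column; it does not change the shape of its distribution.)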

In [None]:
scaler = StandardScaler()
df_features[numeric_cols] = scaler.fit_transform(df_features[numeric_cols])

# Save scaler
with open("./tensorflow/depression_predictor/scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

print("Scaler saved as scaler.pkl")
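
> As a minimal sketch of what `StandardScaler` computes, the example below compares its output with the formula `(x - mean) / std` applied column by column. The numbers are invented for illustration and are not drawn from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical CGPA and Sleep_Duration values for four students
values = np.array([[6.0, 5.0],
                   [7.0, 6.0],
                   [8.0, 7.0],
                   [9.0, 8.0]])

demo_scaler = StandardScaler()
scaled = demo_scaler.fit_transform(values)

# Same result by hand: subtract the column mean, divide by the population std (ddof=0)
manual = (values - values.mean(axis=0)) / values.std(axis=0)

# inverse_transform recovers the original units
restored = demo_scaler.inverse_transform(scaled)

print(np.allclose(scaled, manual))
```

> Note that the script fits the scaler on the full dataset before splitting. A common alternative is to fit on the training split only and call `transform` on the validation split, so that validation statistics do not influence preprocessing.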

## 6. Build Final Feature Matrix
---

> All processed features are combined into a single feature matrix for model input.

In [None]:
X = df_features.values

print("Feature matrix shape:", X.shape)

## 7. Train / Validation Split
---

> Data is split into 85% training and 15% validation using stratified splitting.

In [None]:
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, random_state=42,
    stratify=y  # keep the class ratio the same in both splits
)

print("Train size:", X_train.shape)
print("Validation size:", X_val.shape)

## 8. Defining the Model's Architecture
---

> A dense neural network with 3 hidden layers (128 units each, with dropout after the second and third) is created. The output layer uses a sigmoid activation for binary classification (Depressed / Not Depressed).

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],), dtype="float32"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid", dtype="float32")
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

model.summary()
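
> The parameter count reported by `model.summary()` can be checked by hand. The sketch below assumes the input has 9 features (the 2 encoded categorical columns plus the 7 numeric ones listed earlier); if the dataset has a different number of feature columns, the total changes accordingly. `dense_params` is a helper written for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units

# Assumption: 9 input features (2 categorical + 7 numeric)
n_features = 9
layer_units = [128, 128, 128, 1]  # three hidden layers and the output layer

total = 0
fan_in = n_features
for units in layer_units:
    total += dense_params(fan_in, units)
    fan_in = units  # this layer's output feeds the next layer

# Dropout layers add no trainable parameters
print(total)
```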

## 9. Training the Model (With validation)
---

> The model is trained for 12 epochs with a batch size of 32. Binary crossentropy is used as the loss function.

In [None]:
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=12,
    batch_size=32
)

model.save("./Models/depression_model.keras")
print("Model saved as depression_model.keras")
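
> For readers new to the loss used here, the following NumPy sketch works through sigmoid and binary crossentropy on three made-up examples. It is an illustration of the formulas, not the Keras implementation; the clipping constant `eps` is an assumption chosen for numerical safety.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

# Hypothetical raw outputs and true labels for three students
logits = np.array([2.0, -1.0, 0.0])
y_true = np.array([1.0, 0.0, 1.0])

probs = sigmoid(logits)
loss = binary_crossentropy(y_true, probs)

print(probs, loss)
```

> Confident correct predictions contribute little to the loss, while an uncertain prediction (probability 0.5) contributes `log(2)`.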

## 10. Graphs
---

> Multiple graphs are created to visualize model performance, loss, accuracy, and predictions. (Good for learning about ML)

### a. Loss Over Epochs:-

In [None]:
plt.figure(figsize=(6,4))
plt.plot(history.history["loss"], label="Train Loss")
plt.plot(history.history["val_loss"], label="Val Loss")
plt.title("Loss Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.savefig("./tensorflow/depression_predictor/loss_graph.png")
plt.close()

### b. Train vs. Validation Accuracy:-

In [None]:
plt.figure(figsize=(6,4))
plt.plot(history.history["accuracy"], label="Train Accuracy")
plt.plot(history.history["val_accuracy"], label="Val Accuracy")
plt.title("Accuracy Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.grid(True)
plt.savefig("./tensorflow/depression_predictor/accuracy_graph.png")
plt.close()

### c. Confusion Matrix:-

In [None]:
val_pred = model.predict(X_val)
val_pred = (val_pred > 0.5).astype(int).flatten()

cm = confusion_matrix(y_val, val_pred)

plt.figure(figsize=(6,5))
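sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["Not Depressed", "Depressed"],
            yticklabels=["Not Depressed", "Depressed"])
# confusion_matrix puts true labels on rows and predictions on columns
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")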
plt.savefig("./tensorflow/depression_predictor/confusion_matrix.png")
plt.close()

print("Saved: loss_graph.png, accuracy_graph.png, confusion_matrix.png")
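
> The confusion matrix also yields precision, recall, and F1, which can be more informative than accuracy when classes are imbalanced. The sketch below uses an invented 2x2 matrix in scikit-learn's layout (rows are true classes, columns are predicted classes); the numbers are not results from this model.

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual, columns = predicted
cm_demo = np.array([[50, 10],
                    [5, 35]])

# For a binary matrix, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()
precision = tp / (tp + fp)   # of predicted "Depressed", how many were correct
recall = tp / (tp + fn)      # of actual "Depressed", how many were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```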

## 11. Interactive Prediction Mode
---

> This interactive mode allows testing the model with custom student data. Input is encoded and scaled before prediction. (Remember to validate predictions thoroughly)

In [None]:
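def encode_input(age, gender, dept, cgpa, sleep, study, social, physical, stress):
    # Build a single-row DataFrame keyed by column name so values cannot be
    # misaligned, then reorder the columns to match the training features
    row = pd.DataFrame([{
        "Age": age, "Gender": gender, "Department": dept, "CGPA": cgpa,
        "Sleep_Duration": sleep, "Study_Hours": study,
        "Social_Media_Hours": social, "Physical_Activity": physical,
        "Stress_Level": stress
    }]).reindex(columns=df_features.columns)

    # Apply saved encoders (raises ValueError for unseen categories)
    for col in categorical_cols:
        row[col] = cat_encoders[col].transform(row[col])

    # Scale numeric columns with the scaler fitted during training
    row[numeric_cols] = scaler.transform(row[numeric_cols])

    return row.values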

In [None]:
print("\nInteractive testing mode:")
while True:
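    text = input("Enter 'predict' or 'quit': ").strip().lower()
    if text == "quit":
        break
    if text != "predict":
        # Ignore anything that is not one of the two commands
        print("Unrecognized command.")
        continue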

    print("Enter student details:")
    age = float(input("Age: "))
    gender = input("Gender: ")
    dept = input("Department: ")
    cgpa = float(input("CGPA: "))
    sleep = float(input("Sleep Duration: "))
    study = float(input("Study Hours: "))
    social = float(input("Social Media Hours: "))
    physical = float(input("Physical Activity: "))
    stress = float(input("Stress Level: "))

    X_input = encode_input(age, gender, dept, cgpa, sleep, study, social, physical, stress)
    pred = model.predict(X_input)[0][0]

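    result = "Depressed" if pred > 0.5 else "Not Depressed"
    # pred is P(Depressed); report the probability of the predicted class
    confidence = pred if pred > 0.5 else 1 - pred
    print("Prediction:", result)
    print(f"Confidence: {confidence*100:.2f}%")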
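
> The loop above calls `float(input(...))` directly, so a typo ends the session with a `ValueError`. One possible way to make the prompts more forgiving is sketched below; `read_float` is a hypothetical helper, and its `reader` parameter exists only so the function can be exercised without a keyboard.

```python
def read_float(prompt, reader=input):
    """Ask until the reply parses as a float; reader is injectable for testing."""
    while True:
        reply = reader(prompt).strip()
        try:
            return float(reply)
        except ValueError:
            print(f"'{reply}' is not a number, try again.")

# Simulated replies standing in for keyboard input: one bad, one good
replies = iter(["abc", " 7.5 "])
value = read_float("CGPA: ", reader=lambda _prompt: next(replies))

print(value)
```

> In the interactive loop, each `float(input("..."))` call could be swapped for `read_float("...")` with the default reader.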