# Students' mental health EDA

The goal of this notebook is to provide a comprehensive EDA and, ideally, to gain useful insights into students' mental health problems.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import tkinter as tk
from tkinter import filedialog, messagebox


In [None]:
df = pd.read_csv('/Users/riteshkumar/Downloads/ML projects/Students Mental Health/mentalhealth_dataset.csv')

In [None]:
print(df.shape)  # only the last expression in a cell is displayed, so print the shape explicitly
df.head()

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.isna().sum()

Wow. We have 1000 entries and no missing values. The data looks promising. It would be interesting to know which features correlate with anxiety and depression, and to check whether anxiety and depression have a negative effect on CGPA.

## `Timestamp`

In [None]:
g = sns.histplot(df, x='Timestamp', discrete=True)
g.set(title='Timestamp histogram')
g.tick_params(axis='x', labelrotation=45);


Hm. We observe strange peaks on certain dates. These are probably the days when the dataset's owner shared the survey somewhere.
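To look at those peaks more closely, the responses can be counted per calendar day. The snippet below is a minimal sketch on a small made-up sample. The `sample` frame and its timestamp format are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for df['Timestamp']; the real column format is an assumption.
sample = pd.DataFrame({
    "Timestamp": ["2020-07-08 12:02", "2020-07-08 12:04", "2020-07-08 13:10",
                  "2020-07-09 09:00", "2020-07-13 10:07"]
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(sample["Timestamp"], errors="coerce")

# Count responses per calendar day; value_counts sorts the busiest days first.
responses_per_day = parsed.dt.date.value_counts()
print(responses_per_day.head())
```

On the real data, replacing `sample` with `df` should list the dates behind the peaks.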

## `Gender`

In [None]:
sns.countplot(df, x='Gender').set(title='Gender ratio');

Wow. Significantly more women than men took part in the survey. That's interesting. Later we will check the same ratio for different courses.

## `Age`

In [None]:
sns.countplot(df, x='Age').set(title='Students of different ages in data');

Nice. The dataset is slightly unbalanced, but we have enough samples for ages from 18 to 25. Let's check whether the gender ratio varies significantly for any age group.

In [None]:
sns.countplot(df, x='Age', hue='Gender').set(title='Students of different ages and genders in data');

We observe no suspicious gender imbalance in any age group.
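A count plot makes it hard to compare groups of different sizes, so it can help to put numbers on the gender share per age. The sketch below uses `pd.crosstab` with `normalize="index"` on a made-up `sample` frame (an assumption standing in for `df`).

```python
import pandas as pd

# Hypothetical sample; the real data lives in df['Age'] and df['Gender'].
sample = pd.DataFrame({
    "Age": [18, 18, 18, 18, 19, 19, 19, 19],
    "Gender": ["Female", "Female", "Female", "Male",
               "Female", "Female", "Male", "Male"],
})

# normalize="index" makes each row (age) sum to 1, giving the gender share per age.
gender_share_by_age = pd.crosstab(sample["Age"], sample["Gender"], normalize="index")
print(gender_share_by_age)
```

Rows whose shares differ strongly from the overall gender ratio would be the ones worth a closer look.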

## `Course`

In [None]:
df.Course.value_counts()

Okay, we have lots of different courses here. Let's visualise only the most popular ones.

In [None]:
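plt.figure(figsize=(10,10))
course_counts = df.Course.value_counts()  # computed once instead of once per row
popular = df[df.Course.map(course_counts) > 16]  # courses with more than 16 respondents
sns.countplot(popular, x='Course').set(title='Most popular courses in data');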

The data seems reasonable enough. Let's check gender ratios for different courses.

In [None]:
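plt.figure(figsize=(10,10))
course_counts = df.Course.value_counts()  # computed once instead of once per row
popular = df[df.Course.map(course_counts) > 16]  # courses with more than 16 respondents
sns.countplot(popular, x='Course', hue='Gender').set(title='Gender ratio in the most popular courses');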

Aha. We have no data on male participants in psychology, KOE (I'm not sure what that is), laws and engine (?), but we have more male participants in engineering, BIT (Bachelor of Information Technology) and BCS (Bachelor of Computer Science).

## `YearOfStudy`

In [None]:
df.YearOfStudy.value_counts()

We need a little cleanup here. Let's make all data in this column lowercase.

In [None]:
df.YearOfStudy = df.YearOfStudy.str.lower()

In [None]:
df.YearOfStudy.value_counts()

In [None]:
sns.countplot(df.sort_values(by='YearOfStudy'), x='YearOfStudy').set(title='Ratio of different years of study in data', xlabel='Year of study');

Nice. Perhaps first-year students were more motivated to participate in the survey?

## `CGPA`

CGPA is Cumulative Grade Point Average, a significant indicator of academic performance (high CGPA = good performance).

In [None]:
sns.histplot(df, x='CGPA').set(title='CGPA histogram');

We observe two peaks at 4.0 (excellent performance) and 2.0 (bad performance). Otherwise data looks okay.

## `Depression`

In [None]:
sns.countplot(df, x='Depression').set(title='Depression ratio among survey participants');

That's unexpected. We observe a really high depression rate. Perhaps students with depression were more motivated to participate in the survey? Let's check whether the depression rate depends on gender or year of study.

In [None]:
sns.countplot(df, x='Depression', hue='Gender').set(title='Depression ratio among survey participants with different genders');

No significant difference in depression ratios among genders.
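"No significant difference" is judged by eye here. One way to back it up is a chi-square test of independence on the gender by depression contingency table. This is a sketch only, using `scipy.stats.chi2_contingency` on a made-up `sample` frame. SciPy is not imported elsewhere in this notebook, so both are assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical sample with the same depression rate (50%) for both genders.
sample = pd.DataFrame({
    "Gender": ["Female"] * 60 + ["Male"] * 20,
    "Depression": [1] * 30 + [0] * 30 + [1] * 10 + [0] * 10,
})

# Contingency table: rows are genders, columns are depression flags.
table = pd.crosstab(sample["Gender"], sample["Depression"])

# A large p-value means the data is compatible with "depression is independent of gender".
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
```

A p-value above the usual 0.05 threshold on the real data would support the reading of the plot, although it would not prove that no difference exists.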

In [None]:
sns.countplot(df.sort_values(by='YearOfStudy'), hue='Depression', x='YearOfStudy').set(title='Depression ratio among survey participants with different years of study');

Hm. Depression levels are high among participants in all years of study, but they are higher among years 3 and 4.
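Since the year groups have different sizes, comparing rates rather than raw counts is safer. A minimal sketch, assuming `Depression` is coded as 0/1 (as the `== 1` filters later in this notebook suggest) and using a made-up `sample` frame:

```python
import pandas as pd

# Hypothetical sample; group sizes differ on purpose.
sample = pd.DataFrame({
    "YearOfStudy": ["year 1"] * 4 + ["year 3"] * 2,
    "Depression": [1, 0, 0, 0, 1, 1],
})

# The mean of a 0/1 column is the share of ones, i.e. the depression rate per year.
rate_by_year = sample.groupby("YearOfStudy")["Depression"].mean()
print(rate_by_year)
```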

## `Anxiety`

In [None]:
sns.countplot(df, x='Anxiety').set(title='Anxiety ratio among survey participants');

In [None]:
sns.countplot(df.sort_values(by='YearOfStudy'), hue='Anxiety', x='YearOfStudy').set(title='Anxiety ratio among survey participants with different years of study');

Anxiety levels are also high among all students.

## `PanicAttack`

In [None]:
sns.countplot(df, x='PanicAttack').set(title='PanicAttack ratio among survey participants');

Let's check whether depression, anxiety and panic attacks are highly correlated.

In [None]:
df[['Depression', 'Anxiety', 'PanicAttack']].corr()

Wait, that's strange. No correlation at all?

In [None]:
sns.countplot(df, x='Depression', hue='Anxiety');

In [None]:
sns.countplot(df, x='Depression', hue='PanicAttack');

According to our data, participants with depression have the same rates of anxiety and panic attacks as participants without depression. That's strange, and we currently have no explanation for it.
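For two 0/1 columns, the Pearson correlation returned by `corr()` equals the phi coefficient of their 2x2 contingency table, so the correlation matrix and the count plots above describe the same thing. The sketch below checks this on a made-up `sample` frame (an assumption, not the survey data).

```python
import numpy as np
import pandas as pd

# Hypothetical binary sample.
sample = pd.DataFrame({
    "Depression": [1, 1, 1, 0, 0, 0, 1, 0],
    "Anxiety":    [1, 1, 1, 0, 0, 0, 0, 1],
})

# Pearson correlation, as computed by DataFrame.corr().
pearson = sample["Depression"].corr(sample["Anxiety"])

# Phi coefficient from the cell counts of the 2x2 table.
table = pd.crosstab(sample["Depression"], sample["Anxiety"])
n00, n01 = table.loc[0, 0], table.loc[0, 1]
n10, n11 = table.loc[1, 0], table.loc[1, 1]
phi = (n11 * n00 - n10 * n01) / np.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
)

print(f"pearson = {pearson:.3f}, phi = {phi:.3f}")
```

A phi close to zero means the anxiety rate is about the same with and without depression, which matches what the plots show.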

## `SpecialistTreatment`

In [None]:
sns.countplot(df, x='SpecialistTreatment').set(title='Specialist treatment ratio among survey participants');

In [None]:
df[(df['Depression'] == 1) & (df['SpecialistTreatment'] == 1)].shape[0], df[df['Depression'] == 1].shape[0]

That's sad. Only 30 (of 483) survey participants with depression receive specialist treatment. Let's also check `HasMentalHealthSupport`.

## `HasMentalHealthSupport`

In [None]:
sns.countplot(df, x='HasMentalHealthSupport').set(title='Mental health support ratio among survey participants');

In [None]:
df[(df['Depression'] == 1) & (df['HasMentalHealthSupport'] == 1)].shape[0], df[df['Depression'] == 1].shape[0]

Once again, the number of students with mental health support is very low.
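The raw counts are easier to read as shares: 30 of 483, for example, is roughly 6%. Both shares can be computed in one step, as in this sketch on a made-up `sample` frame. It assumes all three columns are coded as 0/1.

```python
import pandas as pd

# Hypothetical sample with 0/1 flags.
sample = pd.DataFrame({
    "Depression":             [1, 1, 1, 1, 0, 0],
    "SpecialistTreatment":    [1, 0, 0, 0, 0, 1],
    "HasMentalHealthSupport": [1, 1, 0, 0, 1, 0],
})

# Restrict to participants with depression, then take the mean of each flag.
depressed = sample[sample["Depression"] == 1]
shares = depressed[["SpecialistTreatment", "HasMentalHealthSupport"]].mean()
print(shares)
```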

## `SleepQuality` and `StudyStressLevel`

Let's check whether sleep quality or study stress level correlates with depression, anxiety or panic attacks.

In [None]:
df[['Depression', 'Anxiety', 'PanicAttack', 'SleepQuality', 'StudyStressLevel']].corr()

We observe no significant correlation between sleep quality or study stress level and depression, anxiety or panic attacks. That's counterintuitive, and we currently have no explanation for it.

In [None]:
def load_data():
    file_path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
    if not file_path:
        return
    global df
    df = pd.read_csv(file_path)
    messagebox.showinfo("Success", "Dataset Loaded Successfully")


In [None]:
# Select features and target, split into train/test sets, then scale the features
X = df[['Age', 'CGPA', 'StudyStressLevel', 'SleepQuality']]
y = df['Depression']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
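The split-then-scale steps above can also be bundled into a scikit-learn pipeline, so that the scaler is always fit on the training data only and is applied automatically at prediction time. This is a sketch under assumptions: the `sample` frame is synthetic, its value ranges are illustrative, and `stratify` is an optional extra that keeps the class ratio equal in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (value ranges are assumptions).
rng = np.random.default_rng(42)
n = 200
sample = pd.DataFrame({
    "Age": rng.integers(18, 26, n),
    "CGPA": rng.uniform(2.0, 4.0, n),
    "StudyStressLevel": rng.integers(1, 6, n),
    "SleepQuality": rng.integers(1, 6, n),
    "Depression": rng.integers(0, 2, n),
})
features = ["Age", "CGPA", "StudyStressLevel", "SleepQuality"]
X_demo, y_demo = sample[features], sample["Depression"]

# stratify keeps the share of each class the same in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on X_tr only and reuses it inside predict/score.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipe.fit(X_tr, y_tr)
test_accuracy = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.4f}")
```

With a pipeline, raw inputs can be passed straight to `pipe.predict` without a separate `scaler.transform` call.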

In [None]:
# Initialize models
models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVC": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}


In [None]:
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)

# Report the test-set accuracy of each model
result_text = "".join(f"{name}: Accuracy = {accuracy:.4f}\n" for name, accuracy in results.items())
print(result_text)
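Accuracy is hard to interpret without a reference point. Roughly half of the participants report depression (483 of 1000 above), so always predicting the majority class already yields about 52% accuracy. The sketch below compares a model against such a baseline with 5-fold cross-validation. The data is synthetic noise, and `DummyClassifier` and `cross_val_score` are additions not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic features and labels with no real signal (assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = rng.integers(0, 2, 200)

# Baseline that always predicts the most frequent class seen during training.
baseline = DummyClassifier(strategy="most_frequent")
forest = RandomForestClassifier(n_estimators=100, random_state=42)

# Five accuracy scores per model, one for each held-out fold.
baseline_scores = cross_val_score(baseline, X_demo, y_demo, cv=5, scoring="accuracy")
forest_scores = cross_val_score(forest, X_demo, y_demo, cv=5, scoring="accuracy")

print(f"Baseline:     {baseline_scores.mean():.4f}")
print(f"RandomForest: {forest_scores.mean():.4f}")
```

If a model does not clearly beat the baseline, its predictions carry little information. That outcome would be consistent with the near-zero correlations found earlier.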


In [None]:
def predict_depression():
    try:
        age = int(age_entry.get())
        cgpa = float(cgpa_entry.get())
        study_stress = int(study_stress_entry.get())
        sleep_quality = int(sleep_quality_entry.get())
        model_name = model_var.get()
        
        # Use the training column names so the scaler sees matching feature names
        input_data = pd.DataFrame(
            [[age, cgpa, study_stress, sleep_quality]],
            columns=['Age', 'CGPA', 'StudyStressLevel', 'SleepQuality'],
        )
        input_scaled = scaler.transform(input_data)
        
        model = models.get(model_name)
        if model is None:
            messagebox.showerror("Error", "Selected model not found!")
            return
        
        prediction = model.predict(input_scaled)
        if prediction[0] == 1:
            result = "The model predicts that the student might be suffering from depression."
        else:
            result = "The model does not predict depression for this student."
        messagebox.showinfo("Prediction Result", result)
    except ValueError:
        messagebox.showerror("Input Error", "Please enter valid numerical values.")
    except Exception as e:
        messagebox.showerror("Error", f"Unexpected error: {e}")

In [None]:
# GUI Setup
def create_gui():
    global age_entry, cgpa_entry, study_stress_entry, sleep_quality_entry, model_var
    root = tk.Tk()
    root.title("Students' Mental Health Analysis")
    root.geometry("400x400")
    
    tk.Label(root, text="Age:").pack()
    age_entry = tk.Entry(root)
    age_entry.pack()
    
    tk.Label(root, text="CGPA:").pack()
    cgpa_entry = tk.Entry(root)
    cgpa_entry.pack()
    
    tk.Label(root, text="Study Stress Level (1-10):").pack()
    study_stress_entry = tk.Entry(root)
    study_stress_entry.pack()
    
    tk.Label(root, text="Sleep Quality (1-10):").pack()
    sleep_quality_entry = tk.Entry(root)
    sleep_quality_entry.pack()
    
    tk.Label(root, text="Select Model:").pack()
    model_var = tk.StringVar(root)
    model_var.set("RandomForest")
    tk.OptionMenu(root, model_var, *models.keys()).pack()
    
    tk.Button(root, text="Predict Depression", command=predict_depression).pack(pady=5)
    tk.Button(root, text="Exit", command=root.destroy).pack(pady=5)  # destroy also closes the window
    
    root.mainloop()

In [None]:

if __name__ == "__main__":
    create_gui()
