<div style="text-align: center;">
  <img src="Images/Depressed_Student.png" alt="Depressed Student Illustration" width="600"/>
</div>

## Hello! 
This is a project to predict **Student Depression** using the [Student Depression Dataset](https://www.kaggle.com/datasets/hopesb/student-depression-dataset/data) from Kaggle. Understanding and predicting depression among students is an essential task, as mental health plays a critical role in their academic performance and overall well-being.  

Student depression datasets are typically used to analyze and predict depression levels among students. This project can contribute to identifying factors influencing student mental health and designing early intervention strategies.

The dataset provides comprehensive information about students and their mental health status, with 18 columns and 27,901 rows, structured in a CSV format. Below is a brief explanation of the columns:  

1. **id**: A unique identifier for each student.  
2. **Gender**: The gender of the student.  
3. **Age**: The age of the student.  
4. **City**: The city where the student resides.  
5. **Profession**: The student's occupation, such as student, part-time worker, etc.  
6. **Academic Pressure**: The level of academic stress experienced by the student.  
7. **Work Pressure**: The work-related stress experienced by the student.  
8. **CGPA**: The student’s cumulative grade point average.  
9. **Study Satisfaction**: The student’s level of satisfaction with their studies.  
10. **Job Satisfaction**: The student’s level of satisfaction with their job or part-time work.  
11. **Sleep Duration**: The student's typical daily sleep, recorded as a range (e.g., "5-6 hours" or "Less than 5 hours").  
12. **Dietary Habits**: The dietary pattern of the student (e.g., healthy or unhealthy).  
13. **Degree**: The current level of education the student is pursuing.  
14. **Have you ever had suicidal thoughts?**: A binary column (Yes/No) indicating if the student has had suicidal thoughts.  
15. **Work/Study Hours**: The number of hours spent working or studying per day.  
16. **Financial Stress**: The financial burden or stress experienced by the student.  
17. **Family History of Mental Illness**: A binary column (Yes/No) indicating if the student has a family history of mental health issues.  
18. **Depression**: The target variable, indicating whether the student is experiencing depression (Yes/No).  

> ⚠️ *Disclaimer*: This dataset, given its sensitive nature, must be used responsibly, ensuring ethical considerations like privacy, informed consent, and data anonymization. This project aims to leverage this dataset to build a model capable of predicting depression status in students and identifying significant contributing factors.

# **Step 3: Exploratory Data Analysis (EDA) – SQL Queries**

This notebook focuses on performing **exploratory data analysis using SQL queries** within a Jupyter environment. By using SQL-style queries with the help of `sqlite3`, we can analyze and extract insights from the student depression dataset in a more intuitive and readable way for those familiar with SQL.

---

### Objectives of This Notebook

1. [Import Libraries and Load the Dataset](#import)  
2. [Preview the Dataset](#preview)  
3. [Run SQL Queries to Answer Key Questions](#sql_queries)  
4. [Summary and Key Insights](#summary)


---

### Key Questions Explored via SQL

1. What is the distribution of students based on their depression status?
2. How does the average CGPA differ between depressed and non-depressed students?
3. What is the relationship between sleep duration and depression?
4. Which region has the highest percentage of depressed students?
5. Do students with suicidal thoughts show a higher rate of depression?
6. What percentage of students with a family mental illness history are depressed?
7. Does dietary habit correlate with depression levels?
8. What is the average number of work/study hours for depressed vs. non-depressed students?
9. Among different education levels (degree), which group has the highest depression rate?
10. What is the combined effect of financial stress and depression, ordered by highest counts?

---

### Next Steps

- Step 4: [Excel Dashboard](./04_excel_dashboard.xlsx)  
- Step 5: [Modeling & Prediction](./05_modeling_prediction.ipynb)

<a id="import"></a>

## **3.1 Import Libraries, Load the Dataset, and Connect to the Database**

We begin by importing the required Python libraries:

- **Pandas** for data manipulation and analysis.
- **SQLite3** for setting up and querying a local SQL database.
- **IPython SQL extension** for running SQL queries directly within the notebook.

After loading the **Student Depression** dataset into a pandas DataFrame, we connect it to a local **SQLite** database. This setup enables us to perform SQL-based exploration conveniently within the Jupyter Notebook environment.

In [1]:
# Pandas is a software library written for the Python programming language for data manipulation and analysis.
import pandas as pd

# sqlite3 is a built-in Python library for creating and interacting with SQLite databases
import sqlite3

# prettytable formats tabular data; ipython-sql uses it to render query results
import prettytable

In [2]:
# Load the ipython-sql extension to run SQL queries in Jupyter
%load_ext sql

# Define prettytable.DEFAULT so ipython-sql can look up its default table style
# (a workaround needed with some prettytable versions)
prettytable.DEFAULT = 'DEFAULT'

In [3]:
# Create SQLite connection and cursor
con = sqlite3.connect("student_depression.db")
cur = con.cursor()

# Connect to the SQLite database for ipython-sql
%sql sqlite:///student_depression.db

In [4]:
# Load your cleaned dataset
df = pd.read_csv("student_depression_cleaned.csv")

# Save the DataFrame to a new table in the SQLite database
df.to_sql("DepressionSurvey", con, if_exists='replace', index=False)

27814

In [5]:
# Drop the cleaned query table if it already exists (to avoid duplication)
%sql DROP TABLE IF EXISTS Depression_Clean;

 * sqlite:///student_depression.db
Done.


[]

In [6]:
# Copy every row of DepressionSurvey into a working table (add a WHERE clause here to filter rows if needed)
%sql CREATE TABLE Depression_Clean AS SELECT * FROM DepressionSurvey;

 * sqlite:///student_depression.db
Done.


[]
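
The `CREATE TABLE ... AS SELECT` statement above copies every row unchanged. If rows with missing values had to be excluded at this stage, a `WHERE` filter could be added. The sketch below illustrates the idea with Python's built-in `sqlite3` module on a tiny in-memory table; the rows and the choice of columns to check (`cgpa`, `financial_stress`) are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# In-memory database with a few hypothetical rows (None is stored as SQL NULL)
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE DepressionSurvey "
    "(id INTEGER, cgpa REAL, financial_stress REAL, depression INTEGER)"
)
con.executemany(
    "INSERT INTO DepressionSurvey VALUES (?, ?, ?, ?)",
    [(1, 8.9, 1.0, 1), (2, None, 2.0, 0), (3, 7.0, None, 0), (4, 5.6, 5.0, 1)],
)

# Keep only rows where the checked columns are not NULL
con.execute("DROP TABLE IF EXISTS Depression_Clean")
con.execute(
    """
    CREATE TABLE Depression_Clean AS
    SELECT * FROM DepressionSurvey
    WHERE cgpa IS NOT NULL AND financial_stress IS NOT NULL
    """
)

kept_ids = [row[0] for row in con.execute("SELECT id FROM Depression_Clean ORDER BY id")]
print(kept_ids)
con.close()
```

In this toy example only the rows with ids 1 and 4 survive the filter. In the notebook itself, the same `WHERE` clause could be appended to the `%sql CREATE TABLE` line.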

---

<a id="preview"></a>

## **3.2 Preview the Dataset**

To understand the structure and content of the dataset, we start by viewing the first few rows using an SQL query:

In [7]:
%%sql
SELECT * FROM Depression_Clean LIMIT 10;

 * sqlite:///student_depression.db
Done.


id,gender,age,region,profession,academic_pressure,work_pressure,cgpa,study_satisfaction,job_satisfaction,sleep_duration,dietary_habits,degree,suicidal_thoughts,work/study_hours,financial_stress,family_mental_history,depression
2,Male,33.0,South,Student,5.0,0.0,8.97,2.0,0.0,5-6 hours,Healthy,Undergraduate,Yes,3.0,1.0,No,1
8,Female,24.0,South,Student,2.0,0.0,5.9,5.0,0.0,5-6 hours,Moderate,Undergraduate,No,3.0,2.0,Yes,0
26,Male,31.0,North,Student,3.0,0.0,7.03,5.0,0.0,Less than 5 hours,Healthy,Undergraduate,No,9.0,1.0,Yes,0
30,Female,28.0,North,Student,3.0,0.0,5.59,2.0,0.0,7-8 hours,Moderate,Undergraduate,Yes,4.0,5.0,Yes,1
32,Female,25.0,North,Student,4.0,0.0,8.13,3.0,0.0,5-6 hours,Moderate,Postgraduate,Yes,1.0,1.0,No,0
33,Male,29.0,West,Student,2.0,0.0,5.7,3.0,0.0,Less than 5 hours,Healthy,Doctoral,No,4.0,1.0,No,0
52,Male,30.0,West,Student,3.0,0.0,9.54,4.0,0.0,7-8 hours,Healthy,Undergraduate,No,1.0,2.0,No,0
56,Female,30.0,South,Student,2.0,0.0,8.04,4.0,0.0,Less than 5 hours,Unhealthy,Class 12,No,0.0,1.0,Yes,0
59,Male,28.0,West,Student,3.0,0.0,9.79,1.0,0.0,7-8 hours,Moderate,Undergraduate,Yes,12.0,3.0,No,1
62,Male,31.0,West,Student,2.0,0.0,8.38,3.0,0.0,Less than 5 hours,Moderate,Undergraduate,Yes,2.0,5.0,No,1


---

<a id="sql_queries"></a>

## **3.3 Run SQL Queries to Answer Key Questions**

Here we run a series of SQL queries to explore and gain insights from the dataset. Each query focuses on a specific question relevant to understanding factors related to depression.

### 3.3.1 Count Total Records in the Dataset

In [8]:
%%sql
SELECT COUNT(*) AS total_records FROM Depression_Clean;

 * sqlite:///student_depression.db
Done.


total_records
27814


### 3.3.2 Count Records by Depression Status

In [9]:
%%sql
SELECT depression, COUNT(*) AS count
FROM Depression_Clean
GROUP BY depression;

 * sqlite:///student_depression.db
Done.


depression,count
0,11530
1,16284
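
The counts above can also be expressed as shares of the total. Here is a minimal sketch, assuming only Python's built-in `sqlite3` module: it rebuilds a one-column table in memory from the two counts reported above (so it does not need the original CSV) and computes each group's percentage with a scalar subquery.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Depression_Clean (depression INTEGER)")

# Rebuild one row per student from the counts reported in the output above
counts = {0: 11530, 1: 16284}
for status, n in counts.items():
    con.executemany("INSERT INTO Depression_Clean VALUES (?)", [(status,)] * n)

# Share of each depression status as a percentage of all records
shares = con.execute(
    """
    SELECT depression,
           COUNT(*) AS count,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM Depression_Clean), 1) AS pct
    FROM Depression_Clean
    GROUP BY depression
    ORDER BY depression
    """
).fetchall()
print(shares)
con.close()
```

With these counts, the query gives about 41.5% for `depression = 0` and 58.5% for `depression = 1`.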


### 3.3.3 Average Age by Depression Status

In [10]:
%%sql
SELECT depression, AVG(age) AS average_age
FROM Depression_Clean
GROUP BY depression;

 * sqlite:///student_depression.db
Done.


depression,average_age
0,27.14362532523851
1,24.88319823139278


### 3.3.4 Gender Distribution by Depression Status

In [11]:
%%sql
SELECT depression, gender, COUNT(*) AS count
FROM Depression_Clean
GROUP BY depression, gender;

 * sqlite:///student_depression.db
Done.


depression,gender,count
0,Female,5116
0,Male,6414
1,Female,7200
1,Male,9084


### 3.3.5 Average CGPA by Depression Status

In [12]:
%%sql
SELECT depression, AVG(cgpa) AS average_cgpa
FROM Depression_Clean
GROUP BY depression;

 * sqlite:///student_depression.db
Done.


depression,average_cgpa
0,7.618728534258456
1,7.68247743183493


### 3.3.6 Count by Region and Depression

In [13]:
%%sql
SELECT region, depression, COUNT(*) AS count
FROM Depression_Clean
GROUP BY region, depression;

 * sqlite:///student_depression.db
Done.


region,depression,count
East,0,784
East,1,1279
North,0,4321
North,1,5519
South,0,1515
South,1,2441
West,0,4910
West,1,7045
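
Raw counts favor the largest regions, so the question of which region has the highest *percentage* of depressed students needs a rate. One way to sketch this, assuming `depression` is coded 0/1 as in the preview, is to average that column per region. The example rebuilds the table in memory from the counts printed above, using only the built-in `sqlite3` module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Depression_Clean (region TEXT, depression INTEGER)")

# (region, depression, count) triples copied from the output above
region_counts = [
    ("East", 0, 784), ("East", 1, 1279),
    ("North", 0, 4321), ("North", 1, 5519),
    ("South", 0, 1515), ("South", 1, 2441),
    ("West", 0, 4910), ("West", 1, 7045),
]
for region, status, n in region_counts:
    con.executemany("INSERT INTO Depression_Clean VALUES (?, ?)", [(region, status)] * n)

# depression is coded 0/1, so its mean within a group is the depression rate
rates = con.execute(
    """
    SELECT region,
           COUNT(*) AS students,
           ROUND(100.0 * AVG(depression), 1) AS depressed_pct
    FROM Depression_Clean
    GROUP BY region
    ORDER BY depressed_pct DESC
    """
).fetchall()
for row in rates:
    print(row)
con.close()
```

On these counts the ordering is East (about 62.0%), South (61.7%), West (58.9%), and North (56.1%).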


### 3.3.7 Average Sleep Duration by Depression Status

In [14]:
%%sql
SELECT depression, AVG(sleep_duration) AS average_sleep_duration
FROM Depression_Clean
GROUP BY depression;

 * sqlite:///student_depression.db
Done.


depression,average_sleep_duration
0,2.956287944492628
1,2.9496438221567183
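
Note that `sleep_duration` holds text categories such as "5-6 hours" and "Less than 5 hours" (see the preview in 3.2), so `AVG()` over this column does not yield an average number of hours. A per-category breakdown is one alternative. The sketch below shows the shape of such a query with the built-in `sqlite3` module; the rows are hypothetical and only illustrate the technique, not the dataset's actual distribution.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Depression_Clean (sleep_duration TEXT, depression INTEGER)")

# Hypothetical rows for illustration only; not taken from the dataset
toy_rows = [
    ("Less than 5 hours", 1), ("Less than 5 hours", 1), ("Less than 5 hours", 0),
    ("5-6 hours", 1), ("5-6 hours", 0),
    ("7-8 hours", 0), ("7-8 hours", 0), ("7-8 hours", 1), ("7-8 hours", 0),
]
con.executemany("INSERT INTO Depression_Clean VALUES (?, ?)", toy_rows)

# Treat sleep_duration as a category: count students and depression rate per group
by_sleep = con.execute(
    """
    SELECT sleep_duration,
           COUNT(*) AS students,
           SUM(depression) AS depressed,
           ROUND(100.0 * AVG(depression), 1) AS depressed_pct
    FROM Depression_Clean
    GROUP BY sleep_duration
    ORDER BY depressed_pct DESC
    """
).fetchall()
for row in by_sleep:
    print(row)
con.close()
```

Run against the real `Depression_Clean` table, the same `SELECT` would show whether depression rates differ across sleep categories.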


### 3.3.8 Count by Dietary Habits and Depression

In [15]:
%%sql
SELECT dietary_habits, depression, COUNT(*) AS count
FROM Depression_Clean
GROUP BY dietary_habits, depression;

 * sqlite:///student_depression.db
Done.


dietary_habits,depression,count
Healthy,0,4171
Healthy,1,3463
Moderate,0,4353
Moderate,1,5547
Unhealthy,0,3006
Unhealthy,1,7274
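
The long-format output above can be easier to compare when each dietary habit occupies one row. A possible sketch uses conditional aggregation (`SUM(CASE ...)`) to place the two depression counts side by side and adds the depression rate. It rebuilds the table in memory from the counts above with the built-in `sqlite3` module; the output column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Depression_Clean (dietary_habits TEXT, depression INTEGER)")

# (dietary_habits, depression, count) triples copied from the output above
diet_counts = [
    ("Healthy", 0, 4171), ("Healthy", 1, 3463),
    ("Moderate", 0, 4353), ("Moderate", 1, 5547),
    ("Unhealthy", 0, 3006), ("Unhealthy", 1, 7274),
]
for habit, status, n in diet_counts:
    con.executemany("INSERT INTO Depression_Clean VALUES (?, ?)", [(habit, status)] * n)

# One row per habit: counts for each status plus the share that is depressed
pivot = con.execute(
    """
    SELECT dietary_habits,
           SUM(CASE WHEN depression = 0 THEN 1 ELSE 0 END) AS not_depressed,
           SUM(CASE WHEN depression = 1 THEN 1 ELSE 0 END) AS depressed,
           ROUND(100.0 * SUM(depression) / COUNT(*), 1) AS depressed_pct
    FROM Depression_Clean
    GROUP BY dietary_habits
    ORDER BY dietary_habits
    """
).fetchall()
for row in pivot:
    print(row)
con.close()
```

On these counts the depression rate is about 45.4% for healthy, 56.0% for moderate, and 70.8% for unhealthy diets.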


### 3.3.9 Count by Suicidal Thoughts and Depression

In [16]:
%%sql
SELECT suicidal_thoughts, depression, COUNT(*) AS count
FROM Depression_Clean
GROUP BY suicidal_thoughts, depression;

 * sqlite:///student_depression.db
Done.


suicidal_thoughts,depression,count
No,0,7846
No,1,2367
Yes,0,3684
Yes,1,13917


### 3.3.10 Count by Financial Stress and Depression (Ordered by Count)

In [17]:
%%sql
SELECT financial_stress, depression, COUNT(*) AS count
FROM Depression_Clean
GROUP BY financial_stress, depression
ORDER BY count DESC;

 * sqlite:///student_depression.db
Done.


financial_stress,depression,count
5.0,1,5438
4.0,1,3979
1.0,0,3475
3.0,1,3072
2.0,0,2879
2.0,1,2170
3.0,0,2142
4.0,0,1780
1.0,1,1625
5.0,0,1254
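
Three of the key questions listed at the top (family history of mental illness, work/study hours, and degree) follow the same grouping pattern. The sketch below shows one way they could be queried with the built-in `sqlite3` module. The rows are hypothetical, `rate_by` is an illustrative helper rather than part of the notebook, and the column names follow the preview in 3.2. Because `work/study_hours` contains a slash, it has to be wrapped in double quotes in SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Depression_Clean ("
    'degree TEXT, family_mental_history TEXT, "work/study_hours" REAL, depression INTEGER)'
)

# Hypothetical rows for illustration only; not taken from the dataset
toy_rows = [
    ("Undergraduate", "Yes", 9.0, 1),
    ("Undergraduate", "No", 3.0, 0),
    ("Postgraduate", "Yes", 4.0, 1),
    ("Postgraduate", "No", 2.0, 0),
    ("Class 12", "Yes", 10.0, 1),
    ("Class 12", "No", 8.0, 1),
]
con.executemany("INSERT INTO Depression_Clean VALUES (?, ?, ?, ?)", toy_rows)


def rate_by(column):
    """Return (group, students, depressed_pct) rows for a trusted column name."""
    quoted = '"' + column.replace('"', '""') + '"'  # quote the identifier
    sql = f"""
        SELECT {quoted}, COUNT(*), ROUND(100.0 * AVG(depression), 1)
        FROM Depression_Clean
        GROUP BY {quoted}
        ORDER BY {quoted}
    """
    return con.execute(sql).fetchall()


family_rates = rate_by("family_mental_history")
degree_rates = rate_by("degree")

# Average work/study hours for non-depressed (0) and depressed (1) students
avg_hours = con.execute(
    'SELECT depression, AVG("work/study_hours") '
    "FROM Depression_Clean GROUP BY depression ORDER BY depression"
).fetchall()

print(family_rates)
print(degree_rates)
print(avg_hours)
con.close()
```

The helper interpolates the column name into the SQL text, so it should only be called with column names written in the code, never with user input.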


In [18]:
# Commit and close the connection
con.commit()
con.close()

<a id="summary"></a>

## **3.4 Summary and Key Insights**

After running the SQL queries, we summarize the key findings and insights to guide the next phases of the project.

#### **Summary**

In this section, we analyzed the cleaned **Student Depression Survey** dataset using SQL queries executed within a SQLite database. Our goal was to extract meaningful patterns and summarize key insights that help understand the factors associated with student depression.

#### **Key Insights**

1. **Total Records**
   The dataset contains **27,814 records**, representing individual student responses.

2. **Depression Distribution**

   * **58.5%** (16,284 students) reported experiencing depression.
   * **41.5%** (11,530 students) reported no depression.

     → Depression is a significant issue among the surveyed students.

3. **Age and Depression**

   * The **average age** of depressed students is **\~24.9 years**, while students without depression average **\~27.1 years**.

     → Younger students tend to report more depression symptoms.

4. **Gender and Depression**

   * Among those without depression: 44.4% female, 55.6% male.
   * Among depressed students: 44.2% female, 55.8% male.

     → Depression rates are nearly identical across genders: about 58.6% of male and 58.5% of female respondents reported depression.

5. **CGPA and Depression**

   * Average CGPA for depressed students: **7.68**
   * Average CGPA for non-depressed students: **7.62**

     → Academic performance does not show a strong correlation with depression.

6. **Regional Distribution**

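   * The **West** and **North** regions have the most students and therefore the most reported cases of depression in absolute terms.
   * By rate, **East** (62.0%) and **South** (61.7%) are highest, followed by West (58.9%) and North (56.1%); every region has more depressed than non-depressed students.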

7. **Sleep Duration**

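   * `sleep_duration` is stored as text categories (e.g., "5-6 hours", "Less than 5 hours"), so the `AVG()` values of about **2.95** for both groups result from SQLite coercing text to numbers and are not average hours of sleep.

     → No conclusion about sleep and depression can be drawn from this query; a per-category breakdown is needed.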

8. **Dietary Habits**

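   * Students with **unhealthy diets** have the highest depression rate: 7,274 of 10,280 (70.8%).
   * Students with **healthy diets** are the only group in which non-depressed students are the majority (45.4% depressed); moderate diets fall in between (56.0%).

     → Poorer reported diet is associated with higher depression rates.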

9. **Suicidal Thoughts**

   * A strong link: 13,917 of the 17,601 students (79.1%) who reported suicidal thoughts also reported depression.
   * Among the 10,213 students without suicidal thoughts, 2,367 (23.2%) reported depression.

     → A red flag for mental health interventions.

10. **Financial Stress**

    * At the highest financial stress level (5), 5,438 of 6,692 students (81.3%) reported depression.
    * At the lowest level (1), the rate falls to 31.9% (1,625 of 5,100).

      → Depression rates rise steadily with financial stress, from level 1 through level 5.

#### **Conclusion**

The SQL-based EDA has highlighted several behavioral, demographic, and lifestyle factors that correlate with student depression. These insights can inform targeted mental health support and policy decisions in educational institutions.