# University Students’ Mental Health

## 🧠 Background
Mental health has become an increasingly important topic among university students, especially in the context of rising academic pressure, social isolation, and the impact of external stressors such as the COVID-19 pandemic. University students often experience stress, anxiety, and depression due to factors such as heavy coursework, financial issues, and personal challenges.

In Malaysia, concerns about student mental health are growing. Various studies and surveys have shown that a significant proportion of university students suffer from symptoms of psychological distress, but only a few seek professional help.

## 🎯 Objectives
This data analysis project aims to explore the mental health status of university students using survey-based datasets. The main objectives are:

1. To understand the overall mental health condition of university students.
2. To identify key factors that influence students' mental well-being.
3. To visualize mental health trends by demographics (e.g., gender, age, academic year).


## 📂 Dataset
The dataset used in this project contains anonymized responses from university students about their mental health, academic pressure, personal habits, and emotional well-being. It was obtained from Kaggle and includes the following 16 columns:

1. Timestamp
2. Gender
3. Age
4. Course
5. YearOfStudy
6. CGPA
7. Depression
8. Anxiety
9. PanicAttack
10. SpecialistTreatment
11. SymptomFrequency_Last7Days
12. HasMentalHealthSupport
13. SleepQuality
14. StudyStressLevel
15. StudyHoursPerWeek
16. AcademicEngagement

## 💻 Tools and Libraries
This project is implemented using Python in a Jupyter Notebook, utilizing common data science libraries:

1. pandas for data loading and manipulation
2. numpy for numerical operations
3. matplotlib and seaborn for visualization
4. scikit-learn (if modeling is included)

### Import Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Load the dataset

In [2]:
df = pd.read_csv('C:/Users/Yong1/Documents/mentalhealth_dataset.csv')
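Because the path above is specific to one machine, it can help to fail early when a file with the wrong layout is loaded. The following is a minimal sketch, not part of the original notebook: `load_survey` and `EXPECTED_COLUMNS` are hypothetical names, and the demo reads an in-memory CSV built from the first row of the `df.head()` output rather than the real file.

```python
import io

import pandas as pd

# Column names as listed in this notebook's Dataset section.
EXPECTED_COLUMNS = [
    "Timestamp", "Gender", "Age", "Course", "YearOfStudy", "CGPA",
    "Depression", "Anxiety", "PanicAttack", "SpecialistTreatment",
    "SymptomFrequency_Last7Days", "HasMentalHealthSupport", "SleepQuality",
    "StudyStressLevel", "StudyHoursPerWeek", "AcademicEngagement",
]


def load_survey(source):
    """Hypothetical helper: read the CSV and raise if any expected column is absent."""
    frame = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return frame


# Demo on an in-memory CSV; a real call would pass the path to the CSV file.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "13/7/2020,Female,24,Biotechnology,Year 3,2.38,1,0,0,0,5,0,4,5,8,2\n"
)
sample = load_survey(sample_csv)
print(sample.shape)
```

Passing a path relative to the notebook instead of an absolute `C:/Users/...` path would also make the notebook easier to run on another machine.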

### Display first five rows 

In [None]:
print("The first few rows of the dataset:")
df.head()

The first few rows of the dataset:


| | Timestamp | Gender | Age | Course | YearOfStudy | CGPA | Depression | Anxiety | PanicAttack | SpecialistTreatment | SymptomFrequency_Last7Days | HasMentalHealthSupport | SleepQuality | StudyStressLevel | StudyHoursPerWeek | AcademicEngagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13/7/2020 | Female | 24 | Biotechnology | Year 3 | 2.38 | 1 | 0 | 0 | 0 | 5 | 0 | 4 | 5 | 8 | 2 |
| 1 | 13/7/2020 | Female | 18 | Biotechnology | Year 3 | 4.0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 4 | 13 | 5 |
| 2 | 13/7/2020 | Female | 25 | Biotechnology | Year 3 | 3.68 | 0 | 0 | 1 | 0 | 3 | 0 | 1 | 2 | 13 | 1 |
| 3 | 13/7/2020 | Female | 18 | Engineering | year 4 | 4.0 | 0 | 0 | 0 | 0 | 3 | 0 | 5 | 1 | 19 | 2 |
| 4 | 13/7/2020 | Female | 20 | Engineering | year 4 | 2.0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 4 | 3 | 2 |
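The first five rows show `YearOfStudy` written both as `Year 3` and as `year 4`, so grouping by this column would treat differently cased labels as separate categories. A minimal sketch of one way to normalize the labels follows; the first two values come from the output above, while the other two are hypothetical variants added for illustration.

```python
import pandas as pd

# "Year 3" and "year 4" appear in the head() output; the last two are hypothetical.
years = pd.Series(["Year 3", "year 4", " Year 1", "YEAR 2"], name="YearOfStudy")

# Trim whitespace, then capitalize only the first letter of each label.
normalized = years.str.strip().str.capitalize()

# Optionally pull out the numeric year for ordering or plotting.
year_number = normalized.str.extract(r"(\d+)")[0].astype(int)

print(normalized.tolist())   # ['Year 3', 'Year 4', 'Year 1', 'Year 2']
print(year_number.tolist())  # [3, 4, 1, 2]
```

Running `df["YearOfStudy"].unique()` before and after such a step is a quick way to confirm which spellings actually occur in the full dataset.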


### Display dataset info

In [11]:
print("The information about the dataset:")
df.info()

The information about the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Timestamp                   1000 non-null   object 
 1   Gender                      1000 non-null   object 
 2   Age                         1000 non-null   int64  
 3   Course                      1000 non-null   object 
 4   YearOfStudy                 1000 non-null   object 
 5   CGPA                        1000 non-null   float64
 6   Depression                  1000 non-null   int64  
 7   Anxiety                     1000 non-null   int64  
 8   PanicAttack                 1000 non-null   int64  
 9   SpecialistTreatment         1000 non-null   int64  
 10  SymptomFrequency_Last7Days  1000 non-null   int64  
 11  HasMentalHealthSupport      1000 non-null   int64  
 12  SleepQuality                1000 non-null   int64  
 13 
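The `df.info()` output reports `Timestamp` and `Gender` with the generic `object` dtype. If date-based or categorical operations are needed later, the columns could be converted as sketched below. This assumes every timestamp follows the day/month/year pattern seen in the sample rows (`13/7/2020`); the toy frame stands in for the real data.

```python
import pandas as pd

# Toy frame with values taken from the head() output.
sample = pd.DataFrame({
    "Timestamp": ["13/7/2020", "13/7/2020"],
    "Gender": ["Female", "Female"],
})

# Parse day-first dates explicitly so 13/7/2020 becomes 13 July 2020.
sample["Timestamp"] = pd.to_datetime(sample["Timestamp"], format="%d/%m/%Y")

# Store the low-cardinality text column as a categorical.
sample["Gender"] = sample["Gender"].astype("category")

print(sample.dtypes)
```

Specifying `format` explicitly avoids ambiguity for dates such as `3/7/2020`, which could otherwise be read as either March or July.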

### Display basic dataset statistics

In [10]:
print("The statistical description of the dataset:")
df.describe()

The statistical description of the dataset:


| | Age | CGPA | Depression | Anxiety | PanicAttack | SpecialistTreatment | SymptomFrequency_Last7Days | HasMentalHealthSupport | SleepQuality | StudyStressLevel | StudyHoursPerWeek | AcademicEngagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 21.402 | 3.12253 | 0.483 | 0.474 | 0.458 | 0.067 | 3.498 | 0.067 | 2.983 | 3.045 | 9.746 | 3.055 |
| std | 2.373611 | 0.810961 | 0.499961 | 0.499573 | 0.498482 | 0.250147 | 2.3081 | 0.250147 | 1.417999 | 1.417386 | 5.651497 | 1.422673 |
| min | 18.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 25% | 19.0 | 2.25 | 0.0 | 0.0 | 0.0 | 0.0 | 1.75 | 0.0 | 2.0 | 2.0 | 5.0 | 2.0 |
| 50% | 21.0 | 3.25 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 3.0 | 3.0 | 9.0 | 3.0 |
| 75% | 24.0 | 4.0 | 1.0 | 1.0 | 1.0 | 0.0 | 6.0 | 0.0 | 4.0 | 4.0 | 15.0 | 4.0 |
| max | 25.0 | 4.0 | 1.0 | 1.0 | 1.0 | 1.0 | 7.0 | 1.0 | 5.0 | 5.0 | 19.0 | 5.0 |
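For columns coded as 0 or 1, the mean reported by `describe()` equals the share of rows coded 1. The `Depression` mean of 0.483, for example, corresponds to 48.3% of the 1000 respondents, assuming 1 means the student reported the condition (the notebook does not state the coding explicitly). A minimal sketch of this calculation on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy responses; the real columns hold 1000 rows each.
toy = pd.DataFrame({
    "Depression": [1, 0, 1, 0],
    "Anxiety": [0, 1, 0, 0],
    "PanicAttack": [0, 0, 1, 0],
})

binary_cols = ["Depression", "Anxiety", "PanicAttack"]

# Mean of a 0/1 column is the proportion of 1s; scale to a percentage.
prevalence_pct = toy[binary_cols].mean().mul(100).round(1)

print(prevalence_pct)
```

On the toy data this yields 50.0 for `Depression` and 25.0 for the other two columns.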


### Check for missing values

In [9]:
print("Count of missing values in each column:")
df.isnull().sum()

Count of missing values in each column:


Timestamp                     0
Gender                        0
Age                           0
Course                        0
YearOfStudy                   0
CGPA                          0
Depression                    0
Anxiety                       0
PanicAttack                   0
SpecialistTreatment           0
SymptomFrequency_Last7Days    0
HasMentalHealthSupport        0
SleepQuality                  0
StudyStressLevel              0
StudyHoursPerWeek             0
AcademicEngagement            0
dtype: int64

There are no missing values in the dataset, which is ideal. Therefore, we can skip any data cleaning steps related to handling missing values.
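If the dataset is updated later, the same check can report the share of missing values per column alongside the raw count. The sketch below is illustrative only: `missing_summary` is a hypothetical helper, and the toy frame deliberately contains a gap so the output is not all zeros.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Hypothetical helper: missing count and percentage for each column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(1)
    return pd.DataFrame({"missing": counts, "percent": percent})


# Toy frame with one deliberately missing Age value.
toy = pd.DataFrame({
    "Age": [18, np.nan, 21, 24],
    "CGPA": [3.5, 2.0, 4.0, 3.0],
})

summary = missing_summary(toy)
print(summary)
```

Note that `isnull()` only detects absent values; inconsistent labels or out-of-range entries need separate checks.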

### Display the shape of the dataset

In [8]:
print("Shape of the dataset:")
df.shape

Shape of the dataset:


(1000, 16)
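With the structure of the data confirmed, the third objective (trends by demographics) could be approached by grouping the 0/1 indicator columns by a demographic column and averaging. The following is a minimal sketch on hypothetical toy rows, not the notebook's actual analysis; the `Male` label is an assumption, since only `Female` appears in the sample output.

```python
import pandas as pd

# Hypothetical toy rows; "Male" is an assumed label not shown in the head() output.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Depression": [1, 0, 1, 1],
    "Anxiety": [0, 0, 1, 0],
})

# Percentage of respondents coded 1 for each indicator, per gender.
rates = toy.groupby("Gender")[["Depression", "Anxiety"]].mean().mul(100)

print(rates)
```

The resulting table can be passed to `rates.plot(kind="bar")` to draw a grouped bar chart with matplotlib.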