
**Project Goal**

The main objective of this project is to analyze students’ academic and demographic factors in order to predict and understand their levels of anxiety. The project applies two data mining techniques:
 * Classification: Assign each student to an anxiety level (for example Minimal, Moderate, or Severe).
 * Clustering: Group students with similar anxiety patterns and academic characteristics to uncover common risk factors.

Beyond analysis, this project aims to shed light on the psychological challenges students face, helping to create a more supportive educational environment that reduces anxiety and promotes student well-being.

**Dataset Source**

This project uses anxiety data collected from university students as part of a broader study of anxiety, stress, and depression. Only the anxiety data is used here.

URL: https://figshare.com/articles/dataset/MHP_Anxiety_Stress_Depression_Dataset_of_University_Students/25771164?file=46172340


In [7]:
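import pandas as pd

# Load the anxiety survey responses (the file must be in the working directory)
data = pd.read_csv("Anxiety.csv")

print("Shape of dataset (rows, columns):", data.shape)

# Preview the first five rows
data.head()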

Shape of dataset (rows, columns): (2028, 16)


Unnamed: 0,1. Age,2. Gender,3. University,4. Department,5. Academic Year,6. Current CGPA,7. Did you receive a waiver or scholarship at your university?,"1. In a semester, how often you felt nervous, anxious or on edge due to academic pressure?","2. In a semester, how often have you been unable to stop worrying about your academic affairs?","3. In a semester, how often have you had trouble relaxing due to academic pressure?","4. In a semester, how often have you been easily annoyed or irritated because of academic pressure?","5. In a semester, how often have you worried too much about academic affairs?","6. In a semester, how often have you been so restless due to academic pressure that it is hard to sit still?","7. In a semester, how often have you felt afraid, as if something awful might happen?",Anxiety Value,Anxiety Label
0,18-22,Female,"Independent University, Bangladesh (IUB)",Engineering - CS / CSE / CSC / Similar to CS,Second Year or Equivalent,2.50 - 2.99,No,2,2,3,2,2,2,2,15,Severe Anxiety
1,18-22,Male,"Independent University, Bangladesh (IUB)",Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,No,1,2,2,1,1,3,2,12,Moderate Anxiety
2,18-22,Male,American International University Bangladesh (...,Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,No,0,0,0,0,0,0,0,0,Minimal Anxiety
3,18-22,Male,American International University Bangladesh (...,Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,No,2,1,1,1,2,1,2,10,Moderate Anxiety
4,18-22,Male,North South University (NSU),Engineering - CS / CSE / CSC / Similar to CS,Second Year or Equivalent,2.50 - 2.99,No,3,0,3,3,1,1,3,14,Moderate Anxiety


**Data Types & Summary**

Before diving deeper into the analysis, it is important to first understand the structure of the dataset, including the types of data stored in each column and the overall distribution of the numerical variables. This step ensures that we know how to preprocess and handle the data correctly in later stages.

In [8]:
data.dtypes

Unnamed: 0,0
1. Age,object
2. Gender,object
3. University,object
4. Department,object
5. Academic Year,object
6. Current CGPA,object
7. Did you receive a waiver or scholarship at your university?,object
"1. In a semester, how often you felt nervous, anxious or on edge due to academic pressure?",int64
"2. In a semester, how often have you been unable to stop worrying about your academic affairs?",int64
"3. In a semester, how often have you had trouble relaxing due to academic pressure?",int64


**Observations on Data Types**

- The demographic and academic columns (Age, Gender, University, Department, Academic Year, Current CGPA, Scholarship) are stored as text (object) and are categorical. Anxiety Label, the target class, is categorical as well.

- The survey questions and Anxiety Value are numeric (int64).

- The categorical columns therefore need to be encoded as numbers (e.g., Label Encoding or One-Hot Encoding) before most machine learning algorithms can use them.

- The numeric features may later require normalization or standardization so that attributes on different scales (item scores from 0 to 3, Anxiety Value from 0 to 21) contribute comparably.
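
The encoding and scaling steps mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual preprocessing: the three rows are copied from the preview, the short column names (`Gender`, `CGPA`) are hypothetical stand-ins for the long survey headers, and the CGPA ordering lists only the two bands visible in the preview.

```python
import pandas as pd

# Three rows shaped like the preview; the short column names are
# hypothetical and are not the dataset's real headers.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "CGPA": ["2.50 - 2.99", "3.00 - 3.39", "3.00 - 3.39"],
    "Anxiety Value": [15, 12, 0],
})

# Ordinal encoding: CGPA bands have a natural order, so map them to ranks.
# Assumption: only the two bands seen in the preview are listed here; the
# full dataset will contain more and the list must be extended.
cgpa_order = ["2.50 - 2.99", "3.00 - 3.39"]
sample["CGPA_rank"] = pd.Categorical(
    sample["CGPA"], categories=cgpa_order, ordered=True
).codes

# One-hot encoding: Gender has no order, so each category gets its own column.
encoded = pd.get_dummies(sample, columns=["Gender"], dtype=int)

# Standardization (z-score): subtract the mean, divide by the standard deviation.
col = encoded["Anxiety Value"]
encoded["Anxiety Value_z"] = (col - col.mean()) / col.std()

print(encoded)
```

Ordinal codes suit range-like columns such as CGPA, while one-hot columns avoid implying an order among categories such as Gender or University. Which columns get which treatment is a modelling decision left to the later phases.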





In [9]:
data.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
"1. In a semester, how often you felt nervous, anxious or on edge due to academic pressure?",2028.0,1.778107,0.952277,0.0,1.0,2.0,3.0,3.0
"2. In a semester, how often have you been unable to stop worrying about your academic affairs?",2028.0,1.634122,1.02762,0.0,1.0,2.0,3.0,3.0
"3. In a semester, how often have you had trouble relaxing due to academic pressure?",2028.0,1.754438,0.996221,0.0,1.0,2.0,3.0,3.0
"4. In a semester, how often have you been easily annoyed or irritated because of academic pressure?",2028.0,1.782544,0.965386,0.0,1.0,2.0,3.0,3.0
"5. In a semester, how often have you worried too much about academic affairs?",2028.0,1.865878,0.982209,0.0,1.0,2.0,3.0,3.0
"6. In a semester, how often have you been so restless due to academic pressure that it is hard to sit still?",2028.0,1.797337,0.992748,0.0,1.0,2.0,3.0,3.0
"7. In a semester, how often have you felt afraid, as if something awful might happen?",2028.0,1.732742,1.057113,0.0,1.0,2.0,3.0,3.0
Anxiety Value,2028.0,12.345168,5.493521,0.0,8.0,13.0,17.0,21.0


**Observations on Statistical Summary**

- All survey questions take values from 0 to 3, consistent with a four-point Likert-type response scale.

- The item means (about 1.63 to 1.87) sit slightly above the scale midpoint of 1.5, indicating that, on average, students report each symptom at a moderate frequency.

- Anxiety Value ranges from 0 to 21, with a mean of about 12.35 and a median of 13, so at least half of the students score 13 or higher, in the upper half of the scale.

- The standard deviations of the seven questions are similar (about 0.95 to 1.06 on a 0 to 3 scale), so no single question is markedly more dispersed than the others.

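- All minimums are 0 and no maximum exceeds the expected scale limit (3 per question, 21 for Anxiety Value), and every numeric column has 2028 non-null entries. The numeric columns therefore contain no out-of-range or missing values. The categorical columns are not covered by this summary and need to be checked separately.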
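
The range observations above can also be checked programmatically. The sketch below runs three checks on the five preview rows only, using hypothetical short names `q1` to `q7` for the seven question columns. It also tests one assumption that the source does not state outright: that Anxiety Value is the sum of the seven item scores, which would explain its 0 to 21 range (7 items × 3 points).

```python
import pandas as pd

# The five preview rows; q1..q7 are hypothetical short names standing in
# for the seven long question headers.
item_cols = [f"q{i}" for i in range(1, 8)]
rows = [
    [2, 2, 3, 2, 2, 2, 2, 15],
    [1, 2, 2, 1, 1, 3, 2, 12],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [2, 1, 1, 1, 2, 1, 2, 10],
    [3, 0, 3, 3, 1, 1, 3, 14],
]
preview = pd.DataFrame(rows, columns=item_cols + ["Anxiety Value"])

# Check 1: no missing values in any column.
no_missing = preview.isna().sum().sum() == 0

# Check 2: every item score lies on the 0-3 response scale.
items_in_range = preview[item_cols].isin([0, 1, 2, 3]).all().all()

# Check 3 (assumption under test): the total equals the sum of the items.
total_matches = (
    preview[item_cols].sum(axis=1) == preview["Anxiety Value"]
).all()

print(no_missing, items_in_range, total_matches)
```

All three checks pass on the preview rows. Running the same checks on the full `data` frame, with the real column names substituted, would be needed before drawing that conclusion for the whole dataset.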
