# Problem

*  Student life is fraught with challenges that extend beyond academic responsibilities, encompassing social pressures, time management difficulties, and the need for personal development. The prevalent stress experienced by students can significantly hinder their mental health and academic performance. Despite this, there remains a lack of focused, data-driven analysis on the varied sources of this stress and their direct impacts on students' daily lives and educational outcomes.



# Data Mining Task


* The dataset "Student Stress Factors: A Comprehensive Analysis" appears to have been collected to understand and analyze the factors that contribute to stress among students. It records psychological, physiological, environmental, academic, and social indicators for each student, which makes it possible to look for patterns, correlations, and potential predictors of stress, and to inform interventions that support student well-being in educational settings.

In [None]:
1. Classification task:
   - Goal: predict "Stress Level" from factors such as anxiety, depression, academic performance, and social dynamics.
   - Class attribute: Stress Level (e.g., low, moderate, high; the dataset encodes it as integers from 0 to 2).
   - Features/attributes: all columns other than "Stress Level".
2. Clustering task:
   - Goal: group students into clusters based on their responses to the stress-related factors, in order to identify common stress profiles.
   - Features/attributes: all columns other than "Stress Level", so that clusters are formed without a predefined outcome.

Together, these tasks can uncover how stress factors affect different groups of students and predict stress levels from measurable attributes, which supports targeted interventions.
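
The two tasks above can be sketched as follows. This is a minimal illustration, not the method used in this notebook: the decision tree, k-means, the choice of three clusters, and the synthetic `demo` frame with its snake_case column names are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (random values, not survey data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "anxiety_level": rng.integers(0, 22, size=200),
    "depression": rng.integers(0, 28, size=200),
    "sleep_quality": rng.integers(0, 6, size=200),
    "stress_level": rng.integers(0, 3, size=200),
})

# Features are every column except the class attribute.
X = demo.drop(columns=["stress_level"])
y = demo["stress_level"]

# Classification: hold out 30% of the rows to estimate accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Hold-out accuracy on synthetic data: {accuracy:.2f}")

# Clustering: group students without looking at the stress label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print(pd.Series(cluster_labels).value_counts().sort_index())
```

Because the demo values are random, the accuracy here says nothing about the real data. On the real dataset, scaling the features before clustering is usually worth considering, since the attributes use different ranges.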


# Data

- [The source](https://www.kaggle.com/datasets/rxnach/student-stress-factors-a-comprehensive-analysis)
- Number of objects: 1100
- Number of attributes: 21
- Class label: Stress Level

| Attribute Name                  | Description                                       | Range     | Attribute Type |
|---------------------------------|---------------------------------------------------|-----------|----------------|
| Anxiety Level                   | Intensity of anxiety symptoms                     | 0 to 21   | int64          |
| Self Esteem                     | Overall subjective evaluation of one's own worth  | 0 to 30   | int64          |
| Mental Health History           | History of mental health issues (0=No, 1=Yes)     | 0 to 1    | int64          |
| Depression                      | Severity of depression symptoms                   | 0 to 27   | int64          |
| Headache                        | Frequency or severity of headaches                | 0 to 5    | int64          |
| Blood Pressure                  | Levels of blood pressure (1=Low, 2=Normal, 3=High)| 1 to 3    | int64          |
| Sleep Quality                   | Perceived quality of sleep                        | 0 to 5    | int64          |
| Breathing Problem               | Severity of breathing problems                    | 0 to 5    | int64          |
| Noise Level                     | Level of noise in the environment                 | 0 to 5    | int64          |
| Living Conditions               | Quality of living conditions                      | 0 to 5    | int64          |
| Safety                          | Perception of safety                              | 0 to 5    | int64          |
| Basic Needs                     | Fulfillment of basic needs                        | 0 to 5    | int64          |
| Academic Performance            | Perception of academic performance                | 0 to 5    | int64          |
| Study Load                      | Perceived workload from studies                   | 0 to 5    | int64          |
| Teacher-Student Relationship    | Quality of teacher-student relationships          | 0 to 5    | int64          |
| Future Career Concerns          | Level of concern about future career prospects    | 0 to 5    | int64          |
| Social Support                  | Level of social support received                  | 0 to 3    | int64          |
| Peer Pressure                   | Level of pressure felt from peers                 | 0 to 5    | int64          |
| Extracurricular Activities      | Extent of participation in activities             | 0 to 5    | int64          |
| Bullying                        | Presence and severity of bullying experiences     | 0 to 5    | int64          |
| Stress Level                    | Overall level of stress experienced               | 0 to 2    | int64          |
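
The ranges in the table can be checked against the loaded data. The sketch below is one possible way to do that; `EXPECTED_RANGES`, `out_of_range_counts`, and the snake_case column names are hypothetical and cover only a few rows of the table.

```python
import pandas as pd

# Documented (min, max) for a few attributes from the table above.
# Column names are assumed to be snake_case versions of the attribute names.
EXPECTED_RANGES = {
    "anxiety_level": (0, 21),
    "self_esteem": (0, 30),
    "depression": (0, 27),
    "blood_pressure": (1, 3),
    "stress_level": (0, 2),
}

def out_of_range_counts(frame, ranges):
    """Return, per column, how many values fall outside the documented range."""
    counts = {}
    for column, (low, high) in ranges.items():
        if column in frame.columns:
            # between() is inclusive on both ends; missing values count as out of range.
            counts[column] = int((~frame[column].between(low, high)).sum())
    return counts

# Small made-up frame with one deliberate violation in two of the columns.
demo = pd.DataFrame({
    "anxiety_level": [0, 10, 21, 25],
    "blood_pressure": [1, 2, 3, 0],
    "stress_level": [0, 1, 2, 2],
})
range_report = out_of_range_counts(demo, EXPECTED_RANGES)
print(range_report)
```

Columns listed in `EXPECTED_RANGES` but absent from the frame are skipped, so the same dictionary can be reused on a subset of the data.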


* ### Missing Value

In [None]:
import pandas as pd

ds = pd.read_csv('../dataset/Sampled_StressLevelDataset.csv')

# Count missing values in each column
missing_values_count = ds.isna().sum()
print(missing_values_count)

In [None]:
import pandas as pd

df = pd.read_csv('StressLevelcleaned_dataset.csv')

# Count missing values per column and collect the rows that contain any
missing_counts = df.isna().sum()
rows_with_missing = df[df.isna().any(axis=1)]

print("Missing values in each column:\n", missing_counts)
print("\nRows with missing values:\n", rows_with_missing)

* ### Statistical Summary

In [None]:
import pandas as pd

ds = pd.read_csv('../dataset/Sampled_StressLevelDataset.csv')

# Statistical summary
print("Statistical Summary:")
print(ds.describe())


In [None]:


import matplotlib.pyplot as plt
import pandas as pd

# Statistical summary (assumes `df` was loaded in an earlier cell)
summary_stats = df.describe()
print(summary_stats)

# Plot each summary statistic as a bar chart across columns
stats_to_plot = ['mean', 'std', '50%', 'min', 'max']
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(20, 10))
fig.subplots_adjust(hspace=0.5, wspace=0.3)

for ax, stat in zip(axes.flat, stats_to_plot):
    summary_stats.loc[stat].plot(kind='bar', ax=ax)
    ax.set_title(stat)
    ax.set_xticklabels(summary_stats.columns, rotation=45, ha='right')

# Hide the unused sixth subplot
axes.flat[len(stats_to_plot)].set_visible(False)
plt.show()

* ### Outliers

In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('StressLevelcleaned_dataset.csv')

def outlier_mask(column, threshold=3):
    """Flag values more than `threshold` standard deviations from the column mean."""
    column_mean = column.mean()
    column_std = column.std()
    upper_bound = column_mean + threshold * column_std
    lower_bound = column_mean - threshold * column_std
    return (column > upper_bound) | (column < lower_bound)

numeric_columns = df.select_dtypes(include=[np.number]).columns

# Boolean frame: True where a value is an outlier in its own column
masks = df[numeric_columns].apply(outlier_mask)
outlier_counts = masks.sum()

for column, count in outlier_counts.items():
    print(f"Column: {column}, Number of outliers: {count}")

# A row with outliers in several columns is counted once here,
# so this can be smaller than the sum of the per-column counts
rows_with_outliers = masks.any(axis=1).sum()
print(f"Total outlier values: {outlier_counts.sum()}")
print(f"Total rows with outliers: {rows_with_outliers}")

* ### Boxplot

In [None]:
import matplotlib.pyplot as plt

# Keep only numeric columns
numeric_cols = df.select_dtypes(include='number').columns

# Create one box plot per numeric column, stacked vertically
# (squeeze=False keeps `axes` an array even with a single column)
num_plots = len(numeric_cols)
fig, axes = plt.subplots(num_plots, 1, figsize=(10, num_plots * 5), squeeze=False)

for ax, col in zip(axes.flat, numeric_cols):
    df.boxplot(column=col, ax=ax, grid=False)
    ax.set_title(col)

plt.tight_layout()
plt.show()
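
By default, box plots draw as individual points the values that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below counts such points numerically so they can be compared with the standard-deviation rule used in the Outliers cell. The helper `iqr_outlier_mask` and the `demo` values are made up for illustration.

```python
import pandas as pd

def iqr_outlier_mask(column, whisker=1.5):
    """Flag values beyond `whisker` * IQR below Q1 or above Q3."""
    q1 = column.quantile(0.25)
    q3 = column.quantile(0.75)
    iqr = q3 - q1
    return (column < q1 - whisker * iqr) | (column > q3 + whisker * iqr)

# Made-up values; 20 is a deliberate extreme in the first column.
demo = pd.DataFrame({
    "noise_level": [2, 2, 3, 3, 3, 3, 4, 4, 20],
    "study_load": [1, 2, 2, 3, 3, 3, 4, 4, 5],
})

iqr_masks = demo.apply(iqr_outlier_mask)
iqr_counts = iqr_masks.sum()
iqr_rows = int(iqr_masks.any(axis=1).sum())

print(iqr_counts)
print(f"Rows with at least one IQR outlier: {iqr_rows}")
```

The two rules can disagree: the IQR rule does not depend on the mean or standard deviation, so a single extreme value affects it less.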

* ### Plotting Methods:

In [None]:
# Relationship between stress level and academic performance.
# Based on the histogram and scatter plot of the two variables, students with
# high academic performance also appear to have high stress levels.

import matplotlib.pyplot as plt

# Histograms of stress level and academic performance
plt.figure(figsize=(10, 6))

# Overlaying the histograms of Stress Level and Academic Performance
plt.hist(df['stress_level'], bins=20, color='skyblue', edgecolor='black', alpha=0.5, label='Stress Level')
plt.hist(df['academic_performance'], bins=20, color='lightgreen', edgecolor='black', alpha=0.5, label='Academic Performance')

plt.title('Distributions of Stress Level and Academic Performance')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
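
Overlaid histograms show each variable's distribution separately. To look at how the two variables move together, one option is to tabulate academic performance within each stress level. The sketch below uses a small made-up `demo` frame, so its numbers say nothing about the real dataset; the column names follow the ones used in the histogram cell.

```python
import pandas as pd

# Made-up values for illustration only.
demo = pd.DataFrame({
    "stress_level": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "academic_performance": [2, 3, 4, 1, 3, 5, 2, 3, 4],
})

# Mean academic performance within each stress level.
mean_by_stress = demo.groupby("stress_level")["academic_performance"].mean()
print(mean_by_stress)

# Share of each academic performance score within each stress level
# (normalize="index" makes every row sum to 1).
share_table = pd.crosstab(
    demo["stress_level"], demo["academic_performance"], normalize="index"
)
print(share_table.round(2))
```

If the group means rise or fall steadily across stress levels in the real data, that would support a relationship between the two variables; flat means would not.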

# Data preprocessing

# Data Mining Technique


# Evaluation and Comparison


# Findings

# References

* https://www.kaggle.com/