# SOCIAL MEDIA ADDICTION ANALYSIS AMONG UNIVERSITY STUDENTS

## Business Problem

**How can universities detect and reduce the negative academic impact of social media addiction among students using behavioral and usage data?**

University students are spending more and more time on social media platforms, often to the detriment of their academic performance, mental health, and sleep quality. This analysis aims to identify patterns of excessive usage, measure how they correlate with academic outcomes, and build predictive tools that help institutions intervene early.

## Business Objective

To assist universities and student welfare departments in:
- Identifying students at risk of severe social media addiction.
- Understanding the behavioral predictors most correlated with academic and personal decline.
- Designing proactive intervention strategies that improve academic success and student well-being.

## Stakeholders

- University Administrators  
- Student Wellness and Counseling Departments  
- Academic Policy Makers  
- Educational Technology Startups  

## Analytics Objectives

1. **Classification Task**  
   Predict whether a student is at risk of social media addiction and, if so, how severe it is.

2. **Regression or Classification Task**  
   Predict the level of academic impact based on social media usage patterns.

3. **Clustering Task**  
   Segment students into behavioral groups for personalized outreach or interventions.

4. **Exploratory Data Analysis (EDA)**  
   - Discover the most influential features (e.g., time spent, platform type, time of day) associated with negative outcomes.
   - Visualize usage trends across demographics (gender, age, year of study).
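
As a rough illustration of how the classification target in objective 1 might be framed, the sketch below bins an addiction score into severity bands with `pd.cut`. The cut points (4 and 7), the band labels, and the toy scores are assumptions made purely for illustration; the dataset does not define any severity bands, and real thresholds would need to be agreed with counselling staff.

```python
import pandas as pd

# Toy scores for illustration only (not rows from the real dataset)
scores = pd.DataFrame({"addicted_score": [2, 4, 5, 7, 8, 9]})

# Hypothetical cut points: (0, 4] low, (4, 7] moderate, (7, 10] severe
bins = [0, 4, 7, 10]
labels = ["low", "moderate", "severe"]

# pd.cut uses right-closed intervals by default, so a score of 4 lands in "low"
scores["risk_band"] = pd.cut(scores["addicted_score"], bins=bins, labels=labels)
print(scores)
```

Keeping the bands as an ordered categorical leaves the door open to either a multi-class classifier or a simpler at-risk / not-at-risk split later on.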

## Potential Applications

- Early warning systems for academic counselors.
- Integration with learning platforms to flag at-risk behavior.
- Dashboards for student self-assessment and digital wellness tracking.


## INITIAL DATA EXPLORATION (IDE)

Every dataset tells a story, but before I dive into any narrative, I'll flip through the table of contents. This phase is about getting comfortable with the data: seeing what’s there, what’s missing, and what might surprise me later if I don’t pay attention now.

#### What I’m doing:
- Importing key libraries ('pandas', 'numpy', 'seaborn', 'matplotlib', and 'plotly'), the usual suspects for slicing, dicing, and visualizing data.
- Previewing the first few rows to get a feel for the dataset’s structure, naming conventions, and early red flags (no one likes nasty surprises 30 cells in).
- Checking the shape of the data because whether it's 500 rows or 50,000 completely changes the game.
- Standardising column names, then pulling metadata (dtypes and non-null counts).
- Getting basic summary statistics for both numerical and categorical columns.
- Checking for duplicate rows and null values.

This might not be the flashiest part of the workflow, but it’s where trust is built between me and the dataset. And as I’ve learned from previous projects, a few extra minutes spent here can save hours of confusion down the road.

Exploration done right is part instinct, part structure. This is both!

In [None]:
# Mathematical computation and data manipulation libraries
import numpy as np
import pandas as pd

# Data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Load the data
std_data = pd.read_csv('Students Social Media Addiction.csv')

# Preview first 5
std_data.head()

|  | Student_ID | Age | Gender | Academic_Level | Country | Avg_Daily_Usage_Hours | Most_Used_Platform | Affects_Academic_Performance | Sleep_Hours_Per_Night | Mental_Health_Score | Relationship_Status | Conflicts_Over_Social_Media | Addicted_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 19 | Female | Undergraduate | Bangladesh | 5.2 | Instagram | Yes | 6.5 | 6 | In Relationship | 3 | 8 |
| 1 | 2 | 22 | Male | Graduate | India | 2.1 | Twitter | No | 7.5 | 8 | Single | 0 | 3 |
| 2 | 3 | 20 | Female | Undergraduate | USA | 6.0 | TikTok | Yes | 5.0 | 5 | Complicated | 4 | 9 |
| 3 | 4 | 18 | Male | High School | UK | 3.0 | YouTube | No | 7.0 | 7 | Single | 1 | 4 |
| 4 | 5 | 21 | Male | Graduate | Canada | 4.5 | Facebook | Yes | 6.0 | 6 | In Relationship | 2 | 7 |


In [None]:
# Check how many rows and columns I am working with
print(f'The dataset has {std_data.shape[0]} rows and {std_data.shape[1]} columns')

# Check column names to inform on standardisation needs
print('\nColumn Names:\n', std_data.columns)

The dataset has 705 rows and 13 columns

Column Names:
 Index(['Student_ID', 'Age', 'Gender', 'Academic_Level', 'Country',
       'Avg_Daily_Usage_Hours', 'Most_Used_Platform',
       'Affects_Academic_Performance', 'Sleep_Hours_Per_Night',
       'Mental_Health_Score', 'Relationship_Status',
       'Conflicts_Over_Social_Media', 'Addicted_Score'],
      dtype='object')


In [7]:
# Standardise column names
std_data.columns = (std_data.columns.str.strip().str.lower())

# Preview changes
std_data.sample(4)

|  | student_id | age | gender | academic_level | country | avg_daily_usage_hours | most_used_platform | affects_academic_performance | sleep_hours_per_night | mental_health_score | relationship_status | conflicts_over_social_media | addicted_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 484 | 485 | 19 | Female | Undergraduate | Switzerland | 2.3 | Instagram | No | 9.5 | 8 | Single | 2 | 4 |
| 557 | 558 | 21 | Male | Graduate | Poland | 3.7 | Facebook | No | 8.4 | 7 | Single | 2 | 5 |
| 605 | 606 | 19 | Male | Undergraduate | Denmark | 4.7 | Instagram | No | 7.2 | 7 | In Relationship | 2 | 5 |
| 563 | 564 | 22 | Male | Graduate | New Zealand | 4.1 | Instagram | Yes | 8.1 | 7 | In Relationship | 3 | 6 |


In [8]:
# Get metadata
std_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 705 entries, 0 to 704
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   student_id                    705 non-null    int64  
 1   age                           705 non-null    int64  
 2   gender                        705 non-null    object 
 3   academic_level                705 non-null    object 
 4   country                       705 non-null    object 
 5   avg_daily_usage_hours         705 non-null    float64
 6   most_used_platform            705 non-null    object 
 7   affects_academic_performance  705 non-null    object 
 8   sleep_hours_per_night         705 non-null    float64
 9   mental_health_score           705 non-null    int64  
 10  relationship_status           705 non-null    object 
 11  conflicts_over_social_media   705 non-null    int64  
 12  addicted_score                705 non-null    int64  
dtypes: float64(2), int64(5), object(6)

In [None]:
# Get basic statistical info of numerical variables
std_data.describe()

|  | student_id | age | avg_daily_usage_hours | sleep_hours_per_night | mental_health_score | conflicts_over_social_media | addicted_score |
|---|---|---|---|---|---|---|---|
| count | 705.0 | 705.0 | 705.0 | 705.0 | 705.0 | 705.0 | 705.0 |
| mean | 353.0 | 20.659574 | 4.918723 | 6.868936 | 6.22695 | 2.849645 | 6.436879 |
| std | 203.660256 | 1.399217 | 1.257395 | 1.126848 | 1.105055 | 0.957968 | 1.587165 |
| min | 1.0 | 18.0 | 1.5 | 3.8 | 4.0 | 0.0 | 2.0 |
| 25% | 177.0 | 19.0 | 4.1 | 6.0 | 5.0 | 2.0 | 5.0 |
| 50% | 353.0 | 21.0 | 4.8 | 6.9 | 6.0 | 3.0 | 7.0 |
| 75% | 529.0 | 22.0 | 5.8 | 7.7 | 7.0 | 4.0 | 8.0 |
| max | 705.0 | 24.0 | 8.5 | 9.6 | 9.0 | 5.0 | 9.0 |


In [10]:
# Get basic statistical info of categorical variables
std_data.describe(include = 'O').T

|  | count | unique | top | freq |
|---|---|---|---|---|
| gender | 705 | 2 | Female | 353 |
| academic_level | 705 | 3 | Undergraduate | 353 |
| country | 705 | 110 | India | 53 |
| most_used_platform | 705 | 12 | Instagram | 249 |
| affects_academic_performance | 705 | 2 | Yes | 453 |
| relationship_status | 705 | 3 | Single | 384 |


In [14]:
# Check for duplicates and nulls
print('Duplicates:', std_data.duplicated().sum())
print('\nNull Values:\n', std_data.isna().sum())

Duplicates: 0

Null Values:
 student_id                      0
age                             0
gender                          0
academic_level                  0
country                         0
avg_daily_usage_hours           0
most_used_platform              0
affects_academic_performance    0
sleep_hours_per_night           0
mental_health_score             0
relationship_status             0
conflicts_over_social_media     0
addicted_score                  0
dtype: int64


## Data Understanding

Before diving deep into analysis, it's crucial to understand the landscape of the dataset — its structure, health, and what the raw numbers are whispering beneath the surface.

### Dataset Snapshot

The dataset comprises **705 student records**, each detailing social media habits, academic status, and mental health indicators. The data is clean and structured — no null values, no duplicate records. A solid foundation to build insights on.

### Data Integrity

**Duplicates:** 0. Each entry is unique.

**Null Values:** None. All fields are complete.

This means no imputation or deduplication is needed, so I can proceed straight to meaningful exploration.
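
A full-row duplicate check and a null count do not catch every integrity problem: two rows can share an ID while differing elsewhere, and a filled-in value can still be implausible. The sketch below shows two extra checks on a toy frame. The toy rows and the 0 to 24 hour bound on sleep are assumptions for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Toy frame for illustration; column names follow the standardised dataset
df = pd.DataFrame({
    "student_id": [1, 2, 3, 3],
    "sleep_hours_per_night": [6.5, 7.5, 25.0, 5.0],
    "addicted_score": [8, 3, 9, 4],
})

# Full-row duplicates miss rows that share an ID but differ in other columns
full_row_dupes = int(df.duplicated().sum())
id_dupes = int(df["student_id"].duplicated().sum())

# Hypothetical plausibility bound: nightly sleep must lie within [0, 24] hours
bad_sleep = df[~df["sleep_hours_per_night"].between(0, 24)]

print("Full-row duplicates:", full_row_dupes)
print("Repeated IDs:", id_dupes)
print("Implausible sleep rows:", len(bad_sleep))
```

On the real data the same checks would run against `std_data`; the summary statistics (IDs running from 1 to 705, sleep between 3.8 and 9.6 hours) suggest they would pass.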

### Categorical Features

| Feature | Unique Values | Most Frequent | Frequency |
|---------|----------------|----------------|-----------|
| 'gender' | 2 | Female | 353 |
| 'academic_level' | 3 | Undergraduate | 353 |
| 'country' | 110 | India | 53 |
| 'most_used_platform' | 12 | Instagram | 249 |
| 'affects_academic_performance' | 2 | Yes | 453 |
| 'relationship_status' | 3 | Single | 384 |

**Observation:**  
- Gender is almost evenly split: **female students** account for 353 of 705 records (about 50%). **Undergraduates** are the largest academic level, also at 353 records.
- **Instagram** is the most-used platform (249 students, about 35%), hinting at a potential hotspot for behavioral patterns.
- About 64% of students (453 of 705) report that social media affects their academic performance.
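
To put the frequencies in the table above on a common footing, the short sketch below converts each top-category count into a share of the 705 records. The dictionary keys are labels chosen here for readability, not column values.

```python
# Counts copied from the categorical summary table above
n_rows = 705
top_counts = {
    "gender = Female": 353,
    "academic_level = Undergraduate": 353,
    "country = India": 53,
    "most_used_platform = Instagram": 249,
    "affects_academic_performance = Yes": 453,
    "relationship_status = Single": 384,
}

# Share of all records held by each top category, in percent
shares = {name: round(100 * count / n_rows, 1) for name, count in top_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On the loaded frame, `std_data['gender'].value_counts(normalize=True)` would give the full distribution for a column rather than just its top category.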

### Numerical Features (Summary Statistics)

| Feature | Mean | Std | Min | 25% | 50% | 75% | Max |
|---------|------|-----|-----|-----|-----|-----|-----|
| 'age' | 20.66 | 1.40 | 18 | 19 | 21 | 22 | 24 |
| 'avg_daily_usage_hours' | 4.92 | 1.26 | 1.5 | 4.1 | 4.8 | 5.8 | 8.5 |
| 'sleep_hours_per_night' | 6.87 | 1.13 | 3.8 | 6.0 | 6.9 | 7.7 | 9.6 |
| 'mental_health_score' | 6.23 | 1.11 | 4 | 5 | 6 | 7 | 9 |
| 'conflicts_over_social_media' | 2.85 | 0.96 | 0 | 2 | 3 | 4 | 5 |
| 'addicted_score' | 6.44 | 1.59 | 2 | 5 | 7 | 8 | 9 |

**Observation Highlights:**
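- On average, students spend just under **5 hours daily** on social media (mean 4.92, median 4.8). That is about 34 hours a week, more than many part-time jobs.
- The typical student reports **~7 hours of sleep** (median 6.9). The minimum is 3.8 hours, which is extreme but still above the 1.5 × IQR lower fence (6.0 − 1.5 × 1.7 = 3.45), so it is not a formal outlier by that rule.
- **Mental health scores** lean toward moderate to good (mean ≈ 6.2 out of 10, observed range 4 to 9), but deeper analysis may reveal platform or usage correlations.
- **Addiction scores** range from 2 to 9 (mean 6.44, std 1.59) with a median of 7, so at least half of students score 7 or higher. This is an area worth visualizing across academic levels or usage time.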
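
One common way to judge whether an extreme value counts as an outlier is Tukey's 1.5 × IQR rule. The sketch below applies it to the sleep column using only the quartiles from the summary table; `iqr_fences` is a hypothetical helper written for this example, not a library function.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and extremes copied from the summary table above
sleep_q1, sleep_q3 = 6.0, 7.7
sleep_min, sleep_max = 3.8, 9.6

lower, upper = iqr_fences(sleep_q1, sleep_q3)

print(f"Fences: {lower:.2f} to {upper:.2f}")
print("Minimum flagged as outlier:", sleep_min < lower)
print("Maximum flagged as outlier:", sleep_max > upper)
```

Under this rule, both the 3.8-hour minimum and the 9.6-hour maximum fall inside the fences, so neither is flagged. A different rule or a different multiplier `k` could draw the line elsewhere.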

### Early Insight Teasers

- Heavy Instagram usage and high average screen time could signal burnout or academic pressure.
- Students reporting more social media conflicts may rate higher on the addiction scale; the summary statistics alone cannot confirm this, so it is a hypothesis to test.
- Relationship status might influence both usage and mental health metrics, a juicy angle for later.
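
The teasers above have not been tested yet. As a minimal sketch of how the conflict-versus-addiction idea could be checked, the code below computes a Spearman rank correlation as the Pearson correlation of ranks. The five rows are toy values invented for illustration, so the resulting coefficient says nothing about the real dataset.

```python
import pandas as pd

# Toy values for illustration only (not rows from the real dataset)
toy = pd.DataFrame({
    "conflicts_over_social_media": [0, 1, 2, 3, 4],
    "addicted_score": [3, 2, 5, 7, 9],
})

# Spearman's rho is the Pearson correlation computed on the ranks
ranks = toy.rank()
rho = ranks["conflicts_over_social_media"].corr(ranks["addicted_score"])

print(f"Spearman rho on toy data: {rho:.2f}")
```

A rank-based measure is a reasonable default here because both columns are small integer scales; on the real frame the same two lines would run against `std_data`.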

Next step? Let’s dive into **exploratory data analysis** and uncover the patterns hiding in plain sight.