---
title: "Detecting Potential Teen Smartphone Addiction via Anomaly Detection"
subtitle: "INFO 523 - Final Project"
author: 
  - name: "Vivek Aswal"
    affiliations:
      - name: "College of Information Science, University of Arizona"
description: "Identify anomalous patterns in teen smartphone usage data to detect potential smartphone addiction and associated behavioral impacts."
format:
   html:
    code-tools: true
    code-overflow: wrap
    embed-resources: true
editor: visual
execute:
  warning: false
  echo: false
jupyter: python3
---

## Abstract

Smartphone usage among teenagers has increased dramatically over the past decade, raising concerns about digital overdependence and its potential psychological, social, and academic consequences. This project explores the detection of potential smartphone addiction among teens by applying anomaly detection techniques to usage pattern data. Using a publicly available dataset from Kaggle, we analyze daily usage hours, screen time before bed, sleep duration, and related behavioral and psychological indicators to identify anomalies that may signal addictive tendencies. We screen individual features with Z-scores and then apply Isolation Forest, an unsupervised learning model, to uncover multivariate outliers in the data. The outcome of this work could support the early detection of problematic phone use and inform interventions by parents, educators, and mental health professionals. Our study emphasizes the role of data science in addressing contemporary behavioral health challenges in the digital age.


## Introduction

Smartphone addiction among teens is a growing concern worldwide. Excessive use of social media and gaming apps, along with late-night usage, can affect mental health, sleep, and academic performance. Identifying anomalous usage patterns early can help in designing interventions. This project applies anomaly detection techniques to find teens whose usage patterns deviate significantly from the norm.

## Dataset Description

The dataset `teen_phone_addiction_dataset.csv` contains detailed information about teen smartphone usage and associated behavioral, academic, and social factors. Each record corresponds to a unique teen, capturing daily habits and psychological indicators.

| Variable                     | Type        | Description                                      |
|-------------------------------|------------|-------------------------------------------------|
| `ID`                         | Integer    | Unique identifier for each teen                 |
| `Name`                       | Categorical| Teen's name                                     |
| `Age`                        | Integer    | Age of the teen                                 |
| `Gender`                     | Categorical| Gender of the teen (Male, Female, Other)       |
| `Location`                   | Categorical| Home location                                   |
| `School_Grade`               | Categorical| Current school grade                             |
| `Daily_Usage_Hours`          | Numeric    | Total daily phone usage in hours               |
| `Sleep_Hours`                | Numeric    | Average hours of sleep per day                 |
| `Academic_Performance`       | Numeric    | Academic performance score (0-100)            |
| `Social_Interactions`        | Numeric    | Average social interactions per day            |
| `Exercise_Hours`             | Numeric    | Hours spent on physical activity per day       |
| `Anxiety_Level`              | Numeric    | Anxiety level (0-10)                            |
| `Depression_Level`           | Numeric    | Depression level (0-10)                         |
| `Self_Esteem`                | Numeric    | Self-esteem score (0-10)                        |
| `Parental_Control`           | Numeric    | Level of parental control (0-10)               |
| `Screen_Time_Before_Bed`     | Numeric    | Phone usage in hours before bed                |
| `Phone_Checks_Per_Day`       | Numeric    | Number of times phone is checked daily         |
| `Apps_Used_Daily`            | Numeric    | Number of apps used daily                       |
| `Time_on_Social_Media`       | Numeric    | Hours spent on social media                     |
| `Time_on_Gaming`             | Numeric    | Hours spent on gaming                           |
| `Time_on_Education`          | Numeric    | Hours spent on educational apps                |
| `Phone_Usage_Purpose`        | Categorical| Primary purpose of phone usage                 |
| `Family_Communication`       | Numeric    | Level of family communication (0-10)          |
| `Weekend_Usage_Hours`        | Numeric    | Phone usage on weekends in hours               |
| `Addiction_Level`            | Numeric    | Overall addiction risk score (0-10)           |

This dataset will be used for anomaly detection to identify teens at risk of potential smartphone addiction based on their usage patterns, lifestyle, and psychological factors.
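
Before modeling, it can help to confirm that the loaded file actually matches the table above. The following is a minimal sketch, not part of the original analysis: `check_schema` is a hypothetical helper, and the small `toy` frame stands in for the real CSV with made-up values.

```python
import pandas as pd

# Hypothetical helper (an assumption for illustration, not from the project code)
def check_schema(frame, numeric_cols):
    """Return (missing, non_numeric, null_counts) for the expected numeric columns."""
    missing = [c for c in numeric_cols if c not in frame.columns]
    present = [c for c in numeric_cols if c in frame.columns]
    non_numeric = [c for c in present
                   if not pd.api.types.is_numeric_dtype(frame[c])]
    null_counts = frame[present].isna().sum().to_dict()
    return missing, non_numeric, null_counts

# Tiny synthetic frame standing in for the real CSV (values are made up)
toy = pd.DataFrame({
    "Daily_Usage_Hours": [4.0, 5.5, None],
    "Sleep_Hours": ["7", "6.5", "8"],   # stored as text on purpose
    "Addiction_Level": [8.7, 10.0, 9.2],
})

missing, non_numeric, null_counts = check_schema(
    toy, ["Daily_Usage_Hours", "Sleep_Hours", "Addiction_Level", "Anxiety_Level"]
)
print("Missing columns:", missing)
print("Non-numeric columns:", non_numeric)
print("Null counts:", null_counts)
```

Running the same check on `df` with the numeric columns from the table would show whether any column needs type conversion or imputation before anomaly detection.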

In [None]:
#| label: load-data
import pandas as pd

# Load the dataset
data_path = "data/teen_phone_addiction_dataset.csv"
df = pd.read_csv(data_path)

# Display first few rows
df.head()

## Exploratory Data Analysis (EDA)

In this section, we explore the dataset to understand the distribution of features, identify potential anomalies, and examine relationships between smartphone usage and behavioral or psychological indicators.

In [None]:
#| label: eda-imports
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Display all columns
df.columns

### Summary Statistics

In [None]:
#| label: summary-stats
# Basic statistics for numeric columns
df.describe()

### Distribution of Key Numeric Features
We examine the distribution of every numeric feature listed in `numeric_features` below, including `Daily_Usage_Hours`, `Sleep_Hours`, `Academic_Performance`, `Anxiety_Level`, `Depression_Level`, and `Self_Esteem`.

In [None]:
#| label: distribution-plots
numeric_features = ['Daily_Usage_Hours', 'Sleep_Hours', 'Academic_Performance', 
                    'Social_Interactions', 'Exercise_Hours', 'Anxiety_Level', 
                    'Depression_Level', 'Self_Esteem', 'Screen_Time_Before_Bed', 
                    'Phone_Checks_Per_Day', 'Apps_Used_Daily', 'Time_on_Social_Media', 
                    'Time_on_Gaming', 'Time_on_Education', 'Family_Communication', 
                    'Weekend_Usage_Hours', 'Addiction_Level']

# Histograms
for col in numeric_features:
    plt.figure(figsize=(8,4))
    sns.histplot(df[col], kde=True, bins=20)
    plt.title(f'Distribution of {col}')
    plt.show()

### Correlation Analysis
We compute pairwise Pearson correlations (the pandas default) between the numeric features to understand how phone usage relates to behavioral and psychological metrics.
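
A heatmap with many features can be hard to read, so a ranked list of the strongest pairs is a useful companion. The sketch below is illustrative only: `top_correlations` is a hypothetical helper, and the `toy` data is synthetic, with `Sleep_Hours` deliberately constructed to fall as usage rises.

```python
import numpy as np
import pandas as pd

# Hypothetical helper (an assumption for illustration)
def top_correlations(frame, n=5):
    """Return the n feature pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    # Keep only the strict upper triangle so each pair appears exactly once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Synthetic data: sleep is built to decrease with usage; exercise is independent
rng = np.random.default_rng(0)
usage = rng.normal(5, 2, 200)
toy = pd.DataFrame({
    "Daily_Usage_Hours": usage,
    "Sleep_Hours": 9 - 0.5 * usage + rng.normal(0, 0.3, 200),
    "Exercise_Hours": rng.normal(1, 0.5, 200),
})

strongest = top_correlations(toy, n=2)
print(strongest)
```

On the real data, the call would be `top_correlations(df[numeric_features])`.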

In [None]:
#| label: correlation
plt.figure(figsize=(12,10))
corr = df[numeric_features].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()

### Gender vs. Daily Phone Usage

In [None]:
#| label: gender-usage
plt.figure(figsize=(8,5))
sns.boxplot(x='Gender', y='Daily_Usage_Hours', data=df)
plt.title("Daily Phone Usage by Gender")
plt.show()

### Scatter Plot: Phone Usage vs. Sleep

In [None]:
#| label: usage-sleep
plt.figure(figsize=(8,5))
sns.scatterplot(x='Daily_Usage_Hours', y='Sleep_Hours', hue='Addiction_Level', data=df, palette='viridis')
plt.title("Daily Phone Usage vs Sleep Hours")
plt.show()

### Identifying Potential Outliers

We use the Z-score method to flag extreme values in `Daily_Usage_Hours` and `Addiction_Level`: a record is treated as a potential anomaly when its absolute Z-score exceeds 3 in either column.
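
For reference, the Z-score of a value is $z_i = (x_i - \bar{x}) / s$, where $\bar{x}$ is the column mean and $s$ its standard deviation. `scipy.stats.zscore` uses the population standard deviation (`ddof=0`) by default. The sketch below reproduces that calculation with NumPy on synthetic data; `zscore_outliers` is a hypothetical helper and the planted value of 14 hours is made up.

```python
import numpy as np

# Hypothetical helper (an assumption for illustration)
def zscore_outliers(values, threshold=3.0):
    """Boolean mask marking values whose |z| exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=0)  # population std, as in scipy's default
    return np.abs(z) > threshold

# Synthetic usage hours with one planted extreme value at the end
rng = np.random.default_rng(1)
hours = np.append(rng.normal(5.0, 1.0, 500), 14.0)

flags = zscore_outliers(hours)
print("Flagged values:", int(flags.sum()))
print("Planted value flagged:", bool(flags[-1]))
```

One caveat: on a bounded scale such as `Addiction_Level` (0-10), the maximum may sit less than three standard deviations from the mean, in which case this rule flags few or no records.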

In [None]:
#| label: outliers
from scipy import stats

# pandas, numpy, matplotlib, and seaborn are already imported in earlier cells

# Columns to screen for extreme values
numeric_cols = ['Daily_Usage_Hours', 'Addiction_Level']

# Compute absolute Z-scores
z_scores = stats.zscore(df[numeric_cols])
abs_z_scores = np.abs(z_scores)

# Keep rows where |Z| > 3 in at least one column
outliers = df[(abs_z_scores > 3).any(axis=1)]

# Display detected outliers
print(f"Detected {len(outliers)} outliers:")
display(outliers)

# Visualize: one boxplot per column, with Z-score outliers overlaid in red
for col in numeric_cols:
    plt.figure(figsize=(8,4))
    sns.boxplot(x=df[col])
    plt.title(f'Boxplot of {col} (Z-score Outliers in Red)')
    # The boxplot is horizontal, so values belong on the x-axis at y = 0
    plt.scatter(outliers[col], np.zeros(len(outliers)), color='red', zorder=10, label='|Z| > 3')
    plt.legend()
    plt.show()

## Anomaly Detection: Teen Smartphone Addiction

We will use Isolation Forest to identify potential anomalies in teen smartphone usage patterns, including excessive usage or high addiction levels.
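
Besides the binary label from `predict`, scikit-learn's `IsolationForest` exposes `score_samples`, which gives a continuous score (lower means more anomalous) that can be used to rank records. The sketch below is a minimal illustration on synthetic data, not the project's results: `X_toy` and the three planted extreme rows are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for the feature matrix: 200 typical rows plus 3 planted extremes
typical = rng.normal(0.0, 1.0, size=(200, 3))
planted = np.array([
    [15.0, 15.0, 15.0],
    [-15.0, 18.0, -20.0],
    [20.0, -17.0, 16.0],
])
X_toy = np.vstack([typical, planted])

model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
labels = model.fit_predict(X_toy)      # -1 = anomaly, 1 = normal
scores = model.score_samples(X_toy)    # lower score = more anomalous

ranked = np.argsort(scores)            # row positions, most anomalous first
n_flagged = int((labels == -1).sum())
print("Rows flagged:", n_flagged, "of", len(X_toy))
print("Most anomalous row positions:", ranked[:5])
```

The `contamination` setting fixes the share of records that get flagged, so roughly 5% of rows are labeled `-1` here regardless of how many are truly unusual. Ranking by score avoids committing to that cutoff.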

In [None]:
#| label: anomaly-detection
from sklearn.ensemble import IsolationForest

# Select relevant numeric features for anomaly detection
features = [
    'Daily_Usage_Hours', 'Sleep_Hours', 'Academic_Performance',
    'Social_Interactions', 'Exercise_Hours', 'Anxiety_Level',
    'Depression_Level', 'Self_Esteem', 'Parental_Control',
    'Screen_Time_Before_Bed', 'Phone_Checks_Per_Day', 'Apps_Used_Daily',
    'Time_on_Social_Media', 'Time_on_Gaming', 'Time_on_Education',
    'Weekend_Usage_Hours', 'Addiction_Level'
]

X = df[features]

# Initialize Isolation Forest
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=0.05,  # Assumes ~5% of teens are anomalous
    random_state=42
)

# Fit model
iso_forest.fit(X)

# Predict anomalies: -1 indicates anomaly, 1 indicates normal
df['Anomaly'] = iso_forest.predict(X)

# Separate normal vs anomalous records
anomalies = df[df['Anomaly'] == -1]
normal = df[df['Anomaly'] == 1]

print(f"Detected {len(anomalies)} potential anomalies out of {len(df)} records")
display(anomalies)

# Visualize anomalies
plt.figure(figsize=(10,6))
sns.scatterplot(
    data=df,
    x='Daily_Usage_Hours',
    y='Addiction_Level',
    hue='Anomaly',
    palette={1:'blue', -1:'red'},
    s=80
)
plt.title('Isolation Forest Anomaly Detection')
plt.xlabel('Daily Usage Hours')
plt.ylabel('Addiction Level')
plt.show()

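### Interpretation

Red points are records that Isolation Forest flagged as anomalous (`Anomaly == -1`). A flag means the record is unusual relative to the rest of the data, not that the teen is necessarily high-risk; unusually low usage can be flagged as well.

The model scores each record on all 17 selected features jointly, so a point can be flagged even when its `Daily_Usage_Hours` and `Addiction_Level` look typical in this two-dimensional view.

Flagged teens whose profiles point toward heavy use may require attention or intervention.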
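
To see in which direction the flagged records actually deviate, the flagged and unflagged groups can be compared feature by feature. This is a hedged sketch: `anomaly_profile` is a hypothetical helper, and the `toy` frame is made up so that one heavy user and one very light user are both flagged.

```python
import pandas as pd

# Hypothetical helper (an assumption for illustration)
def anomaly_profile(frame, features, flag_col="Anomaly"):
    """Mean of each feature for flagged (-1) and unflagged (1) rows, plus the gap."""
    means = frame.groupby(flag_col)[features].mean().T
    means = means.rename(columns={-1: "anomalous", 1: "normal"})
    means["difference"] = means["anomalous"] - means["normal"]
    # Largest absolute gaps first
    return means.sort_values("difference", key=lambda s: s.abs(), ascending=False)

# Made-up records: row 4 is a heavy user, row 5 a very light user, both flagged
toy = pd.DataFrame({
    "Daily_Usage_Hours": [4.0, 5.0, 6.0, 11.0, 1.0],
    "Sleep_Hours":       [7.0, 7.5, 6.5, 4.0, 9.5],
    "Anomaly":           [1, 1, 1, -1, -1],
})

profile = anomaly_profile(toy, ["Daily_Usage_Hours", "Sleep_Hours"])
print(profile)
```

In this toy case the group means differ only modestly because the two flagged rows sit at opposite extremes, which is why it is worth inspecting individual flagged records and not only averages. On the real data the call would be `anomaly_profile(df, features)`.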

## Conclusion

Multi-feature anomaly detection offers a way to surface teens whose smartphone usage patterns warrant a closer look for potential addiction.

Daily phone usage, sleep patterns, mental health indicators, and academic performance all feed into the detection model.

With `contamination=0.05`, Isolation Forest flags roughly 5% of records as unusual. Because the model is unsupervised and the flags are not validated here against an independent label, they are best treated as leads for follow-up, not as diagnoses.

Future work could include larger datasets, longitudinal analysis, or supervised classification for predictive modeling.


## Interactive Exploration Dashboard

This dashboard allows interactive exploration of the anomalies detected by Isolation Forest. Users set **Daily Usage Hours**, **Addiction Level**, and **Sleep Hours** with sliders, and the table lists the flagged teens whose values fall within ±1 of every slider setting.
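
The window filter behind the sliders can also be written as a plain function that returns a DataFrame, which makes it easy to check without a running widget. The sketch below is one possible formulation, not the dashboard's actual code: `filter_window` and the `toy` records are hypothetical.

```python
import pandas as pd

# Hypothetical helper (an assumption for illustration)
def filter_window(frame, centers, half_width=1.0):
    """Rows whose value in every listed column lies within center +/- half_width."""
    mask = pd.Series(True, index=frame.index)
    for col, center in centers.items():
        # Series.between is inclusive on both ends by default
        mask &= frame[col].between(center - half_width, center + half_width)
    return frame[mask]

# Made-up flagged records
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Daily_Usage_Hours": [9.5, 3.0, 10.2],
    "Addiction_Level": [9.8, 4.0, 9.1],
    "Sleep_Hours": [4.5, 8.0, 6.9],
})

hits = filter_window(
    toy, {"Daily_Usage_Hours": 10.0, "Addiction_Level": 9.5, "Sleep_Hours": 5.0}
)
print(hits["ID"].tolist())
```

Because all three conditions must hold at once, a record that matches on usage and addiction level but not on sleep (ID 3 here) is excluded.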

In [None]:
#| label: interactive-dashboard
import ipywidgets as widgets
from IPython.display import display

# Define sliders for key features
daily_usage_slider = widgets.FloatSlider(
    value=df['Daily_Usage_Hours'].mean(),
    min=df['Daily_Usage_Hours'].min(),
    max=df['Daily_Usage_Hours'].max(),
    step=0.1,
    description='Daily Usage Hours:',
    continuous_update=False
)

addiction_level_slider = widgets.FloatSlider(
    value=df['Addiction_Level'].mean(),
    min=df['Addiction_Level'].min(),
    max=df['Addiction_Level'].max(),
    step=0.1,
    description='Addiction Level:',
    continuous_update=False
)

sleep_hours_slider = widgets.FloatSlider(
    value=df['Sleep_Hours'].mean(),
    min=df['Sleep_Hours'].min(),
    max=df['Sleep_Hours'].max(),
    step=0.1,
    description='Sleep Hours:',
    continuous_update=False
)

# Function to filter anomalies based on slider values
def filter_anomalies(daily_usage, addiction_level, sleep_hours):
    filtered = anomalies[
        (anomalies['Daily_Usage_Hours'] >= daily_usage - 1) &
        (anomalies['Daily_Usage_Hours'] <= daily_usage + 1) &
        (anomalies['Addiction_Level'] >= addiction_level - 1) &
        (anomalies['Addiction_Level'] <= addiction_level + 1) &
        (anomalies['Sleep_Hours'] >= sleep_hours - 1) &
        (anomalies['Sleep_Hours'] <= sleep_hours + 1)
    ]
    if filtered.empty:
        print("No anomalies in this range.")
    else:
        display(filtered[['ID','Name','Age','Gender','Daily_Usage_Hours','Sleep_Hours','Addiction_Level']])

# Create interactive widget
widgets.interact(
    filter_anomalies,
    daily_usage=daily_usage_slider,
    addiction_level=addiction_level_slider,
    sleep_hours=sleep_hours_slider
)