# **Student Social Media Addiction EDA**

## **1. Introduction**

### **What dataset are you looking at?**
This analysis examines the "Social Media Addiction vs Relationships" dataset created by Adil Shamim and published on Kaggle. The dataset focuses on students' social media addiction patterns and their correlation with personal relationships, providing a comprehensive cross-country survey of usage behaviors and their impacts.

### **Where/how was it created?**
The dataset was compiled through a structured survey methodology that collected data from students across multiple countries. The data collection process involved gathering information about:

- **Social media usage patterns**: time spent on various platforms, frequency of use, and engagement behaviors
- **Addiction indicators**: compulsive usage patterns, withdrawal symptoms, and behavioral changes
- **Relationship metrics**: quality of personal relationships, social interactions, and interpersonal communication patterns
- **Academic performance indicators**: how social media usage correlates with educational outcomes
- **Demographic information**: age, gender, geographic location, and educational background

The survey was designed to capture both quantitative metrics (usage hours, frequency scores) and qualitative assessments (relationship satisfaction, perceived impact on well-being).
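Because the survey mixes numeric measures (hours, scores) with categorical fields such as country or platform, a common first EDA step is to split the columns by dtype so each group gets suitable summaries (means and histograms for numbers, value counts for categories). A minimal sketch with pandas `select_dtypes`, assuming a small toy DataFrame; the column names are hypothetical placeholders, not confirmed names from the Kaggle file.

```python
import pandas as pd

# Toy data for illustration only; column names and values are hypothetical.
df = pd.DataFrame({
    "Avg_Daily_Usage_Hours": [5.2, 2.1, 6.0],
    "Addicted_Score": [8, 3, 9],
    "Country": ["US", "UK", "India"],
    "Most_Used_Platform": ["Instagram", "TikTok", "Facebook"],
})

# Numeric columns hold the quantitative metrics (hours, scores).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
# Every non-numeric column (strings, categories) is treated as categorical.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Using `exclude="number"` for the second group avoids depending on whether text columns are stored as `object` or as a dedicated string dtype.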

### **What questions will be asked?**
This dataset enables investigation of several critical research questions:

**Primary research questions:**

- Correlation Analysis: What is the relationship between social media addiction levels and the quality of personal relationships among students?

- Usage Pattern Impact: How do different social media usage patterns (passive vs. active engagement) affect interpersonal relationship satisfaction?

- Cross-Cultural Variations: Are there significant differences in social media addiction patterns and their relationship impacts across different countries and cultures?

- Academic Performance Connection: Does social media addiction correlate with lower academic performance, and is that effect in turn associated with relationship quality?
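For the correlation and cross-country questions, a rank-based (Spearman) correlation is a reasonable starting point because survey scores are ordinal and need not be linear or normally distributed. A minimal sketch, assuming hypothetical columns `Addicted_Score`, `Relationship_Satisfaction`, and `Country` with toy values; the real dataset's relationship measure may be named or coded differently.

```python
import pandas as pd

# Toy survey responses; columns and values are hypothetical.
df = pd.DataFrame({
    "Addicted_Score": [2, 4, 5, 7, 8, 9],
    "Relationship_Satisfaction": [9, 8, 6, 7, 3, 2],
    "Country": ["A", "A", "B", "B", "C", "C"],
})

# Spearman correlation = Pearson correlation of the ranks,
# so it captures any monotonic (not just linear) trend.
rho = df["Addicted_Score"].rank().corr(df["Relationship_Satisfaction"].rank())
print(f"Spearman rho: {rho:.2f}")

# Per-country means give a first look at cross-cultural variation.
by_country = df.groupby("Country")[["Addicted_Score", "Relationship_Satisfaction"]].mean()
print(by_country)
```

Ranking explicitly keeps the example free of a SciPy dependency; `DataFrame.corr(method="spearman")` is an equivalent built-in alternative.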

**Secondary research questions:**

- Gender and Age Factors: Do social media addiction patterns and their relationship impacts vary significantly by demographic factors?

- Platform-Specific Effects: Do different social media platforms (Instagram, TikTok, Facebook, etc.) have varying impacts on relationship quality?

- Intervention Insights: What usage thresholds or patterns could indicate when social media use becomes problematic for relationship health?

- Predictive Modeling: Can we develop models to predict relationship satisfaction based on social media usage patterns and addiction indicators?
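The threshold and predictive-modeling questions can be probed first with simple tools: bin daily usage into bands and compare mean satisfaction per band, then fit a straight line as a baseline model. A minimal sketch on toy data; the column names are hypothetical, and the band edges at 3 and 5 hours are arbitrary illustration choices, not thresholds taken from the dataset. A real predictive model would also need a train/test split and more features.

```python
import numpy as np
import pandas as pd

# Toy data; column names, values, and band edges are hypothetical.
df = pd.DataFrame({
    "Avg_Daily_Usage_Hours": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5],
    "Relationship_Satisfaction": [9, 8, 8, 6, 5, 4, 3],
})

# Bin usage hours into right-closed intervals (0, 3], (3, 5], (5, 24].
bins = [0, 3, 5, 24]
labels = ["up to 3h", "3-5h", "over 5h"]
df["usage_band"] = pd.cut(df["Avg_Daily_Usage_Hours"], bins=bins, labels=labels)

# Mean satisfaction per band: a drop between bands hints at a threshold.
band_means = df.groupby("usage_band", observed=True)["Relationship_Satisfaction"].mean()
print(band_means)

# Baseline model: least-squares line of satisfaction on usage hours.
slope, intercept = np.polyfit(df["Avg_Daily_Usage_Hours"], df["Relationship_Satisfaction"], deg=1)
print(f"satisfaction ~ {intercept:.2f} + {slope:.2f} * hours")
```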

This dataset offers insight into how the modern digital landscape affects young adults' social development and interpersonal connections, providing a data-driven perspective on an increasingly relevant social phenomenon.

## **2. Cleaning and Organizing Data**

```python
import pandas as pd

# Load the data from the .csv file
file_path = 'data/housing.csv'
df = pd.read_csv(file_path)

# df.info() prints its summary and returns None, so call it directly
df.info()
df_preview = df.head()

df.shape, df.columns, df_preview
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           20640 non-null  float64
 1   latitude            20640 non-null  float64
 2   housing_median_age  20640 non-null  float64
 3   total_rooms         20640 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20640 non-null  float64
 6   households          20640 non-null  float64
 7   median_income       20640 non-null  float64
 8   median_house_value  20640 non-null  float64
 9   ocean_proximity     20640 non-null  object 
dtypes: float64(9), object(1)
memory usage: 1.6+ MB


((20640, 10),
 Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
        'total_bedrooms', 'population', 'households', 'median_income',
        'median_house_value', 'ocean_proximity'],
       dtype='object'),
    longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
 0    -122.23     37.88                41.0        880.0           129.0   
 1    -122.22     37.86                21.0       7099.0          1106.0   
 2    -122.24     37.85                52.0       1467.0           190.0   
 3    -122.25     37.85                52.0       1274.0           235.0   
 4    -122.25     37.85                52.0       1627.0           280.0   

    population  households  median_income  median_house_value ocean_proximity  
 0       322.0       126.0         8.3252            452600.0        NEAR BAY  
 1      2401.0      1138.0         8.3014            358500.0        NEAR BAY  
 2       496.0       177.0         7.2574            352100.0        NEAR
```

### **Checking for missing values**

```python
df.isna().any()
```

```
longitude             False
latitude              False
housing_median_age    False
total_rooms           False
total_bedrooms         True
population            False
households            False
median_income         False
median_house_value    False
ocean_proximity       False
dtype: bool
```
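`isna().any()` only reports *whether* a column has gaps. The `df.info()` output shows `total_bedrooms` with 20,433 of 20,640 values non-null, i.e. 207 missing (about 1%). A minimal sketch that reports count and percentage per column and then fills the gaps with the column median, which is one common choice (dropping the rows is another); the toy frame and its values are made up and stand in for the loaded `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loaded DataFrame; values are made up.
df = pd.DataFrame({
    "total_rooms": [500.0, 600.0, 700.0, 800.0],
    "total_bedrooms": [100.0, np.nan, 300.0, 200.0],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
})
print(missing)

# Fill gaps with the median, which is less sensitive to outliers than the mean.
median_bedrooms = df["total_bedrooms"].median()
df["total_bedrooms"] = df["total_bedrooms"].fillna(median_bedrooms)
print(df)
```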

## **3. Visualizations**

## **4. Descriptive Analysis**

## **5. Conclusion**