## Exploratory Data Analysis on the dataset

In [2]:
# importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [3]:
# importing the dataset
df = pd.read_csv('Projects/teen_phone_addiction/teen_phone_addiction_dataset.csv')

#### Info & Description of the Data

In [4]:
# Info about the data
# info() prints its summary itself and returns None, so it is not wrapped in print()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      3000 non-null   int64  
 1   Name                    3000 non-null   object 
 2   Age                     3000 non-null   int64  
 3   Gender                  3000 non-null   object 
 4   Location                3000 non-null   object 
 5   School_Grade            3000 non-null   object 
 6   Daily_Usage_Hours       3000 non-null   float64
 7   Sleep_Hours             3000 non-null   float64
 8   Academic_Performance    3000 non-null   int64  
 9   Social_Interactions     3000 non-null   int64  
 10  Exercise_Hours          3000 non-null   float64
 11  Anxiety_Level           3000 non-null   int64  
 12  Depression_Level        3000 non-null   int64  
 13  Self_Esteem             3000 non-null   int64  
 14  Parental_Control        3000 non-null   
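`df.info()` only shows missing values indirectly, through the non-null counts. A more direct check is to count missing values per column and fully duplicated rows. The sketch below runs on a few made-up rows rather than the real CSV, so the column values are hypothetical:

```python
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values)
df = pd.DataFrame({
    "Age": [13, 17, 15],
    "Sleep_Hours": [6.1, None, 5.5],
    "Gender": ["Female", "Male", "Other"],
})

# Number of missing values in each column
missing = df.isna().sum()
print(missing)

# Number of rows that exactly repeat an earlier row
n_duplicates = df.duplicated().sum()
print("Duplicate rows:", n_duplicates)
```

On the real data, every column shown by `info()` reports 3000 non-null entries, so `missing` should be all zeros for those columns.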

In [15]:
print(df.describe())

                ID          Age  Daily_Usage_Hours  Sleep_Hours  \
count  3000.000000  3000.000000        3000.000000  3000.000000   
mean   1500.500000    15.969667           5.020667     6.489767   
std     866.169729     1.989489           1.956501     1.490713   
min       1.000000    13.000000           0.000000     3.000000   
25%     750.750000    14.000000           3.700000     5.500000   
50%    1500.500000    16.000000           5.000000     6.500000   
75%    2250.250000    18.000000           6.400000     7.500000   
max    3000.000000    19.000000          11.500000    10.000000   

       Academic_Performance  Social_Interactions  Exercise_Hours  \
count           3000.000000          3000.000000     3000.000000   
mean              74.947333             5.097667        1.040667   
std               14.684156             3.139333        0.734620   
min               50.000000             0.000000        0.000000   
25%               62.000000             2.000000        

In [6]:
print(df.head())

   ID               Name  Age  Gender          Location School_Grade  \
0   1    Shannon Francis   13  Female        Hansonfort          9th   
1   2    Scott Rodriguez   17  Female      Theodorefort          7th   
2   3        Adrian Knox   13   Other       Lindseystad         11th   
3   4  Brittany Hamilton   18  Female      West Anthony         12th   
4   5       Steven Smith   14   Other  Port Lindsaystad          9th   

   Daily_Usage_Hours  Sleep_Hours  Academic_Performance  Social_Interactions  \
0                4.0          6.1                    78                    5   
1                5.5          6.5                    70                    5   
2                5.8          5.5                    93                    8   
3                3.1          3.9                    78                    8   
4                2.5          6.7                    56                    4   

   ...  Screen_Time_Before_Bed  Phone_Checks_Per_Day  Apps_Used_Daily  \
0  ...       

---

<H2  style = "color:royalblue">Understanding Features</H2>

<H3>Feature Selection Based on Data Type</H3>

#### **Categorical Features**

In [7]:
# Categorical features
categorical_features = df.select_dtypes(include=["object"]).columns
print("Categorical Features:\n")
for i in categorical_features.to_list():
    print(i)

Categorical Features:

Name
Gender
Location
School_Grade
Phone_Usage_Purpose
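Not every object column is equally useful: a column with almost as many distinct values as rows (such as `Name`) behaves like an identifier rather than a category. A quick way to see this is to count distinct values per categorical column. This is a minimal sketch on made-up rows (the names are taken from the `head()` output, the rest is hypothetical), with the categorical columns listed explicitly instead of selected by dtype:

```python
import pandas as pd

# Hypothetical rows shaped like the dataset's categorical columns
df = pd.DataFrame({
    "Name": ["Shannon Francis", "Scott Rodriguez", "Adrian Knox", "Steven Smith"],
    "Gender": ["Female", "Female", "Other", "Other"],
    "School_Grade": ["9th", "7th", "11th", "9th"],
})

categorical_cols = ["Name", "Gender", "School_Grade"]

# Distinct values per column, highest cardinality first
cardinality = df[categorical_cols].nunique().sort_values(ascending=False)
print(cardinality)
```

A cardinality close to the row count (here `Name`, 4 of 4) is a hint that the column identifies rows rather than grouping them.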


####  **Numerical Features**

In [8]:
# Numerical features, split by storage type (integer vs float)
numerical_features = df.select_dtypes(include=["int64"]).columns
float_features = df.select_dtypes(include=["float64"]).columns

print("Integer Features:\n")
for i in numerical_features.to_list():
    print(i)
print("\n--------------------------------------\n")
print("Float Features:\n")
for i in float_features.to_list():
    print(i)

Integer Features:

ID
Age
Academic_Performance
Social_Interactions
Anxiety_Level
Depression_Level
Self_Esteem
Parental_Control
Phone_Checks_Per_Day
Apps_Used_Daily
Family_Communication

--------------------------------------

Float Features:

Daily_Usage_Hours
Sleep_Hours
Exercise_Hours
Screen_Time_Before_Bed
Time_on_Social_Media
Time_on_Gaming
Time_on_Education
Weekend_Usage_Hours
Addiction_Level
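Both lists above are numerical; they differ only in storage type. If a single list of numerical columns is needed, `select_dtypes(include="number")` matches integer and float columns in one call. The sketch below uses a few made-up rows, and also leaves out `ID`, since it is an identifier rather than a measurement:

```python
import pandas as pd

# Hypothetical rows with one integer, one float and one text column
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Age": [13, 17, 13],
    "Daily_Usage_Hours": [4.0, 5.5, 5.8],
    "Gender": ["Female", "Female", "Other"],
})

# "number" selects every integer and float dtype
numeric_features = df.select_dtypes(include="number").columns.to_list()

# ID only labels rows, so exclude it from the analysis features
numeric_features = [col for col in numeric_features if col != "ID"]
print(numeric_features)
```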


#### **Identifiers**
   - ID: unique identifier for each student (row)
   - Name: student name
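Because identifiers carry no information about phone use, they are usually dropped before modelling. A minimal sketch, using two made-up rows whose `Age` and `Sleep_Hours` values come from the `head()` output above:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2],
    "Name": ["Shannon Francis", "Scott Rodriguez"],
    "Age": [13, 17],
    "Sleep_Hours": [6.1, 6.5],
})

# Drop identifier columns; drop() returns a new frame and leaves df unchanged
identifier_cols = ["ID", "Name"]
features_df = df.drop(columns=identifier_cols)
print(features_df.columns.to_list())
```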
  


#### **Time Series Features:** N/A

#### **Text Features:** N/A

---

### More Analysis on the Data

1. Average Daily Usage Hours by Gender

In [10]:
avg_st = df.groupby("Gender")["Daily_Usage_Hours"].mean()
print(avg_st)

Gender
Female    5.052532
Male      5.054626
Other     4.952508
Name: Daily_Usage_Hours, dtype: float64
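The three means differ by only about 0.1 hours (5.05 for Female and Male versus 4.95 for Other), while `df.describe()` reports an overall standard deviation of about 1.96 hours, so the gap is small relative to the spread. Putting spread and group size next to the mean makes this comparison explicit. The sketch below uses a handful of made-up rows instead of the real CSV:

```python
import pandas as pd

# Hypothetical rows: two students per gender
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Other", "Other"],
    "Daily_Usage_Hours": [4.0, 5.5, 3.1, 6.9, 5.8, 2.5],
})

# Mean, median, sample standard deviation and size per gender
usage_stats = df.groupby("Gender")["Daily_Usage_Hours"].agg(["mean", "median", "std", "count"])
print(usage_stats)

# Largest difference between group means
gap = usage_stats["mean"].max() - usage_stats["mean"].min()
print(f"Gap between group means: {gap:.2f} hours")
```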


2. Average sleep hours, exercise hours, screen time before bed, time on social media, gaming, and education, weekend usage hours, and addiction level by gender
   

In [14]:
# Columns to average for each gender
avg_cols = [
    "Sleep_Hours", "Exercise_Hours", "Screen_Time_Before_Bed",
    "Time_on_Social_Media", "Time_on_Gaming", "Time_on_Education",
    "Weekend_Usage_Hours", "Addiction_Level",
]
avg_info = df.groupby("Gender")[avg_cols].mean()
print(avg_info)

        Sleep_Hours  Exercise_Hours  Screen_Time_Before_Bed  \
Gender                                                        
Female     6.499206        1.027607                1.010328   
Male       6.502854        1.050689                0.997244   
Other      6.466428        1.043705                1.012897   

        Time_on_Social_Media  Time_on_Gaming  Time_on_Education  \
Gender                                                            
Female              2.507746        1.547964           1.050348   
Male                2.496457        1.583268           1.036319   
Other               2.493347        1.441556           0.960491   

        Weekend_Usage_Hours  Addiction_Level  
Gender                                        
Female             6.071500         8.950645  
Male               5.952461         8.867323  
Other              6.022108         8.826203  
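With eight columns the averages wrap across three blocks. Transposing the result gives one feature per row and one gender per column, which is easier to scan, and rounding removes digits that do not matter at this scale. A minimal sketch on made-up rows with two of the columns:

```python
import pandas as pd

# Hypothetical rows for two of the averaged columns
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Other", "Other"],
    "Sleep_Hours": [6.1, 6.5, 3.9, 6.7, 5.5, 7.0],
    "Exercise_Hours": [1.0, 0.5, 2.0, 1.0, 0.0, 1.5],
})

cols = ["Sleep_Hours", "Exercise_Hours"]
avg_info = df.groupby("Gender")[cols].mean()

# Features as rows, genders as columns, two decimals for display
summary = avg_info.T.round(2)
print(summary)
```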


3. Highest addiction level and number of students by gender and age


In [19]:
high_addict_with_count = df.groupby(['Gender', 'Age'])['Addiction_Level'].agg(['max', 'count'])
print(high_addict_with_count)

             max  count
Gender Age             
Female 13   10.0    146
       14   10.0    130
       15   10.0    146
       16   10.0    178
       17   10.0    142
       18   10.0    129
       19   10.0    136
Male   13   10.0    140
       14   10.0    147
       15   10.0    140
       16   10.0    152
       17   10.0    134
       18   10.0    143
       19   10.0    160
Other  13   10.0    147
       14   10.0    150
       15   10.0    145
       16   10.0    137
       17   10.0    136
       18   10.0    137
       19   10.0    125
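The table above gives the highest addiction level and the total number of students in each gender and age group. Since every group tops out at 10.0, it does not say how many students are highly addicted. Counting them requires a cut-off, which the dataset does not define; the sketch below assumes a hypothetical threshold of 8.0 (`HIGH_ADDICTION_THRESHOLD`) and uses made-up rows, so both should be adjusted for the real data:

```python
import pandas as pd

# Hypothetical rows
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Other"],
    "Age": [13, 13, 13, 14, 13],
    "Addiction_Level": [10.0, 6.5, 8.2, 9.1, 4.0],
})

# Assumed cut-off for "highly addicted"; not taken from the dataset
HIGH_ADDICTION_THRESHOLD = 8.0

is_high = df["Addiction_Level"] >= HIGH_ADDICTION_THRESHOLD

# Total number of highly addicted students
total_high = int(is_high.sum())
print("Highly addicted students:", total_high)

# Per group: highly addicted count next to the group size
high_by_group = (
    df.assign(is_high=is_high)
      .groupby(["Gender", "Age"])["is_high"]
      .agg(high_count="sum", total="count")
)
print(high_by_group)
```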


---

<H3 style = "color:brown"> Features by Role in ML Model</H3>

#### [Hypothesis 1](/H1.ipynb): Student Performance Based on Screen Time
- **Target Variables (Dependent Variables)**
  - *School Grade*: categorical
  - *Academic Performance*: numerical (int)
  
- **Feature Variables (Independent Variables)**
  - Sleep_Hours
  - Screen_Time_Before_Bed
  - Time_on_Social_Media
  - Time_on_Gaming
  - Time_on_Education
  - Weekend_Usage_Hours

[Link To Hypothesis 1](/H1.ipynb)
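Before testing Hypothesis 1, the listed columns can be pulled into a feature matrix and a target, and a quick correlation check shows which features move with `Academic_Performance`. This is a minimal sketch on made-up rows (only `Sleep_Hours` and `Academic_Performance` come from the `head()` output); it covers only the numerical target, since *School Grade* is categorical and would need a different measure, and it is a first look rather than the method used in the linked notebook:

```python
import pandas as pd

# Feature and target columns listed for Hypothesis 1
feature_cols = [
    "Sleep_Hours", "Screen_Time_Before_Bed", "Time_on_Social_Media",
    "Time_on_Gaming", "Time_on_Education", "Weekend_Usage_Hours",
]
target_col = "Academic_Performance"

# Hypothetical rows standing in for the real CSV
df = pd.DataFrame({
    "Sleep_Hours": [6.1, 6.5, 5.5, 3.9],
    "Screen_Time_Before_Bed": [1.0, 0.5, 1.5, 2.0],
    "Time_on_Social_Media": [2.5, 3.0, 1.0, 2.0],
    "Time_on_Gaming": [1.5, 2.0, 0.5, 1.0],
    "Time_on_Education": [1.0, 0.5, 2.0, 1.5],
    "Weekend_Usage_Hours": [6.0, 7.5, 4.0, 5.0],
    "Academic_Performance": [78, 70, 93, 78],
})

X = df[feature_cols]   # independent variables
y = df[target_col]     # dependent variable

# Pearson correlation of each feature with the target
correlations = X.corrwith(y)
print(correlations.sort_values())
```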
