# Title: Smartphone Usage Among Teenagers in the US

### Aim of the Project

- To identify the primary reasons teenagers use mobile phones (e.g., communication, entertainment, education).
- To measure the average duration of daily phone use among teenagers.
- To assess the impact of mobile phone usage on academic performance and concentration.
- To explore psychological and behavioral effects, such as anxiety, sleep issues, or addiction.
- To evaluate the influence of smartphones on social interaction, both online and offline.
- To compare phone usage patterns across demographics (age, gender, urban/rural, etc.).

### Problem Definition & Dataset Selection

Smartphone use is rising globally and may affect people's health. The United States ranks third in the world in mobile phone usage. However, few studies in the US have evaluated its health impacts. This project therefore presents a detailed analysis of mobile phone usage among teenagers in the US. The dataset was collected from Kaggle.com and contains records for 3,000 students in and around the US. The questionnaire contains 24 questions, and insights from the dataset (324 KB in size) can benefit future generations.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"D:\Python\Project\Smart Phones_Teenagers.csv")
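
A hard-coded absolute path tends to break when the notebook moves to another machine. The following is a minimal sketch of a more defensive loader, not the notebook's own method. The `load_survey` helper and the `demo_survey.csv` stand-in file are hypothetical names introduced only for illustration.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_survey(csv_path):
    """Read the survey CSV, failing early with a clear message if it is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found: {path}")
    return pd.read_csv(path)


# Demo with a tiny stand-in file (illustrative rows, not the Kaggle data)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_survey.csv"
    demo_path.write_text("ID,Age\n1,13\n2,17\n")
    demo_df = load_survey(demo_path)

print(demo_df.shape)
```

In the notebook itself, the same helper could be called with the real file location in place of the temporary demo path.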

In [4]:
print(df)

        ID               Name  Age  Gender          Location School_Grade  \
0        1    Shannon Francis   13  Female        Hansonfort          9th   
1        2    Scott Rodriguez   17  Female      Theodorefort          7th   
2        3        Adrian Knox   13   Other       Lindseystad         11th   
3        4  Brittany Hamilton   18  Female      West Anthony         12th   
4        5       Steven Smith   14   Other  Port Lindsaystad          9th   
...    ...                ...  ...     ...               ...          ...   
2995  2996        Jesus Yates   16  Female      New Jennifer         12th   
2996  2997     Bethany Murray   13  Female       Richardport          8th   
2997  2998      Norman Hughes   14   Other        Rebeccaton          7th   
2998  2999     Barbara Hinton   17  Female      Ramirezmouth          9th   
2999  3000     Curtis Johnson   17    Male    Lake Alexander         10th   

      Daily_Usage_Hours  Sleep_Hours  Academic_Performance  \
0            

In [5]:
num_rows = len(df)
num_columns = len(df.columns)
print(f"Number of rows: {num_rows}")
print(f"Number of Columns: {num_columns}")

Number of rows: 3000
Number of Columns: 25
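
`DataFrame.shape` returns both counts in a single tuple, which is a common shorthand for the two `len` calls above. A small sketch on a stand-in frame (the `demo` data is illustrative only):

```python
import pandas as pd

# Stand-in frame with three rows and two columns
demo = pd.DataFrame({
    "Age": [13, 17, 13],
    "Gender": ["Female", "Female", "Other"],
})

# shape is a (rows, columns) tuple
num_rows, num_columns = demo.shape
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_columns}")
```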


In [6]:
all_dtypes = df.dtypes
# Start the listing on its own line so the first column name stays aligned
print(f"Datatype of all columns:\n{all_dtypes}")

Datatype of all columns:
ID                          int64
Name                       object
Age                         int64
Gender                     object
Location                   object
School_Grade               object
Daily_Usage_Hours         float64
Sleep_Hours               float64
Academic_Performance        int64
Social_Interactions         int64
Exercise_Hours            float64
Anxiety_Level               int64
Depression_Level            int64
Self_Esteem                 int64
Parental_Control            int64
Screen_Time_Before_Bed    float64
Phone_Checks_Per_Day        int64
Apps_Used_Daily             int64
Time_on_Social_Media      float64
Time_on_Gaming            float64
Time_on_Education         float64
Phone_Usage_Purpose        object
Family_Communication        int64
Weekend_Usage_Hours       float64
Addiction_Level           float64
dtype: object


In [7]:
df.head()

| | ID | Name | Age | Gender | Location | School_Grade | Daily_Usage_Hours | Sleep_Hours | Academic_Performance | Social_Interactions | ... | Screen_Time_Before_Bed | Phone_Checks_Per_Day | Apps_Used_Daily | Time_on_Social_Media | Time_on_Gaming | Time_on_Education | Phone_Usage_Purpose | Family_Communication | Weekend_Usage_Hours | Addiction_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Shannon Francis | 13 | Female | Hansonfort | 9th | 4.0 | 6.1 | 78 | 5 | ... | 1.4 | 86 | 19 | 3.6 | 1.7 | 1.2 | Browsing | 4 | 8.7 | 10.0 |
| 1 | 2 | Scott Rodriguez | 17 | Female | Theodorefort | 7th | 5.5 | 6.5 | 70 | 5 | ... | 0.9 | 96 | 9 | 1.1 | 4.0 | 1.8 | Browsing | 2 | 5.3 | 10.0 |
| 2 | 3 | Adrian Knox | 13 | Other | Lindseystad | 11th | 5.8 | 5.5 | 93 | 8 | ... | 0.5 | 137 | 8 | 0.3 | 1.5 | 0.4 | Education | 6 | 5.7 | 9.2 |
| 3 | 4 | Brittany Hamilton | 18 | Female | West Anthony | 12th | 3.1 | 3.9 | 78 | 8 | ... | 1.4 | 128 | 7 | 3.1 | 1.6 | 0.8 | Social Media | 8 | 3.0 | 9.8 |
| 4 | 5 | Steven Smith | 14 | Other | Port Lindsaystad | 9th | 2.5 | 6.7 | 56 | 4 | ... | 1.0 | 96 | 20 | 2.6 | 0.9 | 1.1 | Gaming | 10 | 3.7 | 8.6 |


In [8]:
df.tail()

| | ID | Name | Age | Gender | Location | School_Grade | Daily_Usage_Hours | Sleep_Hours | Academic_Performance | Social_Interactions | ... | Screen_Time_Before_Bed | Phone_Checks_Per_Day | Apps_Used_Daily | Time_on_Social_Media | Time_on_Gaming | Time_on_Education | Phone_Usage_Purpose | Family_Communication | Weekend_Usage_Hours | Addiction_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2995 | 2996 | Jesus Yates | 16 | Female | New Jennifer | 12th | 3.9 | 6.4 | 53 | 4 | ... | 0.3 | 80 | 15 | 2.7 | 1.8 | 1.0 | Other | 8 | 9.4 | 9.8 |
| 2996 | 2997 | Bethany Murray | 13 | Female | Richardport | 8th | 3.6 | 7.3 | 93 | 5 | ... | 0.9 | 45 | 8 | 3.1 | 0.0 | 0.3 | Gaming | 9 | 5.2 | 5.5 |
| 2997 | 2998 | Norman Hughes | 14 | Other | Rebeccaton | 7th | 3.2 | 6.5 | 98 | 1 | ... | 0.2 | 51 | 13 | 2.4 | 0.2 | 2.4 | Social Media | 9 | 5.9 | 6.2 |
| 2998 | 2999 | Barbara Hinton | 17 | Female | Ramirezmouth | 9th | 6.7 | 7.5 | 67 | 3 | ... | 1.6 | 125 | 17 | 1.7 | 2.6 | 1.5 | Browsing | 4 | 6.1 | 10.0 |
| 2999 | 3000 | Curtis Johnson | 17 | Male | Lake Alexander | 10th | 3.5 | 6.9 | 79 | 4 | ... | 0.6 | 117 | 8 | 0.0 | 2.3 | 0.1 | Education | 7 | 5.1 | 6.3 |


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      3000 non-null   int64  
 1   Name                    3000 non-null   object 
 2   Age                     3000 non-null   int64  
 3   Gender                  3000 non-null   object 
 4   Location                3000 non-null   object 
 5   School_Grade            3000 non-null   object 
 6   Daily_Usage_Hours       3000 non-null   float64
 7   Sleep_Hours             3000 non-null   float64
 8   Academic_Performance    3000 non-null   int64  
 9   Social_Interactions     3000 non-null   int64  
 10  Exercise_Hours          3000 non-null   float64
 11  Anxiety_Level           3000 non-null   int64  
 12  Depression_Level        3000 non-null   int64  
 13  Self_Esteem             3000 non-null   int64  
 14  Parental_Control        3000 non-null   

In [10]:
df.describe()

| | ID | Age | Daily_Usage_Hours | Sleep_Hours | Academic_Performance | Social_Interactions | Exercise_Hours | Anxiety_Level | Depression_Level | Self_Esteem | Parental_Control | Screen_Time_Before_Bed | Phone_Checks_Per_Day | Apps_Used_Daily | Time_on_Social_Media | Time_on_Gaming | Time_on_Education | Family_Communication | Weekend_Usage_Hours | Addiction_Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 | 3000.0 |
| mean | 1500.5 | 15.969667 | 5.020667 | 6.489767 | 74.947333 | 5.097667 | 1.040667 | 5.59 | 5.460333 | 5.546333 | 0.507333 | 1.006733 | 83.093 | 12.609333 | 2.499233 | 1.525267 | 1.016333 | 5.459667 | 6.0151 | 8.8819 |
| std | 866.169729 | 1.989489 | 1.956501 | 1.490713 | 14.684156 | 3.139333 | 0.73462 | 2.890678 | 2.871557 | 2.860754 | 0.50003 | 0.492878 | 37.747044 | 4.611486 | 0.988201 | 0.932701 | 0.648341 | 2.864572 | 2.014776 | 1.609598 |
| min | 1.0 | 13.0 | 0.0 | 3.0 | 50.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 20.0 | 5.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 25% | 750.75 | 14.0 | 3.7 | 5.5 | 62.0 | 2.0 | 0.5 | 3.0 | 3.0 | 3.0 | 0.0 | 0.7 | 51.0 | 9.0 | 1.8 | 0.8 | 0.5 | 3.0 | 4.7 | 8.0 |
| 50% | 1500.5 | 16.0 | 5.0 | 6.5 | 75.0 | 5.0 | 1.0 | 6.0 | 5.0 | 6.0 | 1.0 | 1.0 | 82.0 | 13.0 | 2.5 | 1.5 | 1.0 | 5.0 | 6.0 | 10.0 |
| 75% | 2250.25 | 18.0 | 6.4 | 7.5 | 88.0 | 8.0 | 1.5 | 8.0 | 8.0 | 8.0 | 1.0 | 1.4 | 115.25 | 17.0 | 3.2 | 2.2 | 1.5 | 8.0 | 7.4 | 10.0 |
| max | 3000.0 | 19.0 | 11.5 | 10.0 | 100.0 | 10.0 | 4.0 | 10.0 | 10.0 | 10.0 | 1.0 | 2.6 | 150.0 | 20.0 | 5.0 | 4.0 | 3.0 | 10.0 | 14.0 | 10.0 |
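
The `min` and `max` rows of the summary are a quick way to spot impossible values. As a sketch of how that check could be automated, the snippet below counts values outside an expected range. The ranges in `EXPECTED_RANGES` are assumptions (ages 13 to 19 and 1 to 10 rating scales, which the summary above is consistent with), and `out_of_range_counts` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Assumed valid ranges; adjust to match the survey's actual scales
EXPECTED_RANGES = {
    "Age": (13, 19),
    "Anxiety_Level": (1, 10),
    "Addiction_Level": (1, 10),
}


def out_of_range_counts(frame, ranges):
    """Count values outside the inclusive [low, high] range for each column."""
    counts = {}
    for column, (low, high) in ranges.items():
        outside = ~frame[column].between(low, high)
        counts[column] = int(outside.sum())
    return counts


# Illustrative rows only: one bad age and one bad addiction score
demo = pd.DataFrame({
    "Age": [13, 17, 25],
    "Anxiety_Level": [1, 10, 5],
    "Addiction_Level": [0.5, 9.2, 10.0],
})
range_report = out_of_range_counts(demo, EXPECTED_RANGES)
print(range_report)
```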


### Data Cleaning & Preprocessing

In [11]:
df.isnull().sum()

ID                        0
Name                      0
Age                       0
Gender                    0
Location                  0
School_Grade              0
Daily_Usage_Hours         0
Sleep_Hours               0
Academic_Performance      0
Social_Interactions       0
Exercise_Hours            0
Anxiety_Level             0
Depression_Level          0
Self_Esteem               0
Parental_Control          0
Screen_Time_Before_Bed    0
Phone_Checks_Per_Day      0
Apps_Used_Daily           0
Time_on_Social_Media      0
Time_on_Gaming            0
Time_on_Education         0
Phone_Usage_Purpose       0
Family_Communication      0
Weekend_Usage_Hours       0
Addiction_Level           0
dtype: int64
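
This dataset has no missing values, so the full listing is all zeros. For datasets that do have gaps, a condensed report can be easier to read. The sketch below is one possible approach; `missing_summary` is a hypothetical helper and the `demo` frame is illustrative data, not taken from the survey.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values, only for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary[summary["missing"] > 0]


# Illustrative frame with one gap in each of two columns
demo = pd.DataFrame({
    "Sleep_Hours": [6.1, np.nan, 5.5, 3.9],
    "Gender": ["Female", "Female", None, "Other"],
    "Age": [13, 17, 13, 18],
})
report = missing_summary(demo)
print(report)
```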

In [16]:
# Remove unnecessary columns (ID and Name)

df_clean = df.drop(columns=["ID", "Name"])
print(df_clean)

      Age  Gender          Location School_Grade  Daily_Usage_Hours  \
0      13  Female        Hansonfort          9th                4.0   
1      17  Female      Theodorefort          7th                5.5   
2      13   Other       Lindseystad         11th                5.8   
3      18  Female      West Anthony         12th                3.1   
4      14   Other  Port Lindsaystad          9th                2.5   
...   ...     ...               ...          ...                ...   
2995   16  Female      New Jennifer         12th                3.9   
2996   13  Female       Richardport          8th                3.6   
2997   14   Other        Rebeccaton          7th                3.2   
2998   17  Female      Ramirezmouth          9th                6.7   
2999   17    Male    Lake Alexander         10th                3.5   

      Sleep_Hours  Academic_Performance  Social_Interactions  Exercise_Hours  \
0             6.1                    78                    5       

In [17]:
# Standardize categorical values

# Trim stray whitespace and normalize capitalization
df_clean["Gender"] = df_clean["Gender"].str.strip().str.title()
# Drop the "th" suffix; the column still holds strings afterwards
df_clean["School_Grade"] = df_clean["School_Grade"].str.replace("th", "", regex=False)
df_clean["Phone_Usage_Purpose"] = df_clean["Phone_Usage_Purpose"].str.strip().str.title()
print(df_clean)

      Age  Gender          Location School_Grade  Daily_Usage_Hours  \
0      13  Female        Hansonfort            9                4.0   
1      17  Female      Theodorefort            7                5.5   
2      13   Other       Lindseystad           11                5.8   
3      18  Female      West Anthony           12                3.1   
4      14   Other  Port Lindsaystad            9                2.5   
...   ...     ...               ...          ...                ...   
2995   16  Female      New Jennifer           12                3.9   
2996   13  Female       Richardport            8                3.6   
2997   14   Other        Rebeccaton            7                3.2   
2998   17  Female      Ramirezmouth            9                6.7   
2999   17    Male    Lake Alexander           10                3.5   

      Sleep_Hours  Academic_Performance  Social_Interactions  Exercise_Hours  \
0             6.1                    78                    5       
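
Removing the `th` suffix leaves `School_Grade` as text, so the grades cannot yet be sorted or compared numerically. If a numeric grade is wanted, one option (an assumption here, not a step the notebook takes) is to extract the digits and cast them to integers. The `grades` series below is a stand-in built from values shown in the output above.

```python
import pandas as pd

# Stand-in values in the same format as the raw School_Grade column
grades = pd.Series(["9th", "7th", "11th", "12th"], name="School_Grade")

# Pull out the leading digits, then cast the result to integers
grade_numbers = grades.str.extract(r"(\d+)", expand=False).astype(int)
print(grade_numbers.tolist())
```

Extracting digits also copes with other ordinal suffixes such as `st`, `nd`, or `rd`, which a plain `th` replacement would leave in place.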

In [18]:
# Check for duplicates

duplicates = df_clean.duplicated().sum()
print(f"Duplicate rows: {duplicates}")

Duplicate rows: 0
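
No duplicates were found here. Had the count been non-zero, the repeated rows could be inspected and then removed. A minimal sketch on illustrative data (the `demo` frame is not from the survey):

```python
import pandas as pd

# Illustrative frame in which the first and third rows are identical
demo = pd.DataFrame({
    "Age": [13, 17, 13],
    "Gender": ["Female", "Male", "Female"],
    "Daily_Usage_Hours": [4.0, 3.5, 4.0],
})

# keep=False flags every copy of a repeated row, which helps when inspecting them
repeated_rows = demo[demo.duplicated(keep=False)]

# drop_duplicates keeps the first copy of each repeated row by default
deduplicated = demo.drop_duplicates().reset_index(drop=True)
print(f"Rows before: {len(demo)}, after: {len(deduplicated)}")
```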


In [31]:
# Convert integer columns to float

int_cols = df_clean.select_dtypes(include=["int64"]).columns
df_clean[int_cols] = df_clean[int_cols].astype(float)
print(df_clean)

       Age  Gender          Location School_Grade  Daily_Usage_Hours  \
0     13.0  Female        Hansonfort            9                4.0   
1     17.0  Female      Theodorefort            7                5.5   
2     13.0   Other       Lindseystad           11                5.8   
3     18.0  Female      West Anthony           12                3.1   
4     14.0   Other  Port Lindsaystad            9                2.5   
...    ...     ...               ...          ...                ...   
2995  16.0  Female      New Jennifer           12                3.9   
2996  13.0  Female       Richardport            8                3.6   
2997  14.0   Other        Rebeccaton            7                3.2   
2998  17.0  Female      Ramirezmouth            9                6.7   
2999  17.0    Male    Lake Alexander           10                3.5   

      Sleep_Hours  Academic_Performance  Social_Interactions  Exercise_Hours  \
0             6.1                  78.0                

In [32]:
# Age is a whole number of years, so convert it back to an integer
df_clean["Age"] = df_clean["Age"].astype(int)

# Check datatype
print(df_clean["Age"].dtype)
print(df_clean)

int64
      Age  Gender          Location School_Grade  Daily_Usage_Hours  \
0      13  Female        Hansonfort            9                4.0   
1      17  Female      Theodorefort            7                5.5   
2      13   Other       Lindseystad           11                5.8   
3      18  Female      West Anthony           12                3.1   
4      14   Other  Port Lindsaystad            9                2.5   
...   ...     ...               ...          ...                ...   
2995   16  Female      New Jennifer           12                3.9   
2996   13  Female       Richardport            8                3.6   
2997   14   Other        Rebeccaton            7                3.2   
2998   17  Female      Ramirezmouth            9                6.7   
2999   17    Male    Lake Alexander           10                3.5   

      Sleep_Hours  Academic_Performance  Social_Interactions  Exercise_Hours  \
0             6.1                  78.0                  5.0 
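
The two cells above convert every integer column to float and then convert `Age` back. An alternative sketch, assuming the goal is simply to keep `Age` as an integer, excludes it from the conversion in the first place. The `demo` frame and the `cols_to_float` name are illustrative.

```python
import pandas as pd

# Illustrative frame with two integer columns and one float column
demo = pd.DataFrame({
    "Age": [13, 17, 13],
    "Anxiety_Level": [5, 8, 3],
    "Sleep_Hours": [6.1, 6.5, 5.5],
})

# Select the int64 columns, leaving Age out so it never becomes float
cols_to_float = demo.select_dtypes(include=["int64"]).columns.difference(["Age"])

# astype accepts a column-to-dtype mapping and returns a new frame
converted = demo.astype({column: float for column in cols_to_float})
print(converted.dtypes)
```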