# Analysis of Screen Time Impact on Children

### Introduction
This notebook analyzes the impact of increased screen time on children's health. The analysis is divided into three sections:
1.  **Significance of Impact:** Calculating the mean impact across different domains (Physical, Psychological, etc.) and testing if the differences are statistically significant using ANOVA.
2.  **Correlation with Health Impact:** Determining the correlation between daily screen time and a total health impact score using Pearson's correlation.
3.  **Association with Demographics:** Examining the association between screen time and demographic variables like age, gender, and income using the Chi-square test.

The final report is generated and saved as `Statistical_Report.docx` and `Statistical_Report.pdf`.
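As a rough orientation, the three tests listed above can be sketched on synthetic data. Everything in this block is an assumption made for illustration: the random values, the sample size, and the coding of `Gender` and `Daily_Screen_Time` are placeholders, not the survey data or the notebook's actual results.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway, pearsonr, chi2_contingency

# Synthetic stand-in data; the values are random and carry no meaning.
rng = np.random.default_rng(0)
n = 100
demo = pd.DataFrame({
    "Gender": rng.integers(1, 3, size=n),             # hypothetical codes 1-2
    "Daily_Screen_Time": rng.integers(1, 5, size=n),  # hypothetical codes 1-4
})
domains = ["Physical", "Psychological", "Academic", "Social", "Habit"]
# One mean score per respondent and domain, from four items on a 1-5 scale.
scores = pd.DataFrame(
    {d: rng.integers(1, 6, size=(n, 4)).mean(axis=1) for d in domains}
)

# 1. One-way ANOVA across the five domain score columns.
f_stat, p_anova = f_oneway(*(scores[d] for d in domains))

# 2. Pearson correlation between screen time and a total impact score.
total_impact = scores.sum(axis=1)
r, p_corr = pearsonr(demo["Daily_Screen_Time"], total_impact)

# 3. Chi-square test of association between screen time and gender.
table = pd.crosstab(demo["Daily_Screen_Time"], demo["Gender"])
chi2, p_chi, dof, expected = chi2_contingency(table)

print(f"ANOVA: F={f_stat:.3f}, p={p_anova:.3f}")
print(f"Pearson: r={r:.3f}, p={p_corr:.3f}")
print(f"Chi-square: chi2={chi2:.3f}, dof={dof}, p={p_chi:.3f}")
```

Because the inputs are random, the printed statistics only show the shape of each result, not a finding.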

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import f_oneway, pearsonr, chi2_contingency
from docx import Document
from docx.shared import Pt
from fpdf import FPDF
import warnings

# Suppress DeprecationWarning messages (such as those FPDF can raise); this filter applies to all libraries
warnings.filterwarnings('ignore', category=DeprecationWarning)

print("Libraries imported successfully.")

Libraries imported successfully.


### Step 1: Data Loading, Merging, and Cleaning
First, we load the demographic and survey impact datasets. We then merge them into a single DataFrame using the common `CODE` column. To make the data easier to work with, we rename the columns to be more descriptive based on the provided questionnaire documents.
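A merge on `CODE` with pandas' default inner join silently drops any respondent that appears in only one sheet. The sketch below shows one way to surface such rows. The two small frames are hypothetical stand-ins for the real sheets, and this check is a suggestion rather than part of the original workflow.

```python
import pandas as pd

# Tiny hypothetical frames standing in for the two Excel sheets.
demographic_demo = pd.DataFrame({"CODE": [1, 2, 3], "Q1": [2, 1, 2]})
impact_demo = pd.DataFrame({"CODE": [1, 2, 4], "Physical_1": [2, 4, 3]})

# validate="one_to_one" raises MergeError if CODE is duplicated on either side;
# indicator=True adds a _merge column recording where each row came from.
merged = pd.merge(
    demographic_demo,
    impact_demo,
    on="CODE",
    how="outer",
    validate="one_to_one",
    indicator=True,
)

# Rows present in only one of the two frames.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["CODE", "_merge"]])
```

If `unmatched` is empty on the real data, the inner merge used in this notebook keeps every respondent.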

In [2]:
try:
    # Define the path to your Excel file
    excel_file_path = '../data/processed_data/dataset.xlsx'

    # Load the demographic sheet normally
    demographic_df = pd.read_excel(excel_file_path, sheet_name='demographic_data')

    # Load the survey impact sheet the same way; its first row is used as the header
    # and the resulting columns are renamed in a later cell
    impact_df = pd.read_excel(excel_file_path, sheet_name='impact_data')
    
    print("Excel sheets 'demographic_data' and 'impact_data' loaded successfully.")
    
except FileNotFoundError:
    print(f"ERROR: The file '{excel_file_path}' was not found. Please check that the path is correct relative to this notebook.")
except ValueError as e:
    print("ERROR: A sheet name was not found. Please ensure the sheets are named 'demographic_data' and 'impact_data'.")
    print(f"Details: {e}")

Excel sheets 'demographic_data' and 'impact_data' loaded successfully.


In [3]:
impact_df.columns

Index(['CODE', 'Q1', 'Q2', 'Q3', 'Q4', 'Q1.1', 'Q2.1', 'Q3.1', 'Q4.1', 'Q1.2',
       'Q2.2', 'Q3.2', 'Q4.2', 'Q1.3', 'Q2.3', 'Q3.3', 'Q4.3', 'Q1.4', 'Q2.4',
       'Q3.4', 'Q4.4'],
      dtype='object')

In [4]:
# Manually define the 21 correct column names to match your DataFrame
impact_column_names = [
    'CODE',
    'Physical_1', 'Physical_2', 'Physical_3', 'Physical_4',
    'Psychological_1', 'Psychological_2', 'Psychological_3', 'Psychological_4',
    'Academic_1', 'Academic_2', 'Academic_3', 'Academic_4',
    'Social_1', 'Social_2', 'Social_3', 'Social_4',
    'Habit_1', 'Habit_2', 'Habit_3', 'Habit_4'
]
# Assign the new names directly to the DataFrame's columns
impact_df.columns = impact_column_names

# Merge the two DataFrames into a single one for analysis
df = pd.merge(demographic_df, impact_df, on='CODE', suffixes=('_demo', ''))

print("Impact DF columns successfully renamed and DataFrames merged.")
# To verify the new column names
print(df.columns)

Impact DF columns successfully renamed and DataFrames merged.
Index(['CODE', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Physical_1',
       'Physical_2', 'Physical_3', 'Physical_4', 'Psychological_1',
       'Psychological_2', 'Psychological_3', 'Psychological_4', 'Academic_1',
       'Academic_2', 'Academic_3', 'Academic_4', 'Social_1', 'Social_2',
       'Social_3', 'Social_4', 'Habit_1', 'Habit_2', 'Habit_3', 'Habit_4'],
      dtype='object')
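The `Q1.1`, `Q2.1`, ... labels seen earlier arise because pandas makes repeated header names unique by appending `.1`, `.2`, and so on. Instead of typing all 21 names by hand, the same list can be generated. This is a minimal sketch that assumes, as the manual list does, that the sheet holds `CODE` followed by five domain blocks of four items each, in this order.

```python
# Domain order is assumed to match the column order of the impact sheet.
domains = ["Physical", "Psychological", "Academic", "Social", "Habit"]
items_per_domain = 4

# 'CODE' first, then Physical_1..Physical_4, Psychological_1.., and so on.
generated_names = ["CODE"] + [
    f"{domain}_{item}"
    for domain in domains
    for item in range(1, items_per_domain + 1)
]

print(len(generated_names))
print(generated_names[:5])
```

Comparing `len(generated_names)` with `len(impact_df.columns)` before assigning the names guards against a silent mismatch if the sheet layout ever changes.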


In [5]:
df.head()

Unnamed: 0,CODE,Q1,Q2,Q3,Q4,Q5,Q6,Q7,Physical_1,Physical_2,...,Academic_3,Academic_4,Social_1,Social_2,Social_3,Social_4,Habit_1,Habit_2,Habit_3,Habit_4
0,1,2,1,3,2,2,1,4,2,2,...,2,1,1,2,1.0,1,1,2,1,2
1,2,1,1,4,1,4,1,4,4,4,...,1,2,2,1,2.0,1,1,4,1,2
2,3,2,1,3,1,1,2,3,4,2,...,1,1,1,2,1.0,1,1,2,1,1
3,4,1,2,1,1,1,1,4,3,2,...,1,2,3,4,2.0,4,1,5,1,4
4,5,2,2,3,2,1,2,2,3,2,...,2,4,2,3,4.0,4,2,3,2,4
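In the preview above, `Social_3` is displayed as `1.0`, `2.0`, ... while the neighbouring columns are integers. In pandas this usually means the column contains at least one missing value, although that is an inference from the display and not something the preview confirms. A quick check could look like the sketch below, which uses a small hypothetical frame in place of `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the NaN in Social_3 is placed here for illustration.
preview = pd.DataFrame({
    "Social_2": [2, 1, 2, 4],
    "Social_3": [1.0, 2.0, np.nan, 2.0],  # one gap forces the float dtype
})

# Count missing values per column and list the float-typed columns.
missing_per_column = preview.isna().sum()
float_columns = preview.select_dtypes(include="float").columns.tolist()

print(missing_per_column[missing_per_column > 0])
print(float_columns)
```

Running the same two lines against `df` would show whether any responses need to be imputed or dropped before the statistical tests.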


In [6]:
demographic_df.columns

Index(['CODE', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7'], dtype='object')

In [7]:
demographic_rename_map = {
    'Q1': 'Age',
    'Q2': 'Gender',
    'Q3': 'Annual_Income',
    'Q4': 'Family_Type',
    'Q5': 'Devices_Owned_Raw',
    'Q6': 'Daily_Screen_Time',
    'Q7': 'Daily_Study_Hours'
}

# Apply the new column names to the DataFrame
df.rename(columns=demographic_rename_map, inplace=True)
print(df.columns)

Index(['CODE', 'Age', 'Gender', 'Annual_Income', 'Family_Type',
       'Devices_Owned_Raw', 'Daily_Screen_Time', 'Daily_Study_Hours',
       'Physical_1', 'Physical_2', 'Physical_3', 'Physical_4',
       'Psychological_1', 'Psychological_2', 'Psychological_3',
       'Psychological_4', 'Academic_1', 'Academic_2', 'Academic_3',
       'Academic_4', 'Social_1', 'Social_2', 'Social_3', 'Social_4', 'Habit_1',
       'Habit_2', 'Habit_3', 'Habit_4'],
      dtype='object')


In [8]:
df.sample(10)

Unnamed: 0,CODE,Age,Gender,Annual_Income,Family_Type,Devices_Owned_Raw,Daily_Screen_Time,Daily_Study_Hours,Physical_1,Physical_2,...,Academic_3,Academic_4,Social_1,Social_2,Social_3,Social_4,Habit_1,Habit_2,Habit_3,Habit_4
92,93,2,2,2,1,1,2,3,4,2,...,3,5,4,1,1.0,3,2,3,1,4
94,95,2,2,2,3,2,2,4,1,2,...,2,4,4,2,4.0,2,1,2,2,1
4,5,2,2,3,2,1,2,2,3,2,...,2,4,2,3,4.0,4,2,3,2,4
14,15,2,2,3,2,2,2,3,2,2,...,1,1,1,1,1.0,1,1,1,2,1
38,39,2,1,2,1,1,3,3,4,2,...,1,1,2,2,2.0,2,2,2,2,3
36,37,2,1,4,2,1,3,2,3,4,...,4,4,2,1,2.0,3,4,4,2,2
66,67,3,2,3,1,1,2,3,3,4,...,1,5,5,3,2.0,1,5,3,5,5
60,61,2,1,1,2,1,2,4,4,3,...,2,4,2,1,2.0,2,2,2,4,2
27,28,2,1,1,1,1,1,2,3,2,...,4,2,2,4,3.0,2,2,4,4,2
69,70,2,2,3,1,1,2,4,3,1,...,4,1,1,1,1.0,1,2,3,1,1


In [9]:
# Export the final DataFrame to a CSV file (comment out both statements below to skip the export)
df.to_csv('../data/processed_data/final_merged_dataset.csv', index=False)

print("Dataset exported successfully to 'final_merged_dataset.csv'")

Dataset exported successfully to 'final_merged_dataset.csv'
