# DSA210 Term Project – Analysis of Work Hours, Happiness, and Mental Health

This notebook documents the full process of analyzing the relationship between working hours, happiness, and mental health. The analysis is conducted as part of the DSA210 Term Project.

The study follows this data science workflow:
1. **Loading datasets**
2. **Initial exploration and data cleaning**
3. **Exploratory Data Analysis (EDA)**
4. **Statistical hypothesis testing**
5. **Visualization of findings**
6. **Discussion of results and limitations**

### Data Sources:
The datasets are obtained from *Our World in Data* and include:
- `Annual Working Hours`: average annual working hours per person employed.
- `Mental Health Burden`: DALYs (Disability-Adjusted Life Years) related to mental disorders.
- `Happiness`: survey-based data indicating the share of people who describe themselves as happy.
- `Life Satisfaction`: survey data on self-reported life satisfaction levels.

The objective of this analysis is to examine whether excessive working hours are associated with lower levels of happiness and increased mental health challenges.
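
To make the objective concrete, the sketch below shows one way such an association could be measured: join two tables on country and year, then compute a Pearson correlation. The numbers are invented for illustration, and the column names `WorkHours` and `Happiness` are assumed rather than taken from the raw files.

```python
import pandas as pd

# Toy data, invented for illustration only (not taken from the datasets)
hours = pd.DataFrame({
    "Entity": ["A", "B", "C", "D"],
    "Year": [2020, 2020, 2020, 2020],
    "WorkHours": [1400.0, 1600.0, 1800.0, 2000.0],
})
happy = pd.DataFrame({
    "Entity": ["A", "B", "C", "E"],
    "Year": [2020, 2020, 2020, 2020],
    "Happiness": [90.0, 85.0, 70.0, 60.0],
})

# Inner join keeps only country-year pairs present in both tables
merged = hours.merge(happy, on=["Entity", "Year"], how="inner")

# Pearson correlation between the two measures
r = merged["WorkHours"].corr(merged["Happiness"])
print(len(merged), round(r, 3))
```

A negative `r` would be consistent with the hypothesis, but a correlation on its own does not establish that longer hours cause lower happiness.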


In [1]:
# Importing required libraries
import os
import pandas as pd

# Defining base path for the data files (machine-specific; adjust as needed)
base_path = r'C:\Users\7981\Desktop\DSA201\DSA210-Term-Project-Spring2025\data'

# Loading datasets; os.path.join avoids hard-coding the path separator
df_work = pd.read_csv(os.path.join(base_path, 'annual-working-hours-per-person-employed.csv'))
df_mental = pd.read_csv(os.path.join(base_path, 'mentaldisorders.csv'))
df_happy = pd.read_csv(os.path.join(base_path, 'share-of-people-who-say-they-are-happy.csv'))
df_satisfy = pd.read_csv(os.path.join(base_path, 'share-of-people-who-say-they-are-satisfied.csv'))


## 1. Preview of Each Dataset
Let's look at the first few rows of each dataset to understand their structure.


In [2]:
# Previewing datasets

print("Working Hours Data:")
display(df_work.head())

print("Mental Health Disorders Data:")
display(df_mental.head())

print("Happiness Data:")
display(df_happy.head())

print("Life Satisfaction Data:")
display(df_satisfy.head())


Working Hours Data:


| | Entity | Code | Year | Subject:Average hours worked per person employed - PDB_LV |
|---|---|---|---|---|
| 0 | Australia | AUS | 1970 | 1864.935951 |
| 1 | Australia | AUS | 1971 | 1847.040484 |
| 2 | Australia | AUS | 1972 | 1827.516925 |
| 3 | Australia | AUS | 1973 | 1814.705069 |
| 4 | Australia | AUS | 1974 | 1831.279122 |


Mental Health Disorders Data:


| | measure_id | measure_name | location_id | location_name | sex_id | sex_name | age_id | age_name | cause_id | cause_name | metric_id | metric_name | year | val | upper | lower |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Deaths | 55 | Slovenia | 3 | Both | 22 | All ages | 558 | Mental disorders | 1 | Number | 2021 | 5.638814e-05 | 6.509461e-05 | 2.776283e-05 |
| 1 | 1 | Deaths | 55 | Slovenia | 3 | Both | 22 | All ages | 558 | Mental disorders | 2 | Percent | 2021 | 2.439168e-09 | 2.801477e-09 | 1.208993e-09 |
| 2 | 1 | Deaths | 55 | Slovenia | 3 | Both | 22 | All ages | 558 | Mental disorders | 3 | Rate | 2021 | 2.724524e-06 | 3.145197e-06 | 1.341425e-06 |
| 3 | 1 | Deaths | 55 | Slovenia | 3 | Both | 22 | All ages | 572 | Eating disorders | 1 | Number | 2021 | 5.638814e-05 | 6.509461e-05 | 2.776283e-05 |
| 4 | 1 | Deaths | 55 | Slovenia | 3 | Both | 22 | All ages | 572 | Eating disorders | 2 | Percent | 2021 | 2.439168e-09 | 2.801477e-09 | 1.208993e-09 |


Happiness Data:


| | Entity | Code | Year | Happiness: Happy (aggregate) | 821407-annotations |
|---|---|---|---|---|---|
| 0 | Albania | ALB | 1998 | 33.43343 | |
| 1 | Albania | ALB | 2004 | 58.8 | |
| 2 | Albania | ALB | 2010 | 66.85212 | |
| 3 | Albania | ALB | 2022 | 73.9271 | |
| 4 | Algeria | DZA | 2004 | 80.73323 | |


Life Satisfaction Data:


| | Entity | Code | Year | Share of people who are happy (Eurobarometer 2017) |
|---|---|---|---|---|
| 0 | Albania | ALB | 2014 | 58.685448 |
| 1 | Albania | ALB | 2015 | 62.037964 |
| 2 | Albania | ALB | 2016 | 59.073544 |
| 3 | Austria | AUT | 1996 | 93.234474 |
| 4 | Austria | AUT | 1997 | 83.727531 |
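
The previews show that the mental health table is in long format: each location and year appears once per combination of `measure_name`, `cause_name`, and `metric_name`. Before it can be compared with the other datasets, a single slice has to be selected. The sketch below illustrates this on invented rows. The helper `select_series` is hypothetical, and the DALY label is an assumption, since the preview only shows `Deaths`; the actual labels should be checked with `df_mental['measure_name'].unique()`.

```python
import pandas as pd

# Toy rows mimicking the long layout seen in the preview; values are invented
toy = pd.DataFrame({
    "measure_name": ["Deaths", "DALYs (Disability-Adjusted Life Years)",
                     "DALYs (Disability-Adjusted Life Years)"],
    "location_name": ["Slovenia", "Slovenia", "Slovenia"],
    "cause_name": ["Mental disorders", "Mental disorders", "Eating disorders"],
    "metric_name": ["Rate", "Rate", "Rate"],
    "year": [2021, 2021, 2021],
    "val": [0.1, 2.0, 0.5],
})

def select_series(df, measure, cause, metric):
    """Return one value per location and year for the chosen slice."""
    mask = (
        (df["measure_name"] == measure)
        & (df["cause_name"] == cause)
        & (df["metric_name"] == metric)
    )
    return df.loc[mask, ["location_name", "year", "val"]].reset_index(drop=True)

# The DALY label below is an assumption, not confirmed by the preview
dalys = select_series(toy, "DALYs (Disability-Adjusted Life Years)",
                      "Mental disorders", "Rate")
print(dalys)
```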


## 2. Dataset Overview: Dimensions, Column Names, and Missing Values

This section provides a structural overview of the datasets:
- Number of rows and columns in each dataset
- Column names to understand available variables
- Presence of missing values that may require further cleaning
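
As a compact alternative to printing each property separately, the three checks listed above could be gathered into one summary table. This is a minimal sketch on an invented frame; the helper name `summarize` is hypothetical.

```python
import pandas as pd

def summarize(name, df):
    """Collect shape and total missing-value count for one DataFrame."""
    return {
        "dataset": name,
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_cells": int(df.isnull().sum().sum()),
    }

# Toy frame standing in for the real datasets (values invented)
toy = pd.DataFrame({
    "Entity": ["A", "B"],
    "Code": ["AAA", None],
    "Year": [2000, 2001],
})

# One row per dataset; more (name, frame) pairs can be appended to the list
summary = pd.DataFrame([summarize("toy", toy)])
print(summary)
```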


In [3]:
# Checking dataset shapes
print("Dataset Shapes:")
print("Working Hours:", df_work.shape)
print("Mental Disorders:", df_mental.shape)
print("Happiness:", df_happy.shape)
print("Life Satisfaction:", df_satisfy.shape)

# Checking column names
print("\nColumn Names:")
print("Working Hours:", df_work.columns.tolist())
print("Mental Disorders:", df_mental.columns.tolist())
print("Happiness:", df_happy.columns.tolist())
print("Life Satisfaction:", df_satisfy.columns.tolist())

# Checking missing values
print("\nMissing Value Counts:")
print("Working Hours:\n", df_work.isnull().sum())
print("\nMental Disorders:\n", df_mental.isnull().sum())
print("\nHappiness:\n", df_happy.isnull().sum())
print("\nLife Satisfaction:\n", df_satisfy.isnull().sum())


Dataset Shapes:
Working Hours: (1568, 4)
Mental Disorders: (31356, 16)
Happiness: (423, 5)
Life Satisfaction: (809, 4)

Column Names:
Working Hours: ['Entity', 'Code', 'Year', 'Subject:Average hours worked per person employed - PDB_LV']
Mental Disorders: ['measure_id', 'measure_name', 'location_id', 'location_name', 'sex_id', 'sex_name', 'age_id', 'age_name', 'cause_id', 'cause_name', 'metric_id', 'metric_name', 'year', 'val', 'upper', 'lower']
Happiness: ['Entity', 'Code', 'Year', 'Happiness: Happy (aggregate)', '821407-annotations']
Life Satisfaction: ['Entity', 'Code', 'Year', 'Share of people who are happy (Eurobarometer 2017)']

Missing Value Counts:
Working Hours:
 Entity                                                         0
Code                                                         155
Year                                                           0
Subject:Average hours worked per person employed - PDB_LV      0
dtype: int64

Mental Disorders:
 measure_id       0
measure_

### 3.1 Happiness Dataset Cleaning

This section focuses on cleaning the happiness dataset.

- The column `821407-annotations` contains no usable information and will be removed.
- The main value column `Happiness: Happy (aggregate)` will be renamed to `Happiness` for simplicity.
- The column names will be standardized to allow easier merging and analysis.
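
One possible way to standardize column names is sketched below: trim whitespace, lowercase, and replace runs of non-alphanumeric characters with underscores. The helper `standardize_columns` and this naming convention are assumptions for illustration; the notebook itself only renames the value column.

```python
import pandas as pd

def standardize_columns(df):
    """Return a copy with trimmed, lowercase, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    return out

# Empty toy frame that reuses the column names shown in the preview
toy = pd.DataFrame(columns=["Entity", "Code", "Year", "Happiness: Happy (aggregate)"])
cols = standardize_columns(toy).columns.tolist()
print(cols)
```

Applying the same function to every dataset would give the merge keys identical names across tables.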


In [None]:
import os

# Copying original to avoid modifying raw data
df_happy_cleaned = df_happy.copy()

# Dropping empty annotation column
df_happy_cleaned.drop(columns=['821407-annotations'], inplace=True)

# Renaming column for simplicity
df_happy_cleaned.rename(columns={
    'Happiness: Happy (aggregate)': 'Happiness'
}, inplace=True)

# Make sure the output folder exists; to_csv does not create it
os.makedirs('cleaned_data', exist_ok=True)

# Save the cleaned happiness dataset
df_happy_cleaned.to_csv('cleaned_data/happiness_cleaned.csv', index=False)


### 3.2 Life Satisfaction Dataset Cleaning

This section handles the cleaning of the life satisfaction dataset.

- The dataset contains the percentage of people who say they are satisfied with their life.
- The column `Share of people who are happy (Eurobarometer 2017)` will be renamed to `LifeSatisfaction`.
- The column `Code` contains missing values but is not essential for our analysis, so it will be kept as is.
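
Before leaving `Code` as is, it can help to see which entities lack a code. The sketch below uses invented rows. It assumes that a missing code marks an aggregate such as a region rather than a single country, which should be verified against the actual data.

```python
import pandas as pd

# Toy rows (invented); "World" stands in for an aggregate without a country code
toy = pd.DataFrame({
    "Entity": ["Austria", "World", "Albania"],
    "Code": ["AUT", None, "ALB"],
    "Year": [1996, 1996, 2014],
})

# Entities whose Code is missing
no_code = toy.loc[toy["Code"].isna(), "Entity"].unique().tolist()

# Optional: restrict the analysis to rows that carry a country code
countries_only = toy.dropna(subset=["Code"])
print(no_code, len(countries_only))
```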


In [None]:
import os

# Copy original to avoid modifying raw data
df_satisfy_cleaned = df_satisfy.copy()

# Rename column for simplicity
df_satisfy_cleaned.rename(columns={
    'Share of people who are happy (Eurobarometer 2017)': 'LifeSatisfaction'
}, inplace=True)

# Make sure the output folder exists; to_csv does not create it
os.makedirs('cleaned_data', exist_ok=True)

# Save the cleaned version (optional but recommended)
df_satisfy_cleaned.to_csv('cleaned_data/life_satisfaction_cleaned.csv', index=False)
