# Visualization Project

Data visualization plays a crucial role in extracting meaningful insights from data. This project is part of coursework in data visualization and explores techniques for understanding data needs and designing effective visual representations.

## The Dataset

For this project, I selected the "Sleep Health and Lifestyle Dataset" from Kaggle. Although the data is synthetic, I chose it because it includes a diverse set of variables and allows many relationships between them to be explored. Its 374 records cover key attributes such as occupation, sleep duration, and sleep quality, along with other well-being indicators such as physical activity level, stress level, blood pressure, and sleep disorders.

My objective is to explore the following research questions:

- How does occupation influence sleep duration?
- Is sleep quality correlated with stress levels?
- Do sleep disorders impact both sleep duration and quality?
- How do sleep duration and quality change with age? Is there a gender-based difference?
- Does physical activity correlate with better sleep quality?
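Several of these questions reduce to simple pandas summaries, which makes a useful first pass before any plotting. The sketch below is one possible mapping, not the final analysis. It assumes the column names shown in the data preview further down, uses Spearman rank correlation because stress level and sleep quality are ordinal scores, and treats a missing `Sleep Disorder` value as "no diagnosed disorder". The function names are hypothetical.

```python
import pandas as pd


def mean_sleep_by_occupation(df: pd.DataFrame) -> pd.Series:
    """Q1: mean sleep duration (hours) per occupation, longest first."""
    return (
        df.groupby("Occupation")["Sleep Duration"]
        .mean()
        .sort_values(ascending=False)
    )


def spearman(df: pd.DataFrame, x: str, y: str) -> float:
    """Q2 and Q5: Spearman rank correlation between two columns.

    Computed as the Pearson correlation of average ranks, which handles
    ties and avoids a SciPy dependency.
    """
    return df[x].rank().corr(df[y].rank())


def sleep_by_disorder(df: pd.DataFrame) -> pd.DataFrame:
    """Q3: mean sleep duration and quality for each disorder group.

    Assumption: a missing Sleep Disorder value means "no diagnosed disorder".
    """
    disorder = df["Sleep Disorder"].fillna("None")
    return df.groupby(disorder)[["Sleep Duration", "Quality of Sleep"]].mean()
```

For example, `spearman(sleep_dataset, "Stress Level", "Quality of Sleep")` addresses the second question and `spearman(sleep_dataset, "Physical Activity Level", "Quality of Sleep")` the fifth. Taking the Pearson correlation of average ranks is how Spearman's coefficient is usually defined when ties are present, which matters here because every score column has only a handful of distinct values.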

While it would be ideal to analyze real-world data, I could not find a sufficiently up-to-date dataset that covered a wide range of relevant variables while remaining manageable in size. A comprehensive, current dataset on sleep health could significantly benefit sleep medicine research. In the meantime, this well-structured dataset serves as a methodological template for the kind of in-depth analysis a real-world study would require.

I also reviewed an existing Kaggle notebook on this dataset: https://www.kaggle.com/code/ratchakritbootkong/sleep-health-and-lifestyle-dataset-analyze. It makes strong use of the grammar of graphics, with well-structured visualizations that feature clear axes and minimal visual clutter.
The notebook first compares sleep duration by gender and then uses a line plot to show sleep duration across age groups. An alternative would be to combine both views in a single line chart, with color encoding gender, which would show the age trend and the gender difference in one figure.
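A minimal sketch of that combined view, assuming matplotlib is installed (pandas' `DataFrame.plot` uses it as the plotting backend). It averages sleep duration per single year of age rather than per age group, since the notebook's group boundaries are not given here; `pd.cut` could bin ages first if coarser groups are preferred. The function names are illustrative.

```python
import pandas as pd


def sleep_by_age_and_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Mean sleep duration per age (rows) and gender (columns)."""
    return df.pivot_table(
        index="Age", columns="Gender", values="Sleep Duration", aggfunc="mean"
    )


def plot_sleep_by_age_and_gender(df: pd.DataFrame):
    """One line per gender on shared axes, so color encodes gender.

    DataFrame.plot draws each column as a line and needs matplotlib.
    Ages with no records for a gender stay NaN, leaving a visible gap.
    """
    table = sleep_by_age_and_gender(df)
    ax = table.plot(
        marker="o",
        xlabel="Age (years)",
        ylabel="Mean sleep duration (hours)",
    )
    ax.legend(title="Gender")
    return ax
```

Leaving the gaps in place, rather than interpolating across them, keeps the chart honest about where one gender has no data at a given age.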
The notebook also examines the relationship between physical activity and sleep quality with a bar plot of average values. While this summarizes the data effectively, a scatter plot could reveal clusters or patterns that averages hide.
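Because the data preview in the next section shows many repeated records, a plain scatter plot would stack identical points and hide how many people share each one. One way around this, sketched below under the assumption that matplotlib is available, is to aggregate identical (activity, quality) pairs and scale marker area by their count; jittering the points is a common alternative. The scale factor of 20 is arbitrary and the function names are hypothetical.

```python
import pandas as pd


def activity_quality_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Number of records for each (activity level, sleep quality) pair."""
    return (
        df.groupby(["Physical Activity Level", "Quality of Sleep"])
        .size()
        .reset_index(name="count")
    )


def plot_activity_vs_quality(df: pd.DataFrame):
    """Scatter plot with marker area proportional to the number of records
    sharing each point; requires matplotlib."""
    counts = activity_quality_counts(df)
    return counts.plot.scatter(
        x="Physical Activity Level",
        y="Quality of Sleep",
        s=counts["count"] * 20,  # arbitrary scale factor for marker area
        alpha=0.6,
    )
```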
One of the standout visualizations relates age to the presence of sleep disorders, and it is both clear and insightful. Another valuable one is a box plot of the relationship between stress level and sleep duration, which captures the distribution and variability at each stress level.
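For reference, a box plot of this kind can be drawn with pandas' `DataFrame.boxplot`. The sketch below assumes stress level on the x-axis and sleep duration on the y-axis (the notebook's exact orientation is not stated here) and also returns the quartiles per group, so the boxes can be checked numerically. matplotlib is imported inside the plotting function so that the summary function works with pandas alone; both function names are hypothetical.

```python
import pandas as pd


def sleep_summary_by_stress(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles and max of sleep duration per stress level."""
    return df.groupby("Stress Level")["Sleep Duration"].describe()


def plot_sleep_by_stress(df: pd.DataFrame):
    """One box per stress level on an explicitly created Axes."""
    import matplotlib.pyplot as plt  # only needed when plotting

    fig, ax = plt.subplots()
    df.boxplot(column="Sleep Duration", by="Stress Level", ax=ax, grid=False)
    ax.set_xlabel("Stress level")
    ax.set_ylabel("Sleep duration (hours)")
    ax.set_title("")
    fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
    return ax
```

Note that the whiskers in the plot extend to 1.5 times the interquartile range by default, so they will not always match the `min` and `max` columns of the summary.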

## Libraries

This section provides an overview of the libraries used in the project.

```python
import pandas as pd
```

## Load the Dataset

```python
# The original link was a time-limited signed Kaggle download URL (valid for
# three days from 2025-03-05) and no longer works. Download
# Sleep_health_and_lifestyle_dataset.csv from the Kaggle dataset page and
# place it next to this notebook.
sleep_dataset = pd.read_csv('Sleep_health_and_lifestyle_dataset.csv')
sleep_dataset  # display the first and last rows
```


| | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
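The preview already shows records that are identical except for `Person ID` (for example IDs 2 and 3, or IDs 4 and 5). Such repeats are plausible in synthetic data, and they matter for visualization because identical points overlap exactly in scatter and line plots. A small sketch for quantifying them follows; the helper names are hypothetical.

```python
import pandas as pd


def count_repeated_profiles(df: pd.DataFrame, id_col: str = "Person ID") -> int:
    """Count rows that exactly repeat an earlier row once the ID column is ignored.

    DataFrame.duplicated marks every occurrence after the first as True,
    so the sum is the number of redundant rows, not the number of groups.
    """
    return int(df.drop(columns=id_col).duplicated().sum())


def repeated_profile_mask(df: pd.DataFrame, id_col: str = "Person ID") -> pd.Series:
    """Boolean mask of every row that belongs to a repeated profile (all copies)."""
    return df.drop(columns=id_col).duplicated(keep=False)
```

Running `count_repeated_profiles(sleep_dataset)` gives a quick sense of how much the dataset's 374 rows overstate the number of distinct profiles.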


## Exploratory Data Analysis

```python
sleep_dataset.sample(10)
```

| | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 146 | 147 | Male | 39 | Lawyer | 7.2 | 8 | 60 | 5 | Normal | 130/85 | 68 | 8000 | Insomnia |
| 357 | 358 | Female | 58 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 55 | 56 | Male | 32 | Doctor | 6.0 | 6 | 30 | 8 | Normal | 125/80 | 72 | 5000 | |
| 154 | 155 | Male | 39 | Lawyer | 7.2 | 8 | 60 | 5 | Normal | 130/85 | 68 | 8000 | |
| 369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 271 | 272 | Female | 49 | Nurse | 6.1 | 6 | 90 | 8 | Overweight | 140/95 | 75 | 10000 | Sleep Apnea |
| 39 | 40 | Male | 31 | Doctor | 7.6 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | |
| 285 | 286 | Female | 50 | Nurse | 6.0 | 6 | 90 | 8 | Overweight | 140/95 | 75 | 10000 | Sleep Apnea |
| 65 | 66 | Male | 32 | Doctor | 6.2 | 6 | 30 | 8 | Normal | 125/80 | 72 | 5000 | |
| 243 | 244 | Female | 44 | Teacher | 6.5 | 7 | 45 | 4 | Overweight | 135/90 | 65 | 6000 | Insomnia |
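The sample mixes numeric and text columns, and `describe()` only summarizes the numeric ones by default. A quick way to check the levels of the text columns, including how many values are missing, is `value_counts(dropna=False)` on each non-numeric column; inconsistent spellings of the same category would show up here before they split one bar into two. The helper below is a hypothetical sketch.

```python
import pandas as pd


def categorical_levels(df: pd.DataFrame) -> dict:
    """Value counts (including missing values) for every non-numeric column."""
    text_cols = df.select_dtypes(exclude="number").columns
    return {col: df[col].value_counts(dropna=False) for col in text_cols}
```

Usage: `for name, counts in categorical_levels(sleep_dataset).items(): print(name, counts, sep="\n")`.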


```python
# Column types and non-null counts
sleep_dataset.info()

# Summary statistics for the numeric columns
sleep_dataset.describe()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB
```


| | Person ID | Age | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | Heart Rate | Daily Steps |
|---|---|---|---|---|---|---|---|---|
| count | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 |
| mean | 187.5 | 42.184492 | 7.132086 | 7.312834 | 59.171123 | 5.385027 | 70.165775 | 6816.84492 |
| std | 108.108742 | 8.673133 | 0.795657 | 1.196956 | 20.830804 | 1.774526 | 4.135676 | 1617.915679 |
| min | 1.0 | 27.0 | 5.8 | 4.0 | 30.0 | 3.0 | 65.0 | 3000.0 |
| 25% | 94.25 | 35.25 | 6.4 | 6.0 | 45.0 | 4.0 | 68.0 | 5600.0 |
| 50% | 187.5 | 43.0 | 7.2 | 7.0 | 60.0 | 5.0 | 70.0 | 7000.0 |
| 75% | 280.75 | 50.0 | 7.8 | 8.0 | 75.0 | 7.0 | 72.0 | 8000.0 |
| max | 374.0 | 59.0 | 8.5 | 9.0 | 90.0 | 8.0 | 86.0 | 10000.0 |
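Two details in the `info()` output deserve attention before plotting. `Sleep Disorder` has only 155 non-null values out of 374. Since pandas 2.0, `read_csv` treats the literal string "None" as missing by default, so if the raw file encodes "no disorder" as "None" (an assumption worth checking against the file), those 219 rows appear as NaN; reading with `pd.read_csv(..., keep_default_na=False)` would keep the string instead. `Blood Pressure` is stored as text such as `126/83`. The sketch below applies that reading of the missing values and splits blood pressure into two numeric columns; `tidy_sleep_data`, `Systolic BP`, and `Diastolic BP` are hypothetical names.

```python
import pandas as pd


def tidy_sleep_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an explicit "None" disorder level and numeric blood pressure."""
    out = df.copy()
    # Assumption: a missing Sleep Disorder value means "no diagnosed disorder".
    out["Sleep Disorder"] = out["Sleep Disorder"].fillna("None")
    # "126/83" -> systolic 126, diastolic 83; malformed values raise an error.
    bp = out["Blood Pressure"].str.split("/", expand=True).astype(int)
    out["Systolic BP"] = bp[0]
    out["Diastolic BP"] = bp[1]
    return out
```

Working on a copy leaves `sleep_dataset` untouched, so the raw and tidied versions can be compared side by side.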
