# Visualization Project

Data visualization plays a crucial role in extracting meaningful insights from data. This project is part of coursework in data visualization and explores techniques for understanding data needs and designing effective visual representations.

## The Dataset

For this project, I have selected the "Sleep Health and Lifestyle Dataset" from Kaggle. Although the data is synthetic, I chose it because it includes a diverse set of variables and allows for the exploration of multiple relationships between them. The dataset contains key attributes such as occupation, sleep duration, and sleep quality, along with other well-being indicators like physical activity level, stress level, blood pressure, and sleep disorders.
My objective is to explore the following research questions:

- How does occupation influence sleep duration?
- Is sleep quality correlated with stress levels?
- Do sleep disorders impact both sleep duration and quality?
- How do sleep duration and quality change with age? Is there a gender-based difference?
- Does physical activity correlate with better sleep quality?
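The occupation and sleep-disorder questions can get a first numeric answer from a grouped summary before any chart is drawn. The sketch below is an illustration, not the project's code: it assumes the column names listed by `info()` in the exploratory analysis, and the helper name `summarize_by` is hypothetical. It reports each group's size next to its means, since a mean computed from a handful of rows is unstable.

```python
import pandas as pd


def summarize_by(df: pd.DataFrame, group_col: str,
                 value_cols=("Sleep Duration", "Quality of Sleep"),
                 dropna: bool = True) -> pd.DataFrame:
    """Mean of each value column per group plus the group size, sorted by the first value column."""
    # dropna=False keeps rows whose group key is missing as their own NaN group
    grouped = df.groupby(group_col, dropna=dropna)
    summary = grouped[list(value_cols)].mean()
    summary["n"] = grouped.size()  # number of rows behind each mean
    return summary.sort_values(value_cols[0], ascending=False)


# Hand-made rows for illustration only; these are not records from the dataset
toy = pd.DataFrame({
    "Occupation": ["Doctor", "Doctor", "Engineer", "Nurse"],
    "Sleep Duration": [6.0, 7.0, 8.0, 6.8],
    "Quality of Sleep": [6, 7, 8, 7],
})
print(summarize_by(toy, "Occupation"))
```

Because `groupby` drops missing keys by default, a call such as `summarize_by(sleep_dataset, "Sleep Disorder")` would silently exclude rows without a recorded disorder; passing `dropna=False` keeps them as a separate group.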

While it would be ideal to analyze real-world data, I could not find a sufficiently up-to-date dataset that included a wide range of relevant variables while remaining manageable in size. A comprehensive, up-to-date dataset on sleep health could significantly benefit sleep medicine research. In the meantime, this structured dataset serves as a useful methodological example of how an in-depth analysis could be conducted in a real-world study.

I also reviewed a Kaggle notebook available at https://www.kaggle.com/code/ratchakritbootkong/sleep-health-and-lifestyle-dataset-analyze. The analysis makes strong use of the grammar of graphics, with well-structured visualizations that feature clear axes and minimal visual clutter.

The notebook first explores sleep duration by gender and then uses a line plot to analyze sleep duration across age groups. An alternative approach would be to consolidate these insights into a single graph, using color to encode gender, which might offer a more comprehensive view.

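The consolidated view suggested above can be sketched with pandas. This is a minimal sketch, not the Kaggle notebook's code: it assumes the column names from the `info()` output (`Age`, `Gender`, `Sleep Duration`) and groups ages into hypothetical 10-year bins. Mean sleep duration is reshaped so that each gender becomes its own column, which is the shape `DataFrame.plot()` draws as one colored line per gender.

```python
import pandas as pd


def duration_by_age_and_gender(df: pd.DataFrame, bin_width: int = 10) -> pd.DataFrame:
    """Mean sleep duration per age group, with one column per gender.

    bin_width is an illustrative choice (assumption), not taken from the notebook.
    """
    # Label each row by the lower edge of its age bin, e.g. 35 -> 30 for 10-year bins
    age_group = (df["Age"] // bin_width) * bin_width
    return (
        df.assign(**{"Age Group": age_group})
        .groupby(["Age Group", "Gender"])["Sleep Duration"]
        .mean()
        .unstack("Gender")  # genders become columns, so each is plotted as its own line
    )


# Hand-made rows for illustration only; these are not records from the dataset
toy = pd.DataFrame({
    "Age": [28, 29, 35, 38, 45, 47],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Sleep Duration": [6.0, 7.0, 6.5, 7.5, 7.0, 8.0],
})
table = duration_by_age_and_gender(toy)
print(table)
```

On the real data, `duration_by_age_and_gender(sleep_dataset).plot(marker="o")` would draw the combined chart, assuming matplotlib is installed.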
Additionally, the notebook examines the relationship between physical activity and sleep quality with a bar plot of average values. While this summarizes the data effectively, a scatter plot could reveal clusters or patterns that averages hide.

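A scatter plot of these two columns needs one adjustment: both hold a small set of repeated integer values, so many rows can land on exactly the same point and hide each other. The sketch below is an illustration under assumptions rather than the notebook's method: it counts how many rows share each coordinate and adds small uniform jitter before plotting. The jitter widths are hypothetical, and matplotlib is imported only inside the plotting helper.

```python
import numpy as np
import pandas as pd

X_COL = "Physical Activity Level"
Y_COL = "Quality of Sleep"


def stacked_counts(df: pd.DataFrame) -> pd.Series:
    """Number of rows sharing each exact (activity, quality) coordinate, largest first."""
    return df.groupby([X_COL, Y_COL]).size().sort_values(ascending=False)


def jittered_points(df: pd.DataFrame, x_jitter: float = 1.0, y_jitter: float = 0.15, seed: int = 0):
    """Return x/y arrays with uniform noise in [-jitter, +jitter) added to each point."""
    rng = np.random.default_rng(seed)
    n = len(df)
    xs = df[X_COL].to_numpy(dtype=float) + rng.uniform(-x_jitter, x_jitter, n)
    ys = df[Y_COL].to_numpy(dtype=float) + rng.uniform(-y_jitter, y_jitter, n)
    return xs, ys


def plot_activity_vs_quality(df: pd.DataFrame):
    import matplotlib.pyplot as plt  # plotting dependency kept local to this helper

    xs, ys = jittered_points(df)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, alpha=0.4, s=15)  # transparency makes dense regions look darker
    ax.set_xlabel(X_COL)
    ax.set_ylabel(Y_COL)
    return fig


# Hand-made rows for illustration only; these are not records from the dataset
toy = pd.DataFrame({X_COL: [30, 30, 30, 60, 60, 90], Y_COL: [6, 6, 6, 7, 7, 9]})
print(stacked_counts(toy))
xs, ys = jittered_points(toy)
```

The jitter is kept well below the spacing between rating levels so that points from neighboring ratings do not overlap; the counts from `stacked_counts` show how much the un-jittered plot would hide.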
One of the standout visualizations in the notebook relates age to sleep disorders and is both clear and insightful. Another valuable visualization is a box plot of sleep duration across stress levels, which effectively captures the shape and spread of each distribution.

## Tasks and Sketch

In this section, I define two key tasks that will guide the development of this project: one addresses the goal (why) and the other the means (how).


- Why analyze the relationship between sleep quality and various factors?

    Examining correlations can reveal which factors are associated with sleep quality. Understanding these relationships can help improve overall sleep health and determine whether such improvements contribute to a healthier lifestyle.
    
- How to analyze correlations?

    First, I'll compute correlation coefficients to identify potential relationships between variables. If multicollinearity is detected, I'll address it by removing redundant variables or applying dimensionality reduction techniques.
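The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a finished pipeline: Spearman rank correlation is used because ratings such as `Quality of Sleep` and `Stress Level` are integer scores (a choice, not a requirement), and multicollinearity is checked with the variance inflation factor (VIF), one common diagnostic among several. The VIF of a column equals the matching diagonal entry of the inverse Pearson correlation matrix. The demo frame is random synthetic data, not the sleep dataset, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str, method: str = "spearman") -> pd.Series:
    """Correlation of each numeric column with the target, strongest (by magnitude) first."""
    corr = df.select_dtypes("number").corr(method=method)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


def variance_inflation_factors(df: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R_j^2), computed as diag(inverse Pearson correlation matrix).

    Raises numpy.linalg.LinAlgError if some columns are perfectly collinear.
    """
    numeric = df.select_dtypes("number")
    inv_corr = np.linalg.inv(numeric.corr().to_numpy())
    return pd.Series(np.diag(inv_corr), index=numeric.columns)


# Random synthetic demo: "c" is almost a copy of "a", "b" is independent noise
rng = np.random.default_rng(42)
n = 500
a = rng.normal(size=n)
demo = pd.DataFrame({
    "a": a,
    "b": rng.normal(size=n),
    "c": a + 0.1 * rng.normal(size=n),
    "y": a + rng.normal(size=n),
})
ranked = correlation_with_target(demo, target="y")
vif = variance_inflation_factors(demo.drop(columns="y"))
print(ranked)
print(vif)  # "a" and "c" get large VIFs; "b" stays close to 1
```

On the real data, `Person ID` should be dropped first because it is numeric but carries no meaning, e.g. `correlation_with_target(sleep_dataset.drop(columns="Person ID"), target="Quality of Sleep")`. A VIF above roughly 5 to 10 is a common rule-of-thumb signal that a variable is redundant.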

## Libraries

This section provides an overview of the libraries used in the project.

```python
import pandas as pd
```

## Import the DataFrame



```python
sleep_dataset = pd.read_csv('Sleep_health_and_lifestyle_dataset_Synthetic.csv')
```


## Exploratory Data Analysis



```python
sleep_dataset.sample(10)
```

| | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 64 | 65 | Male | 32 | Doctor | 6.2 | 6 | 30 | 8 | Normal | 125/80 | 72 | 5000 | |
| 53 | 54 | Male | 32 | Doctor | 7.6 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | |
| 60 | 61 | Male | 32 | Doctor | 6.0 | 6 | 30 | 8 | Normal | 125/80 | 72 | 5000 | |
| 108 | 109 | Male | 37 | Engineer | 7.8 | 8 | 70 | 4 | Normal Weight | 120/80 | 68 | 7000 | |
| 152 | 153 | Male | 39 | Lawyer | 7.2 | 8 | 60 | 5 | Normal | 130/85 | 68 | 8000 | |
| 224 | 225 | Female | 44 | Teacher | 6.6 | 7 | 45 | 4 | Overweight | 135/90 | 65 | 6000 | Insomnia |
| 101 | 102 | Female | 36 | Teacher | 7.2 | 8 | 60 | 4 | Normal | 115/75 | 68 | 7000 | |
| 156 | 157 | Male | 39 | Lawyer | 7.2 | 8 | 60 | 5 | Normal | 130/85 | 68 | 8000 | |
| 340 | 341 | Female | 55 | Nurse | 8.1 | 9 | 75 | 4 | Overweight | 140/95 | 72 | 5000 | Sleep Apnea |
| 300 | 301 | Female | 51 | Engineer | 8.5 | 9 | 30 | 3 | Normal | 125/80 | 65 | 5000 | |


```python
# Describe the dataset: column types and non-null counts, then summary statistics
sleep_dataset.info()

sleep_dataset.describe()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB
```


| | Person ID | Age | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | Heart Rate | Daily Steps |
|---|---|---|---|---|---|---|---|---|
| count | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 | 374.0 |
| mean | 187.5 | 42.184492 | 7.132086 | 7.312834 | 59.171123 | 5.385027 | 70.165775 | 6816.84492 |
| std | 108.108742 | 8.673133 | 0.795657 | 1.196956 | 20.830804 | 1.774526 | 4.135676 | 1617.915679 |
| min | 1.0 | 27.0 | 5.8 | 4.0 | 30.0 | 3.0 | 65.0 | 3000.0 |
| 25% | 94.25 | 35.25 | 6.4 | 6.0 | 45.0 | 4.0 | 68.0 | 5600.0 |
| 50% | 187.5 | 43.0 | 7.2 | 7.0 | 60.0 | 5.0 | 70.0 | 7000.0 |
| 75% | 280.75 | 50.0 | 7.8 | 8.0 | 75.0 | 7.0 | 72.0 | 8000.0 |
| max | 374.0 | 59.0 | 8.5 | 9.0 | 90.0 | 8.0 | 86.0 | 10000.0 |
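The outputs above point to a few preprocessing steps before any correlation analysis. `describe()` summarizes only numeric columns, so `Blood Pressure`, stored as text such as `125/80`, is left out; `Sleep Disorder` has only 155 non-null values out of 374; and the sample shows both `Normal` and `Normal Weight` in `BMI Category`. A minimal cleaning sketch follows. Treating a missing disorder as "None" and merging the two BMI labels are assumptions about the data, not documented facts, and the new column names are hypothetical.

```python
import numpy as np
import pandas as pd


def prepare_for_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is not modified."""
    out = df.copy()

    # Split "125/80" into numeric systolic and diastolic columns (hypothetical names)
    bp = out["Blood Pressure"].str.split("/", expand=True).astype(int)
    out["Systolic BP"] = bp[0]
    out["Diastolic BP"] = bp[1]

    # Assumption: a missing value means no diagnosed sleep disorder
    out["Sleep Disorder"] = out["Sleep Disorder"].fillna("None")

    # Assumption: "Normal Weight" and "Normal" describe the same BMI category
    out["BMI Category"] = out["BMI Category"].replace({"Normal Weight": "Normal"})
    return out


# Hand-made rows for illustration only; these are not records from the dataset
toy = pd.DataFrame({
    "Blood Pressure": ["125/80", "140/95"],
    "Sleep Disorder": [np.nan, "Insomnia"],
    "BMI Category": ["Normal Weight", "Overweight"],
})
clean = prepare_for_analysis(toy)
print(clean)
```

After this step, the systolic and diastolic values are numeric and will appear in `describe()` and in any correlation matrix built from the numeric columns.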


## Visualization evaluation

This section outlines our approach to evaluating the current project. Our key research question is: Is sleep quality directly correlated with well-known health variables such as BMI or blood pressure? A secondary question is whether any factors correlate directly with poor sleep quality.
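For the secondary question, one simple way to surface candidate factors is to split participants into poor sleepers and the rest, then compare the two groups. The sketch below assumes a hypothetical cut-off (a `Quality of Sleep` rating of 6 or below counts as poor sleep) and hypothetical helper names. It compares numeric variables by group mean and categorical ones, such as `BMI Category`, by the share of poor sleepers. These are descriptive comparisons, not tests of a direct relationship.

```python
import pandas as pd

QUALITY = "Quality of Sleep"


def compare_poor_sleepers(df: pd.DataFrame, threshold: int = 6) -> pd.DataFrame:
    """Mean of each numeric column for poor sleepers (True) vs. the rest (False)."""
    poor = (df[QUALITY] <= threshold).rename("Poor Sleep")  # threshold is a hypothetical cut-off
    numeric = df.select_dtypes("number").drop(columns=[QUALITY])
    return numeric.groupby(poor).mean().T


def poor_sleep_share(df: pd.DataFrame, category: str, threshold: int = 6) -> pd.Series:
    """Fraction of poor sleepers within each level of a categorical column."""
    poor = df[QUALITY] <= threshold
    return poor.groupby(df[category]).mean().sort_values(ascending=False)


# Hand-made rows for illustration only; these are not records from the dataset
toy = pd.DataFrame({
    QUALITY: [4, 6, 8, 9],
    "Stress Level": [8, 7, 4, 3],
    "BMI Category": ["Overweight", "Overweight", "Normal", "Normal"],
})
print(compare_poor_sleepers(toy))
print(poor_sleep_share(toy, "BMI Category"))
```

Because `Blood Pressure` is stored as text (`object` dtype), `select_dtypes("number")` excludes it until it has been split into numeric parts.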

To explore this, we plan to recruit doctors from various specialties and use a think-aloud protocol to assess their insights. We will measure the quantity and depth of insights and the time needed to reach them, which will help determine whether our visualization effectively communicates the key findings. Specifically, we aim to see whether relationships (or the lack thereof) between sleep quality and health variables are clearly conveyed. Additionally, this evaluation will highlight potential areas for design improvement.

If our visualization is successful, we expect participants to generate in-depth insights in a short period. More importantly, we hope to inspire doctors to pursue further research in sleep medicine, ultimately benefiting their patients.
