
# Capstone 2. Sleep Health and Lifestyle


Goal: visualize data for two genders in each occupation and show how different factors affect sleep quality and sleep disorders.

Data: Survey data of 374 people on sleep health and lifestyle. Useful for understanding sleep health.

https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset

Previously, I demonstrated relationships between average sleep duration, average stress level, average physical activity, and quality of sleep across various occupations.

## Importing and Displaying Data

In [49]:
import pandas as pd
data = pd.read_csv('Sleep_health_and_lifestyle_dataset.csv')
data = data.fillna('NA')
data.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [50]:
data.groupby(['Occupation']).size().sort_values(ascending=False)

Unnamed: 0_level_0,0
Occupation,Unnamed: 1_level_1
Nurse,73
Doctor,71
Engineer,63
Lawyer,47
Teacher,40
Accountant,37
Salesperson,32
Scientist,4
Software Engineer,4
Sales Representative,2


## Assumption to exclude underrepresented groups
Previously, I demonstrated that some occupations are underrepresented in the provided dataset:

- scientists
- software engineers
- sales representatives
- managers

For further analysis, I'll create a dataset based on the original one, excluding the occupations listed above.
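
The exclusion can also be driven by a count threshold instead of a hard-coded list of names. The following is a minimal sketch under stated assumptions: the helper name `drop_rare_groups` and the `min_count` value are hypothetical, and the toy rows are for illustration only, not taken from the survey.

```python
import pandas as pd


def drop_rare_groups(df, column, min_count):
    """Keep only rows whose value in `column` occurs at least `min_count` times."""
    counts = df[column].value_counts()
    frequent = counts[counts >= min_count].index
    # .copy() returns an independent frame, so later column assignments
    # do not trigger pandas' SettingWithCopyWarning.
    return df[df[column].isin(frequent)].copy()


# Toy rows for illustration only (hypothetical, not survey data).
toy = pd.DataFrame(
    {"Occupation": ["Nurse"] * 5 + ["Doctor"] * 4 + ["Scientist"] * 1}
)
filtered = drop_rare_groups(toy, "Occupation", min_count=3)
print(filtered["Occupation"].value_counts())
```

With the occupation counts shown earlier, any threshold from 5 to 32 would keep the same seven occupations, since the smallest retained group has 32 rows and the largest excluded one has 4.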

In [51]:
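# .copy() makes the filtered frame independent, so later column assignments
# (e.g. recoding 'BMI Category') do not raise SettingWithCopyWarning
data = data[~data['Occupation'].isin(['Scientist', 'Software Engineer', 'Sales Representative', 'Manager'])].copy()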

print(data['Occupation'].value_counts())

Occupation
Nurse          73
Doctor         71
Engineer       63
Lawyer         47
Teacher        40
Accountant     37
Salesperson    32
Name: count, dtype: int64


In [52]:
data = data.drop(columns=['Person ID'])
data.head()

Unnamed: 0,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
6,Male,29,Teacher,6.3,6,40,7,Obese,140/90,82,3500,Insomnia
7,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,


## Visualization of Quality of Sleep by Occupation for Male and Female participants

In [53]:
data['BMI Category'].unique()

array(['Normal', 'Obese', 'Normal Weight', 'Overweight'], dtype=object)

In [54]:
data['Sleep Disorder'].unique()

array(['NA', 'Insomnia', 'Sleep Apnea'], dtype=object)

In [58]:
import ipywidgets as widgets
from ipywidgets import interact
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, ColorBar, LinearColorMapper, HoverTool, Label, BasicTicker, Range1d, FactorRange
from bokeh.io import push_notebook
from bokeh.layouts import row
from bokeh.colors import RGB

output_notebook()

pd.set_option('future.no_silent_downcasting', True)

# Transform BMI Category to numeric values:
# 0 for 'Normal' and 'Normal Weight', 1 for 'Overweight', 2 for 'Obese'
data["BMI Category"] = data["BMI Category"].replace({
    "Normal": 0,
    "Normal Weight": 0,
    "Overweight": 1,
    "Obese": 2
})

# Aggregation
def process_data(metric):
    grouped = data.groupby(["Occupation", metric, "Gender"]).size().unstack(fill_value=0)
    grouped["Total"] = grouped.sum(axis=1)
    grouped["Male_Ratio"] = grouped["Male"] / grouped["Total"]
    grouped["Female_Ratio"] = grouped["Female"] / grouped["Total"]
    grouped["Size"] = grouped["Total"].clip(lower=7)
    grouped = grouped.reset_index()
    grouped.rename(columns={metric: "Metric"}, inplace=True)

    def compute_rgb(male_ratio, female_ratio):
        r = int(255 * female_ratio)
        g = 0
        b = int(255 * male_ratio)
        return RGB(r, g, b)

    grouped["Color"] = grouped.apply(lambda row: compute_rgb(row["Male_Ratio"], row["Female_Ratio"]).to_hex(), axis=1)
    return grouped

# Process Sleep Disorder data for visualization
def process_sleep_disorder():
    disorder_mapping = {'NA': 'No Disorder', 'Insomnia': 'Insomnia', 'Sleep Apnea': 'Sleep Apnea'}
    data["Sleep Disorder Category"] = data["Sleep Disorder"].map(disorder_mapping)

    grouped = data.groupby(["Occupation", "Sleep Disorder Category", "Gender"]).size().unstack(fill_value=0)
    grouped["Total"] = grouped.sum(axis=1)
    grouped["Male_Ratio"] = grouped["Male"] / grouped["Total"]
    grouped["Female_Ratio"] = grouped["Female"] / grouped["Total"]
    grouped["Size"] = grouped["Total"].clip(lower=7)
    grouped = grouped.reset_index()

    def compute_rgb(male_ratio, female_ratio):
        r = int(255 * female_ratio)
        g = 0
        b = int(255 * male_ratio)
        return RGB(r, g, b)

    grouped["Color"] = grouped.apply(lambda row: compute_rgb(row["Male_Ratio"], row["Female_Ratio"]).to_hex(), axis=1)

    return ColumnDataSource(grouped)

# Create figures
source = ColumnDataSource(process_data("Quality of Sleep"))
p1 = figure(x_range=data["Occupation"].unique(), x_axis_label="Occupation", y_axis_label="Quality of Sleep", width=500, height=500)

p2 = figure(x_range=data["Occupation"].unique(), y_range=['No Disorder', 'Insomnia', 'Sleep Apnea'], title = "Sleep Disorder by Occupation", x_axis_label="Occupation", y_axis_label="Sleep Disorder", width=600, height=500)

# Define color mapper for color bar
palette = [RGB(int(255 * (1 - i / 100)), 0, int(255 * (i / 100))).to_hex() for i in range(101)]
mapper = LinearColorMapper(palette=palette, low=0, high=100)
color_bar = ColorBar(color_mapper=mapper, location=(0, 0), ticker=BasicTicker(), width=30, height=400, label_standoff=10)
p2.add_layout(color_bar, 'right')

# Add labels
p2.add_layout(Label(x=325, y=400 , x_units='screen', y_units='screen', text='100% Male', text_color='blue', text_font_size='9pt'))
p2.add_layout(Label(x=315, y=5, x_units='screen', y_units='screen', text='100% Female', text_color='red', text_font_size='9pt'))

# Add scatter plot
p1.scatter(x="Occupation", y="Metric", size="Size", color="Color", source=source, alpha=1)

# Create data source for Sleep Disorder plot
source_p2 = process_sleep_disorder()
# Add scatter plot to p2
p2.scatter(x="Occupation", y="Sleep Disorder Category", size="Size", color="Color", source=source_p2, alpha=1)

# Define y-axis ranges
y_ranges = {
    "Quality of Sleep": Range1d(0, 11),
    "Stress Level": Range1d(0, 11),
    "Sleep Duration": Range1d(4, 10),
    "Physical Activity Level": Range1d(0, 100),
    "Daily Steps": Range1d(0, 12000),
    "Heart Rate": Range1d(40, 100),
    "BMI Category": Range1d(-1, 3)  # Range for BMI Category (0 for 'Normal' and 'Normal Weight', 1 for 'Overweight', 2 for 'Obese')
}

def update_plot(metric):

    # Process data based on the selected metric
    new_data = process_data(metric)  # Process for the selected metric
    source.data = new_data.to_dict(orient='list')  # Update data source

    # Set the y_range based on predefined ranges for continuous metrics
    p1.y_range = y_ranges.get(metric, Range1d(0, 11))  # Continuous range based on metric
    p1.yaxis.axis_label = metric  # Update y-axis label for continuous axis

    # Clear previous scatter plot and add new one
    p1.renderers = []  # Clear existing renderers
    p1.scatter(x="Occupation", y="Metric", size="Size", color="Color", source=source, alpha=1)  # Add new scatter plot

    # Update the plot title dynamically based on the metric
    p1.title.text = f"{metric} by Occupation"

    # Tooltips for the hover tool
    tooltips = [
        ('Percentage of Female', '@Female_Ratio{0.0%}'),
        ('Percentage of Male', '@Male_Ratio{0.0%}'),
        ('Total', '@Total'),
        (f'{metric}', f'@{{Metric}}')
    ]

    hover = HoverTool(tooltips=tooltips)
    p1.tools = [hover]

    push_notebook(handle=handle)  # Ensure the plot updates in the notebook

# # Interactive widget (not available on GitHub)
# # Interactive widget for selecting a health metric.
# # Updates the Bokeh scatter plot by occupation and gender ratio
# # based on the selected metric (e.g., Sleep Duration, BMI, Heart Rate).

# interact(update_plot, metric=widgets.Dropdown(options=["Quality of Sleep", "Stress Level", "Sleep Duration", "Physical Activity Level", "Daily Steps", "Heart Rate", "BMI Category"], description="Choose parameter", style={'description_width': 'initial'}))

# Show initial plots
handle = show(row(p1, p2), notebook_handle=True)
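
The marker color in both plots encodes the gender mix of each group: red for all-female, blue for all-male, and shades of purple in between. Below is a minimal plain-Python sketch of that mapping. It assumes only two gender columns, so the female share is one minus the male share. `ratio_to_hex` is a hypothetical helper name, not part of the notebook or of Bokeh, and the letter case of its hex string may differ from what Bokeh's `RGB.to_hex()` returns.

```python
def ratio_to_hex(male_ratio):
    """Map a male share in [0, 1] to a red-to-blue hex color string."""
    female_ratio = 1.0 - male_ratio  # assumption: only Male and Female groups
    r = int(255 * female_ratio)      # red channel grows with the female share
    b = int(255 * male_ratio)        # blue channel grows with the male share
    return f"#{r:02x}00{b:02x}"      # green channel is fixed at zero


all_male = ratio_to_hex(1.0)
all_female = ratio_to_hex(0.0)
balanced = ratio_to_hex(0.5)
print(all_male, all_female, balanced)
```

Because `int()` truncates, a perfectly balanced group maps to 127 on both channels rather than 128.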



## Conclusions

Female participants report better quality of sleep and lower stress levels in most occupation groups.

Physical activity level, daily steps, and heart rate are higher for male participants.
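
Gender comparisons of this kind can be checked numerically with a grouped mean. This is a minimal sketch, assuming the column names used in this notebook. The helper `mean_by_gender` and the toy rows are hypothetical and only illustrate the shape of the computation, not the survey's actual values.

```python
import pandas as pd


def mean_by_gender(df, metric):
    """Average `metric` per occupation, with one column per gender."""
    table = df.groupby(["Occupation", "Gender"])[metric].mean().unstack()
    # A positive difference means the female average is higher.
    table["Female_minus_Male"] = table["Female"] - table["Male"]
    return table


# Toy rows for illustration only (hypothetical, not survey data).
toy = pd.DataFrame({
    "Occupation": ["Doctor", "Doctor", "Doctor", "Nurse", "Nurse", "Nurse"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Quality of Sleep": [6, 8, 7, 5, 7, 8],
})
summary = mean_by_gender(toy, "Quality of Sleep")
print(summary)
```

Calling the same helper with `"Stress Level"`, `"Daily Steps"`, or `"Heart Rate"` on the filtered `data` frame would give the corresponding per-occupation differences.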

Among the parameters examined, BMI category shows the strongest association with sleep disorders.
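
One way to inspect the link between BMI category and sleep disorder is a row-normalized cross-tabulation. This is a minimal sketch under stated assumptions: the helper `disorder_share_by_bmi` is hypothetical, the toy rows are not survey data, and the string BMI labels are used for readability (in this notebook they are recoded to 0, 1 and 2 before plotting).

```python
import pandas as pd


def disorder_share_by_bmi(df):
    """Share of each sleep disorder within every BMI category (rows sum to 1)."""
    return pd.crosstab(df["BMI Category"], df["Sleep Disorder"], normalize="index")


# Toy rows for illustration only (hypothetical, not survey data).
toy = pd.DataFrame({
    "BMI Category": ["Normal", "Normal", "Normal", "Normal",
                     "Overweight", "Overweight", "Obese", "Obese"],
    "Sleep Disorder": ["NA", "NA", "NA", "Insomnia",
                       "Insomnia", "Sleep Apnea", "Sleep Apnea", "Sleep Apnea"],
})
shares = disorder_share_by_bmi(toy)
print(shares)
```

Each row of the result sums to 1, so rows can be compared directly even when the BMI groups differ in size.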