
# Exploring Causes of Insomnia: A Study on Sleep and Lifestyle Factors

## Introduction:
Insomnia is a sleep disorder in which individuals have difficulty falling asleep, staying asleep, or both. Stress, anxiety, and a poor sleeping environment are all potential causes. People with insomnia typically report shorter sleep durations and lower sleep quality over a long period of time. In addition, because of the insomnia-induced lack of sleep, they are more likely to have higher cortisol (stress) levels, a higher BMI, an increased heart rate, and increased blood pressure. These are the factors we will analyze to predict whether or not a person has insomnia.

The purpose of this research is to examine the possible causes of insomnia and to find patterns that show which factors influence an insomnia diagnosis. The dataset used is the "Sleep Health and Lifestyle Dataset" from [kaggle.com](https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset). It includes the following variables: gender, age, occupation, sleep duration (hours), quality of sleep (scale from 1 (bad) to 10 (good)), physical activity level (minutes/day), stress level (scale from 1 to 10), BMI category, blood pressure (systolic/diastolic), heart rate (bpm), daily steps, and sleep disorder (none, insomnia, or sleep apnea). Our analysis focuses on insomnia: we exclude individuals with sleep apnea and look for patterns shared by individuals with insomnia, compared with individuals who have no sleep disorder, in order to identify the possible reasons that induce insomnia.

## Preliminary exploratory data analysis:

In [1]:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.model_selection import train_test_split


# Read the dataset from the project's GitHub repository
url = "https://raw.githubusercontent.com/Cynthia550/Group_project/main/data/Sleep_health_and_lifestyle_dataset.csv"
sleep_df = pd.read_csv(url)
sleep_df.columns = sleep_df.columns.str.replace(' ', '_').str.lower()

# Split blood_pressure into systolic and diastolic
sleep_df[['systolic_bp', 'diastolic_bp']] = sleep_df['blood_pressure'].str.split('/', expand=True).astype(int)

# Exclude sleep apnea rows (keeping insomnia and no-disorder cases), then drop unused columns
sleep_df = sleep_df[sleep_df['sleep_disorder'] != 'Sleep Apnea'].drop(columns=['person_id', 'gender', 'occupation', 'daily_steps', 'quality_of_sleep', 'blood_pressure'])

# Preprocess bmi_category with OrdinalEncoder
encoder = OrdinalEncoder(categories=[['Normal', 'Normal Weight', 'Overweight', 'Obese']])
sleep_df['bmi_category_encoded'] = encoder.fit_transform(sleep_df[['bmi_category']])

# Preprocess numeric features with StandardScaler
numeric_cols = ['age', 'sleep_duration', 'physical_activity_level', 'stress_level', 'heart_rate', 'systolic_bp', 'diastolic_bp']
scaler = StandardScaler()
sleep_df[numeric_cols] = scaler.fit_transform(sleep_df[numeric_cols])

# Split data to training data and testing data
train_df, test_df = train_test_split(sleep_df, test_size=0.2, random_state=42)
train_df

| | age | sleep_duration | physical_activity_level | stress_level | bmi_category | heart_rate | sleep_disorder | systolic_bp | diastolic_bp | bmi_category_encoded |
|---|---|---|---|---|---|---|---|---|---|---|
| 69 | -0.971814 | -1.295640 | -0.258612 | 0.433717 | Overweight | 1.915063 | NaN | 0.294648 | 0.541350 | 2.0 |
| 21 | -1.376585 | 0.732080 | 1.020618 | 0.433717 | Normal | 0.175343 | NaN | -0.962516 | -0.551690 | 0.0 |
| 232 | 0.512345 | -0.754915 | -0.514458 | -0.824912 | Overweight | -1.274423 | Insomnia | 1.394666 | 1.634390 | 2.0 |
| 236 | 0.512345 | -1.025277 | -0.514458 | 1.063031 | Overweight | 0.755250 | Insomnia | 0.608939 | 0.541350 | 2.0 |
| 198 | 0.377421 | -0.890096 | -0.514458 | 1.063031 | Overweight | 0.755250 | Insomnia | 0.608939 | 0.541350 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 203 | 0.377421 | -0.349371 | -0.412119 | 1.063031 | Normal Weight | -0.114610 | NaN | -1.433953 | -1.426122 | 1.0 |
| 77 | -0.971814 | -1.566002 | -1.281995 | 1.692346 | Normal | 0.755250 | NaN | -0.176789 | -0.551690 | 0.0 |
| 117 | -0.432120 | 0.056173 | 0.253080 | -0.824912 | Normal | -0.404563 | NaN | -1.748244 | -1.644730 | 0.0 |
| 318 | 1.726657 | 1.678349 | -1.281995 | -1.454227 | Normal | -1.274423 | NaN | -0.176789 | -0.551690 | 0.0 |
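
In the cell above, `StandardScaler` is fit on the full dataset before `train_test_split` is called, so the test rows contribute to the means and standard deviations used for scaling. If the test set is later used to evaluate a model, one option is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on a small synthetic frame; `demo_df`, its values, and the two-column subset are made up for illustration and are not the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two of the numeric columns (values are made up).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "age": rng.integers(27, 60, size=50).astype(float),
    "sleep_duration": rng.uniform(5.8, 8.5, size=50),
})
numeric_cols = ["age", "sleep_duration"]

# Split first, so the test rows never influence the scaling parameters.
train_demo, test_demo = train_test_split(demo_df, test_size=0.2, random_state=42)
train_demo = train_demo.copy()
test_demo = test_demo.copy()

# Fit on the training rows only, then apply the same transform to both sets.
scaler = StandardScaler()
train_demo[numeric_cols] = scaler.fit_transform(train_demo[numeric_cols])
test_demo[numeric_cols] = scaler.transform(test_demo[numeric_cols])
```

With this ordering the training columns have mean 0 and unit variance by construction, while the test columns will only be approximately standardized, which is the expected behaviour.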


In [2]:
# Create a summary table of our data
summary_table = train_df.describe()

missing_values = train_df.isna().sum()

summary_table.loc['missing'] = missing_values
summary_table

| | age | sleep_duration | physical_activity_level | stress_level | heart_rate | systolic_bp | diastolic_bp | bmi_category_encoded |
|---|---|---|---|---|---|---|---|---|
| count | 236.0 | 236.0 | 236.0 | 236.0 | 236.0 | 236.0 | 236.0 | 236.0 |
| mean | 0.016101 | -0.005689 | -0.010138 | 0.00173 | 0.024224 | 0.020309 | 0.015209 | 0.665254 |
| std | 1.021101 | 1.021494 | 1.015992 | 1.019541 | 1.05564 | 1.009857 | 1.014691 | 0.937527 |
| min | -1.781355 | -1.701184 | -1.281995 | -1.454227 | -1.274423 | -1.748244 | -1.64473 | 0.0 |
| 25% | -0.769429 | -0.890096 | -0.770303 | -0.824912 | -0.404563 | -0.962516 | -0.55169 | 0.0 |
| 50% | -0.162273 | 0.056173 | -0.002766 | -0.195598 | 0.175343 | -0.176789 | -0.55169 | 0.0 |
| 75% | 0.512345 | 0.73208 | 1.020618 | 1.063031 | 0.75525 | 0.608939 | 0.54135 | 2.0 |
| max | 2.536198 | 1.81353 | 1.788155 | 1.692346 | 4.524643 | 2.494685 | 2.72743 | 3.0 |
| missing | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
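
The summary above pools everyone in the training set and covers only the numeric columns, so it does not show how the two groups differ or how many rows each group has. A per-group summary may be closer to the research question. Note also that rows without a sleep disorder carry a missing value in `sleep_disorder` in the training table; this is possibly because recent pandas versions parse the string "None" as missing when reading a CSV, though that is an assumption about the environment used. The sketch below, on a tiny made-up frame (`demo_df` and its values are hypothetical), gives the missing label an explicit name and then summarizes by group.

```python
import pandas as pd

# Tiny made-up sample in the shape of the preprocessed training data.
demo_df = pd.DataFrame({
    "sleep_disorder": ["Insomnia", None, None, "Insomnia", None],
    "stress_level": [7, 4, 5, 8, 3],
    "sleep_duration": [6.0, 7.5, 7.8, 6.2, 8.0],
})

# Give the missing label an explicit name so it is kept by groupby and plotting.
demo_df["sleep_disorder"] = demo_df["sleep_disorder"].fillna("None")

# Count rows and average each feature within each group.
group_summary = demo_df.groupby("sleep_disorder").agg(
    n=("stress_level", "size"),
    mean_stress=("stress_level", "mean"),
    mean_sleep=("sleep_duration", "mean"),
)
print(group_summary)
```

The `n` column doubles as a class-balance check, which matters if the groups turn out to be very unequal in size.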


In [3]:
import altair as alt

# Visualize stress levels for individuals with and without insomnia
stress_level_chart = alt.Chart(train_df).mark_bar().encode(
    x=alt.X('stress_level:Q', bin=alt.Bin(maxbins=20), title='Stress Level'),
    y=alt.Y('count()', title='Count'),
    color=alt.Color('sleep_disorder:N', legend=alt.Legend(title="Sleep Disorder")),
    tooltip=[alt.Tooltip('stress_level:Q', title='Stress Level'), 'count()']
).properties(
    width=200,
    height=300,
).facet(
    column=alt.Column('sleep_disorder:N', title='Comparison of Stress Levels Between Individuals with Insomnia and Without')
)

# Label BMI for visualization
bmi_labels = ['Normal', 'Normal Weight', 'Overweight', 'Obese']
train_df['bmi_category'] = train_df['bmi_category_encoded'].apply(lambda x: bmi_labels[int(x)])

# Visualize BMI categories for individuals with and without insomnia
bmi_category_chart = alt.Chart(train_df).mark_bar().encode(
    x=alt.X('bmi_category:O', title='BMI Category', sort=bmi_labels),
    y=alt.Y('count()', title='Count'),
    color=alt.Color('sleep_disorder:N', legend=alt.Legend(title="Sleep Disorder")),
    tooltip=[alt.Tooltip('bmi_category:O', title='BMI Category'), 'count()']
).properties(
    width=200,
    height=300,
).facet(
    column=alt.Column('sleep_disorder:N', title='Comparison of BMI Categories Between Individuals with Insomnia and Without')
)

# Remove unused column
train_df = train_df.drop(columns=['bmi_category'])

# Display the charts
stress_level_chart.display()
bmi_category_chart.display()

## Methods:

From the columns available in the dataset, we have chosen age, sleep duration, physical activity level, stress level, heart rate, BMI category (ordinally encoded), and systolic and diastolic blood pressure as predictors, with sleep disorder as the outcome. We will look for similarities and differences across these columns to identify likely reasons why someone may be diagnosed with insomnia. The columns were narrowed down to those we judged most relevant to insomnia.

The results will most likely be presented as bar or scatter plots, with colour indicating whether or not an individual has insomnia. Separate plots for each variable could also be a viable option.
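
The proposal does not yet name a model for the prediction step mentioned in the introduction. As one possible approach, and purely as an assumption for illustration, the sketch below trains a k-nearest neighbours classifier inside a scikit-learn pipeline so that scaling is learned from the training rows only. The data are synthetic (`demo_df`, the two features used, and `n_neighbors=5` are all hypothetical choices), so the printed accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: shorter sleep and higher stress go with the insomnia label.
rng = np.random.default_rng(1)
n = 200
has_insomnia = rng.random(n) < 0.3
demo_df = pd.DataFrame({
    "sleep_duration": np.where(has_insomnia, rng.normal(6.3, 0.4, n), rng.normal(7.5, 0.4, n)),
    "stress_level": np.where(has_insomnia, rng.normal(7.0, 1.0, n), rng.normal(5.0, 1.0, n)),
    "has_insomnia": has_insomnia,
})

X = demo_df[["sleep_duration", "stress_level"]]
y = demo_df["has_insomnia"]

# Stratify so both splits keep roughly the same share of insomnia cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training rows, then the classifier.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

On the real data, the number of neighbours would normally be chosen by cross-validation on the training set rather than fixed in advance.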


## Expected outcomes and significance:

In our analysis, we expect to find that individuals with insomnia have higher stress levels, BMI, blood pressure, and heart rate, and lower sleep duration, quality of sleep, and physical activity.
We aim to investigate the underlying factors contributing to one type of sleep disturbance: insomnia. With approximately 172 million individuals worldwide exhibiting symptoms of insomnia, it is evident that sleep-related issues affect a significant portion of the population. By examining our data, we hope to uncover insights that can help reduce the prevalence of insomnia. Our findings could also help individuals better judge whether or not they have insomnia based on related factors. Some future questions this could lead to are:
  * How do sleep patterns change over time within individuals?
  * What are the long-term health outcomes associated with chronic sleep disturbances?
  * What is the relationship between mental health conditions (such as anxiety, depression, PTSD) and sleep disturbances?
  * Do specific mental health interventions also improve sleep quality?