# Introduction
![](https://gifdb.com/images/high/tv-character-umaru-doma-anime-snoring-sleeping-61ix6i8x04hqgej0.gif)

## What is Sleep?
Sleep is a natural, recurring state of rest during which the body and brain recover.

## Why is it important?
Sleep isn’t just a time when your brain and body shut down. Getting enough sleep helps you think more clearly and react more quickly. Not getting enough sleep can be dangerous: it affects not only your performance but also your health and mood.

## Purpose of Project
This project explores which lifestyle and health factors are associated with insomnia and sleep apnea, to give a clearer picture of what may contribute to these sleep disorders.

# Data Overview
* **Person ID**: An identifier for each individual.
* **Gender**: The gender of the person (Male/Female).
* **Age**: The age of the person in years.
* **Occupation**: The occupation or profession of the person.
* **Sleep Duration (hours)**: The number of hours the person sleeps per day.
* **Quality of Sleep (scale: 1-10)**: A subjective rating of the quality of sleep, ranging from 1 to 10.
* **Physical Activity Level (minutes/day)**: The number of minutes the person engages in physical activity daily.
* **Stress Level (scale: 1-10)**: A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
* **BMI Category**: The BMI category of the person (e.g., Underweight, Normal, Overweight).
* **Blood Pressure (systolic/diastolic)**: The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
* **Heart Rate (bpm)**: The resting heart rate of the person in beats per minute.
* **Daily Steps**: The number of steps the person takes per day.
* **Sleep Disorder**: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

**Details about Sleep Disorder Column:**
* **None**: The individual does not exhibit any specific sleep disorder.
* **Insomnia**: The individual experiences difficulty falling asleep or staying asleep, leading to inadequate or poor-quality sleep.
* **Sleep Apnea**: The individual suffers from pauses in breathing during sleep, resulting in disrupted sleep patterns and potential health risks.
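
The column descriptions above can be turned into a quick sanity check. The sketch below is only an illustration: the `toy` rows are made up, and `out_of_range` is a hypothetical helper, not part of the dataset or of pandas.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the data overview.
toy = pd.DataFrame({
    "Quality of Sleep": [6, 8, 11],   # 11 is deliberately outside the 1-10 scale
    "Stress Level": [8, 3, 5],
    "Sleep Disorder": ["Insomnia", "None", "Sleep Apnea"],
})


def out_of_range(frame, column, low, high):
    """Return the rows whose value in `column` falls outside [low, high]."""
    return frame[~frame[column].between(low, high)]


bad_quality = out_of_range(toy, "Quality of Sleep", 1, 10)
bad_stress = out_of_range(toy, "Stress Level", 1, 10)

# Labels that are not one of the three documented categories.
unknown = set(toy["Sleep Disorder"]) - {"None", "Insomnia", "Sleep Apnea"}

print(bad_quality)
print(unknown)
```

Running the same checks on `df` after loading would flag any rating outside the documented 1-10 scale.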

# Data Processing

In [None]:
import numpy as np 
import pandas as pd 
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('/kaggle/input/sleep-health-and-lifestyle-dataset/Sleep_health_and_lifestyle_dataset.csv')
df.sample(5)

In [None]:
df.columns

In [None]:
df.info()

In [None]:
df['Sleep Disorder'] = df['Sleep Disorder'].fillna('None')
df.sample(5)
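
Why the `fillna('None')` step may be needed: as far as I know, recent pandas versions treat the literal text `None` in a CSV as a missing value by default, so people without a disorder can show up as `NaN`. A minimal sketch with a made-up two-row CSV, showing two ways to end up with the string `'None'`:

```python
import io

import pandas as pd

# Made-up CSV text for illustration only.
csv_text = "Person ID,Sleep Disorder\n1,None\n2,Insomnia\n"

# Option 1: disable the built-in list of NA strings,
# so the text "None" is kept as an ordinary string.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# Option 2 (the approach used in this notebook): read with the defaults,
# then replace any missing value with the label "None".
filled = pd.read_csv(io.StringIO(csv_text))
filled["Sleep Disorder"] = filled["Sleep Disorder"].fillna("None")

print(raw["Sleep Disorder"].tolist())
print(filled["Sleep Disorder"].tolist())
```

Note that `keep_default_na=False` also keeps genuinely empty cells as empty strings, so `fillna` is the safer choice when other columns may have real gaps.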

In [None]:
df.describe()

# Exploratory Data Analysis

In [None]:
# Style
color = sns.color_palette("tab10", 3)
grad = sns.color_palette("rocket", as_cmap=True)

**Sleep Disorder Distribution**

---

In [None]:
sleep_count = df['Sleep Disorder'].value_counts()
plt.figure(figsize=(8, 8))
plt.pie(sleep_count, autopct='%1.1f%%', startangle=135)
plt.title("Sleep Disorder Distribution", loc='left')
plt.legend(labels=sleep_count.index, title="Sleep Disorder Type", loc="upper right")
plt.show()

* Most people in the dataset have no sleep disorder (**None**).
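
The percentages shown in the pie chart can also be read as a table. A small sketch, using a made-up label series rather than the real column:

```python
import pandas as pd

# Made-up labels for illustration; the real notebook would use df['Sleep Disorder'].
labels = pd.Series(
    ["None"] * 6 + ["Insomnia"] * 2 + ["Sleep Apnea"] * 2,
    name="Sleep Disorder",
)

# normalize=True returns fractions instead of raw counts.
share = labels.value_counts(normalize=True).mul(100).round(1)
print(share)
```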

**Sleep Disorder by Gender**

---

In [None]:
sleep_gender = df.groupby('Sleep Disorder')['Gender'].value_counts()
# name the counts column explicitly so it is called 'count' on every pandas version
sleep_gender = sleep_gender.reset_index(name='count')

# set the style before plotting so that it applies to this figure as well
sns.set_style('ticks')
plt.figure(figsize=(8, 6))
sns.barplot(x = 'Gender',
            y = 'count',
            hue = 'Sleep Disorder',
            data = sleep_gender,
            palette = color)
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Sleep Disorder by Gender', loc='left')
plt.legend(title='Sleep Disorder Type')

plt.show()

* More males than females have no sleep disorder (**None**).
* **Insomnia** occurs about equally in males and females.
* Females have more **Sleep Apnea** than males.
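
Raw counts can mislead when the two groups differ in size, so it may help to compare shares within each gender. A sketch with made-up rows, assuming the same column names as the dataset:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Gender": ["Male"] * 5 + ["Female"] * 5,
    "Sleep Disorder": ["None", "None", "None", "None", "Insomnia",
                       "None", "None", "Insomnia", "Sleep Apnea", "Sleep Apnea"],
})

# normalize='index' turns each row (gender) into fractions that sum to 1.
rates = pd.crosstab(toy["Gender"], toy["Sleep Disorder"], normalize="index")
print(rates.round(2))
```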

**Sleep Disorder by Occupation**

---

In [None]:
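sleep_occupation = df.groupby('Sleep Disorder')['Occupation'].value_counts()
# name the counts column explicitly so it is called 'count' on every pandas version
sleep_occupation = sleep_occupation.reset_index(name='count')
# sort by occupation name so the bars appear in a stable, alphabetical order
sleep_occupation = sleep_occupation.sort_values(by='Occupation', ascending=True)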

plt.figure(figsize=(8, 6))
sns.barplot(x = 'count',
            y = 'Occupation',
            hue = 'Sleep Disorder',
            data = sleep_occupation,
            palette = color)

plt.xlabel('Count')
plt.ylabel('Occupation')
plt.title('Sleep Disorder by Occupation', loc='left')
plt.legend(title='Sleep Disorder Type')

plt.show()

* **Salesperson** has the most **Insomnia** cases, followed by **Teacher** and **Accountant**.
* **Nurse** has more **Sleep Apnea** cases than any other occupation.
* **Doctor, Engineer, and Lawyer** mostly have no sleep disorder.
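
Occupations with very few people can dominate a count-based chart by accident. One possible way to guard against that is to compute a disorder rate per occupation and drop small groups. The rows and the `MIN_ROWS` threshold below are made up for illustration:

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "Occupation": ["Nurse"] * 4 + ["Doctor"] * 4 + ["Manager"],
    "Sleep Disorder": ["Sleep Apnea", "Sleep Apnea", "Sleep Apnea", "None",
                       "None", "None", "None", "None",
                       "Insomnia"],
})

MIN_ROWS = 3  # hypothetical cut-off for "enough people to compare"

# Group size and share of people with any disorder, per occupation.
summary = toy.groupby("Occupation")["Sleep Disorder"].agg(
    n="size",
    disorder_rate=lambda s: (s != "None").mean(),
)

# Keep only occupations with at least MIN_ROWS people, highest rate first.
reliable = summary[summary["n"] >= MIN_ROWS].sort_values(
    "disorder_rate", ascending=False
)
print(reliable)
```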

**Age Distribution by Sleep Disorder**

---

In [None]:
sleep_list = ['None', 'Sleep Apnea', 'Insomnia']

fig, axes = plt.subplots(1, len(sleep_list), figsize=(16, 6), sharey=True)

for i, disorder in enumerate(sleep_list):
    ax = axes[i]
    # pick one palette color per disorder
    bar_color = color[i % len(color)]
    sns.histplot(data = df[df['Sleep Disorder'] == disorder],
                 x = 'Age',
                 kde = True,
                 color = bar_color,
                 ax = ax)
    ax.set_title(disorder)

plt.tight_layout()
plt.show()


* **None** is most common in the age range of **30 to 42**.
* **Sleep Apnea** is most prevalent in the age range of **50 to 57**.
* **Insomnia** occurs most often in the age range of **43 to 45**.
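
Age ranges read off a histogram are approximate. A quartile table per disorder is one way to back them up with numbers. The ages below are made up for illustration:

```python
import pandas as pd

# Made-up ages for illustration only.
toy = pd.DataFrame({
    "Age": [30, 35, 40, 43, 44, 45, 50, 55, 57],
    "Sleep Disorder": ["None"] * 3 + ["Insomnia"] * 3 + ["Sleep Apnea"] * 3,
})

# One row per disorder, one column per quantile (0.25, 0.5, 0.75).
age_q = (
    toy.groupby("Sleep Disorder")["Age"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(age_q)
```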

**Sleep Disorder by Sleep Quality**

---

In [None]:
sleep_quality = df.groupby('Sleep Disorder')['Quality of Sleep'].value_counts()
sleep_quality = sleep_quality.reset_index(name='count')

plt.figure(figsize=(8, 6))
sns.barplot(x = 'Quality of Sleep',
            y = 'count',
            hue = 'Sleep Disorder',
            data = sleep_quality,
            palette = color)
plt.xlabel('Quality of Sleep')
plt.ylabel('Count')
plt.title('Sleep Disorder by Quality of Sleep', loc='left')
plt.legend(title='Sleep Disorder Type', loc='upper left')

plt.show()

* At sleep quality **8**, most people have no sleep disorder (**None**).
* **Insomnia** is most common at sleep quality **6 to 7**, while **Sleep Apnea** is most common at **6 and 9**.

**Sleep Disorder by Physical Activity Level**

---

In [None]:
sleep_physical = df.groupby('Sleep Disorder')['Physical Activity Level'].value_counts()
sleep_physical = sleep_physical.reset_index(name='count')

plt.figure(figsize=(12, 7))
sns.lineplot(x = 'Physical Activity Level',
             y = 'count',
             hue = 'Sleep Disorder',
             data = sleep_physical,
             palette = color,
             style = 'Sleep Disorder')
plt.xlabel('Physical Activity Level')
plt.ylabel('Count')
plt.title('Sleep Disorder by Physical Activity Level', loc='left')
plt.legend(title='Sleep Disorder Type')

plt.show()

* **Insomnia:** Most common at **Physical Activity Level 45** (minutes/day).
* **None:** Peaks at **Physical Activity Level 60.**
* **Sleep Apnea:** Most common at **Physical Activity Levels 75 and 90.**
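
Counting each exact activity value gives a jagged line. Grouping the minutes into bands may make the pattern easier to read. The bin edges and band names below are hypothetical choices, and the rows are made up:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Physical Activity Level": [30, 45, 45, 60, 60, 75, 90],
    "Sleep Disorder": ["None", "Insomnia", "Insomnia", "None", "None",
                       "Sleep Apnea", "Sleep Apnea"],
})

# Hypothetical bands in minutes/day: (0, 30], (30, 60], (60, 90].
bins = [0, 30, 60, 90]
band_names = ["low", "medium", "high"]
toy["Activity Band"] = pd.cut(
    toy["Physical Activity Level"], bins=bins, labels=band_names
)

# Count of each disorder within each activity band.
band_counts = pd.crosstab(toy["Activity Band"], toy["Sleep Disorder"])
print(band_counts)
```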

**Sleep Disorder by Sleep Duration**

---

In [None]:
sleep_duration = df.groupby('Sleep Disorder')['Sleep Duration'].value_counts()
sleep_duration = sleep_duration.reset_index(name='count')

plt.figure(figsize=(12, 7))
sns.lineplot(x = 'Sleep Duration',
             y = 'count',
             hue = 'Sleep Disorder',
             data = sleep_duration,
             palette = color,
             style = 'Sleep Disorder')

plt.xlabel('Duration of Sleep')
plt.ylabel('Count')
plt.title('Sleep Disorder by Duration of Sleep', loc='left')
plt.legend(title='Sleep Disorder Type')

plt.show()

* **Insomnia:** Most common sleep duration is around **6.4 to 6.6 hours.**
* **None:** Most common sleep durations are around **7.2, 7.7, and 7.8 hours.**
* **Sleep Apnea:** Most common sleep duration is around **6.0 to 6.2 hours.**
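
To check whether sleep duration really differs between the three groups, a rank-based test such as Kruskal-Wallis from `scipy.stats` is one option. This is only a sketch on made-up durations, not a result from the dataset:

```python
import pandas as pd
from scipy import stats

# Made-up durations (hours) for illustration only.
toy = pd.DataFrame({
    "Sleep Duration": [7.2, 7.7, 7.8, 7.5,
                       6.4, 6.5, 6.6, 6.3,
                       6.0, 6.1, 6.2, 5.9],
    "Sleep Disorder": ["None"] * 4 + ["Insomnia"] * 4 + ["Sleep Apnea"] * 4,
})

# One array of durations per disorder.
groups = [
    group["Sleep Duration"].to_numpy()
    for _, group in toy.groupby("Sleep Disorder")
]

# Null hypothesis: all groups come from the same distribution.
h_stat, p_value = stats.kruskal(*groups)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that at least one group differs; it does not say which one or why.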

**Sleep Disorder by Stress Level**

---

In [None]:
sleep_stress = df.groupby('Sleep Disorder')['Stress Level'].value_counts()
sleep_stress = sleep_stress.reset_index(name='count')

plt.figure(figsize=(12, 7))
sns.lineplot(x = 'Stress Level',
             y = 'count',
             hue = 'Sleep Disorder',
             data = sleep_stress,
             palette = color,
             style = 'Sleep Disorder')

plt.xlabel('Stress Level')
plt.ylabel('Count')
plt.title('Sleep Disorder by Stress Level', loc='left')
plt.legend(title='Sleep Disorder Type')

plt.show()

* **Insomnia:** Most common stress levels are **4 and 7.**
* **None:** Most common stress levels are **4 to 6.**
* **Sleep Apnea:** Most common stress levels are **3 and 8.**

**Sleep Disorder by BMI Category**

---

In [None]:
sleep_bmi = df.groupby('Sleep Disorder')['BMI Category'].value_counts()
sleep_bmi = sleep_bmi.reset_index(name='count')

plt.figure(figsize=(8, 6))
sns.barplot(x = 'BMI Category',
            y = 'count',
            hue = 'Sleep Disorder',
            data = sleep_bmi,
            palette = color)

plt.xlabel('BMI Category')
plt.ylabel('Count')
plt.title('Sleep Disorder by BMI Category', loc='left')
plt.legend(title='Sleep Disorder Type', loc='upper left')

plt.show()

* **Normal BMI:** Mostly no sleep disorder.
* **Overweight:** More likely to have **Insomnia or Sleep Apnea.**
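
Since both BMI category and sleep disorder are categorical, a chi-square test of independence is one way to check the pattern above. `scipy.stats` is already imported in this notebook. The rows below are made up for illustration:

```python
import pandas as pd
from scipy import stats

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "BMI Category": ["Normal"] * 10 + ["Overweight"] * 10,
    "Sleep Disorder": (["None"] * 8 + ["Insomnia"] + ["Sleep Apnea"]
                       + ["None"] * 2 + ["Insomnia"] * 4 + ["Sleep Apnea"] * 4),
})

# Contingency table: rows are BMI categories, columns are disorders.
table = pd.crosstab(toy["BMI Category"], toy["Sleep Disorder"])

# Null hypothesis: BMI category and sleep disorder are independent.
chi2, p_value, dof, expected = stats.chi2_contingency(table.to_numpy())
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

With counts this small the chi-square approximation is rough, so treat the p-value as a hint rather than a conclusion.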

**Sleep Disorder by Blood Pressure**

---

In [None]:
df[['SBP', 'DBP']] = df['Blood Pressure'].str.split('/', expand=True)
df[['SBP', 'DBP']] = df[['SBP','DBP']].astype(int)

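g = sns.jointplot(data = df,
                  x = 'DBP',
                  y = 'SBP',
                  hue ='Sleep Disorder',
                  kind = 'kde',
                  palette = color,
                  height = 10,
                  )
# a figure-level title avoids overlapping the marginal plots
g.fig.suptitle('Sleep Disorder by Blood Pressure', x=0.05, ha='left')
g.fig.subplots_adjust(top=0.95)
plt.show()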

* **None** is most likely at blood pressure around **120/80 to 125/85.**
* **Insomnia** is most likely at blood pressure around **130/85 to 135/90.**
* **Sleep Apnea** is most likely at blood pressure around **140/95.**
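
The density plot can be backed up with a simple table of median pressures per disorder. The sketch below parses `systolic/diastolic` strings with a regular expression, as an alternative to `str.split`. The readings are made up for illustration:

```python
import pandas as pd

# Made-up readings for illustration only.
toy = pd.DataFrame({
    "Blood Pressure": ["120/80", "125/80", "130/85",
                       "135/90", "140/95", "140/95"],
    "Sleep Disorder": ["None", "None", "Insomnia",
                       "Insomnia", "Sleep Apnea", "Sleep Apnea"],
})

# Named groups become the column names SBP (systolic) and DBP (diastolic).
parts = toy["Blood Pressure"].str.extract(r"^(?P<SBP>\d+)/(?P<DBP>\d+)$")
toy = toy.join(parts.astype(int))

# Median systolic and diastolic pressure per disorder.
bp_median = toy.groupby("Sleep Disorder")[["SBP", "DBP"]].median()
print(bp_median)
```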

In [None]:
# converting non-numeric data (String or Boolean) into numbers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

col_corr = ['Age', 'Gender', 'Occupation', 'Sleep Duration', 'Quality of Sleep', 
            'Physical Activity Level', 'Stress Level', 'BMI Category', 'Heart Rate', 
            'Daily Steps', 'SBP', 'DBP', 'Sleep Disorder']

df_cor = df.copy()

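# encode only the non-numeric columns; numeric columns keep their real values
for x in df_cor[col_corr].select_dtypes(exclude='number').columns:
    df_cor[x] = le.fit_transform(df_cor[x])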

correlation_matrix = df_cor[col_corr].corr()
correlation_matrix

In [None]:
sns.heatmap(data = correlation_matrix,
            cmap = grad,
            annot = True, 
            fmt = ".2f",
            annot_kws = {'size': 6},
            center = 0)
plt.show()
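
One caveat about the heatmap: `LabelEncoder` assigns integer codes in alphabetical order, so correlations with nominal columns such as `Occupation` or `Sleep Disorder` depend on that arbitrary order. A possible alternative is one indicator column per category. The sketch below uses made-up rows and is not the method used above:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Stress Level": [8, 7, 3, 4, 6, 8],
    "Sleep Duration": [6.0, 6.2, 7.8, 7.5, 6.5, 6.1],
    "Sleep Disorder": ["Insomnia", "Sleep Apnea", "None",
                       "None", "Insomnia", "Sleep Apnea"],
})

# One 0/1 column per disorder instead of a single arbitrary integer code.
dummies = pd.get_dummies(toy["Sleep Disorder"], prefix="Disorder", dtype=int)

numeric = toy[["Stress Level", "Sleep Duration"]]
corr = pd.concat([numeric, dummies], axis=1).corr()
print(corr.round(2))
```

Each indicator row can then be read on its own, for example how stress level relates to having no disorder.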

In [None]:
df.to_csv('sleep_df.csv', index=False)