# Mock Lifestyle Data Based on Overall Sleep Health

MOCK_SLEEP_DATA.csv was generated with Mockaroo.com and modeled on Sleep_health_and_lifestyle_dataset.csv, taken from the Kaggle dataset of the same name by Laksika Tharmalingam.

Data Sources:

- https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset
- MOCK_SLEEP_DATA was custom generated via Mockaroo.com

In [1]:
import pandas as pd

# Load the mock dataset (path is relative to the notebook's working directory)
study = pd.read_csv("data/MOCK_SLEEP_DATA.csv")

# (rows, columns)
study.shape

(1000, 21)

In [2]:
study.head(15)

Unnamed: 0,id,first_name,last_name,email,ip_address,Person ID,Gender,Age,Occupation,Sleep Duration,...,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder,Country of Residence,Min Sleep Oxygen Saturation,Average Sleep Oxygen Saturation
0,1,Desiree,McEllen,dmcellen0@usnews.com,183.48.236.130,4,Genderfluid,49,Chemical Engineer,6.3,...,73,8,Under Weight,145/95,79,4281,Insomnia,Philippines,83.2,91.2
1,2,Dona,Francescone,dfrancescone1@tuttocitta.it,231.126.33.94,3,Female,43,Cost Accountant,7.5,...,48,7,Under Weight,132/87,71,2085,Restless Legs Syndrome,China,88.8,80.4
2,3,Deanna,Reedshaw,dreedshaw2@army.mil,123.171.170.216,7,Bigender,57,Help Desk Technician,6.4,...,52,2,Under Weight,126/83,90,3365,Insomnia,Cuba,91.2,75.9
3,4,Chelsey,Maxsted,cmaxsted3@time.com,246.148.210.218,5,Genderfluid,26,Health Coach III,9.9,...,46,1,Under Weight,125/80,91,3359,,France,94.6,92.4
4,5,Shadow,Godsmark,sgodsmark4@yandex.ru,117.173.14.104,5,Male,60,Assistant Media Planner,10.2,...,52,3,Overweight,132/87,90,6768,,Azerbaijan,68.2,86.4
5,6,Raimundo,Andersch,randersch5@google.co.jp,105.149.231.102,2,Male,38,Geologist III,7.7,...,74,6,Under Weight,125/80,87,8321,Narcolepsy Type 1,Indonesia,79.1,76.2
6,7,Atlante,Whittall,awhittall6@php.net,121.251.149.95,9,Female,53,Professor,6.5,...,60,9,Normal,125/80,70,8867,,Dominican Republic,77.2,80.9
7,8,Justinn,Swatton,jswatton7@chicagotribune.com,145.174.167.219,7,Genderqueer,43,VP Product Management,5.4,...,64,1,Obese,145/95,92,3802,Sleep Apnea,China,85.1,78.4
8,9,Dell,Geoghegan,dgeoghegan8@acquirethisname.com,110.88.55.218,3,Male,34,Teacher,7.0,...,25,2,Obese,140/90,82,2759,Idiopathic Hypersomnia,China,94.6,85.4
9,10,Eben,Hayball,ehayball9@jigsy.com,184.56.211.84,1,Agender,39,Structural Analysis Engineer,9.0,...,47,8,Obese,150/102,87,7067,Narcolepsy Type 2,Iran,71.4,86.0


### Sleep Health Data Dictionary

| Column Name | Type | Description | Cleaning Notes |
| ----------- | ----- | ----------- | ------------- |
| id | INTEGER | Row ID number | Remove; non-quantifiable data |
| first_name | TEXT | First name of the patient | Remove; non-quantifiable data |
| last_name | TEXT | Last name of the patient | Remove; non-quantifiable data |
| email | TEXT | Email address of the patient | Remove; non-quantifiable data |
| ip_address | TEXT | IP address | Remove; non-quantifiable data |
| Person ID | INTEGER | | Nonsense parameter; remove |
| Gender | TEXT | Gender of the patient | |
| Age | INTEGER | Age of the patient | |
| Occupation | TEXT | Patient's occupation. The goal is to correlate this with the highest level of education once that field is added | Narrow down occupations to the top five based on education level |
| Sleep Duration | FLOAT | How many hours on average the given patient sleeps at night | |
| Quality of Sleep | INTEGER | Patients were asked to rate their average quality of sleep from 1 to 10, where 1 means feeling as if they had no sleep and 10 means exceedingly restorative sleep | |
| Physical Activity Level | INTEGER | Patients were asked to rate their level of physical activity from 1 to 100, where 1 means completely bedridden and 100 means active all day inside and outside of work | |
| Stress Level | INTEGER | Patients were asked to rate their level of stress from 1 to 10, where 1 means a completely relaxed lifestyle and 10 means on the verge of breakdown | |
| BMI Category | TEXT | Patients were asked to submit their BMI, and the name of the range corresponding to that number was recorded | |
| Blood Pressure | TEXT | Patients were asked to track their blood pressure and send an average of their findings | |
| Heart Rate | INTEGER | Each patient's average heart rate, measured and recorded in beats per minute (BPM) | |
| Daily Steps | INTEGER | Patients were asked to record their daily steps over a 7-day period, and the average was computed for this study | |
| Sleep Disorder | TEXT | Patients with a diagnosed sleep disorder were asked to share it; the field is NaN if no sleep disorder was present | Convert NaN fields to 'none' |
| Country of Residence | TEXT | The patient's country of residence | |
| Min Sleep Oxygen Saturation | FLOAT | Patients were asked to wear a pulse oximeter during sleep, and this is the lowest oxygen saturation recorded during that time | |
| Average Sleep Oxygen Saturation | FLOAT | Patients were asked to submit their average oxygen saturation during sleep, based on pulse oximeter data | |
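
The Cleaning Notes column above could be applied along the following lines. This is a minimal sketch, not the notebook's actual cleaning step: the `clean_study` helper, the `DROP_COLUMNS` list, and the tiny `sample` frame are hypothetical stand-ins, and only a few of the 21 columns are included.

```python
import pandas as pd

# Tiny stand-in frame with a few of the columns listed above (illustrative values only)
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "first_name": ["A", "B", "C"],
    "last_name": ["X", "Y", "Z"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "ip_address": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "Person ID": [4, 3, 7],
    "Stress Level": [8, 7, 1],
    "Sleep Disorder": ["Insomnia", None, "Sleep Apnea"],
})

# Columns the data dictionary marks for removal
DROP_COLUMNS = ["id", "first_name", "last_name", "email", "ip_address", "Person ID"]


def clean_study(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier columns and label missing sleep disorders as 'none'."""
    cleaned = df.drop(columns=DROP_COLUMNS)
    cleaned["Sleep Disorder"] = cleaned["Sleep Disorder"].fillna("none")
    return cleaned


cleaned_sample = clean_study(sample)
print(cleaned_sample)
```

On the real data the same helper would be called as `clean_study(study)`, assuming the column names match those shown by `study.info()`.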


In [3]:
study.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   id                               1000 non-null   int64  
 1   first_name                       1000 non-null   object 
 2   last_name                        1000 non-null   object 
 3   email                            1000 non-null   object 
 4   ip_address                       1000 non-null   object 
 5   Person ID                        1000 non-null   int64  
 6   Gender                           1000 non-null   object 
 7   Age                              1000 non-null   int64  
 8   Occupation                       1000 non-null   object 
 9   Sleep Duration                   1000 non-null   float64
 10  Quality of Sleep                 1000 non-null   int64  
 11  Physical Activity Level          1000 non-null   int64  
 12  Stress Level         

In [4]:
study["Sleep Disorder"].describe()

count                        855
unique                         6
top       Idiopathic Hypersomnia
freq                         161
Name: Sleep Disorder, dtype: object

In [5]:
study["Sleep Disorder"].unique()

array(['Insomnia', 'Restless Legs Syndrome', nan, 'Narcolepsy Type 1',
       'Sleep Apnea', 'Idiopathic Hypersomnia', 'Narcolepsy Type 2'],
      dtype=object)

In [6]:
study["Sleep Disorder"].value_counts()

Sleep Disorder
Idiopathic Hypersomnia    161
Narcolepsy Type 2         148
Sleep Apnea               146
Insomnia                  140
Restless Legs Syndrome    131
Narcolepsy Type 1         129
Name: count, dtype: int64
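
The six counts above sum to 855, matching the `count` from `describe()`, so 145 of the 1000 rows have no recorded disorder. `value_counts()` leaves missing values out by default; passing `dropna=False` includes them. A small sketch on a made-up Series (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up stand-in for study["Sleep Disorder"]
disorders = pd.Series(
    ["Insomnia", None, "Sleep Apnea", None, "Insomnia"],
    name="Sleep Disorder",
)

# Default behaviour: missing values are not counted
counts_default = disorders.value_counts()

# Include missing values as their own row
counts_with_missing = disorders.value_counts(dropna=False)

# Number of rows with no recorded disorder
missing = int(disorders.isna().sum())

print(counts_with_missing)
print("missing:", missing)
```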

In [7]:
study["Occupation"].unique()

array(['Chemical Engineer', 'Cost Accountant', 'Help Desk Technician',
       'Health Coach III', 'Assistant Media Planner', 'Geologist III',
       'Professor', 'VP Product Management', 'Teacher',
       'Structural Analysis Engineer', 'Account Executive',
       'Web Developer I', 'Recruiting Manager', 'Associate Professor',
       'Operator', 'Media Manager III', 'Marketing Assistant',
       'GIS Technical Architect', 'Administrative Assistant II',
       'Software Engineer I', 'Programmer III', 'Quality Engineer',
       'Desktop Support Technician', 'Editor', 'Environmental Specialist',
       'Office Assistant III', 'Dental Hygienist',
       'Senior Quality Engineer', 'Research Nurse',
       'Analog Circuit Design manager', 'Executive Secretary', 'VP Sales',
       'Biostatistician III', 'Environmental Tech', 'Financial Advisor',
       'Librarian', 'Speech Pathologist', 'Civil Engineer',
       'Web Designer I', 'Graphic Designer', 'Tax Accountant',
       'Engineer I', 'Inte

In [13]:
study["Occupation"].value_counts()

Occupation
Teacher                           16
Analyst Programmer                15
Payment Adjustment Coordinator    14
Structural Analysis Engineer      14
Account Executive                 14
                                  ..
Software Test Engineer III         1
Statistician I                     1
Safety Technician II               1
Safety Technician I                1
Engineer III                       1
Name: count, Length: 189, dtype: int64

In [9]:
study["Country of Residence"].unique()

array(['Philippines', 'China', 'Cuba', 'France', 'Azerbaijan',
       'Indonesia', 'Dominican Republic', 'Iran', 'Portugal', 'Japan',
       'Sweden', 'Poland', 'United States', 'Russia', 'Czech Republic',
       'Greece', 'Canada', 'South Korea', 'Latvia', 'Honduras',
       'Bangladesh', 'Nigeria', 'Mozambique', 'Mexico', 'Brazil', 'Spain',
       'Serbia', 'Comoros', 'Thailand', 'Malaysia', 'Uruguay', 'Libya',
       'Morocco', 'Macedonia', 'Argentina', 'Mongolia', 'Lebanon',
       'Denmark', 'New Zealand', 'Norway', 'Cameroon', 'Ecuador',
       'Sri Lanka', 'Netherlands', 'Colombia', 'Ethiopia', 'Georgia',
       'Bolivia', 'Gambia', 'Ivory Coast', 'Namibia', 'Guatemala',
       'Saint Kitts and Nevis', 'Tunisia', 'Israel', 'Vietnam', 'Peru',
       'Bulgaria', 'South Africa', 'Zimbabwe', 'Ukraine', 'Syria',
       'Albania', 'Madagascar', 'Finland', 'Pakistan', 'Venezuela',
       'Moldova', 'Malta', 'Kazakhstan', 'El Salvador', 'United Kingdom',
       'Tajikistan', 'Ireland', 

In [10]:
study["Country of Residence"].value_counts()

Country of Residence
China           198
Indonesia       103
Russia           60
Philippines      57
Brazil           40
               ... 
Namibia           1
Gambia            1
Liberia           1
Sudan             1
Sierra Leone      1
Name: count, Length: 122, dtype: int64

In [11]:
study["BMI Category"].unique()

array(['Under Weight', 'Overweight', 'Normal', 'Obese'], dtype=object)

In [12]:
study["BMI Category"].value_counts()

BMI Category
Overweight      272
Obese           265
Normal          239
Under Weight    224
Name: count, dtype: int64

## Questions:

- Which career is most common among individuals affected by a sleep disorder?
- How do stress levels differ between individuals with a sleep disorder and those without?
- Do overweight individuals with sleep disorders experience more stress?
- Which gender is more likely to have higher stress levels due to sleep disorders?
- Which careers are more stressful for individuals with sleep disorders?
- Does a higher physical activity level make a sleep disorder less likely?
- Do those with sleep disorders earn more, less, or the same as those without sleep disorders?
- Does having a sleep disorder present a barrier to higher education?
- Do certain countries have higher rates of sleep disorders than the average?
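
Several of these questions reduce to comparing a numeric column across groups. One possible starting point, sketched on a made-up frame, is shown here. The `Has Disorder` helper column is an assumption introduced for illustration, and the values are not from the dataset.

```python
import pandas as pd

# Made-up rows using column names from the data dictionary
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Stress Level": [9, 3, 6, 2, 4, 6],
    "Sleep Disorder": ["Insomnia", None, "Sleep Apnea", None, None, "Insomnia"],
})

# Hypothetical helper column: True when a disorder is recorded
sample["Has Disorder"] = sample["Sleep Disorder"].notna()

# Mean stress level with vs. without a sleep disorder
stress_by_disorder = sample.groupby("Has Disorder")["Stress Level"].mean()

# Among those with a disorder, mean stress level by gender
with_disorder = sample[sample["Has Disorder"]]
stress_by_gender = with_disorder.groupby("Gender")["Stress Level"].mean()

print(stress_by_disorder)
print(stress_by_gender)
```

A difference in group means only describes the sample; because the data is mock-generated, any pattern found this way says nothing about real patients.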

## Fields Needed
- Education Level 
- Salaries with preset ranges that correspond to education levels
