# Sleep health and lifestyle

This project uses the Sleep Health and Lifestyle Dataset, which records information about different people, such as their profession, physical activity and sleep quality. We use it to explore which factors affect sleep, such as the amount of physical activity, the profession and the stress each person is under.

## Import libraries

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

## Load data

In [2]:
data = pd.read_csv(
    "/kaggle/input/sleep-health-and-lifestyle-dataset/Sleep_health_and_lifestyle_dataset.csv",
    index_col = "Person ID"
)
data.head()

Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


## Questions for analysis

- Who has worse sleep quality, men or women?
- Is there a relationship between people's sleep quality and their profession?
- Does physical activity affect sleep?
- Which profession has the worst sleep quality?
- Which profession has the people with the highest stress level and the highest body mass index?
- In what age range are the most sleep disorders found?
- Does the number of steps per day affect sleep quality and body mass index?

Summary of the data set (using `.info()` method)

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 374 entries, 1 to 374
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Gender                   374 non-null    object 
 1   Age                      374 non-null    int64  
 2   Occupation               374 non-null    object 
 3   Sleep Duration           374 non-null    float64
 4   Quality of Sleep         374 non-null    int64  
 5   Physical Activity Level  374 non-null    int64  
 6   Stress Level             374 non-null    int64  
 7   BMI Category             374 non-null    object 
 8   Blood Pressure           374 non-null    object 
 9   Heart Rate               374 non-null    int64  
 10  Daily Steps              374 non-null    int64  
 11  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(6), object(5)
memory usage: 38.0+ KB
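
The summary shows that `Sleep Disorder` has only 155 non-null values out of 374, so the remaining 219 rows are missing. The sketch below shows one way to count and label such gaps. It uses a small toy column rather than the real data, and the replacement label `"No disorder"` is an assumption: the notebook does not state what a missing value means.

```python
import pandas as pd

# Toy column for illustration only; it mimics the first five rows shown earlier.
sleep_disorder = pd.Series(
    [None, None, None, "Sleep Apnea", "Sleep Apnea"],
    name="Sleep Disorder",
)

# Number of missing entries in the column.
n_missing = sleep_disorder.isna().sum()

# ASSUMPTION: a missing value means no diagnosed disorder.
# "No disorder" is a hypothetical label chosen for this sketch.
filled = sleep_disorder.fillna("No disorder")

print(n_missing)
print(filled.value_counts())
```

On the real frame, the same calls would be applied to `data["Sleep Disorder"]`.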


Generate descriptive statistics (using `.describe()` method).

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.

In [4]:
data.describe()

,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
count,374.0,374.0,374.0,374.0,374.0,374.0,374.0
mean,42.184492,7.132086,7.312834,59.171123,5.385027,70.165775,6816.84492
std,8.673133,0.795657,1.196956,20.830804,1.774526,4.135676,1617.915679
min,27.0,5.8,4.0,30.0,3.0,65.0,3000.0
25%,35.25,6.4,6.0,45.0,4.0,68.0,5600.0
50%,43.0,7.2,7.0,60.0,5.0,70.0,7000.0
75%,50.0,7.8,8.0,75.0,7.0,72.0,8000.0
max,59.0,8.5,9.0,90.0,8.0,86.0,10000.0


## Age analysis

Count the number of men and women

In [5]:
data["Gender"].value_counts()

# or

# data.Gender.value_counts()

Gender
Male      189
Female    185
Name: count, dtype: int64

> This gives us a general overview of the group: it is nearly balanced between men (189) and women (185).
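
To put a number on that balance, the counts can be turned into shares. This is a minimal sketch that rebuilds the two counts from the output above instead of loading the CSV; `gender_counts` and `gender_share` are names introduced here for illustration.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
gender_counts = pd.Series({"Male": 189, "Female": 185}, name="count")

# Share of each gender in the sample, rounded to three decimals.
gender_share = (gender_counts / gender_counts.sum()).round(3)

print(gender_share)
```

With these counts the shares come out at roughly 50.5% men and 49.5% women. On the loaded frame, `data["Gender"].value_counts(normalize=True)` should give the same proportions directly.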

## Classification by age

The data set includes each person's age. Looking at the ages, we find the following:

In [6]:
data["Age"].min()

27

In [7]:
data["Age"].max()

59

- The lowest age is 27 years.
- The highest age is 59 years.

The data set therefore covers people of working age, from those with only a few years of work experience to those close to retirement. Let's find the average age in the data set.

In [8]:
data["Age"].mean()

42.18449197860963

The average age in the data set is about 42 years and the standard deviation is 8.67 years, roughly a fifth of the mean. Ages are therefore fairly spread out across the 27 to 59 range, which may partly reflect the small number of records (374).
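
One way to judge how dispersed the ages are is to compare the standard deviation with the mean and with the observed range. The sketch below does this with the summary values reported by `data.describe()`. The choice of these two ratios is an illustration, not something the original analysis computes.

```python
# Summary values copied from the data.describe() output for the Age column.
age_mean = 42.184492
age_std = 8.673133
age_min, age_max = 27.0, 59.0

# Coefficient of variation: standard deviation relative to the mean.
age_cv = age_std / age_mean

# Standard deviation relative to the observed range of ages.
age_std_over_range = age_std / (age_max - age_min)

print(f"CV: {age_cv:.3f}, std/range: {age_std_over_range:.3f}")
```

Both ratios land between 0.2 and 0.3, which suggests a moderate rather than extreme spread for a sample limited to ages 27 to 59.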

To support this analysis, a new column called `AgeGroup` was added to the data set, classifying each person's age as follows:

- **Age less than or equal to 40 years**: *Adult*
- **Age over 40 years**: *OlderAdult*

In [9]:
data.loc[data["Age"] <= 40, 'AgeGroup'] = "Adult"
data.loc[data["Age"] >= 41, 'AgeGroup'] = "OlderAdult"

# View the first rows of the data set
data.head()

Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder,AgeGroup
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,,Adult
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,,Adult
3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,,Adult
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea,Adult
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea,Adult


In [10]:
# View the last rows of the data set
data.tail()

Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder,AgeGroup
370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea,OlderAdult
371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea,OlderAdult
372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea,OlderAdult
373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea,OlderAdult
374,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea,OlderAdult


With this classification, the group sizes are as follows:

In [11]:
data['AgeGroup'].value_counts()

AgeGroup
OlderAdult    209
Adult         165
Name: count, dtype: int64
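
The same two-way split can also be written with `pd.cut`, which makes the boundary at 40 explicit and scales to more than two groups. This is a sketch on toy ages, not rows from the data set. The outer bin edges 0 and 120 are arbitrary assumptions chosen only to cover every age.

```python
import pandas as pd

# Toy ages for illustration only; these are not rows from the dataset.
ages = pd.Series([27, 35, 40, 41, 50, 59], name="Age")

# Bins are right-inclusive by default: (0, 40] -> "Adult", (40, 120] -> "OlderAdult".
# The outer edges 0 and 120 are arbitrary assumptions that cover the data.
age_group = pd.cut(ages, bins=[0, 40, 120], labels=["Adult", "OlderAdult"])

print(age_group.value_counts())
```

Unlike the two `.loc` assignments, `pd.cut` returns a categorical column, so the group order is preserved in later summaries.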

Finally, the mean and standard deviation of age for each gender are as follows:

In [12]:
data.groupby("Gender")["Age"].agg(["mean", "std"])

Gender,mean,std
Female,47.405405,8.093407
Male,37.074074,5.662006


The mean ages differ by about 10 years (47.4 for women versus 37.1 for men). The standard deviations differ by about 2.4 years (8.09 versus 5.66), so the women's ages are also more spread out.

## Occupation analysis

In [13]:
data.groupby("Occupation")["Gender"].value_counts()

Occupation            Gender
Accountant            Female    36
                      Male       1
Doctor                Male      69
                      Female     2
Engineer              Female    32
                      Male      31
Lawyer                Male      45
                      Female     2
Manager               Female     1
Nurse                 Female    73
Sales Representative  Male       2
Salesperson           Male      32
Scientist             Female     4
Software Engineer     Male       4
Teacher               Female    35
                      Male       5
Name: count, dtype: int64

**We note the following:**

The occupations where women are more present are:

- Accountant
- Engineer
- Manager
- Nurse
- Scientist
- Teacher

The occupations where men are more present are:

- Doctor
- Lawyer
- Sales Representative
- Salesperson
- Software Engineer

Note that Engineer is the only occupation with both genders where the gap is a single person (32 women and 31 men). Manager, Nurse and Scientist include only women, while Sales Representative, Salesperson and Software Engineer include only men.
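
The majority gender and the size of the gap can be read off mechanically by pivoting the counts into one row per occupation. The sketch below rebuilds the counts from the `In [13]` output. The `Majority` and `Gap` columns are names introduced here for illustration.

```python
import pandas as pd

# Counts copied from the groupby output above, keyed by (occupation, gender).
counts = pd.Series(
    {
        ("Accountant", "Female"): 36, ("Accountant", "Male"): 1,
        ("Doctor", "Male"): 69, ("Doctor", "Female"): 2,
        ("Engineer", "Female"): 32, ("Engineer", "Male"): 31,
        ("Lawyer", "Male"): 45, ("Lawyer", "Female"): 2,
        ("Manager", "Female"): 1,
        ("Nurse", "Female"): 73,
        ("Sales Representative", "Male"): 2,
        ("Salesperson", "Male"): 32,
        ("Scientist", "Female"): 4,
        ("Software Engineer", "Male"): 4,
        ("Teacher", "Female"): 35, ("Teacher", "Male"): 5,
    }
)
counts.index.names = ["Occupation", "Gender"]

# One row per occupation, one column per gender; absent pairs become 0.
table = counts.unstack("Gender", fill_value=0)

# Gender with the larger count, and the absolute difference between the two.
table["Majority"] = table[["Female", "Male"]].idxmax(axis=1)
table["Gap"] = (table["Female"] - table["Male"]).abs()

print(table)
```

On the loaded frame, `pd.crosstab(data["Occupation"], data["Gender"])` should produce the same occupation-by-gender table without retyping the counts.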

In [14]:
print(data["Sleep Duration"].min(), data["Sleep Duration"].max())

5.8 8.5


The average sleep duration and quality of sleep for men and women are as follows:

In [15]:
data.groupby("Gender")[["Sleep Duration", "Quality of Sleep"]].mean()

Gender,Sleep Duration,Quality of Sleep
Female,7.22973,7.664865
Male,7.036508,6.968254


We can see that, in general, women sleep better than men, not only in quantity but also in quality. To go deeper, we compute the same averages for each profession in the data set.

In [16]:
occupation_quality = data.groupby("Occupation")[["Sleep Duration", "Quality of Sleep"]].mean()
occupation_quality.sort_values(by = "Quality of Sleep")

Occupation,Sleep Duration,Quality of Sleep
Sales Representative,5.9,4.0
Scientist,6.0,5.0
Salesperson,6.403125,6.0
Software Engineer,6.75,6.5
Doctor,6.970423,6.647887
Teacher,6.69,6.975
Manager,6.9,7.0
Nurse,7.063014,7.369863
Accountant,7.113514,7.891892
Lawyer,7.410638,7.893617


In this table we can see that the occupations with the worst quality and quantity of sleep are:

- Sales Representative
- Scientist
- Salesperson

And those with the best quality and duration of sleep are:

- Engineer
- Lawyer
- Accountant

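Of the three professions with the worst quality and quantity of sleep, two (Sales Representative and Salesperson) consist only of men, while the four Scientists in the data set are all women. Sales Representative and Scientist also have very few people (2 and 4), so their averages should be read with caution.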
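
To check the gender make-up of the lowest-ranked professions, the quality ranking can be joined with the gender counts. This sketch retypes the values from the tables shown earlier rather than recomputing them. The names `quality`, `gender_counts` and `worst` are introduced here for illustration.

```python
import pandas as pd

# Mean quality of sleep per occupation, copied from the table shown above.
quality = pd.Series(
    {
        "Sales Representative": 4.0,
        "Scientist": 5.0,
        "Salesperson": 6.0,
        "Software Engineer": 6.5,
        "Doctor": 6.647887,
        "Teacher": 6.975,
        "Manager": 7.0,
        "Nurse": 7.369863,
        "Accountant": 7.891892,
        "Lawyer": 7.893617,
    },
    name="Quality of Sleep",
)

# Gender counts copied from the earlier groupby output; combinations
# that do not appear in that output are set to 0.
gender_counts = pd.DataFrame(
    {"Male": [2, 0, 32], "Female": [0, 4, 0]},
    index=["Sales Representative", "Scientist", "Salesperson"],
)

# Three occupations with the lowest mean quality, joined with their counts.
worst = quality.nsmallest(3).to_frame().join(gender_counts)

print(worst)
```

Showing the counts next to the means makes it easier to see how few people stand behind some of these averages.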

## Health