In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

This notebook explores a dataset of personal health metrics for a group of individuals, including sleep duration and sleep quality, among other things. Several entries are repeated, and the "Sleep Disorder" column is missing for 219 of the 374 rows.

Which analysis is appropriate depends on the business question being asked of this data. Here are a few possibilities:

    Descriptive Statistics:
        Calculate average, median, mode, etc., for various metrics like sleep duration, quality of sleep, daily steps, etc.
        Find the correlation between metrics, such as sleep quality and physical activity level.

    Predictive Modeling:
        Build a predictive model to predict the likelihood of having a sleep disorder based on other metrics.
        Predict quality of sleep based on factors like stress level, physical activity, etc.

    Data Cleaning and Preprocessing:
        Identify and handle missing values.
        Remove duplicate entries if necessary.

    Hypothesis Testing:
        Is there a significant difference in sleep quality between different occupations?
        Does BMI category significantly affect the quality of sleep?

    Visualization:
        Create visual representations of the data, such as histograms.
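The hypothesis-testing questions in the outline above can be approached in several ways. The following is a minimal sketch of one of them, a two-sided permutation test on the difference in group means, using only NumPy. The function name `permutation_p_value` and the two score lists are hypothetical and are not taken from the dataset; a standard t-test or ANOVA may be a better fit depending on the question.

```python
import numpy as np


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    hits = 0
    for _ in range(n_permutations):
        # Reassign the pooled scores to the two groups at random.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1

    # The add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical quality-of-sleep scores for two occupations (illustration only).
group_one = [9, 9, 8, 9, 8, 9]
group_two = [4, 4, 5, 4, 6, 5]
p_value = permutation_p_value(group_one, group_two)
print(f"p-value: {p_value:.4f}")
```

With the real data, the two lists would be replaced by something like `df.loc[df['Occupation'] == 'Nurse', 'Quality of Sleep']`, assuming both groups have enough rows for the comparison to be meaningful.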

In [10]:
df['Sleep Duration'].describe()

count    374.000000
mean       7.132086
std        0.795657
min        5.800000
25%        6.400000
50%        7.200000
75%        7.800000
max        8.500000
Name: Sleep Duration, dtype: float64

In [13]:
df['Sleep Duration'].median()

7.2

In [9]:
df['Sleep Duration'].mode()

0    7.2
Name: Sleep Duration, dtype: float64

In [11]:
df['Quality of Sleep'].describe()

count    374.000000
mean       7.312834
std        1.196956
min        4.000000
25%        6.000000
50%        7.000000
75%        8.000000
max        9.000000
Name: Quality of Sleep, dtype: float64

In [14]:
df['Quality of Sleep'].median()

7.0

In [16]:
df['Quality of Sleep'].mode()

0    8
Name: Quality of Sleep, dtype: int64

In [12]:
df['Daily Steps'].describe()

count      374.000000
mean      6816.844920
std       1617.915679
min       3000.000000
25%       5600.000000
50%       7000.000000
75%       8000.000000
max      10000.000000
Name: Daily Steps, dtype: float64

In [15]:
df['Daily Steps'].median()

7000.0

In [17]:
df['Daily Steps'].mode()

0    8000
Name: Daily Steps, dtype: int64
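
The mean, median, and mode computed column by column above can also be collected into one table. This is a small sketch on a toy frame named `sample` (a hypothetical name), built from the first five rows of the dataset as displayed in this notebook, so the numbers differ from the full-data results. Note that `mode()` can return several values when there is a tie; the sketch keeps only the first.

```python
import pandas as pd

# Toy stand-in for df: the first five rows shown in this notebook.
sample = pd.DataFrame(
    {
        "Sleep Duration": [6.1, 6.2, 6.2, 5.9, 5.9],
        "Quality of Sleep": [6, 6, 6, 4, 4],
        "Daily Steps": [4200, 10000, 10000, 3000, 3000],
    }
)

# One row per metric, one column per statistic.
summary = sample.agg(["mean", "median"]).T

# mode() returns one row per tied value; iloc[0] keeps the smallest mode.
summary["mode"] = sample.mode().iloc[0]

print(summary)
```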

In [3]:
df = pd.read_csv('../DATA/Sleep_health_and_lifestyle_dataset.csv', index_col = 'Person ID')

In [4]:
df

Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
...,...,...,...,...,...,...,...,...,...,...,...,...
370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
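
Rows 2 and 3, and rows 4 and 5, are identical in every column except the Person ID index. A minimal sketch of how such repeats could be flagged and dropped is shown here on a toy frame named `rows` (a hypothetical name) that copies a subset of the columns displayed above. Because `duplicated()` ignores the index, it treats these rows as repeats; whether they are true duplicates or different people with the same measurements is an assumption that should be checked before dropping anything.

```python
import pandas as pd

# Subset of the first five rows shown above, indexed by Person ID.
rows = pd.DataFrame(
    {
        "Gender": ["Male"] * 5,
        "Age": [27, 28, 28, 28, 28],
        "Occupation": [
            "Software Engineer",
            "Doctor",
            "Doctor",
            "Sales Representative",
            "Sales Representative",
        ],
        "Sleep Duration": [6.1, 6.2, 6.2, 5.9, 5.9],
        "Quality of Sleep": [6, 6, 6, 4, 4],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Person ID"),
)

# True for every row that repeats an earlier row (index values are ignored).
is_repeat = rows.duplicated()
n_repeats = int(is_repeat.sum())

# Keep the first occurrence of each distinct row.
deduplicated = rows.drop_duplicates()

print(n_repeats)
print(deduplicated)
```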


In [7]:
df.columns

Index(['Gender', 'Age', 'Occupation', 'Sleep Duration', 'Quality of Sleep',
       'Physical Activity Level', 'Stress Level', 'BMI Category',
       'Blood Pressure', 'Heart Rate', 'Daily Steps', 'Sleep Disorder'],
      dtype='object')


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 374 entries, 1 to 374
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Gender                   374 non-null    object 
 1   Age                      374 non-null    int64  
 2   Occupation               374 non-null    object 
 3   Sleep Duration           374 non-null    float64
 4   Quality of Sleep         374 non-null    int64  
 5   Physical Activity Level  374 non-null    int64  
 6   Stress Level             374 non-null    int64  
 7   BMI Category             374 non-null    object 
 8   Blood Pressure           374 non-null    object 
 9   Heart Rate               374 non-null    int64  
 10  Daily Steps              374 non-null    int64  
 11  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(6), object(5)
memory usage: 38.0+ KB
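
The `info()` output lists Blood Pressure as `object` because values such as `126/83` are stored as text. One possible way to make it usable in numeric summaries is to split it into two integer columns, sketched here on a toy frame named `bp`. The column names `Systolic` and `Diastolic` are assumptions introduced for illustration and do not exist in the dataset.

```python
import pandas as pd

# A few Blood Pressure values as they appear in this notebook.
bp = pd.DataFrame({"Blood Pressure": ["126/83", "125/80", "140/90", "140/95"]})

# Split "systolic/diastolic" into two columns and convert the text to integers.
parts = bp["Blood Pressure"].str.split("/", expand=True).astype(int)
bp["Systolic"] = parts[0]
bp["Diastolic"] = parts[1]

print(bp)
```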


In [9]:
df.Occupation.value_counts()

Occupation
Nurse                   73
Doctor                  71
Engineer                63
Lawyer                  47
Teacher                 40
Accountant              37
Salesperson             32
Software Engineer        4
Scientist                4
Sales Representative     2
Manager                  1
Name: count, dtype: int64
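
Four occupations have fewer than five rows each, which is too few for reliable per-group statistics. A rough sketch of one option, merging rare categories into a single bucket, is shown below on the counts printed above. The threshold `min_count` and the label `Other` are hypothetical choices, not something the dataset prescribes.

```python
import pandas as pd

# Occupation counts copied from the value_counts() output above.
counts = pd.Series(
    {
        "Nurse": 73,
        "Doctor": 71,
        "Engineer": 63,
        "Lawyer": 47,
        "Teacher": 40,
        "Accountant": 37,
        "Salesperson": 32,
        "Software Engineer": 4,
        "Scientist": 4,
        "Sales Representative": 2,
        "Manager": 1,
    },
    name="count",
)

# Share of each occupation in percent.
shares = (counts / counts.sum() * 100).round(1)

# Hypothetical threshold: occupations with fewer rows are merged into "Other".
min_count = 10
grouped = counts[counts >= min_count].copy()
grouped["Other"] = counts[counts < min_count].sum()

print(shares)
print(grouped)
```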

In [11]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
Age,374.0,42.184492,8.673133,27.0,35.25,43.0,50.0,59.0
Sleep Duration,374.0,7.132086,0.795657,5.8,6.4,7.2,7.8,8.5
Quality of Sleep,374.0,7.312834,1.196956,4.0,6.0,7.0,8.0,9.0
Physical Activity Level,374.0,59.171123,20.830804,30.0,45.0,60.0,75.0,90.0
Stress Level,374.0,5.385027,1.774526,3.0,4.0,5.0,7.0,8.0
Heart Rate,374.0,70.165775,4.135676,65.0,68.0,70.0,72.0,86.0
Daily Steps,374.0,6816.84492,1617.915679,3000.0,5600.0,7000.0,8000.0,10000.0
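
The summary table describes each metric on its own. To look at how two metrics move together, such as sleep quality and physical activity level, a Pearson correlation is one simple starting point. The sketch below uses a toy frame named `visible` (a hypothetical name) holding only the nine rows displayed earlier in this notebook, so the resulting coefficient illustrates the call and says little about the full dataset.

```python
import pandas as pd

# The nine rows visible in the df display above (Person IDs 1-5 and 370-373).
visible = pd.DataFrame(
    {
        "Quality of Sleep": [6, 6, 6, 4, 4, 9, 9, 9, 9],
        "Physical Activity Level": [42, 60, 60, 30, 30, 75, 75, 75, 75],
    }
)

# Pearson correlation between the two columns.
r = visible["Quality of Sleep"].corr(visible["Physical Activity Level"])

# Full pairwise correlation matrix for all numeric columns.
corr_matrix = visible.corr()

print(round(r, 3))
print(corr_matrix)
```

On the full data, `df.corr(numeric_only=True)` would give the same kind of matrix across all numeric columns.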


In [13]:
df.isna()

Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,False,False,False,False,False,False,False,False,False,False,False,True
2,False,False,False,False,False,False,False,False,False,False,False,True
3,False,False,False,False,False,False,False,False,False,False,False,True
4,False,False,False,False,False,False,False,False,False,False,False,False
5,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...
370,False,False,False,False,False,False,False,False,False,False,False,False
371,False,False,False,False,False,False,False,False,False,False,False,False
372,False,False,False,False,False,False,False,False,False,False,False,False
373,False,False,False,False,False,False,False,False,False,False,False,False


In [16]:
df.isnull().sum()

Gender                       0
Age                          0
Occupation                   0
Sleep Duration               0
Quality of Sleep             0
Physical Activity Level      0
Stress Level                 0
BMI Category                 0
Blood Pressure               0
Heart Rate                   0
Daily Steps                  0
Sleep Disorder             219
dtype: int64
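
Only the Sleep Disorder column has missing values (155 non-null out of 374 according to `df.info()`). A minimal sketch of counting and filling them follows, on a toy frame named `sample` (a hypothetical name) built from the first five rows. Filling with the label `No Disorder` rests on the assumption that a blank means no diagnosed disorder; the notebook does not confirm this, so it should be verified against the data source.

```python
import numpy as np
import pandas as pd

# First five rows of two columns, as displayed in this notebook.
sample = pd.DataFrame(
    {
        "Occupation": [
            "Software Engineer",
            "Doctor",
            "Doctor",
            "Sales Representative",
            "Sales Representative",
        ],
        "Sleep Disorder": [np.nan, np.nan, np.nan, "Sleep Apnea", "Sleep Apnea"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Person ID"),
)

# Number of missing values per column.
missing_per_column = sample.isna().sum()

# Fraction of rows with no Sleep Disorder value.
missing_share = sample["Sleep Disorder"].isna().mean()

# Assumption: a missing value means "no diagnosed disorder".
filled = sample["Sleep Disorder"].fillna("No Disorder")

print(missing_per_column)
print(missing_share)
print(filled.value_counts())
```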

In [19]:
df.iloc[0:2]

Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
