### SLEEP HEALTH AND LIFESTYLE

#### Life cycle of a Machine Learning Project

* Understanding the Problem Statement
* Data Collection
* Data Checks to perform
* Exploratory data analysis
* Data Pre-Processing
* Model Training
* Choose best model

### 1) Problem statement

* This project aims to understand how sleep affects health and lifestyle, as sleep plays a vital role in maintaining overall well-being.

### 2) Data Collection
* Dataset Source - https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset
* The data consists of 374 rows and 13 columns.

### 2.1 Import Data and Required Packages

Import the pandas, NumPy, Matplotlib, Seaborn, and warnings libraries.

In [1519]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

Import CSV Data as Pandas DataFrame

In [1520]:
df = pd.read_csv('DATA/Sleep_health_and_lifestyle_dataset.csv')

Show top 5 records

In [1521]:
df.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


Shape of Dataset.

In [1522]:
df.shape

(374, 13)

#### 2.2 Dataset Information

* Person ID: An identifier for each individual in the dataset.

* Gender: The gender of the person (Male/Female).

* Age: The age of the person in years.

* Occupation: The occupation or profession of the person.

* Sleep Duration (hours): The number of hours the person sleeps per day.

* Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10

* Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.

* Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.

* BMI Category: The BMI category of the person (e.g., Underweight, Normal, Overweight).

* Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.

* Heart Rate (bpm): The resting heart rate of the person in beats per minute.

* Daily Steps: The number of steps the person takes per day.

* Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

This dataset provides a rich source of information for exploring the impact of various lifestyle factors on sleep health. Analyzing this data can yield valuable insights and assist in developing strategies to improve sleep quality and overall well-being.

### 3. Data Checks to perform 

* Check Missing values
* Check Duplicates
* Check data type
* Check the number of unique values of each column
* Check statistics of data set
* Check the categories present in each categorical column

#### 3.1 Checking Missing Values

In [1523]:
df.isna().sum()

Person ID                    0
Gender                       0
Age                          0
Occupation                   0
Sleep Duration               0
Quality of Sleep             0
Physical Activity Level      0
Stress Level                 0
BMI Category                 0
Blood Pressure               0
Heart Rate                   0
Daily Steps                  0
Sleep Disorder             219
dtype: int64

* The 219 `NaN` entries in the `Sleep Disorder` column are not truly missing data: they mark individuals who do not have any sleep disorder, so they should be relabelled as 'No Sleep Disorder'. This leaves three categories in `Sleep Disorder` ('Insomnia', 'Sleep Apnea', and 'No Sleep Disorder'). No other column has missing values.

In [1524]:
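# Assign the result back; inplace=True on a column selection is deprecated in recent pandas
df['Sleep Disorder'] = df['Sleep Disorder'].fillna('No Sleep Disorder')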
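
A quick way to confirm how many rows the fill touches is to count values with `dropna=False` before and after. The sketch below uses a small made-up column (the `toy` frame is illustrative, not the real data) and assigns the result of `fillna` back to the column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real column; values are illustrative only.
toy = pd.DataFrame({"Sleep Disorder": [np.nan, "Sleep Apnea", np.nan, "Insomnia"]})

# Count categories including NaN so the unlabeled rows are visible.
before = toy["Sleep Disorder"].value_counts(dropna=False)
n_missing_before = int(toy["Sleep Disorder"].isna().sum())

# Relabel NaN as an explicit category and assign the result back.
toy["Sleep Disorder"] = toy["Sleep Disorder"].fillna("No Sleep Disorder")

after = toy["Sleep Disorder"].value_counts()
n_missing_after = int(toy["Sleep Disorder"].isna().sum())

print(before)
print(after)
```

After the fill, the count of the new label should equal the number of `NaN` entries seen beforehand.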

#### 3.2 Check Duplicates

In [1525]:
df.duplicated().sum()

np.int64(0)

There are no duplicate rows in the dataset.
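
Because `Person ID` is unique for every row, a full-row duplicate check can never flag anything. A possible follow-up, sketched here on a few columns copied from the first five records shown earlier, is to repeat the check with the identifier dropped. The `rows` frame is only an excerpt built for illustration.

```python
import pandas as pd

# Excerpt of the first five records (subset of columns) from df.head().
rows = pd.DataFrame({
    "Person ID": [1, 2, 3, 4, 5],
    "Occupation": ["Software Engineer", "Doctor", "Doctor",
                   "Sales Representative", "Sales Representative"],
    "Sleep Duration": [6.1, 6.2, 6.2, 5.9, 5.9],
    "Daily Steps": [4200, 10000, 10000, 3000, 3000],
})

# Full-row check: the unique ID makes every row distinct.
full_row_dupes = int(rows.duplicated().sum())

# Same check without the identifier column.
dupes_ignoring_id = int(rows.drop(columns="Person ID").duplicated().sum())

print(full_row_dupes, dupes_ignoring_id)
```

Whether such repeated profiles should be dropped is a modelling decision; this sketch only makes them visible.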

#### 3.3 Check data types

In [1526]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           374 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB


#### 3.4 Checking the number of unique values of each column

In [1527]:
df.nunique()

Person ID                  374
Gender                       2
Age                         31
Occupation                  11
Sleep Duration              27
Quality of Sleep             6
Physical Activity Level     16
Stress Level                 6
BMI Category                 4
Blood Pressure              25
Heart Rate                  19
Daily Steps                 20
Sleep Disorder               3
dtype: int64

#### 3.5 Check statistics of data set

In [1528]:
df.describe()

Unnamed: 0,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
count,374.0,374.0,374.0,374.0,374.0,374.0,374.0,374.0
mean,187.5,42.184492,7.132086,7.312834,59.171123,5.385027,70.165775,6816.84492
std,108.108742,8.673133,0.795657,1.196956,20.830804,1.774526,4.135676,1617.915679
min,1.0,27.0,5.8,4.0,30.0,3.0,65.0,3000.0
25%,94.25,35.25,6.4,6.0,45.0,4.0,68.0,5600.0
50%,187.5,43.0,7.2,7.0,60.0,5.0,70.0,7000.0
75%,280.75,50.0,7.8,8.0,75.0,7.0,72.0,8000.0
max,374.0,59.0,8.5,9.0,90.0,8.0,86.0,10000.0


* `describe()` omits the 'Blood Pressure' column because it is stored as text in the form systolic/diastolic. Let's split it into two numeric columns, `High_BP` (systolic) and `Low_BP` (diastolic), and try again.

In [1529]:
# Splitting Blood pressure as High_BP and Low_BP.
df[['High_BP','Low_BP']]=df['Blood Pressure'].str.split('/', expand=True)

In [1530]:
# Converting from object type to float type
df[['High_BP','Low_BP']]=df[['High_BP','Low_BP']].astype(float)

In [1531]:
# Dropping BP column
df.drop('Blood Pressure',axis=1,inplace=True)
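
The split-and-convert steps above can be tried in isolation on a few readings. This is a minimal sketch on a hand-made Series; it uses `pd.to_numeric` instead of `astype(float)`, which is a substitution on my part and yields integer columns when every value is a whole number.

```python
import pandas as pd

# A few readings in the same "systolic/diastolic" text format.
bp = pd.Series(["126/83", "125/80", "140/90"], name="Blood Pressure")

# Split on "/" into two text columns, then name them.
parts = bp.str.split("/", expand=True)
parts.columns = ["High_BP", "Low_BP"]

# Convert each column from text to numbers.
parts = parts.apply(pd.to_numeric)

print(parts)
print(parts.dtypes)
```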

In [1532]:
df

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,High_BP,Low_BP
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,77,4200,No Sleep Disorder,126.0,83.0
1,2,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,No Sleep Disorder,125.0,80.0
2,3,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,No Sleep Disorder,125.0,80.0
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140.0,90.0
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140.0,90.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0


In [1533]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Heart Rate               374 non-null    int64  
 10  Daily Steps              374 non-null    int64  
 11  Sleep Disorder           374 non-null    object 
 12  High_BP                  374 non-null    float64
 13  Low_BP                   374 non-null    float64
dtypes: float64(3), int64(7), o

In [1534]:
df.describe()

Unnamed: 0,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps,High_BP,Low_BP
count,374.0,374.0,374.0,374.0,374.0,374.0,374.0,374.0,374.0,374.0
mean,187.5,42.184492,7.132086,7.312834,59.171123,5.385027,70.165775,6816.84492,128.553476,84.649733
std,108.108742,8.673133,0.795657,1.196956,20.830804,1.774526,4.135676,1617.915679,7.748118,6.161611
min,1.0,27.0,5.8,4.0,30.0,3.0,65.0,3000.0,115.0,75.0
25%,94.25,35.25,6.4,6.0,45.0,4.0,68.0,5600.0,125.0,80.0
50%,187.5,43.0,7.2,7.0,60.0,5.0,70.0,7000.0,130.0,85.0
75%,280.75,50.0,7.8,8.0,75.0,7.0,72.0,8000.0,135.0,90.0
max,374.0,59.0,8.5,9.0,90.0,8.0,86.0,10000.0,142.0,95.0


1. **Age Distribution:**
   - **Mean Age**: 42.2 years
   - **Standard Deviation**: 8.67 years

2. **Sleep Duration:**
   - **Average**: 7.13 hours
   - **Standard Deviation**: 0.8 hours

3. **Sleep Quality:**
   - **Average**: 7.31
   - **Standard Deviation**: 1.20

4. **Physical Activity Level:**
   - **Mean**: 59.17
   - **Standard Deviation**: 20.83

5. **Stress Level:**
   - **Average**: 5.39
   - **Standard Deviation**: 1.77

6. **Heart Rate:**
   - **Average**: 70.17 bpm
   - **Standard Deviation**: 4.14 bpm

7. **Daily Steps:**
   - **Average**: 6816.84 steps
   - **Standard Deviation**: 1617.92 steps

**Key metrics for EDA:**
- **Means** and **Standard Deviations** for understanding central tendency and variability.
- **Ranges** and **Distributions** to grasp the spread and potential outliers.

These metrics will help in identifying patterns, outliers, and relationships among variables.
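
As one concrete way to use these numbers for outlier screening, the sketch below applies the common 1.5 × IQR rule to the `Heart Rate` quartiles reported by `df.describe()`. The rule itself is a conventional heuristic chosen here for illustration, not something the dataset prescribes.

```python
# Quartiles and extremes copied from the df.describe() output for Heart Rate.
q1, q3 = 68.0, 72.0
observed_min, observed_max = 65.0, 86.0

# Interquartile range and the usual 1.5 * IQR fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A value beyond a fence is a candidate outlier, not proof of an error.
has_high_outliers = observed_max > upper_fence
has_low_outliers = observed_min < lower_fence

print(f"IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print(f"high outliers: {has_high_outliers}, low outliers: {has_low_outliers}")
```

With these inputs the maximum of 86 bpm lies above the upper fence of 78 bpm, so the high end of `Heart Rate` is worth inspecting during EDA.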

* The numeric columns have been handled; let's work on the categorical columns.

#### 3.6 Check the categories present in each categorical column

In [1535]:
# Unique values and counts for Occupation
print('List of Unique Occupations:', df['Occupation'].unique(), '| Count of Unique Occupations:', df['Occupation'].nunique())

print()

# Unique values and counts for BMI Category
print('List of Unique BMI Category:', df['BMI Category'].unique(), '| Count of Unique BMI Category:', df['BMI Category'].nunique())

print()

# Unique values and counts for Genders
print('List of Unique Genders:', df['Gender'].unique(), '| Count of Unique Genders:', df['Gender'].nunique())

print()

# Unique values and counts for Sleep Disorder
print('List of Unique Sleep Disorders:', df['Sleep Disorder'].unique(), '| Count of Unique Sleep Disorders:', df['Sleep Disorder'].nunique())


List of Unique Occupations: ['Software Engineer' 'Doctor' 'Sales Representative' 'Teacher' 'Nurse'
 'Engineer' 'Accountant' 'Scientist' 'Lawyer' 'Salesperson' 'Manager'] | Count of Unique Occupations: 11

List of Unique BMI Category: ['Overweight' 'Normal' 'Obese' 'Normal Weight'] | Count of Unique BMI Category: 4

List of Unique Genders: ['Male' 'Female'] | Count of Unique Genders: 2

List of Unique Sleep Disorders: ['No Sleep Disorder' 'Sleep Apnea' 'Insomnia'] | Count of Unique Sleep Disorders: 3


OBSERVATIONS

1. Occupation 
* **Encoding Technique:** One-Hot Encoding
* **Reason:** Occupations are nominal categories without an inherent order. One-Hot Encoding is suitable as it creates binary columns for each occupation, allowing the model to treat each occupation as a separate feature without implying any ordinal relationship.

2. BMI Category
* **Encoding Technique:** Ordinal Encoding
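* **Reason:** BMI categories have a natural order (Normal < Overweight < Obese). Ordinal Encoding preserves this order by mapping categories to integers by rank, which can help models that use ordinal information. Note that the data contains both 'Normal' and 'Normal Weight'; these appear to be two labels for the same category, so ranking 'Normal Weight' below 'Normal' (as the mapping used later does) is a simplification worth revisiting.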

3. Genders
* **Encoding Technique:** Label Encoding
* **Reason:** With only two categories, Label Encoding is effective. It converts Male and Female into binary values (e.g., 0 and 1). This simple approach works well for binary categorical features.

4. Sleep Disorders 
* **Encoding Technique:** One-Hot Encoding
* **Reason:** Sleep Disorders are nominal categories without a specific order. One-Hot Encoding is appropriate as it converts each disorder into binary columns, avoiding the imposition of any ordinal relationship and ensuring each disorder is treated as a separate feature.
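
For the BMI column, one possible refinement of the ordinal scheme is to merge 'Normal Weight' into 'Normal' before ranking. The sketch below assumes the two labels mean the same thing, which the dataset does not state explicitly, and the rank values in `bmi_rank` are arbitrary choices for illustration.

```python
import pandas as pd

# The four labels observed in the BMI Category column.
bmi = pd.Series(["Overweight", "Normal", "Obese", "Normal Weight"], name="BMI Category")

# Assumption: 'Normal Weight' is just another spelling of 'Normal'.
bmi_clean = bmi.replace({"Normal Weight": "Normal"})

# Hypothetical rank values; only their order matters.
bmi_rank = {"Normal": 0, "Overweight": 1, "Obese": 2}
bmi_encoded = bmi_clean.map(bmi_rank)

print(bmi_encoded.tolist())
```

With the labels merged, both spellings of the normal category receive the same rank, so a model is not told that one is "lower" than the other.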

In [1536]:
#copying the dataframe
df1 = df.copy()

In [1537]:
df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,High_BP,Low_BP
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,77,4200,No Sleep Disorder,126.0,83.0
1,2,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,No Sleep Disorder,125.0,80.0
2,3,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,No Sleep Disorder,125.0,80.0
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140.0,90.0
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140.0,90.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140.0,95.0


In [1538]:

from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Gender: two categories, so label encoding yields a single 0/1 column
label_encoder = LabelEncoder()
df1['Gender'] = label_encoder.fit_transform(df1['Gender'])

# Occupation: nominal, so one-hot encode into one column per occupation
one_hot_encoder = OneHotEncoder(sparse_output=False)
occupation_encoded = one_hot_encoder.fit_transform(df1[['Occupation']])
occupation_encoded_df = pd.DataFrame(occupation_encoded, columns=one_hot_encoder.get_feature_names_out(['Occupation']))

# join() aligns on the index; both frames use the default RangeIndex here
df1 = df1.join(occupation_encoded_df).drop(columns=['Occupation'])

# BMI Category: map each label to its rank
ordinal_mapping = {
    'Normal Weight': 1,
    'Normal': 2,
    'Overweight': 3,
    'Obese': 4
}
df1['BMI Category'] = df1['BMI Category'].map(ordinal_mapping)

# Sleep Disorder: nominal, so one-hot encode (the same encoder is refitted)
sleep_disorder_encoded = one_hot_encoder.fit_transform(df1[['Sleep Disorder']])
sleep_disorder_encoded_df = pd.DataFrame(sleep_disorder_encoded, columns=one_hot_encoder.get_feature_names_out(['Sleep Disorder']))
# Concatenate one-hot encoded columns with the DataFrame
df1 = df1.join(sleep_disorder_encoded_df).drop(columns=['Sleep Disorder'])


In [1539]:
df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,High_BP,Low_BP,Sleep Disorder_Insomnia,Sleep Disorder_No Sleep Disorder,Sleep Disorder_Sleep Apnea
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,77,4200,126.0,83.0,0.0,1.0,0.0
1,2,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,125.0,80.0,0.0,1.0,0.0
2,3,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,125.0,80.0,0.0,1.0,0.0
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,140.0,90.0,0.0,0.0,1.0
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,140.0,90.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,140.0,95.0,0.0,0.0,1.0
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,68,7000,140.0,95.0,0.0,0.0,1.0
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,140.0,95.0,0.0,0.0,1.0
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,140.0,95.0,0.0,0.0,1.0


In [None]:
from sklearn import preprocessing

# LabelEncoder expects a 1-D input, so pass a Series (single brackets)
Label_Encoder = preprocessing.LabelEncoder()
df1['Gender'] = Label_Encoder.fit_transform(df1['Gender']).astype(int)
df1['Occupation'] = Label_Encoder.fit_transform(df1['Occupation'])
ordinal_mapping = {
    'Normal Weight': 1,
    'Normal': 2,
    'Overweight': 3,
    'Obese': 4
}
df1['BMI Category'] = df1['BMI Category'].map(ordinal_mapping)
# Keep the text labels in 'Sleep Disorder' and store the integer codes in a new column
df1['Sleep Disorder prob'] = Label_Encoder.fit_transform(df1['Sleep Disorder']).astype(int)

In [None]:
df1.head(21)

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,High_BP,Low_BP,Sleep Disorder prob
0,1,1,27,9,6.1,6,42,6,3,77,4200,No Sleep Disorder,126.0,83.0,1
1,2,1,28,1,6.2,6,60,8,2,75,10000,No Sleep Disorder,125.0,80.0,1
2,3,1,28,1,6.2,6,60,8,2,75,10000,No Sleep Disorder,125.0,80.0,1
3,4,1,28,6,5.9,4,30,8,4,85,3000,Sleep Apnea,140.0,90.0,2
4,5,1,28,6,5.9,4,30,8,4,85,3000,Sleep Apnea,140.0,90.0,2
5,6,1,28,9,5.9,4,30,8,4,85,3000,Insomnia,140.0,90.0,0
6,7,1,29,10,6.3,6,40,7,4,82,3500,Insomnia,140.0,90.0,0
7,8,1,29,1,7.8,7,75,6,2,70,8000,No Sleep Disorder,120.0,80.0,1
8,9,1,29,1,7.8,7,75,6,2,70,8000,No Sleep Disorder,120.0,80.0,1
9,10,1,29,1,7.8,7,75,6,2,70,8000,No Sleep Disorder,120.0,80.0,1


In [None]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    int64  
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    int64  
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    int64  
 9   Heart Rate               374 non-null    int64  
 10  Daily Steps              374 non-null    int64  
 11  Sleep Disorder           374 non-null    object 
 12  High_BP                  374 non-null    float64
 13  Low_BP                   374 non-null    float64
 14  Sleep Disorder prob      3

############################################################################################################

In [None]:
from sklearn.preprocessing import OneHotEncoder

One_Hot_Encoder = OneHotEncoder(sparse_output=False)

# fit_transform returns one column per category (sorted: Female, Male).
# Keep only the first column, so Gender is 1.0 for Female and 0.0 for Male.
df1['Gender'] = One_Hot_Encoder.fit_transform(df[['Gender']])[:, 0]
# df1['Occupation'] = One_Hot_Encoder.fit_transform(df1[['Occupation']])
# df1['Sleep Disorder'] = One_Hot_Encoder.fit_transform(df1[['Sleep Disorder']])
# bmi_mapping = {'Underweight': 1, 'Normal': 2, 'Overweight': 3, 'Obese': 4}
# df1['BMI Category'] = df1['BMI Category'].map(bmi_mapping)

df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,High_BP,Low_BP,Sleep Disorder prob
0,1,0.0,27,9,6.1,6,42,6,3,77,4200,No Sleep Disorder,126.0,83.0,1
1,2,0.0,28,1,6.2,6,60,8,2,75,10000,No Sleep Disorder,125.0,80.0,1
2,3,0.0,28,1,6.2,6,60,8,2,75,10000,No Sleep Disorder,125.0,80.0,1
3,4,0.0,28,6,5.9,4,30,8,4,85,3000,Sleep Apnea,140.0,90.0,2
4,5,0.0,28,6,5.9,4,30,8,4,85,3000,Sleep Apnea,140.0,90.0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,1.0,59,5,8.1,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2
370,371,1.0,59,5,8.0,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2
371,372,1.0,59,5,8.1,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2
372,373,1.0,59,5,8.1,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2


In [None]:
df1['BMI Category'].isnull().sum()

np.int64(0)

In [None]:
# Display rows where 'BMI Category' is missing
missing_bmi = df[df['BMI Category'].isnull()]
missing_bmi


Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,High_BP,Low_BP


In [None]:
# Fill missing values with the mode of the column
mode_bmi = df1['BMI Category'].mode()[0]
df1['BMI Category'] = df1['BMI Category'].fillna(mode_bmi)
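
If `BMI Category` ever does show missing values at this stage, one plausible cause (an assumption, not something the outputs above demonstrate) is re-running the `.map(ordinal_mapping)` cell on a column that is already numeric: the integer values are not keys of the mapping, so every entry becomes `NaN`. The sketch below reproduces that effect and shows a hypothetical guard, `encode_bmi`, that makes the step safe to re-run.

```python
import pandas as pd

ordinal_mapping = {"Normal Weight": 1, "Normal": 2, "Overweight": 3, "Obese": 4}
bmi = pd.Series(["Overweight", "Normal", "Obese"])

# First pass: text labels are keys of the mapping, so this works.
first_pass = bmi.map(ordinal_mapping)

# Second pass: the values are now integers, which are not keys, so all become NaN.
second_pass = first_pass.map(ordinal_mapping)


def encode_bmi(col: pd.Series) -> pd.Series:
    """Hypothetical helper: map labels to ranks only if not yet numeric."""
    if pd.api.types.is_numeric_dtype(col):
        return col
    return col.map(ordinal_mapping)


# Applying the guarded version twice leaves the encoded values intact.
safe = encode_bmi(encode_bmi(bmi))

print(first_pass.tolist(), int(second_pass.isna().sum()), safe.tolist())
```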


In [None]:
df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,High_BP,Low_BP,Sleep Disorder prob
0,1,0.0,27,9,6.1,6,42,6,3,77,4200,No Sleep Disorder,126.0,83.0,1
1,2,0.0,28,1,6.2,6,60,8,2,75,10000,No Sleep Disorder,125.0,80.0,1
2,3,0.0,28,1,6.2,6,60,8,2,75,10000,No Sleep Disorder,125.0,80.0,1
3,4,0.0,28,6,5.9,4,30,8,4,85,3000,Sleep Apnea,140.0,90.0,2
4,5,0.0,28,6,5.9,4,30,8,4,85,3000,Sleep Apnea,140.0,90.0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,1.0,59,5,8.1,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2
370,371,1.0,59,5,8.0,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2
371,372,1.0,59,5,8.1,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2
372,373,1.0,59,5,8.1,9,75,3,3,68,7000,Sleep Apnea,140.0,95.0,2


In [None]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    float64
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    int64  
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    int64  
 9   Heart Rate               374 non-null    int64  
 10  Daily Steps              374 non-null    int64  
 11  Sleep Disorder           374 non-null    object 
 12  High_BP                  374 non-null    float64
 13  Low_BP                   374 non-null    float64
 14  Sleep Disorder prob      3

In [None]:
# Assuming you've already encoded the columns:
df1['Gender'] = df1['Gender'].astype(int)
df1['Occupation'] = df1['Occupation'].astype(int)
df1['BMI Category'] = df1['BMI Category'].astype(int)
df1['Sleep Disorder'] = df1['Sleep Disorder'].astype(int)

# Check the dtypes to confirm the change
df1.info()


ValueError: invalid literal for int() with base 10: 'No Sleep Disorder'
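
The error occurs because `Sleep Disorder` still holds the text labels; the integer codes were written to `Sleep Disorder prob` instead. A minimal sketch of the situation, on a three-row toy frame with the same two column names, casts the encoded column and leaves the text column alone.

```python
import pandas as pd

# Toy frame mirroring the two columns involved.
toy = pd.DataFrame({
    "Sleep Disorder": ["No Sleep Disorder", "Sleep Apnea", "Insomnia"],
    "Sleep Disorder prob": [1, 2, 0],
})

# Casting the text labels fails, as in the traceback above.
try:
    toy["Sleep Disorder"].astype(int)
    raised = False
except ValueError:
    raised = True

# Casting the already encoded column works.
toy["Sleep Disorder prob"] = toy["Sleep Disorder prob"].astype(int)

print(raised)
print(toy.dtypes)
```

If only numeric columns are wanted downstream, the text column can be dropped once the encoded one is in place.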