---
## **TEHREEM ZUBAIR**
## **TASK 13**
---

In [28]:
import pandas as pd

This synthetic dataset contains sleep, cardiovascular, and lifestyle metrics for 374 fictional people.

---
## **INDEXING**

Indexing in Pandas is a fundamental skill that allows you to efficiently access and manipulate data. This guide will cover various indexing techniques for both Series and DataFrame objects.

---
### **SERIES INDEXING**
We can access elements using labels or positions:

In [29]:
# From a list
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

In [30]:
# accessing a single element
value = series['a']
value

10

In [31]:
# accessing multiple elements
value = series[['a', 'd', 'e']]
value

a    10
d    40
e    50
dtype: int64

In [32]:
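# slicing by label (both end labels are included)
# the result is not named `slice`, to avoid shadowing the Python built-in
label_slice = series['b':'d']
label_slice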

b    20
c    30
d    40
dtype: int64

In [33]:
# slicing by position (the end position is excluded); .iloc makes the positional intent explicit
position_slice = series.iloc[1:4]
position_slice

b    20
c    30
d    40
dtype: int64

In [34]:
# Boolean (conditional) filtering
filtered = series[series > 25]
filtered

c    30
d    40
e    50
dtype: int64
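
The label slice and the positional slice above return the same rows for different reasons: a label slice includes its end label, while a positional slice excludes its end position. A minimal sketch, rebuilding the same values in a standalone Series (the names `by_label` and `by_position` are only illustrative), makes this explicit with `.loc` and `.iloc`:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Label-based slice: both endpoints ('b' and 'd') are included
by_label = series.loc['b':'d']

# Position-based slice: the end position (4) is excluded
by_position = series.iloc[1:4]

print(by_label.equals(by_position))  # True
```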

---
### **DATAFRAME INDEXING**
- For working with DataFrames, I am using a dataset from Kaggle.
- This DataFrame captures various health and lifestyle attributes for individuals, including sleep habits, physical activity, stress levels, BMI category, and blood pressure. Such data can be useful for health and wellness studies, personalized health recommendations, and understanding the correlation between different lifestyle factors and health outcomes.

In [35]:
df = pd.read_csv('/kaggle/input/sleep-order/dataset.csv')
df.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [36]:
# accessing columns
df['Age']

0      27
1      28
2      28
3      28
4      28
       ..
369    59
370    59
371    59
372    59
373    59
Name: Age, Length: 374, dtype: int64

In [37]:
# accessing multiple columns
df[['Age', 'Gender']]

Unnamed: 0,Age,Gender
0,27,Male
1,28,Male
2,28,Male
3,28,Male
4,28,Male
...,...,...
369,59,Female
370,59,Female
371,59,Female
372,59,Female


In [38]:
# using .loc with label
df.loc[0]

Person ID                                  1
Gender                                  Male
Age                                       27
Occupation                 Software Engineer
Sleep Duration                           6.1
Quality of Sleep                           6
Physical Activity Level                   42
Stress Level                               6
BMI Category                      Overweight
Blood Pressure                        126/83
Heart Rate                                77
Daily Steps                             4200
Sleep Disorder                           NaN
Name: 0, dtype: object

In [39]:
# using .loc with a boolean condition
df.loc[(df.Gender == 'Male') & (df.Age == 27)]

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,


In [40]:
# using iloc[] with integer positions
df.iloc[2]

Person ID                       3
Gender                       Male
Age                            28
Occupation                 Doctor
Sleep Duration                6.2
Quality of Sleep                6
Physical Activity Level        60
Stress Level                    8
BMI Category               Normal
Blood Pressure             125/80
Heart Rate                     75
Daily Steps                 10000
Sleep Disorder                NaN
Name: 2, dtype: object

In [41]:
# accessing rows at positions 1 to 4 and columns at positions 2 to 4 (the end position 5 is excluded)
df.iloc[1 : 5 , 2 : 5]

Unnamed: 0,Age,Occupation,Sleep Duration
1,28,Doctor,6.2
2,28,Doctor,6.2
3,28,Sales Representative,5.9
4,28,Sales Representative,5.9


In [42]:
# accessing a single value by row label and column label with .at[]
df.at[1, 'Age']

28

In [43]:
# Using .iat[] with integer positions
# accessing the value at row position 1 (second row) and column position 4 (fifth column), i.e. Sleep Duration
df.iat[1, 4]

6.2
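
To summarise the four accessors used above: `.loc` and `.at` take labels, `.iloc` and `.iat` take integer positions, and `.at`/`.iat` return a single scalar. The sketch below uses a small hand-made frame (the `toy` data is hypothetical and only borrows three of the dataset's column names) with a non-default index, so that labels and positions visibly differ:

```python
import pandas as pd

# Hypothetical three-row frame; the index labels 101-103 are made up
toy = pd.DataFrame(
    {
        'Age': [27, 28, 28],
        'Occupation': ['Software Engineer', 'Doctor', 'Doctor'],
        'Sleep Duration': [6.1, 6.2, 6.2],
    },
    index=[101, 102, 103],
)

by_label = toy.loc[102, 'Age']                 # row label 102, column label 'Age'
by_position = toy.iloc[1, 0]                   # second row, first column
scalar_label = toy.at[103, 'Sleep Duration']   # single value by labels
scalar_position = toy.iat[2, 2]                # single value by positions
```

With the default `RangeIndex` used in this notebook, row labels and row positions happen to coincide, which is why `df.loc[0]` and `df.iloc[0]` return the same row.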

In [44]:
# setting an index
df.set_index('Person ID', inplace=True)

In [45]:
# resetting an index
df.reset_index(inplace = True)

In [46]:
df.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
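
Setting `Person ID` as the index and then resetting it leaves the frame as it started, so the effect is easy to miss. A minimal sketch on a hypothetical three-row frame (the `toy` data is made up) shows what the intermediate state allows, namely looking up rows by ID label:

```python
import pandas as pd

toy = pd.DataFrame({
    'Person ID': [1, 2, 3],
    'Occupation': ['Software Engineer', 'Doctor', 'Doctor'],
})

indexed = toy.set_index('Person ID')   # returns a new frame; 'Person ID' becomes the index
job = indexed.loc[2, 'Occupation']     # look up a row by its ID label
restored = indexed.reset_index()       # 'Person ID' moves back to a regular column
```

Without `inplace=True`, both methods return a new DataFrame and leave the original untouched.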


---
#### **ADVANCED DATAFRAME INDEXING**

In [47]:
# Using query
queried_df = df.query('Age > 25 and Occupation == "Software Engineer"')
queried_df

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
84,85,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,
92,93,Male,35,Software Engineer,7.5,8,60,5,Normal Weight,120/80,70,8000,
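
`query` can also read Python variables when they are prefixed with `@`, which avoids building the expression string by hand. The sketch below is an illustration on a hypothetical four-row frame (`toy`, `min_age`, and `job` are made-up names) and checks that the query matches the equivalent boolean mask:

```python
import pandas as pd

toy = pd.DataFrame({
    'Age': [27, 28, 35, 24],
    'Occupation': ['Software Engineer', 'Doctor',
                   'Software Engineer', 'Software Engineer'],
})

min_age = 25
job = 'Software Engineer'

# query() with @-prefixed Python variables
via_query = toy.query('Age > @min_age and Occupation == @job')

# The equivalent boolean mask
via_mask = toy[(toy['Age'] > min_age) & (toy['Occupation'] == job)]
```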


In [48]:
# Filter columns
filtered_columns = df.filter(items=['Occupation', 'Age'])
filtered_columns

Unnamed: 0,Occupation,Age
0,Software Engineer,27
1,Doctor,28
2,Doctor,28
3,Sales Representative,28
4,Sales Representative,28
...,...,...
369,Nurse,59
370,Nurse,59
371,Nurse,59
372,Nurse,59


In [49]:
# Keep rows whose index label contains '00'
filtered_rows = df.filter(like='00', axis=0)
filtered_rows

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
100,101,Female,36,Teacher,7.2,8,60,4,Normal,115/75,68,7000,
200,201,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
300,301,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,
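
`filter(like=..., axis=0)` does a substring match on the index labels read as strings; it does not look at the cell values. A small sketch with a hypothetical integer index (the labels are chosen only to show which ones match):

```python
import pandas as pd

toy = pd.DataFrame({'value': range(5)}, index=[7, 100, 150, 200, 1001])

# Keeps rows whose index label, read as a string, contains '00'
matched = toy.filter(like='00', axis=0)
print(matched.index.tolist())  # [100, 200, 1001]
```

In the dataset above the index runs from 0 to 373, so only 100, 200, and 300 contain `'00'`.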


---
## **PRACTICE**
---


In [50]:
# print first five rows
df.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [51]:
# print last five rows
df.tail()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
373,374,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea


In [52]:
# getting a specific column
df['BMI Category'].head()

0    Overweight
1        Normal
2        Normal
3         Obese
4         Obese
Name: BMI Category, dtype: object

In [70]:
# selecting multiple columns
filtered = df[['Gender', 'Occupation', 'Sleep Duration']]
filtered.head()

Unnamed: 0,Gender,Occupation,Sleep Duration
0,Male,Software Engineer,6.1
1,Male,Doctor,6.2
2,Male,Doctor,6.2
3,Male,Sales Representative,5.9
4,Male,Sales Representative,5.9


In [73]:
# getting subset of rows using loc method
filtered = df.loc[(df.Gender == 'Female') & (df.Occupation == 'Nurse')]
filtered.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
16,17,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Sleep Apnea
18,19,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Insomnia
30,31,Female,30,Nurse,6.4,5,35,7,Normal Weight,130/86,78,4100,Sleep Apnea
31,32,Female,30,Nurse,6.4,5,35,7,Normal Weight,130/86,78,4100,Insomnia
32,33,Female,31,Nurse,7.9,8,75,4,Normal Weight,117/76,69,6800,


In [84]:
# getting the rows for males with Insomnia who take more than 5000 daily steps
filtered = df.loc[(df['Sleep Disorder'] == 'Insomnia') & (df['Daily Steps'] > 5000) & (df['Gender'] == 'Male')]
filtered.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
146,147,Male,39,Lawyer,7.2,8,60,5,Normal,130/85,68,8000,Insomnia
165,166,Male,41,Lawyer,7.6,8,90,5,Normal,130/85,70,8000,Insomnia
177,178,Male,42,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia
187,188,Male,43,Salesperson,6.3,6,45,7,Overweight,130/85,72,6000,Insomnia
189,190,Male,43,Salesperson,6.5,6,45,7,Overweight,130/85,72,6000,Insomnia


In [85]:
# subsetting rows using iloc
df.iloc[0]

Person ID                                  1
Gender                                  Male
Age                                       27
Occupation                 Software Engineer
Sleep Duration                           6.1
Quality of Sleep                           6
Physical Activity Level                   42
Stress Level                               6
BMI Category                      Overweight
Blood Pressure                        126/83
Heart Rate                                77
Daily Steps                             4200
Sleep Disorder                           NaN
Name: 0, dtype: object

In [86]:
# Select the first three rows
df.iloc[:3]

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,


In [89]:
# Select specific rows by index positions (e.g., the 1st and 4th rows)
df.iloc[[0, 3]]

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [91]:
# Select rows from position 1 to 6 (excluding 6)
df.iloc[1:6]

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia


In [95]:
# Select the first five rows and the columns at positions 3 and 4 (the 4th and 5th columns)
subset_rows_columns = df.iloc[:5, 3:5]
subset_rows_columns

Unnamed: 0,Occupation,Sleep Duration
0,Software Engineer,6.1
1,Doctor,6.2
2,Doctor,6.2
3,Sales Representative,5.9
4,Sales Representative,5.9


In [98]:
# filter rows based on a condition
filtered = df[(df['Age'] > 40) & (df['Quality of Sleep'] > 8)]
filtered

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
276,277,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
277,278,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
279,280,Female,50,Engineer,8.3,9,30,3,Normal,125/80,65,5000,
298,299,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,
299,300,Female,51,Engineer,8.5,9,30,3,Normal,125/80,65,5000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
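
Conditions like the one above are ordinary boolean Series, so they can be stored under a name and combined with `&` (and), `|` (or), and `~` (not). The sketch below uses a hypothetical four-row frame; `older` and `rested` are illustrative names:

```python
import pandas as pd

toy = pd.DataFrame({'Age': [35, 45, 52, 58], 'Quality of Sleep': [9, 7, 9, 8]})

older = toy['Age'] > 40                # boolean Series, one value per row
rested = toy['Quality of Sleep'] > 8

both = toy[older & rested]             # rows where both masks are True
either = toy[older | rested]           # rows where at least one mask is True
not_older = toy[~older]                # rows where the mask is False
```

When the comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `>` and `==`.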


In [104]:
# Group the DataFrame by a specific column and calculate the mean of each group.
grouped = df.groupby('Gender').mean(numeric_only = True)
grouped

Gender,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
Female,251.254054,47.405405,7.22973,7.664865,59.140541,4.675676,69.259459,6840.540541
Male,125.095238,37.074074,7.036508,6.968254,59.201058,6.079365,71.05291,6793.650794


In [105]:
grouped = df.groupby('Occupation').mean(numeric_only = True)
grouped

Occupation,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
Accountant,153.054054,39.621622,7.113514,7.891892,58.108108,4.594595,68.864865,6881.081081
Doctor,64.056338,32.676056,6.970423,6.647887,55.352113,6.732394,71.521127,6808.450704
Engineer,245.920635,46.587302,7.987302,8.412698,51.857143,3.888889,67.190476,5980.952381
Lawyer,153.893617,39.425532,7.410638,7.893617,70.425532,5.06383,69.638298,7661.702128
Manager,264.0,45.0,6.9,7.0,55.0,5.0,75.0,5500.0
Nurse,295.849315,51.794521,7.063014,7.369863,78.589041,5.547945,72.0,8057.534247
Sales Representative,4.5,28.0,5.9,4.0,30.0,8.0,85.0,3000.0
Salesperson,218.375,43.53125,6.403125,6.0,45.0,7.0,72.0,6000.0
Scientist,75.5,33.5,6.0,5.0,41.0,7.0,78.5,5350.0
Software Engineer,46.25,31.25,6.75,6.5,48.0,6.0,75.5,5800.0


In [107]:
# Group the DataFrame by a single column and calculate the sum of each group.
grouped = df.groupby('BMI Category').sum(numeric_only = True)
grouped

BMI Category,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
Normal,27310,7504,1441.8,1494,11250,1000,13402,1343000
Normal Weight,2854,806,154.0,156,1267,109,1497,142100
Obese,1349,380,69.6,64,550,57,843,33500
Overweight,38612,7087,1002.0,1021,9063,848,10500,1030900


In [108]:
grouped = df.groupby('Sleep Disorder').sum(numeric_only = True)
grouped

Sleep Disorder,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
Insomnia,16653,3351,507.4,503,3605,452,5426,454400
Sleep Apnea,21285,3877,548.5,562,5834,442,5701,594300
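
Only two groups appear above because `groupby` drops rows whose key is missing (`NaN`) by default. A minimal sketch, assuming a pandas version that supports the `dropna` argument of `groupby` (1.1 or later) and using a hypothetical five-row frame, shows how to keep the missing key as its own group:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Sleep Disorder': ['Insomnia', np.nan, 'Sleep Apnea', np.nan, np.nan],
    'Age': [28, 27, 59, 30, 41],
})

# Default behaviour: rows with a NaN key are left out of every group
default_sizes = toy.groupby('Sleep Disorder').size()

# dropna=False keeps NaN as a separate group
all_sizes = toy.groupby('Sleep Disorder', dropna=False).size()
```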


In [131]:
# Using the agg method to apply multiple aggregation functions
grouped = df.groupby('Sleep Disorder')[['Age', 'Sleep Duration']].agg(['mean', 'min', 'max'])
grouped

,Age,Age,Age,Sleep Duration,Sleep Duration,Sleep Duration
Sleep Disorder,mean,min,max,mean,min,max
Insomnia,43.519481,28,53,6.58961,5.9,8.3
Sleep Apnea,49.705128,28,59,7.032051,5.8,8.2


In [119]:
grouped = df.groupby('Gender').agg({
    'Age' : 'mean',
    'Daily Steps' : 'sum'
})
grouped

Gender,Age,Daily Steps
Female,47.405405,1265500
Male,37.074074,1284000
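
The dictionary form of `agg` keeps the original column names, so the result does not say which statistic each column holds. One alternative is named aggregation, sketched here on a hypothetical four-row frame (the output names `mean_age` and `total_steps` are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Age': [27, 29, 40, 50],
    'Daily Steps': [4200, 10000, 7000, 5000],
})

# Each keyword is an output column: name=(source column, aggregation)
summary = toy.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    total_steps=('Daily Steps', 'sum'),
)
```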


In [129]:
# Using size() to count the rows in each 'Sleep Disorder' group
grouped = df.groupby('Sleep Disorder').size()
grouped

Sleep Disorder
Insomnia       77
Sleep Apnea    78
dtype: int64

In [132]:
# Using size() to count the rows in each group
grouped = df.groupby('BMI Category').size()
grouped

BMI Category
Normal           195
Normal Weight     21
Obese             10
Overweight       148
dtype: int64

In [134]:
group_sizes = df.groupby('Sleep Disorder').count()
group_sizes

Sleep Disorder,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps
Insomnia,77,77,77,77,77,77,77,77,77,77,77,77
Sleep Apnea,78,78,78,78,78,78,78,78,78,78,78,78


In [135]:
group_sizes = df.groupby('Occupation').count()
group_sizes

Occupation,Person ID,Gender,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
Accountant,37,37,37,37,37,37,37,37,37,37,37,7
Doctor,71,71,71,71,71,71,71,71,71,71,71,7
Engineer,63,63,63,63,63,63,63,63,63,63,63,6
Lawyer,47,47,47,47,47,47,47,47,47,47,47,5
Manager,1,1,1,1,1,1,1,1,1,1,1,0
Nurse,73,73,73,73,73,73,73,73,73,73,73,64
Sales Representative,2,2,2,2,2,2,2,2,2,2,2,2
Salesperson,32,32,32,32,32,32,32,32,32,32,32,30
Scientist,4,4,4,4,4,4,4,4,4,4,4,2
Software Engineer,4,4,4,4,4,4,4,4,4,4,4,1
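
In the table above the `Sleep Disorder` column is smaller than the others because `count()` counts only non-missing values per column, whereas `size()` counts every row in the group. A minimal sketch on a hypothetical five-row frame shows the difference:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Occupation': ['Nurse', 'Nurse', 'Doctor', 'Doctor', 'Doctor'],
    'Sleep Disorder': ['Insomnia', np.nan, np.nan, np.nan, 'Sleep Apnea'],
})

# size(): number of rows per group, missing values included
rows_per_group = toy.groupby('Occupation').size()

# count(): number of non-missing values per group
non_missing = toy.groupby('Occupation')['Sleep Disorder'].count()
```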


In [137]:
query = df.query('25 <= Age <= 28')
query

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia


In [152]:
result = df.query("Occupation == 'Doctor' and `Sleep Duration` > 6")
result.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
7,8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
8,9,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
9,10,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,


In [153]:
result = df.query("`BMI Category` == 'Obese' and `Physical Activity Level` < 40")
result

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia


In [155]:
result = df.query('`Sleep Disorder`.notna()')
result

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
6,7,Male,29,Teacher,6.3,6,40,7,Obese,140/90,82,3500,Insomnia
16,17,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Sleep Apnea
...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea


In [156]:
# Use isin to filter rows based on a list of values.
occupations = ['Doctor', 'Software Engineer']
result = df[df['Occupation'].isin(occupations)]
result

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
7,8,Male,29,Doctor,7.8,7,75,6,Normal,120/80,70,8000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
266,267,Male,48,Doctor,7.3,7,65,5,Obese,142/92,83,3500,Insomnia
276,277,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
277,278,Male,49,Doctor,8.1,9,85,3,Obese,139/91,86,3700,Sleep Apnea
341,342,Female,56,Doctor,8.2,9,90,3,Normal Weight,118/75,65,10000,


In [158]:
bmi_categories = ['Normal', 'Obese']
result = df[df['BMI Category'].isin(bmi_categories)]
result.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia


In [159]:
sleep_disorders = ['Sleep Apnea', 'Insomnia']
result = df[df['Sleep Disorder'].isin(sleep_disorders)]
result.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
5,6,Male,28,Software Engineer,5.9,4,30,8,Obese,140/90,85,3000,Insomnia
6,7,Male,29,Teacher,6.3,6,40,7,Obese,140/90,82,3500,Insomnia
16,17,Female,29,Nurse,6.5,5,40,7,Normal Weight,132/87,80,4000,Sleep Apnea
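
The mask returned by `isin` can be inverted with `~` to select the complementary rows. Because `isin` returns `False` for missing values unless `NaN` is in the list, the inverted mask also picks up the `NaN` rows. A small sketch on a hypothetical four-row frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Sleep Disorder': ['Insomnia', np.nan, 'Sleep Apnea', np.nan]})

has_disorder = toy['Sleep Disorder'].isin(['Sleep Apnea', 'Insomnia'])

with_disorder = toy[has_disorder]        # rows matching one of the listed values
without_disorder = toy[~has_disorder]    # everything else, including the NaN rows
```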


In [162]:
# Rename specific columns in place.
df.rename(columns={
    'Person ID': 'ID',
    'Age': 'Years',
    'Occupation': 'Job'}, inplace = True)
df.head()

Unnamed: 0,ID,Gender,Years,Job,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
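
If the goal is to select a few columns and rename them without changing `df` itself, one option is to chain the selection with `rename` and keep the result under a new name. A minimal sketch on a hypothetical two-row frame (`toy` and `renamed` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    'Person ID': [1, 2],
    'Age': [27, 28],
    'Occupation': ['Software Engineer', 'Doctor'],
})

# Select two columns, then rename them; toy keeps its original column names
renamed = toy[['Person ID', 'Age']].rename(columns={'Person ID': 'ID', 'Age': 'Years'})
```

Leaving the original names intact means that earlier cells which refer to `Age` or `Occupation` still run after this step.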
