
# Extra Practice

This notebook is meant to help you practise the same core skills you developed in the previous exercises. Completing these exercises is **optional**. They are only meant to provide a little extra practice if you want it.


### Set up Python Libraries

As usual, you will need to run this code block to import the relevant Python libraries.

In [1]:
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings
warnings.simplefilter('ignore', category=FutureWarning)

The Pandas library includes a range of built-in functions that make calculating descriptive statistics quick and straightforward. Here `df` stands for any DataFrame:

* `df.mean()` - gets the mean
* `df.median()` - gets the median
* `df.var()` - gets the variance
* `df.std()` - gets the standard deviation
* `df.min()` - gets the minimum value
* `df.max()` - gets the maximum value
* `df.corr()` - gets a table of pairwise correlation coefficients (Pearson by default; pass `method='spearman'` for Spearman)
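The sketch below tries out the functions above on a small made-up DataFrame. The values and the column names `hours`, `steps` and `group` are hypothetical.

It also shows the `numeric_only=True` argument. In recent pandas versions (2.0 onwards, as far as we know), `mean()` and `corr()` raise an error on a DataFrame that contains text columns unless you pass this argument.

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not from the sleep data set)
df = pd.DataFrame({
    "hours": [6.0, 7.0, 8.0, 7.0],
    "steps": [4000, 6000, 8000, 6000],
    "group": ["a", "b", "a", "b"],  # a text column
})

# numeric_only=True skips the text column "group"
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)
sds = df.std(numeric_only=True)  # sample standard deviation (ddof=1) by default
print(means, medians, sds, sep="\n\n")

# The same methods work on a single column (a Series)
print(df["hours"].var(), df["hours"].min(), df["hours"].max())

# corr() returns a matrix of pairwise correlations (Pearson by default)
print(df.corr(numeric_only=True))
print(df.corr(method="spearman", numeric_only=True))
```

In this notebook the sleep data is stored in `sleep`, so for example `sleep.mean(numeric_only=True)` would give the mean of every numeric column.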


### Import a dataset to work with

Here we will read in a data set which covers a wide range of variables related to sleep and daily habits.

* `PersonID`: An identifier for each individual.
* `Gender`: The preferred gender identity of the person.
* `Age`: The age of the person in years.
* `Occupation`: The occupation or profession of the person.
* `SleepDuration` (hours): The number of hours the person sleeps per day.
* `QualityofSleep` (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.
* `PhysicalActivityLevel` (minutes/day): The number of minutes the person engages in physical activity daily.
* `StressLevel` (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
* `BMICategory`: The BMI category of the person (e.g., Underweight, Normal, Overweight).
* `BloodPressure` (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
* `HeartRate` (bpm): The resting heart rate of the person in beats per minute.
* `DailySteps`: The number of steps the person takes per day.
* `SleepDisorder`: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).
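Blood pressure is stored as text such as `126/83`, so you cannot calculate descriptive statistics on it directly. The sketch below splits it into two numeric columns.

A few points are assumptions:

* The example strings are copied from the data preview in this notebook.
* The column names `Systolic` and `Diastolic` are hypothetical choices.
* The code assumes every value contains exactly one `/`.

```python
import pandas as pd

# A few BloodPressure values copied from the data preview in this notebook
bp = pd.Series(["126/83", "125/80", "140/90", "140/95"], name="BloodPressure")

# Split "systolic/diastolic" into two columns, then convert the text to integers
# (assumes every value contains exactly one "/")
parts = bp.str.split("/", expand=True).astype(int)
parts.columns = ["Systolic", "Diastolic"]  # hypothetical column names

print(parts)
print(parts.mean())
```

On the full data set the same idea would apply to `sleep["BloodPressure"]`.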

In [3]:
sleep = pd.read_csv("https://raw.githubusercontent.com/SageBoettcher/StatsCourseBook_2026/main/data/sleep_health_data.csv")
display(sleep)

|  | PersonID | Gender | Age | Occupation | SleepDuration | QualityofSleep | PhysicalActivityLevel | StressLevel | BMICategory | BloodPressure | HeartRate | DailySteps | SleepDisorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 |  |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 |  |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 |  |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
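In the preview, `SleepDisorder` is empty for people without a disorder. This most likely means pandas stored those entries as missing values (`NaN`) rather than as the word None. That matters because `value_counts()` skips missing values by default.

The sketch below shows how to check for this. It uses a hypothetical mini-file (`csv_text`, read into `mini`) in the same layout as the sleep CSV. The values are copied from the preview, and it assumes people without a disorder have an empty field.

```python
import io
import pandas as pd

# Hypothetical mini-file: values copied from the preview above, with an empty
# SleepDisorder field for people without a disorder (an assumption)
csv_text = """PersonID,Gender,SleepDuration,SleepDisorder
1,Male,6.1,
2,Male,6.2,
4,Male,5.9,Sleep Apnea
370,Female,8.1,Sleep Apnea
"""
mini = pd.read_csv(io.StringIO(csv_text))

# value_counts() leaves out missing values unless dropna=False
print(mini["SleepDisorder"].value_counts(dropna=False))

# Count the missing values in every column
print(mini.isna().sum())

# Optionally relabel the missing entries so they appear as their own category
mini["SleepDisorder"] = mini["SleepDisorder"].fillna("None")
print(mini["SleepDisorder"].value_counts())
```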


## Exercises

In the following questions, you'll use **descriptive statistics** and **indexing** to explore questions about sleep and health.

When you are asked to calculate a value (for example, a mean or standard deviation) rather than produce a full table, you should **report your answer in words** in the text box below the code block. This is exactly how you would do it in a written report.
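When you report a value in words, round it to a sensible number of decimal places. A Python f-string can do the rounding for you.

The sketch below hard-codes the mean sleep duration printed in Part 1a. The variable names `mean_sleep` and `sentence` are illustrative only.

```python
# Mean sleep duration copied from the output of Part 1a
mean_sleep = 7.132085561497325

# ":.2f" formats the number with two decimal places
sentence = f"On average, participants slept {mean_sleep:.2f} hours per day."
print(sentence)
```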


When the question asks you to “comment”, you are being asked to *interpret* the data. That is, explain what you notice, what patterns stand out, or what the numbers might mean in context. Use plain English and discuss your ideas with your tutor and classmates. Developing the skill of turning numbers into insight is one of the most important parts of learning data analysis.

### Part 1: Sleep Duration

a. What is the average sleep duration across all participants?

In [4]:
# Your code here
sleep.SleepDuration.mean()

np.float64(7.132085561497325)

*your text here*

b. Compare the mean sleep duration across genders.

In [7]:
# Your code here
print(sleep.query("Gender == 'Male'").SleepDuration.mean())
print(sleep.query("Gender == 'Female'").SleepDuration.mean())

7.036507936507937
7.22972972972973
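Instead of one `query` per gender, a single `groupby` call can give the mean (and other statistics) for every group at once.

The sketch below uses a handful of rows copied from the data preview. The DataFrame name `sample` is hypothetical, and because it is a small subset, its means differ from those of the full data set.

```python
import pandas as pd

# Rows copied from the data preview (a small subset, so the means are not
# the same as for the full data set)
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "SleepDuration": [6.1, 6.2, 5.9, 8.1, 8.0],
})

# One groupby call summarises every gender in a single table
by_gender = sample.groupby("Gender")["SleepDuration"].agg(["mean", "std", "count"])
print(by_gender)
```

On the full data, `sleep.groupby("Gender")["SleepDuration"].mean()` should reproduce the two means printed above.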


*your text here*

c. Comment on your findings.

### Part 2: Stress and Activity

a. What are the average **physical activity level**, **sleep duration**, and **daily steps** for participants at each stress level?

In [8]:
# Your code here
sleep.groupby("StressLevel").agg({"PhysicalActivityLevel": ["mean"], "SleepDuration": ["mean"], "DailySteps": ["mean"], })

| StressLevel | PhysicalActivityLevel (mean) | SleepDuration (mean) | DailySteps (mean) |
|---|---|---|---|
| 3 | 54.71831 | 8.226761 | 6132.394366 |
| 4 | 55.785714 | 7.03 | 6422.857143 |
| 5 | 74.253731 | 7.483582 | 7616.41791 |
| 6 | 67.152174 | 7.454348 | 7363.043478 |
| 7 | 43.84 | 6.468 | 5662.0 |
| 8 | 58.342857 | 6.05 | 7605.714286 |
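Passing a dictionary of lists to `.agg()` produces a two-level column header, with each column name above "mean". If you want one flat header instead, two alternatives are sketched below.

The sketch uses a hypothetical toy DataFrame (`toy`) with the same column names as the sleep data. The output names `mean_activity`, `mean_sleep` and `mean_steps` are illustrative.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the sleep data set
toy = pd.DataFrame({
    "StressLevel": [3, 3, 8, 8],
    "PhysicalActivityLevel": [60, 50, 30, 40],
    "SleepDuration": [8.0, 8.2, 6.0, 6.2],
    "DailySteps": [7000, 6000, 3000, 5000],
})

cols = ["PhysicalActivityLevel", "SleepDuration", "DailySteps"]

# Option 1: select the columns, then call .mean() for a single header row
flat = toy.groupby("StressLevel")[cols].mean()
print(flat)

# Option 2: named aggregation, which lets you choose the output column names
named = toy.groupby("StressLevel").agg(
    mean_activity=("PhysicalActivityLevel", "mean"),
    mean_sleep=("SleepDuration", "mean"),
    mean_steps=("DailySteps", "mean"),
)
print(named.round(2))
```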


b. Split the data set into high-stress and low-stress individuals.

In [10]:
# Your code here
# Low stress = levels 5 and below; high stress = levels 6 and above
cols = {"PhysicalActivityLevel": ["mean"], "SleepDuration": ["mean"], "DailySteps": ["mean"]}
lowstress = sleep.query("StressLevel <= 5").agg(cols)
highstress = sleep.query("StressLevel >= 6").agg(cols)

display(lowstress)
display(highstress)

|  | PhysicalActivityLevel | SleepDuration | DailySteps |
|---|---|---|---|
| mean | 61.370192 | 7.584615 | 6708.173077 |

|  | PhysicalActivityLevel | SleepDuration | DailySteps |
|---|---|---|---|
| mean | 56.415663 | 6.56506 | 6953.012048 |


c. Which group is more physically active?

In [11]:
# see above
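Another way to compare the two groups is to label each person as high or low stress and then summarise both groups with one `groupby`.

The sketch below uses a hypothetical toy DataFrame (`toy`) and a hypothetical new column `StressGroup`. It applies the same cut-off between levels 5 and 6 used in Part 2b.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with column names matching the sleep data set
toy = pd.DataFrame({
    "StressLevel": [3, 5, 6, 8],
    "PhysicalActivityLevel": [60, 70, 50, 30],
})

# Label each row: levels 6 and above count as high stress, the rest as low
toy["StressGroup"] = np.where(toy["StressLevel"] >= 6, "high", "low")

# Summarise both groups side by side
summary = toy.groupby("StressGroup")["PhysicalActivityLevel"].mean()
print(summary)
```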

### Part 3: Age and Sleep

a. What is the relationship between age and sleep duration?

In [12]:
# Your code here
sleep.Age.corr(sleep.SleepDuration)

np.float64(0.34470935816474396)
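`Series.corr()` gives the Pearson coefficient by default, which measures how well the points fit a straight line. Spearman's coefficient is the Pearson correlation of the ranks, so it measures any consistently increasing or decreasing (monotonic) relationship.

The sketch below compares the two on hypothetical toy data (`toy`) where sleep rises with age, but not in a straight line.

```python
import pandas as pd

# Hypothetical toy data: sleep always increases with age, but not linearly
toy = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45, 50],
    "SleepDuration": [6.0, 6.1, 6.3, 6.8, 7.6, 8.5],
})

# Pearson (the default) measures how close the points are to a straight line
pearson = toy["Age"].corr(toy["SleepDuration"])

# Spearman works on ranks, so any monotonic relationship gives +1 or -1
spearman = toy.corr(method="spearman").loc["Age", "SleepDuration"]

# Spearman is simply Pearson's r computed on the ranks
rank_based = toy.rank().corr().loc["Age", "SleepDuration"]

print(round(pearson, 3), round(spearman, 3), round(rank_based, 3))
```

On the real data, `sleep.Age.corr(sleep.SleepDuration, method="spearman")` should give the Spearman version of the value above.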

b. What might explain this relationship?

*your text here*

### Part 4: Open Exploration

What other relationships might be interesting to explore?
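For two categorical variables, such as BMI category and sleep disorder, a cross-tabulation is a natural starting point.

The sketch below uses six rows copied from the data preview (`sample` is a hypothetical name). The blank `SleepDisorder` entries are written as "None" here, which is an assumption about what a blank means.

```python
import pandas as pd

# Rows copied from the data preview (a small subset, for illustration only);
# blank SleepDisorder entries are written as "None" (an assumption)
sample = pd.DataFrame({
    "BMICategory": ["Overweight", "Normal", "Normal", "Obese", "Obese", "Overweight"],
    "SleepDisorder": ["None", "None", "None", "Sleep Apnea", "Sleep Apnea", "Sleep Apnea"],
})

# Counts of each combination
counts = pd.crosstab(sample["BMICategory"], sample["SleepDisorder"])

# Proportions within each BMI category (each row sums to 1)
props = pd.crosstab(sample["BMICategory"], sample["SleepDisorder"], normalize="index")

print(counts)
print(props)
```

The full data can be handled the same way. Assuming the blank entries are missing values, `pd.crosstab(sleep["BMICategory"], sleep["SleepDisorder"].fillna("None"))` would tabulate every person.

For pairs of numeric variables, `sleep.corr(numeric_only=True)` gives all pairwise correlations at once. Ignore the `PersonID` row and column, which is just an identifier.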
