# Lab Instructions

Find a dataset that interests you. I'd recommend starting on [Kaggle](https://www.kaggle.com/). Read through all of the material about the dataset and download a .CSV file.

1. Write a short summary of the data.  Where did it come from?  How was it collected?  What are the features in the data?  Why is this dataset interesting to you?  

2. Identify 5 interesting questions about your data that you can answer using Pandas methods.  

3. Answer those questions!  You may use any method you want (including LLMs) to help you write your code; however, you should use Pandas to find the answers.  LLMs will not always write code in this way without specific instruction.  

4. Write the answer to your question in a text box underneath the code you used to calculate the answer.



## 1. Dataset

For this project, I am using the Sleep Disorder Diagnosis Dataset, also called the Sleep Health and Lifestyle Dataset, which I found on Kaggle. It contains 374 rows and 13 columns that capture sleep patterns, lifestyle habits, and related health indicators. It is a synthetic dataset created for educational purposes.

The features in the data are Person ID, Gender, Age, Occupation, Sleep Duration, Quality of Sleep, Physical Activity Level, Stress Level, BMI Category, Blood Pressure, Heart Rate, Daily Steps, and Sleep Disorder.

I picked this dataset because I've always been interested in applications of data science in health and medical research.

(Note: Any reasonable answer from students is acceptable).

## 2. Questions

1. What occupations get the most and least sleep?

2. Is longer sleep duration correlated with lower blood pressure?

3. Is shorter sleep duration correlated with higher stress level?

4. Is greater physical activity correlated with longer sleep duration?

5. Is greater physical activity correlated with higher sleep quality?

(Note: Any reasonable answer from students to this question or those following is acceptable as long as they are using **Pandas** methods in their code.)

```python
import pandas as pd

df = pd.read_csv("Sleep_health_and_lifestyle_dataset.csv")
```
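Before answering the questions, it can help to confirm the dataset summary (374 rows, 13 columns) and check for missing values. The sketch below wraps a few standard Pandas calls in a helper; `dataset_overview` is a hypothetical name, and the small DataFrame in the demo is made-up toy data, not rows from the real file.

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Return basic facts about a DataFrame: shape, column dtypes, and missing-value counts."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # Map each column name to its dtype as a string, e.g. 'int64' or 'float64'
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Count missing entries per column, keeping only columns that have at least one
        "missing": df.isna().sum().loc[lambda s: s > 0].to_dict(),
    }


# Hypothetical toy rows for illustration only; with the real file, call dataset_overview(df)
toy = pd.DataFrame({
    "Occupation": ["Nurse", "Doctor", "Engineer"],
    "Sleep Duration": [6.5, 7.0, None],
})
print(dataset_overview(toy))
```

On the real data, `n_rows` and `n_cols` should match the counts given in the dataset summary if the same CSV is loaded.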


### 1. What occupations get the most and least sleep?


```python
# Mean sleep duration per occupation, sorted from most to least sleep
sleep_by_occupation = df.groupby('Occupation')['Sleep Duration'].mean().sort_values(ascending=False)

print("Occupations with most and least sleep on average:")
print(sleep_by_occupation)
```

```
Occupations with most and least sleep on average:
Occupation
Engineer                7.987302
Lawyer                  7.410638
Accountant              7.113514
Nurse                   7.063014
Doctor                  6.970423
Manager                 6.900000
Software Engineer       6.750000
Teacher                 6.690000
Salesperson             6.403125
Scientist               6.000000
Sales Representative    5.900000
Name: Sleep Duration, dtype: float64
```


Engineers get the most sleep on average (about 7.99 hours), and sales representatives get the least (about 5.9 hours).
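The sorted list answers the question, but two refinements are worth considering: `idxmax()` and `idxmin()` pull out the top and bottom occupations directly, and counting rows per group shows how many people each average is based on (an average over very few people is less reliable). A minimal sketch, assuming the column names used in the cell above; `sleep_extremes` is a hypothetical helper and the demo rows are invented.

```python
import pandas as pd


def sleep_extremes(df: pd.DataFrame):
    """Summarize mean sleep per occupation and report the highest and lowest groups."""
    # Mean sleep and number of people per occupation, highest mean first
    summary = (
        df.groupby("Occupation")["Sleep Duration"]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
    )
    most = summary["mean"].idxmax()   # occupation label with the largest mean
    least = summary["mean"].idxmin()  # occupation label with the smallest mean
    return summary, most, least


# Hypothetical toy rows for illustration only
toy = pd.DataFrame({
    "Occupation": ["Engineer", "Engineer", "Nurse", "Nurse", "Teacher"],
    "Sleep Duration": [8.0, 7.8, 6.5, 7.1, 6.0],
})
summary, most, least = sleep_extremes(toy)
print(summary)
print("Most sleep:", most, "| Least sleep:", least)
```

On the real data, checking the `count` column before concluding which occupation sleeps least is a sensible step, since some occupations may appear only a handful of times.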

### 2. Is longer sleep duration correlated with lower blood pressure?

```python
# Split "systolic/diastolic" strings such as "120/80" into two numeric columns
bp_split = df['Blood Pressure'].str.split('/', expand=True).astype(float)
df['Systolic_BP'] = bp_split[0]
df['Diastolic_BP'] = bp_split[1]

# Pearson correlation (the pandas default) of sleep duration with each BP component
corr_sleep_systolic = df['Sleep Duration'].corr(df['Systolic_BP'])
corr_sleep_diastolic = df['Sleep Duration'].corr(df['Diastolic_BP'])

print("\nCorrelation between Sleep Duration and Blood Pressure:")
print("Systolic:", corr_sleep_systolic)
print("Diastolic:", corr_sleep_diastolic)
```

```
Correlation between Sleep Duration and Blood Pressure:
Systolic: -0.18040627643004584
Diastolic: -0.16656986850262193
```


There is a weak negative correlation between sleep duration and blood pressure (r ≈ -0.18 for systolic and r ≈ -0.17 for diastolic), so longer sleep is only slightly associated with lower blood pressure.
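The original cell parses blood pressure with `str.split('/')`, which raises an error on `astype(float)` if any entry is malformed. A slightly more defensive variant, sketched below as an alternative rather than a required change, uses `str.extract` with a regular expression so that non-matching entries become NaN and can be counted, then correlates sleep with both columns in one `DataFrame.corrwith` call. The helper name `split_blood_pressure` and the toy readings are hypothetical.

```python
import pandas as pd


def split_blood_pressure(bp: pd.Series) -> pd.DataFrame:
    """Parse 'systolic/diastolic' strings into two float columns; non-matching entries become NaN."""
    # Two capture groups give two columns; surrounding whitespace is tolerated
    parts = bp.str.extract(r"^\s*(\d+)\s*/\s*(\d+)\s*$").astype(float)
    parts.columns = ["Systolic_BP", "Diastolic_BP"]
    return parts


# Hypothetical toy readings for illustration only
toy = pd.DataFrame({
    "Sleep Duration": [6.0, 6.5, 7.0, 7.5, 8.0],
    "Blood Pressure": ["140/90", "135/88", "130/85", "125/80", "120/78"],
})
bp = split_blood_pressure(toy["Blood Pressure"])
# Rows that failed to parse show up as NaN; worth checking before correlating
print("Unparsed rows:", bp.isna().any(axis=1).sum())
# Correlate sleep with both BP columns in one call (Pearson by default)
print(bp.corrwith(toy["Sleep Duration"]))
```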

### 3. Is shorter sleep duration correlated with higher stress level? 

```python
corr_sleep_stress = df['Sleep Duration'].corr(df['Stress Level'])
print("\nCorrelation between Sleep Duration and Stress Level:", corr_sleep_stress)
```

```
Correlation between Sleep Duration and Stress Level: -0.8110230278940451
```


There is a strong negative correlation (r ≈ -0.81) between sleep duration and stress level: people who sleep less tend to have higher stress levels.
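A correlation coefficient summarizes the relationship in a single number; grouping by stress level shows what it looks like in hours of sleep. The sketch assumes `Stress Level` holds a small set of numeric scores (the `corr` call above already treats it as numeric); `sleep_by_stress` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd


def sleep_by_stress(df: pd.DataFrame) -> pd.Series:
    """Mean sleep duration for each stress level, ordered from lowest to highest stress."""
    return df.groupby("Stress Level")["Sleep Duration"].mean().sort_index()


# Hypothetical toy rows for illustration only
toy = pd.DataFrame({
    "Stress Level": [3, 3, 5, 5, 8, 8],
    "Sleep Duration": [8.2, 7.8, 7.1, 6.9, 6.1, 5.9],
})
means = sleep_by_stress(toy)
print(means)
# A steadily falling series is consistent with a strong negative correlation
print("Monotonically decreasing:", means.is_monotonic_decreasing)
```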

### 4. Is greater physical activity correlated with longer sleep duration?

```python
corr_activity_sleep = df['Physical Activity Level'].corr(df['Sleep Duration'])
print("\nCorrelation between Physical Activity Level and Sleep Duration:", corr_activity_sleep)
```

```
Correlation between Physical Activity Level and Sleep Duration: 0.21236031472575861
```


Physical activity has only a weak positive correlation with sleep duration (r ≈ 0.21).
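Another way to inspect a weak correlation is to split physical activity into bands and compare average sleep in each band. The sketch uses `pd.cut` with band edges and labels chosen purely for illustration; the edges, the labels, and the helper name `sleep_by_activity_band` are assumptions, not values from the dataset documentation.

```python
import pandas as pd


def sleep_by_activity_band(df: pd.DataFrame) -> pd.Series:
    """Mean sleep duration within illustrative physical-activity bands."""
    # Hypothetical band edges: (0, 40], (40, 70], (70, 100]; include_lowest also puts 0 in the first band.
    # Values outside 0-100 fall in no band and are dropped by groupby.
    bands = pd.cut(
        df["Physical Activity Level"],
        bins=[0, 40, 70, 100],
        labels=["low", "medium", "high"],
        include_lowest=True,
    )
    # observed=False keeps empty bands in the result (their mean is NaN)
    return df.groupby(bands, observed=False)["Sleep Duration"].mean()


# Hypothetical toy rows for illustration only
toy = pd.DataFrame({
    "Physical Activity Level": [30, 35, 45, 60, 75, 90],
    "Sleep Duration": [6.4, 7.2, 6.8, 7.0, 7.6, 7.1],
})
print(sleep_by_activity_band(toy))
```

If the band means differ by only a fraction of an hour, that is consistent with a weak correlation; band edges based on the real data's distribution may be more informative than these fixed ones.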

### 5. Is greater physical activity correlated with higher sleep quality?

```python
corr_activity_quality = df['Physical Activity Level'].corr(df['Quality of Sleep'])
print("\nCorrelation between Physical Activity Level and Quality of Sleep:", corr_activity_quality)
```

```
Correlation between Physical Activity Level and Quality of Sleep: 0.19289645493975302
```


Physical activity has only a weak positive correlation with sleep quality (r ≈ 0.19).
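`Quality of Sleep` appears to be an ordinal rating (a score rather than a measured quantity), so Pearson's r, which measures linear association, may not be the best fit. Spearman's rank correlation only asks whether the two variables tend to rise together; it equals the Pearson correlation of the ranked values. A minimal sketch under that assumption; the helper name `spearman` and the toy rows are hypothetical.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the two ranked series (ties get average ranks)."""
    return x.rank().corr(y.rank())


# Hypothetical toy rows for illustration only
toy = pd.DataFrame({
    "Physical Activity Level": [30, 45, 60, 75, 90],
    "Quality of Sleep": [5, 6, 6, 8, 9],
})
rho = spearman(toy["Physical Activity Level"], toy["Quality of Sleep"])
print("Spearman:", rho)
print("Pearson: ", toy["Physical Activity Level"].corr(toy["Quality of Sleep"]))
```

Pandas also accepts `method="spearman"` in `Series.corr`, which should give the same value; the manual version simply makes the ranking step explicit.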