# COGS 108 - Data Checkpoint

## Authors

- Joel Abutin : Data wrangling
- Nitika Bhawe : Background Research
- Gabriel Hilmen : Research Design
- Arushi Patra : Hypothesis
- Ishaanee Roy : Ethics

## Research Question

Are demographic and biological variables that individuals cannot change (such as age and gender) more strongly correlated with self-rated daytime sleepiness (or sleep quality) than lifestyle variables that individuals can change (such as physical activity level and BMI), and do these two categories of variables interact with one another in predicting self-rated daytime sleepiness?

## Background and Prior Work

Sleep is an important process for cognitive functioning, emotional regulation, and physical health. Hence, understanding the factors that influence how people sleep is important for both clinical research and public health interventions. Current research has linked sleep to several modifiable lifestyle factors, such as physical activity, screen time, occupation, and use of substances such as alcohol. Through this project, we aim to examine the relationship between sleep quality and such factors.

Xu et al. examined the relationship between physical activity, self-reported screen time, and sleep quantity and quality. The study used a sample of 1136 adolescents aged 16-19 from the 2005–2006 National Health and Nutrition Examination Survey (NHANES), an age group that is less commonly studied in such research. They used an accelerometer, a wearable device, to estimate physical activity, and self-reported data for screen time, sleep quality, and sleep quantity over 30 days. They found that meeting recommended screen time guidelines was associated with significantly lower odds of reporting poor sleep quality, and that adolescents who met both physical activity and screen time guidelines had even lower odds of poor sleep, especially among males [1]. These results illustrate that modifiable behaviors like screen time and physical activity are linked to self‑rated sleep quality and may interact differently depending on intrinsic factors such as the sex and behavior of the individual.

Bailey et al. aimed to categorise data from Fitbit devices collected from 30,445 participants in the All of Us Research Program, a national effort to enroll more than 1 million participants for health research. The program enables participants to donate Fitbit data, providing a unique dataset for physical activity (PA) and sleep research. For this study, days 15–21 after the consent date were selected for analysis of demographic characteristics, wear days, and wear time proxy variables such as heart rate for amount of physical activity [2]. This study demonstrated a way to quantify variations in physical activity and sleep patterns without relying on surveys.

Nelson et al. examined how work demands influence sleep among nearly 3,000 adults from the Midlife in the United States (MIDUS) cohort. The researchers assessed multiple aspects of job demands, such as intensity, role conflict, and job control, and found significant linear and quadratic relationships between job demands and sleep outcomes. The linear effects indicated that participants with higher job demands had worse sleep health, such as shorter duration, greater irregularity, greater inefficiency, and more sleep dissatisfaction. The quadratic effects indicated that sleep regularity and efficiency outcomes were best when participants’ job demands were moderate rather than too low or too high [3]. These findings illustrate how variables like occupational stress and control may intersect with both internal and external influences on sleep quality in real-world populations.

Studies also show a strong co-occurrence of insomnia and alcoholism. Colrain et al. reviewed a number of studies using different research methods, from self-reported data to EEG recordings, in order to analyse brain waves that indicate different stages of sleep. Alcohol has a profound impact on sleep, with effects dependent on acute versus chronic use and dependence. While alcohol is initially sedating, this effect wears off after a few hours, and REM sleep is reduced. This results in fragmented and disturbed sleep in the second half of the night. Sustained use of alcohol in chronic alcoholism is associated with major sleep problems [4]. Hence, this study shows multiple ways of gathering data to analyse sleep quality after alcohol use.

While these studies provide important insights, most rely on self-reported sleep measures and cross-sectional designs which introduce potential biases [1][3][4]. Nonetheless, they provide a strong foundation for examining how individual characteristics and lifestyle behaviors together influence perceived sleep quality, which is the focus of the present project.

References:

1. Relationship between Physical Activity, Screen Time, and Sleep Quantity and Quality in US Adolescents Aged 16–19 https://pmc.ncbi.nlm.nih.gov/articles/PMC6539318/
2. Fitbit Physical Activity and Sleep Data in the All of Us Research Program: Data Exploration and Processing Considerations for Research https://pmc.ncbi.nlm.nih.gov/articles/PMC12264798/#S22
3. Goldilocks at Work: Just the Right Amount of Job Demands May be Needed for Your Sleep Health https://pmc.ncbi.nlm.nih.gov/articles/PMC9991992/#S24
4. Alcohol and the sleeping brain https://pmc.ncbi.nlm.nih.gov/articles/PMC5821259/

## Hypothesis

Self-rated sleep quality is influenced by both modifiable and non-modifiable factors. Higher levels of modifiable health behaviors (e.g., greater physical activity and healthier BMI) will be associated with more positive self-reported sleep quality. However, this relationship will be moderated by non-modifiable characteristics such as age and gender. Specifically, the strength and direction of the association between modifiable factors and sleep quality will differ across age groups and between genders, indicating an interaction effect between variables that can be changed and variables that cannot be changed.
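
One common way to express a moderation hypothesis like this is a linear model with a product (interaction) term. The sketch below is only an illustration on synthetic data, not the analysis used in this project; the variable names (`activity`, `age`, `sleepiness`) and the coefficient values are assumptions made up for the example. A coefficient on `activity * age` that is clearly different from zero would indicate that the association between activity and the outcome changes with age.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, standardized predictors (hypothetical stand-ins, not NHANES data)
activity = rng.normal(size=n)  # modifiable factor
age = rng.normal(size=n)       # non-modifiable factor

# Simulated outcome with a known interaction coefficient of -0.5
# Order: intercept, activity, age, activity * age
true_beta = np.array([2.0, -0.8, 0.3, -0.5])
X = np.column_stack([np.ones(n), activity, age, activity * age])
sleepiness = X @ true_beta + rng.normal(scale=0.1, size=n)

# Ordinary least squares fit of the interaction model
beta_hat, *_ = np.linalg.lstsq(X, sleepiness, rcond=None)

for name, b in zip(['intercept', 'activity', 'age', 'activity x age'], beta_hat):
    print(f'{name:>15}: {b: .3f}')
```

Because `daytime sleepiness` is ordinal, an ordinal model may be more appropriate in practice; the linear form is used here only to keep the example short.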

## Data

### Data overview

- **Dataset #1**
  - **Dataset Name:** NHANES 2017-2020 Customized Dataset
  - **Link to the dataset:** https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?Cycle=2017-2020
  - **Number of observations:** 680
  - **Number of variables:** 12
  - **Description of the variables most relevant to this project:** `daytime sleepiness` is the outcome variable; it is used as a proxy for sleep quality.
  - **Descriptions of any shortcomings this dataset has with respect to the project:** Most of the survey data was collected through a questionnaire rather than with measurement tools.
  
The NHANES dataset is split into multiple XPT files. Each subject has a unique ID called a SEQN (respondent sequence number), because the same subject may appear in multiple XPT files. A Python tool (https://github.com/Yousuf28/xpt2csv) will convert the XPT files to CSV. When the datasets are merged into a single one, observations from different CSV files that share the same ID will have their rows combined. After the merge, any observations/rows containing missing values will be removed, so that every remaining row has every column filled with data.

The following individual NHANES datasets are used for merging into a single dataset: demographics, body measures, alcohol use, physical activity, sleep disorders, and cigarette smoking use.
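
The merge-and-drop procedure described above can be sketched on toy tables as follows. The frames and their values are invented for illustration only; the column names `SEQN`, `RIDAGEYR`, `BMXBMI`, and `SLQ120` are the NHANES names used in the wrangling code in this notebook.

```python
from functools import reduce
import numpy as np
import pandas as pd

# Toy stand-ins for three NHANES component files (values are made up)
demo = pd.DataFrame({'SEQN': [1, 2, 3, 4], 'RIDAGEYR': [34, 51, 27, 68]})
body = pd.DataFrame({'SEQN': [1, 2, 4], 'BMXBMI': [24.5, np.nan, 31.0]})
sleep = pd.DataFrame({'SEQN': [1, 2, 3, 4], 'SLQ120': [0, 2, 1, 4]})

frames = [demo, body, sleep]

# Inner merge keeps only respondents that appear in every file
merged = reduce(lambda left, right: pd.merge(left, right, on='SEQN', how='inner'), frames)

# Drop any respondent with a missing value in any column
complete = merged.dropna().reset_index(drop=True)

print(len(merged), 'rows after merge;', len(complete), 'rows with complete data')
print(complete)
```

Printing the row count after each step, as done here, is one simple way to see how many observations each filter removes. As an alternative to converting files to CSV first, pandas can also read XPT files directly with `pd.read_sas(path, format='xport')`.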

### Dataset #1 - NHANES 2017-2020 Customized Dataset

The variable `daytime sleepiness` is the outcome used as a proxy for sleep quality. All other variables will be examined, individually and in combination, for how they relate to daytime sleepiness. We treat high daytime sleepiness as an indicator of low sleep quality, and low daytime sleepiness as an indicator of high sleep quality. For example, a response of 2-4/mo means that the person reported feeling overly sleepy during the day 2-4 times that month. This data is qualitative and ordinal, ordered by increasing frequency of sleepiness during the month: never, 1/mo, 2-4/mo, 5-15/mo, 16-30/mo. Other qualitative ordinal variables are `alcohol use`, `snore`, and `cigarette use`. They are in the form of events/time_period, such as 1/day, 1/week, 1/mo, and 1-2/year. The variable `gender` is a qualitative nominal variable with the unordered values male and female.
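
Because these variables are ordinal, their labels can be mapped to ranks before computing a rank-based association. The following is a minimal sketch on made-up responses, not project results. It uses the category labels from the wrangling code in this notebook and computes a Spearman correlation as the Pearson correlation of ranks.

```python
import pandas as pd

# Category order as used for `daytime sleepiness` in this notebook
levels = ['never', '1/mo', '2-4/mo', '5-15/mo', '16-30/mo']
dtype = pd.CategoricalDtype(categories=levels, ordered=True)

# Made-up responses for illustration only
df = pd.DataFrame({
    'daytime sleepiness': ['never', '2-4/mo', '16-30/mo', '1/mo', '5-15/mo', '2-4/mo'],
    'hours slept (weekday)': [8.5, 7.0, 5.0, 8.0, 6.0, 6.5],
})
df['daytime sleepiness'] = df['daytime sleepiness'].astype(dtype)

# Integer codes follow the declared order: never -> 0, ..., 16-30/mo -> 4
df['sleepiness code'] = df['daytime sleepiness'].cat.codes

# Spearman correlation = Pearson correlation computed on ranks (ties get average rank)
rho = df['sleepiness code'].rank().corr(df['hours slept (weekday)'].rank())

print(df)
print('Spearman rho:', round(rho, 3))
```

In this toy data the correlation is strongly negative by construction, since the rows were chosen so that more frequent sleepiness goes with fewer hours slept.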

Quantitative variables are `age` measured in years, `BMI` measured in kg/m^2, and `walk/bicycle (min/day)` measuring the average minutes per day the subject walked or biked. The variable `sedentary` measures stationary activities such as sitting at school, at home, getting to and from places, or with friends including time spent sitting at a desk, traveling in a car or bus, reading, playing cards, watching television, or using a computer; time spent sleeping is not included in this measurement. The variables `hours slept (weekday)` and `hours slept (weekend)` measure average sleep time during weekdays and weekends.

There are some concerns with this dataset. The body measures data for `BMI` were collected in the Mobile Examination Center (MEC) by trained health technicians. The rest of the variables come from interviewing subjects with a questionnaire, and the answers can be inaccurate if the subject is uninformed, forgetful, or lying. Sleep can be measured objectively with a tracker device, but the sleep data in this dataset was obtained through interview questions, presumably to save money and time. Another concern is the inconsistency of the qualitative variable scales. For example, the values for the variable `cigarette use` are never, 2-7/week, and 1/day, whereas the values for the `snore` variable are never, 1-2/week, 3-4/week, and 5+/week. Also, there is no clear information on socioeconomic status, occupation, health conditions, or geographic location, which are important potential confounders when analyzing this dataset.
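
One possible way to make frequency variables with different category sets comparable is to map each label to an approximate number of events per 30-day month. This is a rough sketch, not a step in the pipeline below; the name `PER_MONTH` and the midpoint values are assumptions chosen for illustration and would need to be justified before any real use.

```python
import pandas as pd

# Approximate events per 30-day month for each label (assumed midpoints)
PER_MONTH = {
    'never': 0.0,
    '1-2/week': 1.5 * 30 / 7,
    '3-4/week': 3.5 * 30 / 7,
    '5+/week': 6.0 * 30 / 7,   # assumes 5-7 per week, midpoint 6
    '2-7/week': 4.5 * 30 / 7,
    '1/day': 30.0,
}

# Made-up rows using the labels of `snore` and `cigarette use`
df = pd.DataFrame({
    'snore': ['never', '1-2/week', '5+/week'],
    'cigarette use': ['1/day', 'never', '2-7/week'],
})

# Add a numeric column on a shared scale next to each original column
for col in ['snore', 'cigarette use']:
    df[col + ' (approx/mo)'] = df[col].map(PER_MONTH).round(1)

print(df)
```

The ordinal labels are kept alongside the numeric columns, since the conversion discards the fact that each label covers a range rather than a single value.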



In [None]:
from functools import reduce
import numpy as np
import pandas as pd

# --------------------
# Demographics
# --------------------

df_demo = pd.read_csv('data/00-raw/NHANES_2017-2020_DEMO_DEMO.csv') # Import dataset
df_demo = df_demo[['SEQN', 'RIAGENDR', 'RIDAGEYR']]                 # Keep specific variables
df_demo = df_demo.rename(columns = {'SEQN' : 'ID',
                                    'RIAGENDR' : 'gender',
                                    'RIDAGEYR' : 'age (year)'})     # Make columns more readable
df_demo['ID'] = df_demo['ID'].astype(int)                           # Remove decimal from ID

df_demo = df_demo[df_demo['age (year)'].between(0, 80)]       # Only ages 0-80
df_demo['age (year)'] = df_demo['age (year)'].astype(int)     # Remove decimal from age
df_demo['gender'] = df_demo['gender'].replace({1 : 'male',
                                               2 : 'female'}) # Convert numbers to string

# --------------------
# Body measures
# --------------------

df_bm = pd.read_csv('data/00-raw/NHANES_2017-2020_EXAM_BM.csv') # Import dataset
df_bm = df_bm[['SEQN', 'BMXBMI']]                               # Keep specific variables
df_bm = df_bm.rename(columns = {'SEQN' : 'ID',
                                'BMXBMI' : 'BMI'})              # Make columns more readable
df_bm['ID'] = df_bm['ID'].astype(int)                           # Remove decimal from ID

df_bm = df_bm[df_bm['BMI'].between(11.9, 92.3)]                 # Only BMI 11.9-92.3 (assign result so the filter takes effect)

# --------------------
# Alcohol use
# --------------------

df_al = pd.read_csv('data/00-raw/NHANES_2017-2020_QUES_AL.csv') # Import dataset
df_al = df_al[['SEQN', 'ALQ121']]                               # Keep specific variables
df_al = df_al.rename(columns = {'SEQN' : 'ID',
                                'ALQ121' : 'alcohol use'})      # Make columns more readable
df_al['ID'] = df_al['ID'].astype(int)                           # Remove decimal from ID

df_al['alcohol use'] = df_al['alcohol use'].replace({0 : 'never',
                                                     1 : '1/day',
                                                     2 : '5-6/week',
                                                     3 : '3-4/week',
                                                     4 : '2/week',
                                                     5 : '1/week',
                                                     6 : '2-3/mo',
                                                     7 : '1/mo',
                                                     8 : '7-11/year',
                                                     9 : '3-6/year',
                                                     10 : '1-2/year'}) # Convert numbers to description
order = pd.CategoricalDtype(categories = ['never',
                                          '1-2/year',
                                          '3-6/year',
                                          '7-11/year',
                                          '1/mo',
                                          '2-3/mo',
                                          '1/week',
                                          '2/week',
                                          '3-4/week',
                                          '5-6/week',
                                          '1/day'],
                                          ordered = True)              # 1-2 sort categorical variables
df_al['alcohol use'] = df_al['alcohol use'].astype(order)              # 2-2 sort categorical variables

# --------------------
# Physical activity
# --------------------

df_pa = pd.read_csv('data/00-raw/NHANES_2017-2020_QUES_PA.csv')      # Import dataset
df_pa = df_pa[['SEQN', 'PAD645', 'PAD680']]                          # Keep specific variables
df_pa = df_pa.rename(columns = {'SEQN' : 'ID',
                                'PAD645' : 'walk/bicycle (min/day)',
                                'PAD680' : 'sedentary (min/day)'})   # Make columns more readable
df_pa['ID'] = df_pa['ID'].astype(int)                                # Remove decimal from ID

df_pa = df_pa[df_pa['walk/bicycle (min/day)'].between(10, 840)]               # Only walk/bicycle 10-840
df_pa = df_pa[df_pa['sedentary (min/day)'].between(0, 1320)]                  # Only sedentary 0-1320
df_pa['walk/bicycle (min/day)'] = df_pa['walk/bicycle (min/day)'].astype(int) # Remove decimal from walk/bicycle
df_pa['sedentary (min/day)'] = df_pa['sedentary (min/day)'].astype(int)       # Remove decimal from sedentary

# --------------------
# Sleep disorders
# --------------------

df_sl = pd.read_csv('data/00-raw/NHANES_2017-2020_QUES_SL.csv')     # Import dataset
df_sl = df_sl[['SEQN', 'SLD012', 'SLD013', 'SLQ030', 'SLQ120']]     # Keep specific variables
df_sl = df_sl.rename(columns = {'SEQN' : 'ID',
                                'SLD012' : 'hours slept (weekday)',
                                'SLD013' : 'hours slept (weekend)',
                                'SLQ030' : 'snore',
                                'SLQ120' : 'daytime sleepiness'})   # Make columns more readable
df_sl['ID'] = df_sl['ID'].astype(int)                               # Remove decimal from ID

df_sl = df_sl[df_sl['hours slept (weekday)'].between(2, 14)] # Only sleep (weekday) 2-14
df_sl = df_sl[df_sl['hours slept (weekend)'].between(2, 14)] # Only sleep (weekend) 2-14

df_sl['snore'] = df_sl['snore'].replace({0 : 'never',
                                         1 : '1-2/week',
                                         2 : '3-4/week',
                                         3 : '5+/week'})  # Convert numbers to description
order = pd.CategoricalDtype(categories = ['never',
                                          '1-2/week',
                                          '3-4/week',
                                          '5+/week'],
                                          ordered = True) # 1-2 sort categorical variables
df_sl['snore'] = df_sl['snore'].astype(order)             # 2-2 sort categorical variables

df_sl['daytime sleepiness'] = df_sl['daytime sleepiness'].replace({0 : 'never',
                                                                   1 : '1/mo',
                                                                   2 : '2-4/mo',
                                                                   3 : '5-15/mo',
                                                                   4 : '16-30/mo'}) # Convert numbers to description
order = pd.CategoricalDtype(categories = ['never',
                                          '1/mo',
                                          '2-4/mo',
                                          '5-15/mo',
                                          '16-30/mo'],
                                          ordered = True)                           # 1-2 sort categorical variables
df_sl['daytime sleepiness'] = df_sl['daytime sleepiness'].astype(order)             # 2-2 sort categorical variables

# --------------------
# Smoking
# --------------------

df_sm = pd.read_csv('data/00-raw/NHANES_2017-2020_QUES_SM.csv') # Import dataset
df_sm = df_sm[['SEQN', 'SMQ040']]                               # Keep specific variables
df_sm = df_sm.rename(columns = {'SEQN' : 'ID',
                                'SMQ040' : 'cigarette use'})    # Make columns more readable
df_sm['ID'] = df_sm['ID'].astype(int)                           # Remove decimal from ID

df_sm['cigarette use'] = df_sm['cigarette use'].replace({1 : '1/day',
                                                         2 : '2-7/week',
                                                         3 : 'never'})   # Convert numbers to description
order = pd.CategoricalDtype(categories = ['never',
                                          '2-7/week',
                                          '1/day'],
                                          ordered = True)                # 1-2 sort categorical variables
df_sm['cigarette use'] = df_sm['cigarette use'].astype(order)            # 2-2 sort categorical variables

# --------------------
# Cleanup
# --------------------

df_demo = df_demo.dropna() # Remove rows with NaN data
df_bm = df_bm.dropna()     # Remove rows with NaN data
df_al = df_al.dropna()     # Remove rows with NaN data
df_pa = df_pa.dropna()     # Remove rows with NaN data
df_sl = df_sl.dropna()     # Remove rows with NaN data
df_sm = df_sm.dropna()     # Remove rows with NaN data

# --------------------
# Merge dataset
# --------------------

df_list = [df_demo, df_bm, df_al, df_pa, df_sl, df_sm]
df_final = reduce(lambda left, right: pd.merge(left, right, on='ID', how='inner'), df_list) # Merge rows with same ID

df_final

| | ID | gender | age (year) | BMI | alcohol use | walk/bicycle (min/day) | sedentary (min/day) | hours slept (weekday) | hours slept (weekend) | snore | daytime sleepiness | cigarette use |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 109334 | female | 54 | 24.5 | never | 60 | 120 | 7.0 | 5.0 | 1-2/week | 1/mo | 1/day |
| 1 | 109335 | female | 55 | 39.6 | 1/mo | 120 | 240 | 9.0 | 9.0 | 5+/week | 1/mo | 1/day |
| 2 | 109342 | female | 43 | 35.5 | 2/week | 60 | 600 | 6.5 | 8.0 | 3-4/week | 16-30/mo | 1/day |
| 3 | 109365 | female | 49 | 32.0 | 1/day | 20 | 120 | 9.5 | 9.5 | never | 5-15/mo | 1/day |
| 4 | 109393 | male | 76 | 21.5 | 3-6/year | 45 | 60 | 9.0 | 8.0 | 3-4/week | 5-15/mo | 2-7/week |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 675 | 124730 | male | 74 | 31.1 | never | 10 | 180 | 10.0 | 10.0 | never | never | never |
| 676 | 124737 | male | 66 | 21.1 | 1/mo | 30 | 10 | 7.5 | 8.0 | never | 5-15/mo | 1/day |
| 677 | 124753 | male | 21 | 27.4 | 1/day | 15 | 120 | 8.5 | 6.0 | never | never | 1/day |
| 678 | 124814 | male | 64 | 37.5 | 2/week | 20 | 300 | 8.0 | 7.0 | 3-4/week | 2-4/mo | never |


In [2]:
df_final.size

8160

In [3]:
df_final.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 680 entries, 0 to 679
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype   
---  ------                  --------------  -----   
 0   ID                      680 non-null    int64   
 1   gender                  680 non-null    object  
 2   age (year)              680 non-null    int64   
 3   BMI                     680 non-null    float64 
 4   alcohol use             680 non-null    category
 5   walk/bicycle (min/day)  680 non-null    int64   
 6   sedentary (min/day)     680 non-null    int64   
 7   hours slept (weekday)   680 non-null    float64 
 8   hours slept (weekend)   680 non-null    float64 
 9   snore                   680 non-null    category
 10  daytime sleepiness      680 non-null    category
 11  cigarette use           680 non-null    category
dtypes: category(4), float64(3), int64(4), object(1)
memory usage: 46.2+ KB


In [4]:
df_final.describe()

| | ID | age (year) | BMI | walk/bicycle (min/day) | sedentary (min/day) | hours slept (weekday) | hours slept (weekend) |
|---|---|---|---|---|---|---|---|
| count | 680.0 | 680.0 | 680.0 | 680.0 | 680.0 | 680.0 | 680.0 |
| mean | 117315.660294 | 49.457353 | 28.59 | 62.873529 | 299.161765 | 7.577941 | 8.008824 |
| std | 4557.478584 | 16.914441 | 6.786278 | 77.314795 | 190.182155 | 1.800985 | 1.975903 |
| min | 109334.0 | 18.0 | 14.9 | 10.0 | 0.0 | 2.0 | 2.0 |
| 25% | 113338.0 | 34.75 | 23.6 | 20.0 | 180.0 | 6.5 | 7.0 |
| 50% | 117375.0 | 51.0 | 27.45 | 30.0 | 240.0 | 7.5 | 8.0 |
| 75% | 121324.5 | 63.0 | 32.625 | 60.0 | 420.0 | 8.5 | 9.0 |
| max | 124815.0 | 80.0 | 61.9 | 660.0 | 1200.0 | 14.0 | 14.0 |


## Ethics

### A. Data Collection

 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

> NHANES is conducted on an ongoing basis, with public-use data released in two-year cycles. The sample for each two-year cycle is representative of the non-institutionalized U.S. population. As participation in NHANES is voluntary, participants gave informed consent.

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

> Until 2020, NHANES oversampled certain race, Hispanic-origin, age, and income groups. However, the sample design was modified to remove this bias. Additionally, to reduce oversampling of certain age groups such as 0-12, 12-19, and >70, the in-household survey was modified so that ages 0-19 and >60 were also eligible, and a system was created to randomly select adults in the age range 25-59. In addition, bilingual field interviewers were present when interviewing English- and Spanish-language respondents. Only Spanish- or English-speaking participants were chosen.

 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII), for example through anonymization or not collecting information that isn't relevant for analysis?

> The data is taken from the 2024 National Center for Health Statistics (NCHS) under the Centers for Disease Control and Prevention (CDC). These surveys are conducted under the authority of the Public Health Service Act and are protected by federal confidentiality laws. Therefore, the data is anonymised and will be used only for statistical analysis. NCHS omits personal data and identifiers to avoid disclosing personal information, and has strict rules against violations of confidentiality.

 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

> The data includes demographic variables like race/ethnicity. However, as our research question focuses on sleep health in relation to other factors, we may or may not use data on protected groups. If we do, the sample should be unbiased because it is representative of the non-institutionalized U.S. population.

### B. Data Storage

 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?

> As this is a public and de-identified dataset, individuals will have to ask the original data collectors to remove their data.

 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

### C. Analysis

 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?

 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

 - [ ] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?

 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?

 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

### D. Modeling

 - [ ] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?

 - [ ] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?

 - [ ] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?

 - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?

 - [ ] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment

 - [ ] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 
 - [ ] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?

 - [ ] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?

 - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?

## Team Expectations 

Instructions: 
- All project members will communicate through Discord and respond to messages preferably within 8-12 hours.
- Meetings will occur at minimum weekly through Discord and tasks will be assigned during these meetings.
- Project members struggling on their tasks will ask for help as soon as possible so other members can provide assistance.
- If a member has not responded for a lengthy time, such as 48 hours or more, a welfare check will be attempted by contacting them through Discord, email, and phone. If the member still has not responded, a TA or professor will be contacted about what to do next.

## Project Timeline Proposal

| Meeting Date  | Meeting Time | Completed Before Meeting | Discuss at Meeting |
|---|---|---|---|
| 2025-02-02 Monday | 2:30 PM | Review the project proposal Jupyter notebook | Assign each project member to complete a section of the project proposal |
| 2025-02-04 Wednesday | Project proposal due | | |
| 2025-02-09 Monday | 3:00 PM | Review the data checkpoint Jupyter notebook | Assign each project member to complete a section of the data checkpoint |
| 2025-02-16 Monday | 3:00 PM | Each member completes as much of their assigned task as possible | Check up on each member's progress, assist other members if necessary |
| 2025-02-18 Wednesday | Data checkpoint due | | |
| 2025-02-23 Monday | 3:00 PM | Review the EDA checkpoint Jupyter notebook | Assign each project member to complete a section of the EDA checkpoint |
| 2025-03-02 Monday | 3:00 PM | Each member completes as much of their assigned task as possible | Check up on each member's progress, assist other members if necessary |
| 2025-03-04 Wednesday | EDA checkpoint due | | |
| 2025-03-09 Monday | 3:00 PM | Review the final project Jupyter notebook | Assign each project member to complete a section of the final project |
| 2025-03-16 Monday | 3:00 PM | Each member completes as much of their assigned task as possible | Check up on each member's progress, assist other members if necessary |
| 2025-03-18 Wednesday | Final project due | | |