**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback**  

Update/change the relevant sections where you lost those points, make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.

Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed.

# COGS 108 - Data Checkpoint

# Names

- Edna Ho
- Amanda Chang
- Alina Pham
- Noelle Lam
- Aarya Gupta

# Research Question

Is there a relationship between sleep and physical activity among adults who are overweight? To analyze this relationship, we will quantify sleep quality by measuring the average hours slept each night, and we will quantify physical activity by measuring each individual's average daily step count.
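A minimal sketch of how these two measures could be computed from per-day records is shown below. It assumes a hypothetical daily table with columns `participant`, `steps`, and `sleep_minutes`. These names and values are illustrative only, since the dataset used later already provides per-participant averages.

```python
import pandas as pd

# Hypothetical daily records (one row per participant per day); names and values are illustrative.
daily = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2],
    "steps": [8000, 10000, 12000, 4000, 6000],
    "sleep_minutes": [420, 450, 480, 360, 390],
})

# Average each participant's days; mean() skips missing (NaN) days by default.
summary = daily.groupby("participant").agg(
    avg_steps=("steps", "mean"),
    avg_sleep_minutes=("sleep_minutes", "mean"),
)

# Express average sleep in hours, matching how the research question measures sleep.
summary["avg_sleep_hours"] = summary["avg_sleep_minutes"] / 60
print(summary)
```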


## Background and Prior Work

There are many factors that influence sleep, such as age, alcohol, and stress. It is common knowledge that exercise plays a role in both the quality and quantity of sleep, and it is normal to feel sleepy or drained after exercising. During exercise, blood flow and heart rate increase, which makes you feel more awake and wired. After some time, however, sleepiness sets in. This happens because, as our muscles contract, our energy stores in the form of ATP are depleted and neurotransmitter levels increase, resulting in muscle fatigue.  <a href="#cite_note-1">[1]</a> We hope to further support this point by tracking the heart rate of runners and the quality of sleep they get across their sleep cycles. We understand that other factors, such as stress and lifestyle, also affect sleep quality; however, we will focus specifically on the effects of exercise.

Many studies support the idea that running has positive effects on sleep. One study of 51 healthy adolescents compared participants who ran every morning for three weeks with control subjects. The researchers examined the electroencephalographic (EEG) sleep patterns and psychological functioning of both groups before and after the three-week period. At the end of the study, "objective sleep improved (slow-wave sleep increased; sleep onset latency decreased) in the running group compared with the control group,"  <a href="#cite_note-2">[2]</a> supporting the hypothesis that running improves sleep quality. However, this study only included adolescents (mean age 18.30 years), so its findings may not generalize to older people.

Another study investigated the effects of exercise in an older age group. Participants were 40 years or older, with a mean age of 62 years, and took part in a 12-week exercise training program. Data were collected through sleep quality assessments, a cardiopulmonary exercise test, and heart rate variability assessments. At the end of the program, moderate-intensity exercise training had a significant beneficial effect on cardiac autonomic function.  <a href="#cite_note-3">[3]</a> The study suggests that older adults who exercise also experience better sleep, similar to the benefits observed in adolescents.

While many of the studies we found support the claim that exercise enhances sleep quality, we aim to be more specific in our findings: showing how heart rate while running correlates with sleep cycles and with improvements in sleep quality, and considering how experience level and gender may affect the outcome.

References:
1. <a name="cite_note-1"></a> [^](#cite_ref-1) Nunez, K. (26 Jun 2020) Is It Normal to Take a Nap After a Workout? *Healthline*. https://www.healthline.com/health/healthy-sleep/sleep-after-workout
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Kalak, N., Gerber, M., Kirov, R., Mikoteit, T., Yordanova, J., Pühse, U., Holsboer-Trachsler, E., Brand, S. (2012) Daily Morning Running for 3 Weeks Improved Sleep and Psychological Functioning in Healthy Adolescents Compared With Controls. *Journal of Adolescent Health*. https://www.sciencedirect.com/science/article/pii/S1054139X12001115
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Tseng, T., Chen, H., Wang, L., Chien, M. (15 Sep 2020) Effects of exercise training on sleep quality and heart rate variability in middle-aged and older adults with poor sleep quality: a randomized controlled trial. *Journal of Clinical Sleep Medicine*. https://jcsm.aasm.org/doi/full/10.5664/jcsm.8560


# Hypothesis


Our hypothesis is that there will be a strong positive correlation between physical activity (measured by daily step count) and sleep quality (measured by average hours slept per night) among overweight adults. Adults with higher daily step counts are likely to sleep longer on average and may also have lower resting heart rates, which indicate good or improving cardiovascular health and sleep quality. We think this is because evidence suggests that regular physical activity promotes better sleep through the release of adenosine that occurs during physical activity. Adenosine may contribute to a lower resting heart rate and to deeper, higher-quality sleep.
We expect this positive correlation to appear as an increase in average hours slept per night, a higher sleep quality rating, and a consistent or sustained decrease in resting heart rate among adults whose daily step counts are consistent or increasing.
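One way to test this hypothesis is to correlate per-participant `avg_steps` with `avg_sleep`. The sketch below uses SciPy's `pearsonr` (linear association) and `spearmanr` (rank-based, less sensitive to outliers) on made-up values, not the study data. With only 33 participants, any coefficient would need to be read alongside its p-value before calling a correlation "strong."

```python
import numpy as np
from scipy import stats

# Made-up per-participant averages for illustration only (NOT the study data).
avg_steps = np.array([4000, 6000, 8000, 10000, 12000])
avg_sleep = np.array([6.0, 6.8, 6.5, 7.2, 7.5])  # hours per night

# Pearson: strength of the linear relationship.
pearson_r, pearson_p = stats.pearsonr(avg_steps, avg_sleep)

# Spearman: correlation of ranks, robust to outliers and monotonic nonlinearity.
spearman_rho, spearman_p = stats.spearmanr(avg_steps, avg_sleep)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
```

Reporting both coefficients is a design choice: if they disagree noticeably, a few participants may be driving the linear fit.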

# Data

## Data overview

For each dataset include the following information
- Dataset #1
  - Dataset Name: Sleep and physical activity among adults with overweight and obesity
  - Link to the dataset: https://osf.io/ga2mw
  - Number of observations: 33
  - Number of variables: 17
  - In this dataset, 33 overweight or obese adults participated in the study. All participants wore a Fitbit Charge 3 day and night for 7,094 in order to measure total sleep time (minutes), sleep efficiency (percent), and daily step count (a discrete integer), the three key variables for our analysis. Sleep outcomes were estimated from combined accelerometry, heart rate, and heart rate variability signals using Fitbit’s proprietary algorithm, while daily step counts were estimated via the Fitbit's triaxial accelerometry.


**sex:** a numerical value of 1 or 2 identifying the participant's sex: 1 indicates male, 2 indicates female \
**age:** the participant's age in years \
**study_length:** the number of days the participant was enrolled in the study \
**perc_NA_steps:** despite its name, the proportion (0 to 1) of study days on which step data were recorded, i.e. `1 - NA_steps / study_length` (1.0 means no missing days) \
**perc_NA_sleep:** despite its name, the proportion (0 to 1) of study days on which sleep data were recorded, i.e. `1 - NA_sleep / study_length` (1.0 means no missing days) \
**avg_steps:** the average number of steps the participant took per day over the course of the study \
**avg_sleep:** the average number of hours the participant slept per night over the course of the study (recorded in minutes in the raw file) \
**avg_sleep_eff:** the average sleep efficiency, i.e. the proportion (0 to 1) of time in bed that the participant actually spent asleep, excluding time spent awake in bed
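The relationship between `perc_NA_sleep` and the raw counts can be checked directly against values in the raw dataset preview. The sketch below assumes `perc_NA_sleep = 1 - NA_sleep / study_length` and confirms it for three participants. The `sleep_efficiency` helper is hypothetical and only illustrates the asleep-versus-in-bed definition.

```python
import numpy as np
import pandas as pd

# Three participants copied from the raw dataset preview (rows 0, 2, and 9).
rows = pd.DataFrame({
    "NA_sleep": [0, 1, 4],
    "study_length": [264, 267, 284],
    "perc_NA_sleep": [1.0, 0.996255, 0.985915],
})

# Assumed definition: share of study days WITH sleep data (despite the "NA" in the name).
reconstructed = 1 - rows["NA_sleep"] / rows["study_length"]
matches = bool(np.isclose(reconstructed, rows["perc_NA_sleep"], atol=1e-6).all())
print(matches)  # True

def sleep_efficiency(minutes_asleep, minutes_in_bed):
    """Hypothetical helper: fraction of time in bed actually spent asleep."""
    return minutes_asleep / minutes_in_bed

print(sleep_efficiency(420, 480))  # 0.875
```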


## Dataset #1: supplemental_material#2.csv

### Setup

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
```

```python
# read in the raw data file
df = pd.read_csv('supplemental_material#2.csv')
df
```

| | id | sex | age | NA_sleep | NA_steps | study_length | perc_NA_sleep | perc_NA_steps | min_steps | max_steps | avg_steps | min_sleep | max_sleep | avg_sleep | min_sleep_eff | max_sleep_eff | avg_sleep_eff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 536 | 2 | 20 | 0 | 0 | 264 | 1.0 | 1.0 | 3865 | 27007 | 13534.830189 | 112 | 674 | 449.732075 | 0.775717 | 0.957143 | 0.874021 |
| 1 | 887 | 1 | 23 | 0 | 0 | 158 | 1.0 | 1.0 | 3914 | 32341 | 16091.798742 | 83 | 600 | 384.937107 | 0.780051 | 0.963504 | 0.870252 |
| 2 | 515 | 2 | 33 | 1 | 0 | 267 | 0.996255 | 1.0 | 1140 | 20037 | 7823.906716 | 188 | 783 | 432.782772 | 0.786164 | 0.981651 | 0.89601 |
| 3 | 878 | 1 | 23 | 1 | 0 | 197 | 0.994924 | 1.0 | 1161 | 26915 | 10613.944444 | 75 | 711 | 406.436548 | 0.734607 | 0.961538 | 0.859374 |
| 4 | 1063 | 1 | 20 | 1 | 1 | 142 | 0.992958 | 0.992958 | 603 | 14790 | 4369.570423 | 185 | 772 | 488.288732 | 0.787823 | 0.94837 | 0.874954 |
| 5 | 312 | 2 | 34 | 2 | 0 | 264 | 0.992424 | 1.0 | 1996 | 20463 | 8850.592453 | 293 | 594 | 441.342205 | 0.785567 | 0.977427 | 0.879201 |
| 6 | 1099 | 1 | 23 | 1 | 0 | 121 | 0.991736 | 1.0 | 1147 | 33573 | 14627.631148 | 147 | 692 | 387.578512 | 0.800959 | 1.0 | 0.865412 |
| 7 | 1073 | 1 | 24 | 1 | 0 | 108 | 0.990741 | 1.0 | 1973 | 36743 | 13420.385321 | 62 | 608 | 327.87037 | 0.680498 | 0.97491 | 0.884474 |
| 8 | 1077 | 2 | 35 | 1 | 0 | 106 | 0.990566 | 1.0 | 1297 | 17467 | 7815.037383 | 259 | 762 | 476.160377 | 0.763547 | 0.904 | 0.85019 |
| 9 | 310 | 2 | 34 | 4 | 0 | 284 | 0.985915 | 1.0 | 3237 | 32595 | 11921.252632 | 83 | 541 | 393.391459 | 0.754011 | 0.956204 | 0.869055 |


```python
df.shape
```

```
(33, 17)
```

```python
# df.info() prints its summary directly and returns None, so print() is not needed
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 17 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             33 non-null     int64  
 1   sex            33 non-null     int64  
 2   age            33 non-null     int64  
 3   NA_sleep       33 non-null     int64  
 4   NA_steps       33 non-null     int64  
 5   study_length   33 non-null     int64  
 6   perc_NA_sleep  33 non-null     float64
 7   perc_NA_steps  33 non-null     float64
 8   min_steps      33 non-null     int64  
 9   max_steps      33 non-null     int64  
 10  avg_steps      33 non-null     float64
 11  min_sleep      33 non-null     int64  
 12  max_sleep      33 non-null     int64  
 13  avg_sleep      33 non-null     float64
 14  min_sleep_eff  33 non-null     float64
 15  max_sleep_eff  33 non-null     float64
 16  avg_sleep_eff  33 non-null     float64
dtypes: float64(7), int64(10)
memory usage: 4.5 KB
```


### Ensure No Missing Values

```python
df.isnull().sum()
```

```
id               0
sex              0
age              0
NA_sleep         0
NA_steps         0
study_length     0
perc_NA_sleep    0
perc_NA_steps    0
min_steps        0
max_steps        0
avg_steps        0
min_sleep        0
max_sleep        0
avg_sleep        0
min_sleep_eff    0
max_sleep_eff    0
avg_sleep_eff    0
dtype: int64
```

### Data Cleaning

For data cleaning, we dropped the columns id, NA_sleep, NA_steps, min_steps, max_steps, min_sleep, max_sleep, min_sleep_eff, and max_sleep_eff. id is a randomized identifier with no meaningful association to the participant, so it was removed. NA_sleep and NA_steps are the number of days for each participant on which sleep and steps were not measured. Because participants had different study lengths, these raw counts are not comparable across participants and would not be helpful in our analysis, so we removed them. Instead, we kept perc_NA_sleep and perc_NA_steps, which carry the same information relative to each participant's study length. The min and max columns for sleep, steps, and sleep efficiency were also omitted, because the averages we kept for each of these measures better capture the general trend.

We recoded sex from 1 and 2 to 0 and 1, following common data-wrangling practice: 0 represents male and 1 represents female. We converted average steps from float to integer because the fractional part is not useful for our analysis. Average sleep was originally measured in minutes, so we converted it to hours to get a more digestible, meaningful value for our analysis.
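Note that `astype(int)` truncates rather than rounds. The sketch below uses two `avg_steps` and `avg_sleep` values from the raw dataset preview to contrast truncation with rounding, and shows the minutes-to-hours conversion as a vectorized division. For step counts in the thousands, the choice changes each value by less than one step.

```python
import pandas as pd

# Two raw values each, copied from the raw dataset preview.
avg_steps = pd.Series([13534.830189, 16091.798742])
avg_sleep_minutes = pd.Series([449.732075, 384.937107])

truncated = avg_steps.astype(int)        # drops the fractional part
rounded = avg_steps.round().astype(int)  # rounds to the nearest whole step first

# Vectorized minutes-to-hours conversion (same result as applying x / 60 to each value).
avg_sleep_hours = avg_sleep_minutes / 60

print(truncated.tolist())  # [13534, 16091]
print(rounded.tolist())    # [13535, 16092]
print(avg_sleep_hours.round(6).tolist())
```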

```python
# drop columns not needed for the analysis (columns= already implies axis=1)
df = df.drop(columns=['id', 'NA_sleep', 'NA_steps', 'min_steps', 'max_steps',
                      'min_sleep', 'max_sleep', 'min_sleep_eff', 'max_sleep_eff'])

# recode sex: male (1) -> 0, female (2) -> 1
df['sex'] = df['sex'].replace([1, 2], [0, 1])

# convert average steps to integer; astype(int) truncates the decimal part
df['avg_steps'] = df['avg_steps'].astype(int)

df
```

| | sex | age | study_length | perc_NA_sleep | perc_NA_steps | avg_steps | avg_sleep | avg_sleep_eff |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20 | 264 | 1.0 | 1.0 | 13534 | 449.732075 | 0.874021 |
| 1 | 0 | 23 | 158 | 1.0 | 1.0 | 16091 | 384.937107 | 0.870252 |
| 2 | 1 | 33 | 267 | 0.996255 | 1.0 | 7823 | 432.782772 | 0.89601 |
| 3 | 0 | 23 | 197 | 0.994924 | 1.0 | 10613 | 406.436548 | 0.859374 |
| 4 | 0 | 20 | 142 | 0.992958 | 0.992958 | 4369 | 488.288732 | 0.874954 |
| 5 | 1 | 34 | 264 | 0.992424 | 1.0 | 8850 | 441.342205 | 0.879201 |
| 6 | 0 | 23 | 121 | 0.991736 | 1.0 | 14627 | 387.578512 | 0.865412 |
| 7 | 0 | 24 | 108 | 0.990741 | 1.0 | 13420 | 327.87037 | 0.884474 |
| 8 | 1 | 35 | 106 | 0.990566 | 1.0 | 7815 | 476.160377 | 0.85019 |
| 9 | 1 | 34 | 284 | 0.985915 | 1.0 | 11921 | 393.391459 | 0.869055 |


```python
def standardize_sleep(minutes):
    """Convert a duration in minutes to hours."""
    return minutes / 60
```

```python
# convert average sleep from minutes to hours
df['avg_sleep'] = df['avg_sleep'].apply(standardize_sleep)
df
```

| | sex | age | study_length | perc_NA_sleep | perc_NA_steps | avg_steps | avg_sleep | avg_sleep_eff |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20 | 264 | 1.0 | 1.0 | 13534 | 7.495535 | 0.874021 |
| 1 | 0 | 23 | 158 | 1.0 | 1.0 | 16091 | 6.415618 | 0.870252 |
| 2 | 1 | 33 | 267 | 0.996255 | 1.0 | 7823 | 7.213046 | 0.89601 |
| 3 | 0 | 23 | 197 | 0.994924 | 1.0 | 10613 | 6.773942 | 0.859374 |
| 4 | 0 | 20 | 142 | 0.992958 | 0.992958 | 4369 | 8.138146 | 0.874954 |
| 5 | 1 | 34 | 264 | 0.992424 | 1.0 | 8850 | 7.355703 | 0.879201 |
| 6 | 0 | 23 | 121 | 0.991736 | 1.0 | 14627 | 6.459642 | 0.865412 |
| 7 | 0 | 24 | 108 | 0.990741 | 1.0 | 13420 | 5.464506 | 0.884474 |
| 8 | 1 | 35 | 106 | 0.990566 | 1.0 | 7815 | 7.936006 | 0.85019 |
| 9 | 1 | 34 | 284 | 0.985915 | 1.0 | 11921 | 6.556524 | 0.869055 |


### Ensure Correct Data Types

```python
df.dtypes
```

```
sex                int64
age                int64
study_length       int64
perc_NA_sleep    float64
perc_NA_steps    float64
avg_steps          int64
avg_sleep        float64
avg_sleep_eff    float64
dtype: object
```

# Ethics & Privacy

With this research topic investigating the correlation between intensity of exercise and quality of sleep, it is critical to consider the ethical constraints regarding this research.

The research question in particular has weaknesses, including the risk of overgeneralizing the findings. If there is indeed a positive correlation between intensity of exercise and quality of sleep, focusing on it could overlook other factors, such as stress or mental health, that may also play a part. In addition, we must consider how equitable and accessible exercise is for different portions of the population. Not all participants have equal access to exercise, whether walking or running, which could create disparities that confound whether exercise affects sleep. If participants without access to exercise nonetheless have varying levels of sleep quality, this could skew the data.

As for the proposed data, there is a possibility of bias in the demographics of the sample population, in self-reported data, and among participants with more exercise experience, any of which could skew the data. The data could reflect only a certain age group, region, etc. Participants could also self-report quantifiable statistics such as heart rate and number of sleep cycles, undermining the validity and control of the data collected. The third crucial bias is varying exercise experience: one participant's high-intensity rate could look entirely different from another participant's. Additionally, because we are working with sensitive personal data such as heart rate and sleep cycles, our group must also consider the ethical implications of consent and data sharing.

To mitigate these ethical concerns, our group plans to address them from a top-down perspective. When searching for additional datasets, we plan to look for diverse datasets that cover a wide variety of demographics and health statuses. This will help us minimize selection bias and consider the more nuanced intersections among participants. Additionally, to maximize the accuracy of our results and reduce reliance on self-reported data, we will favor datasets collected with standardized measuring systems such as heart rate monitors or sleep trackers. Lastly, participants' safety and privacy are of utmost priority, so we will only use datasets in which participants gave consent and remain anonymous.

Once datasets have been collected, our group will remain attentive to potential bias by analyzing patterns and ensuring that the data accounts for different demographic groups, and we will keep doing so for the entire analysis phase. The post-analysis phase will consist of being transparent about the findings and discussing how biases may have influenced the results. Throughout the process, our group intends to check in regularly with the data to detect biases and pivot our direction if needed. To further reduce self-report bias, we will rely on objective data, such as the number of hours someone slept and the number of steps they accumulated throughout the day.
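Checking whether the data accounts for different demographic groups can start with a simple group summary. The sketch below uses the first six rows of the cleaned data (sex recoded so that 0 = male and 1 = female) as a stand-in for the full cleaned `df`. Applied to all 33 participants, the same `groupby` would show whether either sex or any age range is underrepresented.

```python
import pandas as pd

# First six participants from the cleaned data preview; in the notebook, use the full cleaned df.
sample = pd.DataFrame({
    "sex": [1, 0, 1, 0, 0, 1],  # 0 = male, 1 = female
    "age": [20, 23, 33, 23, 20, 34],
    "avg_steps": [13534, 16091, 7823, 10613, 4369, 8850],
})

# Group size plus mean age and mean daily steps for each sex.
balance = sample.groupby("sex").agg(
    n=("age", "size"),
    mean_age=("age", "mean"),
    mean_steps=("avg_steps", "mean"),
)
print(balance)
```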

The main objective for detecting bias is staying transparent with not only each other, but with the public. It is important that we are transparent with the limitations our research faces as well as the ways in which our group mitigates them.

# Team Expectations 


Read over the [COGS108 Team Policies](https://github.com/COGS108/Projects/blob/master/COGS108_TeamPolicies.md) individually. Then, include your group’s expectations of one another for successful completion of your COGS108 project below. Discuss and agree on what all of your expectations are. Discuss how your team will communicate throughout the quarter and consider how you will communicate respectfully should conflicts arise. By including each member’s name above and by adding their name to the submission, you are indicating that you have read the COGS108 Team Policies, accept your team’s expectations below, and have every intention to fulfill them. These expectations are for your team’s use and benefit — they won’t be graded for their details.

* *Handle our dedicated sections by the assigned deadline*
* Communicate transparently by being open and honest when deadlines cannot be met and by holding each other accountable
* *Make time for regularly scheduled meetings and if any time conflicts come up make sure to communicate with other members.*


* Conflict Resolution Plan:
    * In case of conflict we will...
        * Have a Zoom meeting discussing any issues that we have
        * Discuss how to prevent the issue from recurring
        * Open the conversation to anyone else who has any problems that we need to address
        * Have empathy and understand that other members may be going through things as well

# Project Timeline Proposal

Specify your team's specific project timeline. An example timeline has been provided. Change the dates, times, names, and details to fit your group's plan.

If you think you will need any special resources or training outside what we have covered in COGS 108 to solve your problem, then your proposal should state these clearly. For example, if you have selected a problem that involves implementing multiple neural networks, please state this so we can make sure you know what you’re doing and so we can point you to resources you will need to implement your project. Note that you are not required to use outside methods.



| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 10/29  |  8 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 11/5  |  8 PM |  Background research & collected datasets | Wrangle data, assign parts | 
| 11/12  | 8 PM  | Finalize datasets with good variables  | Review datasets   |
| 11/19  | 8 PM  | Create graphs for datasets | Discuss progress on graphs   |
| 11/26  | 8 PM  | Start analysis & begin conclusion and editing | Discuss progress|
| 12/3  | 8 PM  | Wrap up project| Turn in final project and group project surveys |