# Sleep and COVID
*Sleep is fundamental to immune function, so we might expect sleep quality to decline before the onset of symptoms. Alternatively, it might decline after onset if COVID itself disrupts sleep.*

In [42]:
import os 
import pandas as pd 
import matplotlib.pyplot as plot
from IPython.display import display
from datetime import datetime
import math

In [16]:
!git clone https://github.com/Welltory/hrv-covid19.git

fatal: destination path 'hrv-covid19' already exists and is not an empty directory.


In [17]:
!ls hrv-covid19/data

blood_pressure.csv     participants.csv       surveys.csv
heart_rate.csv         scales_description.csv wearables.csv
hrv_measurements.csv   sleep.csv              weather.csv


In [18]:
hrv_data_dir = "hrv-covid19/data" 

dataset = []
files = os.listdir(path=hrv_data_dir)
for file in files: 
    dataset.append({ "name": file.split('.')[0],  "file": file, "path": hrv_data_dir + '/' + file })

In [19]:
dfs = {}
for d in dataset: 
    print(f"Loading {d['name']} ({d['file']}) into [{d['name']}] dataframe ") 
    dfs[d['name']] = pd.read_csv(d['path'])

Loading scales_description (scales_description.csv) into [scales_description] dataframe 
Loading participants (participants.csv) into [participants] dataframe 
Loading wearables (wearables.csv) into [wearables] dataframe 
Loading blood_pressure (blood_pressure.csv) into [blood_pressure] dataframe 
Loading surveys (surveys.csv) into [surveys] dataframe 
Loading heart_rate (heart_rate.csv) into [heart_rate] dataframe 
Loading weather (weather.csv) into [weather] dataframe 
Loading hrv_measurements (hrv_measurements.csv) into [hrv_measurements] dataframe 
Loading sleep (sleep.csv) into [sleep] dataframe 


## Investigate Sleep dataset 

In [38]:
dfsleep = dfs['sleep']
display(dfsleep)

Unnamed: 0,user_code,day,sleep_begin,sleep_end,sleep_duration,sleep_awake_duration,sleep_rem_duration,sleep_light_duration,sleep_deep_duration,pulse_min,pulse_max,pulse_average
0,0d297d2410,2019-12-31,2019-12-31 07:50:32,2019-12-31 08:45:22,3290.0,,,,,,,
1,0d297d2410,2020-01-01,2020-01-01 04:13:41,2020-01-01 09:45:02,19881.0,,,,,,,
2,0d297d2410,2020-01-02,2020-01-02 02:14:52,2020-01-02 08:06:00,21068.0,,,,,,,
3,0d297d2410,2020-01-03,2020-01-03 00:10:00,2020-01-03 08:45:10,30910.0,,,,,,,
4,0d297d2410,2020-01-04,2020-01-04 01:27:25,2020-01-04 08:52:20,26695.0,,,21480.0,,55.0,95.0,72.5
...,...,...,...,...,...,...,...,...,...,...,...,...
420,fcf3ea75b0,2020-04-22,2020-04-22 00:23:22,2020-04-22 07:17:23,24841.0,,,,,,,
421,fcf3ea75b0,2020-04-23,2020-04-22 22:40:51,2020-04-23 07:04:35,30224.0,,,,,,,
422,fcf3ea75b0,2020-05-06,2020-05-05 21:48:53,2020-05-06 08:02:09,36796.0,,,,,,,
423,fcf3ea75b0,2020-05-06,2020-05-06 00:18:53,2020-05-06 10:32:09,36796.0,,,,,,,


In [23]:
print(len(dfsleep))

425


425 observations is a reasonable amount, but not all wearables record the same fields.

In [25]:
for attrb in dfsleep.columns:
    filtered_dfsleep = dfsleep[dfsleep[attrb].notna()]
    print(len(filtered_dfsleep), "values for", attrb)

425 values for user_code
425 values for day
425 values for sleep_begin
425 values for sleep_end
425 values for sleep_duration
9 values for sleep_awake_duration
7 values for sleep_rem_duration
27 values for sleep_light_duration
14 values for sleep_deep_duration
15 values for pulse_min
15 values for pulse_max
15 values for pulse_average


I don't think there is enough data to look at any relationship other than sleep duration, which is a shame because the sleep-stage and pulse fields would be very interesting.
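The per-column counts above can also be produced in one step. The following is a minimal sketch on a made-up table (`toy_sleep` and its values are illustrative assumptions, not the Welltory data) that reports the non-null count and the populated share for each column:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sleep table; all values are invented for illustration.
toy_sleep = pd.DataFrame({
    "user_code": ["a", "a", "b"],
    "sleep_duration": [25200.0, 27000.0, 21600.0],
    "sleep_rem_duration": [np.nan, 5400.0, np.nan],
})

# Non-null count per column, and the share of rows that are populated.
coverage = pd.DataFrame({
    "non_null": toy_sleep.notna().sum(),
    "share": toy_sleep.notna().mean(),
})
print(coverage)
```

Applied to `dfsleep`, the same two calls would give the counts listed above alongside the fraction of nights each field covers.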

In [29]:
# count individuals with more than one sleep record (users with a single row are excluded)
has_repeat_records = dfsleep['user_code'].duplicated(keep=False)
users_with_repeats = dfsleep['user_code'][has_repeat_records].unique()
print(len(users_with_repeats))


9


Nine individuals with more than one record is quite a small sample, but we'll have a look anyway.
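Note that filtering with `duplicated(keep=False)` counts only users who appear more than once, which can differ from the total number of distinct users. A small sketch on made-up codes (the series `toy_codes` is hypothetical) shows the two counts side by side:

```python
import pandas as pd

# Hypothetical user codes: "b" appears only once.
toy_codes = pd.Series(["a", "a", "b", "c", "c", "c"], name="user_code")

# All distinct users.
n_all = toy_codes.nunique()

# Only users with at least two rows.
repeat_mask = toy_codes.duplicated(keep=False)
n_repeat = toy_codes[repeat_mask].nunique()

print(n_all, n_repeat)  # prints "3 2"
```

If the total number of participants in the sleep table is what matters, `dfsleep['user_code'].nunique()` would be the direct way to get it.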

In [39]:
# build a table with one row per user and one column per day, holding sleep duration in seconds
# (pivot_table averages the values when a user has several records on the same day)
dfsleep_test = dfsleep[dfsleep['sleep_duration'].notna()]

dfsleep_pivot = dfsleep_test.pivot_table(index='user_code', columns='day', values='sleep_duration')

display(dfsleep_pivot)


day,2019-12-31,2020-01-01,2020-01-02,2020-01-03,2020-01-04,2020-01-05,2020-01-06,2020-01-07,2020-01-08,2020-01-09,...,2020-06-07,2020-06-08,2020-06-09,2020-06-10,2020-06-11,2020-06-12,2020-06-13,2020-06-14,2020-06-16,2020-06-17
user_code,,,,,,,,,,,,,,,,,,,,,
0d297d2410,3290.0,19881.0,21068.0,30910.0,26695.0,31458.0,18480.0,27212.0,26407.0,29214.0,...,,,,,,,,,,
276ab22485,,,,,,,,,,,...,,,,,,,,,,
35c7355282,,19281.0,29108.0,25452.0,14983.0,29408.0,20719.0,21696.0,30990.0,27260.0,...,,,,,,,,,,
4985083f4d,,,,,,,,,,,...,,,,,,,,,,
6be5033971,3600.0,15000.0,33600.0,30300.0,34800.0,37200.0,24600.0,19200.0,29700.0,27000.0,...,29700.0,30600.0,21900.0,30600.0,20400.0,,22800.0,22800.0,30000.0,
9871ee5e7b,,,,,,,,,,,...,,,,,,,,,,
a1c2e6b2eb,18720.0,,24840.0,28260.0,22500.0,22664.0,21660.0,21600.0,24300.0,14700.0,...,28380.0,,27540.0,32040.0,28140.0,20880.0,,12782.0,19800.0,25320.0
c174f32d88,,,,,,,,,,,...,,,,,,,,,,
e8240b51a2,,,,,,,,,,,...,,,,,,,,,,
fcf3ea75b0,,,,,,,,,,,...,,,,,,,,,,
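One detail worth keeping in mind: the raw sleep table shows two rows for `fcf3ea75b0` on 2020-05-06, and `pivot_table` aggregates such duplicates with its default `aggfunc` (the mean). The sketch below uses invented rows (`toy`, `u1`, `u2` are assumptions for illustration) to show how duplicates can be listed and what the pivot does with them:

```python
import pandas as pd

# Invented records: user u1 has two entries for the same day.
toy = pd.DataFrame({
    "user_code": ["u1", "u1", "u2"],
    "day": ["2020-05-06", "2020-05-06", "2020-05-06"],
    "sleep_duration": [30000.0, 36000.0, 28800.0],
})

# List every row that shares a (user, day) pair with another row.
dup_rows = toy[toy.duplicated(subset=["user_code", "day"], keep=False)]
print(dup_rows)

# The default aggregation is the mean, so u1 ends up with 33000.0 seconds.
pivot = toy.pivot_table(index="user_code", columns="day", values="sleep_duration")
print(pivot)
```

Whether averaging is the right choice depends on what the duplicates represent; overlapping records of the same night might instead call for keeping one row or taking the maximum.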


Start by looking at the simple average sleep duration before and after symptom onset (very rudimentary).

In [31]:
dfsymp = dfs['participants']
# drop participants without a symptom onset date
dfsymp = dfsymp[dfsymp['symptoms_onset'].notna()]
display(dfsymp)

Unnamed: 0,user_code,gender,age_range,city,country,height,weight,symptoms_onset
1,013f6d3e5b,f,18-24,São Paulo,Brazil,174.00,77.300,5/15/2020
2,01bad5a519,m,45-54,St Petersburg,Russia,178.00,92.000,4/5/2020
3,0210b20eea,f,25-34,Sochi,Russia,169.00,60.000,5/6/2020
4,024719e7da,f,45-54,St Petersburg,Russia,158.00,68.500,5/27/2020
5,02a2b827c9,m,25-34,,Russia,177.00,87.100,3/27/2020
...,...,...,...,...,...,...,...,...
178,f9edcb7056,f,65-74,Folsom,United States,154.94,130.300,3/16/2020
179,fcf3ea75b0,f,45-54,Moscow,Russia,168.00,92.644,5/1/2020
180,fd387f6269,f,35-44,Attleboro,United States,165.00,115.439,5/1/2020
182,fde84801d8,f,45-54,Tambov,Russia,168.00,79.500,4/16/2020


In [67]:
# Compare user codes as strings without writing back to dfsymp
# (assigning into the filtered frame is what triggers SettingWithCopyWarning)
symptom_codes = dfsymp['user_code'].astype(str)

for user, row in dfsleep_pivot.iterrows():
    user = str(user)

    # Look up this user's symptom onset date, if one was reported
    date_of_symptoms = dfsymp.loc[symptom_codes == user, 'symptoms_onset']
    if len(date_of_symptoms) == 0:
        print("No symptom data for", user)
        continue
    datesymp = pd.to_datetime(date_of_symptoms.iloc[0])

    # Split nightly durations (seconds) into before onset and on-or-after onset
    sleep_dur_pre = []
    sleep_dur_post = []
    for date, sleep_duration in row.items():
        if math.isnan(sleep_duration):
            continue
        date_compa = pd.to_datetime(date)
        if date_compa < datesymp:
            sleep_dur_pre.append(int(sleep_duration))
        else:
            sleep_dur_post.append(int(sleep_duration))

    # Average each period and convert seconds to hours
    if len(sleep_dur_pre) == 0:
        avg_pre = "*No Data*"
    else:
        avg_pre = sum(sleep_dur_pre) / len(sleep_dur_pre) / 3600
    if len(sleep_dur_post) == 0:
        avg_post = "*No Data*"
    else:
        avg_post = sum(sleep_dur_post) / len(sleep_dur_post) / 3600

    print("user:", user, "averaged", avg_pre, "hour before and", avg_post, "hours after")

No symptom data for 0d297d2410
No symptom data for 276ab22485
user: 35c7355282 averaged 7.153638888888889 hour before and 6.785932914046122 hours after
No symptom data for 4985083f4d
user: 6be5033971 averaged 8.06808943089431 hour before and 7.87280701754386 hours after
user: 9871ee5e7b averaged *No Data* hour before and 7.05 hours after
No symptom data for a1c2e6b2eb
user: c174f32d88 averaged 8.183333333333334 hour before and 11.117948717948718 hours after
user: e8240b51a2 averaged 7.8966666666666665 hour before and 8.955555555555556 hours after
user: fcf3ea75b0 averaged 7.647916666666666 hour before and 8.714166666666667 hours after

A quick look at the output above is inconclusive: for some users total sleep appears to increase after onset, perhaps as they try to sleep it off, while for others it decreases. Perhaps we need to limit the analysis to a few days before and a few days after onset.
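The windowing idea can also be sketched in vectorized form. This is a rough illustration on invented data, not the notebook's own method: the frames `toy_sleep` and `toy_onset`, the helper `window_means`, and its `window_days` parameter are all assumptions made for the example. The onset day itself is counted as "post", matching the comparison used in the loops here.

```python
import pandas as pd

# Invented sleep records (seconds) and onset dates for two hypothetical users.
toy_sleep = pd.DataFrame({
    "user_code": ["u1"] * 4 + ["u2"] * 2,
    "day": ["2020-04-01", "2020-04-08", "2020-04-12", "2020-04-30",
            "2020-05-01", "2020-05-03"],
    "sleep_duration": [21600.0, 25200.0, 28800.0, 36000.0, 27000.0, 30600.0],
})
toy_onset = pd.DataFrame({
    "user_code": ["u1", "u2"],
    "symptoms_onset": ["4/10/2020", "5/2/2020"],
})


def window_means(sleep, onset, window_days=7):
    """Mean sleep (hours) per user before and after onset, within +/- window_days."""
    merged = sleep.merge(onset, on="user_code", how="inner")
    day = pd.to_datetime(merged["day"])
    onset_day = pd.to_datetime(merged["symptoms_onset"])
    merged["offset_days"] = (day - onset_day).dt.days

    # Keep only nights within the window around onset.
    in_window = merged[merged["offset_days"].abs() <= window_days].copy()
    in_window["period"] = in_window["offset_days"].map(
        lambda d: "pre" if d < 0 else "post"
    )

    hours = in_window["sleep_duration"] / 3600
    grouped = hours.groupby([in_window["user_code"], in_window["period"]]).mean()
    return grouped.unstack("period")


result = window_means(toy_sleep, toy_onset)
print(result)
```

Users with no nights in one of the two periods would show up as `NaN` in the corresponding column rather than as a text placeholder.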

In [70]:
# Compare user codes as strings without writing back to dfsymp
# (assigning into the filtered frame is what triggers SettingWithCopyWarning)
symptom_codes = dfsymp['user_code'].astype(str)

for user, row in dfsleep_pivot.iterrows():
    user = str(user)

    # Look up this user's symptom onset date, if one was reported
    date_of_symptoms = dfsymp.loc[symptom_codes == user, 'symptoms_onset']
    if len(date_of_symptoms) == 0:
        print("No symptom data for", user)
        continue
    datesymp = pd.to_datetime(date_of_symptoms.iloc[0])

    # Split nightly durations (seconds) into before onset and on-or-after onset,
    # keeping only nights within 7 days of onset
    sleep_dur_pre = []
    sleep_dur_post = []
    for date, sleep_duration in row.items():
        if math.isnan(sleep_duration):
            continue
        date_compa = pd.to_datetime(date)
        if abs(date_compa - datesymp).days > 7:
            continue
        if date_compa < datesymp:
            sleep_dur_pre.append(int(sleep_duration))
        else:
            sleep_dur_post.append(int(sleep_duration))

    # Average each period and convert seconds to hours
    if len(sleep_dur_pre) == 0:
        avg_pre = "*No Data*"
    else:
        avg_pre = sum(sleep_dur_pre) / len(sleep_dur_pre) / 3600
    if len(sleep_dur_post) == 0:
        avg_post = "*No Data*"
    else:
        avg_post = sum(sleep_dur_post) / len(sleep_dur_post) / 3600

    print("user:", user, "averaged", avg_pre, "hour before and", avg_post, "hours after")

No symptom data for 0d297d2410
No symptom data for 276ab22485
user: 35c7355282 averaged 6.261277777777777 hour before and 7.145714285714285 hours after
No symptom data for 4985083f4d
user: 6be5033971 averaged 9.61111111111111 hour before and *No Data* hours after
user: 9871ee5e7b averaged *No Data* hour before and *No Data* hours after
No symptom data for a1c2e6b2eb
user: c174f32d88 averaged *No Data* hour before and *No Data* hours after
user: e8240b51a2 averaged 7.8966666666666665 hour before and 8.955555555555556 hours after
user: fcf3ea75b0 averaged *No Data* hour before and 8.714166666666667 hours after

Data availability has deteriorated to the point that we are left with just two case studies (35c7355282 and e8240b51a2). Both now show longer sleep after symptom onset than before. This could mean that shorter sleep beforehand reduced immune function, or that recovering from COVID increases the need for sleep. Given the sample size, it could just as easily be random.
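To put a rough number on "could be random", here is a small sketch of an exact sign-flip (paired permutation) test on the two users who have both a pre and a post average in the 7-day output. This test is an illustration chosen for this note, not part of the original analysis. With only two pairs there are just four sign patterns, so the smallest possible two-sided p-value is 0.5.

```python
from itertools import product

# 7-day-window averages (hours) printed earlier for the two users with both periods.
pre = [6.261277777777777, 7.8966666666666665]   # 35c7355282, e8240b51a2
post = [7.145714285714285, 8.955555555555556]

diffs = [after - before for before, after in zip(pre, post)]
observed = sum(diffs) / len(diffs)

# Under the null hypothesis each difference is equally likely to be + or -,
# so enumerate every sign pattern and recompute the mean difference.
flipped_means = [
    sum(sign * d for sign, d in zip(signs, diffs)) / len(diffs)
    for signs in product([1, -1], repeat=len(diffs))
]

# Two-sided p-value: share of patterns at least as extreme as what was observed.
extreme = sum(abs(m) >= abs(observed) - 1e-12 for m in flipped_means)
p_value = extreme / len(flipped_means)

print(round(observed, 2), p_value)  # prints "0.97 0.5"
```

An average increase of about an hour looks sizeable, but a p-value of 0.5 means the pattern is entirely compatible with chance.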

The sleep dataset proved to be quite poor. Let's look at something with more data.