### **TODO and Notes**

 - [x] Make dataframes
 - [x] Convert datetimes
 - [x] Rename date columns
 - [x] Find NaNs
 - [x] Re-freq and fill blanks
 - [x] Turn logs into daily totals
 - [x] Save backup CSVs
 - [x] Combine dataframe
 
 **Notes**
Sample for iterating through different offsets 
```python
df["Input"].corr(df["Output"].shift(-1), method = 'pearson', min_periods = 1) #1
```
And iterating over a range of lags:
```python
xcov_monthly = [crosscorr(datax, datay, lag=i) for i in range(12)]
```
from [here](https://stackoverflow.com/questions/33171413/cross-correlation-time-lag-correlation-with-pandas)
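
The `crosscorr` helper called above is not defined anywhere in this notebook. A minimal sketch, assuming it simply correlates one series against a shifted copy of the other (the version in the linked answer may differ); the sample series are made up for illustration:

```python
import pandas as pd


def crosscorr(datax, datay, lag=0):
    """Pearson correlation between datax and datay shifted by `lag` rows."""
    return datax.corr(datay.shift(lag))


# Made-up data: y is x delayed by two rows
x = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 10.0])
y = x.shift(2)

# Try a range of lags and keep the one with the highest correlation
lags = range(-3, 4)
xcov = {lag: crosscorr(x, y, lag=lag) for lag in lags}
best_lag = max(xcov, key=xcov.get)
```

Scanning both negative and positive lags makes it possible to see which series leads the other.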


## Imports, data, checks

In [1]:
import numpy as np
import requests
import pandas as pd
from urllib.request import urlopen
import json
from bokeh.models import CategoricalColorMapper, NumeralTickFormatter, HoverTool
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar
from bokeh.plotting import output_notebook, figure
from bokeh.io import reset_output, show, output_file
from bokeh.layouts import column, row

The vaccine, cases, and deaths source data were relatively easy to grab directly from the [Larimer county dashboard](https://www.larimer.org/health/communicable-disease/coronavirus-covid-19/larimer-county-positive-covid-19-numbers#/app?tab=risk) because the CSVs download through plain URLs.

In [2]:
larimer_vac_source = pd.read_csv('https://speedtest.larimer.org/covid/index.php?file=vaccinations&csv')

larimer_cases_source = pd.read_csv('https://speedtest.larimer.org/covid/cases.csv', parse_dates=['ReportedDate'])

larimer_deaths_source = pd.read_csv('https://larimer-county-data-lake.s3-us-west-2.amazonaws.com/Public/covid/covid_deaths.csv?t=1631890252549')


The hospitalization data was much trickier, or at least finding a simple solution was. I spent several hours in web-scraping research-and-attempts purgatory. I tried BeautifulSoup, html5lib, lxml, etc. in multiple combinations, and none of them offered a straightforward solution: the hospitalization table is rendered through JavaScript, so there is nothing to scrape without actually clicking the buttons. I started down the Selenium and PhantomJS path, but it seemed like a nightmare. Then I found this lifesaving article at [Towards Data Science](https://towardsdatascience.com/data-science-skills-web-scraping-javascript-using-python-97a29738353f), which shows how to find specific XHR request URLs in the browser developer tools. The URL requested for the rendered table returns pretty vanilla JSON and is not behind any authorization, so there is a clean way to get to it. Praise Satan I didn't have to use Selenium.

In [3]:
url = 'https://larimer-county-data-lake.s3-us-west-2.amazonaws.com/Public/covid/covid_patient_trend.json?t=1632506827395'

response = urlopen(url)
json_data = response.read().decode('utf-8', 'replace')

d = json.loads(json_data)
larimer_hosp_source = pd.json_normalize(d['data'])
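
The same fetch can be wrapped in a small helper that closes the connection and sets a timeout. This is only a sketch: the function names are hypothetical, and the sample payload assumes the JSON has the shape `{"data": [ {...}, ... ]}` with field names taken from the table output further down.

```python
import json
from urllib.request import urlopen

import pandas as pd


def payload_to_frame(payload, key='data'):
    """Flatten the list of records stored under `key` into a dataframe."""
    return pd.json_normalize(payload[key])


def fetch_table(url, key='data', timeout=30):
    """Download a JSON document and return its records as a dataframe."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload_to_frame(payload, key)


# Offline check with a made-up payload of the assumed shape
sample = {'data': [
    {'Date': '2020-03-31T00:00:00.000Z', 'inpatient_count': 47},
    {'Date': '2020-04-01T00:00:00.000Z', 'inpatient_count': 46},
]}
sample_frame = payload_to_frame(sample)
```

Splitting parsing from downloading means the parsing step can be checked without touching the network.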

In [4]:
# make .csv backups of source data

larimer_vac_source.to_csv('larimer_vac_backup.csv')

larimer_cases_source.to_csv('larimer_cases_backup.csv')

larimer_deaths_source.to_csv('larimer_deaths_backup.csv')

larimer_hosp_source.to_csv('larimer_hosp_backup.csv')

Re-read the backup CSVs so that the notebook runs locally from this point forward.

In [5]:
larimer_vac = pd.read_csv('larimer_vac_backup.csv')

larimer_cases = pd.read_csv('larimer_cases_backup.csv')

larimer_deaths = pd.read_csv('larimer_deaths_backup.csv')

larimer_hosp = pd.read_csv('larimer_hosp_backup.csv')

So now we have all of our dataframes.

In [6]:
display(larimer_vac)

display(larimer_cases)

display(larimer_deaths)

display(larimer_hosp)

Unnamed: 0.1,Unnamed: 0,Date,daily number of doses received by Larimer County residents,total number of doses recevied by residents,daily number of residents receiving first dose,total number of residents receiving first dose,daily number of residents vaccinated,total number of residents vaccinated,daily number of 70+ vaccinated,total number of 70+ vaccinated,...,daily number of Latinx residents vaccinated,total of Latinx residents vaccinated,daily number of White non-Latinx residents vaccinated,total of White non-Latinx residents vaccinated,daily number of non-White non-Latinx residents vaccinated,total of non-White non-Latinx residents vaccinated,dailyUnknown,totalUnknown,daily_additional_doses,total_additional_doses
0,0,12/14/2020,32,32,32,32,1,1,0.0,0,...,0.0,0,1,1,0.0,0,,0,0,0
1,1,12/15/2020,15,47,15,47,1,2,,0,...,,0,1,2,0.0,0,,0,0,0
2,2,12/16/2020,309,356,309,356,0,2,0.0,0,...,0.0,0,0,2,0.0,0,0.0,0,0,0
3,3,12/17/2020,996,1352,996,1352,0,2,0.0,0,...,0.0,0,0,2,0.0,0,0.0,0,0,0
4,4,12/18/2020,1052,2404,1052,2404,2,4,0.0,0,...,0.0,0,2,4,0.0,0,0.0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
367,367,12/16/2021,1233,553325,188,246471,144,227621,5.0,34795,...,10.0,12679,116,192706,13.0,13400,5.0,8836,915,96836
368,368,12/17/2021,1841,555166,268,246739,410,228031,5.0,34800,...,26.0,12705,340,193046,33.0,13433,11.0,8847,1174,98010
369,369,12/18/2021,1555,556721,131,246870,187,228218,4.0,34804,...,13.0,12718,158,193204,8.0,13441,8.0,8855,1238,99248
370,370,12/19/2021,415,557136,41,246911,77,228295,1.0,34805,...,5.0,12723,64,193268,2.0,13443,6.0,8861,297,99545


Unnamed: 0.1,Unnamed: 0,CaseCount,ReportedDate,Sex,Age,Type,City
0,0,1,2020-03-09,Female,52.0,Confirmed,Johnstown
1,1,2,2020-03-15,Male,49.0,Confirmed,Fort Collins
2,2,3,2020-03-17,Female,53.0,Confirmed,Fort Collins
3,3,4,2020-03-17,Female,94.0,Confirmed,Loveland
4,4,5,2020-03-18,Male,49.0,Confirmed,Fort Collins
...,...,...,...,...,...,...,...
48896,48896,50205,2021-12-23,Female,64.0,Confirmed,LaPorte
48897,48897,50206,2021-12-23,Male,65.0,Confirmed,Fort Collins
48898,48898,50207,2021-12-23,Unknown,69.0,Probable,Fort Collins
48899,48899,50208,2021-12-23,Female,71.0,Confirmed,Fort Collins


Unnamed: 0.1,Unnamed: 0,death_id,death_date,age,gender,city,case_status,count
0,0,a0U5w00000edbfjEAA,2020-03-09,91,Female,Loveland,Probable,1
1,1,a0U5w00000edbfiEAA,2020-03-13,95,Female,Loveland,Probable,2
2,2,a0U5w00000edbfOEAQ,2020-03-15,90,Female,Loveland,Probable,3
3,3,a0U5w00000edbfMEAQ,2020-03-25,74,Female,Loveland,Confirmed,4
4,4,a0U5w00000edbfJEAQ,2020-03-25,87,Female,Fort Collins,Confirmed,5
...,...,...,...,...,...,...,...,...
403,403,a0U5w00000fowFDEAY,2021-12-05,71,Female,Loveland,Confirmed,404
404,404,a0U5w00000foyJzEAI,2021-12-09,80,Female,Loveland,Confirmed,405
405,405,a0U5w00000foyJyEAI,2021-12-10,92,Female,Loveland,Confirmed,406
406,406,a0U5w00000foy2YEAQ,2021-12-15,90,Male,Loveland,Probable,407


Unnamed: 0.1,Unnamed: 0,Date,admission_count,kpi_admits_indicator,inpatient_count,kpi_patient_indicator,inpatient_count_pct_change
0,0,2020-03-31T00:00:00.000Z,,,47,0,
1,1,2020-04-01T00:00:00.000Z,,,46,0,
2,2,2020-04-02T00:00:00.000Z,,,46,0,
3,3,2020-04-03T00:00:00.000Z,2.0,0.0,46,0,
4,4,2020-04-04T00:00:00.000Z,1.0,0.0,42,0,
...,...,...,...,...,...,...,...
431,431,2021-12-16T00:00:00.000Z,,,88,1,15.789474
432,432,2021-12-17T00:00:00.000Z,3.0,0.0,83,1,3.750000
433,433,2021-12-20T00:00:00.000Z,4.0,0.0,72,1,-11.111111
434,434,2021-12-21T00:00:00.000Z,6.0,0.0,68,1,-16.049383


Making CSV backups of the raw data, and then re-reading the DFs from those, means this notebook continues to work from that point forward even if the data sources are disconnected.

This looks like a pretty good start. We'll have to make all the datetimes match. The **hospitalization** and **vaccine** data are daily totals, while the **death** and **case count** data are logs (one row per death or case), so we'll have to do some grouping to get them to match. That will come later.

## Explore, clean, manipulate

In [7]:
dfs = [larimer_vac, larimer_deaths, larimer_cases, larimer_hosp]

def get_obj_col():
    for df in dfs:
        print(list(df.select_dtypes(['object']).columns))

get_obj_col()

['Date']
['death_id', 'death_date', 'gender', 'city', 'case_status']
['ReportedDate', 'Sex', 'Type', 'City']
['Date']


---
I did this and don't like it:
```python

dfs = [larimer_vac, larimer_deaths, larimer_cases, larimer_hosp]
df_names = ['larimer_vac', 'larimer_deaths', 'larimer_cases', 'larimer_hosp']


def get_obj_col():
    # collect the object-dtype column names of each dataframe
    for df in dfs:
        obj_cols.append(list(df.select_dtypes(['object']).columns))

obj_cols = []
get_obj_col()
# pair each dataframe name with its object columns
zipped_list = zip(df_names, obj_cols)
print(tuple(zipped_list))
```
---
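
A possibly tidier variant is to return a dict keyed by dataframe name instead of appending to a global list. This is a sketch, not what the notebook runs; `object_columns` and the tiny sample frames are hypothetical.

```python
import pandas as pd


def object_columns(frames):
    """Map each dataframe name to the list of its object-dtype columns."""
    return {name: list(df.select_dtypes(['object']).columns)
            for name, df in frames.items()}


# Made-up frames; dtype=object is set explicitly so the example
# does not depend on how a given pandas version infers strings
sample_frames = {
    'vac': pd.DataFrame({'Date': pd.Series(['12/14/2020'], dtype=object),
                         'doses': [32]}),
    'deaths': pd.DataFrame({'death_date': pd.Series(['2020-03-09'], dtype=object),
                            'city': pd.Series(['Loveland'], dtype=object),
                            'age': [91]}),
}
obj_cols_by_name = object_columns(sample_frames)
```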

In [8]:
print(larimer_cases.dtypes)
print(larimer_hosp.dtypes)

Unnamed: 0        int64
CaseCount         int64
ReportedDate     object
Sex              object
Age             float64
Type             object
City             object
dtype: object
Unnamed: 0                      int64
Date                           object
admission_count               float64
kpi_admits_indicator          float64
inpatient_count                 int64
kpi_patient_indicator           int64
inpatient_count_pct_change    float64
dtype: object


Convert the date columns in each df to datetimes.

In [9]:
larimer_vac['Date'] = pd.to_datetime(larimer_vac['Date']).dt.tz_localize(None)
larimer_deaths['Date'] = pd.to_datetime(larimer_deaths['death_date']).dt.tz_localize(None)
larimer_cases['Date'] = pd.to_datetime(larimer_cases['ReportedDate']).dt.tz_localize(None)
larimer_hosp['Date'] = pd.to_datetime(larimer_hosp['Date']).dt.tz_localize(None)

```pd.to_datetime``` was sufficient for most of the dfs, but the hospital data was timezone-aware and I wanted all of them to match, so I had to add ```.dt.tz_localize(None)```.
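
A small sketch of what that call does, using two made-up timestamps in the same `...Z` format as the hospital data: parsing gives a UTC-aware column, and `tz_localize(None)` drops the timezone while keeping the clock time.

```python
import pandas as pd

raw = pd.Series(['2020-03-31T00:00:00.000Z', '2020-04-01T00:00:00.000Z'])

# The trailing "Z" makes the parsed column timezone-aware (UTC)
aware = pd.to_datetime(raw)

# Drop the timezone but keep the same clock time
naive = aware.dt.tz_localize(None)
```

On a column that is already timezone-naive, `tz_localize(None)` leaves the values unchanged, which is why it can be applied to all four dataframes.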

In [10]:
def check_date_type():
    for df in dfs:
        print(list(df.select_dtypes(['datetime64']).columns))

check_date_type()

['Date']
['Date']
['Date']
['Date']


In [11]:
# create daily cases from case log
daily_cases = larimer_cases.groupby(['Date']).count().reset_index()

display(daily_cases)
display(daily_cases.dtypes)
print(f"Total case check {daily_cases['CaseCount'].sum()}")
display(daily_cases.describe()) 

Unnamed: 0.1,Date,Unnamed: 0,CaseCount,ReportedDate,Sex,Age,Type,City
0,2020-03-09,1,1,1,1,1,1,1
1,2020-03-15,1,1,1,1,1,1,1
2,2020-03-17,2,2,2,2,2,2,2
3,2020-03-18,1,1,1,1,1,1,1
4,2020-03-19,2,2,2,2,2,2,2
...,...,...,...,...,...,...,...,...
643,2021-12-19,64,64,64,64,64,64,64
644,2021-12-20,105,105,105,105,105,105,105
645,2021-12-21,196,196,196,196,196,196,196
646,2021-12-22,123,123,123,123,123,123,123


Date            datetime64[ns]
Unnamed: 0               int64
CaseCount                int64
ReportedDate             int64
Sex                      int64
Age                      int64
Type                     int64
City                     int64
dtype: object

Total case check 48901


Unnamed: 0.1,Unnamed: 0,CaseCount,ReportedDate,Sex,Age,Type,City
count,648.0,648.0,648.0,648.0,648.0,648.0,648.0
mean,75.464506,75.464506,75.464506,75.464506,75.317901,75.464506,75.464506
std,70.859453,70.859453,70.859453,70.859453,70.681563,70.859453,70.859453
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,17.0,17.0,17.0,17.0,17.0,17.0,17.0
50%,58.0,58.0,58.0,58.0,58.0,58.0,58.0
75%,112.25,112.25,112.25,112.25,112.25,112.25,112.25
max,341.0,341.0,341.0,341.0,336.0,341.0,341.0
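
Note that `groupby(...).count()` counts non-null values per column, which is why every column above repeats the same number and `Age` comes out slightly lower. If only the number of rows per day is needed, `size()` is one alternative. A sketch on a made-up case log:

```python
import pandas as pd

case_log = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-17', '2020-03-17', '2020-03-18']),
    'Age': [53.0, None, 49.0],  # one missing age on 2020-03-17
})

# count() skips nulls, so Age undercounts the day with a missing value
per_column = case_log.groupby('Date').count()

# size() counts rows per group regardless of nulls
daily = case_log.groupby('Date').size().rename('Daily Cases')
```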


In [12]:
# create daily deaths from death log
daily_deaths = larimer_deaths.groupby(['Date']).count().reset_index()

display(daily_deaths)
display(daily_deaths.dtypes)
print(f"Total death check {daily_deaths['count'].sum()}")
display(daily_deaths.describe()) 

Unnamed: 0.1,Date,Unnamed: 0,death_id,death_date,age,gender,city,case_status,count
0,2020-03-09,1,1,1,1,1,1,1,1
1,2020-03-13,1,1,1,1,1,1,1,1
2,2020-03-15,1,1,1,1,1,1,1,1
3,2020-03-25,2,2,2,2,2,2,2,2
4,2020-03-29,2,2,2,2,2,2,2,2
...,...,...,...,...,...,...,...,...,...
240,2021-12-05,1,1,1,1,1,1,1,1
241,2021-12-09,1,1,1,1,1,1,1,1
242,2021-12-10,1,1,1,1,1,1,1,1
243,2021-12-15,1,1,1,1,1,1,1,1


Date           datetime64[ns]
Unnamed: 0              int64
death_id                int64
death_date              int64
age                     int64
gender                  int64
city                    int64
case_status             int64
count                   int64
dtype: object

Total death check 408


Unnamed: 0.1,Unnamed: 0,death_id,death_date,age,gender,city,case_status,count
count,245.0,245.0,245.0,245.0,245.0,245.0,245.0,245.0
mean,1.665306,1.665306,1.665306,1.665306,1.665306,1.665306,1.665306,1.665306
std,1.087475,1.087475,1.087475,1.087475,1.087475,1.087475,1.087475,1.087475
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
50%,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
75%,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0
max,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0


In [13]:
daily_cases.set_index('Date', inplace=True)

daily_deaths.set_index('Date', inplace=True)

larimer_vac.set_index('Date', inplace=True)

larimer_hosp.set_index('Date', inplace=True)

In [14]:
# daily_cases.index = pd.to_datetime(daily_cases.index)
# daily_cases = daily_cases.resample("1D").mean()
# daily_cases


**Try this**

```python
x.dt = pd.to_datetime(x.dt)
```
One-liner using mostly @ayhan's ideas while incorporating stack/unstack and fill_value

```python
x.set_index(
    ['dt', 'user']
).unstack(
    fill_value=0
).asfreq(
    'D', fill_value=0
).stack().sort_index(level=1).reset_index()
```
**or this might be better**
```python
s.asfreq('D')
```
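
A quick sketch of how `asfreq` behaves on a made-up series with one missing day and one existing NaN. The `fill_value` is applied only to rows that `asfreq` inserts; NaNs that were already present stay as they are.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04'])
s = pd.Series([1.0, np.nan, 3.0], index=idx)

# 2020-01-03 is inserted with 0; the existing NaN on 2020-01-02 is untouched
filled = s.asfreq('D', fill_value=0)
```

Any NaNs left after re-freqing therefore need a separate `fillna` step.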


In [15]:
larimer_hosp['admission_count'] = larimer_hosp['admission_count'].astype("Int64")
larimer_hosp

Unnamed: 0_level_0,Unnamed: 0,admission_count,kpi_admits_indicator,inpatient_count,kpi_patient_indicator,inpatient_count_pct_change
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2020-03-31,0,,,47,0,
2020-04-01,1,,,46,0,
2020-04-02,2,,,46,0,
2020-04-03,3,2,0.0,46,0,
2020-04-04,4,1,0.0,42,0,
...,...,...,...,...,...,...
2021-12-16,431,,,88,1,15.789474
2021-12-17,432,3,0.0,83,1,3.750000
2021-12-20,433,4,0.0,72,1,-11.111111
2021-12-21,434,6,0.0,68,1,-16.049383


In [16]:
larimer_hosp[larimer_hosp.index.duplicated()]

Unnamed: 0_level_0,Unnamed: 0,admission_count,kpi_admits_indicator,inpatient_count,kpi_patient_indicator,inpatient_count_pct_change
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2021-12-15,428,6,0.0,81,1,5.194805
2021-12-15,429,6,0.0,81,1,8.0
2021-12-15,430,6,0.0,81,1,5.194805


This weird 'Unnamed: 0' column appeared when I switched to using dfs from the backup CSVs, so I had to drop it in place to make the following duplicate drops work.
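
The extra column most likely comes from `to_csv` writing the index by default, which `read_csv` then loads back as an unnamed first column. A sketch of the round trip on a made-up frame, with `index=False` as one way to avoid it:

```python
import io

import pandas as pd

df = pd.DataFrame({'inpatient_count': [47, 46]})

# Default: the index is written as a first column with an empty header
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False leaves the index out of the file
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
```

Passing `index_col=0` to `read_csv` is another option when the file has already been written with the index.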


In [17]:
larimer_hosp.drop(['Unnamed: 0'], axis = 1, inplace=True)

In [18]:
larimer_hosp.drop_duplicates(keep=False,inplace = True)

In [19]:
larimer_hosp[larimer_hosp.index.duplicated()]

Unnamed: 0_level_0,admission_count,kpi_admits_indicator,inpatient_count,kpi_patient_indicator,inpatient_count_pct_change
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
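
One caveat worth checking: `drop_duplicates` compares column values only and ignores the index, and `keep=False` discards every copy. With `Date` as the index, rows from different dates that happen to share identical values are dropped too, which may be why 2020-04-01 and 2020-04-02 show an `inpatient_count` of 0 in the filled table further down. A sketch of the difference on made-up rows, with an index-based alternative:

```python
import pandas as pd

idx = pd.to_datetime(['2020-04-01', '2020-04-02', '2021-12-15', '2021-12-15'])
hosp = pd.DataFrame({'inpatient_count': [46, 46, 81, 81],
                     'pct_change': [0.0, 0.0, 5.0, 8.0]}, index=idx)
hosp.index.name = 'Date'

# Value-based: both April rows vanish, both 2021-12-15 rows survive
dropped_all = hosp.drop_duplicates(keep=False)

# Index-based alternative: keep the first row seen for each date
first_per_date = hosp[~hosp.index.duplicated(keep='first')]
```

Which row to keep for a repeated date is a judgment call; `keep='last'` is equally easy if the later record is the corrected one.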


In [20]:
daily_cases_filled = daily_cases.asfreq('D',fill_value=0)
daily_deaths_filled = daily_deaths.asfreq('D',fill_value=0)
larimer_vac_filled = larimer_vac.asfreq('D',fill_value=0)
larimer_hosp_filled = larimer_hosp.asfreq('D',fill_value=0)



## Quantify missing data

In [21]:
print(daily_cases_filled.isna().sum().sum())
print(daily_deaths_filled.isna().sum().sum())
print(larimer_vac_filled.isna().sum().sum())
print(larimer_hosp_filled.isna().sum().sum())


0
0
16
32


In [22]:
larimer_hosp_filled = larimer_hosp_filled.fillna(0)
larimer_vac_filled = larimer_vac_filled.fillna(0)

In [23]:
print(daily_cases_filled.isna().sum().sum())
print(daily_deaths_filled.isna().sum().sum())
print(larimer_vac_filled.isna().sum().sum())
print(larimer_hosp_filled.isna().sum().sum())


0
0
0
0


In [24]:
display(daily_cases_filled)
display(daily_deaths_filled)
display(larimer_vac_filled)
display(larimer_hosp_filled)

Unnamed: 0_level_0,Unnamed: 0,CaseCount,ReportedDate,Sex,Age,Type,City
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2020-03-09,1,1,1,1,1,1,1
2020-03-10,0,0,0,0,0,0,0
2020-03-11,0,0,0,0,0,0,0
2020-03-12,0,0,0,0,0,0,0
2020-03-13,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...
2021-12-19,64,64,64,64,64,64,64
2021-12-20,105,105,105,105,105,105,105
2021-12-21,196,196,196,196,196,196,196
2021-12-22,123,123,123,123,123,123,123


Unnamed: 0_level_0,Unnamed: 0,death_id,death_date,age,gender,city,case_status,count
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2020-03-09,1,1,1,1,1,1,1,1
2020-03-10,0,0,0,0,0,0,0,0
2020-03-11,0,0,0,0,0,0,0,0
2020-03-12,0,0,0,0,0,0,0,0
2020-03-13,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...
2021-12-12,0,0,0,0,0,0,0,0
2021-12-13,0,0,0,0,0,0,0,0
2021-12-14,0,0,0,0,0,0,0,0
2021-12-15,1,1,1,1,1,1,1,1


Unnamed: 0_level_0,Unnamed: 0,daily number of doses received by Larimer County residents,total number of doses recevied by residents,daily number of residents receiving first dose,total number of residents receiving first dose,daily number of residents vaccinated,total number of residents vaccinated,daily number of 70+ vaccinated,total number of 70+ vaccinated,daily number of 70+ at least one dose,...,daily number of Latinx residents vaccinated,total of Latinx residents vaccinated,daily number of White non-Latinx residents vaccinated,total of White non-Latinx residents vaccinated,daily number of non-White non-Latinx residents vaccinated,total of non-White non-Latinx residents vaccinated,dailyUnknown,totalUnknown,daily_additional_doses,total_additional_doses
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2020-12-14,0,32,32,32,32,1,1,0.0,0,1.0,...,0.0,0,1,1,0.0,0,0.0,0,0,0
2020-12-15,1,15,47,15,47,1,2,0.0,0,0.0,...,0.0,0,1,2,0.0,0,0.0,0,0,0
2020-12-16,2,309,356,309,356,0,2,0.0,0,2.0,...,0.0,0,0,2,0.0,0,0.0,0,0,0
2020-12-17,3,996,1352,996,1352,0,2,0.0,0,11.0,...,0.0,0,0,2,0.0,0,0.0,0,0,0
2020-12-18,4,1052,2404,1052,2404,2,4,0.0,0,15.0,...,0.0,0,2,4,0.0,0,0.0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-12-16,367,1233,553325,188,246471,144,227621,5.0,34795,6.0,...,10.0,12679,116,192706,13.0,13400,5.0,8836,915,96836
2021-12-17,368,1841,555166,268,246739,410,228031,5.0,34800,10.0,...,26.0,12705,340,193046,33.0,13433,11.0,8847,1174,98010
2021-12-18,369,1555,556721,131,246870,187,228218,4.0,34804,4.0,...,13.0,12718,158,193204,8.0,13441,8.0,8855,1238,99248
2021-12-19,370,415,557136,41,246911,77,228295,1.0,34805,3.0,...,5.0,12723,64,193268,2.0,13443,6.0,8861,297,99545


Unnamed: 0_level_0,admission_count,kpi_admits_indicator,inpatient_count,kpi_patient_indicator,inpatient_count_pct_change
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2020-03-31,0,0.0,47,0,0.000000
2020-04-01,0,0.0,0,0,0.000000
2020-04-02,0,0.0,0,0,0.000000
2020-04-03,2,0.0,46,0,0.000000
2020-04-04,1,0.0,42,0,0.000000
...,...,...,...,...,...
2021-12-18,0,0.0,0,0,0.000000
2021-12-19,0,0.0,0,0,0.000000
2021-12-20,4,0.0,72,1,-11.111111
2021-12-21,6,0.0,68,1,-16.049383


In [25]:
display(len(larimer_vac_filled))
display(len(larimer_hosp_filled))
display(len(daily_cases_filled))
display(len(daily_deaths_filled))


372

632

655

648

In [26]:
# valid_entries = larimer_vac.count()
# total_rows = len(larimer_vac.index)
# missing_data = total_rows - valid_entries
# missing_data

```python
pd.merge_ordered(df1,
                 df2,
                 fill_method="ffill",
                 on='column',
                 how='outer')
```

- [x] Experimenting with merging on 'Date' column but it's been put back as an int instead of a datetime so may need to re-type that in all the DFs
- [x] Need to rename the date column in one of the frames so they can all be merged
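
A sketch of chaining ordered outer merges on a shared `Date` column, assuming each frame is first trimmed to its date plus one count column. The `daily` helper, frame names, and values are made up for illustration.

```python
from functools import reduce

import pandas as pd


def daily(name, dates, values):
    """Build a two-column frame: Date plus one named count column."""
    return pd.DataFrame({'Date': pd.to_datetime(dates), name: values})


cases = daily('Daily Cases', ['2020-03-09', '2020-03-10'], [1, 0])
deaths = daily('Daily Death Count', ['2020-03-09'], [1])
hosp = daily('Daily Hospitalizations', ['2020-03-10', '2020-03-11'], [2, 3])

# Outer merge keeps every date; dates missing from a frame become NaN
combined = reduce(
    lambda left, right: pd.merge_ordered(left, right, on='Date', how='outer'),
    [cases, deaths, hosp],
)
```

Trimming before merging avoids carrying along suffixed leftovers such as `Unnamed: 0_x` and `Unnamed: 0_y`.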

In [27]:
# daily_cases_filled['Date'] = pd.to_datetime(daily_cases_filled['Date']).dt.tz_localize(None)
# daily_deaths_filled['Date'] = pd.to_datetime(daily_deaths_filled['Date']).dt.tz_localize(None)
# larimer_hosp_filled['Date'] = pd.to_datetime(larimer_hosp_filled['Date']).dt.tz_localize(None)
# larimer_vac_filled['Date'] = pd.to_datetime(larimer_vac_filled['Date']).dt.tz_localize(None)

In [28]:
death_case = pd.merge_ordered(
    daily_deaths_filled,
    daily_cases_filled,
    fill_method=None,
    on='Date',
    how='outer')

death_case

Unnamed: 0,Date,Unnamed: 0_x,death_id,death_date,age,gender,city,case_status,count,Unnamed: 0_y,CaseCount,ReportedDate,Sex,Age,Type,City
0,2020-03-09,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,1,1,1,1,1,1
1,2020-03-10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0
2,2020-03-11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0
3,2020-03-12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0
4,2020-03-13,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
650,2021-12-19,,,,,,,,,64,64,64,64,64,64,64
651,2021-12-20,,,,,,,,,105,105,105,105,105,105,105
652,2021-12-21,,,,,,,,,196,196,196,196,196,196,196
653,2021-12-22,,,,,,,,,123,123,123,123,123,123,123


In [29]:
death_case_hosp = pd.merge_ordered(
    death_case,
    larimer_hosp_filled,
    fill_method=None,
    on='Date',
    how='outer')

death_case_hosp

Unnamed: 0,Date,Unnamed: 0_x,death_id,death_date,age,gender,city,case_status,count,Unnamed: 0_y,...,ReportedDate,Sex,Age,Type,City,admission_count,kpi_admits_indicator,inpatient_count,kpi_patient_indicator,inpatient_count_pct_change
0,2020-03-09,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,...,1,1,1,1,1,,,,,
1,2020-03-10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,...,0,0,0,0,0,,,,,
2,2020-03-11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,...,0,0,0,0,0,,,,,
3,2020-03-12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,...,0,0,0,0,0,,,,,
4,2020-03-13,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0,...,0,0,0,0,0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
650,2021-12-19,,,,,,,,,64,...,64,64,64,64,64,0,0.0,0.0,0.0,0.000000
651,2021-12-20,,,,,,,,,105,...,105,105,105,105,105,4,0.0,72.0,1.0,-11.111111
652,2021-12-21,,,,,,,,,196,...,196,196,196,196,196,6,0.0,68.0,1.0,-16.049383
653,2021-12-22,,,,,,,,,123,...,123,123,123,123,123,4,0.0,66.0,1.0,-18.518519


In [30]:
combo_df = pd.merge_ordered(
    death_case_hosp,
    larimer_vac_filled,
    fill_method=None,
    on='Date',
    how='outer')

combo_df

Unnamed: 0,Date,Unnamed: 0_x,death_id,death_date,age,gender,city,case_status,count,Unnamed: 0_y,...,daily number of Latinx residents vaccinated,total of Latinx residents vaccinated,daily number of White non-Latinx residents vaccinated,total of White non-Latinx residents vaccinated,daily number of non-White non-Latinx residents vaccinated,total of non-White non-Latinx residents vaccinated,dailyUnknown,totalUnknown,daily_additional_doses,total_additional_doses
0,2020-03-09,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,...,,,,,,,,,,
1,2020-03-10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,...,,,,,,,,,,
2,2020-03-11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,...,,,,,,,,,,
3,2020-03-12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,...,,,,,,,,,,
4,2020-03-13,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
650,2021-12-19,,,,,,,,,64,...,5.0,12723.0,64.0,193268.0,2.0,13443.0,6.0,8861.0,297.0,99545.0
651,2021-12-20,,,,,,,,,105,...,17.0,12740.0,72.0,193340.0,6.0,13449.0,3.0,8864.0,466.0,100011.0
652,2021-12-21,,,,,,,,,196,...,,,,,,,,,,
653,2021-12-22,,,,,,,,,123,...,,,,,,,,,,


In [31]:
for col in combo_df.columns:
    print(col)

Date
Unnamed: 0_x
death_id
death_date
age
gender
city
case_status
count
Unnamed: 0_y
CaseCount
ReportedDate
Sex
Age
Type
City
admission_count
kpi_admits_indicator
inpatient_count
kpi_patient_indicator
inpatient_count_pct_change
Unnamed: 0
daily number of doses received by Larimer County residents
total number of doses recevied by residents
daily number of residents receiving first dose
total number of residents receiving first dose
daily number of residents vaccinated
total number of residents vaccinated
daily number of 70+ vaccinated
total number of 70+ vaccinated
daily number of 70+ at least one dose
total number of 70+ at least one dose
daily number of Latinx residents vaccinated
total of Latinx residents vaccinated
daily number of White non-Latinx residents vaccinated
total of White non-Latinx residents vaccinated
daily number of non-White non-Latinx residents vaccinated
total of non-White non-Latinx residents vaccinated
dailyUnknown
totalUnknown
daily_additional_doses
total_additi

In [32]:
combo_df.rename(columns = {'count':'Daily Death Count',
                           'daily number of doses received by Larimer County residents':'Daily doses',
                           'CaseCount':'Daily Cases',
                           'admission_count':'Daily Hospitalizations'
                          }, inplace = True)

In [33]:
combo_df[['Date','Daily doses','Daily Cases','Daily Hospitalizations','Daily Death Count']]

Unnamed: 0,Date,Daily doses,Daily Cases,Daily Hospitalizations,Daily Death Count
0,2020-03-09,,1,,1.0
1,2020-03-10,,0,,0.0
2,2020-03-11,,0,,0.0
3,2020-03-12,,0,,0.0
4,2020-03-13,,0,,1.0
...,...,...,...,...,...
650,2021-12-19,415.0,64,0,
651,2021-12-20,635.0,105,4,
652,2021-12-21,,196,6,
653,2021-12-22,,123,4,


In [34]:
print(combo_df.isna().sum().sum())

6114


In [35]:
combo_df = combo_df.fillna(0)
print(combo_df.isna().sum().sum())

0


In [36]:
combo_df[['Date','Daily doses','Daily Cases','Daily Hospitalizations','Daily Death Count']]

Unnamed: 0,Date,Daily doses,Daily Cases,Daily Hospitalizations,Daily Death Count
0,2020-03-09,0.0,1,0,1.0
1,2020-03-10,0.0,0,0,0.0
2,2020-03-11,0.0,0,0,0.0
3,2020-03-12,0.0,0,0,0.0
4,2020-03-13,0.0,0,0,1.0
...,...,...,...,...,...
650,2021-12-19,415.0,64,0,0.0
651,2021-12-20,635.0,105,4,0.0
652,2021-12-21,0.0,196,6,0.0
653,2021-12-22,0.0,123,4,0.0


# BOOKMARK


- [ ] Get overall plot layout
- [ ] Make plots

```python
show(row(column(fig1, fig2), column(fig3)))
```


## Visualize

In [37]:
lar_vac_data = ColumnDataSource(larimer_vac)

reset_output()
output_notebook()

# x = lar_vac_data['Date']
# y = lar_vac_data('daily number of doses received by Larimer County residents')

daily_vac_figure = figure(title='Daily Vaccinations',
                          x_axis_type="datetime")

daily_vac_figure.vbar(x='Date',
                      top='daily number of doses received by Larimer County residents',
                      source=lar_vac_data)



show(daily_vac_figure)

In [38]:
lar_vac_data = ColumnDataSource(larimer_vac)

reset_output()
output_notebook()

# x = lar_vac_data['Date']
# y = lar_vac_data('daily number of doses received by Larimer County residents')

daily_vac_figure = figure(title='Daily Vaccinations',
                         x_axis_type="datetime")

daily_vac_figure.line(x='Date',
                      y='daily number of doses received by Larimer County residents',
                      source=lar_vac_data)



show(daily_vac_figure)

In [39]:
combo_data = ColumnDataSource(combo_df)

reset_output()
output_notebook()


combo_data_figure = figure(title='Daily Cases',
                           x_axis_type="datetime")

combo_data_figure.vbar(x='Date',
                       top='Daily Cases',
                       source=combo_data)



show(combo_data_figure)
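
On a datetime axis Bokeh measures bar widths in milliseconds, so it can help to set `width` explicitly on `vbar`. A sketch with made-up data, assuming bars slightly narrower than one day are wanted:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({'Date': pd.date_range('2021-12-19', periods=4, freq='D'),
                   'Daily Cases': [64, 105, 196, 123]})
source = ColumnDataSource(df)

ONE_DAY_MS = 24 * 60 * 60 * 1000  # datetime axes are in milliseconds

fig = figure(title='Daily Cases', x_axis_type='datetime')
bars = fig.vbar(x='Date',
                top='Daily Cases',
                width=0.9 * ONE_DAY_MS,  # leave a small gap between days
                source=source)
```

Calling `show(fig)` after `output_notebook()` renders it inline, as in the cells above.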