# Constructing Virginia COVID Statistic DataFrames 
#### The goal is to connect to the official Virginia Health Department API for current COVID statistics, and to create two dataframes:
1. A dataframe with the total COVID cases, hospitalizations, and deaths by county/city, both overall and per 1000 people
2. A dataframe with the trend of these statistics, defined as the total number in the last two weeks relative to the total number during the two weeks prior to that

First we load the following packages:

In [None]:
import numpy as np
import pandas as pd
import requests
import json
from datetime import timedelta

#The following packages are for making visualizations and dashboards
import plotly.express as px
import dash
from dash import dcc
from dash import html
from dash import dash_table
from dash.dependencies import Input, Output

### Accessing the VDH COVID API
We will be using the Virginia Department of Health's API endpoint for the officially reported numbers of COVID cases, hospitalizations, and deaths for each county and independent city in Virginia. Here is the endpoint, which provides the data in JSON format: https://data.virginia.gov/resource/bre9-aqqr.json

The data are organized such that one record is one locality on one day. The fields within each record are

* `report_date`: Date in YYYY-MM-DDTHH:MM:SS.SSS format

* `fips`: A unique five-digit numeric ID for the locality represented in the record. The first two digits, 51, refer to Virginia, and the last three digits identify the county or independent city

* `locality`: The name of the county or independent city for the record

* `vdh_health_district`: The name of the regional health district (the entity chiefly responsible for keeping statistics and administering COVID vaccines) that the locality is located in

* `total_cases`: the total official number of COVID cases on the record's date since the beginning of data collection on COVID cases (although be careful, "official" does not mean true -- these numbers are almost certainly significant undercounts because many cases went unreported)

* `hospitalizations`: the total number of hospitalizations due to COVID on the record's date since the beginning of data collection

* `deaths`: the total number of deaths due to COVID on the record's date since the beginning of data collection

By default, the API returns only 1000 records per request. However, we can change this by specifying the `$limit` parameter (see the documentation here: https://dev.socrata.com/docs/paging.html). A limit of 200,000 should be more than sufficient for our purposes.

In [None]:
endpoint = 'https://data.virginia.gov/resource/bre9-aqqr.json'
mypars = {'$limit': 200000}  # raise the default cap of 1000 records

# Ask httpbin to echo back the user-agent string that requests sends
r = requests.get('https://httpbin.org/user-agent')
useragent = r.json()['user-agent']

In [None]:
useragent

In [None]:
headers = {'User-agent': useragent}
r = requests.get(endpoint, params=mypars, headers=headers)
r
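
The request above pulls everything in a single call. If the dataset ever outgrew the chosen limit, the Socrata paging parameters `$limit` and `$offset` could be combined in a loop. The following is only a sketch: the helper names `fetch_all_records`, `socrata_page`, and `fake_page` are made up for illustration, and the page size of 50,000 is an arbitrary assumption.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def fetch_all_records(get_page: Callable[[int, int], List[Record]],
                      page_size: int = 50000) -> List[Record]:
    """Request consecutive pages until one comes back shorter than page_size."""
    records: List[Record] = []
    offset = 0
    while True:
        page = get_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return records
        offset += page_size


def socrata_page(limit: int, offset: int) -> List[Record]:
    """Fetch one page from the VDH endpoint (hypothetical helper, needs network access)."""
    import requests  # imported here so the paging loop can be tried offline
    # $order keeps the row order stable between pages
    params = {'$limit': limit, '$offset': offset, '$order': 'report_date, fips'}
    r = requests.get('https://data.virginia.gov/resource/bre9-aqqr.json', params=params)
    r.raise_for_status()
    return r.json()


# Offline demonstration with a stand-in for the API: 7 fake records, 3 per page
fake_data = [{'id': str(i)} for i in range(7)]


def fake_page(limit: int, offset: int) -> List[Record]:
    return fake_data[offset:offset + limit]


all_records = fetch_all_records(fake_page, page_size=3)
```

With network access, `fetch_all_records(socrata_page)` would return a list of dictionaries that can be passed to `pd.json_normalize` in the same way as the parsed response below.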

### A dataframe with the total COVID cases, hospitalizations, and deaths by county/city, both overall and per 1000 people
We parse the JSON output of the API call, and we change the data types of the columns:

In [None]:
cases = pd.json_normalize(json.loads(r.text))
cases

In [None]:
cases['report_date'] = pd.to_datetime(cases['report_date'])
cases['total_cases'] = cases['total_cases'].astype('float')
cases['hospitalizations'] = cases['hospitalizations'].astype('float')
cases['deaths'] = cases['deaths'].astype('float')
cases

Note that the values represent the total counts since the beginning of data collection in March 2020 until the day in question, and not the total number reported on that day. Next we filter the data to just the most current totals:

In [None]:
cases_today = cases.loc[cases['report_date'] == max(cases['report_date'])]
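
Because the counts are cumulative, the number newly reported on a given day is the difference from the previous day's total within the same locality. Here is a minimal sketch on made-up numbers; the `toy` dataframe and the `new_cases` column are hypothetical and not part of the VDH data.

```python
import pandas as pd

# Two localities, three days each, with invented cumulative totals
toy = pd.DataFrame({
    'report_date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'fips': ['51003'] * 3 + ['51540'] * 3,
    'total_cases': [100.0, 104.0, 110.0, 50.0, 50.0, 53.0],
})

# Sort by locality and date, then difference the running total within each locality
toy = toy.sort_values(['fips', 'report_date'])
toy['new_cases'] = toy.groupby('fips')['total_cases'].diff()
```

The first day of each locality has no previous total, so its `new_cases` value is missing.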

To calculate the per-capita counts, we merge the data with the population statistics from the U.S. Census that we used in Module 9:

In [None]:
url = "https://demographics.coopercenter.org/sites/demographics/files/media/files/2020-07/Census_2019_RaceEstimates_forVA_0.xls"
pop = pd.read_excel(url, skiprows=4)
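pop = pop.loc[~pop['FIPS'].isna()].copy() # drop rows without a FIPS code; copy so the assignments below do not trigger a SettingWithCopyWarning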
pop['FIPS'] = pop['FIPS'] + 51000
pop['FIPS'] = pop['FIPS'].astype('int').astype('str')
pop = pop[['FIPS', 'Total Population']]
pop

We merge the population data with the COVID case data and we calculate rates per 1000 people in each city/county:

In [None]:
cases_pop = pd.merge(cases_today, pop, 
                    left_on = ['fips'],
                    right_on = ['FIPS'],
                    how = 'inner')
cases_pop['Cases per 1000 people'] = round(1000*cases_pop['total_cases']/cases_pop['Total Population'],1)
cases_pop['Hospitalizations per 1000 people'] = round(1000*cases_pop['hospitalizations']/cases_pop['Total Population'],1)
cases_pop['Deaths per 1000 people'] = round(1000*cases_pop['deaths']/cases_pop['Total Population'],1)
cases_pop
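
An inner merge silently drops any locality whose FIPS code has no match in the population table. One way to check for that is the `indicator` argument of `pd.merge`. This is a sketch on invented data: the `toy_cases` and `toy_pop` dataframes and the FIPS code 51999 are made up.

```python
import pandas as pd

toy_cases = pd.DataFrame({'fips': ['51001', '51003', '51999'],
                          'total_cases': [8000.0, 20000.0, 10.0]})
toy_pop = pd.DataFrame({'FIPS': ['51001', '51003'],
                        'Total Population': [32000, 109000]})

# A left merge keeps every case record; _merge says whether a population row matched
check = pd.merge(toy_cases, toy_pop, left_on='fips', right_on='FIPS',
                 how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'fips'].tolist()

# Rates can only be computed for the matched rows
matched = check.loc[check['_merge'] == 'both'].copy()
matched['Cases per 1000 people'] = (
    1000 * matched['total_cases'] / matched['Total Population']
).round(1)
```

Note that both key columns are strings here, as they are in the notebook; a string key will not match an integer key.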

### A dataframe with the trend of these statistics, defined as the total number in the last two weeks relative to the total number during the two weeks prior to that
We start by reshaping the data to long format to place cases, hospitalizations, and deaths on separate rows:

In [None]:
cases_trend = cases[['report_date', 'fips', 'locality', 'total_cases', 'hospitalizations', 'deaths']]
cases_trend = pd.melt(cases_trend, id_vars = ['report_date', 'fips', 'locality'],
                     value_vars = ['total_cases', 'hospitalizations', 'deaths'])
cases_trend

To calculate the totals in rolling 14 day windows, we create two additional versions of the date: the date 14 days in the future and the date 28 days in the future. We will use both new dates to merge the dataset to itself. That places the total count on a given day on the same row as the total counts 14 and 28 days ago, and that enables us to create our trend indices:

In [None]:
cases_trend['date14'] = cases_trend['report_date'] + timedelta(14)
cases_trend['date28'] = cases_trend['report_date'] + timedelta(28)
cases_trend

Next we merge on the date 14 days in the future:

In [None]:
cases_trend = pd.merge(cases_trend, cases_trend,
                      right_on = ['report_date', 'fips', 'locality', 'variable'],
                      left_on = ['date14', 'fips', 'locality', 'variable'])

cases_trend = cases_trend.drop(['report_date_x','date14_x','date28_x'], axis=1)
cases_trend = cases_trend.rename({'report_date_y':'report_date',
                                 'date14_y':'date14',
                                 'date28_y':'date28',
                                 'value_y':'value',
                                 'value_x':'value14'}, axis=1)
cases_trend

Then we merge on the date 28 days in the future:

In [None]:
cases_trend = pd.merge(cases_trend, cases_trend,
                      right_on = ['report_date', 'fips', 'locality', 'variable'],
                      left_on = ['date28', 'fips', 'locality', 'variable'])

cases_trend = cases_trend.drop(['report_date_x','date14_y','date28_y', 'value14_x', 'date14_x', 'date28_x'], axis=1)
cases_trend = cases_trend.rename({'report_date_y':'report_date',
                                 'value_y':'value',
                                 'value14_y':'value14',
                                 'value_x':'value28'}, axis=1)
cases_trend
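
For comparison, the same lagged values can be obtained without a self-merge by shifting within each locality and variable. This sketch assumes exactly one row per locality per day with no missing days, which the self-merge above does not require; the `toy` data are invented.

```python
import pandas as pd

# 30 consecutive days for one locality, with a made-up cumulative count
dates = pd.date_range('2021-01-01', periods=30, freq='D')
toy = pd.DataFrame({'report_date': dates,
                    'fips': '51003',
                    'variable': 'total_cases',
                    'value': [10.0 * i for i in range(30)]})

# With one row per day, shifting by 14 rows is the same as looking back 14 days
toy = toy.sort_values(['fips', 'variable', 'report_date'])
grouped = toy.groupby(['fips', 'variable'])['value']
toy['value14'] = grouped.shift(14)
toy['value28'] = grouped.shift(28)

last = toy.iloc[-1]  # most recent day, with its 14- and 28-day lags
```

Unlike the inner self-merge, `shift` keeps the first 28 days of each series and fills their lagged values with missing values.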

Then we create the trend indices. We also create a string version for presentation in a table:

In [None]:
cases_trend['Most recent 14 days'] = cases_trend['value'] - cases_trend['value14']
cases_trend['Previous 14 days'] = cases_trend['value14'] - cases_trend['value28']
cases_trend['Trend'] = 100*(cases_trend['Most recent 14 days'] - cases_trend['Previous 14 days']) / cases_trend['Previous 14 days']
cases_trend['Trend_string'] = round(cases_trend['Trend'], 1).astype('str') + "%"
cases_trend
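
As a worked example of the trend index: if a locality had 1000 total cases 28 days ago, 1100 cases 14 days ago, and 1250 cases today, then the most recent 14 days saw 150 cases, the previous 14 days saw 100, and the trend is a 50% increase. When the previous window has zero new counts, the pandas division above produces `inf` or `NaN` instead of raising an error. The function below is a sketch with a hypothetical name, `trend_percent`, that makes this case explicit.

```python
from typing import Optional


def trend_percent(value: float, value14: float, value28: float) -> Optional[float]:
    """Percent change of the latest 14-day count relative to the 14 days before it."""
    recent = value - value14        # new counts in the most recent 14 days
    previous = value14 - value28    # new counts in the 14 days before that
    if previous == 0:
        return None                 # the trend is undefined without a baseline
    return 100 * (recent - previous) / previous


example = trend_percent(1250, 1100, 1000)  # recent = 150, previous = 100
flat = trend_percent(1250, 1100, 1100)     # previous = 0, so no trend is reported
```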

Next we build three components for the dashboard:

1. Web-enabled table of current rates for one location
2. Interactive line plot of cases for a location over time
3. Interactive per-capita map

In [None]:
url = 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'
r = requests.get(url, headers=headers)
counties = json.loads(r.text)

In [None]:
cases_pop['Cases per 1000 people'].min()

In [None]:
cases_pop['Cases per 1000 people'].max()

In [None]:
cases_pop.columns

In [None]:
fig_map = px.choropleth(cases_pop, geojson=counties,
                        locations = 'fips',
                        color = 'Cases per 1000 people',
                        color_continuous_scale = "reds",
                        range_color = (100, 550),
                        scope = "usa",
                        hover_name = 'locality',
                        hover_data = ['vdh_health_district',
                                      'total_cases',
                                      'hospitalizations',
                                      'deaths',
                                      'Total Population',
                                      'Hospitalizations per 1000 people',
                                      'Deaths per 1000 people'])
fig_map.update_geos(fitbounds='locations')
fig_map.show()
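
The `range_color` bounds in the map are hard-coded after inspecting the minimum and maximum rates. One alternative is to derive them from quantiles of the column, so that a single extreme locality does not wash out the color scale. This sketch uses invented rates and an arbitrary choice of the 5th and 95th percentiles.

```python
import pandas as pd

# Made-up per-1000 case rates standing in for cases_pop['Cases per 1000 people']
rates = pd.Series([120.0, 180.0, 210.0, 250.0, 300.0, 520.0])

# 5th and 95th percentiles as the lower and upper ends of the color scale
low, high = rates.quantile([0.05, 0.95])
color_range = (float(low), float(high))
```

The resulting tuple could be passed as `range_color=color_range` in the call to `px.choropleth`.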

In [None]:
loc = 'Arlington'
ct = cases_trend.query(f"locality == '{loc}' & variable == 'total_cases'")
fig_line = px.line(ct, x = 'report_date', y = 'value',
             labels = {'report_date':'Date',
                      'value':'Total COVID Cases'},
             title = f'Total COVID Cases for {loc}')
fig_line.update(layout=dict(title=dict(x=0.5)))
fig_line.show()

In [None]:
loc = 'Alexandria'
ct = cases_trend.query(f"locality == '{loc}'")
ct = ct.loc[ct['report_date'] == max(ct['report_date'])]
ct = ct[['variable', 'value', 'Most recent 14 days', 'Previous 14 days', 'Trend_string']].copy() # copy so the replacement below modifies an independent dataframe
ct['variable'] = ct['variable'].replace({'total_cases': 'Total Cases',
                                        'hospitalizations': 'Hospitalizations',
                                        'deaths':'Deaths'})
table = ct.rename({'variable':'',
               'value':'Total since March 2020',
               'Trend_string': 'Current Trend'}, axis=1)
table

from plotly import figure_factory as ff
ff.create_table(table)
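
The table cell above hard-codes the locality. Wrapping the same steps in a function makes them reusable, for example from a dashboard callback. The function name `build_trend_table` and the `toy_trend` data are hypothetical; the column names follow the `cases_trend` dataframe built earlier.

```python
import pandas as pd


def build_trend_table(trend: pd.DataFrame, location: str) -> pd.DataFrame:
    """Return the latest totals and trends for one locality, labeled for display."""
    # A boolean mask avoids quoting problems that a query string can run into
    ct = trend.loc[trend['locality'] == location]
    ct = ct.loc[ct['report_date'] == ct['report_date'].max()].copy()
    ct['variable'] = ct['variable'].replace({'total_cases': 'Total Cases',
                                             'hospitalizations': 'Hospitalizations',
                                             'deaths': 'Deaths'})
    ct = ct[['variable', 'value', 'Most recent 14 days', 'Previous 14 days', 'Trend_string']]
    return ct.rename(columns={'variable': '',
                              'value': 'Total since March 2020',
                              'Trend_string': 'Current Trend'})


# Invented rows with the same columns as cases_trend
toy_trend = pd.DataFrame({
    'report_date': pd.to_datetime(['2021-03-01', '2021-03-02', '2021-03-02']),
    'locality': ['Alexandria', 'Alexandria', 'Arlington'],
    'variable': ['deaths', 'deaths', 'total_cases'],
    'value': [10.0, 12.0, 900.0],
    'Most recent 14 days': [3.0, 4.0, 60.0],
    'Previous 14 days': [2.0, 2.0, 40.0],
    'Trend_string': ['50.0%', '100.0%', '50.0%'],
})
toy_table = build_trend_table(toy_trend, 'Alexandria')
```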

## Building the Dashboard

In [None]:
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

In [None]:
# jupyterdash can display the dashboard in the notebook
# regular dash displays in a tab in the browser
# high probability of crashing the notebook and getting an error

# 3 steps to dashboard
# 1. initialize dash
# 2. put stuff on dashboard
# 3. run dashboard

my_markdown_text = '''
The data are current as of 4/26/2023 and are from the Virginia Department of Health.
'''


app = dash.Dash(__name__, external_stylesheets=external_stylesheets) # __name__ is a special variable in python that holds the current module's name ('__main__' when the code is run directly or in a notebook)

# define the layout of the dashboard
app.layout = html.Div([
    html.H1(children='COVID-19 Dashboard'),
    dcc.Markdown(my_markdown_text),
    dcc.Tabs(id='my_tabs', value='statewide', children=[
        dcc.Tab(label='Statewide', value='statewide', children=[
            dcc.Graph(figure=fig_map)
        ]),
        dcc.Tab(label='Local', value='local', children=[
            dcc.Dropdown(id='places',
            options = [{'label':x, 'value':x} for x in sorted(cases['locality'].unique().tolist())],
            value = 'Charlottesville'),
            dcc.Graph(id='lineplot', figure=fig_line),
            #dcc.Graph(id='table', figure=table)
            ])
        ])
    
    ])

# callback step 2: declare which component property the callback reads (Input) and which one it updates (Output)
# @ applies a decorator: a function that takes another function as an argument and wraps it
@app.callback(
    Output(component_id='lineplot', component_property='figure'), # output is a figure
    Input(component_id='places', component_property='value') # input is a value
)
# callback step 3: define the function that takes the input from the dashboard and returns the output back to the callback
def lineplot(location):
    ct = cases_trend.query(f"locality == '{location}' & variable == 'total_cases'")
    fig_line = px.line(ct, x = 'report_date', y = 'value',
             labels = {'report_date':'Date',
                      'value':'Total COVID Cases'},
             title = f'Total COVID Cases for {location}')
    fig_line.update(layout=dict(title=dict(x=0.5)))
    return fig_line

if __name__ == '__main__':
    # newer Dash releases spell this call app.run(...) instead of app.run_server(...)
    app.run_server(debug=True, port=8060, use_reloader=False) # use_reloader=False prevents the notebook from crashing

In [None]:
localities = cases_pop['locality'].unique()
# sort the list
localities = sorted(localities)
localities

In [None]:
[{'label':x, 'value':x} for x in sorted(cases['locality'].unique().tolist())]