# Covid-19 Visualization

## Tools
- HoloViews
- Folium

## Visualizations
- ScatterPlot with Slider
- Dot Map of all data points
- Size Map based on Cases
- Size Map based on Deaths
- Heat Map based on Deaths
- Heat Map based on Cases
- Heat Map based on Deaths Per Cases
- Heat Map based on Deaths | Time Slider Enabled
- Heat Map based on Cases | Time Slider Enabled
- Heat Map based on Deaths Per Cases | Time Slider Enabled

[Kaggle Dataset](https://www.kaggle.com/fireballbyedimyrnmom/us-counties-covid-19-dataset)

In [None]:
import pandas as pd
import holoviews as hv
import numpy as np
import matplotlib.pyplot as plt

import geopy
import folium

import datetime as dt
import math

hv.extension('bokeh')

## Load Dataset as Pandas Dataframe
The dataset is about 1.5 million rows long, covers 3,247 distinct counties, and spans 552 days.
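County names repeat across states (many states have a "Washington" county), so a distinct-county count has to count `(county, state)` pairs rather than county names alone. A small sketch on made-up rows, assuming only the `date`, `county`, and `state` columns of the Kaggle CSV:

```python
import pandas as pd

# Made-up rows in the same layout as us-counties.csv (date, county, state, ...)
toy = pd.DataFrame({
    "date":   ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "county": ["Washington", "Washington", "Washington", "King"],
    "state":  ["Oregon", "Utah", "Oregon", "Washington"],
})

# Counting names alone undercounts: both "Washington" counties collapse into one
n_county_names = toy["county"].nunique()

# Counting (county, state) pairs gives the true number of distinct counties
n_counties = len(toy[["county", "state"]].drop_duplicates())

# Number of distinct reporting days
n_days = toy["date"].nunique()

print(n_county_names, n_counties, n_days)  # 2 3 2
```

On the real data the same expressions become `len(df[['county', 'state']].drop_duplicates())` and `df['date'].nunique()`.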

In [None]:
df = pd.read_csv("../datasets/us-counties.csv")

In [None]:
df.head()

## Adding Deaths per Cases Metric

`DPC` (deaths per case) is each row's cumulative deaths divided by its cumulative cases. It is used at the end for the time-slider heat map.
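`deaths / cases` is an element-wise division, so rows with zero cases do not raise an error: `0 / 0` becomes `NaN` and a positive count over zero becomes `inf`. A minimal sketch on made-up numbers showing both results and one way to map them to 0; note that `fillna` leaves `inf` alone and returns a new object unless its result is assigned back:

```python
import numpy as np
import pandas as pd

# Made-up counts: a normal row, a 0/0 row, and a positive/0 row
toy = pd.DataFrame({"cases": [100, 0, 0], "deaths": [2, 0, 3]})

# Element-wise division: 2/100 -> 0.02, 0/0 -> NaN, 3/0 -> inf
toy["DPC"] = toy["deaths"] / toy["cases"]

# Convert inf to NaN first, then fill both with 0, assigning the result back
toy["DPC"] = toy["DPC"].replace([np.inf, -np.inf], np.nan).fillna(0)

print(toy["DPC"].tolist())  # [0.02, 0.0, 0.0]
```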

In [None]:
df['DPC'] = df['deaths'] / df['cases']

In [None]:
df = df.fillna(0)  # replace NaN (e.g. DPC = 0/0 for rows with no cases) with 0; fillna returns a copy, so assign it back

In [None]:
df.tail()

## Separate into Days and then Counties

**Expected End Result**
<br>
Date -> County -> State -> fips | cases | deaths | DPC
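`groupby(['date', 'county', 'state']).sum()` produces a frame with a three-level `MultiIndex`. Selecting one date with `.loc` drops the first level and leaves rows keyed by `(county, state)`, which is the shape the plotting cells below iterate over. A small sketch with made-up numbers:

```python
import pandas as pd

# Made-up rows: two counties on one day, one county on the next
toy = pd.DataFrame({
    "date":   ["2020-03-01", "2020-03-01", "2020-03-02"],
    "county": ["King", "Cook", "King"],
    "state":  ["Washington", "Illinois", "Washington"],
    "cases":  [10, 5, 12],
    "deaths": [1, 0, 2],
})

# Three-level index: date -> county -> state
grouped = toy.groupby(["date", "county", "state"]).sum()

# Partial indexing on the first level returns that day's rows keyed by (county, state)
day = grouped.loc["2020-03-01"]
print(day)
```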

In [None]:
def county_date_separated(df):
    # Sum each numeric column per (date, county, state); the result has a 3-level MultiIndex
    cdsd = df.groupby(['date', 'county', 'state']).sum()
    return cdsd

cdsd = county_date_separated(df)

In [None]:
cdsd.head()

## ScatterPlot with Slider

- Plots the number of deaths in each county on a given date
- The date is selected with the slider
- The slider value is the number of days since the origin date, 2020-01-21

**County labels are unreadable because there are so many counties.**
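The slider value is an integer day offset, and the plotting code turns it into the `YYYY-MM-DD` string used as the first level of `cdsd`'s index. `datetime.date.isoformat()` already returns that format, so no string splitting is needed. A minimal sketch; the helper name `day_offset_to_key` is hypothetical:

```python
import datetime as dt

ORIGIN = dt.date(2020, 1, 21)  # first date in the dataset, as used in this notebook

def day_offset_to_key(td: int) -> str:
    """Hypothetical helper: map a slider offset to a 'YYYY-MM-DD' index key."""
    # int() guards against a slider that hands back a float
    return (ORIGIN + dt.timedelta(days=int(td))).isoformat()

print(day_offset_to_key(0))    # 2020-01-21
print(day_offset_to_key(500))  # 2021-06-04
```

Offset 500, used by the size maps and heat maps below, therefore corresponds to 4 June 2021 (2020 is a leap year, so offset 366 is 2021-01-21).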

In [None]:
def dgraph(td):
    origin = dt.date(2020, 1, 21)  # first date in the dataset
    req_date = origin + dt.timedelta(days=int(td))  # date td days after the origin
    req_data = cdsd['deaths'].loc[req_date.isoformat()]  # deaths on that day, indexed by (county, state)
    # list of (county label, deaths) pairs; Series.iteritems was removed in pandas 2.0, so use items()
    data = [(f"{county}, {state}", deaths) for (county, state), deaths in req_data.items()]
    return hv.Scatter(data, hv.Dimension('Counties'), 'Deaths')  # one frame for the DynamicMap

# last available day offset, derived from the data instead of hard-coded
last_day = (dt.date.fromisoformat(cdsd.index.get_level_values('date').max()) - dt.date(2020, 1, 21)).days

dmap = hv.DynamicMap(dgraph, kdims=['Days_From_Origin'])
dmap.redim.range(Days_From_Origin=(0, last_day))

## Geocoding Each County

This cell geocodes each (county, state) pair to a latitude and longitude with Nominatim, the OpenStreetMap geocoder. Nominatim's usage policy allows at most one request per second, so the roughly 3,247 counties take close to an hour. Run it once and save the coordinates as a CSV.
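Rate-limited geocoding comes down to spacing real calls at least `min_delay` seconds apart and caching results so a rerun never re-queries an address. Below is a minimal, library-free sketch of that idea; `RateLimitedCache` and `fake_geocode` are hypothetical stand-ins (geopy's `RateLimiter` covers the spacing part for real Nominatim calls):

```python
import time

class RateLimitedCache:
    """Hypothetical wrapper: calls `func` at most once every `min_delay` seconds
    and remembers results so repeated addresses are never re-queried."""

    def __init__(self, func, min_delay: float = 1.0):
        self.func = func
        self.min_delay = min_delay
        self.cache = {}
        self._last_call = None  # monotonic timestamp of the previous real call

    def __call__(self, address):
        if address in self.cache:
            return self.cache[address]  # cache hit: no delay, no call
        if self._last_call is not None:
            wait = self.min_delay - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)  # keep real calls at least min_delay apart
        self._last_call = time.monotonic()
        result = self.func(address)
        self.cache[address] = result
        return result

calls = []

def fake_geocode(address):
    """Stand-in for a real geocoder: records the call and returns dummy coordinates."""
    calls.append(address)
    return (0.0, 0.0)

limited_geocode = RateLimitedCache(fake_geocode, min_delay=0.05)
start = time.monotonic()
for addr in ["King, Washington", "Cook, Illinois", "King, Washington"]:
    limited_geocode(addr)
elapsed = time.monotonic() - start
```

Only two real calls happen (the repeated address is served from the cache), and they are spaced by at least `min_delay`.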

In [None]:
# geocode the counties

from geopy.geocoders import Nominatim  # open-source geocoder backed by OpenStreetMap
from geopy.extra.rate_limiter import RateLimiter
from tqdm import tqdm  # progress bar

def geocode(df):
    geolocator = Nominatim(user_agent="Covid-19 Visualization")
    # wrap the geocoder so consecutive calls are at least 1 second apart
    rate_limited_geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
    # unique (county, state) pairs
    county_list = list(df[['county', 'state']].drop_duplicates().itertuples(index=False))
    coords_df = {}  # address -> (latitude, longitude)
    error_counties = []  # addresses the geocoder could not resolve
    for county, state in tqdm(county_list):
        address = f"{county}, {state}"  # one-line county address
        location = rate_limited_geocode(f"{address}, USA")  # call through the rate limiter
        if location is None:  # no match, or an error swallowed by RateLimiter
            error_counties.append(address)
            continue
        coords_df[address] = (location.latitude, location.longitude)
    return coords_df, error_counties

coords_df, error_counties = geocode(df)

### Saves Coords_DF as CSV

In [None]:
coords_df = pd.DataFrame.from_dict(coords_df)
coords_df.to_csv('../datasets/coords-us-counties.csv')

### Pulls Coords_DF from CSV

In [None]:
coords_df = pd.read_csv('../datasets/coords-us-counties.csv', index_col=0)  # restore rows 0 (lat) and 1 (lon) as the index
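`pd.DataFrame.from_dict` stores each county as a column with rows 0 (latitude) and 1 (longitude), and `to_csv` also writes that row index. Reading the file back without `index_col=0` turns the index into an extra `Unnamed: 0` column, which any loop over the columns would then treat as a county. A round-trip sketch through an in-memory buffer, with approximate, illustrative coordinates:

```python
import io
import pandas as pd

# Same shape as coords_df before saving: address -> (latitude, longitude)
coords = {"King, Washington": (47.5, -121.8), "Cook, Illinois": (41.8, -87.8)}

wide = pd.DataFrame.from_dict(coords)  # one column per address, rows 0 (lat) and 1 (lon)

buf = io.StringIO()
wide.to_csv(buf)  # also writes the 0/1 row index as an unnamed first column

buf.seek(0)
naive = pd.read_csv(buf)  # the row index comes back as an extra 'Unnamed: 0' column

buf.seek(0)
restored = pd.read_csv(buf, index_col=0)  # use the first column as the index instead

lat, lon = restored["King, Washington"][0], restored["King, Washington"][1]
```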

## Dot Map of all data points

Plots a dot at the geocoded coordinates of each county in `coords_df`.
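Because `coords_df` holds one column per county (row 0 latitude, row 1 longitude), iterating over it yields addresses and each column must be indexed twice. Transposing gives one row per county, which may be easier to loop over. A minimal sketch on made-up coordinates:

```python
import pandas as pd

# Wide layout as in this notebook: columns are addresses, rows 0 = lat, 1 = lon
wide = pd.DataFrame({"King, Washington": [47.5, -121.8], "Cook, Illinois": [41.8, -87.8]})

# Transpose to one row per address, then name the two coordinate columns
long = wide.T.rename(columns={0: "lat", 1: "lon"})

# Each tuple is (address, lat, lon), ready to feed to a marker-drawing loop
points = list(long.itertuples(name=None))
```

Each tuple can then be unpacked as `for address, lat, lon in points:` and passed to `folium.CircleMarker(location=[lat, lon], popup=address, radius=5)`.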

In [None]:
dp_map = folium.Map(location=[39.5, -98.35])  # center near the geographic center of the contiguous US

for address in coords_df:  # each column (or dict key) is a one-line county address
    lat, lon = coords_df[address][0], coords_df[address][1]
    folium.CircleMarker(
        location=[lat, lon],
        popup=address,
        radius=5
    ).add_to(dp_map)  # add the dot to the folium map

dp_map

## Size Map Function

Reads from `cdsd`, which is grouped by date, county, and state, in that order.

Works with any numeric column of `cdsd` (for example `cases`, `deaths`, or `DPC`) and any day offset from the origin.
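Inside `s_map`, a county's marker radius (in pixels) is its share of that day's total for the chosen metric, multiplied by `scaler`. Since each share is at most 1, the default `scaler=1` keeps every radius at or below one pixel; `scaler` is effectively the radius a county would get if it held the entire day's total. A sketch of the arithmetic on made-up counts, with a hypothetical helper `radii`:

```python
import pandas as pd

# Made-up counts for one day
counts = pd.Series({"King, Washington": 300, "Cook, Illinois": 600, "Kent, Delaware": 100})

def radii(counts: pd.Series, scaler: float = 1.0) -> pd.Series:
    """Hypothetical helper: share of the day's total times `scaler`, as in s_map."""
    return counts / counts.sum() * scaler

print(radii(counts).tolist())             # [0.3, 0.6, 0.1]
print(radii(counts, scaler=50).tolist())  # [15.0, 30.0, 5.0]
```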

In [None]:
def s_map(td: int, metric: str, folium_map: folium.Map, scaler=1):
    # s_map is metric agnostic: `metric` is any numeric column of cdsd
    error_counties = []  # counties with no geocoded coordinates
    origin = dt.date(2020, 1, 21)  # first date in the dataset
    req_date = origin + dt.timedelta(days=td)  # requested date: origin + days elapsed
    req_data = cdsd[metric].loc[req_date.isoformat()]  # that day's values, indexed by (county, state)
    normalizer = req_data.sum()  # day's total for the metric, used to normalize each county
    for (county, state), metric_count in req_data.items():
        address = f"{county}, {state}"  # one-line address
        try:
            lat, lon = coords_df[address][0], coords_df[address][1]  # look up coordinates
        except KeyError:
            error_counties.append(address)  # record the miss and skip this county
            continue
        folium.CircleMarker(
            location=[lat, lon],
            popup=address,
            # radius in pixels = county's share of the day's total * scaler,
            # so with scaler=1 every radius is at most 1 pixel
            radius=metric_count / normalizer * scaler,
            fill=True
        ).add_to(folium_map)  # add marker to folium map
    return error_counties

## Size Map by Cases

In [None]:
s_map_cases = folium.Map(location=[39.5, -98.35])

error_counties_cases = s_map(500, 'cases', s_map_cases)

print(error_counties_cases)

s_map_cases

## Size Map by Deaths

In [None]:
s_map_deaths = folium.Map(location=[39.5, -98.35])

error_counties_deaths = s_map(500, 'deaths', s_map_deaths)

print(error_counties_deaths)

s_map_deaths

## Date Specific Heatmap Data Creation

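Returns `[lat, lon, weight]` triples for every geocoded county on the day `td` days after the origin. Each weight is the county's share of that day's total for `metric`, multiplied by `scaler`.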
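The per-day heat map data is a list of `[lat, lon, weight]` triples, which is the point format `folium.plugins.HeatMap` accepts. A minimal sketch of building it from one day of made-up values keyed by `(county, state)`; `heat_triples` is a hypothetical name, and unlike a bare `except`, it skips counties without coordinates rather than reusing the previous county's coordinates:

```python
import math
import pandas as pd

# One day of a metric keyed by (county, state), like cdsd[metric] for a single date (made-up values)
day = pd.Series({("King", "Washington"): 30, ("Cook", "Illinois"): 70, ("Nowhere", "Ohio"): 0})

# address -> (lat, lon); "Nowhere, Ohio" is deliberately missing to show the skip
coords = {"King, Washington": (47.5, -121.8), "Cook, Illinois": (41.8, -87.8)}

def heat_triples(day: pd.Series, coords: dict, scaler: float = 1.0) -> list:
    """Build [lat, lon, weight] triples; weight = share of the day's total * scaler."""
    total = day.sum()
    triples = []
    for (county, state), count in day.items():
        address = f"{county}, {state}"
        if address not in coords:
            continue  # no coordinates: skip this county
        weight = count / total * scaler if total else 0.0  # avoid 0/0 on empty days
        if math.isnan(weight):  # e.g. a NaN count
            weight = 0.0
        lat, lon = coords[address]
        triples.append([lat, lon, weight])
    return triples

triples = heat_triples(day, coords)
```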

In [None]:
def date_specific_heatmap_data(td: int, metric: str, scaler=1):
    heat_data = []  # one [lat, lon, weight] entry per county for this day
    origin = dt.date(2020, 1, 21)  # first date in the dataset
    req_date = origin + dt.timedelta(days=td)  # requested date: origin + days elapsed
    req_data = cdsd[metric].loc[req_date.isoformat()]  # that day's values, indexed by (county, state)
    normalizer = req_data.sum()  # day's total for the metric
    for (county, state), metric_count in req_data.items():
        address = f"{county}, {state}"  # one-line address
        metric_count_normalized = metric_count / normalizer * scaler
        if math.isnan(metric_count_normalized):  # e.g. 0 / 0 on days with no reported deaths
            metric_count_normalized = 0
        try:
            lat, lon = coords_df[address][0], coords_df[address][1]  # look up coordinates
        except KeyError:
            continue  # skip counties without coordinates instead of reusing the previous ones
        heat_data.append([lat, lon, metric_count_normalized])
    return heat_data

## Heat Map based on Deaths

The map is most informative when zoomed in to a multi-state region or smaller.

In [None]:
from folium.plugins import HeatMap

hmap_deaths = folium.Map(location=[39.5, -98.35])

heat_data_deaths_date = date_specific_heatmap_data(500, 'deaths', scaler=0.1)

HeatMap(heat_data_deaths_date).add_to(hmap_deaths)

hmap_deaths

## Heat Map based on Cases

The map is most informative when zoomed in to a multi-state region or smaller.

In [None]:
hmap_cases = folium.Map(location=[39.5, -98.35])

heat_data_cases_date = date_specific_heatmap_data(500, 'cases', scaler=0.1)

HeatMap(heat_data_cases_date).add_to(hmap_cases)

hmap_cases

## Date Agnostic Heatmap Data Creation

Creates the time series data necessary for a heatmap with slider.

**Data Structure**
`[date_specific_heatmap_data(0), date_specific_heatmap_data(1), ..., date_specific_heatmap_data(last_day)]`: one list of `[lat, lon, weight]` triples per day offset, in order.
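`HeatMapWithTime` takes a list of frames, where each frame is itself a list of `[lat, lon, weight]` points. A minimal sketch of that nesting, with the number of frames derived from the latest date key rather than hard-coded; `n_days_in`, `frames_for`, and `toy_per_day` are hypothetical names, and `toy_per_day` stands in for `date_specific_heatmap_data`:

```python
import datetime as dt

ORIGIN = dt.date(2020, 1, 21)  # first date in the dataset, as used in this notebook

def n_days_in(date_keys) -> int:
    """Number of day offsets from ORIGIN through the latest 'YYYY-MM-DD' key, inclusive."""
    last = dt.date.fromisoformat(max(date_keys))
    return (last - ORIGIN).days + 1

def frames_for(per_day, n_days: int) -> list:
    """One frame per day offset: frames[td] is per_day(td), a list of [lat, lon, weight]."""
    return [per_day(td) for td in range(n_days)]

def toy_per_day(td):
    """Hypothetical per-day builder: a single point whose weight is the day offset."""
    return [[40.0, -100.0, float(td)]]

n = n_days_in(["2020-01-21", "2020-01-22", "2020-01-23"])
frames = frames_for(toy_per_day, n)
```

On the real data, `n_days_in(cdsd.index.get_level_values('date'))` would give the frame count.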

In [None]:
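def data_agnostic_heatmap_data(metric: str, scaler=1):
    # number of day offsets in the data, from the origin through the latest date (inclusive)
    last_date = dt.date.fromisoformat(cdsd.index.get_level_values('date').max())
    n_days = (last_date - dt.date(2020, 1, 21)).days + 1
    heat_data_date_agnostic = []  # one frame of [lat, lon, weight] triples per day
    for td in range(n_days):
        curr_data = date_specific_heatmap_data(td, metric, scaler=scaler)  # pass scaler through
        heat_data_date_agnostic.append(curr_data)
    return heat_data_date_agnostic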

## Heat Map based on Deaths | Time Slider Enabled

In [None]:
from folium.plugins import HeatMapWithTime

hmap_timeslider_deaths = folium.Map(location=[39.5, -98.35])

heat_data_date_agnostic_deaths = data_agnostic_heatmap_data('deaths')

HeatMapWithTime(heat_data_date_agnostic_deaths).add_to(hmap_timeslider_deaths)

hmap_timeslider_deaths

## Heat Map based on Cases | Time Slider Enabled

In [None]:
hmap_timeslider_cases = folium.Map(location=[39.5, -98.35])

heat_data_date_agnostic_cases = data_agnostic_heatmap_data('cases')

HeatMapWithTime(heat_data_date_agnostic_cases).add_to(hmap_timeslider_cases)

hmap_timeslider_cases

## Implementing a new metric

Adding a new metric is simple: compute it as a new column of `df` during preprocessing (as done for `DPC`), before `cdsd` is built. The column name can then be passed as `metric` to `s_map`, `date_specific_heatmap_data`, or `data_agnostic_heatmap_data`.
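For example, a daily-new-cases metric (the column name `new_cases` is hypothetical) can be derived from the cumulative `cases` column by differencing consecutive days within each county. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up cumulative counts for two counties
toy = pd.DataFrame({
    "date":   ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01", "2020-03-02"],
    "county": ["King", "King", "King", "Cook", "Cook"],
    "state":  ["Washington", "Washington", "Washington", "Illinois", "Illinois"],
    "cases":  [10, 15, 15, 5, 9],
})

# Order each county's rows by date, then difference within each county;
# the first day has no previous value, so it keeps its cumulative count
toy = toy.sort_values(["county", "state", "date"])
toy["new_cases"] = (
    toy.groupby(["county", "state"])["cases"].diff().fillna(toy["cases"]).astype(int)
)
```

After adding such a column to `df` and rebuilding `cdsd`, `'new_cases'` can be passed as `metric` like the other columns.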

## Heat Map based on Deaths per Cases | Time Slider Enabled

In [None]:
hmap_timeslider_dpc = folium.Map(location=[39.5, -98.35])

heat_data_date_agnostic_dpc = data_agnostic_heatmap_data('DPC', scaler=0.1)

HeatMapWithTime(heat_data_date_agnostic_dpc).add_to(hmap_timeslider_dpc)

hmap_timeslider_dpc