### Wisconsin Coronavirus Data Analysis-Statewide Data
We are here once again to look at the coronavirus data for the state of Wisconsin. Since the academic year began for school-age children, the number of COVID-19 cases in Wisconsin, as in other Midwestern states, has risen significantly, worrying public health officials and some state politicians alike. With that in mind, I want to revisit the coronavirus data that the Wisconsin Department of Health Services has provided, which dates back to mid-March. If we have time, I will try to apply a model that uses a system of differential equations to predict how the state may look in the future.

This part of the analysis looks at how case counts change over time at the county level.

In [1]:
# Import the Python packages necessary for performing this analysis
import os                           # Module for working with the operating system
import numpy as np                  # Module for performing mathematical operations
import pandas as pd                 # Module for manipulating tabular data
import matplotlib.pyplot as plt     # Module for plotting data
import bokeh as bk                  # Module for creating interactive visualizations
import geopandas as gpd             # Module for manipulating geographic data and creating map visualizations
import datetime as dt               # Module for working with dates in Python
import re                           # Module for working with regular expressions in Python
import sys                          # Module for changing Python interpreter settings (e.g. the recursion limit)
import warnings                     # Module for handling warnings that arise when running Python code

# Hiding warnings from imports of Python modules
warnings.filterwarnings("ignore")

In order to work with the data, we need to:
- Change the working directory
- Load the county-level data into our notebook
- Write each county's data into its own dataset

We may also need to raise the recursion limit of the Python interpreter (not of the operating system) to allow saving our data to a JSON object. The lines that would change it are left commented out in the next cell.

In [2]:
# Change the working directory to allow access to the datasets
print("The current working directory of this script is: " + os.getcwd() + "\n")

try:
    os.chdir("C:/Users/debro/OneDrive/Documents/Codes_and_Data/coding_projects/WI_Coronavirus_Study")
    print("The current working directory is: " + os.getcwd() + ". You successfully changed the working directory.")
except OSError:
    print("The working directory was not successfully changed.")
    
# Change the recursion limit
print("\nThe current recursion limit is: " + str(sys.getrecursionlimit()))
#sys.setrecursionlimit(25000000)
#print("\nThe new recursion limit is: " + str(sys.getrecursionlimit()))

The current working directory of this script isC:\Users\debro\OneDrive\Documents\Codes_and_Data\coding_projects\WI_Coronavirus_Study\analysis_scripts

The current working directory is: C:\Users\debro\OneDrive\Documents\Codes_and_Data\coding_projects\WI_Coronavirus_Study. You successfully changed the working directory.

The current recursion limit is: 3000
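
If the recursion limit does need to be raised for the JSON step, one cautious option is to raise it only for the code that needs it and restore it afterwards, since a very high limit can crash the interpreter instead of raising a clean error. The sketch below is an illustration rather than part of the original analysis; `recursion_limit` is a hypothetical helper name.

```python
import sys
from contextlib import contextmanager

@contextmanager
def recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    # Never lower the limit below its current value
    sys.setrecursionlimit(max(limit, previous))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)

# Hypothetical usage: only the code inside the block sees the higher limit
with recursion_limit(10_000):
    print("Inside the block:", sys.getrecursionlimit())
print("After the block:", sys.getrecursionlimit())
```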


In [3]:
# Load the dataset from the datasets folder
data = pd.read_csv("./wisconsin_data/10012020/wi_county_covid_19_data.csv", encoding = "utf-8")

# View the first 10 rows of the dataframe
data.head(10)

Unnamed: 0,OBJECTID,GEOID,GEO,NAME,NEGATIVE,POSITIVE,HOSP_YES,HOSP_NO,HOSP_UNK,POS_FEM,...,DTH_E_NHSP,DTH_E_UNK,POS_HC_Y,POS_HC_N,POS_HC_UNK,DTH_NEW,POS_NEW,NEG_NEW,TEST_NEW,DATE
0,2,55001,County,Adams,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
1,3,55003,County,Ashland,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
2,4,55005,County,Barron,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
3,5,55007,County,Bayfield,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
4,6,55009,County,Brown,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
5,7,55011,County,Buffalo,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
6,8,55013,County,Burnett,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
7,9,55015,County,Calumet,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
8,10,55017,County,Chippewa,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00
9,11,55019,County,Clark,,0,,,,,...,,,,,,,,,,2020/03/15 14:00:00+00


In [4]:
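# Rename "St. Croix" so that it matches the county name used in the boundary file.
# Using .loc avoids chained assignment, which may silently modify a copy instead of `data`.
data.loc[data.NAME == "St. Croix", "NAME"] = "Saint Croix"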
data[data.NAME == "Saint Croix"]

# Selecting data for only October 1, 2020
october_1_2020 = data[data.DATE == "2020/10/01 14:00:00+00"].reset_index()
len(october_1_2020)

72

So all 72 counties are present, but we still need to create a separate table (dataframe) for each county in Wisconsin.
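
One possible way to build those per-county tables is to group the statewide table by county name and keep each group in a dictionary. This is a minimal sketch on a toy table with made-up numbers; `toy` and `county_tables` are hypothetical names, and with the real data the grouping would be applied to `data` instead.

```python
import pandas as pd

# Toy stand-in for the statewide table (values are made up for illustration)
toy = pd.DataFrame({
    "NAME": ["Adams", "Adams", "Brown", "Brown"],
    "DATE": ["2020/09/30 14:00:00+00", "2020/10/01 14:00:00+00",
             "2020/09/30 14:00:00+00", "2020/10/01 14:00:00+00"],
    "POSITIVE": [10, 20, 30, 40],
})

# One dataframe per county, keyed by county name
county_tables = {
    name: group.reset_index(drop=True)
    for name, group in toy.groupby("NAME")
}

print(county_tables["Adams"])
```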

In [5]:
county_bndry_shp = "./wisconsin_data/county_boundary_files/county_boundaries/County_Boundaries_24K.shp"

# Read the shapefile using geopandas
county_shp = gpd.read_file(county_bndry_shp)
county_shp.head()

Unnamed: 0,OBJECTID,DNR_REGION,DNR_CNTY_C,COUNTY_NAM,COUNTY_FIP,SHAPEAREA,SHAPELEN,geometry
0,321,Southeast Region,30,Kenosha,59,721045400.0,123267.303863,"MULTIPOLYGON (((699813.437 246226.688, 699794...."
1,322,South Central Region,33,Lafayette,65,1641795000.0,164707.65004,"POLYGON ((503148.082 260278.466, 503292.672 26..."
2,323,South Central Region,54,Rock,105,1879382000.0,174114.58767,"POLYGON ((600602.683 264347.425, 603850.419 26..."
3,324,Southeast Region,65,Walworth,127,1492598000.0,154833.279357,"POLYGON ((658404.520 263083.277, 658417.776 26..."
4,325,South Central Region,23,Green,45,1512855000.0,155741.104373,"POLYGON ((571551.903 263810.562, 571555.279 26..."


In [6]:
county_shp.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   OBJECTID    72 non-null     int64   
 1   DNR_REGION  72 non-null     object  
 2   DNR_CNTY_C  72 non-null     int64   
 3   COUNTY_NAM  72 non-null     object  
 4   COUNTY_FIP  72 non-null     object  
 5   SHAPEAREA   72 non-null     float64 
 6   SHAPELEN    72 non-null     float64 
 7   geometry    72 non-null     geometry
dtypes: float64(2), geometry(1), int64(2), object(3)
memory usage: 4.6+ KB


In [7]:
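# Join the case data to the county boundaries by county name (an inner join by default),
# then drop the first column, which is the case data's copy of OBJECTID
county_data = data.merge(county_shp, left_on = "NAME", right_on = "COUNTY_NAM").iloc[:, 1:]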
county_data.head()

Unnamed: 0,GEOID,GEO,NAME,NEGATIVE,POSITIVE,HOSP_YES,HOSP_NO,HOSP_UNK,POS_FEM,POS_MALE,...,TEST_NEW,DATE,OBJECTID_y,DNR_REGION,DNR_CNTY_C,COUNTY_NAM,COUNTY_FIP,SHAPEAREA,SHAPELEN,geometry
0,55001,County,Adams,,0,,,,,,...,,2020/03/15 14:00:00+00,349,West Central Region,1,Adams,1,1781419000.0,207462.267914,"POLYGON ((529729.236 419584.292, 530887.237 41..."
1,55001,County,Adams,,0,,,,,,...,,2020/03/16 14:00:00+00,349,West Central Region,1,Adams,1,1781419000.0,207462.267914,"POLYGON ((529729.236 419584.292, 530887.237 41..."
2,55001,County,Adams,,0,,,,,,...,,2020/03/17 14:00:00+00,349,West Central Region,1,Adams,1,1781419000.0,207462.267914,"POLYGON ((529729.236 419584.292, 530887.237 41..."
3,55001,County,Adams,1138.0,8,4.0,3.0,1.0,-999.0,-999.0,...,23.0,2020/06/15 14:00:00+00,349,West Central Region,1,Adams,1,1781419000.0,207462.267914,"POLYGON ((529729.236 419584.292, 530887.237 41..."
4,55001,County,Adams,1152.0,8,4.0,3.0,1.0,-999.0,-999.0,...,14.0,2020/06/16 14:00:00+00,349,West Central Region,1,Adams,1,1781419000.0,207462.267914,"POLYGON ((529729.236 419584.292, 530887.237 41..."
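
Because `merge` performs an inner join by default, any county whose name is spelled differently in the two files is dropped without warning, which is presumably why "St. Croix" was renamed before the merge. A quick check for such mismatches can be sketched with plain sets. The helper `unmatched_names` is hypothetical, and the lists are toy values; with the notebook's objects the inputs would be `data["NAME"]` and `county_shp["COUNTY_NAM"]`.

```python
def unmatched_names(case_names, shape_names):
    """Return names found only in the case data and only in the shapefile."""
    case_set, shape_set = set(case_names), set(shape_names)
    return sorted(case_set - shape_set), sorted(shape_set - case_set)

# Toy lists standing in for the two name columns
only_in_cases, only_in_shapes = unmatched_names(
    ["Adams", "St. Croix", "Brown"],
    ["Adams", "Saint Croix", "Brown"],
)
print(only_in_cases)   # ['St. Croix']
print(only_in_shapes)  # ['Saint Croix']
```

Two empty lists would mean every county name matches and the inner join keeps all 72 counties.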


### Can we create a static map for the COVID-19 cases in Wisconsin counties?

In [8]:
convert_dict = {'GEO': str, 
                'NAME': str,
                'DNR_REGION': str,
                'COUNTY_NAM': str,
                'COUNTY_FIP': str
               } 

county_data['DATE'] = pd.to_datetime(county_data['DATE'])

county_data = county_data.astype(convert_dict) 
print(county_data.dtypes) 

for col in county_data.columns:
    print(str(col) + " " + str(county_data[col].dtypes)) 

GEOID            int64
GEO             object
NAME            object
NEGATIVE       float64
POSITIVE         int64
                ...   
COUNTY_NAM      object
COUNTY_FIP      object
SHAPEAREA      float64
SHAPELEN       float64
geometry      geometry
Length: 110, dtype: object
GEOID int64
GEO object
NAME object
NEGATIVE float64
POSITIVE int64
HOSP_YES float64
HOSP_NO float64
HOSP_UNK float64
POS_FEM float64
POS_MALE float64
POS_OTH float64
POS_0_9 float64
POS_10_19 float64
POS_20_29 float64
POS_30_39 float64
POS_40_49 float64
POS_50_59 float64
POS_60_69 float64
POS_70_79 float64
POS_80_89 float64
POS_90 float64
DEATHS int64
DTHS_FEM float64
DTHS_MALE float64
DTHS_OTH float64
DTHS_0_9 float64
DTHS_10_19 float64
DTHS_20_29 float64
DTHS_30_39 float64
DTHS_40_49 float64
DTHS_50_59 float64
DTHS_60_69 float64
DTHS_70_79 float64
DTHS_80_89 float64
DTHS_90 float64
IP_Y_0_9 float64
IP_Y_10_19 float64
IP_Y_20_29 float64
IP_Y_30_39 float64
IP_Y_40_49 float64
IP_Y_50_59 float64
IP_Y_60_69 float6

In [9]:
# Bokeh's GeoJSONDataSource expects GeoJSON, so we need to convert our data to JSON
import json

# Write the October 1, 2020 rows to a JSON string.
# Note: orient = "index" yields one JSON record per row, not a GeoJSON FeatureCollection.
o12020_tj = county_data[county_data['DATE'] == "2020/10/01 14:00:00+00"].reset_index().to_json(default_handler = str,
                                                                                             orient = "index")

In [10]:
o12020_json = json.loads(o12020_tj)

# Serialize the parsed county data back to a JSON string
october_1_2020_json_data = json.dumps(o12020_json)

# See the October 1, 2020 JSON data
october_1_2020_json_data

'{"0": {"index": 200, "GEOID": 55001, "GEO": "County", "NAME": "Adams", "NEGATIVE": 4118.0, "POSITIVE": 265, "HOSP_YES": 23.0, "HOSP_NO": 174.0, "HOSP_UNK": 68.0, "POS_FEM": 132.0, "POS_MALE": 133.0, "POS_OTH": -999.0, "POS_0_9": 8.0, "POS_10_19": 20.0, "POS_20_29": 41.0, "POS_30_39": 24.0, "POS_40_49": 28.0, "POS_50_59": 32.0, "POS_60_69": 68.0, "POS_70_79": 36.0, "POS_80_89": 7.0, "POS_90": -999.0, "DEATHS": 4, "DTHS_FEM": -999.0, "DTHS_MALE": -999.0, "DTHS_OTH": -999.0, "DTHS_0_9": -999.0, "DTHS_10_19": -999.0, "DTHS_20_29": -999.0, "DTHS_30_39": -999.0, "DTHS_40_49": -999.0, "DTHS_50_59": -999.0, "DTHS_60_69": -999.0, "DTHS_70_79": -999.0, "DTHS_80_89": -999.0, "DTHS_90": -999.0, "IP_Y_0_9": null, "IP_Y_10_19": null, "IP_Y_20_29": null, "IP_Y_30_39": null, "IP_Y_40_49": null, "IP_Y_50_59": null, "IP_Y_60_69": null, "IP_Y_70_79": null, "IP_Y_80_89": null, "IP_Y_90": null, "IP_N_0_9": null, "IP_N_10_19": null, "IP_N_20_29": null, "IP_N_30_39": null, "IP_N_40_49": null, "IP_N_50_59": 

In [11]:
# Import bokeh features
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

# Input the geojson file with our plotting features
county_source = GeoJSONDataSource(geojson = october_1_2020_json_data)

# Define a sequential multi-hue color palette
palette = brewer['YlOrBr'][8]

# Reverse the color order so the darkest color is the color with the most cases
palette = palette[::-1]

# Instantiate LinearColorMapper that linearly maps numbers in a range, into a sequence of colors
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 5000)

# Define tick labels for color bar
tick_labels = {'0': '0', '500': '500', '1000': '1000', '1500': '1500', '2000': '2000', '2500': '2500',
              '3000': '3000', '3500': '3500', '4000': '4000', '4500': '4500', '5000': '5000<'}

# Create color bar
color_bar = ColorBar(color_mapper = color_mapper, label_standoff = 8, width = 500, height = 20,
                     border_line_color = None, location = (0,0), orientation = 'horizontal',
                     major_label_overrides = tick_labels)

p = figure(title = 'Wisconsinites with Positive COVID-19 Tests', plot_height = 600,
           plot_width = 950, toolbar_location = None)

p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

# Add patch renderer to figure 
p.patches('xs','ys', source = county_source, fill_color = {'field': "POSITIVE", 'transform': color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)

# Specify figure layout
p.add_layout(color_bar, 'below')

# Output the image in Jupyter Notebook
output_notebook()

# Display figure
show(p, notebook_handle = True)
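
To see why the choice of `low` and `high` matters, it can help to work through how a linear mapping assigns values to palette slots. The function below is only an approximation of the idea, not Bokeh's actual implementation, and `palette_index` is a hypothetical name. It assumes the same settings as above: 8 colors spread evenly over the range 0 to 5000, with out-of-range values clamped to the ends.

```python
def palette_index(value, low=0, high=5000, n_colors=8):
    """Approximate which palette slot a value lands in under a linear mapping."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    # Split [low, high) into n_colors equal-width bins
    return int((value - low) / (high - low) * n_colors)

for cases in (0, 265, 2500, 5000, 20000):
    print(cases, "-> palette slot", palette_index(cases))
```

Under this scheme every county above 5,000 positive tests shares the darkest color, which is what the `'5000<'` tick label is meant to signal.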

In [12]:
help(GeoJSONDataSource)

Help on class GeoJSONDataSource in module bokeh.models.sources:

class GeoJSONDataSource(ColumnarDataSource)
 |  GeoJSONDataSource(*args, **kwargs)
 |  
 |  Method resolution order:
 |      GeoJSONDataSource
 |      ColumnarDataSource
 |      DataSource
 |      bokeh.model.Model
 |      bokeh.core.has_props.HasProps
 |      bokeh.util.callback_manager.PropertyCallbackManager
 |      bokeh.util.callback_manager.EventCallbackManager
 |      builtins.object
 |  
 |  Data descriptors defined here:
 |  
 |  geojson
 |      GeoJSON that contains features for plotting. Currently
 |      ``GeoJSONDataSource`` can only process a ``FeatureCollection`` or
 |      ``GeometryCollection``.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __cached_all__overridden_defaults__ = {}
 |  
 |  __cached_all__properties__ = {'geojson', 'js_event_callbacks', 'js_pro...
 |  
 |  __cached_all__properties_with_refs__ = {'js_event
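
The help text above says that `GeoJSONDataSource` can only process a `FeatureCollection` or `GeometryCollection`. As an illustration of what that structure looks like, the sketch below wraps row dictionaries into a `FeatureCollection` using only the standard library. The helper `to_feature_collection` and the square polygon are made up for the example and are not real county data.

```python
import json

def to_feature_collection(records, geometry_key="geometry"):
    """Wrap row dictionaries in a GeoJSON FeatureCollection."""
    features = []
    for record in records:
        # Everything except the geometry becomes a feature property
        properties = {k: v for k, v in record.items() if k != geometry_key}
        features.append({
            "type": "Feature",
            "geometry": record[geometry_key],
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}

# Toy record: a made-up square, not a real county boundary
rows = [{
    "NAME": "Example County",
    "POSITIVE": 265,
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
    },
}]

geojson_text = json.dumps(to_feature_collection(rows))
print(geojson_text[:60])
```

In practice there is no need to build this by hand: calling `to_json()` on a GeoPandas `GeoDataFrame` returns a string with this `FeatureCollection` layout, whereas the pandas `to_json(orient = "index")` call produces plain records keyed by row number.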

Make sure the date is entered as a string in the format YYYY/MM/DD.

```python
from bokeh.io import curdoc, output_notebook
from bokeh.models import Slider, HoverTool
from bokeh.layouts import column

# First and last reporting dates; the slider counts whole days from START_DATE
START_DATE = dt.date(2020, 3, 15)
END_DATE = dt.date(2020, 10, 1)
N_DAYS = (END_DATE - START_DATE).days

# Define a function that returns GeoJSON for the date selected by the user.
def json_data(selected_date):
    # DATE in `data` is still a string, so compare against the full timestamp string
    date = str(selected_date) + " 14:00:00+00"
    df_date = data[data['DATE'] == date]
    # Keep the shapefile on the left so the merged result is still a GeoDataFrame
    merged = county_shp.merge(df_date, left_on = 'COUNTY_NAM', right_on = "NAME", how = 'left')
    # Missing values stay null so the color mapper can paint them with nan_color
    return merged.to_json(default = str)

# Input GeoJSON source that contains features for plotting.
geosource = GeoJSONDataSource(geojson = json_data("2020/10/01"))

# Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
# Reverse color order so that dark blue marks the most new positive tests.
palette = palette[::-1]
# Instantiate LinearColorMapper that linearly maps numbers in a range, into a sequence of colors. Input nan_color.
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 2000, nan_color = '#d9d9d9')
# Add hover tool
hover = HoverTool(tooltips = [('County Name', '@COUNTY_NAM'), ('Number of New Positive COVID-19 Tests', '@POS_NEW')])
# Create color bar.
color_bar = ColorBar(color_mapper = color_mapper, label_standoff = 8, width = 500, height = 20,
                     border_line_color = None, location = (0,0), orientation = 'horizontal')
# Create figure object.
p = figure(title = "New Positive Tests for Selected Date", plot_height = 600, plot_width = 950,
           toolbar_location = None, tools = [hover])

p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
# Add patch renderer to figure.
p.patches('xs','ys', source = geosource, fill_color = {'field': 'POS_NEW', 'transform': color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)
# Specify layout
p.add_layout(color_bar, 'below')

# Define the callback function: update_plot
def update_plot(attr, old, new):
    # Convert the slider's day offset back into a YYYY/MM/DD string
    selected = (START_DATE + dt.timedelta(days = int(slider.value))).strftime("%Y/%m/%d")
    geosource.geojson = json_data(selected)
    p.title.text = 'Number of New Positive COVID-19 Tests in Each County, ' + selected

# Make a slider object: slider (a plain Slider needs numeric values, so it counts days)
slider = Slider(title = 'Days since 2020/03/15', start = 0, end = N_DAYS, step = 1, value = N_DAYS)
slider.on_change('value', update_plot)
# Make a column layout of the plot and slider, and add it to the current document
layout = column(p, slider)
curdoc().add_root(layout)
# Display plot inline in Jupyter notebook (the Python callback only runs under a Bokeh server)
output_notebook()
# Display plot
show(layout)
```
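
Since `json_data` relies on the date arriving as a YYYY/MM/DD string, a small guard could validate the input before it is used to filter the data. This is a sketch under the assumption that every row of the dataset carries the same `14:00:00+00` time suffix seen in the output above; `to_dataset_timestamp` is a hypothetical helper name.

```python
from datetime import datetime

def to_dataset_timestamp(selected_date):
    """Validate a YYYY/MM/DD string and append the dataset's time suffix."""
    # strptime raises ValueError if the string is not in YYYY/MM/DD form
    parsed = datetime.strptime(selected_date, "%Y/%m/%d")
    # Re-format so that months and days are always zero-padded
    return parsed.strftime("%Y/%m/%d") + " 14:00:00+00"

print(to_dataset_timestamp("2020/10/01"))
```

A malformed value such as `"10-01-2020"` then fails immediately with a `ValueError` rather than silently returning an empty selection.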