<a href="https://colab.research.google.com/github/maddyrlee/storytelling-with-data/blob/master/HongLeeRamTang_Assignment_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Set Up

In [0]:
import numpy as np
import pandas as pd
import math

import pandas_datareader as pdr

import plotly
import plotly.express as px
import seaborn as sns
!pip install us
import us
import datetime
from time import strptime
import calendar



#Project Team:


- Jae Hong - Created the YouTube video and helped find trends in the data.

- Maddy Lee - Assisted data collection, finding trends, and completed documentation.

- Mira Ram - Created a cumulative CSV including data on governor party and state opening dates. Found links to various datasets.

- Bill Tang - Imported datasets and created graphs for state opening, governor party, and age distributions. Helped find trends.


#Background/Overview

During the current COVID-19 crisis, some states have been hit harder economically than others. Our goal is to investigate why some states have suffered a steeper economic downturn than others. We noticed that a few states and territories, most notably Hawaii, Puerto Rico, and Vermont, were hit especially hard, with some data points showing a decrease in economic activity as large as 90%. Because Dartmouth is so closely tied to Vermont, and some of our group members have held jobs there, we wanted to explore what made Vermont so different. Why has Vermont dropped so sharply while the other five New England states seem to be faring better economically?

#Approach

We investigated several factors that could explain why certain states were affected more than others: the party of the governor, the number of COVID-19 cases and deaths, reopening dates, and population. We used COVID-19's impact on local businesses as a proxy for each state's economic loss. We also looked at how economic loss might relate to the date each state plans to reopen.

#Quick summary

Using local business activity as a proxy for each state's economic loss, we found that New York, Michigan, West Virginia, Nevada, Hawaii, and Vermont were the most negatively impacted. Four of the six states had Democratic governors, but little can be concluded from this. New York and Michigan had large numbers of COVID-19 cases, even after accounting for the percent of the population affected. Hawaii and Nevada are known to have tourism as one of their major industries, and COVID-19's damage to that industry potentially explains their economic decline. For West Virginia and Vermont, we found little correlation that would explain why they were impacted so much more negatively than other states. This is especially true of Vermont, which, like Hawaii, experienced up to 90% closure of small businesses on a given day.

#Data

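We used five datasets: governor party by state, planned reopening dates by state, the impact of COVID-19 on local businesses by state, COVID-19 cases and deaths by state, and state population by age group. The comments in the cell below link to the original source of each.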

In [0]:
# party_v_state includes the party of the governor for each state. Source: https://en.wikipedia.org/wiki/List_of_United_States_governors
party_v_state = pd.read_csv('https://raw.githubusercontent.com/maddyrlee/storytelling-with-data/master/assignments/assignment%204/csvs/gov_v_state.csv', sep=',')

# opening_by_state includes dates each US state is planning on reopening. Source: https://www.cnn.com/interactive/2020/us/states-reopen-coronavirus-trnd/
opening_by_state = pd.read_csv('https://raw.githubusercontent.com/maddyrlee/storytelling-with-data/master/assignments/assignment%204/csvs/state-reopenings.csv',sep=',')

# business_data includes COVID-19 impact on US local businesses per state. Source: https://joinhomebase.com/data/covid-19/ 
business_data = pd.read_csv('https://raw.githubusercontent.com/maddyrlee/storytelling-with-data/master/assignments/assignment%204/csvs/businessdata.csv', sep=',')

# states includes COVID-19 cases, deaths in each state from 1/21/2020 to present. Source: https://github.com/nytimes/covid-19-data/raw/master/us-states.csv
states = pd.read_csv('https://github.com/nytimes/covid-19-data/raw/master/us-states.csv')

# agedistribution includes each state's population by age group (the 2010 file from the source repository); its 'Total' column is used below as the state population. Source: https://raw.githubusercontent.com/veltman/state-population-by-age/master/2010.csv
agedistribution = pd.read_csv('https://raw.githubusercontent.com/veltman/state-population-by-age/master/2010.csv', sep=',')

#Analysis

In the following analysis, each state is assigned the number 0 if its governor's party is 'R' and 1 otherwise. These values are then used to create a choropleth map in which red indicates a Republican governor and blue indicates a Democratic governor.



In [0]:
party_v_state['COLOR'] = party_v_state['PARTY'].map(lambda party: 0 if party == 'R' else 1)

pvsfig = px.choropleth(party_v_state,
                     locations='STATE',
                     color='COLOR',
                     hover_name='PARTY',
                     locationmode='USA-states',
                     color_continuous_scale=[(0, 'red'), (1, 'blue')])
pvsfig.update_layout(
    title_text = 'Governor party by state', # Create a Title
    geo_scope='usa',  # Plot only the USA instead of globe
    coloraxis_showscale=False
)
pvsfig.show()
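
As an alternative to encoding party as 0/1 on a continuous scale, the map could treat party as a category, which also yields a readable legend. The following is a minimal sketch rather than the code used for the figure above. It assumes the `STATE` and `PARTY` columns seen in the cell above; the `sample` frame, `PARTY_COLORS`, and `party_choropleth` are illustrative names that do not appear in the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for party_v_state: the real CSV is assumed to have
# 'STATE' (two-letter code) and 'PARTY' ('R' or 'D') columns.
sample = pd.DataFrame({'STATE': ['VT', 'NY', 'TX'], 'PARTY': ['R', 'D', 'R']})

# Assumed mapping from party label to map color
PARTY_COLORS = {'R': 'red', 'D': 'blue'}

def party_choropleth(df):
    """Build a governor-party map with a categorical color legend."""
    import plotly.express as px  # imported here so the check below runs without plotly
    return px.choropleth(df,
                         locations='STATE',
                         locationmode='USA-states',
                         color='PARTY',
                         color_discrete_map=PARTY_COLORS,
                         scope='usa',
                         title='Governor party by state')

# Party labels missing from PARTY_COLORS would fall back to a default plotly color,
# so it is worth listing them before plotting.
unmapped = sorted(set(sample['PARTY']) - set(PARTY_COLORS))
print(unmapped)
```

Calling `party_choropleth(party_v_state).show()` would then draw the map, assuming the column names above match the CSV.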

In the following analysis, the function dateToNumDays takes the date on which each state intends to open and converts it into an approximate day count (month * 31 + day), so that later opening dates map to larger numbers. This value is then used to create a choropleth map in which the color represents how far off each state's opening is. States that have not decided on a date were assigned the largest value (204) and therefore show up as the darkest shade of blue on the choropleth map.


In [0]:
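def dateToNumDays(datestring):
  # A missing date is read by pandas as NaN (a float): the state has not set a date
  if isinstance(datestring, float):
    return 204
  # Dates are month/day strings; month * 31 + day is an approximate, order-preserving day count
  datearr = datestring.split("/")
  tot = int(datearr[0]) * 31 + int(datearr[1])
  return tot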

opening_by_state['NUM_DAYS'] = opening_by_state['OPEN'].map(dateToNumDays)

obsfig = px.choropleth(opening_by_state,
                       locations='STATE',
                       color='NUM_DAYS',
                       locationmode='USA-states',
                       color_continuous_scale=px.colors.sequential.Jet[::-1])
obsfig.update_layout(
    title_text = 'State openings',
    geo_scope='usa'
)
obsfig.show()
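
The month * 31 + day formula preserves the order of dates but does not give true calendar distances. A minimal sketch of an exact day count using the standard library follows. It assumes the `OPEN` column holds month/day strings for 2020; `REFERENCE_DATE`, `date_to_day_count`, and the `undecided` parameter are illustrative names, not part of the original notebook.

```python
import datetime

# Assumed origin for counting days; any fixed date would work
REFERENCE_DATE = datetime.date(2020, 1, 1)

def date_to_day_count(datestring, year=2020, undecided=None):
    """Convert an 'M/D' string to the number of days since REFERENCE_DATE.

    Non-string input (for example NaN for an undecided state) returns
    the `undecided` placeholder instead.
    """
    if not isinstance(datestring, str):
        return undecided
    month, day = (int(part) for part in datestring.split('/')[:2])
    return (datetime.date(year, month, day) - REFERENCE_DATE).days

print(date_to_day_count('5/15'))  # 135 days after January 1, 2020
```

Passing a placeholder such as `undecided=204` would keep undecided states at the far end of the color scale, as in the cell above.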

In the following analysis, each state name in the dataframe is mapped to its abbreviation, and the table is reshaped so that each row holds one state, one date, and that day's value. Then the percent of local businesses that closed per state is graphed on an animated choropleth map, with one frame per date. The greater the loss, the more negative the number and the redder the color.

In [0]:
# Drop unnamed (empty) columns and convert state names to two-letter abbreviations
business_data = business_data.loc[:, ~business_data.columns.str.contains('^Unnamed')]
business_data['STATE'] = business_data['STATE'].map(lambda state: us.states.lookup(state).abbr)
# Rename date columns from day-month to month_day so they are valid field names for itertuples
nc = []
for k in business_data.columns.values:
  if k == 'STATE':
    nc.append(k)
    continue
  else:
    kspl = k.split('-')
    k = kspl[1] + '_' + kspl[0]
    nc.append(k)
business_data.columns = nc
# Reshape from one column per date to one row per (state, date) pair
business_data_reformat = []
bdcolumns = ['STATE', 'DATE', 'LOSSES']
for row in business_data.itertuples():
  state = row[1]  # assumes STATE is the first column
  for key, val in zip(row._fields, row):
    if(key == 'STATE' or key == 'Index'):
      continue
    else:
      # Parse the month abbreviation and day, then convert the percent string to a fraction
      datesplitstring = key.split('_')
      date = datetime.date(2020, list(calendar.month_abbr).index(datesplitstring[0]), int(datesplitstring[1]))
      date = pd.Timestamp(date)
      business_data_reformat.append([state, str(date),float(val.strip('%'))/100])
business_data_reformat_pd = pd.DataFrame(business_data_reformat, columns=bdcolumns)
bdfig = px.choropleth(business_data_reformat_pd, 
                      locations='STATE', 
                      color="LOSSES", 
                      animation_frame="DATE", 
                      locationmode='USA-states',
                      color_continuous_scale=px.colors.sequential.Jet[::-1],
                      range_color=[-0.8, 0])
bdfig.update_layout(
    title_text = 'State business health',
    geo_scope='usa',
)
bdfig.show()
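
The wide-to-long reshaping in the cell above can also be expressed with `DataFrame.melt`. The following is a minimal sketch, not the code used for the figure. It assumes the post-rename column layout (`STATE` plus month_day columns holding percent strings); the `wide` frame and its values are made-up placeholders for illustration only, not Homebase data.

```python
import pandas as pd

# Made-up placeholder values in the assumed post-rename layout
wide = pd.DataFrame({
    'STATE': ['VT', 'NY'],
    'Mar_1': ['-5%', '-3%'],
    'Mar_2': ['-10%', '-8%'],
})

# One row per (state, date) pair instead of one column per date
tidy = wide.melt(id_vars='STATE', var_name='DATE', value_name='LOSSES')

# 'Mar_1' -> 2020-03-01; the year 2020 is assumed, as in the original loop
tidy['DATE'] = pd.to_datetime('2020_' + tidy['DATE'], format='%Y_%b_%d')

# '-5%' -> -0.05
tidy['LOSSES'] = tidy['LOSSES'].str.rstrip('%').astype(float) / 100

tidy = tidy.sort_values(['DATE', 'STATE']).reset_index(drop=True)
print(tidy)
```

Before passing `DATE` to `animation_frame`, it may be worth converting it to strings (for example with `.dt.strftime('%Y-%m-%d')`), as the original loop does with `str(date)`.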

In the following analysis, each state name is mapped to its abbreviation, both in the COVID-19 cases and deaths dataset and in the age distribution dataset. The case counts are put on a log scale (base 10). The function divide_by_population divides each state's COVID-19 case count by the state's total population from the age distribution dataset; territories without population data are assigned a tiny placeholder ratio. The natural log of this ratio is then graphed on a choropleth map, where the deeper the red, the more COVID-19 cases relative to the state's population. Below it, the log of the COVID-19 case count per state is graphed on a second choropleth map.

In [0]:
states['stateabbr'] = states['state'].map(lambda statename: us.states.lookup(statename).abbr)
states['logcases'] = states['cases'].map(lambda cases: math.log(cases + 1, 10))
agedistribution['abbr'] = agedistribution['State'].map(lambda statename: us.states.lookup(statename).abbr if us.states.lookup(statename) else "")

def divide_by_population(row):
  name = row['stateabbr']
  num = row['cases']
  if(name != 'PR' and name != 'VI' and name != 'GU' and name != 'MP'):
    population = agedistribution.loc[agedistribution['abbr'] == name].iloc[0]['Total']
  else:
    return 0.0000000000000001
  return float(num)/float(population)

states['popratiocases'] = states.apply(lambda row: divide_by_population(row), axis=1)
states['logpopratiocases'] = states['popratiocases'].map(lambda ratio: math.log(ratio))
sdfig = px.choropleth(states, 
                      locations='stateabbr', 
                      color="logpopratiocases", 
                      animation_frame="date", 
                      locationmode='USA-states',
                      color_continuous_scale=px.colors.sequential.Jet,
                      range_color=[-8, -4]
                      )
sdfig.update_layout(
    title_text = 'log of the ratio of cases/population by state',
    geo_scope='usa',
)
sdfig.show()

sd2fig = px.choropleth(states, 
                      locations='stateabbr', 
                      color="logcases", 
                      animation_frame="date", 
                      locationmode='USA-states',
                      color_continuous_scale=px.colors.sequential.Jet,
                      range_color=[0, 6]
                       )
sd2fig.update_layout(
    title_text = 'log of the cases by state',
    geo_scope='usa',
)
sd2fig.show()
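
The per-population ratio above is computed row by row with `apply`. A vectorized variant using a merge is sketched below. It is an illustration under assumptions: the column names mirror those in the cell above, the two small frames hold made-up placeholder numbers (with fictional codes `AA` and `BB`), and it uses base-10 logs and leaves missing populations as NaN, whereas the original uses the natural log and a tiny placeholder ratio.

```python
import numpy as np
import pandas as pd

# Made-up placeholder numbers, not real case or population counts
cases = pd.DataFrame({'stateabbr': ['AA', 'BB', 'PR'], 'cases': [1000, 10, 50]})
population = pd.DataFrame({'abbr': ['AA', 'BB'], 'Total': [1_000_000, 100_000]})

# A left join keeps every case row; regions without population data get NaN
merged = cases.merge(population, left_on='stateabbr', right_on='abbr', how='left')

# Cases divided by population, then log10 of that ratio
merged['popratiocases'] = merged['cases'] / merged['Total']
merged['log10popratiocases'] = np.log10(merged['popratiocases'])

print(merged[['stateabbr', 'popratiocases', 'log10popratiocases']])
```

Leaving the ratio as NaN means regions without population data are simply left uncolored, rather than being pinned to the bottom of the color scale.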

#Interpretations and conclusions

We were interested in understanding how COVID-19 impacted the economic situation of different states. We were not surprised to find that New York and Michigan, which have a high number of COVID-19 cases both in total and relative to population, experienced a large economic decline. We were also not surprised to see large decreases in states where a large share of income comes from tourism, such as Hawaii and Nevada (e.g. Las Vegas). We were, however, very surprised that West Virginia, and especially Vermont, were so negatively impacted. For example, as of 4/30/2020, Vermont had experienced a greater percent loss than New York.

This Vermont finding was shocking to us. We tried to find potential reasons for it, and to see whether decisions such as state reopening dates were related to it. We found that the political party of the governor had little correlation with economic loss. We attempted to see whether Vermont had a different age distribution compared to other states, but struggled to process the dataset into a readable plot. We are left unsure as to why Vermont was so negatively impacted economically by COVID-19 compared to other states. There is much left to be explored in this area.

#Future directions

Our results raise many questions. We were unable to find potential answers to why Vermont suffered so much more economically than other states. Perhaps it is due to the industries that dominate the workforce in that state, or perhaps the age demographic in Vermont is very different from that of surrounding states. Any interested individual or group can explore other potential reasons why Vermont has differed so much from other states.

Other questions we thought of included how specific state policy reactions may have impacted economic prosperity, from the timing of implementation to the specific policies chosen. We were also curious about how the impact of COVID-19 on local businesses compares with its impact on big businesses.