# Impact of Trump super-spreader rallies on COVID-19 deaths in the United States #

This project investigates the possible effects of Trump campaign rallies on the spread of COVID-19 during the summer and early fall of 2020.

## Background and Motivation ##

Prior to May 25, 2020, COVID-19 deaths in the United States were falling precipitously. On May 25th, that descent slowed dramatically, and then, around the beginning of July, reversed itself, starting a second wave of COVID-19 deaths in the United States. The number of deaths in this second wave, 125K, now exceeds that of the first wave, 100K.

The date, May 25, 2020, is significant in that it is the date on which George Floyd died while in police custody in Minneapolis, MN. Subsequent to Floyd's death, protests occurred in over 2,000 cities in the United States. It has been suggested that the George Floyd Protests might have contributed to triggering the second wave of COVID-19. 

This hypothesis is confounded, however, by political campaigning during the summer run-up to the 2020 elections. In particular, President Trump held campaign rallies at which neither he nor many of the attendees, possibly following his lead, observed the precautions then recommended to limit COVID-19 infection. These became known as _super-spreader rallies_. Some news outlets reported that COVID-19 infections spiked in areas where rallies had recently been held. However, I have not found a _systematic_ investigation of whether, and to what extent, the rallies were correlated with the spread of COVID-19.

_**The yellow arrows in the two graphs below identify May 26, 2020, that is, the day after George Floyd died while in police custody.**_

### COVID-19 deaths: Seven day moving average ###

![](viz/2020-10-23_original-screen-capture.png)

### COVID-19 deaths: Cumulative ###

![](viz/2020-10-23_accumulated-deaths-anno.png)

## Research Questions and Hypothesis ##

The key question is whether increases in COVID-19 mortality can be identified after President Trump's rallies and relatively close to where the rallies were held, and further, whether these increases are greater than we would expect given the changes in COVID-19 mortality during the same time periods in other locations.

My hypothesis is that I _can_ identify increases in COVID-19 mortality associated with President Trump's rallies and that these increases are greater than we would expect based on contemporaneous COVID-19 mortality in other areas.

## Data ##

For data that indicates the spread of COVID-19, I will use the _COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University_ hosted on GitHub at the following URL:

>  <https://github.com/CSSEGISandData/COVID-19>

For data on President Trump's campaign rallies, I will use the list maintained on Wikipedia at the following URL:

>  <https://en.wikipedia.org/wiki/List_of_post-election_Donald_Trump_rallies#2020_campaign_rallies>

## Methodology ##

The Johns Hopkins data tracks COVID-19 deaths per county in the United States. Using geocoding, I can convert the city and state of each of President Trump's rallies into a county. I should then be able to gather statistics for each of those counties that also appears in the Johns Hopkins data.

I can also gather similar statistics for a set of counties where President Trump did _not_ hold rallies and use these as something analogous to _controls_.

Specifically, the questions that I will investigate are, for each of the Trump rallies:

- What was the change in the seven-day moving average of COVID-19 deaths over the 60 days _**before**_ the date of the Trump rally in that county--and in each of the control counties?
- What was the change in the seven-day moving average of COVID-19 deaths over the 60 days _**after**_ the date of the Trump rally in that county--and in each of the control counties?

- How many COVID-19 deaths accumulated in the 60 days _**before**_ the date of the Trump rally in that county--and in each of the control counties?
- How many COVID-19 deaths accumulated in the 60 days _**after**_ the date of the Trump rally in that county--and in each of the control counties?

And also:

- Compare summary statistics (mean, median, std) for accumulated COVID-19 deaths in the Trump counties vs the control counties.

To be clear, I will look at COVID-19 deaths in the control counties before and after the date of each Trump rally _even though no Trump rally occurred in that county_.
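
The before-and-after measurements described above could be computed along the following lines. This is only a sketch under stated assumptions: the function name `window_stats` is hypothetical, the input is assumed to be one county's cumulative death counts indexed by daily dates, and the series used to exercise it is synthetic rather than real data.

```python
import pandas as pd


def window_stats(cumulative, rally_date, days=60):
    """Compare the `days` before and after `rally_date` for one county.

    `cumulative` is assumed to be a Series of cumulative deaths indexed by
    consecutive daily dates that cover the whole window.
    """
    rally = pd.Timestamp(rally_date)
    start = rally - pd.Timedelta(days=days)
    end = rally + pd.Timedelta(days=days)

    # Daily deaths are the day-to-day differences of the cumulative counts.
    daily = cumulative.diff().fillna(0)
    # Seven-day moving average of daily deaths.
    avg7 = daily.rolling(window=7, min_periods=1).mean()

    return {
        "avg7_change_before": avg7.loc[rally] - avg7.loc[start],
        "avg7_change_after": avg7.loc[end] - avg7.loc[rally],
        "deaths_before": cumulative.loc[rally] - cumulative.loc[start],
        "deaths_after": cumulative.loc[end] - cumulative.loc[rally],
    }


# Synthetic series (not real data): 1 death/day for 75 days, then 3/day.
dates = pd.date_range("2020-06-01", periods=150, freq="D")
cumulative = pd.Series([1] * 75 + [3] * 75, index=dates).cumsum()

stats = window_stats(cumulative, dates[74])
print(stats)
```

The same function can be applied to a control county with the rally date of the county it is being compared against, which is how the control comparison above is meant to work.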

## Unknowns ##

I have, so far, identified a couple of unknowns that could affect the success of this investigation.

One unknown is that Trump's rallies were not always held in an urban center. For example, some were held at airports. The president would fly in, speak at a rally at the airport, and then fly out. **In these cases, it is not certain where the attendees originated from; that is, we can't assume that they came from the nearest urban center.** 

To take an example, on September 3, 2020, Trump spoke at a rally at Arnold Palmer Regional Airport outside Latrobe, PA. However, we can't infer from this that all the attendees were from Latrobe; they might have traveled in from outside the area. More importantly, we don't know where the attendees traveled _to_ after the rally.

(Actually, this issue, that attendees might reside outside the area where the rally was held, applies to some degree even for rallies held inside city centers.)

**Another unknown is that the locations of Trump's rallies are not uniformly distributed across the United States.** This could be for reasons such as campaign strategy. But in any case, regional differences between areas that hosted the rallies and those that didn't could introduce bias into the data.

In [415]:
import os
import geocoder
import pandas as pd

In [416]:
#
# Note that the BING_API_KEY variable needs to be set with your API key
# in the console window from which you launch this Jupyter notebook.
#
g = geocoder.bing( 'Kenosha, WI', key=os.environ[ 'BING_API_KEY' ] )

print( g.json[ 'raw' ][ 'address' ][ 'adminDistrict2' ] )

Kenosha County


In [417]:
trump_rallies = pd.read_csv('data/trump-rallies.csv', 
        sep=',', 
        comment='#',
        skipinitialspace=True,
        header=0,
        na_values='?')

In [418]:
trump_rallies.columns

Index(['Date', 'City', 'State', 'County'], dtype='object')

In [419]:
trump_rallies.head()

Unnamed: 0,Date,City,State,County
0,2020-06-20,Tulsa,OK,
1,2020-06-23,Phoenix,AZ,
2,2020-08-17,Mankato,MN,
3,2020-08-17,Oshkosh,WI,
4,2020-08-18,Yuma,AZ,


In [420]:
trump_rallies.tail()

Unnamed: 0,Date,City,State,County
62,2020-11-02,Fayetteville,NC,
63,2020-11-02,Scranton,PA,
64,2020-11-02,Traverse City,MI,
65,2020-11-02,Kenosha,WI,
66,2020-11-02,Grand Rapids,MI,


In [421]:
trump_rallies.shape

(67, 4)

In [422]:
target_location = trump_rallies.loc[ 0, "City" ] + ", " + trump_rallies.loc[ 0, "State" ]
target_location

'Tulsa, OK'

In [423]:
g = geocoder.bing( target_location, key=os.environ[ 'BING_API_KEY' ] )
g.json[ 'raw' ][ 'address' ][ 'adminDistrict2' ] 

'Tulsa County'

In [424]:
geocoder.bing( 'The Villages' + ", " + 'FL', key=os.environ[ 'BING_API_KEY' ] ).json[ 'raw' ]

{'__type': 'Location:http://schemas.microsoft.com/search/local/ws/rest/v1',
 'bbox': [28.821828842163086,
  -82.04051208496094,
  28.982059478759766,
  -81.95328521728516],
 'name': 'The Villages, FL',
 'point': {'type': 'Point',
  'coordinates': [28.93280792236328, -81.95913696289062]},
 'address': {'adminDistrict': 'FL',
  'countryRegion': 'United States',
  'formattedAddress': 'The Villages, FL',
  'locality': 'The Villages'},
 'confidence': 'High',
 'entityType': 'PopulatedPlace',
 'geocodePoints': [{'type': 'Point',
   'coordinates': [28.93280792236328, -81.95913696289062],
   'calculationMethod': 'Rooftop',
   'usageTypes': ['Display']}],
 'matchCodes': ['Good']}

In [425]:
geocoder.bing( 'The Villages' + ", " + 'FL', key=os.environ[ 'BING_API_KEY' ] ).json[ 'raw' ][ 'address' ]

{'adminDistrict': 'FL',
 'countryRegion': 'United States',
 'formattedAddress': 'The Villages, FL',
 'locality': 'The Villages'}

In [426]:
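def gcode( row ):
    # Geocode "City, ST" with Bing and return the county (adminDistrict2).
    g = geocoder.bing( row[ 'City' ] + ", " + row[ 'State' ], key=os.environ[ 'BING_API_KEY' ] )
    address = g.json[ 'raw' ][ 'address' ]
    if 'adminDistrict2' in address:
        return address[ 'adminDistrict2' ]
    else:
        # Hard-coded fallback: assumes the only rally location for which Bing
        # returns no adminDistrict2 is The Villages, FL (Sumter County).
        return 'Sumter County'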


In [427]:
trump_rallies[ 'County' ] = trump_rallies.apply( gcode, axis = 1 )

In [428]:
trump_rallies.loc[ : , 'County' ].head()

0         Tulsa County
1      Maricopa County
2    Blue Earth County
3     Winnebago County
4          Yuma County
Name: County, dtype: object

In [429]:
trump_rallies.loc[ : , 'County' ].tail()

62        Cumberland County
63        Lackawanna County
64    Grand Traverse County
65           Kenosha County
66              Kent County
Name: County, dtype: object
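
A single hard-coded fallback county only works while exactly one location lacks `adminDistrict2`. One possible alternative, sketched below, is to separate the lookup from the network call and keep a small manual table for places that Bing returns without a county. The names `county_from_raw`, `resolve_county`, and `MANUAL_COUNTIES` are hypothetical, and the dictionaries are trimmed-down stand-ins for the raw responses shown earlier, not full API output.

```python
def county_from_raw(raw, fallback=None):
    """Return the county from a Bing-style raw result, or `fallback`."""
    address = (raw or {}).get("address", {})
    return address.get("adminDistrict2", fallback)


# Hypothetical manual table for places returned without a county.
MANUAL_COUNTIES = {("The Villages", "FL"): "Sumter County"}


def resolve_county(city, state, raw):
    """Prefer the geocoder's county; otherwise consult the manual table."""
    return county_from_raw(raw, fallback=MANUAL_COUNTIES.get((city, state)))


# Trimmed stand-ins for the raw responses (only the fields used here).
tulsa_raw = {"address": {"adminDistrict": "OK", "adminDistrict2": "Tulsa County"}}
villages_raw = {"address": {"adminDistrict": "FL", "locality": "The Villages"}}

print(resolve_county("Tulsa", "OK", tulsa_raw))
print(resolve_county("The Villages", "FL", villages_raw))
print(resolve_county("Somewhere", "ZZ", {"address": {}}))
```

Returning `None` for an unknown place, rather than a guessed county, leaves a visible gap in the `County` column that can be reviewed by hand.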

# Read time series data from the Johns Hopkins COVID-19 repository #

In [430]:
covid_19_time_series_by_county = pd.read_csv('data/time_series_covid19_deaths_US.csv', 
        sep=',', 
        comment='#',
        skipinitialspace=True,
        header=0,
        na_values='?')

In [431]:
covid_19_time_series_by_county.shape

(3340, 324)

In [432]:
covid_19_time_series_by_county.head()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,11/19/20,11/20/20,11/21/20,11/22/20,11/23/20,11/24/20,11/25/20,11/26/20,11/27/20,11/28/20
0,84001001,US,USA,840,1001.0,Autauga,Alabama,US,32.539527,-86.644082,...,39,39,39,39,39,39,41,42,42,42
1,84001003,US,USA,840,1003.0,Baldwin,Alabama,US,30.72775,-87.722071,...,84,84,84,84,84,84,98,98,98,98
2,84001005,US,USA,840,1005.0,Barbour,Alabama,US,31.868263,-85.387129,...,10,10,10,10,10,10,10,10,10,10
3,84001007,US,USA,840,1007.0,Bibb,Alabama,US,32.996421,-87.125115,...,18,18,17,17,17,17,17,17,17,17
4,84001009,US,USA,840,1009.0,Blount,Alabama,US,33.982109,-86.567906,...,35,35,36,36,36,36,39,40,40,40


In [433]:
covid_19_time_series_by_county.tail()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,11/19/20,11/20/20,11/21/20,11/22/20,11/23/20,11/24/20,11/25/20,11/26/20,11/27/20,11/28/20
3335,84056039,US,USA,840,56039.0,Teton,Wyoming,US,43.935225,-110.58908,...,2,2,2,2,2,2,2,2,2,2
3336,84056041,US,USA,840,56041.0,Uinta,Wyoming,US,41.287818,-110.547578,...,4,4,4,4,4,4,4,4,4,4
3337,84090056,US,USA,840,90056.0,Unassigned,Wyoming,US,0.0,0.0,...,0,0,0,0,0,0,0,0,0,0
3338,84056043,US,USA,840,56043.0,Washakie,Wyoming,US,43.904516,-107.680187,...,7,7,7,7,7,7,8,8,8,8
3339,84056045,US,USA,840,56045.0,Weston,Wyoming,US,43.839612,-104.567488,...,0,0,0,0,1,1,1,1,1,1


Most of the columns hold the cumulative COVID-19 deaths as of a given date. The first 12 columns are the non-date columns:

In [434]:
covid_19_time_series_by_county.columns.array[ 0:12 ]

<PandasArray>
[           'UID',           'iso2',           'iso3',          'code3',
           'FIPS',         'Admin2', 'Province_State', 'Country_Region',
            'Lat',          'Long_',   'Combined_Key',     'Population']
Length: 12, dtype: object

The `Admin2` column contains the county name. Many county names recur across states, so the column has many duplicates (1,979 unique values in 3,340 rows) and we can't merge on it alone.

In [435]:
len( covid_19_time_series_by_county.loc[ :, 'Admin2' ] ) 

3340

In [436]:
len( covid_19_time_series_by_county.loc[ :, 'Admin2' ].unique() )

1979

The `Combined_Key` column provides a _primary key_ that uniquely identifies the row.

In [437]:
covid_19_time_series_by_county.loc[ :, 'Combined_Key' ]

0          Autauga, Alabama, US
1          Baldwin, Alabama, US
2          Barbour, Alabama, US
3             Bibb, Alabama, US
4           Blount, Alabama, US
                 ...           
3335         Teton, Wyoming, US
3336         Uinta, Wyoming, US
3337    Unassigned, Wyoming, US
3338      Washakie, Wyoming, US
3339        Weston, Wyoming, US
Name: Combined_Key, Length: 3340, dtype: object
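
The difference between the two columns can be seen on a toy frame. This is an illustrative sketch only: the three rows are hand-written, and the key is rebuilt here from `Admin2` and `Province_State` purely to mimic the `Combined_Key` format shown above.

```python
import pandas as pd

# Illustrative rows only: two states that each have a Washington County.
counties = pd.DataFrame({
    "Admin2": ["Washington", "Washington", "Autauga"],
    "Province_State": ["Alabama", "Arkansas", "Alabama"],
})
counties["Combined_Key"] = (
    counties["Admin2"] + ", " + counties["Province_State"] + ", US"
)

# County names alone repeat; the combined key does not.
print(counties["Admin2"].is_unique)
print(counties["Combined_Key"].is_unique)

# Rows whose county name is shared with at least one other row.
repeated = counties.loc[counties["Admin2"].duplicated(keep=False), "Combined_Key"]
print(repeated.tolist())
```

Running the same `is_unique` check on the full `Combined_Key` column is a cheap way to confirm that it is safe to merge on.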

Remove unneeded columns.

In [438]:
covid_19_time_series_by_county.drop( [ 'UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Country_Region' ] , axis = 1, inplace = True )

In [439]:
covid_19_time_series_by_county.head()

Unnamed: 0,Lat,Long_,Combined_Key,Population,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/19/20,11/20/20,11/21/20,11/22/20,11/23/20,11/24/20,11/25/20,11/26/20,11/27/20,11/28/20
0,32.539527,-86.644082,"Autauga, Alabama, US",55869,0,0,0,0,0,0,...,39,39,39,39,39,39,41,42,42,42
1,30.72775,-87.722071,"Baldwin, Alabama, US",223234,0,0,0,0,0,0,...,84,84,84,84,84,84,98,98,98,98
2,31.868263,-85.387129,"Barbour, Alabama, US",24686,0,0,0,0,0,0,...,10,10,10,10,10,10,10,10,10,10
3,32.996421,-87.125115,"Bibb, Alabama, US",22394,0,0,0,0,0,0,...,18,18,17,17,17,17,17,17,17,17
4,33.982109,-86.567906,"Blount, Alabama, US",57826,0,0,0,0,0,0,...,35,35,36,36,36,36,39,40,40,40
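
Because each date is a separate column, date-based windows are awkward to compute in this wide layout. One option, sketched here on two rows and two date columns copied from the output above, is to reshape the frame to one row per county per date with `DataFrame.melt` and parse the column labels as dates. The variable names `wide` and `long` are hypothetical.

```python
import pandas as pd

# Two counties and two date columns taken from the output above.
wide = pd.DataFrame({
    "Combined_Key": ["Autauga, Alabama, US", "Baldwin, Alabama, US"],
    "Population": [55869, 223234],
    "11/27/20": [42, 98],
    "11/28/20": [42, 98],
})

# Every column that is not an identifier is treated as a date column.
id_cols = ["Combined_Key", "Population"]
long = wide.melt(id_vars=id_cols, var_name="Date", value_name="Deaths")

# Column labels such as "11/27/20" are month/day/two-digit-year.
long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")

print(long.sort_values(["Combined_Key", "Date"]))
```

In the long layout, a county's series can be selected by `Combined_Key` and indexed by `Date`, which suits rolling averages and fixed-length windows.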


# Synthesize a key for the Trump rallies dataframe to use for merging #

Read in a dataset that maps from state names to state abbreviations.

In [440]:
state_abbr = pd.read_csv('data/state-abbr.csv', 
        sep=',', 
        comment='#',
        skipinitialspace=True,
        header=0,
        na_values='?')

In [441]:
state_abbr.head()

Unnamed: 0,State,Abbr
0,Alabama,AL
1,Alaska,AK
2,Arizona,AZ
3,Arkansas,AR
4,California,CA


In [442]:
state_abbr.tail()

Unnamed: 0,State,Abbr
45,Virginia,VA
46,Washington,WA
47,West Virginia,WV
48,Wisconsin,WI
49,Wyoming,WY


Create a dictionary from the two columns of our state/abbr dataframe.

In [443]:
map_abbr_state = dict( zip( state_abbr.Abbr.str.strip(), state_abbr.State.str.strip() ) )

In [444]:
map_abbr_state.keys()

dict_keys(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'])

In [445]:
map_abbr_state[ 'VA' ]

'Virginia'

In [446]:
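def create_combined_key_for_trump( row ):
    # Build a key such as "Tulsa, Oklahoma, US" to match Combined_Key in the
    # Johns Hopkins data. The slice drops the trailing " County", so it
    # assumes every value in the County column ends with that suffix.
    combined = row[ 'County' ][ 0:-6 ].rstrip() + ", " + map_abbr_state[ row[ 'State' ] ] + ", " + 'US'
    return combined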

trump_rallies[ 'Combined_Key' ] = trump_rallies.apply( create_combined_key_for_trump, axis = 1 )

In [447]:
trump_rallies[ 'Combined_Key' ].head()

0          Tulsa, Oklahoma, US
1        Maricopa, Arizona, US
2    Blue Earth, Minnesota, US
3     Winnebago, Wisconsin, US
4            Yuma, Arizona, US
Name: Combined_Key, dtype: object

In [448]:
trump_rallies.head()

Unnamed: 0,Date,City,State,County,Combined_Key
0,2020-06-20,Tulsa,OK,Tulsa County,"Tulsa, Oklahoma, US"
1,2020-06-23,Phoenix,AZ,Maricopa County,"Maricopa, Arizona, US"
2,2020-08-17,Mankato,MN,Blue Earth County,"Blue Earth, Minnesota, US"
3,2020-08-17,Oshkosh,WI,Winnebago County,"Winnebago, Wisconsin, US"
4,2020-08-18,Yuma,AZ,Yuma County,"Yuma, Arizona, US"


In [449]:
trump_rallies = trump_rallies.merge( covid_19_time_series_by_county, how = "left", on = "Combined_Key")

In [450]:
trump_rallies.columns

Index(['Date', 'City', 'State', 'County', 'Combined_Key', 'Lat', 'Long_',
       'Population', '1/22/20', '1/23/20',
       ...
       '11/19/20', '11/20/20', '11/21/20', '11/22/20', '11/23/20', '11/24/20',
       '11/25/20', '11/26/20', '11/27/20', '11/28/20'],
      dtype='object', length=320)
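
A left merge keeps every rally row even when its key matches nothing, in which case the death columns are filled with `NaN`. A quick check for such rows might look like the following sketch, which uses two tiny hand-made frames. The `Population` values are placeholders rather than real figures, and the second rally key is deliberately misspelled to show what a failed match looks like.

```python
import pandas as pd

rallies = pd.DataFrame({
    "City": ["Tulsa", "The Villages"],
    # The second key is deliberately misspelled for illustration.
    "Combined_Key": ["Tulsa, Oklahoma, US", "Sumpter, Florida, US"],
})
deaths = pd.DataFrame({
    "Combined_Key": ["Tulsa, Oklahoma, US", "Sumter, Florida, US"],
    "Population": [1, 2],  # placeholder values, not real populations
})

merged = rallies.merge(
    deaths,
    how="left",
    on="Combined_Key",
    indicator=True,          # adds a _merge column describing each row
    validate="many_to_one",  # raises if a key repeats in `deaths`
)

unmatched = merged.loc[merged["_merge"] == "left_only", "Combined_Key"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every rally found its county in the death data.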

### --- END --- ###