<a href="https://www.kaggle.com/code/an1ndya/us-vaccine-tracker?scriptVersionId=158094100" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction

These are troubling times, but the daily news about vaccines offers hope. Multiple companies are releasing vaccines and getting them approved, so we may soon see a path forward.

Using the robust toolset provided by Kaggle, I'll show you how to create an interactive map to track, for each state, the percentage of inhabitants that have been vaccinated against COVID-19.  

To get started, if you haven't already, make your own copy of this notebook by clicking on the **[Copy and Edit]** button in the top right corner. 

This notebook is an example of a project that you can create based on what you'd learn from taking Kaggle's [Geospatial Analysis course](https://www.kaggle.com/learn/geospatial-analysis).

# US Vaccine Tracker

We'll use two datasets.  

- The first dataset has the total number of inhabitants of each state, along with latitude and longitude data for each state's capital city.  This dataset is pulled from the 2019 US Census, and I've uploaded it [here](https://www.kaggle.com/peretzcohen/2019-census-us-population-data-by-state).
- The second dataset contains an estimate of the total number of people that have been vaccinated in each state.  This [vaccine dataset](https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/us_state_vaccinations.csv) is drawn from [Our World In Data](https://ourworldindata.org/), who built their vaccine datasets from CDC reports.  Every time you run this notebook, it downloads the current version of their file.  Because the data is now old, the code filters on a date 700 days before the run date rather than on today's date.

In the next code cell, we load and preprocess the data.  As output, you'll see the total percent of the population that has been vaccinated in the US, along with a preview of the Pandas DataFrame that we'll use to make the tracker.

In [1]:
# Imports
import pandas as pd
from datetime import date, timedelta
import folium
from folium import Marker
from folium.plugins import MarkerCluster
import math
import matplotlib.pyplot as plt
import seaborn as sns

# Population Data
populationData = pd.read_csv('/kaggle/input/2019-census-us-population-data-by-state/2019_Census_US_Population_Data_By_State_Lat_Long.csv')

# Pick the date to filter on, formatted as YYYY-MM-DD to match the dataset.
# The offset was changed from days=1 to days=700 so that the filter returns rows.
freshDate = (date.today() - timedelta(days=700)).strftime("%Y-%m-%d")

# Vaccination data, for most recent date
vaccinationData = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv')

# Keep only the rows for freshDate (the data is old, so recent dates have no rows)
vaccinationByLocation = vaccinationData.loc[vaccinationData.date == freshDate, ["location", "people_vaccinated"]]
# Alternative without the date filter (keeps every date for every state):
#vaccinationByLocation = vaccinationData[["location", "people_vaccinated"]]
# Vaccination and population data
vaccinationAndPopulationByLocation = pd.merge(populationData, vaccinationByLocation, left_on='STATE',right_on='location').drop(columns="location")

# Calculate the fraction vaccinated by state (multiply by 100 for a percentage)
vaccinationAndPopulationByLocation["percent_vaccinated"] = vaccinationAndPopulationByLocation["people_vaccinated"] / vaccinationAndPopulationByLocation["POPESTIMATE2019"]

vaccinationAndPopulationByLocation


Unnamed: 0,STATE,POPESTIMATE2019,lat,long,people_vaccinated,percent_vaccinated
0,Alabama,4903185,32.377716,-86.300568,3010724.0,0.614034
1,Alaska,731545,58.301598,-134.420212,498030.0,0.680792
2,Arizona,7278717,33.448143,-112.096962,5140528.0,0.706241
3,Arkansas,3017804,34.746613,-92.288986,1967757.0,0.652049
4,California,39512223,38.576668,-121.493629,31963547.0,0.808953
5,Colorado,5758736,39.739227,-104.984856,4476355.0,0.777316
6,Connecticut,3565287,41.764046,-72.682198,3317919.0,0.930618
7,Delaware,973764,39.157307,-75.519722,786600.0,0.807793
8,District of Columbia,705749,38.89511,-77.03637,661870.0,0.937826
9,Florida,21477737,30.438118,-84.281296,16638144.0,0.774669
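
The date filter in the cell above depends on a hand-picked offset. As a rough alternative, the sketch below keeps each state's most recent reported value instead. It runs on made-up rows that only reuse the `date`, `location`, and `people_vaccinated` column names, and the `latest_reported` helper is hypothetical, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the vaccination file: made-up numbers, same column names.
toy = pd.DataFrame({
    "date": ["2022-02-06", "2022-02-07", "2022-02-06", "2022-02-07", "2022-02-08"],
    "location": ["Alabama", "Alabama", "Alaska", "Alaska", "Alaska"],
    "people_vaccinated": [100.0, 110.0, 50.0, None, None],
})

def latest_reported(df):
    """Return one row per location: the most recent date with a reported value."""
    # Drop rows where no count was reported.
    reported = df.dropna(subset=["people_vaccinated"])
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    newest = reported.sort_values("date").groupby("location", as_index=False).last()
    return newest[["location", "date", "people_vaccinated"]]

latest = latest_reported(toy)
print(latest)
```

With this approach, states can end up with different dates, so it may be worth keeping the `date` column to see how stale each value is.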


In [2]:
print("Date ran:", date.today())
print("Old Fresh Date(date on data):", freshDate)

# Calculate the total percent vaccinated in the US
percentageTotal = vaccinationAndPopulationByLocation["people_vaccinated"].sum() / vaccinationAndPopulationByLocation["POPESTIMATE2019"].sum()
print('Percentage Vaccinated in the US: {}%'.format(round(percentageTotal*100, 2))) 

Date ran: 2024-01-08
Old Fresh Date(date on data): 2022-02-07
Percentage Vaccinated in the US: 69.68%
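
The national figure is computed as total vaccinated divided by total population, which is not the same as averaging the per-state values. The sketch below illustrates the difference using three rows copied from the preview table above; the variable names `sample`, `weighted`, and `unweighted` are only for illustration.

```python
import pandas as pd

# Three rows copied from the preview table in this notebook.
sample = pd.DataFrame({
    "STATE": ["Alabama", "Alaska", "California"],
    "POPESTIMATE2019": [4903185, 731545, 39512223],
    "people_vaccinated": [3010724.0, 498030.0, 31963547.0],
})
sample["percent_vaccinated"] = sample["people_vaccinated"] / sample["POPESTIMATE2019"]

# Population-weighted share: total vaccinated divided by total population.
weighted = sample["people_vaccinated"].sum() / sample["POPESTIMATE2019"].sum()

# Unweighted mean of the per-state fractions, which treats every state equally.
unweighted = sample["percent_vaccinated"].mean()

print(f"weighted:   {weighted:.4f}")
print(f"unweighted: {unweighted:.4f}")
```

In this sample the weighted value is higher because California, the most populous of the three states, also has the highest rate.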


The next code cell uses the data to create a tracker, with one marker for each state.  Hover over a marker to see the percentage of the population that has been vaccinated; markers that are grouped into a cluster expand when you click the cluster or zoom in.

In [3]:
# Create the map
v_map = folium.Map(location=[42.32,-71.0589], tiles='cartodbpositron', zoom_start=4) 

# Add points to the map
mc = MarkerCluster()
for idx, row in vaccinationAndPopulationByLocation.iterrows(): 
    if not math.isnan(row['long']) and not math.isnan(row['lat']):
        mc.add_child(Marker(location=[row['lat'], row['long']],
                            tooltip=str(round(row['percent_vaccinated']*100, 2))+"%"))
v_map.add_child(mc)

# Display the map
v_map

# Your turn

Here are some ideas for how you might improve on the work here:
- In Kaggle's [Geospatial Analysis course](https://www.kaggle.com/learn/geospatial-analysis), you learn how to use folium to create many different types of interactive maps.  How might you use this data to instead create a choropleth map?
- In case you would like to work with more data sources,
  - The Centers for Disease Control and Prevention (CDC) in the US releases daily vaccine data and has a vaccination progress tracker on its [COVID Data Tracker site](https://covid.cdc.gov/covid-data-tracker/#vaccinations).
  - NBC News also has a well-made [vaccine tracker](https://www.nbcnews.com/health/health-news/map-covid-19-vaccination-tracker-across-u-s-n1252085).
  
Once you have created your own extension of this work, let us know about it in the comments!

In [4]:
# Load the US state boundaries from the shapefile
import geopandas as geopd
us_shape = geopd.read_file('/kaggle/input/us-state/cb_2018_us_state_20m')
# Optional filter to exclude some states (left disabled)
#states = us_shape[~us_shape['STUSPS'].isin(['PR', 'NH', 'NY', 'IL'])]
states = us_shape
# Keep only the name and geometry, indexed by state name
states = states.loc[:, ['NAME', 'geometry']].set_index('NAME')
# Select the vaccination columns and convert the fraction to a percentage
vaccines = vaccinationAndPopulationByLocation.loc[:, ['STATE', 'people_vaccinated', 'percent_vaccinated']].sort_values('STATE')
vaccines['percent_vaccinated'] = vaccines['percent_vaccinated'] * 100
vaccines.set_index('STATE', inplace = True)
# Combine both DataFrames side by side, aligned on the state-name index
final = pd.concat([states, vaccines], axis = 1)
final = final.reset_index().rename(columns = {'index': 'state'})
final

Unnamed: 0,state,geometry,people_vaccinated,percent_vaccinated
0,Maryland,"MULTIPOLYGON (((-76.04621 38.02553, -76.00734 ...",5081707.0,84.055177
1,Iowa,"POLYGON ((-96.62187 42.77925, -96.57794 42.827...",2108521.0,66.829611
2,Delaware,"POLYGON ((-75.77379 39.72220, -75.75323 39.757...",786600.0,80.779326
3,Ohio,"MULTIPOLYGON (((-82.86334 41.69369, -82.82572 ...",7303877.0,62.484511
4,Pennsylvania,"POLYGON ((-80.51989 40.90666, -80.51964 40.987...",10559090.0,82.480074
5,Nebraska,"POLYGON ((-104.05314 41.11446, -104.05245 41.2...",1332446.0,68.881332
6,Washington,"MULTIPOLYGON (((-123.23715 48.68347, -123.0704...",6015038.0,78.990447
7,Puerto Rico,"MULTIPOLYGON (((-65.34207 18.34529, -65.25593 ...",,
8,Alabama,"POLYGON ((-88.46866 31.89386, -88.46866 31.933...",3010724.0,61.403435
9,Arkansas,"POLYGON ((-94.61792 36.49941, -94.36120 36.499...",1967757.0,65.20493
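
Rows such as Puerto Rico show missing values because no vaccination row shares that name. One possible way to list such mismatches, sketched here on made-up rows, is an outer merge with the `indicator=True` option of `pd.merge`. The "New York State" label is an assumed example of a naming difference, not a verified property of either dataset.

```python
import pandas as pd

# Made-up rows for illustration; the name mismatch is an assumed example.
population = pd.DataFrame({
    "STATE": ["Alabama", "New York", "Puerto Rico"],
    "POPESTIMATE2019": [1000, 2000, 300],
})
vaccinated = pd.DataFrame({
    "location": ["Alabama", "New York State"],
    "people_vaccinated": [600.0, 1500.0],
})

# An outer merge keeps every name from both sides; indicator=True adds a
# `_merge` column saying which side each row came from.
check = pd.merge(population, vaccinated, left_on="STATE", right_on="location",
                 how="outer", indicator=True)

only_population = check.loc[check["_merge"] == "left_only", "STATE"].tolist()
only_vaccinated = check.loc[check["_merge"] == "right_only", "location"].tolist()
print("No vaccination row for:", only_population)
print("No population row for:", only_vaccinated)
```

Names that appear in either list could then be renamed on one side before joining, so that they are not silently dropped or left empty on the map.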


In [5]:
# Draw a choropleth of the percentage vaccinated in each state
m = folium.Map(location = [42.32,-71.0589], tiles = 'cartodbpositron', zoom_start = 4)
ch = folium.Choropleth(geo_data = final, 
                  data = final, 
                  columns = ['state', 'percent_vaccinated'], 
                  fill_color = 'YlGn', 
                  key_on = 'feature.properties.state', 
                  fill_opacity = 0.5, 
                  legend_name = 'Percent Vaccinated'
                 ).add_to(m)

folium.GeoJsonTooltip(['state', 'percent_vaccinated']).add_to(ch.geojson)


m
