# COVID-19 Data Analysis

Objective: Prepare data visualizations and a corresponding report that highlight COVID-19 cases and deaths. Separate visualizations for confirmed cases and deaths are included for both the United States and worldwide.

- The number of cases in each region (US and global) should be displayed.
- The trend of cases from March through the present should be presented.
- Deaths from COVID-19 should be represented visually.

Data Source: Microsoft Bing COVID-19 Tracker, https://www.bing.com/covid
CSV Data: https://github.com/microsoft/Bing-COVID-19-Data

In [1]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import gmaps
# Google API Key
from config import gkey
# Configure gmaps
gmaps.configure(api_key=gkey)

In [2]:
# Store the URL of the raw CSV data on GitHub, which is updated daily around 3 AM PST.
data_url = 'https://raw.githubusercontent.com/microsoft/Bing-COVID-19-Data/master/data/Bing-COVID19-Data.csv'


In [3]:
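# Use pandas to fetch and read in the CSV file.
# error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) drops malformed rows instead.
covid_df = pd.read_csv(data_url, on_bad_lines='skip')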
covid_df.head()

Unnamed: 0,ID,Updated,Confirmed,ConfirmedChange,Deaths,DeathsChange,Recovered,RecoveredChange,Latitude,Longitude,ISO2,ISO3,Country_Region,AdminRegion1,AdminRegion2
0,338995,01/21/2020,262,,0.0,,,,,,,,Worldwide,,
1,338996,01/22/2020,313,51.0,0.0,0.0,,,,,,,Worldwide,,
2,338997,01/23/2020,578,265.0,0.0,0.0,,,,,,,Worldwide,,
3,338998,01/24/2020,841,263.0,0.0,0.0,,,,,,,Worldwide,,
4,338999,01/25/2020,1320,479.0,0.0,0.0,,,,,,,Worldwide,,
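
The objective asks for a trend of cases from March onward. The following is a minimal sketch of how such a series could be pulled from a frame with the columns shown above. The helper name `cases_since`, its default arguments, and the small sample frame are illustrative assumptions, not part of the original notebook; it also assumes that region-level aggregate rows leave both `AdminRegion1` and `AdminRegion2` empty, as the Worldwide rows above do.

```python
import pandas as pd

def cases_since(df, region="Worldwide", start="2020-03-01"):
    """Return a date-indexed Series of confirmed cases for one Country_Region."""
    # Keep only the aggregate rows for the region (no state/county breakdown).
    mask = (
        (df["Country_Region"] == region)
        & df["AdminRegion1"].isna()
        & df["AdminRegion2"].isna()
    )
    subset = df.loc[mask, ["Updated", "Confirmed"]].copy()
    # 'Updated' is stored as MM/DD/YYYY text, so parse it before comparing dates.
    subset["Updated"] = pd.to_datetime(subset["Updated"], format="%m/%d/%Y")
    subset = subset[subset["Updated"] >= pd.Timestamp(start)]
    return subset.set_index("Updated")["Confirmed"].sort_index()

# Hypothetical sample rows that mimic the layout of covid_df.
sample = pd.DataFrame({
    "Updated": ["02/28/2020", "03/01/2020", "03/02/2020", "03/01/2020"],
    "Confirmed": [100, 150, 210, 5],
    "Country_Region": ["Worldwide", "Worldwide", "Worldwide", "United States"],
    "AdminRegion1": [None, None, None, "Texas"],
    "AdminRegion2": [None, None, None, None],
})

trend = cases_since(sample)
```

With the real data, `cases_since(covid_df).plot()` would draw the worldwide trend, and passing `region="United States"` would give the US equivalent.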


In [4]:
# Inspect the data. How many regions do we have to plot?
print('AdminRegion1 Counts:')
print(covid_df['AdminRegion1'].value_counts())
print('-----------------------------------')
print('AdminRegion2 Counts:')
print(covid_df['AdminRegion2'].value_counts())
# Stats show Region 1 to be at the state/province level
# Region 2 is more granular data for County/municipality

AdminRegion1 Counts:
Texas              10605
Georgia            10133
England             7138
Virginia            6973
North Carolina      5961
                   ...  
Lakshadweep            2
Occitanie              1
Western Visayas        1
Western Sahara         1
Bonaire                1
Name: AdminRegion1, Length: 819, dtype: int64
-----------------------------------
AdminRegion2 Counts:
Washington County     1694
Jefferson County      1462
Franklin County       1379
Jackson County        1285
Lincoln County        1231
                      ... 
Mahisagar                1
Gemeente Rotterdam       1
Greater Manchester       1
North Delhi              1
Cavalier County          1
Name: AdminRegion2, Length: 2361, dtype: int64
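
Since the two admin columns encode how granular a row is, it can help to label each row explicitly before aggregating. The sketch below assumes that a row with both fields empty is a country-level (or Worldwide) aggregate, a row with only `AdminRegion1` filled is a state/province row, and a row with `AdminRegion2` filled is a county/municipality row. The `add_level` helper, the `Level` column, and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def add_level(df):
    """Label each row as 'country', 'state', or 'county' from the admin columns."""
    has_r1 = df["AdminRegion1"].notna()
    has_r2 = df["AdminRegion2"].notna()
    out = df.copy()
    # County rows have AdminRegion2 filled; state rows have only AdminRegion1.
    out["Level"] = np.select([has_r2, has_r1], ["county", "state"], default="country")
    return out

# Hypothetical rows: one country total, one state total, one county total.
sample = pd.DataFrame({
    "Country_Region": ["United States", "United States", "United States"],
    "AdminRegion1": [None, "Texas", "Texas"],
    "AdminRegion2": [None, None, "Harris County"],
    "Confirmed": [1000, 300, 120],
})

levels = add_level(sample)
# Summing within a single level avoids mixing totals from different levels.
state_total = levels.loc[levels["Level"] == "state", "Confirmed"].sum()
```

If the assumption holds, summing `Confirmed` across all rows would count the same cases at several levels, so filtering on one level first is the safer choice.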


## Create Heatmaps to show Daily Cases and Daily Deaths from COVID-19

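Because the data source is updated daily, we can subset the data to the latest date available and build heat maps of cases and deaths that refresh whenever the notebook is run. Note that `Confirmed` and `Deaths` hold running totals (the daily increments are in `ConfirmedChange` and `DeathsChange`), so the maps show totals as of that date.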

In [5]:
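# keep only the columns needed for the heat maps
date_cases_deaths = covid_df[['Updated', 'Confirmed', 'Deaths', 'Latitude', 'Longitude']]
# remove rows with Null or NaN values
date_cases_deaths = date_cases_deaths.dropna()
# parse the MM/DD/YYYY strings so the max is chronological, not alphabetical
parsed_dates = pd.to_datetime(date_cases_deaths['Updated'], format='%m/%d/%Y')
# filter to the latest date; copy so columns can be added later without warnings
latest_date_cases_deaths = date_cases_deaths[parsed_dates == parsed_dates.max()].copy()
# rename a few fields
latest_date_cases_deaths = latest_date_cases_deaths.rename(columns={'Updated': 'Date', 'Confirmed': 'Cases'})
latest_date_cases_deaths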

Unnamed: 0,Date,Cases,Deaths,Latitude,Longitude
303,05/28/2020,1076,33.0,41.13929,20.06431
397,05/28/2020,8997,630.0,28.15509,2.67875
830,05/28/2020,14689,508.0,-35.18115,-65.09386
1729,05/28/2020,8676,120.0,40.29264,44.93948
1805,05/28/2020,101,3.0,12.50914,-69.97044
...,...,...,...,...,...
230960,05/28/2020,380,36.0,48.82962,-121.86710
231025,05/28/2020,16,0.0,46.90122,-117.52300
239363,05/28/2020,811,22.0,-32.96974,-56.05606
239608,05/28/2020,1327,11.0,7.15699,-66.22324
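
The `Date` values shown above are MM/DD/YYYY text. Comparing such strings character by character agrees with chronological order only while every row falls in the same year, which matters for a notebook meant to be rerun as the data grows. The sketch below illustrates the difference using the standard library only; the `latest_date` helper and the sample dates are illustrative assumptions.

```python
from datetime import datetime

def latest_date(date_strings, fmt="%m/%d/%Y"):
    """Return the chronologically latest date string from MM/DD/YYYY text."""
    return max(date_strings, key=lambda s: datetime.strptime(s, fmt))

dates = ["12/31/2020", "01/05/2021", "05/28/2020"]
text_max = max(dates)           # compares character by character
true_max = latest_date(dates)   # compares parsed dates

print(text_max)  # 12/31/2020
print(true_max)  # 01/05/2021
```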


## HeatMap of Daily Cases of COVID-19

In [31]:
# check min and max values of dataset to set granularity for heatmap
print('Min case value:', latest_date_cases_deaths['Cases'].min())
print('Max case value', latest_date_cases_deaths['Cases'].max())

Min case value: 0
Max case value 1709318


In [16]:
# Normalize cases to a 0-100 scale for a more readable heat map:
# formula = (value - min) / (max - min) * 100
cases = latest_date_cases_deaths['Cases']
latest_date_cases_deaths['Cases_norm'] = (cases - cases.min()) / (cases.max() - cases.min()) * 100
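
As a small worked example of the min-max formula, the sketch below scales a Series to the range 0 to 100 and guards against the case where every value is identical, which would otherwise divide by zero. The `min_max_scale` helper and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
import pandas as pd

def min_max_scale(values, upper=100.0):
    """Scale a numeric Series linearly so min -> 0 and max -> upper."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values equal: avoid dividing by zero and return zeros.
        return pd.Series(0.0, index=values.index)
    return (values - lo) / (hi - lo) * upper

cases = pd.Series([0, 50, 200])
scaled = min_max_scale(cases)
print(scaled.tolist())  # [0.0, 25.0, 100.0]
```

Computing the minimum and maximum once and operating on the whole Series avoids rescanning the column for every row.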


In [40]:
# set locations and value
locations = latest_date_cases_deaths[['Latitude', 'Longitude']]
value = latest_date_cases_deaths['Cases']

In [54]:
# Plot Heatmap
fig = gmaps.figure(map_type='HYBRID', center=(-3,0), zoom_level=2)
# Create heat layer
heat_layer = gmaps.heatmap_layer(locations, weights=value, 
                                 dissipating=True, max_intensity=100000,
                                 point_radius=40)
# Add layer
fig.add_layer(heat_layer)
fig

Figure(layout=FigureLayout(height='420px'))