# COVID-19 Dashboard: Figures for the Pandemic
DATE ACQUIRED: 26th April 2020

DATASET SOURCE: https://data.world/covid-19-data-resource-hub/covid-19-case-counts/workspace/file?filename=COVID-19+Cases.csv

I have imported all the required libraries at the beginning of the notebook to give you an idea of which libraries are going to be used.

This notebook contains various interactive maps and graphs built with sliders, mouse hover, toggled spike lines, drop-down menus and colour encoding. Each of them requires a quite different kind of code.
Not all of the code is written from scratch; references from various websites were used. A reference can mean the overall structure or even a single line of code.

Although the notebook starts with world data, I have focused on the US dataset since the US has the highest number of cases and the most information available. I have provided markdown notes throughout the notebook for better understanding rather than putting everything in a single paragraph.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.colors as mpl_colors
import pandas as pd
import numpy as np
import ipywidgets
import os

In [2]:
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)
import plotly.express as px
import plotly.offline as py
import plotly.graph_objs as go

In [3]:
import json
from bqplot import Lines, Figure, LinearScale, DateScale, Axis
import bqplot.pyplot as bqpl
from ipyleaflet import Map, GeoJSON, WidgetControl
from pathlib import Path
import requests 

In [4]:
#dfd = pd.read_csv("/Users/shazmeenshaikh/Downloads/COVID-19 Cases.csv",
               #parse_dates=['Date'])

In [5]:
dfs = pd.read_csv("https://query.data.world/s/tr6q6z7xmck47oiftpnfmyjx6g4qck",
                parse_dates=['Date'])


Columns (8) have mixed types. Specify dtype option on import or set low_memory=False.
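
The warning above appears because pandas infers column types chunk by chunk on large files. Below is a minimal sketch of the two remedies the warning suggests, run on a small in-memory CSV. The column names mirror the dataset, but treating `Admin2` and `FIPS` as the mixed-type columns is an assumption (the warning only reports a column position), and the values are illustrative.

```python
import io
import pandas as pd

# Toy CSV standing in for the real file; values are illustrative only.
csv_text = (
    "Case_Type,Cases,Date,Admin2,FIPS\n"
    "Confirmed,0,2020-02-03,,\n"
    "Deaths,5,2020-04-12,Traill,38097\n"
)

# Option 1: declare the ambiguous columns explicitly as text.
explicit = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],
    dtype={"Admin2": str, "FIPS": str},
)

# Option 2: let pandas read the whole file before inferring types.
inferred = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], low_memory=False)

print(explicit.dtypes)
```

Declaring the types explicitly keeps codes such as FIPS as text, whereas type inference turns them into floats as soon as a value is missing.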



In [6]:
dfs

Unnamed: 0,Case_Type,People_Total_Tested_Count,Cases,Difference,Date,Combined_Key,Country_Region,Province_State,Admin2,iso2,iso3,FIPS,Lat,Long,Population_Count,People_Hospitalized_Cumulative_Count,Data_Source,Prep_Flow_Runtime
0,Confirmed,,0,0,2020-02-03,Switzerland,Switzerland,,,CH,CHE,,46.818200,8.227500,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
1,Confirmed,,23,0,2020-04-21,Antigua and Barbuda,Antigua and Barbuda,,,AG,ATG,,17.060800,-61.796400,97928.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
2,Deaths,,0,0,2020-03-01,Cyprus,Cyprus,,,CY,CYP,,35.126400,33.429900,1207361.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
3,Deaths,,0,0,2020-02-11,Jamaica,Jamaica,,,JM,JAM,,18.109600,-77.297500,2961161.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
4,Confirmed,,0,0,2020-02-06,Belize,Belize,,,BZ,BLZ,,17.189900,-88.497600,397621.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
675643,Confirmed,,0,0,2020-01-22,"Traill, North Dakota, US",US,North Dakota,Traill,US,USA,38097.0,47.453678,-97.163233,8036.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
675644,Deaths,,0,0,2020-01-22,Angola,Angola,,,AO,AGO,,-11.202700,17.873900,32866268.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
675645,Deaths,,0,0,2020-01-22,"Loup, Nebraska, US",US,Nebraska,Loup,US,USA,31115.0,41.913720,-99.454404,664.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM
675646,Deaths,,0,0,2020-01-22,"Mercer, North Dakota, US",US,North Dakota,Mercer,US,USA,38057.0,47.312131,-101.831840,8187.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM


The following sample data was used because plotly needs three-letter ISO 3166-1 alpha-3 country codes to plot data on the bubble map. The relevant columns (`country` and `iso_alpha`) were extracted from the sample and merged with the original dataset by matching `country` to `Country_Region`. I did this because I couldn't find a three-letter ISO code dictionary on the internet and had to use the closest thing to it.

In [7]:
sample = px.data.gapminder().query("year==2007")
sample.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
11,Afghanistan,Asia,2007,43.828,31889923,974.580338,AFG,4
23,Albania,Europe,2007,76.423,3600523,5937.029526,ALB,8
35,Algeria,Africa,2007,72.301,33333216,6223.367465,DZA,12
47,Angola,Africa,2007,42.731,12420476,4797.231267,AGO,24
59,Argentina,Americas,2007,75.32,40301927,12779.37964,ARG,32


In [8]:
# Replace "United States" with "US" so that the country names match in both datasets.
sample.replace(to_replace="United States", value="US", inplace=True)

In [9]:
# Extract the two required columns from the sample dataset
temp = sample[['country', 'iso_alpha']]
temp

Unnamed: 0,country,iso_alpha
11,Afghanistan,AFG
23,Albania,ALB
35,Algeria,DZA
47,Angola,AGO
59,Argentina,ARG
...,...,...
1655,Vietnam,VNM
1667,West Bank and Gaza,PSE
1679,"Yemen, Rep.",YEM
1691,Zambia,ZMB


In [10]:
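# There was no row for Russia, so one is created here.
# Note: the concatenated result is only displayed, not assigned, so `temp` itself
# is unchanged and Russia is not part of the merge below.
df2 = pd.DataFrame([['Russia','RUS']], columns=['country','iso_alpha'])
pd.concat([df2,temp])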

Unnamed: 0,country,iso_alpha
0,Russia,RUS
11,Afghanistan,AFG
23,Albania,ALB
35,Algeria,DZA
47,Angola,AGO
...,...,...
1655,Vietnam,VNM
1667,West Bank and Gaza,PSE
1679,"Yemen, Rep.",YEM
1691,Zambia,ZMB


In [11]:
# Merge both datasets (inner join on the country name)
df = pd.merge(dfs, temp, left_on='Country_Region', right_on='country')
df

Unnamed: 0,Case_Type,People_Total_Tested_Count,Cases,Difference,Date,Combined_Key,Country_Region,Province_State,Admin2,iso2,iso3,FIPS,Lat,Long,Population_Count,People_Hospitalized_Cumulative_Count,Data_Source,Prep_Flow_Runtime,country,iso_alpha
0,Confirmed,,0,0,2020-02-03,Switzerland,Switzerland,,,CH,CHE,,46.818200,8.227500,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
1,Deaths,,1106,70,2020-04-12,Switzerland,Switzerland,,,CH,CHE,,46.818200,8.227500,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
2,Confirmed,,18,10,2020-02-29,Switzerland,Switzerland,,,CH,CHE,,46.818200,8.227500,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
3,Confirmed,,214,100,2020-03-06,Switzerland,Switzerland,,,CH,CHE,,46.818200,8.227500,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
4,Deaths,,0,0,2020-02-16,Switzerland,Switzerland,,,CH,CHE,,46.818200,8.227500,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
664699,Deaths,,0,0,2020-01-22,"Clermont, Ohio, US",US,Ohio,Clermont,US,USA,39025.0,39.048475,-84.153758,206428.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
664700,Confirmed,,0,0,2020-01-22,"Traill, North Dakota, US",US,North Dakota,Traill,US,USA,38097.0,47.453678,-97.163233,8036.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
664701,Deaths,,0,0,2020-01-22,"Loup, Nebraska, US",US,Nebraska,Loup,US,USA,31115.0,41.913720,-99.454404,664.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
664702,Deaths,,0,0,2020-01-22,"Mercer, North Dakota, US",US,North Dakota,Mercer,US,USA,38057.0,47.312131,-101.831840,8187.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
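
`pd.merge` performs an inner join by default, so rows whose `Country_Region` has no match in `temp` are dropped; the outputs above go from 675,647 rows to 664,703. A minimal sketch for listing the unmatched countries follows. The frames `dfs_toy` and `temp_toy` are hypothetical stand-ins with made-up values.

```python
import pandas as pd

dfs_toy = pd.DataFrame({
    "Country_Region": ["Switzerland", "US", "Russia", "Russia"],
    "Cases": [18, 5, 2, 3],
})
temp_toy = pd.DataFrame({
    "country": ["Switzerland", "US"],
    "iso_alpha": ["CHE", "USA"],
})

# A left join with indicator=True keeps every case row and records
# whether a matching country was found in the lookup table.
checked = pd.merge(
    dfs_toy, temp_toy,
    how="left", left_on="Country_Region", right_on="country",
    indicator=True,
)

unmatched = sorted(
    checked.loc[checked["_merge"] == "left_only", "Country_Region"].unique()
)
print(unmatched)
```

Running a check like this before the inner join shows which country names would need to be renamed or added to the lookup table.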


In [12]:
#pip install plotly==4.6.0

We will aggregate the cases by date and display them in an interactive chart where we can view the cases for a particular time window.

In [13]:
#Case count over time for the world using df.melt function
#reference: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
Case_Count= df.groupby('Date')['Cases'].sum().reset_index()
Case_Count = Case_Count.melt(id_vars="Date", value_vars=['Cases'],
                 var_name='Case', value_name='Count')
Case_Count.head()

Unnamed: 0,Date,Case,Count
0,2020-01-22,Cases,570
1,2020-01-23,Cases,670
2,2020-01-24,Cases,962
3,2020-01-25,Cases,1471
4,2020-01-26,Cases,2167


# Figure 1 - Cases over time with a slider for selecting the time window

In [14]:
# using plotly express
fig = px.area(Case_Count, x="Date", y="Count", color='Case', height=800,
             title='Cases over time', color_discrete_sequence = ['Crimson'], orientation = 'v')
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

Trial and error using confirmed cases.

In [15]:
Confirmed_Cases = df.loc[df['Case_Type'] == "Confirmed"]
Confirmed_Cases.head()

Unnamed: 0,Case_Type,People_Total_Tested_Count,Cases,Difference,Date,Combined_Key,Country_Region,Province_State,Admin2,iso2,iso3,FIPS,Lat,Long,Population_Count,People_Hospitalized_Cumulative_Count,Data_Source,Prep_Flow_Runtime,country,iso_alpha
0,Confirmed,,0,0,2020-02-03,Switzerland,Switzerland,,,CH,CHE,,46.8182,8.2275,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
2,Confirmed,,18,10,2020-02-29,Switzerland,Switzerland,,,CH,CHE,,46.8182,8.2275,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
3,Confirmed,,214,100,2020-03-06,Switzerland,Switzerland,,,CH,CHE,,46.8182,8.2275,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
13,Confirmed,,8,7,2020-02-27,Switzerland,Switzerland,,,CH,CHE,,46.8182,8.2275,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE
15,Confirmed,,19606,779,2020-04-03,Switzerland,Switzerland,,,CH,CHE,,46.8182,8.2275,8654618.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,Switzerland,CHE


In our dataset the case numbers are cumulative rather than daily values, so we need to display the most recent count to keep the visualization up to date. Here, we select the most recent date and only the confirmed cases to move forward with our study. We then group by country and sum the cases, which gives an integer count per country.

In [16]:
recent_date = df['Date'].max()
recent_cases = df.loc[(df['Date'] == recent_date) & (df['Case_Type']=="Confirmed")].groupby("Country_Region")["Cases"].sum()
recent_cases.head()

Country_Region
Afghanistan    1531
Albania         726
Algeria        3382
Angola           26
Argentina      3892
Name: Cases, dtype: int64
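
Because `Cases` is cumulative, the latest date already holds the running total, and summing over all dates would count cases more than once. A small sketch on toy data illustrates this; the frame and the `Daily` column are hypothetical, and the daily increments are recomputed here with `diff` rather than read from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country_Region": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-04-24", "2020-04-25", "2020-04-26"] * 2),
    "Cases": [10, 15, 21, 3, 3, 8],  # cumulative counts
})

# Latest snapshot: the running total per country on the most recent date.
latest = toy.loc[toy["Date"] == toy["Date"].max()].set_index("Country_Region")["Cases"]

# Daily increments recovered from the cumulative series, per country.
# The first day has no predecessor, so its increment is the count itself.
toy = toy.sort_values(["Country_Region", "Date"])
toy["Daily"] = toy.groupby("Country_Region")["Cases"].diff().fillna(toy["Cases"])

print(latest.to_dict())
```

The daily increments add up to the latest cumulative value for each country, which is why only the most recent date is needed for the maps.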

In [17]:
# Create a DataFrame version of the Series above
Country_Wise_Sort = df.loc[(df['Date'] == recent_date) & (df['Case_Type']=="Confirmed")].groupby("Country_Region", as_index = False)["Cases"].sum()
Country_Wise_Sort.head()

Unnamed: 0,Country_Region,Cases
0,Afghanistan,1531
1,Albania,726
2,Algeria,3382
3,Angola,26
4,Argentina,3892


In [18]:
# Sort into a table showing the countries with the most cases
Most_Cases = Country_Wise_Sort.sort_values('Cases', ascending= False)
Most_Cases

Unnamed: 0,Country_Region,Cases
118,US,965633
106,Spain,226629
59,Italy,197675
40,France,162220
43,Germany,157770
...,...,...
83,Nicaragua,13
17,Burundi,11
42,Gambia,10
72,Mauritania,7


In [19]:
# View the countries with the most cases, together with their ISO codes
iso_data = df.loc[(df['Date'] == recent_date) & (df['Case_Type']=="Confirmed")].groupby(["Country_Region","iso_alpha"],as_index=False)["Cases"].sum()
iso_data.sort_values('Cases',ascending = False)

Unnamed: 0,Country_Region,iso_alpha,Cases
118,US,USA,965633
106,Spain,ESP,226629
59,Italy,ITA,197675
40,France,FRA,162220
43,Germany,DEU,157770
...,...,...,...
83,Nicaragua,NIC,13
17,Burundi,BDI,11
42,Gambia,GMB,10
72,Mauritania,MRT,7


# Figure 2 - Bubble map of the world with hover and point feature (zoom and pan using plotly)

In [20]:
fig1 = px.scatter_geo(iso_data, locations="iso_alpha", color="Country_Region",
                     hover_name="Cases", size="Cases",
                      projection="natural earth")
fig1.show()

# Figure 3 - Choropleth map for the world with colorscale (hover and point using Plotly graph objects)

In [21]:
fig2 = go.Figure(data=go.Choropleth(
    locations = iso_data['iso_alpha'],
    z = iso_data['Cases'],
    text = iso_data['Country_Region'],
    colorscale = 'RdPu',
    autocolorscale=False,
    marker_line_color='darkgray',
    marker_line_width=0.8,
    colorbar_title = 'Cases',
))

fig2.update_layout(
    title_text='Cases Around the World',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    annotations = [dict(
        x=0.55,
        y=0.1,
        xref='paper',
        yref='paper',
        text='Source: https://plotly.com/python/choropleth-maps/#world-choropleth-map',
        showarrow = False
    )]
)

# fig2.show()
f2 = go.FigureWidget(fig2)
f2

FigureWidget({
    'data': [{'autocolorscale': False,
              'colorbar': {'title': {'text': 'Cases'}},
…

In [22]:
# Tried to create a callback function, but it did not work.
# def update_point(trace, points, selector):
#     print("Here")

# f2.on_click(update_point)

# US specific Data

In [23]:
us = df.loc[df['Country_Region'] == "US"]
us

Unnamed: 0,Case_Type,People_Total_Tested_Count,Cases,Difference,Date,Combined_Key,Country_Region,Province_State,Admin2,iso2,iso3,FIPS,Lat,Long,Population_Count,People_Hospitalized_Cumulative_Count,Data_Source,Prep_Flow_Runtime,country,iso_alpha
38784,Deaths,,0,0,2020-02-02,"Floyd, Indiana, US",US,Indiana,Floyd,US,USA,18043.0,38.321180,-85.903854,78522.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
38785,Confirmed,,0,0,2020-03-09,"Grayson, Kentucky, US",US,Kentucky,Grayson,US,USA,21085.0,37.462311,-86.342490,26427.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
38786,Deaths,,0,0,2020-03-25,"Mackinac, Michigan, US",US,Michigan,Mackinac,US,USA,26097.0,46.070290,-85.049805,10799.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
38787,Deaths,,0,0,2020-01-23,"Hancock, Georgia, US",US,Georgia,Hancock,US,USA,13141.0,33.272157,-82.997669,8457.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
38788,Deaths,,0,0,2020-03-02,"Lewis, Idaho, US",US,Idaho,Lewis,US,USA,16061.0,46.233153,-116.434146,3838.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
664699,Deaths,,0,0,2020-01-22,"Clermont, Ohio, US",US,Ohio,Clermont,US,USA,39025.0,39.048475,-84.153758,206428.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
664700,Confirmed,,0,0,2020-01-22,"Traill, North Dakota, US",US,North Dakota,Traill,US,USA,38097.0,47.453678,-97.163233,8036.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
664701,Deaths,,0,0,2020-01-22,"Loup, Nebraska, US",US,Nebraska,Loup,US,USA,31115.0,41.913720,-99.454404,664.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA
664702,Deaths,,0,0,2020-01-22,"Mercer, North Dakota, US",US,North Dakota,Mercer,US,USA,38057.0,47.312131,-101.831840,8187.0,,2019 Novel Coronavirus COVID-19 (2019-nCoV) Da...,4/26/2020 11:21:38 PM,US,USA


In [24]:
us_cases = us.groupby("Date")["Cases"].sum()
us_cases

Date
2020-01-22          1
2020-01-23          1
2020-01-24          2
2020-01-25          2
2020-01-26          5
               ...   
2020-04-22     882410
2020-04-23     918739
2020-04-24     956696
2020-04-25     991754
2020-04-26    1020511
Name: Cases, Length: 96, dtype: int64

# Figure 4 - Dropdown line graph of US cases over time

In [25]:
us_temp = us.groupby("Date",as_index=False)["Cases"].sum()
def plot_us_cases(log="Linear"):
    if log=="Linear":
        fig_loglin = px.line(us_temp, x="Date", y="Cases", title='Cases over time')
    else:
        fig_loglin = px.line(us_temp, x="Date", y="Cases", title='Cases over time',log_y = True)
    fig_loglin.show()
ipywidgets.interact(plot_us_cases, log=['Linear','Log']);

interactive(children=(Dropdown(description='log', options=('Linear', 'Log'), value='Linear'), Output()), _dom_…

In [26]:
total_count = recent_cases.sum(skipna = True)
total_count

2803429

In [27]:
# State abbreviations are needed because the map only works with state codes.
state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

In [28]:
# DF with recent confirmed numbers of each state
us_data = df.loc[(df['Country_Region'] == "US") & (df['Case_Type'] == "Confirmed") & (df['Date'] == recent_date)]
#us_data.head()

In [29]:
#Get df with province names changed to ABBR, for map.
# us_abbrev_data = us_data.replace({"Province_State":state_abbrev}).groupby("Province_State", as_index=False)["Cases"].sum()
# us_abbrev_data

us_data['Province_Code'] = us_data['Province_State'].map(state_abbrev)
us_abbrev_data = us_data.groupby(["Province_State","Province_Code"], as_index=False)["Cases"].sum()
us_abbrev_data



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,Province_State,Province_Code,Cases
0,Alabama,AL,6421
1,Alaska,AK,340
2,American Samoa,AS,0
3,Arizona,AZ,6534
4,Arkansas,AR,3001
5,California,CA,43558
6,Colorado,CO,13441
7,Connecticut,CT,25269
8,Delaware,DE,4034
9,District of Columbia,DC,3841


In [30]:
for col in us_abbrev_data.columns:
    us_abbrev_data[col] = us_abbrev_data[col].astype(str)

us_abbrev_data['text'] = us_abbrev_data["Province_State"]

The main visualizations to look at are Figure 1, which shows the increase in cases over time; Figure 2, a world bubble map in which the size of each bubble shows the intensity of cases; and Figure 6, which is specific to the US. These interactive visualizations can be linked in the next part of my assignment, where we can create a dashboard using plotly or voila. The visualizations are easy to understand and show information when you hover your mouse over a particular area. Most of the details are mentioned in the comments above. The choropleth maps summarize the cases in aggregate through shaded regions. The change in colour can be read off the colorscale, which shows the number of confirmed cases. The specific count can be seen by pointing the mouse at the state you want information about.

### Other similar datasets which can be used in future are:
1. https://github.com/beoutbreakprepared/nCoV2019/tree/master/dataset_archive -- The nCoV2019 dataset is another well-known dataset that records sex, travel history, admitting hospital, admission date and so on for each case. It is updated regularly, which makes it useful for detailed information about cases.
2. https://vizhub.healthdata.org/gbd-compare/ -- Shows previous medical history and symptoms for the confirmed cases around the world.
3. https://github.com/echen102/COVID-19-TweetIDs -- The repository contains an ongoing collection of tweet IDs associated with the novel coronavirus COVID-19, with the time and specific keywords.

In [31]:
os.getcwd()

'/Users/shazmeenshaikh'

In [32]:
url_us = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
data = pd.read_csv(url_us,parse_dates = ['date'])

In [33]:
sample_data = data.loc[data['state'] == 'Washington']
sample_data

Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
4,2020-01-24,Washington,53,1,0
7,2020-01-25,Washington,53,1,0
...,...,...,...,...,...
2815,2020-04-22,Washington,53,12539,696
2870,2020-04-23,Washington,53,12906,717
2925,2020-04-24,Washington,53,13120,731
2980,2020-04-25,Washington,53,13484,743


In [34]:
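if not os.path.exists('us_states.geo.json'):
    url = 'https://raw.githubusercontent.com/PublicaMundi/MappingAPI/master/data/geojson/us-states.json'
    r = requests.get(url)
    r.raise_for_status()  # stop here if the download failed
    # Cache the GeoJSON locally so it is only downloaded once
    with open('us_states.geo.json', 'w', encoding='utf-8') as f:
        f.write(r.content.decode("utf-8"))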

In [35]:
with open('us_states.geo.json') as f:
    states = json.load(f)

# Figure 5 - GeoJSON map with a line graph over time for every state

In [36]:
m = Map(center= (39.8283, -98.5795),zoom=4)

geo = GeoJSON(data=states, style={'fillColor': 'red', 'weight': 0.1}, hover_style={'fillColor': '#1f77b4'}, name='States')
m.add_layer(geo)

m

Map(center=[39.8283, -98.5795], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'z…

In [37]:
state_current = "New York"
current_data = data[data['state'] == state_current]
x_data = current_data['date']
y_data = current_data['cases']/1000
y_data.map('{:,.2f}k'.format)

246       0.00k
261       0.00k
276       0.00k
293       0.01k
313       0.02k
338       0.04k
369       0.09k
404       0.11k
440       0.14k
478       0.17k
521       0.22k
568       0.33k
618       0.42k
669       0.61k
722       0.73k
775       0.95k
828       1.37k
882       2.38k
936       4.15k
990       7.10k
1044     10.36k
1098     15.17k
1152     20.88k
1206     25.66k
1260     33.07k
1314     38.99k
1368     44.63k
1422     53.36k
1477     59.57k
1532     67.17k
1587     75.83k
1642     83.89k
1697     92.38k
1752    102.87k
1807    115.00k
1862    122.91k
1917    130.69k
1972    140.08k
2027    149.40k
2082    159.94k
2137    170.51k
2192    180.46k
2247    188.69k
2302    195.03k
2357    202.21k
2412    213.78k
2467    222.28k
2522    229.64k
2577    236.76k
2632    242.82k
2687    247.54k
2742    251.72k
2797    257.25k
2852    263.46k
2907    271.62k
2962    282.17k
3017    288.08k
Name: cases, dtype: object

In [38]:
date_start = data['date'].min()
date_end = data['date'].max()
date_scale = DateScale(min=date_start, max=date_end)

# Linear scale for the y-axis (cases in thousands)
y_scale = LinearScale()

lines = Lines(x=x_data, y=y_data, scales={'x': date_scale, 'y': y_scale}, colors=["Red"], stroke_width=1)

ax_x = Axis(label='Time', scale=date_scale, num_ticks=10, tick_format='%d-%m', grid_lines='none', label_offset="40px", tick_rotate=40)
ax_y = Axis(label="Cases (in thousands)", scale=y_scale, orientation='vertical', side='left', grid_lines="none", label_offset="40px")

figure = Figure(axes=[ax_x, ax_y], title=state_current, marks=[lines], animation_duration=500,
                layout={'max_height': '300px', 'max_width': '400px'})
figure

Figure(animation_duration=500, axes=[Axis(grid_lines='none', label='Time', label_offset='40px', num_ticks=10, …

In [39]:
def update_figure(state_name):
    """
    Redraw the line chart for the given state.

    Problem: the line starts at the first reported case, not before.
    """
    temp_data = data[data['state'] == state_name]
    # Plot cases in thousands to match the y-axis label
    lines.x = temp_data['date']
    lines.y = temp_data['cases'] / 1000
    figure.title = state_name

In [40]:
update_figure("Washington")
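
The docstring of `update_figure` notes that each state's line starts at its first reported case. One possible way to extend the line back to the start of the shared date axis is to reindex the series onto the full date range and fill the missing days with zero. This is a sketch on toy data, and `pad_to_range` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def pad_to_range(state_df, start, end):
    """Return cases indexed by every day in [start, end].

    Days without a row in state_df are filled with 0, which is only
    appropriate for days before the first report.
    """
    full_range = pd.date_range(start, end, freq="D")
    series = state_df.set_index("date")["cases"]
    return series.reindex(full_range, fill_value=0)

toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-02", "2020-03-03"]),
    "cases": [1, 4],
})

padded = pad_to_range(toy, "2020-03-01", "2020-03-03")
print(padded.tolist())  # → [0, 1, 4]
```

In the notebook, `start` and `end` could be `date_start` and `date_end`, with the padded index and values assigned to `lines.x` and `lines.y`.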

In [41]:
widget_control = WidgetControl(widget=figure, position='bottomright')

m.add_control(widget_control)

def on_hover(event, feature, **kwargs):
    # The GeoJSON feature carries the state name in its properties
    state_name = feature['properties']['name']
    update_figure(state_name)

geo.on_hover(on_hover)
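
The hover callback works because the `name` property of each GeoJSON feature matches the `state` column of the case data. The sketch below isolates that lookup with a hand-written feature dictionary and toy numbers; `cases_for_feature` and `data_toy` are hypothetical names.

```python
import pandas as pd

data_toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-25", "2020-04-25", "2020-04-26"]),
    "state": ["Washington", "New York", "Washington"],
    "cases": [100, 200, 150],  # toy values
})

def cases_for_feature(feature, data):
    """Look up the rows for the state named in a GeoJSON feature."""
    state_name = feature["properties"]["name"]
    return state_name, data[data["state"] == state_name]

# Minimal stand-in for the feature dict passed to the hover callback.
feature = {"type": "Feature", "properties": {"name": "Washington"}}

name, rows = cases_for_feature(feature, data_toy)
print(name, len(rows))
```

If a feature name has no counterpart in the `state` column, the filter returns an empty frame and the chart would be drawn without any points.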

# Figure 6 - Choropleth map for the USA with colorscale (Hover and point using Plotly graph object)

In [42]:
fig = go.Figure(data=go.Choropleth(
    locations=us_abbrev_data['Province_Code'], # Spatial coordinates
    z = us_abbrev_data['Cases'].astype(int), # Data to be color-coded
    locationmode = 'USA-states', # interpret `locations` as US state codes
    colorscale = 'sunsetdark',
    text=us_abbrev_data['text'],
    colorbar_title = "Confirmed Cases",
))

fig.update_layout(
    title_text = 'COVID-19 Cases in USA',
    geo_scope='usa', # limit map scope to USA
)

fig.show()

# Description: 
__Coronavirus disease 2019 (COVID-19)__ is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in December 2019 in Wuhan, the capital of China's Hubei province, and has since spread globally, resulting in the ongoing 2019–20 coronavirus pandemic. (1) This dataset was shared publicly and is available from many different websites. All the data collected and displayed is made freely available, initially as Google Sheets and now in a GitHub repository. I've created a similar dashboard using the available dataset. It illustrates the location and number of confirmed COVID-19 cases for all affected countries.

Initially, you can see a filled area graph of cases over time across the world as __Figure 1__. You can select a particular window to look more closely at the graph. There is a toggle option, and the slider on the date axis lets you select a particular time frame.
The world cases are then shown on a world map. __Figure 2__ is a bubble map with a hover and point feature for all the countries on the world map, and it supports zoom and pan. You can click on country names in the legend on the right to control which countries' bubbles are shown. The larger the bubble, the higher the number of cases. __Figure 3__ is a choropleth map, a geographical map that shows the variation in the number of cases using different shades from the colour scale. It shows that the USA has the highest number of cases, in the purple shade. Other countries such as Spain, Italy, France and Germany also show large numbers of cases and are therefore coloured in pink.

As of 27 April 2020, more than 3 million cases have been reported across 185 countries and territories, resulting in more than 206,000 deaths. More than 865,000 people have recovered. Of these confirmed cases, the USA is about to reach 1 million. Thus, taking a keen interest in the details, I created visualizations specifically for the USA. __Figure 4__ shows the total number of cases in the USA over time on either a linear or a log scale, which can be chosen from the dropdown menu. __Figure 5__ is a highly interactive map and one of the main visualizations: hovering over a state shows a line graph of its increase in cases over time, with the date on the X-axis and cases on the Y-axis. For those who have a visual learning style, I have created a choropleth map of the US states as __Figure 6__, which has much the same features as Figure 3.

References: 

1. https://en.wikipedia.org/wiki/Coronavirus_disease_2019
2. https://data.world/covid-19-data-resource-hub/covid-19-case-counts/workspace/file?filename=COVID-19+Cases.csv (dataset)
3. https://github.com/CSSEGISandData/COVID-19
4. https://github.com/nytimes/covid-19-data (dataset)
5. https://uiuc-ischool-dataviz.github.io/spring2020/ (class notebooks really helped with basics)
6. https://ipyleaflet.readthedocs.io/en/latest/api_reference/map.html