# COVID-19 - DATA ANALYSIS

## Data Analysis, Data Visualization & Comparison

This notebook analyzes and visualizes COVID-19 (coronavirus) cases around the world.

## About COVID-19
![CoronaVirus](https://cdn.pixabay.com/photo/2020/04/23/09/59/coronavirus-5081887_1280.jpg)
*Image by <a href="https://pixabay.com/users/iXimus-2352783/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=5081887">iXimus</a> from <a href="https://pixabay.com/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=5081887">Pixabay</a>*

[Coronavirus disease 2019 (COVID-19)](https://en.wikipedia.org/wiki/Coronavirus_disease_2019) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

* **First Identified:** December 2019 in Wuhan, the capital of Hubei province, China
* **Common Symptoms:** Fever, Cough, Fatigue, Shortness of Breath and Loss of Smell
* **Concerning Symptoms:** Difficulty Breathing, Persistent Chest Pain, Confusion, Difficulty Waking, and Bluish Skin
* **Complications:** Pneumonia, Viral Sepsis, Acute Respiratory Distress Syndrome, Kidney Failure
* **Usual Onset:** 2–14 days (typically 5) from infection, i.e. the time from exposure to onset of symptoms
* **Risk Factors:** Travel, Viral Exposure
* **Prevention:** Hand Washing, Face Coverings, Quarantine, Social Distancing

### Useful Information on COVID-19
* [WHO](https://www.who.int/emergencies/diseases/novel-coronavirus-2019) - World Health Organization 
* [CDC](https://www.cdc.gov/coronavirus/2019-ncov) - Centers for Disease Control and Prevention

# Dataset

1. Git repository of the **Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)**.

    * Master branch: https://github.com/CSSEGISandData/COVID-19
    * Web-data branch: https://github.com/CSSEGISandData/COVID-19/tree/web-data


2. Kaggle dataset: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

# Install Packages

**pycountry-convert**: Using country data derived from Wikipedia, this package provides functions for converting between ISO country names, country codes, and continent names. (https://pypi.org/project/pycountry-convert/)

In [None]:
!pip install pycountry_convert

# Import Packages

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
#%matplotlib inline
import seaborn as sns
from datetime import datetime
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import pycountry_convert as pc
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import folium
import json

# Get Data from Dataset

1. Overall Cases
1. Cases by Country
1. Cases by State
1. Cases timeline

1. Confirmed Cases Timeline Global
1. Confirmed Cases Timeline US
1. Deaths Cases Timeline Global
1. Deaths Cases Timeline US
1. Recovered Cases Timeline Global

In [None]:
df_cases = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases.csv")
df_cases_country = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv")
df_cases_state = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_state.csv")
df_cases_time = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_time.csv", parse_dates = ['Last_Update','Report_Date_String'])

df_confirmed_global = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
df_confirmed_us = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
df_deaths_global = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
df_deaths_us = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv')
df_recovered_global = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

In [None]:
print (df_cases.shape)
print ('Last Update: ' + str(df_cases.Last_Update.max()))
df_cases.head(1)

In [None]:
print (df_cases_country.shape)
print ('Last Update: ' + str(df_cases_country.Last_Update.max()))
df_cases_country.head(1)

In [None]:
print (df_cases_state.shape)
df_cases_state.head(1)

In [None]:
print (df_cases_time.shape)
df_cases_time.head(1)

In [None]:
print (df_confirmed_global.shape)
df_confirmed_global.head(1)

In [None]:
print (df_confirmed_us.shape)
df_confirmed_us.head(1)

In [None]:
print (df_deaths_global.shape)
df_deaths_global.head(1)

In [None]:
print (df_deaths_us.shape)
df_deaths_us.head(1)

In [None]:
print (df_recovered_global.shape)
df_recovered_global.head(1)

In [None]:
df_data = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv', parse_dates = ['ObservationDate','Last Update'])
print (df_data.shape)
print ('Last update: ' + str(df_data.ObservationDate.max()))
df_data.head(2)

In [None]:
# Clean data
df_data = df_data.drop(['SNo', 'Last Update'], axis=1)
df_data = df_data.rename(columns={
    'ObservationDate': 'Date', 
    'Country/Region': 'Country_Region', 
    'Province/State': 'Province_State'
})
df_data.head(2)

In [None]:
# Sort data
df_data = df_data.sort_values(['Date','Country_Region','Province_State'])
# Get first reported case date
df_data['first_date'] = df_data.groupby('Country_Region')['Date'].transform('min')
# Get days since first reported case date
df_data['days'] = (df_data['Date'] - df_data['first_date']).dt.days
print(df_data.shape)
df_data.head(2)
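
The `first_date` and `days` columns above measure how many days have passed since each country's first reported case. As a rough, dependency-free sketch of the same idea (the records below are hypothetical and not taken from the dataset):

```python
from datetime import date

# Hypothetical (country, report date) pairs, for illustration only
records = [
    ("A", date(2020, 1, 22)),
    ("A", date(2020, 1, 25)),
    ("B", date(2020, 2, 1)),
    ("B", date(2020, 2, 4)),
]

# Earliest report date per country (the role of groupby(...).transform('min'))
earliest = {}
for country, day in records:
    if country not in earliest or day < earliest[country]:
        earliest[country] = day

# Days elapsed since that earliest date, one value per record
days_since_first = [(country, (day - earliest[country]).days) for country, day in records]
print(days_since_first)
```

Aligning countries on days since first report, rather than on calendar date, makes outbreaks that started at different times easier to compare.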

# Total Cases Global

In [None]:
total_confirmed = np.sum(df_cases_country['Confirmed'])
total_deaths = np.sum(df_cases_country['Deaths'])
total_recovered = np.sum(df_cases_country['Recovered'])
total_active = np.sum(df_cases_country['Active'])
total_mortality_rate = np.round((np.sum(df_cases_country['Deaths']) / np.sum(df_cases_country['Confirmed']) * 100), 2)
total_recover_rate = np.round((np.sum(df_cases_country['Recovered']) / np.sum(df_cases_country['Confirmed']) * 100), 2)

print ("Confirmed: %s" %format(total_confirmed, ","))
print ("Deaths: %s" %format(total_deaths, ","))
print ("Recovered: %s" %format(total_recovered, ","))
print ("Active: %s" %format(total_active, ","))
print ("Death rate %%: %.2f" %((total_deaths / total_confirmed) * 100))
print ("Recover rate %%: %.2f" %((total_recovered / total_confirmed) * 100))

data = {
    'Confirmed': [total_confirmed],
    'Deaths': [total_deaths],
    'Recovered': [total_recovered],
    'Active': [total_active],
    'Mortality Rate %': [total_mortality_rate],
    'Recover Rate %': [total_recover_rate]
}
df_total = pd.DataFrame(data)
# colormaps: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
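# Styler.hide_index() was removed in pandas 2.0; Styler.hide(axis="index") needs pandas >= 1.4
df_total.style.hide(axis="index").background_gradient(cmap='Wistia', axis=1)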
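
The global rates in this cell are pooled: total deaths (or recoveries) divided by total confirmed cases. That is not the same as averaging each country's own rate. A minimal sketch of the difference, using made-up numbers rather than values from the dataset:

```python
# Hypothetical (confirmed, deaths) pairs per country; illustrative only
countries = [(1000, 10), (10, 5)]

pooled_confirmed = sum(c for c, _ in countries)
pooled_deaths = sum(d for _, d in countries)

# Pooled rate: what the cell above computes from the column sums
pooled_rate = pooled_deaths / pooled_confirmed * 100

# Mean of per-country rates: weights a 10-case country like a 1000-case one
mean_rate = sum(d / c * 100 for c, d in countries) / len(countries)

print(round(pooled_rate, 2), round(mean_rate, 2))
```

The pooled figure is dominated by countries with many cases, which is usually what a worldwide total is meant to express.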

# Cases by Country

In [None]:
df_total_counts = df_cases_country.sort_values(by=['Confirmed'],ascending=[False])
df_total_counts['Death Rate'] = df_total_counts['Deaths'] / df_total_counts['Confirmed'] * 100
df_total_counts['Recovery Rate'] = df_total_counts['Recovered'] / df_total_counts['Confirmed'] * 100

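# assign the result back instead of calling fillna(inplace=True) on a column selection
df_total_counts['Incident_Rate'] = df_total_counts['Incident_Rate'].fillna(0)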

# remove unnecessary columns
# add different gradient color to each column
# https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
df_total_counts.drop(['Last_Update', 'Lat', 'Long_', 'People_Tested', 'People_Hospitalized', 'UID', 'ISO3', 'Mortality_Rate'], axis=1)\
.style.hide(axis="index")\
.background_gradient(cmap='Blues',subset=["Confirmed"])\
.background_gradient(cmap='Reds',subset=["Deaths"])\
.background_gradient(cmap='Greens',subset=["Recovered"])\
.background_gradient(cmap='Purples',subset=["Active"])\
.background_gradient(cmap='GnBu',subset=["Incident_Rate"])\
.background_gradient(cmap='OrRd',subset=["Death Rate"])\
.background_gradient(cmap='PuBu',subset=["Recovery Rate"])

In [None]:
fig = go.Figure(data=[
    go.Pie(labels=df_total_counts['Country_Region'], 
           values=df_total_counts['Confirmed'], 
           hole=.35,
           textinfo='label+percent'
          )
])

fig.update_layout(
    title_text="Confirmed Cases Percentage by Countries",
    # Add annotations in the center of the donut pies.
    annotations=[
        dict(text='Confirmed<br>Cases', showarrow=False),
    ]
)
fig.update_traces(textposition='inside')
fig.show()

In [None]:
fig = go.Figure(data=[
    go.Pie(labels=df_total_counts['Country_Region'], 
           values=df_total_counts['Deaths'], 
           hole=.35,
           textinfo='label+percent'
          )
])

fig.update_layout(
    title_text="Deaths Cases Percentage by Countries",
    # Add annotations in the center of the donut pies.
    annotations=[
        dict(text='Deaths<br>Cases', showarrow=False),
    ]
)
fig.update_traces(textposition='inside')
fig.show()

# Heat Map of World Countries

In [None]:
# https://plotly.com/python/choropleth-maps/
data = df_cases_country.copy()
data['Confirmed_Log'] = np.log10(df_cases_country['Confirmed'])
data['Mortality_Rate'] = np.round(data['Mortality_Rate'], 2)
fig = px.choropleth(data, 
                    locations='ISO3',
                    color='Confirmed_Log', # a column in the dataset
                    hover_name='Country_Region', # column to add to hover information
                    hover_data=['Confirmed', 'Deaths', 'Recovered', 'Mortality_Rate'],
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.update_layout(title_text="Heat Map - Confirmed Cases")
fig.update_coloraxes(colorbar_title="Color (Confirmed Cases Log Scale)")
fig.update(layout_coloraxis_showscale=False)
fig.show()

In [None]:
# world_geo = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json'
# world_geo = 'https://github.com/johan/world.geo.json/blob/master/countries.geo.json'

world_geo_json = '/kaggle/input/country-state-geo-location/countries.geo.json'
with open(world_geo_json) as f:
    world_geo = json.load(f)

data = df_cases_country.copy()
#print (data[data['ISO3'] == 'AFG']['Confirmed'].iloc[0])

for index, item in enumerate(world_geo['features']):
    row = data[data['ISO3'] == item['id']]
    if row.empty: continue # skip for countries that are not present in the cases dataset
    world_geo['features'][index]['properties']['Confirmed'] = str(row.iloc[0]['Confirmed'])
    world_geo['features'][index]['properties']['Deaths'] = str(row.iloc[0]['Deaths'])
    world_geo['features'][index]['properties']['Recovered'] = str(row.iloc[0]['Recovered'])
    world_geo['features'][index]['properties']['Mortality Rate'] = str(np.round(row.iloc[0]['Mortality_Rate'],2)) + '%'
    world_geo['features'][index]['properties']['Recovery Rate'] = str(np.round(row.iloc[0]['Recovered'] / row.iloc[0]['Confirmed'] * 100, 2)) + '%'

print (world_geo['features'][0]['properties'])

In [None]:
data = df_cases_country.copy()

# for Kosovo, ISO3 in geoJson = CS-KM & in cases CSV = XKS
data['ISO3'] = data['ISO3'].replace('XKS', 'CS-KM')
#print (data[data['Country_Region'] == 'Kosovo'])

# logarithmic value is taken to avoid skewness
# as US cases count is very much higher than the rest of the world
data['Confirmed_Log'] = np.log2(data['Confirmed'])

# create a plain world map
world_map = folium.Map(location=[10,0], tiles="cartodbpositron", zoom_start=2, max_zoom=6, min_zoom=2)

# add tile layers to the map
tiles = ['stamenwatercolor', 'cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(world_map)

choropleth = folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=data,
    columns=['ISO3', 'Confirmed_Log'],
    key_on='feature.id',
    fill_color='OrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    nan_fill_color='#fef0d9',
    nan_fill_opacity=0.2,
    legend_name='Confirmed Cases (Log Scale)',
    highlight=True,
    line_color='black'
).add_to(world_map)

style_function = "font-size: 15px; font-weight: bold"
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(
        fields=['name', 'Confirmed', 'Deaths', 'Recovered', 'Mortality Rate', 'Recovery Rate'],
        aliases=['Country','Confirmed', 'Deaths', 'Recovered', 'Mortality Rate', 'Recovery Rate'], 
        labels=True
    )
)

folium.LayerControl(collapsed=True).add_to(world_map)
world_map
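
The maps color countries by a logarithm of the confirmed count so that a few very large values do not wash out the rest of the scale. A small sketch of that compression, with hypothetical counts chosen only to make the arithmetic easy to follow:

```python
import math

# Hypothetical confirmed counts spanning several orders of magnitude
confirmed_counts = [8, 1_024, 1_048_576]

# Ratio between the largest and smallest value on the raw scale
raw_spread = max(confirmed_counts) / min(confirmed_counts)

# The same ratio after taking log2 of every value
logged = [math.log2(c) for c in confirmed_counts]
log_spread = max(logged) / min(logged)

print(raw_spread, logged, round(log_spread, 2))
```

One caveat: the logarithm of zero is undefined. `math.log2(0)` raises `ValueError`, while NumPy returns `-inf` with a warning, so countries with zero confirmed cases may need separate handling.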



# Cases by Continent

In [None]:
def country_to_continent(country_name):
    country_alpha2 = pc.country_name_to_country_alpha2(country_name)
    country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
    country_continent_name = pc.convert_continent_code_to_continent_name(country_continent_code)
    return country_continent_name

# Example
#country_name = 'Germany'
#print(country_to_continent(country_name))

In [None]:
def get_continent(iso3):
    try:
        continent = pc.convert_continent_code_to_continent_name( \
                    pc.country_alpha2_to_continent_code( \
                     pc.country_alpha3_to_country_alpha2(iso3) ) )
        if continent == 'Oceania':
            continent = 'Australia'
        return continent
    except Exception:
        # codes that cannot be mapped to a continent fall back to 'others'
        return 'others'

# cases_country data with continent
#cols = ['Country_Region', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'ISO3']
#df_continent = df_cases_country[cols].copy()
df_continent = df_cases_country.copy()
df_continent['Continent'] = df_continent['ISO3'].apply(get_continent)

# cases_time data with continent
df_continent_time = df_cases_time.copy()
df_continent_time['Continent'] = df_continent_time['iso3'].apply(get_continent)
df_continent_time['Confirmed'] = df_continent_time['Confirmed'].fillna(0)
df_continent_time['Deaths'] = df_continent_time['Deaths'].fillna(0)
start_date = df_continent_time['Report_Date_String'].min()
df_continent_time['Days'] = (df_continent_time['Report_Date_String'] - start_date).dt.days + 1

# Get continent total (sum only the numeric case columns;
# summing 'Country_Region' and 'ISO3' would just concatenate strings)
df_continent_total = df_continent.groupby("Continent")[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum()
df_continent_total['Mortality Rate (%)'] = df_continent_total['Deaths'] / df_continent_total['Confirmed'] * 100
df_continent_total['Recovery Rate (%)'] = df_continent_total['Recovered'] / df_continent_total['Confirmed'] * 100

df_continent_total.style \
.background_gradient(cmap='Blues',subset=["Confirmed"])\
.background_gradient(cmap='Reds',subset=["Deaths"])\
.background_gradient(cmap='Greens',subset=["Recovered"])\
.background_gradient(cmap='Purples',subset=["Active"])\
.background_gradient(cmap='OrRd',subset=["Mortality Rate (%)"])\
.background_gradient(cmap='PuBuGn',subset=["Recovery Rate (%)"])

In [None]:
fig = go.Figure(data=[
    # bar labels must come from the continent totals, not the per-country frame
    go.Bar(name='Confirmed', x=df_continent_total.index, y=df_continent_total['Confirmed'],
           text=df_continent_total['Confirmed'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Deaths', x=df_continent_total.index, y=df_continent_total['Deaths'],
           text=df_continent_total['Deaths'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Recovered', x=df_continent_total.index, y=df_continent_total['Recovered'],
           text=df_continent_total['Recovered'], texttemplate='%{text:.2s}', textposition='outside'),
])
# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

In [None]:
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=df_continent_total.index, 
            values=df_continent_total['Confirmed'], 
            hole=.35,
            textinfo='label+percent',
            name='Confirmed'
          ),
        1, 1)
fig.add_trace(go.Pie(labels=df_continent_total.index, 
            values=df_continent_total['Deaths'], 
            hole=.35,
            textinfo='label+percent',
            name='Deaths'
          ),
        1, 2)

fig.update_layout(
    title_text="Confirmed & Deaths Cases Percentage by Continent",
    # Add annotations in the center of the donut pies.
    annotations=[
        dict(text='Confirmed<br>Cases', x=0.18, y=0.5, showarrow=False),
        dict(text='Deaths<br>Cases', x=0.80, y=0.5, showarrow=False),
    ]
)
#fig.update_traces(textposition='inside')
fig.show()

# Chart of Top Countries

Countries ranked by the highest:

* Confirmed Cases
* Active Cases
* Recovered
* Recovery Rate
* Deaths
* Death Rate

In [None]:
df_top_confirmed = df_cases_country.sort_values(by=['Confirmed'],ascending=[False]).head(10)
df_top_confirmed.head()

In [None]:
df_top_confirmed = df_cases_country.sort_values(by=['Confirmed'],ascending=[False]).head(10)

fig = go.Figure(data=[
    go.Bar(name='Confirmed', x=df_top_confirmed['Country_Region'], y=df_top_confirmed['Confirmed'], 
           text=df_top_confirmed['Confirmed'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Deaths', x=df_top_confirmed['Country_Region'], y=df_top_confirmed['Deaths'], 
           text=df_top_confirmed['Deaths'], texttemplate='%{text:.2s}', textposition='outside'),
    go.Bar(name='Recovered', x=df_top_confirmed['Country_Region'], y=df_top_confirmed['Recovered'], 
           text=df_top_confirmed['Recovered'], texttemplate='%{text:.2s}', textposition='outside'),
])
# Change the bar mode
fig.update_layout(barmode='group')
fig.show()

In [None]:
df_top_total_counts = df_total_counts.copy()
# remove 'Diamond Princess' cruise ship row from the list
index_name = df_top_total_counts[ df_top_total_counts['Country_Region'] == 'Diamond Princess' ].index
df_top_total_counts.drop(index_name , inplace=True)

def plot_top_cases(column, title='', count=10, data=df_top_total_counts):
    title = column if title == '' else title
    plot_data = data
    
    # for death rate plot, taking data with deaths >= 10
    if column == 'Death Rate': plot_data = plot_data[plot_data.Deaths>=10]
    # for recovery rate plot, taking data with recovered >= 100
    if column == 'Recovery Rate': plot_data = plot_data[plot_data.Recovered>=100]
    
    plot_data = plot_data.sort_values(by=[column],ascending=[False]).head(count)

    fig = px.bar(plot_data, y=column, x='Country_Region', 
                 text=column, orientation='v', 
                 title=title+': Top '+str(count)+' Countries')
    pc_str = ''
    if column in ['Death Rate', 'Recovery Rate']: pc_str = "%"
    fig.update_traces(texttemplate='%{text:.2s}'+pc_str, textposition='outside')
    
    fig.update_layout(
        uniformtext_minsize=8, 
        uniformtext_mode='hide',
        xaxis_title="",
        yaxis_title=""
    )
    fig.show()


plot_top_cases('Confirmed')
plot_top_cases('Deaths')
plot_top_cases('Recovered')
plot_top_cases('Death Rate', 'Death Rate (10+ deaths)')
plot_top_cases('Recovery Rate', 'Recovery Rate (100+ recovery)')
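
The rate rankings above only consider countries with at least 10 deaths (or 100 recoveries), because a rate computed from a handful of cases can be extreme without meaning much. A minimal sketch of that filter-then-rank step, with hypothetical rows and a helper name (`top_by_death_rate`) invented for illustration:

```python
# Hypothetical rows: (country, confirmed, deaths); not real figures
rows = [("A", 4, 2), ("B", 5_000, 250), ("C", 20_000, 400), ("D", 900, 90)]

def top_by_death_rate(rows, min_deaths=10, count=3):
    """Rank countries by death rate, ignoring those below the deaths threshold."""
    eligible = [r for r in rows if r[2] >= min_deaths]
    ranked = sorted(eligible, key=lambda r: r[2] / r[1], reverse=True)
    return [(name, round(deaths / confirmed * 100, 2)) for name, confirmed, deaths in ranked[:count]]

print(top_by_death_rate(rows))                # country "A" is filtered out
print(top_by_death_rate(rows, min_deaths=0))  # without the threshold, "A" leads
```

Country "A" has a 50% rate from only four cases, which is the kind of outlier the threshold is meant to exclude.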

# Time Series of Cases

Trajectory of cases from the first day of the outbreak to the current day.

In [None]:
# Get sum of confirmed, deaths and recovered cases
def get_timeline(country = None, group_by = ['Date']):
    df_t = df_data if country is None else df_data[df_data['Country_Region'] == country]
    df_t = df_t.groupby(group_by)[['Confirmed','Deaths','Recovered']].sum().reset_index()
    df_t['Active'] = df_t['Confirmed'] - df_t['Deaths'] - df_t['Recovered']
    df_t['Death Rate'] = np.round(df_t['Deaths'] / df_t['Confirmed'] * 100, 2)
    df_t['Recovery Rate'] = np.round(df_t['Recovered'] / df_t['Confirmed'] * 100, 2)
    return df_t

# Get confirmed, deaths and recovered cases for each date:
# only the cases registered on that particular day,
# not the cumulative sum from previous days
def get_timeline_daily():
    df_t = df_data
    df_t = df_t.groupby(['Date'])[['Confirmed','Deaths','Recovered']].sum()
    # The counts in the dataset are cumulative: each date holds the
    # total of all previous days plus the current day.
    # So the sum above is a cumulative total per date, and the daily figure
    # is the difference between the current date and the previous date.
    # The first date has no previous date, so it keeps its own total.
    df_t = df_t.diff().fillna(df_t).reset_index()
    return df_t
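
To make the `diff().fillna(...)` step concrete, here is a small pure-Python sketch of the same conversion from cumulative totals to daily counts. The function name `cumulative_to_daily` and the sample series are assumptions made for illustration:

```python
def cumulative_to_daily(cumulative):
    """Return per-day counts from a list of running totals.

    The first entry has no predecessor, so it is kept as-is,
    which matches diff() followed by fillna() with the original values.
    """
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily

# Hypothetical running totals over four days
print(cumulative_to_daily([5, 8, 8, 15]))
```

A day on which the running total does not change yields a daily count of zero.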

In [None]:
df_t = get_timeline()
df_t.head()

In [None]:
#fig = px.line(df, x="Last_Update", y="Confirmed", title='Confirmed & Death Cases Trajectory')
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Confirmed'], 
                         mode='lines+markers', name='Confirmed'))
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Deaths'], 
                         mode='lines+markers', name='Deaths'))
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Recovered'], 
                         mode='lines+markers', name='Recovered'))
fig.update_layout(
        xaxis_title="",
        yaxis_title="",
        title = 'Time Series - Confirmed, Deaths & Recovered Cases'
    )
fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Confirmed'], 
                         mode='lines+markers', name='Confirmed'))
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Deaths'], 
                         mode='lines+markers', name='Deaths'))
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Recovered'], 
                         mode='lines+markers', name='Recovered'))
fig.update_layout(
        xaxis_title="",
        yaxis_title="",
        yaxis_type="log",
        title = 'Time Series - Confirmed, Deaths & Recovered Cases - Log Scale'
    )
fig.show()

In [None]:
df_t_daily = get_timeline_daily()
df_t_daily.head()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t_daily['Date'], y=df_t_daily['Confirmed'], 
                         mode='lines+markers', name='Confirmed'))
fig.add_trace(go.Scatter(x=df_t_daily['Date'], y=df_t_daily['Deaths'], 
                         mode='lines+markers', name='Deaths'))
fig.add_trace(go.Scatter(x=df_t_daily['Date'], y=df_t_daily['Recovered'], 
                         mode='lines+markers', name='Recovered'))
fig.update_layout(
        xaxis_title="",
        yaxis_title="",
        title = 'Time Series - Confirmed, Deaths & Recovered [Daily Cases]'
    )
fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Recovery Rate'], 
                         mode='lines+markers', name='Recovery Rate'))
fig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Death Rate'], 
                         mode='lines+markers', name='Death Rate'))
fig.update_layout(
        xaxis_title="",
        yaxis_title="",
        title = 'Time Series - Death Rate & Recovery Rate'
    )
fig.show()

In [None]:
# Get only the data of the latest date
df_data_latest = df_data[df_data.Date == df_data.Date.max()]
df_data_latest.head()

In [None]:
df_t = df_data.groupby(['Country_Region', 'Date'])[['Confirmed','Deaths','Recovered']].sum().reset_index()

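# Aggregate the latest day per country first; otherwise province-level rows
# can put the same country into a top-10 list more than once
df_latest_country = df_data_latest.groupby('Country_Region')[['Confirmed','Deaths','Recovered']].sum()
top10_confirmed_country_list = df_latest_country.sort_values('Confirmed', ascending=False).head(10).index.to_list()
top10_deaths_country_list = df_latest_country.sort_values('Deaths', ascending=False).head(10).index.to_list()
top10_recovered_country_list = df_latest_country.sort_values('Recovered', ascending=False).head(10).index.to_list()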
print (top10_confirmed_country_list)
print (top10_deaths_country_list)
print (top10_recovered_country_list)

df_top10_confirmed = df_t[df_t.Country_Region.isin(top10_confirmed_country_list)]
df_top10_deaths = df_t[df_t.Country_Region.isin(top10_deaths_country_list)]
df_top10_recovered = df_t[df_t.Country_Region.isin(top10_recovered_country_list)]

def get_top10(country_list):
    df_top = df_t[df_t.Country_Region.isin(country_list)]
    df_top = df_top.groupby(['Country_Region', 'Date']).sum()
    daily_frames = []
    for country, df_new in df_top.groupby(level=0):
        # 1. The case counts are cumulative in the dataset.
        #    Hence, we take the difference between each row and the previous row.
        # 2. Some rows show negative values after diff(), because
        #    a day's entry is sometimes lower than the previous day's entry.
        #    abs() makes them positive, in the hope that
        #    the dataset is corrected later on.
        daily_frames.append(df_new.diff().fillna(df_new).abs())
    # DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate
    return pd.concat(daily_frames).reset_index()

df_top10_confirmed_daily = get_top10(top10_confirmed_country_list)
df_top10_deaths_daily = get_top10(top10_deaths_country_list)
df_top10_recovered_daily = get_top10(top10_recovered_country_list)
df_top10_confirmed_daily.head(2)
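
The `abs()` workaround in `get_top10` is one way to deal with downward revisions in a cumulative series. The sketch below compares it with clipping negative changes to zero. Clipping is an assumed alternative, not something this notebook uses, and the function name and sample series are hypothetical:

```python
def daily_from_cumulative(cumulative, negative="abs"):
    """Turn cumulative totals into daily counts.

    negative="abs"  mirrors the abs() workaround used in get_top10;
    negative="clip" is an assumed alternative that treats a
    downward correction as a day with zero new cases.
    """
    daily = []
    previous = 0
    for total in cumulative:
        change = total - previous
        if change < 0:
            change = abs(change) if negative == "abs" else 0
        daily.append(change)
        previous = total
    return daily

# Hypothetical series with one downward revision (15 -> 12)
series = [10, 15, 12, 20]
print(daily_from_cumulative(series, "abs"))
print(daily_from_cumulative(series, "clip"))
```

Neither variant sums back to the final cumulative total of 20 in this example (`abs` gives 26, clipping gives 23), so both should be read as approximations of the daily counts.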

In [None]:
fig = px.line(df_top10_confirmed, x="Date", y="Confirmed", color="Country_Region")
fig.update_layout(
    title='Time Series - Confirmed Cases: Top 10 Countries',
    xaxis_title='',
    yaxis_title='',
)
fig.show()

In [None]:
fig = px.line(df_top10_confirmed, x="Date", y="Confirmed", color="Country_Region")
fig.update_layout(
    title='Time Series - Confirmed Cases: Top 10 Countries - Log Scale',
    xaxis_title='',
    yaxis_title='',
    yaxis_type="log",
)
fig.show()

In [None]:
fig = px.line(df_top10_confirmed_daily, x="Date", y="Confirmed", color="Country_Region")
fig.update_layout(
    title='Time Series - Confirmed Cases [Daily]: Top 10 Countries',
    xaxis_title='',
    yaxis_title=''
)
fig.show()

In [None]:
fig = px.line(df_top10_deaths, x="Date", y="Deaths", color="Country_Region")
fig.update_layout(
    title='Time Series - Deaths Cases: Top 10 Countries',
    xaxis_title='',
    yaxis_title=''
)
fig.show()

In [None]:
fig = px.line(df_top10_deaths_daily, x="Date", y="Deaths", color="Country_Region")
fig.update_layout(
    title='Time Series - Deaths Cases [Daily]: Top 10 Countries',
    xaxis_title='',
    yaxis_title=''
)
fig.show()

In [None]:
fig = px.line(df_top10_recovered, x="Date", y="Recovered", color="Country_Region")
fig.update_layout(
    title='Time Series - Recovered Cases: Top 10 Countries',
    xaxis_title='',
    yaxis_title='',
)
fig.show()

In [None]:
fig = px.line(df_top10_recovered_daily, x="Date", y="Recovered", color="Country_Region")
fig.update_layout(
    title='Time Series - Recovered Cases [Daily]: Top 10 Countries',
    xaxis_title='',
    yaxis_title=''
)
fig.show()

# Time Series - Cases by Continent

In [None]:
df_continent_t = df_continent_time.copy()
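# keep only the case columns used below, so max() is not applied to text or date columns
df_continent_t = df_continent_t.groupby(['Continent', 'Country_Region', 'Last_Update'])[['Confirmed', 'Deaths']].max().reset_index()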

In [None]:
# Get total count of each day for each continent (numeric case columns only)
df_continent_t = df_continent_t.groupby(['Continent', 'Last_Update'])[['Confirmed', 'Deaths']].sum().reset_index()

# Calculate Death/Mortality Rate
df_continent_t['Death Rate'] = df_continent_t['Deaths'] / df_continent_t['Confirmed'] * 100
df_continent_t['Death Rate'] = df_continent_t['Death Rate'].fillna(0)
df_continent_t.head()

In [None]:
fig = px.line(df_continent_t, x="Last_Update", y="Confirmed", color="Continent")
fig.update_layout(
    title='Time Series - Confirmed Cases by Continent',
    xaxis_title='',
    yaxis_title='',
)
fig.show()

In [None]:
fig = px.line(df_continent_t, x="Last_Update", y="Confirmed", color="Continent")
fig.update_layout(
    title='Time Series - Confirmed Cases by Continent [Log Scale]',
    xaxis_title='',
    yaxis_title='',
    yaxis_type="log",
)
fig.show()

In [None]:
fig = px.line(df_continent_t, x="Last_Update", y="Deaths", color="Continent")
fig.update_layout(
    title='Time Series - Deaths Cases by Continent',
    xaxis_title='',
    yaxis_title='',
)
fig.show()

In [None]:
fig = px.line(df_continent_t, x="Last_Update", y="Deaths", color="Continent")
fig.update_layout(
    title='Time Series - Deaths Cases by Continent [Log Scale]',
    xaxis_title='',
    yaxis_title='',
    yaxis_type='log',
)
fig.show()

In [None]:
fig = px.line(df_continent_t, x="Last_Update", y="Death Rate", color="Continent")
fig.update_layout(
    title='Time Series - Death Rate (%) by Continent',
    xaxis_title='',
    yaxis_title='',
)
fig.show()

# Progression over Time

## Confirmed Cases - Animation over Time

In [None]:
df_temp = df_cases_time.groupby(['Last_Update', 'Country_Region'])[['Confirmed', 'Deaths']].max().reset_index()
df_temp["Last_Update"] = pd.to_datetime(df_temp["Last_Update"]).dt.strftime('%m/%d/%Y')
df_temp['Confirmed'] = df_temp['Confirmed'].fillna(0)
df_temp.sort_values('Confirmed', ascending=False).head()

In [None]:
fig = px.scatter_geo(df_temp, locations="Country_Region", locationmode='country names', 
                     hover_name="Country_Region", hover_data=["Confirmed", "Deaths"], animation_frame="Last_Update",
                     color=np.log10(df_temp["Confirmed"]+1)-1, size=np.power(df_temp["Confirmed"]+1, 0.3)-1,
                     range_color= [0, max(np.log10(df_temp["Confirmed"]+1))],
                     title="COVID-19 Progression Animation Over Time",
                     color_continuous_scale=px.colors.sequential.Plasma,
                     projection="natural earth")
fig.update_coloraxes(colorscale="hot")
fig.update(layout_coloraxis_showscale=False)
#fig.update_coloraxes(colorbar_title="Color (Confirmed Cases Log Scale)")
fig.show()

## Confirmed Cases vs. Mortality Rate - Animation over Time

In [None]:
df_continent_t = df_continent_time.copy()
df_continent_t["Last_Update"] = pd.to_datetime(df_continent_t["Last_Update"]).dt.strftime('%m/%d/%Y')
# while calculating mortality rate, adding 1 to confirmed to avoid divide by zero
df_continent_t['Mortality Rate'] = df_continent_t['Deaths'] / (df_continent_t['Confirmed']+1) * 100
df_continent_t.sort_values('Confirmed', ascending=False).head(2)
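
Adding 1 to the confirmed count avoids division by zero, but it also pulls the rate down when counts are small. A short sketch of that trade-off, with a hypothetical helper name and made-up inputs:

```python
def mortality_rate(deaths, confirmed, offset=1):
    """Deaths as a percentage of confirmed cases, with an offset in the denominator."""
    return deaths / (confirmed + offset) * 100

# No cases yet: the offset keeps the result defined
print(mortality_rate(0, 0))

# Very small counts: the offset halves the rate (1 death in 1 case reads as 50%)
print(mortality_rate(1, 1))

# Larger counts: the effect is negligible (about 4.995% instead of 5%)
print(round(mortality_rate(50, 1000), 3))
```

For the animation this bias mostly affects the earliest frames, where countries have only a few confirmed cases.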

In [None]:
fig = px.scatter(df_continent_t, y=df_continent_t['Mortality Rate'],
                    x=df_continent_t['Confirmed']+1,
                    color="Continent", 
                    hover_name="Country_Region",
                    hover_data=["Confirmed", "Deaths"],
                    color_continuous_scale=px.colors.sequential.Plasma,
                    size=np.power(df_continent_t["Confirmed"]+1, 0.3)-0.5,
                    size_max=30,
                    log_x=True,
                    height=600,
                    #title='COVID-19',
                    range_y=[-1, 20],
                    range_x=[1, df_continent_t["Confirmed"].max()],
                    animation_frame="Last_Update", 
                    animation_group="Country_Region",
                    )
fig.update_layout(
    title='Time Series - Confirmed Cases vs Mortality Rate by Continent',
    xaxis_title='Confirmed Cases',
    yaxis_title='Mortality Rate (%)',
    #xaxis_type='log'
)
fig.show()

## Useful Notebooks

* [Coronavirus 2019-20 Visualization](https://www.kaggle.com/holfyuen/coronavirus-2019-20-visualization)
* [COVID-19 Case Study - Analysis, Viz & Comparisons](https://www.kaggle.com/tarunkr/covid-19-case-study-analysis-viz-comparisons)
