# Coronavirus (COVID-19) Visualization & Prediction

Coronaviruses are a family of viruses named for the crown-like spikes on their surface. The novel coronavirus, also known as 
SARS-CoV-2, is a contagious respiratory virus that was first reported in Wuhan, China. On February 11, 2020, the World Health 
Organization designated the name COVID-19 for the disease caused by the novel coronavirus. This notebook aims to 
explore COVID-19 through data analysis and projections.

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in 2019 in Wuhan, China, and has since spread across the globe, leading the World Health Organization to declare a pandemic. Most cases of COVID-19 are not fatal, but the disease spreads quickly. Every infectious disease has a basic reproduction number (R0), which indicates how many people, on average, will catch the disease from one infected person. According to initial research, the R0 of COVID-19 is about 2.7.
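As a rough illustration of what a basic reproduction number means, the sketch below assumes a constant R0 of 2.7 (the figure quoted above) with no immunity and no interventions, so each generation of infections is R0 times the size of the previous one. The function name `cases_by_generation` is made up for this example, and real outbreaks do not follow such an idealised curve.

```python
def cases_by_generation(r0, generations, initial_cases=1):
    """New infections per generation under a constant R0 (idealised model)."""
    return [initial_cases * r0 ** g for g in range(generations + 1)]

# R0 = 2.7 is the early estimate quoted in the text
new_cases = cases_by_generation(2.7, 5)
for generation, count in enumerate(new_cases):
    print(f"generation {generation}: ~{count:.0f} new infections")
```

Even in this simplified setting, the count grows multiplicatively, which is why small changes in R0 matter so much.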

The current goal of scientists around the world is to "flatten the curve". COVID-19 cases are growing at an exponential rate worldwide, as the rest of this notebook shows. Flattening the curve means that even if the number of confirmed cases keeps rising, those cases are spread over a longer period of time, so that health systems are not overwhelmed. Put simply, if COVID-19 is going to infect 100K people, it is better for those infections to occur over a year than within a single month.
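The 100K example can be worked through with simple arithmetic. The sketch below assumes infections are spread evenly over the period, which is a simplification made only for illustration, and the helper name `average_daily_cases` is hypothetical.

```python
total_infections = 100_000  # the hypothetical total used in the text

def average_daily_cases(total, days):
    """Average new cases per day if the total is spread evenly over `days`."""
    return total / days

one_month = average_daily_cases(total_infections, 30)
one_year = average_daily_cases(total_infections, 365)

print(f"over one month: ~{one_month:.0f} new cases per day")
print(f"over one year:  ~{one_year:.0f} new cases per day")
print(f"daily load ratio: ~{one_month / one_year:.1f}x")
```

The total is identical in both scenarios; only the daily load differs, and that daily load is what the curve describes.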

# Pip Install

In [None]:
#! pip install --upgrade pip          # get latest version of pip

In [None]:
#! pip install calmap                # for calendar heat maps

In [None]:
#! pip install us                    # to convert US state names to state codes

In [None]:
#! pip install pycountry_convert     # to get continent name from country name

In [None]:
#! pip install plotly_express        # for easy-to-style figures

# Load libraries

In [None]:
import us                                  # US state metadata (names, abbreviations)
import json                                # parse JSON data such as GeoJSON
import math                                # mathematical functions
import time                                # time-related utilities
import random                              # pseudo-random number generation
import datetime                            # dates and times
import matplotlib                          # data visualization and plotting
import matplotlib.colors as mcolors        # color names and conversions for matplotlib
from sklearn.metrics import r2_score       # R-squared score
import numpy as np                         # numerical operations on arrays
import pandas as pd                        # tabular data in DataFrames
import seaborn as sns                      # statistical plotting built on matplotlib
import plotly_express as px                # high-level functions that build whole figures
import plotly.graph_objects as go          # low-level figure and trace classes
import matplotlib.pyplot as plt            # MATLAB-style plotting interface
import plotly.figure_factory as ff         # specialised chart types
from datetime import timedelta             # date and time differences
from urllib.request import urlopen         # fetch content from a URL
from plotly.subplots import make_subplots  # figures with subplots
import warnings                            # control warning messages
warnings.filterwarnings('ignore')          # hide all warnings
import operator                            # function forms of operators, e.g. itemgetter
plt.style.use('fivethirtyeight')
%matplotlib inline
import os                                  #  to interact with the underlying operating system
for dirname, _, filenames in os.walk('../input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
from plotly.offline import plot, iplot, init_notebook_mode      # offline plotting in the notebook
init_notebook_mode(connected=True)
from pandas.plotting import register_matplotlib_converters      # lets matplotlib handle pandas datetime values
register_matplotlib_converters() 

In [None]:
from sklearn.svm import SVR, SVC
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import BayesianRidge, LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Color, theme

In [None]:
cnf, dth, rec, act = '#393e46', '#ff2e63', '#21bf73', '#fe9801'  # color palette: confirmed, deaths, recovered, active
sns.set_style('darkgrid')                                        # use seaborn's dark grid style

# Load Data

In [None]:
full_table = pd.read_csv('covid_19_clean_complete.csv')                # Covid-19 Cleaned data
full_grouped = pd.read_csv('full_grouped.csv')                         # Full_grouped
full_grouped['Date'] = pd.to_datetime(full_grouped['Date'])
day_wise = pd.read_csv('day_wise.csv')                                 # Day wise Data
day_wise['Date'] = pd.to_datetime(day_wise['Date'])
country_wise = pd.read_csv('country_wise_latest.csv')                  # country wise Data
country_wise = country_wise.replace('', np.nan).fillna(0)
worldometer_data = pd.read_csv('worldometer_data.csv')                 # Worldometer data
worldometer_data = worldometer_data.replace('', np.nan).fillna(0)
df_pk = pd.read_csv('Corona Data Pakistan.csv',index_col='SNo')        # Pakistan Data

In [None]:
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recoveries_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-26-2020.csv')
us_medical_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/11-26-2020.csv')

# Latest

In [None]:
temp = day_wise[['Date','Deaths', 'Recovered', 'Active']].tail(1)
temp = temp.melt(id_vars="Date", value_vars=['Active', 'Deaths', 'Recovered'])
fig = px.treemap(temp, path=["variable"], values="value", height=225, 
                 color_discrete_sequence=[act, rec, dth])
fig.data[0].textinfo = 'label+text+value'
fig.show()

# Maps

In [None]:
def plot_map(df, col, pal):
    df = df[df[col]>0]
    # to represent spatial variations of a quantity
    fig = px.choropleth(df, locations="Country/Region", locationmode='country names', 
                  color=col, hover_name="Country/Region", 
                  title=col, hover_data=[col], color_continuous_scale=pal)
    #fig.update_layout(coloraxis_showscale=False)
    fig.show()

In [None]:
plot_map(country_wise, 'Confirmed', 'matter')

In [None]:
plot_map(country_wise, 'Deaths', 'matter')

In [None]:
plot_map(country_wise, 'Deaths / 100 Cases', 'matter')

# Cases over time

In [None]:
def plot_daywise(col, hue):
    fig = px.bar(day_wise, x="Date", y=col, width=700, color_discrete_sequence=[hue])
    fig.update_layout(title=col, xaxis_title="", yaxis_title="")
    fig.show()
def plot_daywise_line(col, hue):
    fig = px.line(day_wise, x="Date", y=col, width=700, color_discrete_sequence=[hue])
    fig.update_layout(title=col, xaxis_title="", yaxis_title="")
    fig.show()

In [None]:
fig = px.choropleth(full_grouped, locations="Country/Region", 
                    color=np.log(full_grouped["Confirmed"]),
                    locationmode='country names', hover_name="Country/Region", 
                    animation_frame=full_grouped["Date"].dt.strftime('%Y-%m-%d'),
                    title='Cases over time', color_continuous_scale=px.colors.sequential.matter)
fig.update(layout_coloraxis_showscale=False)
fig.show()
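The animated map above colors countries by `np.log` of the confirmed count. If any row has zero confirmed cases, `np.log` returns negative infinity for it, which may leave that country uncolored. One possible alternative, shown here on made-up numbers rather than the notebook's data, is `np.log1p`, which computes log(1 + x) and maps zero to zero.

```python
import numpy as np

counts = np.array([0, 9, 99])          # toy confirmed counts, not real data

with np.errstate(divide='ignore'):     # silence the divide-by-zero warning
    plain = np.log(counts)             # log(0) is -inf
shifted = np.log1p(counts)             # log(1 + x), finite for x = 0

print(plain)
print(shifted)
```

Whether the shift is appropriate depends on how zero-case countries should appear on the map.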

In [None]:
temp = full_grouped.groupby('Date')[['Recovered', 'Deaths', 'Active']].sum().reset_index()  # select columns with a list
temp = temp.melt(id_vars="Date", value_vars=['Recovered', 'Deaths', 'Active'],
                 var_name='Case', value_name='Count')
temp.head()

fig = px.area(temp, x="Date", y="Count", color='Case', height=600, width=700,
             title='Cases over time', color_discrete_sequence = [rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

# WHO Region Wise

In [None]:
who = country_wise.groupby('WHO Region')[['Confirmed', 'Deaths', 'Recovered', 'Active',
                                          'New cases', 'Confirmed last week']].sum().reset_index()
who['Fatality Rate'] = round((who['Deaths'] / who['Confirmed']) * 100, 2)
who['Recovery Rate'] = (who['Recovered'] / who['Confirmed']) * 100

who_g = full_grouped.groupby(['WHO Region', 'Date'])[['Confirmed', 'Deaths', 'Recovered', 
                                                      'Active', 'New cases', 'New deaths']].sum().reset_index()

In [None]:
def plot_hbar(col, hover_data=[]):
    fig = px.bar(who.sort_values(col), 
                 x=col, y="WHO Region", color='WHO Region',  
                 text=col, orientation='h', width=700, hover_data=hover_data,
                 color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.update_layout(title=col, xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()

In [None]:
plot_hbar('Confirmed')

In [None]:
plot_hbar('Deaths')

In [None]:
temp = worldometer_data[worldometer_data['WHO Region']!=0]
fig = px.scatter(temp, x='TotalCases', y='TotalDeaths', color='WHO Region', 
                 height=700, hover_name='Country/Region', log_x=True, log_y=True, 
                 title='Confirmed vs Deaths',
                 color_discrete_sequence=px.colors.qualitative.Vivid)
fig.update_traces(textposition='top center')
fig.show()

In [None]:
fig = px.scatter(temp, x='Population', y='TotalCases', color='WHO Region', 
                 height=700, hover_name='Country/Region', log_x=True, log_y=True, 
                 title='Population vs Confirmed',
                 color_discrete_sequence=px.colors.qualitative.Vivid)
fig.update_traces(textposition='top center')
# fig.update_layout(showlegend=False)
# fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

In [None]:
px.bar(who_g, x="Date", y="Confirmed", color='WHO Region', 
       height=600, title='Confirmed', 
       color_discrete_sequence=px.colors.qualitative.Vivid)

In [None]:
px.bar(who_g, x="Date", y="New cases", color='WHO Region', 
       height=600, title='New cases', 
       color_discrete_sequence=px.colors.qualitative.Vivid)

In [None]:
px.box(worldometer_data, x='WHO Region', y='TotalCases', color='WHO Region',
       title='Distribution of country-wise case counts by WHO Region')

# Day wise

In [None]:
plot_daywise('Confirmed', '#333333')

In [None]:
plot_daywise('Deaths', dth)

In [None]:
plot_daywise('New cases', '#333333')

In [None]:
plot_daywise('Recovered', rec)

In [None]:
plot_daywise_line('Deaths / 100 Cases', dth)

In [None]:
plot_daywise_line('Recovered / 100 Cases', rec)

In [None]:
plot_daywise('No. of countries', '#035aa6')

In [None]:
temp = day_wise[['Date', 'Recovered', 'Active']]
temp = temp.melt(id_vars='Date', value_vars=['Recovered', 'Active'], 
                 var_name='Variable', value_name='Count')
px.line(temp, x='Date', y='Count', color='Variable')

# Weekly Statistics

In [None]:
full_grouped['Week No.'] = full_grouped['Date'].dt.strftime('%U')
week_wise = full_grouped.groupby('Week No.')[['Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases', 'New deaths', 'New recovered']].sum().reset_index()
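The weekly table sums every column within a week. For columns that are already cumulative, such as `Confirmed`, a sum over several days counts the same cases more than once. The sketch below shows one possible alternative on a made-up single series: take the last value of the week for the cumulative column and the sum for the daily column. The toy frame and its numbers are assumptions for illustration; with several countries per date, the cumulative column would first need to be totalled per date.

```python
import pandas as pd

toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-02', '2020-03-03', '2020-03-09']),
    'Confirmed': [10, 15, 40],   # cumulative count (toy values)
    'New cases': [10, 5, 25],    # daily count (toy values)
})
toy['Week No.'] = toy['Date'].dt.strftime('%U')  # week of year, Sunday first

# last value for the cumulative column, sum for the daily column
weekly = toy.groupby('Week No.').agg({'Confirmed': 'last', 'New cases': 'sum'})
print(weekly)
```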

In [None]:
def plot_weekwise(col, hue):
    fig = px.bar(week_wise, x="Week No.", y=col, width=700, color_discrete_sequence=[hue])
    fig.update_layout(title=col, xaxis_title="", yaxis_title="")
    fig.show()

In [None]:
plot_weekwise('Confirmed', '#000000')

In [None]:
plot_weekwise('Deaths', dth)

In [None]:
plot_weekwise('New cases', '#cd6684')

# Monthly statistics

In [None]:
full_grouped['Month'] = pd.DatetimeIndex(full_grouped['Date']).month
month_wise = full_grouped.groupby('Month')[['Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases', 'New deaths', 'New recovered']].sum().reset_index()

In [None]:
def plot_monthwise(col, hue):
    fig = px.bar(month_wise, x="Month", y=col, width=700, color_discrete_sequence=[hue])
    fig.update_layout(title=col, xaxis_title="", yaxis_title="")
    fig.show()

In [None]:
plot_monthwise('Confirmed', '#000000')

In [None]:
plot_monthwise('Deaths', dth)

In [None]:
plot_monthwise('New cases', '#cd6684')

# Healthy life expectancy vs Deaths / 100 Cases

In [None]:
happiness_report = pd.read_csv('2019.csv')
happiness_report = happiness_report[['Country or region', 'Healthy life expectancy']]

temp = country_wise.merge(happiness_report, left_on='Country/Region', right_on='Country or region')
px.scatter(temp, y='Deaths / 100 Cases', x='Healthy life expectancy', color='WHO Region', hover_data=['Country/Region'])

# Top 15 Countries

In [None]:
def plot_hbar(df, col, n, hover_data=[]):
    fig = px.bar(df.sort_values(col).tail(n), 
                 x=col, y="Country/Region", color='WHO Region',  
                 text=col, orientation='h', width=700, hover_data=hover_data,
                 color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.update_layout(title=col, xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()
def plot_hbar_wm(col, n, min_pop=1000000, sort='descending'):
    df = worldometer_data[worldometer_data['Population']>min_pop]
    df = df.sort_values(col, ascending=True).tail(n)
    fig = px.bar(df,
                 x=col, y="Country/Region", color='WHO Region',  
                 text=col, orientation='h', width=700, 
                 color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.update_layout(title=col+' (Only countries with > 1M Pop)', 
                      xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()

In [None]:
plot_hbar(country_wise, 'Confirmed', 15)

In [None]:
plot_hbar(country_wise, 'Active', 15)

In [None]:
plot_hbar(country_wise, 'New cases', 15)

In [None]:
plot_hbar(country_wise, 'Recovered', 15)

In [None]:
plot_hbar(country_wise, 'Deaths', 15)

In [None]:
plot_hbar(country_wise, 'Deaths / 100 Cases', 15)

In [None]:
plot_hbar_wm('Tot Cases/1M pop', 15, 1000000)

In [None]:
plot_hbar_wm('TotalTests', 15, 1000000)

# Date visualization of all countries

In [None]:
def plot_stacked(col):
    fig = px.bar(full_grouped, x="Date", y=col, color='Country/Region', 
                 height=600, title=col, 
                 color_discrete_sequence = px.colors.cyclical.mygbm)
    fig.update_layout(showlegend=True)
    fig.show()
def plot_line(col):
    fig = px.line(full_grouped, x="Date", y=col, color='Country/Region', 
                  height=600, title=col, 
                  color_discrete_sequence = px.colors.cyclical.mygbm)
    fig.update_layout(showlegend=True)
    fig.show()

In [None]:
plot_stacked('Confirmed')

In [None]:
plot_stacked('Deaths')

In [None]:
plot_stacked('New cases')

In [None]:
plot_stacked('Active')

In [None]:
plot_line('Confirmed')

In [None]:
plot_line('Deaths')

In [None]:
plot_line('New cases')

In [None]:
plot_line('Active')

In [None]:
temp = pd.merge(full_grouped[['Date', 'Country/Region', 'Confirmed', 'Deaths']], 
                day_wise[['Date', 'Confirmed', 'Deaths']], on='Date')
temp['% Confirmed'] = round(temp['Confirmed_x']/temp['Confirmed_y'], 3)*100
temp['% Deaths'] = round(temp['Deaths_x']/temp['Deaths_y'], 3)*100

In [None]:
fig = px.bar(temp, x='Date', y='% Confirmed', color='Country/Region', 
             range_y=(0, 100), title='% of Cases from each country', 
             color_discrete_sequence=px.colors.qualitative.Prism)
fig.show()

In [None]:
fig = px.bar(temp, x='Date', y='% Deaths', color='Country/Region', 
             range_y=(0, 100), title='% of Deaths from each country', 
             color_discrete_sequence=px.colors.qualitative.Prism)
fig.show()

# Confirmed vs Deaths

In [None]:
fig = px.scatter(country_wise.sort_values('Deaths', ascending=False).iloc[:20, :], 
                 x='Confirmed', y='Deaths', color='Country/Region', size='Confirmed', 
                 height=700, text='Country/Region', log_x=True, log_y=True, 
                 title='Deaths vs Confirmed (Scale is in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

# Composition of Cases

In [None]:
def plot_treemap(col):
    fig = px.treemap(country_wise, path=["Country/Region"], values=col, height=700,
                 title=col, color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.data[0].textinfo = 'label+text+value'
    fig.show()

In [None]:
plot_treemap('Confirmed')

In [None]:
plot_treemap('Deaths')

# Bubble Plot

In [None]:
def plot_bubble(col, pal):
    temp = full_grouped[full_grouped[col]>0].sort_values('Country/Region', ascending=False)
    fig = px.scatter(temp, x='Date', y='Country/Region', size=col, color=col, height=3000,
                    color_continuous_scale=pal)
    fig.update_layout(yaxis = dict(dtick = 1))
    fig.update(layout_coloraxis_showscale=False)
    fig.show()

In [None]:
plot_bubble('New cases', 'Viridis')

In [None]:
plot_bubble('Active', 'Viridis')

# Graphing the number of confirmed cases, active cases, deaths, recoveries

In [None]:
# helper method for flattening the data, so it can be displayed on a bar graph 
def flatten(arr):
    a = [] 
    arr = arr.tolist()
    for i in arr:
        a.append(i[0])
    return a

In [None]:
cols = confirmed_df.keys()

In [None]:
confirmed = confirmed_df.loc[:, cols[4]:cols[-1]]
deaths = deaths_df.loc[:, cols[4]:cols[-1]]
recoveries = recoveries_df.loc[:, cols[4]:cols[-1]]

In [None]:
dates = confirmed.keys()
world_cases = []
total_deaths = [] 
mortality_rate = []
recovery_rate = [] 
total_recovered = [] 
total_active = [] 

for i in dates:
    confirmed_sum = confirmed[i].sum()
    death_sum = deaths[i].sum()
    recovered_sum = recoveries[i].sum()
    
    # confirmed, deaths, recovered, and active
    world_cases.append(confirmed_sum)
    total_deaths.append(death_sum)
    total_recovered.append(recovered_sum)
    total_active.append(confirmed_sum-death_sum-recovered_sum)
    
    # calculate rates
    mortality_rate.append(death_sum/confirmed_sum)
    recovery_rate.append(recovered_sum/confirmed_sum)
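The loop above adds up each date column and divides deaths by confirmed cases. As a sketch of the same calculation without an explicit loop, the example below uses `DataFrame.sum(axis=0)` on two small made-up frames; the names `toy_confirmed` and `toy_deaths` and their values are illustrative only.

```python
import pandas as pd

# toy frames: one row per region, one column per date (values are made up)
toy_confirmed = pd.DataFrame({'1/22/20': [10, 0], '1/23/20': [20, 5]})
toy_deaths = pd.DataFrame({'1/22/20': [1, 0], '1/23/20': [2, 1]})

world_cases_toy = toy_confirmed.sum(axis=0)        # total per date column
total_deaths_toy = toy_deaths.sum(axis=0)
mortality_toy = total_deaths_toy / world_cases_toy  # element-wise, aligned by date

print(mortality_toy)
```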

In [None]:
def daily_increase(data):
    d = [] 
    for i in range(len(data)):
        if i == 0:
            d.append(data[0])
        else:
            d.append(data[i]-data[i-1])
    return d 

def moving_average(data, window_size):
    # forward-looking mean: average data[i:i+window_size], with a shorter window near the end
    averages = []
    for i in range(len(data)):
        if i + window_size < len(data):
            averages.append(np.mean(data[i:i+window_size]))
        else:
            averages.append(np.mean(data[i:len(data)]))
    return averages

# window size
window = 7

# confirmed cases
world_daily_increase = daily_increase(world_cases)
world_confirmed_avg= moving_average(world_cases, window)
world_daily_increase_avg = moving_average(world_daily_increase, window)

# deaths
world_daily_death = daily_increase(total_deaths)
world_death_avg = moving_average(total_deaths, window)
world_daily_death_avg = moving_average(world_daily_death, window)


# recoveries
world_daily_recovery = daily_increase(total_recovered)
world_recovery_avg = moving_average(total_recovered, window)
world_daily_recovery_avg = moving_average(world_daily_recovery, window)


# active 
world_active_avg = moving_average(total_active, window)
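The two helpers in this cell compute a day-over-day difference and a forward-looking window mean (the mean of `data[i:i+window_size]`, with a shorter window near the end of the series). The sketch below reproduces the same results with NumPy and pandas on a toy list. The function names `daily_increase_np` and `forward_moving_average` are hypothetical and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

def daily_increase_np(data):
    """First value unchanged, then differences between consecutive values."""
    return np.diff(np.asarray(data, dtype=float), prepend=0.0)

def forward_moving_average(data, window_size):
    """Mean of data[i:i+window_size]; the window shrinks at the end."""
    s = pd.Series(data, dtype=float)
    # a trailing rolling mean on the reversed series is a forward-looking mean
    return s[::-1].rolling(window_size, min_periods=1).mean()[::-1].to_numpy()

toy_cumulative = [1, 3, 6, 10]
toy_daily = daily_increase_np(toy_cumulative)
toy_avg = forward_moving_average(toy_cumulative, 2)
print(toy_daily, toy_avg)
```

A trailing window (averaging the current day with the preceding days) is the more common convention for smoothing case counts; the forward-looking form is kept here only to match the helpers above.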

In [None]:
days_since_1_22 = np.array([i for i in range(len(dates))]).reshape(-1, 1)
world_cases = np.array(world_cases).reshape(-1, 1)
total_deaths = np.array(total_deaths).reshape(-1, 1)
total_recovered = np.array(total_recovered).reshape(-1, 1)

In [None]:
days_in_future = 10
future_forcast = np.array([i for i in range(len(dates)+days_in_future)]).reshape(-1, 1)
adjusted_dates = future_forcast[:-days_in_future]  # drop the forecast days, keeping only observed dates

In [None]:
start = '1/22/2020'
start_date = datetime.datetime.strptime(start, '%m/%d/%Y')
future_forcast_dates = []
for i in range(len(future_forcast)):
    future_forcast_dates.append((start_date + datetime.timedelta(days=i)).strftime('%m/%d/%Y'))
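The loop above builds one formatted date string per day starting from 1/22/2020. As an alternative sketch, `pandas.date_range` can generate the same kind of list in a single call; the three-day length and the name `toy_dates` are arbitrary choices for this example.

```python
import pandas as pd

# three consecutive days starting at the notebook's start date
toy_dates = (pd.date_range(start='2020-01-22', periods=3, freq='D')
             .strftime('%m/%d/%Y')
             .tolist())
print(toy_dates)
```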

In [None]:
adjusted_dates = adjusted_dates.reshape(1, -1)[0]

In [None]:
def country_plot(x, y1, y2, y3, y4, country):
    # window is set to 7 earlier in the notebook
    confirmed_avg = moving_average(y1, window)
    confirmed_increase_avg = moving_average(y2, window)
    death_increase_avg = moving_average(y3, window)
    recovery_increase_avg = moving_average(y4, window)
    
    plt.figure(figsize=(16, 10))
    plt.plot(x, y1)
    plt.plot(x, confirmed_avg, color='red', linestyle='dashed')
    plt.legend(['{} Confirmed Cases'.format(country), 'Moving Average {} Days'.format(window)], prop={'size': 20})
    plt.title('{} Confirmed Cases'.format(country), size=30)
    plt.xlabel('Days Since 1/22/2020', size=30)
    plt.ylabel('# of Cases', size=30)
    plt.xticks(size=20)
    plt.yticks(size=20)
    plt.show()

    plt.figure(figsize=(16, 10))
    plt.bar(x, y2)
    plt.plot(x, confirmed_increase_avg, color='red', linestyle='dashed')
    plt.legend(['Moving Average {} Days'.format(window), '{} Daily Increase in Confirmed Cases'.format(country)], prop={'size': 20})
    plt.title('{} Daily Increases in Confirmed Cases'.format(country), size=30)
    plt.xlabel('Days Since 1/22/2020', size=30)
    plt.ylabel('# of Cases', size=30)
    plt.xticks(size=20)
    plt.yticks(size=20)
    plt.show()

    plt.figure(figsize=(16, 10))
    plt.bar(x, y3)
    plt.plot(x, death_increase_avg, color='red', linestyle='dashed')
    plt.legend(['Moving Average {} Days'.format(window), '{} Daily Increase in Confirmed Deaths'.format(country)], prop={'size': 20})
    plt.title('{} Daily Increases in Deaths'.format(country), size=30)
    plt.xlabel('Days Since 1/22/2020', size=30)
    plt.ylabel('# of Cases', size=30)
    plt.xticks(size=20)
    plt.yticks(size=20)
    plt.show()

    plt.figure(figsize=(16, 10))
    plt.bar(x, y4)
    plt.plot(x, recovery_increase_avg, color='red', linestyle='dashed')
    plt.legend(['Moving Average {} Days'.format(window), '{} Daily Increase in Confirmed Recoveries'.format(country)], prop={'size': 20})
    plt.title('{} Daily Increases in Recoveries'.format(country), size=30)
    plt.xlabel('Days Since 1/22/2020', size=30)
    plt.ylabel('# of Cases', size=30)
    plt.xticks(size=20)
    plt.yticks(size=20)
    plt.show()
      
# helper function for getting country's cases, deaths, and recoveries        
def get_country_info(country_name):
    country_cases = []
    country_deaths = []
    country_recoveries = []  
    
    for i in dates:
        country_cases.append(confirmed_df[confirmed_df['Country/Region']==country_name][i].sum())
        country_deaths.append(deaths_df[deaths_df['Country/Region']==country_name][i].sum())
        country_recoveries.append(recoveries_df[recoveries_df['Country/Region']==country_name][i].sum())
    return (country_cases, country_deaths, country_recoveries)
    
    
def country_visualizations(country_name):
    country_info = get_country_info(country_name)
    country_cases = country_info[0]
    country_deaths = country_info[1]
    country_recoveries = country_info[2]
    
    country_daily_increase = daily_increase(country_cases)
    country_daily_death = daily_increase(country_deaths)
    country_daily_recovery = daily_increase(country_recoveries)
    
    country_plot(adjusted_dates, country_cases, country_daily_increase, country_daily_death, country_daily_recovery, country_name)
    

In [None]:
countries = ['US', 'Russia', 'India', 'China','Pakistan'] 

for country in countries:
    country_visualizations(country)

In [None]:
# Country Comparison
# removed redundant code

compare_countries = ['US', 'Brazil', 'India', 'Pakistan'] 
graph_name = ['Coronavirus Confirmed Cases', 'Coronavirus Confirmed Deaths', 'Coronavirus Confirmed Recoveries']

for num in range(3):
    plt.figure(figsize=(16, 10))
    for country in compare_countries:
        plt.plot(get_country_info(country)[num])
    plt.legend(compare_countries, prop={'size': 20})
    plt.xlabel('Days Since 1/22/2020', size=30)
    plt.ylabel('# of Cases', size=30)
    plt.title(graph_name[num], size=30)
    plt.xticks(size=20)
    plt.yticks(size=20)
    plt.show()

In [None]:
unique_countries =  list(latest_data['Country_Region'].unique())
country_confirmed_cases = []
country_death_cases = [] 
country_active_cases = []
country_recovery_cases = []
country_incidence_rate = [] 
country_mortality_rate = [] 

no_cases = []
for i in unique_countries:
    cases = latest_data[latest_data['Country_Region']==i]['Confirmed'].sum()
    if cases > 0:
        country_confirmed_cases.append(cases)
    else:
        no_cases.append(i)
        
for i in no_cases:
    unique_countries.remove(i)
    
# sort countries by the number of confirmed cases
unique_countries = [k for k, v in sorted(zip(unique_countries, country_confirmed_cases), key=operator.itemgetter(1), reverse=True)]
for i in range(len(unique_countries)):
    country_confirmed_cases[i] = latest_data[latest_data['Country_Region']==unique_countries[i]]['Confirmed'].sum()
    country_death_cases.append(latest_data[latest_data['Country_Region']==unique_countries[i]]['Deaths'].sum())
    country_recovery_cases.append(latest_data[latest_data['Country_Region']==unique_countries[i]]['Recovered'].sum())
    country_active_cases.append(latest_data[latest_data['Country_Region']==unique_countries[i]]['Active'].sum())
    country_incidence_rate.append(latest_data[latest_data['Country_Region']==unique_countries[i]]['Incident_Rate'].sum())
    country_mortality_rate.append(country_death_cases[i]/country_confirmed_cases[i])

In [None]:
country_df = pd.DataFrame({'Country Name': unique_countries, 'Number of Confirmed Cases': country_confirmed_cases,
                          'Number of Deaths': country_death_cases, 'Number of Recoveries' : country_recovery_cases, 
                          'Number of Active Cases' : country_active_cases, 'Incidence Rate' : country_incidence_rate,
                          'Mortality Rate': country_mortality_rate})
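The per-country lists above are built by filtering `latest_data` once per country and per column. A `groupby` can express the same aggregation more compactly. The sketch below runs on a small made-up frame that reuses the `Country_Region`, `Confirmed` and `Deaths` column names; the countries and values are illustrative, and this is not the notebook's own code.

```python
import pandas as pd

toy_latest = pd.DataFrame({
    'Country_Region': ['A', 'A', 'B', 'C'],
    'Confirmed': [100, 50, 400, 0],
    'Deaths': [5, 1, 8, 0],
})

summary = (toy_latest.groupby('Country_Region')[['Confirmed', 'Deaths']]
           .sum()                                    # one row per country
           .query('Confirmed > 0')                   # drop countries with no cases
           .sort_values('Confirmed', ascending=False))
summary['Mortality Rate'] = summary['Deaths'] / summary['Confirmed']
print(summary)
```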

In [None]:
# return the data table with province/state info for a given country
def country_table(country_name):
    states = list(latest_data[latest_data['Country_Region']==country_name]['Province_State'].unique())
    state_confirmed_cases = []
    state_death_cases = [] 
    # state_recovery_cases = []
    state_active = [] 
    state_incidence_rate = [] 
    state_mortality_rate = [] 

    no_cases = [] 
    for i in states:
        cases = latest_data[latest_data['Province_State']==i]['Confirmed'].sum()
        if cases > 0:
            state_confirmed_cases.append(cases)
        else:
            no_cases.append(i)

    # remove areas with no confirmed cases
    for i in no_cases:
        states.remove(i)

    states = [k for k, v in sorted(zip(states, state_confirmed_cases), key=operator.itemgetter(1), reverse=True)]
    for i in range(len(states)):
        state_confirmed_cases[i] = latest_data[latest_data['Province_State']==states[i]]['Confirmed'].sum()
        state_death_cases.append(latest_data[latest_data['Province_State']==states[i]]['Deaths'].sum())
    #     state_recovery_cases.append(latest_data[latest_data['Province_State']==states[i]]['Recovered'].sum())
        state_active.append(latest_data[latest_data['Province_State']==states[i]]['Active'].sum())
        state_incidence_rate.append(latest_data[latest_data['Province_State']==states[i]]['Incident_Rate'].sum())
        state_mortality_rate.append(state_death_cases[i]/state_confirmed_cases[i])
        
      
    state_df = pd.DataFrame({'State Name': states, 'Number of Confirmed Cases': state_confirmed_cases,
                              'Number of Deaths': state_death_cases, 'Number of Active Cases' : state_active, 
                             'Incidence Rate' : state_incidence_rate, 'Mortality Rate': state_mortality_rate})
    # number of cases per country/region
    return state_df

In [None]:
Pakistan_table = country_table('Pakistan')
Pakistan_table.style.background_gradient(cmap='Oranges')

# USA

In [None]:
usa_df = pd.read_csv('usa_county_wise.csv')
usa_latest = usa_df[usa_df['Date'] == max(usa_df['Date'])]
usa_grouped = usa_latest.groupby('Province_State')[['Confirmed', 'Deaths']].sum().reset_index()

In [None]:
us_code = {'Alabama': 'AL', 'Alaska': 'AK', 'American Samoa': 'AS', 'Arizona': 'AZ', 'Arkansas': 'AR', 
    'California': 'CA','Colorado': 'CO','Connecticut': 'CT','Delaware': 'DE', 'District of Columbia': 'DC', 
    'Florida': 'FL', 'Georgia': 'GA', 'Guam': 'GU', 'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL',
    'Indiana': 'IN','Iowa': 'IA','Kansas': 'KS','Kentucky': 'KY','Louisiana': 'LA','Maine': 'ME',
    'Maryland': 'MD','Massachusetts': 'MA','Michigan': 'MI','Minnesota': 'MN','Mississippi': 'MS',
    'Missouri': 'MO','Montana': 'MT','Nebraska': 'NE','Nevada': 'NV','New Hampshire': 'NH', 'New Jersey': 'NJ',
    'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Northern Mariana Islands':'MP',
    'Ohio': 'OH', 'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Puerto Rico': 'PR',
    'Rhode Island': 'RI', 'South Carolina': 'SC', 'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX',
    'Utah': 'UT', 'Vermont': 'VT', 'Virgin Islands': 'VI', 'Virginia': 'VA', 'Washington': 'WA',
    'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY'}

usa_grouped['Code'] = usa_grouped['Province_State'].map(us_code)
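`Series.map` with a dictionary returns `NaN` for any name that is missing from the dictionary, so rows without a state code are silently left off the map. The sketch below shows one way to list such rows. The two-entry dictionary and the three region names are made up for the example.

```python
import pandas as pd

toy_codes = {'Texas': 'TX', 'Ohio': 'OH'}   # shortened lookup, for illustration
toy_states = pd.DataFrame({'Province_State': ['Texas', 'Ohio', 'Grand Princess']})

toy_states['Code'] = toy_states['Province_State'].map(toy_codes)

# names that did not match any key in the lookup
unmapped = toy_states.loc[toy_states['Code'].isna(), 'Province_State'].tolist()
print(unmapped)
```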

In [None]:
fig = px.choropleth(usa_grouped, color='Confirmed', locations='Code', locationmode="USA-states", 
                    scope="usa", color_continuous_scale="RdGy", title='No. of cases in USA')
fig

In [None]:
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
fig = px.choropleth(usa_latest, geojson=counties, locations='FIPS', color='Confirmed',
                          color_continuous_scale="Peach",
                            scope="usa",
                            labels={'Confirmed':'Confirmed'})
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [None]:
total_world_cases = np.sum(country_confirmed_cases)
us_confirmed = latest_data[latest_data['Country_Region']=='US']['Confirmed'].sum()
outside_us_confirmed = total_world_cases - us_confirmed

plt.figure(figsize=(16, 9))
plt.barh('United States', us_confirmed)
plt.barh('Outside United States', outside_us_confirmed)
plt.title('# of Total Coronavirus Confirmed Cases', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.show()


plt.figure(figsize=(16, 9))
plt.barh('United States', us_confirmed/total_world_cases)
plt.barh('Outside United States', outside_us_confirmed/total_world_cases)
plt.title('Share of Coronavirus Confirmed Cases (Fraction of World Total)', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.show()

In [None]:
# Only show the 10 countries with the most confirmed cases; the rest are grouped into an 'Others' category
visual_unique_countries = [] 
visual_confirmed_cases = []
others = np.sum(country_confirmed_cases[10:])

for i in range(len(country_confirmed_cases[:10])):
    visual_unique_countries.append(unique_countries[i])
    visual_confirmed_cases.append(country_confirmed_cases[i])
    
visual_unique_countries.append('Others')
visual_confirmed_cases.append(others)
def plot_bar_graphs(x, y, title):
    plt.figure(figsize=(16, 12))
    plt.barh(x, y)
    plt.title(title, size=20)
    plt.xticks(size=20)
    plt.yticks(size=20)
    plt.show()
    
# taller figure, better suited to many x values
def plot_bar_graphs_tall(x, y, title):
    plt.figure(figsize=(19, 18))
    plt.barh(x, y)
    plt.title(title, size=25)
    plt.xticks(size=25)
    plt.yticks(size=25)
    plt.show()
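
The "top countries plus Others" grouping at the start of the cell above can also be written as a small reusable helper. This is a minimal sketch, assuming the inputs are already sorted by case count in descending order. `top_n_with_others` and the demo values are hypothetical names introduced here for illustration.

```python
def top_n_with_others(names, values, n=10):
    """Keep the first n (name, value) pairs and sum the remainder into 'Others'.

    Assumes both sequences are already sorted by value, largest first.
    """
    top_names = list(names[:n])
    top_values = list(values[:n])
    # Only add the 'Others' bucket when something is actually left over.
    if len(values) > n:
        top_names.append('Others')
        top_values.append(sum(values[n:]))
    return top_names, top_values


# Made-up example data, not the notebook's country totals.
demo_names, demo_values = top_n_with_others(['A', 'B', 'C', 'D'], [50, 30, 15, 5], n=2)
print(demo_names, demo_values)
```

Keeping `n` as a parameter avoids the comment and the slice bounds drifting apart when the cutoff changes.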

In [None]:
plot_bar_graphs(visual_unique_countries, visual_confirmed_cases, '# of Covid-19 Confirmed Cases in Countries/Regions')

# Pie Chart Visualizations for COVID-19

In [None]:
def plot_pie_charts(x, y, title):
    # more muted color 
    c = ['lightcoral', 'rosybrown', 'sandybrown', 'navajowhite', 'gold',
        'khaki', 'lightskyblue', 'turquoise', 'lightslategrey', 'thistle', 'pink']
    plt.figure(figsize=(20,15))
    plt.title(title, size=20)
    plt.pie(y, colors=c,shadow=True, labels=y)
    plt.legend(x, loc='best', fontsize=12)
    plt.show()

In [None]:
plot_pie_charts(visual_unique_countries, visual_confirmed_cases, 'Covid-19 Confirmed Cases per Country')

# Pakistan

In [None]:
# Total cases of coronavirus in Pakistan
df_pk['Total Cases'] = df_pk['Cured'] + df_pk['Deaths'] + df_pk['Confirmed']
# Active cases of coronavirus in Pakistan (with the definition above this equals the 'Confirmed' column)
df_pk['Active Cases'] = df_pk['Total Cases'] - df_pk['Cured'] - df_pk['Deaths']

In [None]:
# Total cases in Pakistan as of 3 April ('Date' is still a day-first string at this point)
df1 = df_pk[df_pk['Date'] == '3/4/2020']
#print(df1)
fig = px.bar(df1, x='State/UnionTerritory', y='Total Cases', color='Total Cases', height=600)
fig.update_layout(title='Till 3rd April Total Cases in Pakistan')
fig.show()

In [None]:
# Active cases in Pakistan as of 3 April
df1 = df_pk[df_pk['Date'] == '3/4/2020']
fig = px.bar(df1, x='State/UnionTerritory', y='Active Cases', color='Active Cases',barmode='group', height=600)
fig.update_layout( title='Till 3rd April Active Cases in Pakistan')
fig.show()

In [None]:
df_pk['Date'] =pd.to_datetime(df_pk.Date,dayfirst=True)

In [None]:
# Daily cases in Pakistan by date
carona_data = df_pk.groupby(['Date'])['Total Cases'].sum().reset_index().sort_values('Date')
# Day-over-day difference of the cumulative totals; the first day keeps its own total
carona_data['Daily Cases'] = carona_data['Total Cases'].diff().fillna(carona_data['Total Cases'].iloc[0]).astype(int)
fig = px.bar(carona_data, y='Daily Cases', x='Date',hover_data =['Daily Cases'], color='Daily Cases', height=500)
fig.update_layout(
    title='Daily Cases in Pakistan Date wise')
fig.show()
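
Daily counts come from differencing the cumulative totals in date order, so the rows have to be sorted by date before taking the difference. A minimal sketch on made-up numbers, using `Series.diff` (the `demo` frame is hypothetical and is not the notebook's `df_pk`):

```python
import pandas as pd

# Made-up cumulative totals, deliberately out of date order.
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-03', '2020-04-01', '2020-04-02']),
    'Total Cases': [18, 10, 13],
})

# Sort chronologically so each row is compared with the previous day.
demo = demo.sort_values('Date').reset_index(drop=True)

# diff() leaves NaN on the first row; fill it with that day's own total.
first_total = demo['Total Cases'].iloc[0]
demo['Daily Cases'] = demo['Total Cases'].diff().fillna(first_total).astype(int)
print(demo)
```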

In [None]:
# Total cases, active cases, cured, and deaths from coronavirus in Pakistan
carona_data = df_pk.groupby(['Date'])[['Total Cases', 'Active Cases', 'Cured', 'Deaths']].sum().reset_index().sort_values('Date', ascending=False)
fig = go.Figure()
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Total Cases'],
                    mode='lines+markers',name='Total Cases'))
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Active Cases'], 
                mode='lines+markers',name='Active Cases'))
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Cured'], 
                mode='lines+markers',name='Cured'))
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Deaths'], 
                mode='lines+markers',name='Deaths'))
fig.update_layout(title_text='Curve Showing Different Cases from COVID-19 in Pakistan',plot_bgcolor='rgb(225,230,255)')
fig.show()

In [None]:
#Testing till 2 April
df_pk['Date'] =pd.to_datetime(df_pk['Date'],dayfirst=True)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_pk['Date'], y=df_pk['Cumulative Tests'],
                    mode='lines+markers',name='Cumulative Tests'))
fig.add_trace(go.Scatter(x=df_pk['Date'], y=df_pk['Still Admitted'], 
                mode='lines+markers',name='Still Admitted'))
fig.add_trace(go.Scatter(x=df_pk['Date'], y=df_pk['Confirmed'], 
                mode='lines+markers',name='Confirmed'))
fig.update_layout(title_text='TEST for COVID-19',plot_bgcolor='rgb(225,230,255)')
fig.show()

In [None]:
# Confirmed, still-admitted, and home-facility cases in Pakistan (the cumulative-tests trace is commented out below)
carona_data = df_pk.groupby(['Date'])[['Cumulative Tests', 'Still Admitted', 'Confirmed', 'Home Facility']].sum().reset_index().sort_values('Date', ascending=False)
fig = go.Figure()
#fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Cumulative Tests'], 
                #mode='lines+markers',name='Cumulative Tests'))
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Confirmed'], 
                mode='lines+markers',name='Confirmed'))
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Still Admitted'], 
                mode='lines+markers',name='Still Admitted'))
fig.add_trace(go.Scatter(x=carona_data['Date'], y=carona_data['Home Facility'], 
                mode='lines+markers',name='Home Facility'))
fig.update_layout(title_text='Curve Showing Test Performed and Status of Different Cases from COVID-19 in Pakistan',plot_bgcolor='rgb(225,230,255)')
fig.show()


In [None]:
# Latest update: most recent date in the data
last_date = df_pk['Date'].max()
last_df = df_pk[df_pk['Date'] == last_date].groupby('State/UnionTerritory')[['Confirmed', 'Deaths']].sum()

In [None]:
last_df = last_df.sort_values(by='Confirmed', ascending=False)
print('Pakistan Results by Region')
# Other cmap options are listed here: https://matplotlib.org/3.2.0/tutorials/colors/colormaps.html
last_df.style.background_gradient(cmap='Greens')

In [None]:
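# Share of cumulative confirmed cases: top 4 regions plus 'Other'
c = last_df
conf_max = c['Confirmed'].iloc[:4].copy()   # copy so adding a row does not touch last_df
conf_max.loc['Other'] = c['Confirmed'].iloc[4:].sum()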
plt.figure(figsize=(11,6))
plt.pie(conf_max, labels=conf_max.index, autopct='%1.1f%%', explode=(0,0,0,0,1), shadow=True)
plt.title('COVID-19 Cumulative Patients Partition')
plt.show()

In [None]:
confirmed_df = pd.read_csv('time_series_covid_19_confirmed.csv')
deaths_df = pd.read_csv('time_series_covid_19_deaths.csv')
recoveries_df = pd.read_csv('time_series_covid_19_recovered.csv')

# Data Preprocessing

In [None]:
cols = confirmed_df.keys()

confirmed = confirmed_df.loc[:, cols[4]:cols[-1]]
deaths = deaths_df.loc[:, cols[4]:cols[-1]]
recoveries = recoveries_df.loc[:, cols[4]:cols[-1]]

dates = confirmed.keys()
world_cases = []
total_deaths = [] 
mortality_rate = []
recovery_rate = [] 
total_recovered = [] 
total_active = [] 
 
pk_cases = [] 

for i in dates:
    confirmed_sum = confirmed[i].sum()
    death_sum = deaths[i].sum()
    recovered_sum = recoveries[i].sum()
    
    # confirmed, deaths, recovered, and active
    world_cases.append(confirmed_sum)
    total_deaths.append(death_sum)
    total_recovered.append(recovered_sum)
    total_active.append(confirmed_sum-death_sum-recovered_sum)
    
    # calculate rates
    mortality_rate.append(death_sum/confirmed_sum)
    recovery_rate.append(recovered_sum/confirmed_sum)

    # case studies 
    pk_cases.append(confirmed_df[confirmed_df['Country/Region']=='Pakistan'][i].sum())

def daily_increase(data):
    d = [] 
    for i in range(len(data)):
        if i == 0:
            d.append(data[0])
        else:
            d.append(data[i]-data[i-1])
    return d 

world_daily_increase = daily_increase(world_cases)

pk_daily_increase = daily_increase(pk_cases)

days_since_1_22 = np.arange(len(dates)).reshape(-1, 1)
world_cases = np.array(world_cases).reshape(-1, 1)
total_deaths = np.array(total_deaths).reshape(-1, 1)
total_recovered = np.array(total_recovered).reshape(-1, 1)
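
The `daily_increase` helper above keeps the first value and then takes consecutive differences. As an illustration only, the same result can be obtained with `np.diff` and its `prepend` argument. `daily_increase_np` is a hypothetical name and is not used elsewhere in the notebook.

```python
import numpy as np


def daily_increase_np(data):
    """Vectorised equivalent of daily_increase for a 1-D cumulative series."""
    data = np.asarray(data)
    # Prepending 0 makes the first difference equal to the first value itself.
    return np.diff(data, prepend=0)


# Made-up cumulative counts.
demo_increase = daily_increase_np([5, 8, 8, 15])
print(demo_increase)
```

This sketch assumes a flat list or 1-D array as input, so it should be applied before any `reshape(-1, 1)` call.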

# Forecasting

In [None]:
days_in_future = 10
future_forcast = np.array([i for i in range(len(dates)+days_in_future)]).reshape(-1, 1)
#print(future_forcast)
adjusted_dates = future_forcast[:-days_in_future]
#print(adjusted_dates)

In [None]:
start = '1/22/2020'
start_date = datetime.datetime.strptime(start, '%m/%d/%Y')
future_forcast_dates = []
for i in range(len(future_forcast)):
    future_forcast_dates.append((start_date + datetime.timedelta(days=i)).strftime('%m/%d/%Y'))

In [None]:
X_train_confirmed, X_test_confirmed, y_train_confirmed, y_test_confirmed = train_test_split(days_since_1_22, pk_cases, test_size=0.2, shuffle=False)
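
With `shuffle=False`, the split keeps the rows in time order: the earliest 80% of days are used for training and the most recent 20% for testing, so the models are always evaluated on days that come after everything they were trained on. The sketch below reproduces that behaviour with plain slicing on made-up data. The `demo_*` names are hypothetical, and the rounding (test size rounded up) is stated here as an assumption about how `train_test_split` sizes a fractional test set.

```python
import math

import numpy as np

# Made-up day index and cumulative case counts.
demo_days = np.arange(10).reshape(-1, 1)
demo_cases = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]

# Assumed sizing rule: the test share is rounded up, the rest is training data.
n_test = math.ceil(0.2 * len(demo_days))
n_train = len(demo_days) - n_test

# Without shuffling, the split is simply "first part" and "last part".
X_train_demo, X_test_demo = demo_days[:n_train], demo_days[n_train:]
y_train_demo, y_test_demo = demo_cases[:n_train], demo_cases[n_train:]
print(X_test_demo.ravel(), y_test_demo)
```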

# Predictions with SVM

In [None]:
svm_confirmed = SVR(shrinking=True, kernel='poly',gamma=0.01, epsilon=1,degree=6, C=0.1)
svm_confirmed.fit(X_train_confirmed, y_train_confirmed)
svm_pred = svm_confirmed.predict(future_forcast)

In [None]:
#svm_confirmed = LinearRegression()
#svm_confirmed.fit(X_train_confirmed, y_train_confirmed)
#svm_pred = svm_confirmed.predict(future_forcast)

# >> -8437.9142

In [None]:
#svm_confirmed = Lasso(alpha=0.01)
#svm_confirmed.fit(X_train_confirmed, y_train_confirmed)
#svm_pred = svm_confirmed.predict(future_forcast)

# >> -8438.0359

In [None]:
#svm_confirmed = Ridge(alpha=1.0)
#svm_confirmed.fit(X_train_confirmed, y_train_confirmed)
#svm_pred = svm_confirmed.predict(future_forcast)

# >> -8438.8834

In [None]:
#check against testing data

svm_test_pred = svm_confirmed.predict(X_test_confirmed)
plt.plot(y_test_confirmed)
plt.plot(svm_test_pred)
plt.legend(['Test Data', 'SVM Predictions'])
# scikit-learn metrics take (y_true, y_pred); the order matters for r2_score
print('MAE:', mean_absolute_error(y_test_confirmed, svm_test_pred))
print('MSE:', mean_squared_error(y_test_confirmed, svm_test_pred))
print('RMSE:', math.sqrt(mean_squared_error(y_test_confirmed, svm_test_pred)))
print('R-square:', r2_score(y_test_confirmed, svm_test_pred))
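
MAE and MSE give the same number whichever argument comes first, but R-squared does not: it divides by the variance of whatever is passed as the observed values. `r2_score` expects the observed values first and the predictions second. The sketch below uses a hand-written `r2` that follows the standard definition, 1 minus the residual sum of squares over the total sum of squares, to show the difference on made-up numbers. The function and data are hypothetical and only serve the illustration.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the observed values
    return 1.0 - ss_res / ss_tot


# Made-up observations and predictions.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

r2_correct = r2(actual, predicted)   # observed values first
r2_swapped = r2(predicted, actual)   # arguments reversed
print(r2_correct, r2_swapped)
```

The two results differ because the denominator changes with the first argument, which is why the argument order matters for the R-square lines in this notebook.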

# Predictions with Polynomial Regression

In [None]:
# transform our data for polynomial regression

poly = PolynomialFeatures(degree=5)
poly_X_train_confirmed = poly.fit_transform(X_train_confirmed)
# reuse the transformer fitted on the training data
poly_X_test_confirmed = poly.transform(X_test_confirmed)
poly_future_forcast = poly.transform(future_forcast)
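
For a single input column such as the day index, `PolynomialFeatures(degree=5)` with its default settings produces the columns 1, x, x^2, ..., x^5, and the linear model is then fitted on those columns. The NumPy-only sketch below builds the same kind of matrix with `np.vander`, purely to show what the expanded features look like. `demo_days` is a made-up input, and the notebook itself keeps using `PolynomialFeatures`.

```python
import numpy as np

# Made-up day indices.
demo_days = np.array([0, 1, 2, 3])

# One row per day, columns are x**0 .. x**5 (increasing powers).
demo_features = np.vander(demo_days, N=6, increasing=True)
print(demo_features)
```

The leading column of ones plays the role of the intercept, which is consistent with fitting the downstream models with `fit_intercept=False`.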

In [None]:
# polynomial regression: least squares on the expanded features
# (the bias column from PolynomialFeatures serves as the intercept)

linear_model = LinearRegression(fit_intercept=False)
linear_model.fit(poly_X_train_confirmed, y_train_confirmed)
test_linear_pred = linear_model.predict(poly_X_test_confirmed)
linear_pred = linear_model.predict(poly_future_forcast)
# scikit-learn metrics take (y_true, y_pred); the order matters for r2_score
print('MAE:', mean_absolute_error(y_test_confirmed, test_linear_pred))
print('MSE:', mean_squared_error(y_test_confirmed, test_linear_pred))
print('RMSE:', math.sqrt(mean_squared_error(y_test_confirmed, test_linear_pred)))
print('R-square:', r2_score(y_test_confirmed, test_linear_pred))

In [None]:
print(linear_model.coef_)    # fitted coefficient for each polynomial feature (powers 0 to 5 of the day index)

In [None]:
plt.plot(y_test_confirmed)
plt.plot(test_linear_pred)
plt.legend(['Test Data', 'Polynomial Regression predictions'])

# Predictions with Bayesian Ridge Polynomial Regression

In [None]:
tol = [1e-4, 1e-3, 1e-2]
alpha_1 = [1e-7, 1e-6, 1e-5, 1e-4]
alpha_2 = [1e-7, 1e-6, 1e-5, 1e-4]
lambda_1 = [1e-7, 1e-6, 1e-5, 1e-4]
lambda_2 = [1e-7, 1e-6, 1e-5, 1e-4]

bayesian_grid = {'tol': tol, 'alpha_1': alpha_1, 'alpha_2' : alpha_2, 'lambda_1': lambda_1, 'lambda_2' : lambda_2}

bayesian = BayesianRidge(fit_intercept=False)
bayesian_search = RandomizedSearchCV(bayesian, bayesian_grid, scoring='neg_mean_squared_error', cv=3, return_train_score=True, n_jobs=-1, n_iter=40, verbose=1)
bayesian_search.fit(poly_X_train_confirmed, y_train_confirmed)

In [None]:
bayesian_search.best_params_

In [None]:
bayesian_confirmed = bayesian_search.best_estimator_
test_bayesian_pred = bayesian_confirmed.predict(poly_X_test_confirmed)
bayesian_pred = bayesian_confirmed.predict(poly_future_forcast)
# scikit-learn metrics take (y_true, y_pred); the order matters for r2_score
print('MAE:', mean_absolute_error(y_test_confirmed, test_bayesian_pred))
print('MSE:', mean_squared_error(y_test_confirmed, test_bayesian_pred))
print('RMSE:', math.sqrt(mean_squared_error(y_test_confirmed, test_bayesian_pred)))
print('R-square:', r2_score(y_test_confirmed, test_bayesian_pred))

In [None]:
plt.plot(y_test_confirmed)
plt.plot(test_bayesian_pred)
plt.legend(['Test Data', 'Bayesian Ridge Polynomial Regression Predictions'])

# SVM future predictions for next 10 days 

In [None]:
plt.figure(figsize=(5, 5))
plt.plot(adjusted_dates, pk_cases)
plt.plot(future_forcast, svm_pred, linestyle='dashed', color='green')
plt.title('num of Coronavirus Cases Over Time', size=12)
plt.xlabel('Days Since 1/22/2020', size=12)
plt.ylabel('num of Cases', size=12)
plt.legend(['Confirmed Cases', 'SVM predictions'], prop={'size': 10})
plt.xticks(size=10)
plt.yticks(size=10)
plt.show()

# Polynomial Regression future predictions for next 10 days

In [None]:
plt.figure(figsize=(5, 5))
plt.plot(adjusted_dates, pk_cases)
plt.plot(future_forcast, linear_pred, linestyle='dashed', color='purple')
plt.title('num of Coronavirus Cases Over Time', size=12)
plt.xlabel('Days Since 1/22/2020', size=12)
plt.ylabel('num of Cases', size=12)
plt.legend(['Confirmed Cases', 'Polynomial Regression Predictions'], prop={'size': 10})
plt.xticks(size=10)
plt.yticks(size=10)
plt.show()

# Polynomial Bayesian Ridge Regression future predictions for next 10 days

In [None]:
plt.figure(figsize=(5, 5))
plt.plot(adjusted_dates, pk_cases)
plt.plot(future_forcast, bayesian_pred, linestyle='dashed', color='red')
plt.title('num of Coronavirus Cases Over Time', size=12)
plt.xlabel('Days Since 1/22/2020', size=12)
plt.ylabel('num of Cases', size=12)
plt.legend(['Confirmed Cases', 'Polynomial Bayesian Ridge Regression Predictions'], prop={'size': 10})
plt.xticks(size=10)
plt.yticks(size=10)
plt.show()