# Coronavirus (COVID-19) Visualization, Comparisons & Prediction 

<center><img src='https://dreambuildingtest.files.wordpress.com/2019/10/6cac4-background-machine-learning-franki-chamaki-z4h9mymwima-unsplash-2000-529.jpg' width="1000" height="200"></center>

## Introduction 
Coronaviruses are a family of viruses named after the crown-like spikes on their surface. The novel coronavirus, also known as SARS-CoV-2, is a contagious respiratory virus that was first reported in Wuhan, China. On 2/11/2020, the World Health Organization designated the name COVID-19 for the disease caused by the novel coronavirus.


#### This notebook explores the COVID-19 situation worldwide and in Iran through data analysis and projections.




   Coronavirus Case Data is provided by <a href='https://github.com/CSSEGISandData/COVID-19'>Johns Hopkins University</a>
   <br>Mobility data is provided by <a href='https://www.apple.com/covid19/mobility'>Apple</a>
   <br>Learn more from the <a href='https://www.who.int/emergencies/diseases/novel-coronavirus-2019'>World Health Organization</a>
   <br>Learn more from the <a href='https://www.cdc.gov/coronavirus/2019-ncov'>Centers for Disease Control and Prevention</a>
   <br>Check out map visualizations from the <a href='https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6'>JHU CSSE Dashboard</a>
   
   > Don't panic. Stay strong, world. We can get through this! Follow WHO guidelines: wear masks, wash your hands, and stay safe and healthy.

## Import Libraries

In [3]:
import pandas as pd 
import numpy as np 

import matplotlib.pyplot as plt 
import matplotlib.colors as mcolors

import random
import math
import time
import datetime
import operator 

from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error

plt.style.use('fivethirtyeight')
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

#### Import the data

In [4]:
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recoveries_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/12-25-2020.csv')
us_medical_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/12-25-2020.csv')
apple_mobility = pd.read_csv('https://covid19-static.cdn-apple.com/covid19-mobility-data/2023HotfixDev19/v3/en-us/applemobilitytrends-2020-12-25.csv')

In [3]:
confirmed_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/27/20,11/28/20,11/29/20,11/30/20,12/1/20,12/2/20,12/3/20,12/4/20,12/5/20,12/6/20
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,45723,45844,46116,46274,46516,46718,46837,46837,47072,47306
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,36245,36790,37625,38182,39014,39719,40501,41302,42148,42988
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,80168,81212,82221,83199,84152,85084,85927,86730,87502,88252
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,6610,6610,6712,6745,6790,6842,6904,6955,7005,7050
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,15008,15087,15103,15139,15251,15319,15361,15493,15536,15591


In [4]:
deaths_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/27/20,11/28/20,11/29/20,11/30/20,12/1/20,12/2/20,12/3/20,12/4/20,12/5/20,12/6/20
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,1740,1752,1774,1795,1822,1841,1846,1846,1864,1874
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,771,787,798,810,822,839,852,870,889,905
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,2372,2393,2410,2431,2447,2464,2480,2492,2501,2516
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,76,76,76,76,76,76,77,77,78,78
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,342,345,346,348,350,351,352,353,354,354


In [5]:
recoveries_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/27/20,11/28/20,11/29/20,11/30/20,12/1/20,12/2/20,12/3/20,12/4/20,12/5/20,12/6/20
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,36295,36709,36716,36831,36946,37218,37260,37260,37393,37685
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,17755,18152,18481,18849,19384,19912,20484,20974,21286,21617
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,51946,52568,53204,53809,54405,54990,55538,56079,56617,57146
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,5710,5710,5794,5873,5940,5988,6066,6130,6171,6238
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,7697,7763,7763,7851,7932,8139,8244,8299,8335,8338


In [6]:
latest_data.head()

Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
0,,,,Afghanistan,2020-12-06 05:26:18,33.93911,67.709953,47072,1864,37393,7815.0,Afghanistan,120.919615,3.959891
1,,,,Albania,2020-12-06 05:26:18,41.1533,20.1683,42148,889,21286,19973.0,Albania,1464.591007,2.109234
2,,,,Algeria,2020-12-06 05:26:18,28.0339,1.6596,87502,2501,56617,28384.0,Algeria,199.543714,2.85822
3,,,,Andorra,2020-12-06 05:26:18,42.5063,1.5218,7005,78,6171,756.0,Andorra,9066.200738,1.11349
4,,,,Angola,2020-12-06 05:26:18,-11.2027,17.8739,15536,354,8335,6847.0,Angola,47.27035,2.278579


In [7]:
us_medical_data.head()

Unnamed: 0,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,FIPS,Incident_Rate,Total_Test_Results,People_Hospitalized,Case_Fatality_Ratio,UID,ISO3,Testing_Rate,Hospitalization_Rate
0,Alabama,US,2020-12-06 05:30:26,32.3182,-86.9023,267589,3877,168387.0,95325.0,1.0,5457.452656,1637161.0,,1.448864,84000001.0,USA,33389.745645,
1,Alaska,US,2020-12-06 05:30:26,61.3707,-152.4044,36271,143,7165.0,28963.0,2.0,4958.136547,1067231.0,,0.394254,84000002.0,USA,145887.265992,
2,American Samoa,US,2020-12-06 05:30:26,-14.271,-170.132,0,0,,0.0,60.0,0.0,2140.0,,,16.0,ASM,3846.084722,
3,Arizona,US,2020-12-06 05:30:26,33.7298,-111.4312,358900,6925,55640.0,296335.0,4.0,4930.814043,2349913.0,,1.929507,84000004.0,USA,32284.714463,
4,Arkansas,US,2020-12-06 05:30:26,34.9697,-92.3731,169382,2620,148131.0,18631.0,5.0,5612.756826,1748446.0,,1.5468,84000005.0,USA,57937.692441,


## Data Preprocessing and Exploring 

In [8]:
cols = confirmed_df.keys()

In [9]:
cols[:10]

Index(['Province/State', 'Country/Region', 'Lat', 'Long', '1/22/20', '1/23/20',
       '1/24/20', '1/25/20', '1/26/20', '1/27/20'],
      dtype='object')

In [11]:
# Select the date columns (column 4 through the last column)
confirmed = confirmed_df.loc[:, cols[4]:cols[-1]]
# Equivalent positional form: confirmed_df.iloc[:, 4:]
deaths = deaths_df.loc[:, cols[4]:cols[-1]]
recoveries = recoveries_df.loc[:, cols[4]:cols[-1]]
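
The label-based slice used in this cell can be checked against a positional slice on a small frame. The sketch below is only an illustration: `toy_df` and its values are made up, and it assumes the same layout as the JHU files (four metadata columns followed by one column per date).

```python
import pandas as pd

# Hypothetical toy frame mimicking the JHU layout:
# four metadata columns, then one column per date.
toy_df = pd.DataFrame({
    'Province/State': [None, None],
    'Country/Region': ['A', 'B'],
    'Lat': [0.0, 1.0],
    'Long': [0.0, 1.0],
    '1/22/20': [1, 0],
    '1/23/20': [2, 1],
})
toy_cols = toy_df.keys()

# Label-based slice: .loc includes both endpoints of the label range.
by_label = toy_df.loc[:, toy_cols[4]:toy_cols[-1]]

# Position-based slice: everything from the fifth column onward.
by_position = toy_df.iloc[:, 4:]

print(by_label.equals(by_position))
```

Both forms select the same columns here. The label form relies on the column order of the frame it is applied to, so reusing `cols` from `confirmed_df` on the other frames assumes that all three files share the same column layout.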

In [12]:
dates = confirmed.keys()

world_cases = []
total_deaths = [] 
total_recovered = [] 
total_active = [] 
mortality_rate = []
recovery_rate = [] 

for i in dates:
    confirmed_sum = confirmed[i].sum()                      # Sum of confirmed cases worldwide
    death_sum = deaths[i].sum()                             # Sum of deaths worldwide
    recovered_sum = recoveries[i].sum()                     # Sum of recoveries worldwide
    
    # confirmed, deaths, recovered, and active
    world_cases.append(confirmed_sum)
    total_deaths.append(death_sum)
    total_recovered.append(recovered_sum)
    total_active.append(confirmed_sum-death_sum-recovered_sum)
    
    # calculate rates
    mortality_rate.append(death_sum/confirmed_sum)
    recovery_rate.append(recovered_sum/confirmed_sum)
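
The per-date loop above can also be written with column-wise sums. The following is a minimal sketch on made-up numbers, not the notebook's data; the `toy_*` names are hypothetical and stand in for `confirmed`, `deaths`, and `recoveries`.

```python
import pandas as pd

# Hypothetical toy frames: rows are regions, columns are dates.
toy_confirmed = pd.DataFrame({'1/22/20': [10, 5], '1/23/20': [20, 10]})
toy_deaths = pd.DataFrame({'1/22/20': [1, 0], '1/23/20': [2, 1]})
toy_recoveries = pd.DataFrame({'1/22/20': [3, 1], '1/23/20': [6, 4]})

# Sum over regions (axis=0) to get one total per date.
cases = toy_confirmed.sum(axis=0)
dead = toy_deaths.sum(axis=0)
recovered = toy_recoveries.sum(axis=0)

# Same definitions as in the loop: active = confirmed - deaths - recovered,
# and both rates are fractions of the confirmed total.
active = cases - dead - recovered
mortality = dead / cases
recovery = recovered / cases

print(pd.DataFrame({'cases': cases, 'active': active,
                    'mortality': mortality, 'recovery': recovery}))
```

The results are pandas Series indexed by date rather than plain lists; calling `.tolist()` on each would give the same shape as the lists built in the loop.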

In [13]:
def daily_increase(data):
    d = [] 
    for i in range(len(data)):
        if i == 0:
            d.append(data[0])
        else:
            d.append(data[i]-data[i-1])
    return d 

def moving_average(data, window_size):
    # Forward-looking average: entry i is the mean of data[i:i+window_size].
    # Near the end of the series the slice is shorter, so the window shrinks.
    averages = []
    for i in range(len(data)):
        averages.append(np.mean(data[i:i+window_size]))
    return averages

# window size
window = 7

# confirmed cases
world_daily_increase = daily_increase(world_cases)
world_confirmed_avg = moving_average(world_cases, window)
world_daily_increase_avg = moving_average(world_daily_increase, window)

# deaths
world_daily_death = daily_increase(total_deaths)
world_death_avg = moving_average(total_deaths, window)
world_daily_death_avg = moving_average(world_daily_death, window)


# recoveries
world_daily_recovery = daily_increase(total_recovered)
world_recovery_avg = moving_average(total_recovered, window)
world_daily_recovery_avg = moving_average(world_daily_recovery, window)


# active 
world_active_avg = moving_average(total_active, window)
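
Note that `moving_average` above averages `data[i:i+window_size]`, so each value looks forward in time. The sketch below shows, on made-up numbers, how the same quantities could be computed with pandas, and contrasts the forward-looking window with the more common trailing window. The series `cumulative` and the window of 3 are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

cumulative = pd.Series([1, 3, 6, 10, 15])   # made-up cumulative totals
window = 3

# Daily increase: first difference, with the first day's increase
# set to the first cumulative value (as daily_increase does).
daily = cumulative.diff().fillna(cumulative.iloc[0])

# Forward-looking mean over data[i:i+window], matching moving_average:
# reverse the series, take a trailing rolling mean, then reverse back.
forward_avg = cumulative[::-1].rolling(window, min_periods=1).mean()[::-1]

# Trailing mean over the current value and the previous window-1 values.
trailing_avg = cumulative.rolling(window, min_periods=1).mean()

print(daily.tolist())
print(np.round(forward_avg.tolist(), 3))
print(np.round(trailing_avg.tolist(), 3))
```

A trailing window uses only values up to the current day, which matters when the smoothed curve is plotted next to the raw one: the forward-looking version appears shifted toward earlier dates.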