# COVID-19 Data Analysis & Visualization

## What is COVID-19?

>Coronavirus (COVID-19) symptoms are similar to colds and flu. They include a high temperature, a cough and a loss or change to your smell or taste. You can help stop the spread of coronavirus (COVID-19) by getting vaccinated and taking care when meeting other people. While you’re staying at home with coronavirus (COVID-19), you can ease mild symptoms by resting, drinking plenty of fluids and taking painkillers. Treatments for coronavirus (COVID-19) include antibody and antiviral medicines. They are only for people at highest risk of becoming seriously ill.

![Coronavirus Image](https://www.imperial.ac.uk/stories/two-years-of-covid/assets/JeG4hqRj7c/covid-imperial-story-title-image-wide-2560x1440.jpeg)

### Importing Libraries

In [45]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

import plotly.express as px
import plotly.graph_objects as go 

from IPython.display import display, HTML  # IPython.core.display is deprecated for these names
from ipywidgets import interact
from ipywidgets import widgets 
import folium

### Loading the data from source

In [46]:
# loading data right from the source:
death_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
country_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')
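
The three time-series URLs above differ only in the series name, so they could be built from one base path. The following is a minimal sketch of that idea; `BASE_URL` and `time_series_url` are hypothetical names introduced here for illustration and are not part of the notebook.

```python
# Hypothetical helper: builds the raw GitHub URL for one JHU CSSE global time series.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
)


def time_series_url(kind: str) -> str:
    """Return the CSV URL for kind in {'deaths', 'confirmed', 'recovered'}."""
    allowed = {"deaths", "confirmed", "recovered"}
    if kind not in allowed:
        raise ValueError(f"unknown series {kind!r}; expected one of {sorted(allowed)}")
    return f"{BASE_URL}time_series_covid19_{kind}_global.csv"


# Build all three URLs in one pass.
urls = {kind: time_series_url(kind) for kind in ("deaths", "confirmed", "recovered")}
for kind, url in urls.items():
    print(kind, url)
```

In the notebook, each frame could then be loaded with a call such as `pd.read_csv(time_series_url("deaths"))`.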

### Understanding the data

In [47]:
# Checking some of the rows from death_df dataframe
death_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/21/22,11/22/22,11/23/22,11/24/22,11/25/22,11/26/22,11/27/22,11/28/22,11/29/22,11/30/22
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,7832,7833,7833,7833,7833,7833,7833,7833,7833,7833
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,3594,3594,3594,3594,3594,3594,3594,3594,3594,3594
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,6881,6881,6881,6881,6881,6881,6881,6881,6881,6881
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,156,156,156,156,156,156,156,156,156,157
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,1917,1917,1923,1923,1923,1923,1923,1923,1923,1924


In [48]:
# Checking some of the rows from confirmed_df dataframe
confirmed_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/21/22,11/22/22,11/23/22,11/24/22,11/25/22,11/26/22,11/27/22,11/28/22,11/29/22,11/30/22
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,205229,205324,205391,205506,205541,205612,205612,205802,205830,205907
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,333257,333282,333293,333305,333316,333322,333330,333330,333338,333343
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,271028,271035,271041,271050,271057,271061,271061,271079,271082,271090
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,46824,46824,46824,46824,46824,46824,46824,46824,46824,47219
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,103131,103131,104491,104491,104491,104491,104491,104491,104491,104676


In [49]:
# Checking some of the rows from recovered_df dataframe
recovered_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,11/21/22,11/22/22,11/23/22,11/24/22,11/25/22,11/26/22,11/27/22,11/28/22,11/29/22,11/30/22
0,,Afghanistan,33.93911,67.709953,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [50]:
# Checking some of the rows from country_df dataframe
country_df.head()

Unnamed: 0,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Incident_Rate,People_Tested,People_Hospitalized,Mortality_Rate,UID,ISO3,Cases_28_Days,Deaths_28_Days
0,Afghanistan,2022-12-01 08:20:17,33.93911,67.709953,205907,7833,,,528.938544,,,3.804145,4,AFG,2642,10
1,Albania,2022-12-01 08:20:17,41.1533,20.1683,333343,3594,,,11583.258044,,,1.078169,8,ALB,347,1
2,Algeria,2022-12-01 08:20:17,28.0339,1.6596,271090,6881,,,618.206504,,,2.538271,12,DZA,250,0
3,Andorra,2022-12-01 08:20:17,42.5063,1.5218,47219,157,,,61113.052482,,,0.332493,20,AND,631,2
4,Angola,2022-12-01 08:20:17,-11.2027,17.8739,104676,1924,,,318.490679,,,1.838053,24,AGO,1545,7


In [51]:
print(death_df.shape)
print(confirmed_df.shape)
print(recovered_df.shape)
print(country_df.shape)

(289, 1048)
(289, 1048)
(274, 1048)
(201, 16)


In [52]:
death_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 289 entries, 0 to 288
Columns: 1048 entries, Province/State to 11/30/22
dtypes: float64(2), int64(1044), object(2)
memory usage: 2.3+ MB


In [53]:
confirmed_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 289 entries, 0 to 288
Columns: 1048 entries, Province/State to 11/30/22
dtypes: float64(2), int64(1044), object(2)
memory usage: 2.3+ MB


In [54]:
recovered_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 274 entries, 0 to 273
Columns: 1048 entries, Province/State to 11/30/22
dtypes: float64(2), int64(1044), object(2)
memory usage: 2.2+ MB


In [55]:
country_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201 entries, 0 to 200
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Country_Region       201 non-null    object 
 1   Last_Update          201 non-null    object 
 2   Lat                  199 non-null    float64
 3   Long_                199 non-null    float64
 4   Confirmed            201 non-null    int64  
 5   Deaths               201 non-null    int64  
 6   Recovered            0 non-null      float64
 7   Active               0 non-null      float64
 8   Incident_Rate        196 non-null    float64
 9   People_Tested        0 non-null      float64
 10  People_Hospitalized  0 non-null      float64
 11  Mortality_Rate       201 non-null    float64
 12  UID                  201 non-null    int64  
 13  ISO3                 197 non-null    object 
 14  Cases_28_Days        201 non-null    int64  
 15  Deaths_28_Days       201 non-null    int64  
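
The `info()` output above reports 0 non-null values for `Recovered`, `Active`, `People_Tested` and `People_Hospitalized`. One way to surface such columns programmatically is sketched below on a small made-up frame; `toy_country_df` and its values are illustrative assumptions, not real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for country_df; the values are illustrative, not real data.
toy_country_df = pd.DataFrame(
    {
        "Country_Region": ["A", "B", "C"],
        "Confirmed": [10, 20, 30],
        "Recovered": [np.nan, np.nan, np.nan],
        "Active": [np.nan, np.nan, np.nan],
        "Incident_Rate": [1.5, np.nan, 3.0],
    }
)

# Columns in which every value is missing.
empty_columns = toy_country_df.columns[toy_country_df.isna().all()].tolist()
print(empty_columns)

# Number of missing values per column.
missing_counts = toy_country_df.isna().sum()
print(missing_counts)
```

Running the same two expressions on `country_df` would list the columns that cannot contribute to any later totals.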

### Preparing the Data

In [56]:
# Lower-casing all column names so the four dataframes follow one convention
death_df.columns = death_df.columns.str.lower()
confirmed_df.columns = confirmed_df.columns.str.lower()
recovered_df.columns = recovered_df.columns.str.lower()
country_df.columns = country_df.columns.str.lower()

In [57]:
# Renaming some of the columns for ease of handling
death_df = death_df.rename(columns={'province/state':'state', 'country/region':'country'})
confirmed_df = confirmed_df.rename(columns={'province/state':'state', 'country/region':'country'})
recovered_df = recovered_df.rename(columns={'province/state':'state', 'country/region':'country'})
country_df = country_df.rename(columns={'country_region':'country'})
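
The lower-casing and renaming steps above repeat the same operations for each dataframe. As a hedged sketch, they could be folded into one function; `normalize_columns` and `RENAMES` are hypothetical names, and the toy frame below only carries column labels.

```python
import pandas as pd

# Hypothetical mapping; it mirrors the renames used in the cells above.
RENAMES = {
    "province/state": "state",
    "country/region": "country",
    "country_region": "country",
}


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-cased column names and the short aliases applied."""
    out = df.copy()
    out.columns = out.columns.str.lower()
    return out.rename(columns=RENAMES)


# Empty toy frame with the same leading columns as the time-series files.
toy = pd.DataFrame(columns=["Province/State", "Country/Region", "Lat", "Long", "1/22/20"])
normalized = normalize_columns(toy)
print(list(normalized.columns))
```

Because the function returns a copy, the original frame keeps its column names, which makes the step safe to rerun in a notebook.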

### Feature Engineering
#### Confirmed/Death/Recovered New Cases

In [59]:
# Creating a new feature "NewCases": the latest day's confirmed count minus the previous day's count
try:
    confirmed_df.insert(4,'NewCases',0)
except ValueError:
    # insert() raises ValueError when the column already exists (e.g. the cell was re-run)
    pass
confirmed_df['NewCases'] = confirmed_df.iloc[:,-1] - confirmed_df.iloc[:,-2]

In [60]:
# Creating a new feature "NewCases" (new deaths): the latest day's death count minus the previous day's count
try:
    death_df.insert(4,'NewCases',0)
except ValueError:
    # insert() raises ValueError when the column already exists (e.g. the cell was re-run)
    pass
death_df['NewCases'] = death_df.iloc[:,-1] - death_df.iloc[:,-2]

In [61]:
# Creating a new feature "NewCases" (new recoveries): the latest day's recovered count minus the previous day's count
try:
    recovered_df.insert(4,'NewCases',0)
except ValueError:
    # insert() raises ValueError when the column already exists (e.g. the cell was re-run)
    pass
recovered_df['NewCases'] = recovered_df.iloc[:,-1] - recovered_df.iloc[:,-2]
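
The cells above rely on the last two columns being date columns, which holds only because `NewCases` is inserted at position 4. A possible alternative, sketched here on made-up numbers, selects the date columns by name so the result does not depend on column order. `add_new_cases` and `toy_confirmed` are hypothetical, and unlike the cells above the sketch appends the new column at the end.

```python
import re

import pandas as pd

# Toy stand-in for confirmed_df after renaming; counts are illustrative.
toy_confirmed = pd.DataFrame(
    {
        "state": [None, None],
        "country": ["A", "B"],
        "lat": [0.0, 1.0],
        "long": [0.0, 1.0],
        "11/29/22": [100, 250],
        "11/30/22": [130, 250],
    }
)

# Assumption: date columns are exactly those named like m/d/yy.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{2}")


def add_new_cases(df: pd.DataFrame, column: str = "NewCases") -> pd.DataFrame:
    """Return a copy where `column` is the last date column minus the one before it."""
    out = df.copy()
    date_cols = [c for c in out.columns if DATE_PATTERN.fullmatch(str(c))]
    out[column] = out[date_cols[-1]] - out[date_cols[-2]]
    return out


once = add_new_cases(toy_confirmed)
twice = add_new_cases(once)  # re-running gives the same values
print(once["NewCases"].tolist())
```

Since `NewCases` never matches the date pattern, applying the function a second time recomputes the same difference instead of raising or shifting columns.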

### Overall Worldwide Counts

In [64]:
# Summing up the total confirmed cases across countries
confirmed_total = country_df['confirmed'].sum()
confirmed_total

643377353

In [66]:
# Summing up the total deaths across countries
deaths_total = country_df['deaths'].sum()
deaths_total

6635119

In [67]:
# Summing up the total recovered cases across countries
recovered_total = country_df['recovered'].sum()
recovered_total

0.0

In [68]:
# Summing up the total active cases across countries
active_total = country_df['active'].sum()
active_total

0.0
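
The recovered and active totals come out as `0.0` because, per `country_df.info()`, those columns hold no values at all, and `sum()` skips missing entries by default. The sketch below, on a made-up all-missing column, shows how `min_count=1` could distinguish "no data" from a true zero.

```python
import numpy as np
import pandas as pd

# Toy column that mimics country_df['recovered']: every entry is missing.
recovered = pd.Series([np.nan, np.nan, np.nan], name="recovered")

# NaN values are skipped by default, so the sum over nothing is 0.0.
default_total = recovered.sum()

# min_count=1 requires at least one valid value; otherwise the result is NaN.
strict_total = recovered.sum(min_count=1)

# Number of valid observations behind the total.
valid_count = recovered.notna().sum()

print(default_total, strict_total, valid_count)
```

Reporting the valid count next to a total makes it clear when a figure of zero reflects absent data.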

In [82]:
# display the current total stats

display(
    HTML(
        "<div style = 'background-color: #504e4e; padding:32px '>"+
        "<span style='color: #fff; font-size:32px;'> Confirmed: " + str(confirmed_total) + "</span>" + "<br>" +
        "<span style='color: red; font-size:32px;'> Deaths: " + str(deaths_total) + "</span>" + "<br>" +
        "<span style='color: lightgreen; font-size:32px;'> Recovered: " + str(recovered_total) + "</span>" + "<br>" +
        "<span style='color: lightblue; font-size:32px;'> Active: " + str(active_total) + "</span>" + "<br>" +
        "</div>"
    )
)
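
The panel above concatenates raw numbers into the markup. As an optional variation, the string could be assembled with f-strings and thousands separators. This is only a sketch: `stat_line` and `stats_panel` are hypothetical helper names, and showing `n/a` for a missing total is an assumption made here.

```python
def stat_line(label: str, value, color: str) -> str:
    """Return one styled <span> line; numbers get thousands separators."""
    text = "n/a" if value is None else f"{value:,.0f}"
    return f"<span style='color: {color}; font-size:32px;'> {label}: {text}</span><br>"


def stats_panel(stats) -> str:
    """Wrap (label, value, color) triples in the dark container used above."""
    lines = "".join(stat_line(label, value, color) for label, value, color in stats)
    return f"<div style='background-color: #504e4e; padding:32px'>{lines}</div>"


# Totals taken from the cells above; None marks a total with no underlying data.
html = stats_panel(
    [
        ("Confirmed", 643377353, "#fff"),
        ("Deaths", 6635119, "red"),
        ("Recovered", None, "lightgreen"),
    ]
)
print(html)
```

In the notebook, the result would be rendered with `display(HTML(html))`.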