# Data Visualizations on the COVID19 Dataset
In this project, I visualize the COVID19 dataset through its confirmed cases and deaths.

## Table of Contents
   ### Explore the Data
        1. Universal Growth Of COVID19 Over Time
        2. Trend Of COVID19 In Top 15 Countries
        3. Mortality Rate
        4. Specific Country Growth Of COVID19
            4.1 Hong Kong
            4.2 China
   ### Ending
## Explore the Data

In [1]:
import pandas as pd
import os
if not os.path.exists("images"):
    os.mkdir("images")
import numpy as np
import datetime as dt
import requests
import sys
from itertools import chain
import pycountry
import pycountry_convert as pc
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [8]:
main = pd.read_csv('covid_19_data.csv')

In [9]:
main.rename(columns={'ObservationDate':'Date',
                     'Province/State':'ProvinceState',
                     'Country/Region':'CountryRegion'},inplace=True)


main.loc[main['CountryRegion']=='Mainland China','CountryRegion']='China'
main['Date'] = pd.to_datetime(main['Date'],format='%m/%d/%Y')
main['Day'] = main.Date.dt.dayofyear
main["caseslag"] = main.groupby(["CountryRegion","ProvinceState"])["Confirmed"].shift(1)
main['deathslag'] = main.groupby(['CountryRegion','ProvinceState'])['Deaths'].shift(1)

main['DailyCases'] = main['Confirmed'] - main['caseslag']
main['DailyDeaths'] = main['Deaths'] - main['deathslag']
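
One thing to watch in the lag columns: by default, pandas `groupby` excludes rows whose grouping key is missing, so every row without a `ProvinceState` gets no lag value and no daily difference. The sketch below reproduces this on a toy frame and shows one possible workaround, filling missing provinces with a placeholder label before grouping. The toy data and the `"(none)"` label are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Toy data (hypothetical): country "B" has no province, like many rows in the dataset.
toy = pd.DataFrame({
    "CountryRegion": ["A", "A", "B", "B"],
    "ProvinceState": ["P", "P", None, None],
    "Confirmed": [1.0, 3.0, 2.0, 5.0],
})

# Default behaviour: rows with a missing key belong to no group, so their lag is NaN.
lag_default = toy.groupby(["CountryRegion", "ProvinceState"])["Confirmed"].shift(1)

# Workaround (assumption): label missing provinces with a placeholder before grouping.
filled = toy.assign(ProvinceState=toy["ProvinceState"].fillna("(none)"))
lag_filled = filled.groupby(["CountryRegion", "ProvinceState"])["Confirmed"].shift(1)
daily = filled["Confirmed"] - lag_filled
```

With the placeholder, country "B" gets a daily value on its second day, while the first day of each group still has no previous day to subtract.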

In [10]:
display(main.head())
display(main.info())
display(main.describe())

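| | SNo | Date | ProvinceState | CountryRegion | Last Update | Confirmed | Deaths | Recovered | Day | caseslag | deathslag | DailyCases | DailyDeaths |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-01-22 | Anhui | China | 1/22/2020 17:00 | 1.0 | 0.0 | 0.0 | 22 | NaN | NaN | NaN | NaN |
| 1 | 2 | 2020-01-22 | Beijing | China | 1/22/2020 17:00 | 14.0 | 0.0 | 0.0 | 22 | NaN | NaN | NaN | NaN |
| 2 | 3 | 2020-01-22 | Chongqing | China | 1/22/2020 17:00 | 6.0 | 0.0 | 0.0 | 22 | NaN | NaN | NaN | NaN |
| 3 | 4 | 2020-01-22 | Fujian | China | 1/22/2020 17:00 | 1.0 | 0.0 | 0.0 | 22 | NaN | NaN | NaN | NaN |
| 4 | 5 | 2020-01-22 | Gansu | China | 1/22/2020 17:00 | 0.0 | 0.0 | 0.0 | 22 | NaN | NaN | NaN | NaN |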


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306429 entries, 0 to 306428
Data columns (total 13 columns):
 #   Column         Non-Null Count   Dtype         
---  ------         --------------   -----         
 0   SNo            306429 non-null  int64         
 1   Date           306429 non-null  datetime64[ns]
 2   ProvinceState  228329 non-null  object        
 3   CountryRegion  306429 non-null  object        
 4   Last Update    306429 non-null  object        
 5   Confirmed      306429 non-null  float64       
 6   Deaths         306429 non-null  float64       
 7   Recovered      306429 non-null  float64       
 8   Day            306429 non-null  int64         
 9   caseslag       227568 non-null  float64       
 10  deathslag      227568 non-null  float64       
 11  DailyCases     227568 non-null  float64       
 12  DailyDeaths    227568 non-null  float64       
dtypes: datetime64[ns](1), float64(7), int64(2), object(3)
memory usage: 30.4+ MB


None

| | SNo | Confirmed | Deaths | Recovered | Day | caseslag | deathslag | DailyCases | DailyDeaths |
|---|---|---|---|---|---|---|---|---|---|
| count | 306429.0 | 306429.0 | 306429.0 | 306429.0 | 306429.0 | 227568.0 | 227568.0 | 227568.0 | 227568.0 |
| mean | 153215.0 | 85670.91 | 2036.403268 | 50420.29 | 174.961939 | 79966.19 | 1956.884747 | 501.816657 | 10.23277 |
| std | 88458.577156 | 277551.6 | 6410.938048 | 201512.4 | 103.987447 | 248474.8 | 5856.901547 | 2289.946998 | 49.323903 |
| min | 1.0 | -302844.0 | -178.0 | -854405.0 | 1.0 | -302844.0 | -178.0 | -302844.0 | -5341.0 |
| 25% | 76608.0 | 1042.0 | 13.0 | 11.0 | 88.0 | 1227.75 | 15.0 | 2.0 | 0.0 |
| 50% | 153215.0 | 10375.0 | 192.0 | 1751.0 | 162.0 | 11732.0 | 234.0 | 63.0 | 1.0 |
| 75% | 229822.0 | 50752.0 | 1322.0 | 20270.0 | 265.0 | 48299.5 | 1408.0 | 271.0 | 6.0 |
| max | 306429.0 | 5863138.0 | 112385.0 | 6399531.0 | 366.0 | 5692920.0 | 112379.0 | 302844.0 | 4068.0 |


## Universal Growth Of COVID19 Over Time 
In this part, I take a quick look at how COVID19 has grown throughout the world, using interactive graphs to show its daily impact worldwide.

In [11]:
def daily_count(main):
    # Daily values are the day-over-day differences of the cumulative totals.
    # The first day has no previous day, so it is set to 0.
    main['DailyCases'] = main['Confirmed'].diff().fillna(0)
    main['DailyDeaths'] = main['Deaths'].diff().fillna(0)
    return main

In [12]:
main_world = main.copy()
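# Select several columns with a list (double brackets); tuple-style selection fails on recent pandas.
main_world = main_world.groupby('Date',as_index=False)[['Confirmed','Deaths','DailyCases','DailyDeaths']].sum()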
main_world = daily_count(main_world)

In [13]:
def draw_graph(df, x, y1, y2, title, days=7):
    colors = dict(case='#4285F4', death='#EA4335')
    df['cases_avg'] = df[y1].rolling(days).mean()
    df['deaths_avg'] = df[y2].rolling(days).mean()
    fig = make_subplots(specs=[[{"secondary_y": True}]])
    fig.add_trace(go.Scatter(name='DailyCases', x=df[x], y=df[y1], mode='lines',
                             line=dict(width=0.5, color=colors['case'])),
                  secondary_y=False)
    fig.add_trace(go.Scatter(name='DailyDeaths', x=df[x], y=df[y2], mode='lines',
                             line=dict(width=0.5, color=colors['death'])),
                  secondary_y=True)
    fig.add_trace(go.Scatter(name='Cases: <br>' + str(days) + '-Day Average',
                             x=df[x], y=df['cases_avg'], mode='lines',
                             line=dict(width=3, color=colors['case'])),
                  secondary_y=False)
    fig.add_trace(go.Scatter(name='Deaths: <br>' + str(days) + '-Day Average',
                             x=df[x], y=df['deaths_avg'], mode='lines',
                             line=dict(width=3, color=colors['death'])),
                  secondary_y=True)

    fig.update_yaxes(title_text='Cases', title_font=dict(color=colors['case']), secondary_y=False, nticks=5,
                     tickfont=dict(color=colors['case']), linewidth=2, linecolor='black', gridcolor='darkgray',
                     zeroline=False)
    fig.update_yaxes(title_text='Deaths', title_font=dict(color=colors['death']), secondary_y=True, nticks=5,
                     tickfont=dict(color=colors['death']), linewidth=2, linecolor='black', gridcolor='darkgray',
                     zeroline=False)

    fig.update_layout(title=title, height=400, width=700,
                      margin=dict(l=0, r=0, t=60, b=30), hovermode='x',
                      legend=dict(x=0.01, y=0.99, bordercolor='black', borderwidth=1, bgcolor='#93C47D',
                                  font=dict(family='arial', size=10)),
                      xaxis=dict(mirror=True, linewidth=2, linecolor='black', gridcolor='darkgray'),
                      plot_bgcolor='rgb(255,255,255)')
    return fig
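
The smoothed lines in `draw_graph` come from `rolling(days).mean()`. As a small illustration, the sketch below shows that a 7-day window leaves the first six values empty, and that `min_periods=1` is one way to average whatever days are available instead. The toy series is made up for the example; whether early partial averages are desirable is a judgment call.

```python
import pandas as pd

# Toy daily counts (hypothetical values).
daily = pd.Series([0, 2, 4, 6, 8, 10, 12, 14], dtype=float)

# Strict window: the result is NaN until 7 observations are available.
strict = daily.rolling(7).mean()

# Lenient window: averages over as many days as exist so far (at least 1).
lenient = daily.rolling(7, min_periods=1).mean()
```

On the plot, the strict version simply starts the thick average line six days later than the thin daily line.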

In [29]:
fig = draw_graph(
    main_world,
    'Date',
    'DailyCases',
    'DailyDeaths',
    '<b>Worldwide: Daily Cases & Deaths</b><br>   With 7-Day Averages')

fig.show()

From this graph, we can see that both cases and deaths grew exponentially from March 2020 until January 2021. They then decreased sharply, and I think vaccines played an important role in that. However, cases and deaths started rising again around 8 March 2021.

## Trend Of COVID19 in Top 15 Countries 
In this section, I find the top 15 affected countries. Confirmed and Deaths are cumulative sums up to each date, so I rank the countries using the data from the last date.

In [15]:
last_Date = main.Date.max()
main_CountryRegion = main[main["Date"]==last_Date]
main_CountryRegion = main_CountryRegion.groupby("CountryRegion",as_index=False)[["Confirmed","Deaths"]].sum()
main_CountryRegion = main_CountryRegion.nlargest(15,"Confirmed")

In [16]:
top_trend = main.groupby(["Date","CountryRegion"],as_index=False)[["Confirmed","Deaths"]].sum()
top_trend = top_trend.merge(main_CountryRegion,on="CountryRegion")

top_trend.drop(["Confirmed_y","Deaths_y"],axis=1,inplace=True)
top_trend.rename(columns={ "Confirmed_x" : "Cases",
                           "Deaths_x" : "Deaths" },inplace=True)

In [17]:
top_trend["log(Cases)"] = np.log(top_trend["Cases"]+1)
top_trend["log(Deaths)"] = np.log(top_trend["Deaths"]+1)
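
The merge-and-drop step used to build `top_trend` can also be written as a filter on the country names. The sketch below is one possible alternative on toy data, not the notebook's own method; the frame, the country labels and the top-1 cutoff are assumptions chosen to keep the example small.

```python
import pandas as pd

# Toy cumulative data (hypothetical values).
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]),
    "CountryRegion": ["A", "B", "A", "B"],
    "Confirmed": [10.0, 1.0, 20.0, 2.0],
    "Deaths": [1.0, 0.0, 2.0, 0.0],
})

# Names of the top-N countries on the last date (N=1 in this toy example).
last = toy[toy["Date"] == toy["Date"].max()]
top_names = last.nlargest(1, "Confirmed")["CountryRegion"]

# Keep only those countries; no duplicated _x/_y columns are created.
trend = (toy[toy["CountryRegion"].isin(top_names)]
         .groupby(["Date", "CountryRegion"], as_index=False)[["Confirmed", "Deaths"]].sum()
         .rename(columns={"Confirmed": "Cases"}))
```

Filtering with `isin` avoids the suffix columns that a merge produces, so there is nothing to drop afterwards.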

In [18]:
fig_Cases_Growth = px.line(top_trend, x = "Date", y = "Cases", color="CountryRegion",
                title = "COVID19 Total Cases Growth for top 15 Countries")
fig_Cases_Growth.update_layout(hovermode="closest", template="seaborn", width=800,
                    xaxis=dict(mirror=True, linewidth=2, linecolor="black",
                               showgrid=False),
                    yaxis=dict(mirror=True,linewidth=2,linecolor="black"))

fig_Cases_Growth.show()

In [19]:
fig_log_Cases_Growth = px.line(top_trend, x = "Date", y = "log(Cases)", color="CountryRegion",
                title = "COVID19 Total Cases Growth for top 15 Countries in log scale")
fig_log_Cases_Growth.update_layout(hovermode="closest", template="seaborn", width=800,
                    xaxis=dict(mirror=True, linewidth=2, linecolor="black",
                               showgrid=False),
                    yaxis=dict(mirror=True,linewidth=2,linecolor="black"))

fig_log_Cases_Growth.show()

In [20]:
fig_Deaths_Growth = px.line(top_trend, x = "Date", y = "Deaths", color="CountryRegion",
                title = "COVID19 Total Deaths Growth for top 15 Countries")
fig_Deaths_Growth.update_layout(hovermode="closest", template="seaborn", width=800,
                    xaxis=dict(mirror=True, linewidth=2, linecolor="black",
                               showgrid=False),
                    yaxis=dict(mirror=True,linewidth=2,linecolor="black"))

fig_Deaths_Growth.show()

In [21]:
fig_log_Deaths_Growth = px.line(top_trend, x = "Date", y = "log(Deaths)", color="CountryRegion",
                title = "COVID19 Total Deaths Growth for top 15 Countries in log scale")
fig_log_Deaths_Growth.update_layout(hovermode="closest", template="seaborn", width=800,
                    xaxis=dict(mirror=True, linewidth=2, linecolor="black",
                               showgrid=False),
                    yaxis=dict(mirror=True,linewidth=2,linecolor="black"))

fig_log_Deaths_Growth.show()

Below is my own analysis of the line plots above for the top 15 affected countries.

- Cases and deaths in the US, Brazil and India increase strongly over time.
- Cases and deaths in the rest of the countries increase almost exponentially.
- Most of the affected countries are Western European countries.

I have also plotted the top 15 affected countries on a log scale, because it makes the analysis more specific and more representative: countries with smaller totals stay visible next to the largest ones.
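
A short numeric check helps explain why the log scale is useful for judging growth: a series that doubles every step rises by the same amount at every step once the logarithm is taken, so exponential growth appears as a straight line. The series below is made up for illustration.

```python
import numpy as np

# Toy cumulative cases that double every day (hypothetical values).
cases = 100 * 2.0 ** np.arange(6)

# On the log scale, every day adds the same constant, log(2).
log_steps = np.diff(np.log(cases))

# np.log1p(x) computes log(x + 1), the transform used for the plots above.
log_cases = np.log1p(cases)
```

As far as I know, `px.line` also accepts `log_y=True`, which keeps the axis labelled in the original units instead of plotting a transformed column.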

## Mortality Rate 
Next, I calculate the mortality rate as the number of deaths divided by the number of confirmed cases, expressed as a percentage.

In [22]:
top_trend["Mortality Rate%"] = round((top_trend.Deaths/top_trend.Cases)*100,2)

fig_mortality_rate = px.line(top_trend,x="Date",y="Mortality Rate%",
                             color="CountryRegion",
                             title="Mortality Rate % <br> (TOP 15 Countries)")

fig_mortality_rate.update_layout(hovermode='closest',
                                 template='seaborn',
                                 width=700,
                                 xaxis=dict(mirror=True,linewidth=2,linecolor='black',showgrid=False),
                                 yaxis=dict(mirror=True,linewidth=2,linecolor='black'))
fig_mortality_rate.show()

The result is shown above. Interestingly, Iran shows a 100% mortality rate on Feb 19, 2020.
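
An extreme value like this is typical when the counts are still very small, because a ratio of two small numbers is unstable. The sketch below shows one possible way to handle it: hide the rate until a country has reached a minimum number of cases. The toy numbers and the `MIN_CASES` threshold are assumptions for illustration only.

```python
import pandas as pd

# Toy cumulative counts (hypothetical values).
toy = pd.DataFrame({"Cases": [0, 2, 50, 1000], "Deaths": [0, 2, 1, 25]})

MIN_CASES = 100  # hypothetical threshold; choose what suits the analysis

# Raw rate: 0/0 becomes NaN, and 2 deaths out of 2 cases gives 100%.
rate = (toy["Deaths"] / toy["Cases"] * 100).round(2)

# Keep the rate only where enough cases have been recorded; otherwise NaN.
toy["Mortality Rate%"] = rate.where(toy["Cases"] >= MIN_CASES)
```

Plotly leaves gaps for NaN values, so the early, noisy part of each line would simply not be drawn.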

## Specific Country Growth Of COVID19

## Hong Kong 
I have not picked the most affected countries, because I think most people would choose them for their analysis. Instead, I have picked Hong Kong, where I live, and China.

In [23]:
main_hk = main.query("CountryRegion=='Hong Kong'")
main_hk = main_hk.groupby('Date',as_index=False)[['Confirmed','Deaths','DailyCases','DailyDeaths']].sum()

In [24]:
fig_hk = draw_graph(
    main_hk,
    'Date',
    'DailyCases',
    'DailyDeaths',
    '<b>Hong Kong: Daily Cases & Deaths</b><br>   With 7-Day Averages')

fig_hk.show()

Compared with the US, Brazil and India, Hong Kong has far fewer cases and deaths. This is largely because Hong Kong's population is much smaller than that of large countries such as the US, Brazil and India.
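
To compare regions of very different sizes more fairly, counts can be scaled by population, for example as cases per 100,000 residents. The sketch below uses placeholder regions with made-up numbers; the dataset itself has no population column, so real figures would have to come from another source.

```python
import pandas as pd

# Placeholder regions with hypothetical case counts and populations.
toy = pd.DataFrame({
    "Region": ["Region A", "Region B"],
    "Confirmed": [12_000.0, 3_000_000.0],
    "Population": [7_500_000, 300_000_000],
})

# Cases per 100,000 residents, rounded to one decimal place.
toy["CasesPer100k"] = (toy["Confirmed"] / toy["Population"] * 100_000).round(1)
```

The same scaling applies to deaths, and the resulting column can be passed to `draw_graph` or `px.line` like any other.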

In [25]:
main_plot = main.groupby(['Date','CountryRegion','ProvinceState'],
                     as_index=False)[['Confirmed','Deaths']].sum()

## Mainland China

In [26]:
main_china = main_plot.query("CountryRegion=='China'")
fig_china_cases = px.line(main_china,
                    x='Date',
                    y="Confirmed",
                    color='ProvinceState',
                    title='Total Cases Growth For China')
fig_china_cases.update_layout(hovermode='closest',
                        template='seaborn',
                        width=1200,
                        xaxis=dict(mirror=True,linewidth=2,linecolor='black',showgrid=False),
                        yaxis=dict(mirror=True,linewidth=2,linecolor='black'))

fig_china_cases.show()

In [27]:
fig_china_deaths = px.line(main_china,
                    x='Date',
                    y="Deaths",
                    color='ProvinceState',
                    title='Total Deaths Growth For China')
fig_china_deaths.update_layout(hovermode='closest',
                        template='seaborn',
                        width=1200,
                        xaxis=dict(mirror=True,linewidth=2,linecolor='black',showgrid=False),
                        yaxis=dict(mirror=True,linewidth=2,linecolor='black'))
fig_china_deaths.show()

From the graphs above, we can conclude that almost all cases are from Hubei Province.
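
A statement like this can be backed with a number by computing each province's share of the national total on a given date. The sketch below uses placeholder provinces and made-up counts; with the real data, the input would be the rows of `main_china` for the last date.

```python
import pandas as pd

# Placeholder provinces with hypothetical cumulative counts on one date.
toy = pd.DataFrame({
    "ProvinceState": ["Province A", "Province B", "Province C"],
    "Confirmed": [800.0, 150.0, 50.0],
})

# Each province's percentage share of the total confirmed cases.
toy["Share%"] = (toy["Confirmed"] / toy["Confirmed"].sum() * 100).round(1)

# The province with the largest share.
top = toy.sort_values("Share%", ascending=False).iloc[0]
```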

## Ending 
In this project, I have done some simple data visualization on the COVID19 dataset to give a brief picture of how COVID19 has spread around the world and affected it. Every graph above shows that COVID19 has already spread around the world and caused many deaths.