# COVID-19 Pandemic

**The coronavirus is a new virus that is causing havoc and panic in nations around the world.**

*This is a small attempt to surface some insights about coronavirus cases in countries that have suffered a heavy loss of life. I have used the plotly and seaborn visualisation libraries to present these insights visually.*

**The main purpose of this notebook is to visualize the trends of deaths and recoveries in various countries.**

By : [Akshay Sb](www.linkedin.com/in/akshay-sb-878990a0)

# Please do upvote if you find it useful
# Please do let me know if anything needs to be updated.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "seaborn"
from plotly.subplots import make_subplots

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.


In [None]:
df = pd.read_csv("../input/corona-virus-report/covid_19_clean_complete.csv", parse_dates = ['Date'])
df.head()

In [None]:
df.info()

There are 5 numerical variables (Lat, Long, Confirmed, Deaths, Recovered), 2 categorical variables (Province/State and Country/Region) and a Date column.
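
The split between numerical and non-numerical columns can also be read off programmatically. The following is a minimal sketch on a toy frame; the values are made up and only the column names mirror the dataset.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column names follow the dataset.
sample = pd.DataFrame({
    "Province/State": ["Hubei", None],
    "Country/Region": ["China", "Italy"],
    "Lat": [30.97, 43.0],
    "Long": [112.27, 12.0],
    "Confirmed": [444, 0],
})

# Numeric columns versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
non_numeric_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Non-numeric:", non_numeric_cols)
```

On the real data, the non-numeric group would also contain the parsed `Date` column.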

In [None]:
df.rename(columns={'Province/State':'State','Country/Region':'Country'},inplace=True)
df.describe(include='object')

As per the dataset, the COVID-19 pandemic has spread across 177 countries.

In [None]:
date=df.Date.value_counts().sort_index()
print('First date:',date.index[0])
print('Last date:',date.index[-1])

This dataset covers the period from 22nd January 2020 to 28th March 2020.
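
As a small sketch, the covered period can also be obtained directly with `min()` and `max()` on a datetime column. The three dates below are a toy stand-in for the `Date` column.

```python
import pandas as pd

# Toy stand-in for the parsed Date column (unordered on purpose).
dates = pd.to_datetime(pd.Series(["2020-01-22", "2020-03-28", "2020-02-15"]))

first_date, last_date = dates.min(), dates.max()
span_days = (last_date - first_date).days  # whole days between first and last date

print("First date:", first_date.date())
print("Last date:", last_date.date())
print("Span in days:", span_days)
```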

# **Let us compute the number of active cases**

In [None]:
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']

In [None]:
df.describe()

# Data Cleaning

**Here we can see that the minimum number of active cases is -6, which does not make sense. Let us look at those records in our data.**

In [None]:
df[df['Active']==df['Active'].min()]

**We can observe that out of 168 confirmed cases there are 6 deaths, so the Recovered column probably contains a data-entry error. Let us correct it and proceed.**

In [None]:
df.loc[(df['Active']==-6),'Recovered']=162
# Let us once again find active cases
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']

In [None]:
df.describe()

**The minimum number of active cases is still -1, which makes no sense. Let us look at those rows and correct them accordingly.**
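
Instead of chasing the minimum one value at a time, all inconsistent rows can be listed in one pass by filtering on negative active counts. This is only a sketch on made-up rows, not the notebook's actual records.

```python
import pandas as pd

# Hypothetical rows; two of them have more closed cases than confirmed cases.
sample = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Confirmed": [168, 0, 50],
    "Deaths": [6, 0, 1],
    "Recovered": [168, 1, 20],
})
sample["Active"] = sample["Confirmed"] - sample["Deaths"] - sample["Recovered"]

# Every row where the active count is negative needs a manual look.
inconsistent = sample[sample["Active"] < 0]
print(inconsistent)
```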

In [None]:
df[df['Active']==df['Active'].min()]

In [None]:
df.loc[(df['Active']==-1),'Confirmed']=1
# Let us once again find active cases
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']

In [None]:
df.describe()

# Data Visualization

In this section, we will visualize the data. First we will plot the countries with the highest numbers of deaths and active cases. Then we will look at how the pandemic spread initially using static plots, and finally we will build dynamic plots.

In [None]:
latest=df[df['Date']==df['Date'].max()]
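# Select multiple columns with a list (double brackets); tuple-style selection is not supported in recent pandas
active_country=latest.groupby('Country')[['Confirmed','Deaths','Recovered','Active']].sum().reset_index()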
active_country.sort_values(by='Active',ascending=False,inplace=True)
active_country.head()

**As of 28th March 2020, the US has the highest number of active cases, followed by Italy, Germany and Spain.**

***Let us look at the top 20 countries by active cases***
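
A top-N selection does not require sorting the whole frame first. As a sketch on toy totals, `nlargest` returns the rows with the highest values of a column directly.

```python
import pandas as pd

# Toy per-country totals (hypothetical numbers).
totals = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Active": [120, 950, 30, 400],
})

# Pick the two countries with the most active cases, highest first.
top_2 = totals.nlargest(2, "Active").reset_index(drop=True)
print(top_2)
```

Unlike `sort_values(..., inplace=True)`, this leaves the original frame's order unchanged.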

In [None]:
active_top_20=active_country.head(20)

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(15,20))
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('Active Cases',fontsize=25)
plt.ylabel('Countries',fontsize=25)
plt.title('Top 20 Countries with Active Cases',fontsize=50)
ax=sns.barplot(x=active_top_20['Active'],y=active_top_20['Country'])
for i, (value, name) in enumerate(zip(active_top_20['Active'], active_top_20['Country'])):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Active Cases',ylabel='Countries')

From the plot above, it is evident that as of 28th March 2020, the US has the highest number of active cases, followed by Italy, Spain, Germany and France.

**Let us visualize the top 20 countries by number of confirmed cases**

In [None]:
active_country.sort_values('Confirmed',ascending=False,inplace=True)
con_top_20=active_country.head(20)
con_top_20.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.xlabel("Total Confirmed cases",fontsize = 25)
plt.ylabel('Country',fontsize = 25)
plt.title("Top 20 countries with Confirmed cases" , fontsize = 50)
ax = sns.barplot(x = con_top_20.Confirmed, y = con_top_20.Country)
for i, (value, name) in enumerate(zip(con_top_20.Confirmed,con_top_20.Country)):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Confirmed cases', ylabel='Country')

**As of 28th March, the US has the highest number of confirmed cases, followed by Italy, China, Spain and Germany**

**Let us look at the top 20 countries by number of deaths**

In [None]:
active_country.sort_values('Deaths',ascending=False,inplace=True)
death_top_20=active_country.head(20)
death_top_20.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.xlabel("Total cases",fontsize = 25)
plt.ylabel('Country',fontsize = 25)
plt.title("Top 20 countries with Deaths" , fontsize = 50)
ax = sns.barplot(x = death_top_20.Deaths, y = death_top_20.Country)
for i, (value, name) in enumerate(zip(death_top_20.Deaths,death_top_20.Country)):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Total Deaths', ylabel='Country')

1. **As of 28th March 2020, even though the US has the most active cases, Italy has the highest death toll so far, followed by Spain, China, Iran and France**
2. **Even though China has a huge number of confirmed cases, its number of deaths is low compared to Italy and Spain**

In [None]:
active_country.sort_values('Recovered',ascending=False,inplace=True)
rec_top_20=active_country.head(20)
rec_top_20.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.xlabel("Total Recovered",fontsize = 25)
plt.ylabel('Country',fontsize = 25)
plt.title("Top 20 countries with Recoveries" , fontsize = 50)
ax = sns.barplot(x = rec_top_20.Recovered, y = rec_top_20.Country)
for i, (value, name) in enumerate(zip(rec_top_20.Recovered,rec_top_20.Country)):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Total Recovered', ylabel='Country')

**China has the highest number of recovered patients. Even though Italy is in second place, there is a large gap between Italy and China.**

# Now let us look at how confirmed cases grew over time in some countries

In [None]:
ch_df=df[df['Country']=='China']
ch_df.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 15)
plt.xlabel("Dates",fontsize = 30)
plt.ylabel('Total cases',fontsize = 30)
plt.title("Confirmed Cases in China Over Time" , fontsize = 30)
# Sum only 'Confirmed'; 'Date' is the grouping key and returns as a column via reset_index()
total_cases_ch = ch_df.groupby('Date')['Confirmed'].sum().reset_index()
total_cases_ch['Date'] = pd.to_datetime(total_cases_ch['Date'])


ax = sns.pointplot( x = total_cases_ch.Date.dt.date ,y = total_cases_ch.Confirmed , color = 'r')
ax.set(xlabel='Dates', ylabel='Total Confirmed cases')

Total confirmed cases increased rapidly from 22nd January 2020 to 19th February 2020. As of 28th March 2020, there are almost no new confirmed cases in China.
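
Because the `Confirmed` column is cumulative, the flattening of the curve is easier to see as daily new cases. A minimal sketch, using a made-up cumulative series, takes the day-over-day difference:

```python
import pandas as pd

# Hypothetical cumulative confirmed counts for five consecutive days.
cumulative = pd.Series(
    [10, 25, 60, 62, 62],
    index=pd.date_range("2020-01-22", periods=5, freq="D"),
    name="Confirmed",
)

# diff() gives the increase per day; the first day has no predecessor,
# so its own cumulative value is used as that day's new cases.
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)
print(daily_new)
```

A flat cumulative curve then shows up as a run of zeros in `daily_new`.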

In [None]:
it_df=df[df['Country']=='Italy']
it_df.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 15)
plt.xlabel("Dates",fontsize = 30)
plt.ylabel('Total cases',fontsize = 30)
plt.title("Confirmed Cases in Italy Over Time" , fontsize = 30)
# Sum only 'Confirmed'; 'Date' is the grouping key and returns as a column via reset_index()
total_cases_it = it_df.groupby('Date')['Confirmed'].sum().reset_index()
total_cases_it['Date'] = pd.to_datetime(total_cases_it['Date'])


ax = sns.pointplot( x = total_cases_it.Date.dt.date ,y = total_cases_it.Confirmed , color = 'b')
ax.set(xlabel='Dates', ylabel='Total Confirmed cases')

Confirmed cases in Italy show the opposite trend to China. Although the first cases were observed between 22nd and 27th February 2020, there was a sudden increase in the number of confirmed cases on 12th March 2020, and the count has shown no sign of slowing since then.

In [None]:
us_df=df[df['Country']=='US']
us_df.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 15)
plt.xlabel("Dates",fontsize = 30)
plt.ylabel('Total cases',fontsize = 30)
plt.title("Confirmed Cases in US Over Time" , fontsize = 30)
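# Sum only 'Confirmed'; 'Date' is the grouping key and returns as a column via reset_index()
total_cases_us = us_df.groupby('Date')['Confirmed'].sum().reset_index()
# Convert the US frame's own dates (the original mistakenly used the Italy frame here)
total_cases_us['Date'] = pd.to_datetime(total_cases_us['Date'])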


ax = sns.pointplot( x = total_cases_us.Date.dt.date ,y = total_cases_us.Confirmed , color = 'g')
ax.set(xlabel='Dates', ylabel='Total Confirmed cases')

* During the initial days of the pandemic, there was a rapid increase in China, which was the center of the pandemic, but as the days progressed it spread to other countries.

* Confirmed cases in Italy, the US, France, Iran and Germany gradually increased.

* During late February, while the growth in confirmed cases slowed in China, cases started to increase in other countries.
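
To compare several countries on the same dates, the long table can be reshaped so that each country becomes a column. The sketch below uses toy rows; the two "China" rows per date stand in for separate provinces.

```python
import pandas as pd

# Toy long-format data (hypothetical numbers).
long_df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-02-20", "2020-02-20", "2020-02-20",
        "2020-03-20", "2020-03-20", "2020-03-20",
    ]),
    "Country": ["China", "China", "Italy", "China", "China", "Italy"],
    "Confirmed": [100, 50, 3, 110, 52, 400],
})

# One row per date, one column per country, provinces summed together.
wide = long_df.pivot_table(
    index="Date", columns="Country", values="Confirmed", aggfunc="sum"
)
print(wide)
```

Calling `wide.plot()` on such a frame would draw one line per country on shared axes.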

# Let us look at the Mortality and Recovery Rates of some Countries

In [None]:
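# Select multiple columns with a list (double brackets); tuple-style selection is not supported in recent pandas
rate_df = latest.groupby(by = 'Country')[['Recovered','Confirmed','Deaths']].sum().reset_index()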
rate_df['Recovery percentage'] =  round(((rate_df['Recovered']) / (rate_df['Confirmed'])) * 100 , 2)
rate_df['Death percentage'] =  round(((rate_df['Deaths']) / (rate_df['Confirmed'])) * 100 , 2)
rate_df.head()

In [None]:
rate_df.sort_values(by='Death percentage',ascending=False,inplace=True)
mortal_df=rate_df.head(20)
mortal_df.head()

In [None]:
plt.figure(figsize= (20,20))
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.xlabel("Total cases",fontsize = 30)
plt.ylabel('Country',fontsize = 30)
plt.title("Top 20 countries having highest mortality rate" , fontsize = 30)
ax = sns.barplot(x = mortal_df['Death percentage'], y = mortal_df['Country'])
for i, (value, name) in enumerate(zip(mortal_df['Death percentage'], mortal_df['Country'])):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Mortality Rate in percentage', ylabel='Country')

* Countries such as Gambia, Nicaragua, Cabo Verde, Sudan and Zimbabwe have few confirmed cases, but their recoveries are also few, which results in a higher mortality rate.
* This higher mortality rate may be partly due to the limited medical facilities available in under-developed countries. With so few confirmed cases, a handful of deaths is also enough to produce a high percentage.
* Even though deaths run into the thousands in countries like China, Italy, Germany and Iran, the number is still small compared to the number of confirmed cases.
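
Percentages computed from very few cases can dominate a ranking. One possible refinement, sketched here on toy numbers, is to rank only countries above a minimum case count. The `MIN_CONFIRMED` cut-off is a hypothetical choice and is not taken from the notebook.

```python
import pandas as pd

MIN_CONFIRMED = 100  # hypothetical cut-off, chosen only for illustration

# Toy per-country totals (made-up numbers).
rates = pd.DataFrame({
    "Country": ["Small", "Medium", "Large"],
    "Confirmed": [4, 500, 80000],
    "Deaths": [1, 10, 3200],
})
rates["Death percentage"] = (rates["Deaths"] / rates["Confirmed"] * 100).round(2)

# Keep only countries with enough cases for the percentage to be meaningful.
reliable = rates[rates["Confirmed"] >= MIN_CONFIRMED].sort_values(
    "Death percentage", ascending=False
)
print(reliable)
```

Here "Small" has the highest percentage (1 death out of 4 cases) but is excluded from the ranking.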

**Let us look at recovery rate**

In [None]:
rate_df.sort_values(by='Recovery percentage',ascending=False,inplace=True)
rec_df=rate_df.head(20)
rec_df.head()

In [None]:
plt.figure(figsize= (20,20))
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.xlabel("Total cases",fontsize = 30)
plt.ylabel('Country',fontsize = 30)
plt.title("Top 20 countries having highest Recovery rate" , fontsize = 30)
ax = sns.barplot(x = rec_df['Recovery percentage'], y = rec_df['Country'])
for i, (value, name) in enumerate(zip(rec_df['Recovery percentage'], rec_df['Country'])):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Recovery Rate in percentage', ylabel='Country')

* Saint Vincent and the Grenadines has a recovery rate of 100%! That is only because there was a single confirmed case, and that patient recovered.
* China has an excellent recovery rate of 92% in spite of its large number of confirmed cases.
* The majority of the recovery rates are between 20% and 30%.
* As mentioned earlier, the recovery rate is low in some African countries, possibly because of limited medical facilities.

# Let us look at the worldwide spread using dynamic analysis.
**World Wide Spread Analysis**

We can use the plotly visualisation tool for dynamic plotting. Plotly is a Python visualization tool that is capable of handling geographical, scientific, statistical and financial data. It has several advantages over matplotlib; one of the main ones is that only a few lines of code are needed to create aesthetically pleasing, interactive plots.

* It can reduce the time required for data exploration.

In [None]:
df1 = df.copy()  # work on a copy so that formatting Date as a string does not alter df
df1['Date'] = pd.to_datetime(df1['Date'])
df1['Date'] = df1['Date'].dt.strftime('%m/%d/%Y')
df1 = df1.fillna('-')
fig = px.density_mapbox(df1, lat='Lat', lon='Long', z='Confirmed', radius=20,zoom=1, hover_data=["Country",'State',"Confirmed"],
                        mapbox_style="carto-positron", animation_frame = 'Date', range_color= [0, 1000],title='Spread of Covid-19')
fig.update_layout(margin={"r":0,"t":30,"l":0,"b":0})
fig.show()

**The spread of COVID-19 is very slow in the beginning, but it accelerates rapidly as time progresses**

In [None]:
figure = px.choropleth(active_country, locations="Country", 
                    locationmode='country names', color="Active", 
                    hover_name="Country", range_color=[1,1000], 
                    color_continuous_scale="blues", 
                    title='Countries with Active Cases')
figure.show()

# Here are the inferences from the above plots:
1. **As is known, China was the center of the spread of this pandemic.**
2. **Since it was the center, confirmed case counts were high there in the initial days; as the days passed, the number of new confirmed cases decreased because of lockdowns.**
3. **Since the pandemic was taken lightly by other countries in the initial days, it spread very rapidly in Asian, European and American countries, causing thousands of deaths.**
4. **It is a serious warning to other countries: if no precautions are taken, it will cause a large number of deaths.**

# Let us look at the number of confirmed cases within China.
# We consider China because it was the origin of the pandemic

In [None]:
china=df[df['Country']=='China']
china.head()

In [None]:
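# The counts are cumulative, so use only the latest date; summing over all dates would overcount
china_latest=china[china['Date']==china['Date'].max()]
ch=china_latest.groupby('State')[['Confirmed','Deaths','Recovered','Active']].sum().reset_index()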
ch.head()

In [None]:
plt.figure(figsize=(20,20))
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('No. of Confirmed Cases in China',fontsize=30)
plt.ylabel('States',fontsize=30)
plt.title('Confirmed Cases by State in China', fontsize=30)
ax=sns.barplot(x=ch['Confirmed'],y=ch['State'])
for i, (value, name) in enumerate(zip(ch.Confirmed, ch.State)):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Total cases', ylabel='States')

1. **This plot gives viewers an interesting and debatable insight**
2. **Numerous inferences can be drawn from this single plot**
3. **Among Chinese states, the number of confirmed cases is highest in Hubei**
4. **Confirmed cases in the other states of China are far fewer than in Hubei**
5. **China appears to have done an excellent job of preventing infected persons from entering other states**
6. **Sadly, it did not succeed in preventing infected people from travelling to other countries**
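
How strongly one state dominates can be expressed as its share of the national total. The following sketch uses made-up counts; only the state names are real.

```python
import pandas as pd

# Hypothetical confirmed counts per state.
by_state = pd.Series({"Hubei": 800, "Guangdong": 120, "Henan": 80}, name="Confirmed")

# Each state's share of the total, in percent.
share = (by_state / by_state.sum() * 100).round(1)
print(share)
```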

# Let us look at COVID-19 developments in India

In [None]:
ind_df=df[df['Country']=='India']
ind_df.head()

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 15)
plt.xlabel("Dates",fontsize = 30)
plt.ylabel('Total cases',fontsize = 30)
plt.title("Confirmed Cases in India Over Time" , fontsize = 30)
# Sum only 'Confirmed'; 'Date' is the grouping key and returns as a column via reset_index()
total_cases_ind = ind_df.groupby('Date')['Confirmed'].sum().reset_index()
total_cases_ind['Date'] = pd.to_datetime(total_cases_ind['Date'])


ax = sns.pointplot( x = total_cases_ind.Date.dt.date ,y = total_cases_ind.Confirmed , color = 'y')
ax.set(xlabel='Dates', ylabel='Total Confirmed cases')

1. **This plot looks scary, especially if you are Indian.**
2. **Until early March 2020, there were almost no confirmed cases in India**
3. **From the first week of March until the third week of March, there was a steady increase in the number of confirmed cases**
4. **From 19th March, there has been a rapid increase in the number of confirmed cases**
5. **Even though a lockdown was announced earlier, people have not taken it seriously**
6. **A pattern similar to that of European countries is being observed; if this is not taken seriously, India may have a large number of deaths by mid-April**
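
"Steady" versus "rapid" growth can be made concrete with the day-over-day percentage change of the cumulative count. This is a sketch on made-up numbers, not India's actual figures.

```python
import pandas as pd

# Hypothetical cumulative confirmed counts on three consecutive days.
confirmed = pd.Series(
    [100, 150, 300],
    index=pd.date_range("2020-03-19", periods=3, freq="D"),
)

# Percentage growth relative to the previous day (the first day is undefined).
growth_pct = (confirmed.pct_change() * 100).round(1)
print(growth_pct)
```

A growth rate that keeps rising from one day to the next indicates an accelerating outbreak.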

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 15)
plt.xlabel("Dates",fontsize = 30)
plt.ylabel('Total deaths',fontsize = 30)
plt.title("Deaths in India Over Time" , fontsize = 30)
# Sum only 'Deaths'; 'Date' is the grouping key and returns as a column via reset_index()
total_deaths_ind = ind_df.groupby('Date')['Deaths'].sum().reset_index()
total_deaths_ind['Date'] = pd.to_datetime(total_deaths_ind['Date'])


ax = sns.pointplot( x = total_deaths_ind.Date.dt.date ,y = total_deaths_ind.Deaths , color = 'y')
ax.set(xlabel='Dates', ylabel='Total Deaths')

1. **Even though the number of deaths is very low, the trend is very alarming**
2. **The first death was on 11th March 2020**
3. **Within the next 18 days, the death toll reached 25**
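
The date of the first death and the number of days elapsed since then can be read from the cumulative series rather than from the plot. A minimal sketch with a made-up series:

```python
import pandas as pd

# Hypothetical cumulative deaths on five consecutive days.
deaths = pd.Series(
    [0, 0, 1, 2, 2],
    index=pd.date_range("2020-03-09", periods=5, freq="D"),
)

# Earliest date with at least one death, and days from then to the last date.
first_death_date = deaths[deaths > 0].index.min()
days_since_first = (deaths.index.max() - first_death_date).days

print("First death:", first_death_date.date())
print("Days since first death:", days_since_first)
```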

In [None]:
plt.figure(figsize= (15,10))
plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 15)
plt.xlabel("Dates",fontsize = 30)
plt.ylabel('Total Recoveries',fontsize = 30)
plt.title("Recoveries in India Over Time" , fontsize = 30)
# Sum only 'Recovered'; 'Date' is the grouping key and returns as a column via reset_index()
total_rec_ind = ind_df.groupby('Date')['Recovered'].sum().reset_index()
total_rec_ind['Date'] = pd.to_datetime(total_rec_ind['Date'])


ax = sns.pointplot( x = total_rec_ind.Date.dt.date ,y = total_rec_ind.Recovered , color = 'y')
ax.set(xlabel='Dates', ylabel='Total Recoveries')

1. **These are some positive visuals despite a lot of unrest in the country**
2. **There is a rapidly increasing trend in the number of people who are recovering**

# Final Remarks

1. **COVID-19 is a pandemic that is spreading at a global level**
2. **Even though it is spreading at an alarming rate, it can be contained by a few simple steps**

* Wash your hands

* Please maintain social distancing

* Contact health care officials if symptoms are observed, and do not panic