# COVID-19 Data-Driven Case Study

## Data Sources :

- Kaggle: https://www.kaggle.com/sudalairajkumar/covid19-in-india
- Kaggle: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
- DataHub: https://datahub.io/core/covid-19#python (source of the CSV files loaded in this notebook)

## Importing Libraries :

In [98]:
import pandas as pd
import numpy as np
from io import StringIO
import requests

# Visualisation libraries
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo

# Disable warnings 
import warnings
warnings.filterwarnings('ignore')

## Importing Data : 

In [100]:
# Send a browser-like User-Agent header with each request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}

# Worldwide totals per day
url = "https://datahub.io/core/covid-19/r/worldwide-aggregated.csv"
s = requests.get(url, headers=headers).text
df_world = pd.read_csv(StringIO(s))

# Totals per country per day
url = "https://datahub.io/core/covid-19/r/countries-aggregated.csv"
s = requests.get(url, headers=headers).text
df_con = pd.read_csv(StringIO(s))

# Time series per country/province, with coordinates
url = "https://datahub.io/core/covid-19/r/time-series-19-covid-combined.csv"
s = requests.get(url, headers=headers).text
df_t = pd.read_csv(StringIO(s))
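The three download cells repeat the same request-then-parse steps. A minimal sketch that factors them into one helper, assuming the DataHub URLs above are still reachable; `parse_csv_text`, `load_csv`, `load_all` and the `FILES` keys are hypothetical names. Calling `raise_for_status()` makes an HTTP error (for example a 403 or 404) raise an exception instead of the error page being handed to `read_csv`:

```python
from io import StringIO

import pandas as pd
import requests

# Base URL and file names of the DataHub resources used above
BASE_URL = "https://datahub.io/core/covid-19/r/"
FILES = {
    "world": "worldwide-aggregated.csv",
    "countries": "countries-aggregated.csv",
    "time_series": "time-series-19-covid-combined.csv",
}


def parse_csv_text(text):
    """Parse already-downloaded CSV text into a DataFrame."""
    return pd.read_csv(StringIO(text))


def load_csv(url, headers=None, timeout=30):
    """Download one CSV; raise on HTTP errors instead of parsing an error page."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return parse_csv_text(response.text)


def load_all(headers=None):
    """Return a dict {name: DataFrame} with one entry per file in FILES."""
    return {name: load_csv(BASE_URL + fname, headers) for name, fname in FILES.items()}
```

Usage would be `frames = load_all(headers)` followed by `df_world = frames["world"]`; the `timeout` also keeps a stalled connection from hanging the notebook.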




## Data Understanding

Using the following functions:

- shape
- head
- tail
- info
- describe

In [101]:
df_world.shape


(86, 5)

In [102]:
df_con.shape


(15910, 5)

In [103]:
df_t.shape

(22704, 8)

In [104]:
df_world.head()

|   | Date | Confirmed | Recovered | Deaths | Increase rate |
|---|---|---|---|---|---|
| 0 | 2020-01-22 | 555 | 28 | 17 | NaN |
| 1 | 2020-01-23 | 654 | 30 | 18 | 17.837838 |
| 2 | 2020-01-24 | 941 | 36 | 26 | 43.883792 |
| 3 | 2020-01-25 | 1434 | 39 | 42 | 52.391073 |
| 4 | 2020-01-26 | 2118 | 52 | 56 | 47.698745 |


In [105]:
df_world.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86 entries, 0 to 85
Data columns (total 5 columns):
Date             86 non-null object
Confirmed        86 non-null int64
Recovered        86 non-null int64
Deaths           86 non-null int64
Increase rate    85 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 3.4+ KB


In [106]:
df_con.tail()

|   | Date | Country | Confirmed | Recovered | Deaths |
|---|---|---|---|---|---|
| 15905 | 2020-04-16 | West Bank and Gaza | 374 | 63 | 2 |
| 15906 | 2020-04-16 | Western Sahara | 6 | 0 | 0 |
| 15907 | 2020-04-16 | Yemen | 1 | 0 | 0 |
| 15908 | 2020-04-16 | Zambia | 48 | 30 | 2 |
| 15909 | 2020-04-16 | Zimbabwe | 23 | 1 | 3 |


In [107]:
df_con.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15910 entries, 0 to 15909
Data columns (total 5 columns):
Date         15910 non-null object
Country      15910 non-null object
Confirmed    15910 non-null int64
Recovered    15910 non-null int64
Deaths       15910 non-null int64
dtypes: int64(3), object(2)
memory usage: 621.6+ KB


In [108]:
df_t.tail()

|   | Date | Country/Region | Province/State | Lat | Long | Confirmed | Recovered | Deaths |
|---|---|---|---|---|---|---|---|---|
| 22699 | 2020-04-12 | Zimbabwe | NaN | -20.0 | 30.0 | 14.0 | 0.0 | 3.0 |
| 22700 | 2020-04-13 | Zimbabwe | NaN | -20.0 | 30.0 | 17.0 | 0.0 | 3.0 |
| 22701 | 2020-04-14 | Zimbabwe | NaN | -20.0 | 30.0 | 17.0 | 0.0 | 3.0 |
| 22702 | 2020-04-15 | Zimbabwe | NaN | -20.0 | 30.0 | 23.0 | 1.0 | 3.0 |
| 22703 | 2020-04-16 | Zimbabwe | NaN | -20.0 | 30.0 | 23.0 | 1.0 | 3.0 |


In [109]:
df_t.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22704 entries, 0 to 22703
Data columns (total 8 columns):
Date              22704 non-null object
Country/Region    22704 non-null object
Province/State    7052 non-null object
Lat               22704 non-null float64
Long              22704 non-null float64
Confirmed         22618 non-null float64
Recovered         21500 non-null float64
Deaths            22618 non-null float64
dtypes: float64(5), object(3)
memory usage: 1.4+ MB
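The list above mentions `describe`, which is not run in this notebook. A minimal sketch on a small frame built from the `df_world.head()` values shown earlier (the name `sample` is hypothetical); `describe()` summarises the numeric columns and `isna().sum()` counts missing values per column, complementing the non-null counts from `info()`:

```python
import pandas as pd

# Values copied from the df_world.head() output above
sample = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25", "2020-01-26"],
    "Confirmed": [555, 654, 941, 1434, 2118],
    "Recovered": [28, 30, 36, 39, 52],
    "Deaths": [17, 18, 26, 42, 56],
})

# count, mean, std, min, quartiles and max for each numeric column
# (the string Date column is left out by default)
summary = sample.describe()
print(summary)

# Number of missing values in each column
print(sample.isna().sum())
```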


## Issues :

- df_world
  - `Increase rate` has a null value (first row).
  - `Date` is stored as a string (object), not as datetime.
- df_con
  - `Date` is stored as a string (object), not as datetime.


## Solution :

## Define : `Increase rate` in df_world has a null value

## Code :

In [110]:
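# The first day has no previous day to compare with, so its increase rate is missing; set it to 0.
# Assigning back (instead of inplace=True on a single column) also works under pandas Copy-on-Write.
df_world['Increase rate'] = df_world['Increase rate'].fillna(0)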

## Test :

In [111]:
df_world.head(1)

|   | Date | Confirmed | Recovered | Deaths | Increase rate |
|---|---|---|---|---|---|
| 0 | 2020-01-22 | 555 | 28 | 17 | 0.0 |
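
The values in `Increase rate` match the day-over-day percentage change in `Confirmed` (for example, (654 - 555) / 555 × 100 ≈ 17.837838), which explains why only the first row is missing: it has no previous day. Assuming that is how the column was built, it can be recomputed with `pct_change`; this is a sketch, not DataHub's own code:

```python
import pandas as pd

# Confirmed counts copied from the df_world.head() output above
confirmed = pd.Series([555, 654, 941, 1434, 2118])

# Day-over-day percentage change; the first day has no predecessor, so it is NaN
increase_rate = confirmed.pct_change() * 100

# Same convention as the notebook: treat the missing first value as 0
increase_rate = increase_rate.fillna(0)
print(increase_rate.round(6).tolist())
```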


## Define : Convert the Date column of df_world and df_con to datetime

## Code :

In [112]:
df_world['Date'] = pd.to_datetime(df_world['Date'])
df_con['Date'] = pd.to_datetime(df_con['Date'])

## Test :

In [113]:
df_con.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15910 entries, 0 to 15909
Data columns (total 5 columns):
Date         15910 non-null datetime64[ns]
Country      15910 non-null object
Confirmed    15910 non-null int64
Recovered    15910 non-null int64
Deaths       15910 non-null int64
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 621.6+ KB


In [114]:
df_world.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86 entries, 0 to 85
Data columns (total 5 columns):
Date             86 non-null datetime64[ns]
Confirmed        86 non-null int64
Recovered        86 non-null int64
Deaths           86 non-null int64
Increase rate    86 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(3)
memory usage: 3.4 KB
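Without a `format` argument, `pd.to_datetime` infers the date format. A minimal sketch of a stricter variant, assuming every valid value in `Date` looks like `2020-04-16` as in the outputs above; the sample Series, including its malformed entry, is made up for illustration:

```python
import pandas as pd

# Two valid dates in the same format as the Date column, plus one malformed value
dates = pd.Series(["2020-04-15", "2020-04-16", "not a date"])

# An explicit format skips format inference; errors="coerce" turns
# strings that do not match it into NaT instead of raising an exception
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

print(parsed.dtype)
print("unparseable values:", parsed.isna().sum())
```

Applied to the notebook, a non-zero count of NaT values after the conversion would point at rows whose dates need inspection.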


## Analysis Topics :

- Current Situation.
- Calculation of Active Col.
- Calculation of Mortality Rate Col.
- Functions that show a given day's Confirmed, Recovered and Deaths figures for 10 countries, or for one chosen country.
- Visualize Day by Day Overall Records.
- Visualize Day by Day Mortality Rate Percentage.
- Visualize Country Wise Recovered Cases.
- Visualize Country Wise Confirmed Cases.
- Visualize Country Wise Death Cases.


## Define : Current Situation

## Code : 

In [115]:
s = df_world.shape[0]  # number of rows, one per day

In [116]:
m_1 = df_world.iloc[s-1:]  # last row, i.e. the most recent date (df_world is sorted by date)

## Test :

In [117]:
m_1['Confirmed'].values

array([2152646], dtype=int64)

In [118]:
m_1['Recovered'].values

array([542107], dtype=int64)

In [119]:
m_1['Deaths'].values

array([143800], dtype=int64)
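`iloc[s-1:]` picks the last row, which is the latest date only because df_world is sorted by date, and each lookup returns a one-element array. A sketch that selects the latest date explicitly and returns plain integers; `latest_totals` is a hypothetical name and the two sample rows are copied from the `df_world.head()` output:

```python
import pandas as pd

# Two rows copied from the df_world.head() output above
world = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-25", "2020-01-26"]),
    "Confirmed": [1434, 2118],
    "Recovered": [39, 52],
    "Deaths": [42, 56],
})


def latest_totals(df):
    """Return Confirmed/Recovered/Deaths on the most recent date as plain ints."""
    row = df.loc[df["Date"].idxmax()]  # works even if rows are not sorted by date
    return {col: int(row[col]) for col in ["Confirmed", "Recovered", "Deaths"]}


print(latest_totals(world))  # {'Confirmed': 2118, 'Recovered': 52, 'Deaths': 56}
```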

## Define : Calculation of Active Col, Calculation of Mortality Rate Col.

## Code :

In [120]:
a = df_world['Confirmed'] - df_world['Recovered'] - df_world['Deaths']  # active cases

m = (df_world['Deaths'] / df_world['Confirmed']) * 100  # mortality rate (%): deaths per 100 confirmed cases

In [121]:
df_world['Active'] = a
df_world['Mortality_rate'] = m

## Test : 

In [122]:
df_world.shape

(86, 7)

In [123]:
df_world.head()

|   | Date | Confirmed | Recovered | Deaths | Increase rate | Active | Mortality_rate |
|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 555 | 28 | 17 | 0.0 | 510 | 3.063063 |
| 1 | 2020-01-23 | 654 | 30 | 18 | 17.837838 | 606 | 2.752294 |
| 2 | 2020-01-24 | 941 | 36 | 26 | 43.883792 | 879 | 2.763018 |
| 3 | 2020-01-25 | 1434 | 39 | 42 | 52.391073 | 1353 | 2.92887 |
| 4 | 2020-01-26 | 2118 | 52 | 56 | 47.698745 | 2010 | 2.644004 |
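
The two new columns follow directly from the code above. Written out, with a check against the first row of the table and against the latest totals from the Current Situation section. Note that deaths divided by confirmed cases is a case fatality rate, not deaths per head of population:

$$
\text{Active} = \text{Confirmed} - \text{Recovered} - \text{Deaths},
\qquad
\text{Mortality rate (\%)} = \frac{\text{Deaths}}{\text{Confirmed}} \times 100
$$

$$
\text{2020-01-22:}\quad 555 - 28 - 17 = 510, \qquad \frac{17}{555} \times 100 \approx 3.063
$$

$$
\text{Latest row:}\quad 2152646 - 542107 - 143800 = 1466739, \qquad \frac{143800}{2152646} \times 100 \approx 6.68
$$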


## Define : Functions that show a given day's Confirmed, Recovered and Deaths figures for 10 countries (`cal`) or for one chosen country (`calu`).

## Code :

In [124]:
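def cal(a):
    """Print the last 10 rows for date `a` (rows within a date are ordered alphabetically by country)."""
    m_2 = df_con[df_con['Date'] == a]
    print(m_2.tail(10))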

In [126]:

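def calu(a, b):
    """Print the row for date `a` and country `b`."""
    on_date = df_con['Date'] == a
    in_country = df_con['Country'] == b
    # Boolean indexing already returns a new DataFrame, so no explicit copy is needed
    print(df_con[on_date & in_country])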

## Test : 

In [127]:
cal('2020-04-04')

            Date             Country  Confirmed  Recovered  Deaths
13680 2020-04-04      United Kingdom      42477        215    4320
13681 2020-04-04             Uruguay        400         93       5
13682 2020-04-04          Uzbekistan        266         25       2
13683 2020-04-04           Venezuela        155         52       7
13684 2020-04-04             Vietnam        240         90       0
13685 2020-04-04  West Bank and Gaza        217         21       1
13686 2020-04-04      Western Sahara          0          0       0
13687 2020-04-04               Yemen          0          0       0
13688 2020-04-04              Zambia         39          2       1
13689 2020-04-04            Zimbabwe          9          0       1


In [128]:
calu('2020-04-04','India')

            Date Country  Confirmed  Recovered  Deaths
13583 2020-04-04   India       3082        229      86
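`cal` prints `tail(10)`, i.e. the last ten rows for that date in the data's order (alphabetical by country), and df_con has no Active column. If the aim is the ten most affected countries on a given day, with Active cases as well, a sketch could rank by a chosen column with `nlargest`; `top_countries` is a hypothetical name and the sample rows are copied from the 2020-04-04 outputs above:

```python
import pandas as pd


def top_countries(df, date, n=10, by="Confirmed"):
    """Hypothetical helper: the n countries with the largest `by` value on `date`."""
    day = df[df["Date"] == pd.Timestamp(date)].copy()
    # Derive Active the same way as for df_world
    day["Active"] = day["Confirmed"] - day["Recovered"] - day["Deaths"]
    return day.nlargest(n, by)[["Country", "Confirmed", "Active", "Deaths"]]


# Rows copied from the 2020-04-04 outputs above
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-04"] * 4),
    "Country": ["India", "United Kingdom", "Uruguay", "Uzbekistan"],
    "Confirmed": [3082, 42477, 400, 266],
    "Recovered": [229, 215, 93, 25],
    "Deaths": [86, 4320, 5, 2],
})

print(top_countries(sample, "2020-04-04", n=2))
```

Returning the DataFrame rather than printing it lets the result feed directly into a plot or further filtering, e.g. `top_countries(df_con, '2020-04-04', by='Deaths')`.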


## Define : Visualize Day by Day Overall Records.

## Code :

In [129]:
# One trace per status, each with its own marker colour
trace = go.Scatter(x=df_world['Date'], y=df_world['Confirmed'], mode='lines+markers', marker={'color': '#000000'}, name='Confirmed')
trace1 = go.Scatter(x=df_world['Date'], y=df_world['Active'], mode='lines+markers', marker={'color': '#FFFF00'}, name='Active Cases')
trace2 = go.Scatter(x=df_world['Date'], y=df_world['Recovered'], mode='lines+markers', marker={'color': '#FF0000'}, name='Recovered')
trace3 = go.Scatter(x=df_world['Date'], y=df_world['Deaths'], mode='lines+markers', marker={'color': '#FF00FF'}, name='Deaths')
data = [trace, trace1, trace2, trace3]

layout = go.Layout(title='Day by Day Overall Records', xaxis={'title': 'Date'}, yaxis={'title': 'Number of cases'})
fig = go.Figure(data=data, layout=layout)


## Test :

In [130]:
pyo.iplot(fig)

## Define : Visualize Day by Day Mortality Rate Percentage.

## Code :

In [131]:
trace = go.Bar(x=df_world['Date'], y=df_world['Mortality_rate'])
data = [trace]
layout = go.Layout(title='Mortality Rate Percentage', xaxis={'title': 'Date'}, yaxis={'title': 'Mortality Rate (%)'})
fig = go.Figure(data=data, layout=layout)


## Test :

In [132]:
pyo.iplot(fig)

## Define : Visualize Country Wise Recovered Cases.

## Code :

In [133]:
# x = country, y = date, colour = number of recovered cases
trace = go.Heatmap(x=df_con['Country'], y=df_con['Date'], z=df_con['Recovered'], colorscale='Viridis')

layout = go.Layout(title='Country Wise Recovered Cases')
fig = go.Figure(data=[trace], layout=layout)


## Test :

In [134]:
pyo.iplot(fig)

## Define : Visualize Country Wise Confirmed Cases.

## Code :

In [137]:
df_c = df_con.copy()

In [139]:
df_c.drop(columns=['Date'], inplace=True)  # the map shows a single snapshot, so Date is not needed

In [141]:
# df_con is sorted by date, so each country's last row holds its most recent figures
df_c.drop_duplicates(subset='Country', keep='last', inplace=True)

In [143]:
trace = dict(type='choropleth',
             locations=df_c['Country'],
             locationmode='country names',
             autocolorscale=False,
             colorscale='Rainbow',
             marker=dict(line=dict(color='rgb(255,255,255)', width=1)),
             z=df_c['Confirmed'],
             colorbar={'title': 'Confirmed', 'len': 1.1})

layout = go.Layout(title='Confirmed Cases by Country (latest date in the data)')
fig = go.Figure(data=[trace], layout=layout)

## Test :

In [144]:
pyo.iplot(fig)
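`drop_duplicates(keep='last')` returns each country's latest row only because df_con is sorted by date (in the `df_con.tail()` output above, the 2020-04-16 rows come last). A sketch that does not depend on row order, selecting each country's most recent row with `groupby(...).idxmax()`; the sample rows are copied from the df_con outputs and deliberately shuffled:

```python
import pandas as pd

# Rows copied from the df_con outputs above, deliberately out of date order
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-16", "2020-04-04", "2020-04-04", "2020-04-16"]),
    "Country": ["Zambia", "Zambia", "Zimbabwe", "Zimbabwe"],
    "Confirmed": [48, 39, 9, 23],
    "Deaths": [2, 1, 1, 3],
})

# Index label of each country's most recent row, regardless of row order
latest_idx = sample.groupby("Country")["Date"].idxmax()
latest = sample.loc[latest_idx].reset_index(drop=True)
print(latest)
```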

## Define : Visualize Country Wise Death Cases.

## Code :

In [145]:
trace = dict(type='choropleth',
             locations=df_c['Country'],
             locationmode='country names',
             autocolorscale=False,
             colorscale='Rainbow',
             marker=dict(line=dict(color='rgb(255,255,255)', width=1)),
             z=df_c['Deaths'],
             colorbar={'title': 'Deaths', 'len': 1.1})

layout = go.Layout(title='Deaths by Country (latest date in the data)')
fig = go.Figure(data=[trace], layout=layout)

## Test :

In [146]:
pyo.iplot(fig)