## COVID19 Analysis

### Akshita Mishra

This exploratory analysis focuses on COVID19 data for India.


In [1]:
# This notebook uses the plotly library.
# If plotly is not installed in your environment, install it first with pip by uncommenting the line below.

#!pip install plotly

Successfully installed plotly-4.14.3 retrying-1.3.3


### Import all libraries required

In [61]:
import pandas as pd
import plotly.express as px
import os
from matplotlib import pyplot as plt

%matplotlib inline

In [62]:
os.getcwd()

'/Users/kamalmishra/Python_Akshita'

### Get input data

We use the COVID19 dataset for India from Kaggle (covid_19_india.csv).
The dates range from January 2020 to April 2021.


In [63]:
df = pd.read_csv("/Users/kamalmishra/Desktop/Akshita_GEAR/COVID/covid_19_india.csv")

df.head()

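| | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-01-30 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 1 | 2 | 2020-01-31 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 2 | 3 | 2020-02-01 | 6:00 PM | Kerala | 2 | 0 | 0 | 0 | 2 |
| 3 | 4 | 2020-02-02 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
| 4 | 5 | 2020-02-03 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |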


In [64]:
start_date = min(df["Date"])
end_date = max(df["Date"])

print("The COVID19 data we have is between: ", start_date, '-', end_date)

The COVID19 data we have is between:  2020-01-30 - 2021-04-28
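The cell above takes `min` and `max` of the raw Date strings. That works here because the dates use the ISO-8601 `YYYY-MM-DD` layout, where alphabetical order equals chronological order. A minimal sketch on a few hypothetical dates (not taken from the dataset) shows that the string result agrees with a comparison on parsed datetimes; the helper name `date_range` is illustrative only.

```python
import pandas as pd


def date_range(dates):
    """Return (earliest, latest) as YYYY-MM-DD strings, comparing parsed datetimes."""
    parsed = pd.to_datetime(dates, format="%Y-%m-%d")
    return parsed.min().strftime("%Y-%m-%d"), parsed.max().strftime("%Y-%m-%d")


# Hypothetical sample in the same YYYY-MM-DD layout as the Date column
sample = pd.Series(["2020-03-15", "2020-01-30", "2021-04-28", "2020-12-01"])

# Lexicographic min/max on the strings matches the datetime-based result
print(min(sample), max(sample))
print(date_range(sample))
```

With a day-first layout the shortcut breaks: `min(["31-01-2020", "01-02-2020"])` returns `"01-02-2020"`, which is the later date, so parsing first is the safer habit.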


### Analysis - Confirmed Cases

In [65]:
confirmed = df.groupby("State/UnionTerritory")["Confirmed"].sum().reset_index()

In [66]:
# Sort all states/UTs by confirmed count, largest first
confirmed_sorted = confirmed.sort_values("Confirmed", ascending=False)
px.bar(x=confirmed_sorted["State/UnionTerritory"], y=confirmed_sorted["Confirmed"],
       color_discrete_sequence=px.colors.qualitative.Dark2,
       title="Confirmed cases by state/union territory")
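In the `head()` output above, Kerala's Confirmed count goes 1, 1, 2, 3, 3 on consecutive days, which suggests that Confirmed, Cured, and Deaths are running (cumulative) totals reported each day. If so, summing them over all dates counts the same cases many times; that would explain why the per-state sums in the State wise table exceed 200 million for Andhra Pradesh. A minimal sketch of an alternative, assuming the columns are cumulative, keeps only each state's most recent row. The toy frame below is hypothetical, not the Kaggle data.

```python
import pandas as pd

# Hypothetical toy data in the same shape as covid_19_india.csv:
# one row per state per day, with running totals in Confirmed/Deaths.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03",
                            "2020-03-01", "2020-03-02", "2020-03-03"]),
    "State/UnionTerritory": ["Kerala"] * 3 + ["Delhi"] * 3,
    "Confirmed": [3, 5, 8, 1, 1, 4],
    "Deaths": [0, 0, 1, 0, 0, 0],
})

# Summing running totals double-counts: Kerala gives 3 + 5 + 8 = 16
summed = toy.groupby("State/UnionTerritory")["Confirmed"].sum()

# Taking the latest row per state gives the total to date: Kerala = 8
latest = (toy.sort_values("Date")
             .groupby("State/UnionTerritory")
             .last()
             .reset_index())

print(summed)
print(latest[["State/UnionTerritory", "Confirmed", "Deaths"]])
```

A frame built this way could stand in for `confirmed` in the bar chart and treemap; this is a suggested alternative, not what the notebook's charts currently show.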

### Analysis - Deaths

In [67]:
deaths = df.groupby("State/UnionTerritory")["Deaths"].sum().reset_index()

In [68]:
# Sort all states/UTs by death count, largest first
deaths_sorted = deaths.sort_values("Deaths", ascending=False)
px.bar(x=deaths_sorted["State/UnionTerritory"], y=deaths_sorted["Deaths"],
       color_discrete_sequence=px.colors.qualitative.Dark2,
       title="Deaths by state/union territory")

### TreeMap - Confirmed Cases 

In [69]:
px.treemap(confirmed,path = ["State/UnionTerritory"],values = "Confirmed",title = "Overall confirmed cases")

In [70]:
df.shape[1]

9

### State wise analysis

First, we convert the Date column to a datetime type. Then we total the Confirmed, Cured, and Deaths columns for each State/UT and add a new column, "Death_percentage", giving deaths as a percentage of confirmed cases. Finally, we display the result as a color-graded table.

In [73]:
# Parse the YYYY-MM-DD strings into datetime64 values
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df.head()

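| | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2020-01-30 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 1 | 2 | 2020-01-31 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 2 | 3 | 2020-02-01 | 6:00 PM | Kerala | 2 | 0 | 0 | 0 | 2 |
| 3 | 4 | 2020-02-02 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
| 4 | 5 | 2020-02-03 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |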
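The Date strings in this file use a year-month-day layout (for example 2020-01-30), so the matching `pd.to_datetime` format string is `'%Y-%m-%d'`; a day-first format such as `'%d-%m-%Y'` does not describe them. A small sketch on hypothetical strings shows the matching format parsing cleanly and the day-first one being rejected.

```python
import pandas as pd

# Hypothetical strings in the same YYYY-MM-DD layout as the Date column
raw = pd.Series(["2020-01-30", "2020-01-31", "2020-02-01"])

# Matching format: parses cleanly to datetime64
dates = pd.to_datetime(raw, format="%Y-%m-%d")
print(dates.dt.month.tolist())  # month numbers extracted from the parsed dates

# Day-first format does not match these strings, so parsing raises ValueError
try:
    pd.to_datetime(raw, format="%d-%m-%Y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("day-first format rejected:", mismatch_raised)
```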


In [75]:
# Select several columns with a list (double brackets); pandas 2.0 and later reject a bare tuple of names
state_wise = df.groupby('State/UnionTerritory')[['Confirmed', 'Cured', 'Deaths']].sum().reset_index()
state_wise["Death_percentage"] = ((state_wise["Deaths"] / state_wise["Confirmed"]) * 100)

# cmap accepts any matplotlib colormap name; try PuBu, viridis, plasma, inferno, cividis, or magma for different looks
state_wise.style.background_gradient(cmap='plasma')

| | State/UnionTerritory | Confirmed | Cured | Deaths | Death_percentage |
|---|---|---|---|---|---|
| 0 | Andaman and Nicobar Islands | 1189952 | 1122556 | 15123 | 1.27089 |
| 1 | Andhra Pradesh | 210789590 | 199693140 | 1722554 | 0.817191 |
| 2 | Arunachal Pradesh | 3671031 | 3420792 | 11102 | 0.302422 |
| 3 | Assam | 52286878 | 48905420 | 237440 | 0.45411 |
| 4 | Bihar | 61519210 | 57902184 | 336514 | 0.547006 |
| 5 | Cases being reassigned to states | 345565 | 0 | 0 | 0.0 |
| 6 | Chandigarh | 4690884 | 4264128 | 69824 | 1.4885 |
| 7 | Chhattisgarh | 63339141 | 56340704 | 730776 | 1.15375 |
| 8 | Dadra and Nagar Haveli and Daman and Diu | 898744 | 840759 | 602 | 0.0669824 |
| 9 | Daman & Diu | 2 | 0 | 0 | 0.0 |
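Death_percentage is Deaths divided by Confirmed, times 100, computed on each state's grouped totals. For example, the Andaman and Nicobar Islands row gives 15123 / 1189952 × 100 ≈ 1.271, matching the 1.27089 in the table. A minimal sketch of the same groupby and percentage on a hypothetical two-state frame (states "A" and "B" are placeholders, not real data):

```python
import pandas as pd

# Hypothetical daily rows for two placeholder states (not the Kaggle data)
toy = pd.DataFrame({
    "State/UnionTerritory": ["A", "A", "B", "B"],
    "Confirmed": [100, 300, 50, 150],
    "Cured": [80, 250, 40, 120],
    "Deaths": [2, 6, 1, 5],
})

# Select several columns with a list (double brackets) before aggregating
state_wise = (toy.groupby("State/UnionTerritory")[["Confirmed", "Cured", "Deaths"]]
                 .sum()
                 .reset_index())

# Deaths as a percentage of confirmed cases, per state:
# A -> 8 / 400 * 100, B -> 6 / 200 * 100
state_wise["Death_percentage"] = state_wise["Deaths"] / state_wise["Confirmed"] * 100
print(state_wise)
```

A group with zero confirmed cases would produce NaN (0/0) or inf, so real data may need a check before plotting.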


### Monthwise Analysis

Next, we aggregate the roughly 15 months of data (January 2020 to April 2021) by calendar month.

In [77]:
# Sum the numeric columns per calendar month; 'M' labels each group by its
# month-end date (pandas >= 2.2 prefers the alias 'ME')
month_wise = df.groupby(pd.Grouper(key='Date', freq='M')).sum(numeric_only=True)

# Move the month-end dates out of the index into a Date column, renumber rows 0..n-1,
# and keep the columns in display order (this also drops Sno)
month_wise = month_wise.reset_index()[['Date', 'Confirmed', 'Cured', 'Deaths']]

# Deaths as a percentage of confirmed cases, per month
month_wise["Death_percentage"] = ((month_wise["Deaths"] / month_wise["Confirmed"]) * 100)
month_wise.style.background_gradient(cmap='twilight_shifted')

| | Date | Confirmed | Cured | Deaths | Death_percentage |
|---|---|---|---|---|---|
| 0 | 2020-01-31 00:00:00 | 2 | 0 | 0 | 0.0 |
| 1 | 2020-02-29 00:00:00 | 86 | 0 | 0 | 0.0 |
| 2 | 2020-03-31 00:00:00 | 9687 | 808 | 202 | 2.08527 |
| 3 | 2020-04-30 00:00:00 | 422442 | 75443 | 13270 | 3.14126 |
| 4 | 2020-05-31 00:00:00 | 2938234 | 1133341 | 89834 | 3.05741 |
| 5 | 2020-06-30 00:00:00 | 10558374 | 5668946 | 319690 | 3.02783 |
| 6 | 2020-07-31 00:00:00 | 31726501 | 19980130 | 793511 | 2.5011 |
| 7 | 2020-08-31 00:00:00 | 80749620 | 58580895 | 1553468 | 1.92381 |
| 8 | 2020-09-30 00:00:00 | 149113758 | 118592934 | 2443374 | 1.6386 |
| 9 | 2020-10-31 00:00:00 | 226770312 | 198824412 | 3457615 | 1.52472 |
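The month-end dates in the Date column (2020-01-31, 2020-02-29, and so on) come from grouping with `pd.Grouper(key='Date', freq='M')`. A minimal sketch of an alternative, on hypothetical rows, converts each date to a monthly `Period` instead; `Period` still uses the `'M'` alias, so this variant is unaffected by the `'M'` to `'ME'` offset-alias change in recent pandas.

```python
import pandas as pd

# Hypothetical rows spanning two calendar months
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-15"]),
    "Confirmed": [1, 1, 2, 3],
    "Deaths": [0, 0, 0, 1],
})

# Label each row by its calendar month (Period with 'M' frequency), then sum per month
monthly = (toy.groupby(toy["Date"].dt.to_period("M"))[["Confirmed", "Deaths"]]
              .sum())
print(monthly)
```

The resulting index reads `2020-01`, `2020-02` rather than month-end timestamps, which can make chart axis labels shorter.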


### Monthwise - Confirmed Cases

In [78]:
fig = px.bar(month_wise, x='Date', y='Confirmed',
             hover_data=['Cured', 'Deaths'], color='Date',
             labels={'Date':'Date(monthwise)'}, height=600,
             title="Monthwise Increase in Confirmed cases")
fig.show()
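Each monthly bar adds up every daily row in that month, so its height grows with the number of reporting days and states, not only with new infections. A sketch of one way to get new confirmed cases per month, assuming Confirmed is a running total per state: take each state's last value in each month, add those across states, then difference consecutive months. The toy data is hypothetical, and the sketch assumes every state reports in every month (a state with no row in a month would drop out of that month's total).

```python
import pandas as pd

# Hypothetical running totals for two states across two months
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-10", "2020-03-31", "2020-04-15", "2020-04-30",
                            "2020-03-31", "2020-04-30"]),
    "State/UnionTerritory": ["Kerala", "Kerala", "Kerala", "Kerala", "Delhi", "Delhi"],
    "Confirmed": [10, 25, 40, 60, 5, 20],
})

month = toy["Date"].dt.to_period("M")

# Latest running total per state within each month (rows sorted by date first)
month_end = (toy.assign(Month=month)
                .sort_values("Date")
                .groupby(["Month", "State/UnionTerritory"])["Confirmed"]
                .last())

# National running total at each month end, then month-over-month difference;
# the first month keeps its own total, assuming counts started from zero
national = month_end.groupby(level="Month").sum()
new_cases = national.diff().fillna(national)
print(new_cases)
```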

### Monthwise - Cured Cases 

In [79]:
fig = px.bar(month_wise, x='Date', y='Cured',
             hover_data=['Confirmed', 'Deaths'], color='Date',
             labels={'Date':'Date(monthwise)'}, height=600,
             title="Monthwise Increase in Cured cases")
fig.show()

### Monthwise - Death Cases 

In [80]:
fig = px.bar(month_wise, x='Date', y='Deaths',
             hover_data=['Cured', 'Confirmed'], color='Date',
             labels={'Date':'Date(monthwise)'}, height=600,
             title="Monthwise Increase in Death cases")
fig.show()

This notebook is an attempt to use Python visualization libraries to draw insights from the latest COVID19 dataset for India.

### End of notebook