# COVID-19 statistics in India
**So what is COVID-19?**
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in December 2019 in Wuhan, the capital of China's Hubei province, and has since spread globally, resulting in the ongoing 2019–20 coronavirus pandemic. Common symptoms include fever, cough and shortness of breath. Other symptoms may include fatigue, muscle pain, diarrhea, sore throat, loss of smell and abdominal pain. The time from exposure to onset of symptoms is typically around five days, but may range from two to 14 days. While the majority of cases result in mild symptoms, some progress to viral pneumonia and multi-organ failure. As of 9 April 2020, more than 1.48 million cases have been reported in more than 200 countries and territories,[16] resulting in more than 88,600 deaths. More than 331,000 people have recovered.
Source: Wikipedia

# What am I going to see?
I have visualized the COVID-19 data for India in the form of tables, pie charts, treemaps, bar graphs and more, to give everyone in the country an overview of the disease. The data is updated automatically. If you like and appreciate my hard work, please leave a comment below. To learn more about your state, double-click on it in the state-wise graphs.

# Timeline of COVID-19 in India

In [1]:
from IPython.display import HTML
HTML('''<div class="flourish-embed flourish-cards" data-src="visualisation/1786965" data-url="https://flo.uri.sh/visualisation/1786965/embed"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

# Libraries

In [2]:
import json
from datetime import timedelta
from urllib.request import urlopen

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import folium

# colour palette: confirmed, deaths, recovered, active
cnf = '#393e46'
dth = '#ff2e63'
rec = '#21bf73'
act = '#fe9801'

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

In [None]:
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected = True)

# Dataset

In [None]:
! ls ../input/covid19-corona-virus-india-dataset

In [None]:
# importing datasets
df = pd.read_csv('../input/covid19-corona-virus-india-dataset/complete.csv', parse_dates=['Date'])
df['Name of State / UT'] = df['Name of State / UT'].str.replace('Union Territory of ', '')
df.head()

In [None]:
df.columns

# Preprocessing

### Cleaning

In [None]:
df = df[['Date', 'Name of State / UT', 'Latitude', 'Longitude', 'Total Confirmed cases', 'Death', 'Cured/Discharged/Migrated']]
df.columns = ['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Deaths', 'Cured']

for i in ['Confirmed', 'Deaths', 'Cured']:
    df[i] = df[i].astype('int')
    
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Cured']
df['Mortality rate'] = df['Deaths']/df['Confirmed']
df['Recovery rate'] = df['Cured']/df['Confirmed']

df = df[['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Active', 'Deaths', 'Mortality rate', 'Cured', 'Recovery rate']]

df
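The derived columns above follow directly from the three cumulative counts: Active = Confirmed - Deaths - Cured, and both rates divide by Confirmed. A state with zero confirmed cases would turn those divisions into 0/0, which pandas reports as NaN. The sketch below uses a small made-up frame (the state names and numbers are hypothetical) to show one way of keeping the rates defined by treating them as 0 when nothing has been confirmed.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the cleaned dataset
toy = pd.DataFrame({
    'State/UT': ['A', 'B', 'C'],
    'Confirmed': [100, 40, 0],
    'Deaths': [5, 0, 0],
    'Cured': [20, 10, 0],
})

# Patients still under treatment
toy['Active'] = toy['Confirmed'] - toy['Deaths'] - toy['Cured']

# Rates as fractions of confirmed cases; 0/0 gives NaN, so fill it with 0
toy['Mortality rate'] = (toy['Deaths'] / toy['Confirmed']).fillna(0)
toy['Recovery rate'] = (toy['Cured'] / toy['Confirmed']).fillna(0)

print(toy)
```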

### Derived Tables

In [None]:
# latest and previous day in the dataset
latest_day = df['Date'].max()
day_before = latest_day - timedelta(days=1)

# state-wise figures for both days
latest_day_df = df[df['Date'] == latest_day].set_index('State/UT')
day_before_df = df[df['Date'] == day_before].set_index('State/UT')

# new cases = today's cumulative count minus yesterday's;
# states with no row yesterday are counted from 0
prev_confirmed = day_before_df['Confirmed'].reindex(latest_day_df.index).fillna(0)
latest_day_df['New cases'] = (latest_day_df['Confirmed'] - prev_confirmed).astype(int)
latest = latest_day_df.reset_index()
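The cell above computes new cases only for the latest day, by subtracting the previous day's cumulative count. The same idea extends to every date with `groupby(...).diff()`. This is a sketch on hypothetical toy data; it assumes one row per state per date, consecutive reporting days, and a cumulative `Confirmed` column. A state's first row has no previous value, so its full count is treated as new.

```python
import pandas as pd

# Hypothetical cumulative counts: one row per state per date
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-01', '2020-04-02', '2020-04-03',
                            '2020-04-02', '2020-04-03']),
    'State/UT': ['A', 'A', 'A', 'B', 'B'],
    'Confirmed': [10, 15, 22, 3, 3],
})

toy = toy.sort_values(['State/UT', 'Date'])

# Day-over-day difference within each state; the first appearance counts in full
toy['New cases'] = (toy.groupby('State/UT')['Confirmed'].diff()
                       .fillna(toy['Confirmed'])
                       .astype(int))
print(toy)
```

If a state skips a reporting day, `diff()` attributes the whole gap to the next row it appears in.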

In [None]:
print(latest_day, day_before)


# Table

In [None]:
temp = latest[['State/UT', 'Confirmed', 'Active', 'New cases', 'Deaths', 'Mortality rate',
              'Cured', 'Recovery rate']]
temp = temp.sort_values('Confirmed', ascending = False).reset_index(drop = True)

temp.style\
    .background_gradient(cmap="Blues", subset=['Confirmed', 'Active', 'New cases'])\
    .background_gradient(cmap="Greens", subset=['Cured', 'Recovery rate'])\
    .background_gradient(cmap="Reds", subset=['Deaths', 'Mortality rate'])

# Country-level visualization

In [None]:
fig = make_subplots(rows=1, cols=2, shared_xaxes=False, column_widths=[0.4, 0.6],
                    subplot_titles = ['Latest stats', 'Over the time'],
                    specs=[[{"type": "treemap"}, {"type": "bar"}]])

tm = latest.melt(id_vars="Date", value_vars=['Active', 'Deaths', 'Cured'])
fig_1 = px.treemap(tm, path=["variable"], values="value", height=540, width=800,
                 color_discrete_sequence=[rec, act, dth])
fig_1.data[0].textinfo = 'label+text+value'
fig.add_trace(fig_1['data'][0], row=1, col=1)

# fig.add_trace(go.Treemap(labels = tm['variable'], values = tm['value']),1,1)

temp = df.groupby('Date')[['Active', 'Deaths', 'Cured']].sum().reset_index()
fig.add_trace(go.Bar(x=temp['Date'], y=temp['Active'], name='Active', marker_color=act), row=1, col=2)
fig.add_trace(go.Bar(x=temp['Date'], y=temp['Deaths'], name='Deaths', marker_color=dth), row=1, col=2)
fig.add_trace(go.Bar(x=temp['Date'], y=temp['Cured'], name='Cured', marker_color=rec), row=1, col=2)

fig.update_layout(barmode='stack')
fig.update_layout(treemapcolorway = [act, rec, dth])
fig.show()

# Map

In [None]:
m = folium.Map(location = [20.5937, 78.9629], tiles = 'cartodbpositron',
              min_zoom = 4, max_zoom = 6, zoom_start = 4)

# one circle per state with at least one confirmed case
for _, row in latest.iterrows():
    if row['Confirmed'] > 0:
        folium.Circle(
            location=[row['Latitude'], row['Longitude']],
            color='#e84545',
            fill=True,
            fill_color='#e84545',
            tooltip='<li><b>Name of State / UT :</b> ' + str(row['State/UT']) +
                    '<li><b>Confirmed cases :</b> ' + str(row['Confirmed']),
            radius=int(row['Confirmed']) * 300  # radius in metres
        ).add_to(m)
m
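Each circle's radius above is `Confirmed * 300` metres, so a circle's area grows with the square of the case count and the largest states visually dominate the map. A common alternative, sketched here as a design option rather than what this notebook does, scales the radius with the square root of the count so that circle area is proportional to cases. The `circle_radius` helper and its `base` value of 300 metres per square-root case are hypothetical choices.

```python
import math

def circle_radius(confirmed, base=300.0):
    """Radius in metres whose circle area is proportional to `confirmed`.

    `base` is a hypothetical scale factor (metres per square-root case).
    """
    return base * math.sqrt(max(confirmed, 0))

# Four times the cases -> twice the radius -> four times the area
print(circle_radius(100), circle_radius(400))
```

In the map loop it would be used as `radius=circle_radius(row['Confirmed'])`, with `base` tuned until the smallest states remain visible.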

# Race map

In [None]:
temp = df.copy()
temp['Date'] = temp['Date'].dt.strftime('%Y/%m/%d')

fig = px.scatter_geo(temp, lat="Latitude", lon="Longitude", color='Confirmed', size='Confirmed', projection="natural earth",
                     hover_name="State/UT", scope='asia', animation_frame="Date", center={'lat':20, 'lon':78}, 
                     range_color=[0, max(temp['Confirmed'])])
fig.show()

# State and union territory-wise report

### Bar graph comparison for total number of cases

In [None]:
temp = latest.sort_values('Confirmed', ascending=False)
state_order = temp['State/UT']

fig = px.bar(temp, 
             x="Confirmed", y="State/UT", color='State/UT', title='Confirmed',
             orientation='h', text='Confirmed', height=len(temp)*35,
             color_discrete_sequence = px.colors.qualitative.Vivid)
fig.show()

### Deaths vs Recovered

In [None]:
temp = latest[latest['Deaths']>0].sort_values('Deaths')
fig_d = px.bar(temp, y="State/UT", x="Deaths", orientation='h', title='Deaths', color_discrete_sequence = ['#ff2e63'], text='Deaths', height=len(temp)*40)

temp = latest[latest['Cured']>0].sort_values('Cured')
fig_r = px.bar(temp, y="State/UT", x="Cured", orientation='h', title='Cured', color_discrete_sequence = ['#2c786c'], text='Cured', height=len(temp)*40)

fig = make_subplots(rows=1, cols=2, shared_xaxes=False, horizontal_spacing=0.2, subplot_titles=("Deaths", "Recovered"))
fig.add_trace(fig_d['data'][0], row=1, col=1)
fig.add_trace(fig_r['data'][0], row=1, col=2)
fig.update_layout(height=800)
fig.show()

### State-wise cases over time

In [None]:
fig = px.bar(df.sort_values('Confirmed', ascending=False), x="Date", y="Confirmed", color='State/UT', title='State wise cases over time',
             color_discrete_sequence = px.colors.qualitative.Vivid)
fig.update_traces(textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

### Mortality rate over time

In [None]:
fig = px.bar(df.sort_values('Mortality rate', ascending=False), x="Date", y="Mortality rate", color='State/UT', title='Mortality rate over time',
             color_discrete_sequence = px.colors.qualitative.Vivid)
fig.update_traces(textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

### Recovery rate over time

In [None]:
fig = px.bar(df.sort_values('Recovery rate', ascending=False), x="Date", y="Recovery rate", color='State/UT', title='Recovery rate over time',
             color_discrete_sequence = px.colors.qualitative.Vivid)
fig.update_traces(textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

### Detailed comparison of cases

In [None]:
temp = latest.sort_values('Confirmed', ascending=True)

fig = go.Figure(data=[
    go.Bar(name='Active', y=temp['State/UT'], x=temp['Active'], orientation='h'),
    go.Bar(name='Deaths', y=temp['State/UT'], x=temp['Deaths'], orientation='h'),
    go.Bar(name='Cured', y=temp['State/UT'], x=temp['Cured'], orientation='h')
])
# Change the bar mode
fig.update_layout(barmode='stack', height=900)
fig.update_traces(textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

### State cases treemap

In [None]:
fig = px.treemap(latest, path=["State/UT"], values="Confirmed", height=700,
                 title='Number of confirmed cases in each state', color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
fig.show()

### Number of affected states and union territories over time

In [None]:
# number of distinct States/UTs reported on each date
affected = df.groupby('Date')['State/UT'].nunique()
no_of_states = affected.values
dates = affected.index

fig = go.Figure()

# reference line: 36 States and Union Territories in total
fig.add_trace(go.Scatter(x=dates, y=[36] * len(dates), 
                         mode='lines', name='Total no. of States+UT', 
                         line = dict(color='#393e46', dash='dash')))

fig.add_trace(go.Scatter(x=dates, y=no_of_states, hoverinfo='x+y',
                         mode='lines', name='No. of affected States+UT', 
                         line = dict(color='#ff2e63')))

fig.update_layout(title='No. of affected States / Union Territory', 
                  xaxis_title='Date', yaxis_title='No. of affected States / Union Territory')
fig.update_traces(textposition='top center')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()
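The curve above counts the distinct States and UTs that appear in the data on each date and compares them with the total of 36. If the source also contains rows with zero confirmed cases, a stricter count filters those out first. The sketch below does that with `nunique` on hypothetical toy data.

```python
import pandas as pd

# Hypothetical rows; state B is listed on 2020-03-01 but has no cases yet
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-01', '2020-03-01', '2020-03-02',
                            '2020-03-02', '2020-03-02']),
    'State/UT': ['A', 'B', 'A', 'B', 'C'],
    'Confirmed': [2, 0, 3, 1, 4],
})

# Distinct states with at least one confirmed case on each date
affected = toy[toy['Confirmed'] > 0].groupby('Date')['State/UT'].nunique()
print(affected)
```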

### Confirmed vs Deaths

In [None]:
px.scatter(latest[latest['Confirmed']>10], x='Confirmed', y='Deaths', color='State/UT', size='Confirmed', 
           text='State/UT', log_x =True, title='Confirmed vs Deaths')

### Confirmed vs Cured

In [None]:
px.scatter(latest[latest['Confirmed']>10], x='Confirmed', y='Cured', color='State/UT', size='Confirmed', 
           text='State/UT', log_x =True, title='Confirmed vs Cured')

### Confirmed vs Deaths vs Cured between states

In [None]:
px.scatter_3d(latest[latest['Confirmed']>10], x='Confirmed', y='Deaths', z='Cured', size='Confirmed', color='State/UT', 
              text='State/UT', title='Confirmed vs Deaths vs Cured')

### Cases recorded over time

In [None]:
temp = df.groupby(['Date', 'State/UT'])['Confirmed'].sum()
temp = temp.reset_index().sort_values(by=['Date', 'State/UT'])

sns.set_style('darkgrid')  # the 'seaborn' Matplotlib style name is deprecated in newer Matplotlib releases
g = sns.FacetGrid(temp, col="State/UT", hue="State/UT", sharey=False, col_wrap=4)
g = g.map(plt.plot, "Date", "Confirmed")
g.set_xticklabels(rotation=90)
plt.show()

### Bar graph race of cases over time

In [None]:
temp = df.copy(deep=True)
temp['Date'] = temp['Date'].dt.strftime('%Y-%m-%d')
# one row per date, one column per state; missing entries become 0
temp = temp.pivot(index='Date', columns='State/UT', values='Confirmed').fillna(0).astype('int').reset_index()
temp = temp.melt(id_vars='Date', value_name='Confirmed')
temp['Confirmed'] = temp['Confirmed'].astype('int')

# fix the x-axis to the largest count so that no bar is clipped in later frames
fig = px.bar(temp, y='State/UT', x='Confirmed', color='State/UT', orientation='h', 
             text='Confirmed', title='Confirmed cases over time', animation_frame='Date',
             range_x=[0, temp['Confirmed'].max() * 1.1], height=1000)
fig.update_layout(yaxis={'categoryorder':'array', 
                         'categoryarray':tuple(reversed(state_order.to_list()))})
fig.show()
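The race chart builds a complete date × state grid with `pivot` and fills missing entries with 0. Because `Confirmed` is cumulative, a state that is missing on some date after its first report would drop to 0 in that animation frame. The sketch below, on hypothetical data and assuming such gaps can occur, forward-fills each state's last known count and uses 0 only before its first report.

```python
import pandas as pd

# Hypothetical cumulative counts; state B has no row on 2020-03-02
toy = pd.DataFrame({
    'Date': ['2020-03-01', '2020-03-01', '2020-03-02', '2020-03-03', '2020-03-03'],
    'State/UT': ['A', 'B', 'A', 'A', 'B'],
    'Confirmed': [1, 2, 4, 6, 5],
})

grid = (toy.pivot(index='Date', columns='State/UT', values='Confirmed')
           .ffill()      # carry the last cumulative count forward
           .fillna(0)    # before a state's first report, its count is 0
           .astype(int))

# Back to long format, as expected by px.bar(..., animation_frame='Date')
long = grid.reset_index().melt(id_vars='Date', var_name='State/UT',
                               value_name='Confirmed')
print(long)
```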

# More data

In [None]:
!ls ../input/covid19-corona-virus-india-dataset

In [None]:
p_df = pd.read_csv('../input/covid19-corona-virus-india-dataset/patients_data.csv')
p_df.head(5)

In [None]:
p_df['date_announced'] = pd.to_datetime(p_df['date_announced'], format='%d/%m/%Y')
p_df['status_change_date'] = pd.to_datetime(p_df['status_change_date'], format='%d/%m/%Y')

p_df['nationality'] = p_df['nationality'].replace('Indian', 'India')
p_df.head()

In [None]:
print(p_df.shape, '\n')
print(p_df.isna().sum())

In [None]:
print(p_df['age_bracket'].isna().sum(), 'out of', p_df.shape[0], 'values are missing')
px.histogram(p_df, x='age_bracket', color_discrete_sequence = ['#35495e'], nbins=50, title='Distribution of ages of confirmed patients')

In [None]:
fig = make_subplots(
    rows=1, cols=2, column_widths=[0.7, 0.3],
    subplot_titles = ['Gender vs Age', ''],
    specs=[[{"type": "histogram"}, {"type": "pie"}]]
)

temp = p_df[['age_bracket', 'gender']].dropna()
print('Total no. of values :', p_df.shape[0], '\nNo. of missing values :', p_df.shape[0]-temp.shape[0], '\nNo. of available values :', temp.shape[0])
# count per gender, with pie labels taken from the data instead of being hard-coded
gen_grp = temp['gender'].value_counts().sort_index()
gender_names = {'F': 'Female', 'M': 'Male'}
gender_colors = {'F': '#6a0572', 'M': '#39065a'}

fig.add_trace(go.Histogram(x=temp[temp['gender']=='F']['age_bracket'], nbinsx=50, name='Female', marker_color='#6a0572'), 1, 1)
fig.add_trace(go.Histogram(x=temp[temp['gender']=='M']['age_bracket'], nbinsx=50, name='Male', marker_color='#39065a'), 1, 1)

fig.add_trace(go.Pie(values=gen_grp.values.tolist(),
                     labels=[gender_names.get(g, g) for g in gen_grp.index],
                     marker_colors=[gender_colors.get(g, '#999999') for g in gen_grp.index]), 1, 2)

fig.update_layout(showlegend=False)
fig.update_layout(barmode='stack')
fig.data[2].textinfo = 'label+text+value+percent'

fig.show()
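The histogram above splits ages into 50 bins per gender. For a tabular view of the same relationship, `pd.cut` plus `pd.crosstab` counts patients per age band and gender. The bands and the toy rows below are hypothetical; on the real data, `temp` from the cell above would take the place of `toy`.

```python
import pandas as pd

# Hypothetical patients
toy = pd.DataFrame({
    'age_bracket': [5, 23, 37, 41, 68, 72],
    'gender': ['F', 'M', 'M', 'F', 'M', 'F'],
})

# Hypothetical age bands; right=False makes each band [lower, upper)
bands = [0, 20, 40, 60, 120]
labels = ['0-19', '20-39', '40-59', '60+']
toy['age_band'] = pd.cut(toy['age_bracket'], bins=bands, labels=labels, right=False)

# Rows: age band, columns: gender, values: number of patients
table = pd.crosstab(toy['age_band'], toy['gender'])
print(table)
```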

In [None]:
fig = make_subplots(
    rows=1, cols=2, column_widths=[0.7, 0.3],
    subplot_titles = ['Cases vs Age', ''],
    specs=[[{"type": "histogram"}, {"type": "pie"}]]
)

temp = p_df[['age_bracket', 'current_status']].dropna()
print('Total no. of values :', p_df.shape[0], '\nNo. of missing values :', p_df.shape[0]-temp.shape[0], '\nNo. of available values :', temp.shape[0])
# count per status, with pie labels taken from the data (includes 'Migrated' if present)
status_grp = temp['current_status'].value_counts().sort_index()
status_colors = {'Deceased': '#fd0054', 'Hospitalized': '#393e46', 'Recovered': '#40a798', 'Migrated': '#3bb4c1'}

fig.add_trace(go.Pie(values=status_grp.values.tolist(), labels=status_grp.index.tolist(), 
                     marker_colors=[status_colors.get(s, '#999999') for s in status_grp.index], hole=.3), 1, 2)

fig.add_trace(go.Histogram(x=temp[temp['current_status']=='Deceased']['age_bracket'], nbinsx=50, name='Deceased', marker_color='#fd0054'), 1, 1)
fig.add_trace(go.Histogram(x=temp[temp['current_status']=='Recovered']['age_bracket'], nbinsx=50, name='Recovered', marker_color='#40a798'), 1, 1)
fig.add_trace(go.Histogram(x=temp[temp['current_status']=='Hospitalized']['age_bracket'], nbinsx=50, name='Hospitalized', marker_color='#393e46'), 1, 1)

fig.update_layout(showlegend=False)
fig.update_layout(barmode='stack')
fig.data[0].textinfo = 'label+text+value+percent'

fig.show()

In [None]:
print(p_df['current_status'].isna().sum(), 'out of', p_df.shape[0], 'values are missing')
fig = px.pie(p_df[['current_status']].dropna(), names='current_status', 
             color_discrete_sequence =  ['#005691','#21bf73','#ff4d4d', '#3bb4c1'],
             title='Proportion of cases')
fig.data[0].textinfo = 'label+text+value+percent'
fig.show()

temp = p_df[['age_bracket', 'current_status']].dropna()
fig = px.histogram(temp, x='age_bracket', color='current_status', nbins=50, 
                   category_orders = {'current_status': ['Deceased', 'Recovered', 'Hospitalized', 'Migrated']},
                   color_discrete_sequence = ['#ff4d4d', '#21bf73', '#005691', '#3bb4c1'],
                  title='Distribution of ages of different cases of patients')
fig.show()
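The pie chart reports each status as a share of the patients whose status is known. The same shares can be read off numerically with `value_counts(normalize=True)`, which drops missing values by default. The statuses below are hypothetical toy values.

```python
import pandas as pd

# Hypothetical statuses; None stands for a missing value
status = pd.Series(['Hospitalized', 'Recovered', 'Hospitalized',
                    'Deceased', None, 'Hospitalized'])

# Fraction of known statuses (missing values excluded, dropna=True by default)
shares = status.value_counts(normalize=True)
print(shares.round(2))
```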

In [None]:
# number of patients per type of transmission
temp = p_df['type_of_transmission'].value_counts().rename_axis('type_of_transmission').reset_index(name='count')
temp = temp.sort_values(by='count')
fig = px.bar(temp, x='count', y='type_of_transmission', orientation='h', text='count', width=600, height=300,
       color_discrete_sequence = ['#35495e'], title='Type of transmission')
fig.update_xaxes(title='')
fig.update_yaxes(title='')
fig.show()

In [None]:
temp = p_df.groupby('nationality')['patient_number'].count().reset_index()
temp = temp.sort_values('patient_number')
temp = temp[temp['nationality']!='India']
fig = px.bar(temp, x='patient_number', y='nationality', orientation='h', text='patient_number', width=600,
       color_discrete_sequence = ['#35495e'], title='No. of foreign citizens')
fig.update_xaxes(title='')
fig.update_yaxes(title='')
fig.show()

In [None]:
dist = p_df.groupby(['detected_state', 'detected_district'])['patient_number'].count().reset_index()
dist.head()
fig = px.treemap(dist, path=["detected_state", "detected_district"], values="patient_number", height=700,
           title='Number of Confirmed Cases', color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
fig.show()

In [None]:
temp = p_df[['date_announced', 'status_change_date', 'current_status']].dropna()
temp = temp[temp['status_change_date']!=temp['date_announced']]
temp['no_of_days'] = temp['status_change_date'] - temp['date_announced']
temp['no_of_days'] = temp['no_of_days'].dt.days
temp.head()

In [None]:
print('Total no. of values :', p_df.shape[0], '\nNo. of missing values :', p_df.shape[0]-temp.shape[0], '\nNo. of available values :', p_df.shape[0]-(p_df.shape[0]-temp.shape[0]))
px.box(temp, x="current_status", y="no_of_days", color='current_status')
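The box plot summarises, per final status, how many days passed between a case being announced and its status changing. The medians behind the boxes can be computed directly with `groupby`. The dates below are hypothetical and only mimic the `date_announced`, `status_change_date` and `current_status` columns.

```python
import pandas as pd

# Hypothetical patient records
toy = pd.DataFrame({
    'date_announced': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-05', '2020-03-06']),
    'status_change_date': pd.to_datetime(['2020-03-15', '2020-03-20', '2020-03-09', '2020-03-12']),
    'current_status': ['Recovered', 'Recovered', 'Deceased', 'Deceased'],
})

# Whole days between announcement and status change
toy['no_of_days'] = (toy['status_change_date'] - toy['date_announced']).dt.days

# Median number of days per final status
median_days = toy.groupby('current_status')['no_of_days'].median()
print(median_days)
```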

In [None]:
p_df['notes'].value_counts()

In [None]:
# merge spelling variants of the same note
p_df['notes'] = p_df['notes'].replace({
    'Details Awaited': 'Details awaited',
    'Travelled from Dubai, UAE': 'Travelled from Dubai',
    'Travelled from Dubai.': 'Travelled from Dubai',
    'attended religious event Tablighi Jamaat in delhi': 'Attended Delhi Religious Conference',
    'Travelled from London': 'Travelled from UK',
})

# frequency of each note, excluding the placeholder 'Details awaited'
temp = p_df['notes'].value_counts().rename_axis('notes').reset_index(name='count')
temp = temp[temp['notes'] != 'Details awaited']

# notes that are empty or still 'Details awaited' carry no information
n_missing = p_df['notes'].isna().sum() + (p_df['notes'] == 'Details awaited').sum()
print('Total no. of values :', p_df.shape[0], '\nNo. of missing values :', n_missing, '\nNo. of available values :', p_df.shape[0] - n_missing)

fig = px.bar(temp.head(10).sort_values('count', ascending=True), x='count', y='notes', orientation='h', text='count', width=600,
       color_discrete_sequence = ['#35495e'], title='Most frequent case notes')
fig.update_xaxes(title='')
fig.update_yaxes(title='')
fig.show()
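The cleanup above merges spelling variants of the same note through an explicit list of replacements. A slightly more general sketch, assuming variants can also differ by surrounding whitespace or a trailing full stop, normalises those first and then applies a single mapping dictionary. The sample notes are illustrative, and `synonyms` reuses replacements from the notebook. Note that stripping a trailing full stop would also shorten abbreviations such as "U.S.", so check the data before relying on it.

```python
import pandas as pd

# Illustrative notes; None stands for a missing note
notes = pd.Series(['Travelled from Dubai.', 'Travelled from Dubai, UAE ',
                   'Details Awaited', 'Travelled from London', None])

# Explicit synonyms, taken from the replacements used in the notebook
synonyms = {
    'Details Awaited': 'Details awaited',
    'Travelled from Dubai, UAE': 'Travelled from Dubai',
    'Travelled from London': 'Travelled from UK',
}

cleaned = (notes.str.strip()        # drop stray whitespace (NaN stays NaN)
                .str.rstrip('.')    # drop a trailing full stop
                .replace(synonyms)) # map known variants to one spelling
print(cleaned.value_counts())
```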

In [None]:
p_df['notes'].str.contains('Travelled from')

In [None]:
# keep only notes of the form 'Travelled from <place>' (missing notes are skipped)
temp = p_df[p_df['notes'].str.contains('Travelled from', na=False)].copy()
temp['notes'] = temp['notes'].str.replace('Travelled from ', '')

# number of patients per travel origin, most frequent first
temp = temp['notes'].value_counts().rename_axis('notes').to_frame('count')
temp.head(20)
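The last cell keeps the notes containing "Travelled from" and strips that prefix to count travel origins. `str.extract` can do the filtering and the stripping in one step. The regular expression is an assumption: it takes the origin to be everything after the prefix, minus an optional trailing full stop. The sample notes are hypothetical.

```python
import pandas as pd

# Hypothetical notes; None stands for a missing note
notes = pd.Series(['Travelled from Italy', 'Travelled from Dubai.',
                   'Contact of P4', 'Travelled from Italy', None])

# Capture the text after the prefix; non-matching notes and NaN become NaN
origins = notes.str.extract(r'Travelled from (.+?)\.?$', expand=False)

# Number of patients per travel origin
counts = origins.value_counts()
print(counts)
```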