# India & World COVID-19 Analysis, Visualization, Comparison and Predictions

![](https://i.imgur.com/ly4htII.jpg)

## Introduction to COVID-19 

Coronaviruses are important human and animal pathogens. At the end of 2019, a novel coronavirus was identified as the cause of a cluster of pneumonia cases in Wuhan, a city in the Hubei Province of China. It rapidly spread, resulting in an epidemic throughout China, followed by an increasing number of cases in other countries throughout the world.

In February 2020, the World Health Organization designated the disease COVID-19, which stands for coronavirus disease 2019. The virus that causes COVID-19 is designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); previously, it was referred to as 2019-nCoV. As of 21 Sep 2020, the disease has caused more than 960K deaths across the globe. As the pandemic spreads all over the world, it becomes more important to understand how it spreads. This notebook is an effort to analyze the cumulative data of confirmed, death, and recovered cases over time.

In this notebook, the main focus is to analyze the spread trend of this virus all over the world.

## SOURCES:

- WHO
- Microsoft COVID-19 Tracker
- COVID-19 Tracker by Johns Hopkins University
- COVID-19-India-Tracker

## Dataset 

https://github.com/CSSEGISandData/COVID-19
- 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
- This dataset is updated daily by Johns Hopkins CSSE

##### Please don't PANIC, stay safe, and follow your nation's and WHO guidelines.
##### We all can defeat this together. Please don't spread rumors.

## Imports and Datasets
- Pandas - for dataset handling
- Numpy - support for Pandas and calculations
- Matplotlib - for visualization (plotting graphs)
- Seaborn - for visualization (plotting beautiful graphs)
- Plotly - for interactive plots

### Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import datetime
import requests
import warnings

import matplotlib.pyplot as plt
import matplotlib
import matplotlib.dates as mdates
import seaborn as sns
import squarify
import plotly.offline as py
import plotly.express as px  # bundled with plotly >= 4; replaces the legacy plotly_express package
import plotly.io as pio
#pio.renderers.default = 'browser'

from fbprophet import Prophet
from fbprophet.plot import plot_plotly, add_changepoints_to_plot

from IPython.display import Image
warnings.filterwarnings('ignore')

### Retrieving the Dataset

In [None]:
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
latest_data = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/09-26-2020.csv')

## Preprocessing

In [None]:
#Rename Country/Region to Country_Region
confirmed_df.rename(columns={"Country/Region": "Country_Region"}, inplace=True)
deaths_df.rename(columns={"Country/Region": "Country_Region"}, inplace=True)
recovered_df.rename(columns={"Country/Region": "Country_Region"}, inplace=True)

#Changing the Country names
confirmed_df.loc[confirmed_df['Country_Region'] == "US", "Country_Region"] = "USA"
recovered_df.loc[recovered_df['Country_Region'] == "US", "Country_Region"] = "USA"
deaths_df.loc[deaths_df['Country_Region'] == "US", "Country_Region"] = "USA"
latest_data.loc[latest_data['Country_Region'] == "US", "Country_Region"] = "USA"

#Group by Country
df1 = confirmed_df.groupby('Country_Region').sum().reset_index()
df2 = deaths_df.groupby('Country_Region').sum().reset_index()
df3 = recovered_df.groupby('Country_Region').sum().reset_index()


#Melting confirmed, deaths and recovered dataframes
#(date columns start at position 3, right after Country_Region, Lat and Long, so the first date is not skipped)
world_confirmed_df = df1.melt(id_vars=['Country_Region', 'Lat', 'Long'], value_vars=df1.columns[3:], var_name='Date', value_name='Confirmed')
world_deaths_df = df2.melt(id_vars=['Country_Region', 'Lat', 'Long'], value_vars=df2.columns[3:], var_name='Date', value_name='Deaths')
world_recovered_df = df3.melt(id_vars=['Country_Region', 'Lat', 'Long'], value_vars=df3.columns[3:], var_name='Date', value_name='Recovered')

#Drop Lat & Long
world_confirmed_df.drop(['Lat', 'Long'], inplace=True, axis=1)
world_deaths_df.drop(['Lat', 'Long'], inplace=True, axis=1)
world_recovered_df.drop(['Lat', 'Long'], inplace=True, axis=1)

#Merging confirmed, deaths and recovered dataframes
world_covid_df = world_confirmed_df.merge(world_recovered_df, on=['Country_Region','Date'])
world_covid_df = world_deaths_df.merge(world_covid_df, on=['Country_Region','Date'])
world_covid_df['Date'] = pd.to_datetime(world_covid_df['Date'],
                              format='%m/%d/%y')
#.dt.date

#Add Active Cases
world_covid_df['Active']=world_covid_df['Confirmed']-world_covid_df['Recovered']-world_covid_df['Deaths']
world_covid_df['Active'] = world_covid_df['Active'].astype(int)

#Add New Cases
world_covid_df['Prev_Confirmed'] = world_covid_df.groupby('Country_Region')['Confirmed'].shift(1)
world_covid_df['New Cases'] = world_covid_df['Confirmed'] - world_covid_df['Prev_Confirmed']
world_covid_df.drop('Prev_Confirmed',inplace = True,axis=1)
world_covid_df['New Cases'].fillna(0, inplace=True)

#Add New Recovered Cases
world_covid_df['Prev_Recovered'] = world_covid_df.groupby('Country_Region')['Recovered'].shift(1)
world_covid_df['New Recovered'] = world_covid_df['Recovered'] - world_covid_df['Prev_Recovered']
world_covid_df.drop('Prev_Recovered',inplace = True,axis=1)
world_covid_df['New Recovered'].fillna(0, inplace=True)

#Add New Deaths Cases
world_covid_df['Prev_Deaths'] = world_covid_df.groupby('Country_Region')['Deaths'].shift(1)
world_covid_df['New Deaths'] = world_covid_df['Deaths'] - world_covid_df['Prev_Deaths']
world_covid_df.drop('Prev_Deaths',inplace = True,axis=1)
world_covid_df['New Deaths'].fillna(0, inplace=True)
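
The reshaping done in the cell above (one column per date turned into one row per country and date, then merged) can be sketched on a toy table. The frames `wide_confirmed` and `wide_deaths`, the helper `to_long` and all values are made up for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Toy wide-format tables shaped like the JHU time series (values are made up).
wide_confirmed = pd.DataFrame({
    "Country_Region": ["A", "B"],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
})
wide_deaths = pd.DataFrame({
    "Country_Region": ["A", "B"],
    "1/22/20": [0, 0],
    "1/23/20": [1, 0],
})

def to_long(wide, value_name):
    """Hypothetical helper: one column per date -> one row per (country, date)."""
    return wide.melt(id_vars=["Country_Region"], var_name="Date", value_name=value_name)

# Merge the two long tables on the shared keys, then parse the date strings.
long_df = to_long(wide_confirmed, "Confirmed").merge(
    to_long(wide_deaths, "Deaths"), on=["Country_Region", "Date"]
)
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
print(long_df)
```

When `value_vars` is omitted, `melt` uses every column that is not listed in `id_vars`, which avoids hard-coding the position of the first date column.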

In [None]:
#Data Type change into integer
world_covid_df['New Deaths'] = world_covid_df['New Deaths'].astype(int)
world_covid_df['New Cases'] = world_covid_df['New Cases'].astype(int)
world_covid_df['New Recovered'] = world_covid_df['New Recovered'].astype(int)
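
The daily columns above are differences of cumulative counts within each country. As a minimal sketch on made-up data, the same result can be obtained with `groupby(...).diff()` instead of an explicit `shift(1)` and subtraction; the frame `toy` is hypothetical.

```python
import pandas as pd

# Cumulative confirmed counts for two made-up countries over three days.
toy = pd.DataFrame({
    "Country_Region": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"] * 2),
    "Confirmed": [1, 3, 6, 0, 2, 2],
})

# Sort so that each country's rows are in date order before differencing.
toy = toy.sort_values(["Country_Region", "Date"])

# diff() restarts for every country; the first day has no predecessor, so fill it with 0.
toy["New Cases"] = (
    toy.groupby("Country_Region")["Confirmed"].diff().fillna(0).astype(int)
)
print(toy)
```

Differencing inside the group matters: a plain `diff()` over the whole column would subtract the last value of one country from the first value of the next.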

In [None]:
#Slice Data Deaths,Recovered
day_wise_covid = pd.DataFrame(df1.iloc[:,3:].sum(axis = 0, skipna=True))
day_wise_covid['Deaths'] = pd.DataFrame(df2.iloc[:,3:].sum(axis = 0, skipna=True))
day_wise_covid['Recovered'] = pd.DataFrame(df3.iloc[:,3:].sum(axis = 0, skipna=True))

#Add Active Cases
day_wise_covid.rename(columns = {0:'Confirmed'}, inplace = True) 
day_wise_covid['Active']=day_wise_covid['Confirmed']-day_wise_covid['Recovered']-day_wise_covid['Deaths']
day_wise_covid['Active'] = day_wise_covid['Active'].astype(int)

#Reset Index & set Date
day_wise_covid.reset_index(inplace=True)
day_wise_covid.rename(columns = {'index':'Date'}, inplace = True) 
day_wise_covid['Date'] = pd.to_datetime(day_wise_covid['Date'],
                              format='%m/%d/%y').dt.date
                    
#Add New Cases                    
day_wise_covid['Prev_Confirmed'] = day_wise_covid['Confirmed'].shift(1)
day_wise_covid['New Cases'] = day_wise_covid['Confirmed'] - day_wise_covid['Prev_Confirmed']
day_wise_covid.drop('Prev_Confirmed',inplace = True,axis=1)
day_wise_covid['New Cases'].fillna(0, inplace=True)

#Add New Recovered  
day_wise_covid['Prev_Recovered'] = day_wise_covid['Recovered'].shift(1)
day_wise_covid['New Recovered'] = day_wise_covid['Recovered'] - day_wise_covid['Prev_Recovered']
day_wise_covid.drop('Prev_Recovered',inplace = True,axis=1)
day_wise_covid['New Recovered'].fillna(0, inplace=True)

#Add New Deaths 
day_wise_covid['Prev_Deaths'] = day_wise_covid['Deaths'].shift(1)
day_wise_covid['New Deaths'] = day_wise_covid['Deaths'] - day_wise_covid['Prev_Deaths']
day_wise_covid.drop('Prev_Deaths',inplace = True,axis=1)
day_wise_covid['New Deaths'].fillna(0, inplace=True)

#Add Deaths/Recovered, Mortality Rate and Recovered_Rate(per 100)
day_wise_covid["Mortality Rate(per 100)"] = np.round(100*day_wise_covid["Deaths"]/day_wise_covid["Confirmed"],2)
day_wise_covid["Recovered Rate(per 100)"] = np.round(100*day_wise_covid["Recovered"]/day_wise_covid["Confirmed"],2)
day_wise_covid["Deaths/Recovered(per 100)"] = np.round(100*day_wise_covid["Deaths"]/day_wise_covid["Recovered"],2)
#day_wise_covid.isnull().sum()

#day_wise_covid['Mortality Rate(per 100)'] = day_wise_covid['Deaths_Rate(per 100)'].replace(np.nan, 0)
#day_wise_covid['Recovered_Rate(per 100)'] =day_wise_covid['Recovered_Rate(per 100)'].replace(np.nan, 0)
#day_wise_covid['Deaths/Recovered(per 100)'] = day_wise_covid['Deaths/Recovered(per 100)'].replace(np.nan, 0)
#day_wise_covid.isnull().sum()


In [None]:
#Data Type change into integer
day_wise_covid['New Deaths'] = day_wise_covid['New Deaths'].astype(int)
day_wise_covid['New Cases'] = day_wise_covid['New Cases'].astype(int)
day_wise_covid['New Recovered'] = day_wise_covid['New Recovered'].astype(int)

In [None]:
#world_covid_df

In [None]:
world_data_map = latest_data.groupby('Country_Region').sum().reset_index()
world_data_map["Mortality Rate(per 100)"] = np.round(100*world_data_map["Deaths"]/world_data_map["Confirmed"],2)
world_data_map["Recovered Rate(per 100)"] = np.round(100*world_data_map["Recovered"]/world_data_map["Confirmed"],2)
world_data_map.drop('Case-Fatality_Ratio',inplace = True,axis=1)
world_data_map['Active'] = world_data_map['Active'].astype(int)


# General Analysis of Data

## Global Reported Cases till Date

Total percentage of deaths reported, recoveries and active cases all across the world

In [None]:
# color palette
cnf, dth, rec, act = '#393e46', '#ff2e63', '#21bf73', '#fe9801' 


In [None]:
# Take the last date column and sum it to a single number per series
world_confirmed = confirmed_df.iloc[:, -1].sum()
world_recovered = recovered_df.iloc[:, -1].sum()
world_deaths = deaths_df.iloc[:, -1].sum()
# Active cases are the confirmed cases that have neither recovered nor died
world_active = world_confirmed - world_recovered - world_deaths

labels = ['Active','Recovered','Deaths']
sizes = [world_active,world_recovered,world_deaths]
color= ['#66b3ff','green','red']
explode = []

for i in labels:
    explode.append(0.02)
    
plt.figure(figsize= (15,10))
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=9, explode =explode,colors = color)
centre_circle = plt.Circle((0,0),0.30,fc='white')

fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title('World COVID-19 Cases',fontsize = 20)
plt.axis('equal')  
plt.tight_layout()
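
The percentages shown in the pie chart follow from simple arithmetic, assuming active cases are the confirmed cases minus recoveries and deaths (the same definition used in the preprocessing step). A small sketch with made-up totals; `case_shares` is a hypothetical helper, not part of the notebook.

```python
def case_shares(confirmed, recovered, deaths):
    """Return the percentage of confirmed cases that are active, recovered or dead."""
    active = confirmed - recovered - deaths
    parts = {"Active": active, "Recovered": recovered, "Deaths": deaths}
    return {label: round(100 * count / confirmed, 1) for label, count in parts.items()}

# Made-up totals, for illustration only.
shares = case_shares(confirmed=1000, recovered=700, deaths=50)
print(shares)
```

Because the three parts partition the confirmed cases, the shares always add up to 100 (up to rounding).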

Total number of confirmed cases, deaths reported, recoveries and active cases all across the world

In [None]:
df_temp = world_data_map.copy()
df_temp.drop(['Lat', 'Long_','FIPS','Country_Region','Incidence_Rate','Mortality Rate(per 100)','Recovered Rate(per 100)'], inplace=True, axis=1)
           
df_t = pd.DataFrame(pd.to_numeric(df_temp.sum()),dtype=np.float64).transpose()
df_t["Mortality Rate (per 100)"] = np.round(100*df_t["Deaths"]/df_t["Confirmed"],2)
df_t.style.background_gradient(cmap='Wistia',axis=1).format("{:.2f}").format("{:.0f}",subset=["Confirmed","Deaths","Recovered","Active"])


## Country Wise Reported Cases

Country-wise reported confirmed, recovered, death and active cases, together with the incidence rate and mortality rate


In [None]:
#day_wise_covid['Active'] = day_wise_covid['Active'].astype(int)
#world_covid_df['Active'] = world_covid_df['Active'].astype(int)
#df_temp

In [None]:
df_temp = world_data_map.copy()
df_temp= df_temp.set_index("Country_Region")
df_temp.drop(['Lat', 'Long_','FIPS','Recovered Rate(per 100)'], inplace=True, axis=1)
df_temp["Mortality Rate(per 100)"] = np.round(100*df_temp["Deaths"]/df_temp["Confirmed"],2)

df_temp.sort_values('Confirmed', ascending= False).style.background_gradient(cmap='Blues',subset=["Confirmed"])\
                        .background_gradient(cmap='Reds',subset=["Deaths"])\
                        .background_gradient(cmap='Greens',subset=["Recovered"])\
                        .background_gradient(cmap='Purples',subset=["Active"])\
                        .background_gradient(cmap='Pastel1_r',subset=["Incidence_Rate"])\
                        .background_gradient(cmap='YlOrBr',subset=["Mortality Rate(per 100)"])


## Bar Race Chart Using Flourish App

In [None]:
%%HTML
<div class="flourish-embed flourish-bar-chart-race" data-src="visualisation/3733302" data-url="https://flo.uri.sh/visualisation/3733302/embed" aria-label=""><script src="https://public.flourish.studio/resources/embed.js"></script></div>

##### The bar chart race above is an attempt to show the progression of total cases in the world.


In [None]:
fig, ax = plt.subplots(figsize=(15, 10))
ax.clear()
fig = sns.lineplot(x="Date",y="Active",data = day_wise_covid ,color="y",label="Active")
fig = sns.lineplot(x="Date",y="Recovered",data = day_wise_covid ,color="green",label="Recovered")
fig = sns.lineplot(x="Date",y="Deaths",data = day_wise_covid ,color="r",label="Deaths")
fig = sns.lineplot(x="Date",y="Confirmed",data = day_wise_covid ,color="c",label="Confirmed")
fig.set_xlabel('Date\n',size=15,color='#4bb4f2') 
fig.set_ylabel('Number of Cases\n', 
              size=15,color='#4bb4f2') 
fig.set_title('World Covid-19 Cases',size=25,color='navy')
fig.ticklabel_format(style='plain', axis='y',useOffset=False)


# World growth of COVID19 over time
In this section, I'll look at how COVID19 has been growing throughout the world since 22nd January 2020. I'll be using tree maps to show the share of COVID19 cases worldwide and choropleth maps with a time slider to show the daily impact of the virus.

In [None]:
fig = px.line(world_covid_df,
              x='Date', y='Confirmed', color='Country_Region',
              title='World growth COVID19 Cases')
fig.show()

# Current Share of Worldwide COVID19 Cases

In [None]:
def plot_treemap(col):
    fig = px.treemap(world_data_map, path=["Country_Region"], values=col, height=700,
                 title=col, color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.data[0].textinfo = 'label+text+value'
    fig.show()

In [None]:
plot_treemap('Confirmed')

### Current Share of Worldwide COVID19 Deaths Cases

In [None]:
plot_treemap('Deaths')

# Geospatial Analysis

### Total Confirmed Cases growth

In [None]:
fig = px.choropleth(world_covid_df, locations="Country_Region", 
                    color=np.log(world_covid_df["Confirmed"]),
                    locationmode='country names', hover_name="Country_Region", 
                    animation_frame=world_covid_df["Date"].dt.strftime('%Y-%m-%d'),
                    title='Cases over time', color_continuous_scale=px.colors.sequential.matter)
#fig.update(layout_coloraxis_showscale=False)
fig.show()

### Total Deaths Cases growth

In [None]:
fig = px.choropleth(world_covid_df, locations="Country_Region", 
                    color=np.log(world_covid_df["Deaths"]),
                    locationmode='country names', hover_name="Country_Region", 
                    animation_frame=world_covid_df["Date"].dt.strftime('%Y-%m-%d'),
                    title='Deaths Cases over time', color_continuous_scale=px.colors.sequential.matter)
#fig.update(layout_coloraxis_showscale=False)
fig.show()
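
The two maps above color countries on a logarithmic scale. Days with zero cases need some care, because the logarithm of zero is negative infinity. A minimal sketch on made-up counts, comparing `np.log` with `np.log1p` (which computes log(1 + x)) as one possible alternative:

```python
import numpy as np

# Made-up case counts, including a country-day with zero cases.
counts = np.array([0, 9, 99, 999])

# np.log(0) is -inf; silence the divide-by-zero warning for the demonstration.
with np.errstate(divide="ignore"):
    plain = np.log(counts)

# log1p maps 0 to 0 and stays finite for every non-negative count.
shifted = np.log1p(counts)

print(plain)
print(shifted)
```

For large counts the two scales are almost identical, so the shift mainly affects how the early, near-zero days are colored.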

##### From the above plots:

- China was the first country to experience the onset of the virus.
- The US, India and Italy, which are currently among the worst affected countries, didn't record many cases in January. This shows how fast the virus spreads.
- Western Europe is the worst affected region, so it can be regarded as the new epicenter of COVID19. The USA is also in the reckoning.
- The lockdown seems to have worked in China's favour, as the growth rate has plummeted.

In [None]:
figure = px.choropleth(world_data_map,locations='Country_Region', locationmode='country names', color='Confirmed', hover_name='Country_Region',hover_data=["Active",'Confirmed','Recovered','Deaths'],color_continuous_scale=px.colors.sequential.Viridis,title='World Covid19 Cases')
figure.show()

# COVID19 in top 20 affected countries

### Top 20 countries (Confirmed Cases and Deaths)
I need to find the top 20 affected countries. Since the confirmed cases and fatalities are cumulative sums to date, I'll find the top 20 countries by using the country data of the last date for which data is available.
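
As a sketch of this selection on made-up data: because the counts are cumulative, it is enough to keep the rows of the latest date and take the largest values. The frame `cumulative` is hypothetical, and the example picks the top 2 rather than the top 20 to stay small.

```python
import pandas as pd

# Cumulative confirmed counts for three made-up countries on two dates.
cumulative = pd.DataFrame({
    "Country_Region": ["A", "A", "B", "B", "C", "C"],
    "Date": pd.to_datetime(["2020-09-25", "2020-09-26"] * 3),
    "Confirmed": [90, 100, 40, 300, 10, 20],
})

# Keep only the most recent date, then take the largest cumulative totals.
last_day = cumulative[cumulative["Date"] == cumulative["Date"].max()]
top_2 = last_day.nlargest(2, "Confirmed")
print(top_2)
```

`nlargest` returns the rows already sorted in descending order, so no separate `sort_values` call is needed.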

In [None]:
top_20 = world_data_map.sort_values(by=['Confirmed'], ascending=False).head(20)
### Generate a Barplot
plt.figure(figsize=(12,10))
plot = sns.barplot(x=top_20['Confirmed'], y=top_20['Country_Region'])  # keyword arguments; recent seaborn versions reject positional x/y
for i,(value,name) in enumerate(zip(top_20['Confirmed'],top_20['Country_Region'])):
    plot.text(value,i-0.05,f'{value:,.0f}',size=10)
plt.suptitle('Top 20 countries Confirmed Cases',fontsize = 20)
plt.xlabel("Number of Cases",fontsize = 16)
plt.ylabel("Country",fontsize = 16)
plt.show()

In [None]:
top20_countries = world_covid_df[world_covid_df['Country_Region'].isin(top_20['Country_Region'])]

In [None]:
fig = px.line(top20_countries,
              x='Date', y='Confirmed', color='Country_Region',
              title='Confirmed Cases ')
fig.show()

In [None]:
fig = px.line(top20_countries,
              x='Date', y='Deaths', color='Country_Region',
              title='Deaths Cases ')
fig.show()

#### From the above line plots for the top 20 affected countries:

- For most of these countries, cases and deaths are increasing monotonically (almost exponentially).
- The US and India have shown the greatest rise in the number of confirmed cases and deaths. Russia, on the other hand, has the 3rd highest rise in confirmed cases.
- Cases and deaths for China have stagnated over time.


# Confirmed & Recovered Cases Top 5 Countries

In [None]:
top_5 = world_data_map.sort_values(by=['Confirmed'], ascending=False).head(5)

# Generate a Barplot
plt.figure(figsize=(15,5))
confirmed = sns.barplot(x=top_5['Country_Region'], y=top_5['Confirmed'], color='hotpink', label='Confirmed')
recovered = sns.barplot(x=top_5['Country_Region'], y=top_5['Recovered'], color='c', label='Recovered')

# Add Texts for Barplots
plt.suptitle('Top 5 Countries Confirmed & Recovered Cases',fontsize = 20)
plt.xlabel("Country",fontsize = 16)
plt.ylabel("Number of Cases",fontsize = 16)
plt.legend(loc=1)
plt.show()

# COVID-19 Mortality Rate & Recovered Rate Variation Over Period of Time

### Mortality Rate
##### For any epidemic, one of the most important measures is the mortality rate. In general, it measures the number of deaths in a particular population during a specific interval; in this notebook it is computed as deaths per 100 confirmed cases.
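
A minimal sketch of the rate as it is computed in this notebook (deaths per 100 confirmed cases), on made-up numbers. The helper `rate_per_100` is hypothetical; it returns 0 where no cases have been confirmed yet, so that early dates do not produce NaN or infinite values.

```python
import numpy as np
import pandas as pd

def rate_per_100(numerator, denominator):
    """Percentage of numerator in denominator; 0 where the denominator is 0."""
    # Replace zero denominators with NaN so the division yields NaN instead of inf.
    ratio = 100 * numerator / denominator.replace(0, np.nan)
    return ratio.round(2).fillna(0)

# Made-up cumulative counts for three days.
df = pd.DataFrame({"Confirmed": [0, 200, 1000], "Deaths": [0, 5, 31]})
df["Mortality Rate(per 100)"] = rate_per_100(df["Deaths"], df["Confirmed"])
print(df)
```

The same helper would work for the recovered rate by passing the recovered counts as the numerator.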


In [None]:
def plot_daywise_line(col, hue):
    fig = px.line(day_wise_covid, x="Date", y=col, width=700, color_discrete_sequence=[hue])
    fig.update_layout(title=col, xaxis_title="", yaxis_title="")
    fig.show()

In [None]:
plot_daywise_line('Mortality Rate(per 100)', dth)

### Recovered Rate
#### It measures the number of recoveries in a particular population during a specific interval; in this notebook it is computed as recoveries per 100 confirmed cases.

In [None]:
plot_daywise_line('Recovered Rate(per 100)', rec)

In [None]:
plot_daywise_line('Deaths/Recovered(per 100)', cnf)

In [None]:
fig = px.scatter(world_data_map.sort_values('Deaths', ascending=False).iloc[:20, :], 
                 x='Confirmed', y='Deaths', color='Country_Region', size='Confirmed', 
                 height=700, text='Country_Region', log_x=True, log_y=True, 
                 title='Deaths vs Confirmed (Scale in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

In [None]:
fig = px.scatter(world_data_map.sort_values('Deaths', ascending=False).iloc[:20, :], 
                 x='Recovered', y='Deaths', color='Country_Region', size='Recovered', 
                 height=700, text='Country_Region', log_x=True, log_y=True, 
                 title='Recovered vs Deaths (Scale in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

In [None]:
fig = px.scatter(world_data_map.sort_values('Recovered', ascending=False).iloc[:20, :], 
                 x='Confirmed', y='Recovered', color='Country_Region', size='Confirmed', 
                 height=700, text='Country_Region', log_x=True, log_y=True, 
                 title='Recovered vs Confirmed (Scale in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

# Global Prediction

## Prediction Curve for Global Confirmed Cases(30 Days) 

In [None]:
data = pd.DataFrame(columns = ['ds','y'])
data['ds'] = day_wise_covid['Date']
#dates
data['y'] = day_wise_covid['Confirmed']
#data

prop=Prophet()
prop.fit(data)
future=prop.make_future_dataframe(periods=30)
prop_forecast=prop.predict(future)
forecast = prop_forecast[['ds','yhat']].tail(30)

fig = plot_plotly(prop, prop_forecast)
fig = prop.plot(prop_forecast,xlabel='Date',ylabel='Confirmed Cases')
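
Prophet expects its input as a two-column frame named `ds` (dates) and `y` (values). The sketch below only illustrates that data preparation and the 30-day horizon with pandas, on made-up numbers; it does not fit a model, and the frame `day_wise` is a hypothetical stand-in for `day_wise_covid`.

```python
import pandas as pd

# Made-up cumulative series standing in for the day-wise world totals.
day_wise = pd.DataFrame({
    "Date": pd.date_range("2020-09-01", periods=5, freq="D"),
    "Confirmed": [100, 120, 150, 190, 240],
})

# Rename to the column names Prophet looks for and keep only those two columns.
prophet_input = day_wise.rename(columns={"Date": "ds", "Confirmed": "y"})[["ds", "y"]]

# The 30 future dates that a 30-day forecast covers: the days right after the last observation.
horizon = pd.date_range(
    prophet_input["ds"].max() + pd.Timedelta(days=1), periods=30, freq="D"
)
print(prophet_input)
print(horizon[0], horizon[-1])
```

With the default settings, `make_future_dataframe(periods=30)` returns the historical dates followed by these 30 future dates, which is why the cell above takes `tail(30)` to isolate the forecast.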

## Prediction Curve for Global Death Cases(30 Days)

In [None]:
data = pd.DataFrame(columns = ['ds','y'])
data['ds'] = day_wise_covid['Date']
#dates
data['y'] = day_wise_covid['Deaths']
#data

prop=Prophet()
prop.fit(data)
future=prop.make_future_dataframe(periods=30)
prop_forecast=prop.predict(future)
forecast = prop_forecast[['ds','yhat']].tail(30)

fig = plot_plotly(prop, prop_forecast)
fig = prop.plot(prop_forecast,xlabel='Date',ylabel='Death Cases')

# INDIA vs USA COVID19

In [None]:
usa_df_data = world_covid_df[world_covid_df['Country_Region'] == 'USA'].copy()  # copy to avoid SettingWithCopyWarning when adding columns
#world_covid_df 
#usa_df_data
usa_df_data.reset_index(drop=True,inplace=True)

usa_df_data["Mortality Rate(per 100)"] = np.round(100*usa_df_data["Deaths"]/usa_df_data["Confirmed"],2)
usa_df_data["Recovered Rate(per 100)"] = np.round(100*usa_df_data["Recovered"]/usa_df_data["Confirmed"],2)
usa_df_data["Incidence Rate(per 100)"] = np.round(100*usa_df_data["New Cases"]/usa_df_data["Confirmed"],2)

usa_df_data['Mortality Rate(per 100)'] = usa_df_data['Mortality Rate(per 100)'].replace(np.nan, 0)
usa_df_data['Incidence Rate(per 100)'] = usa_df_data['Incidence Rate(per 100)'].replace(np.nan, 0)
usa_df_data['Recovered Rate(per 100)'] = usa_df_data['Recovered Rate(per 100)'].replace(np.nan, 0)

#usa_df_data.tail(5)

In [None]:
india_df_data = world_covid_df[world_covid_df['Country_Region'] == 'India'].copy()  # copy to avoid SettingWithCopyWarning when adding columns
#world_covid_df 
#india_df_data
india_df_data.reset_index(drop=True,inplace=True)

india_df_data["Mortality Rate(per 100)"] = np.round(100*india_df_data["Deaths"]/india_df_data["Confirmed"],2)
india_df_data["Recovered Rate(per 100)"] = np.round(100*india_df_data["Recovered"]/india_df_data["Confirmed"],2)
india_df_data["Incidence Rate(per 100)"] = np.round(100*india_df_data["New Cases"]/india_df_data["Confirmed"],2)

india_df_data['Mortality Rate(per 100)'] = india_df_data['Mortality Rate(per 100)'].replace(np.nan, 0)
india_df_data['Incidence Rate(per 100)'] = india_df_data['Incidence Rate(per 100)'].replace(np.nan, 0)
india_df_data['Recovered Rate(per 100)'] = india_df_data['Recovered Rate(per 100)'].replace(np.nan, 0)


#india_df_data.isnull().sum()
#india_df_data.tail(5)

In [None]:
temp = [usa_df_data,india_df_data]
temp = pd.concat(temp)
temp_new = temp.melt(id_vars=['Country_Region','Date'], value_vars=['Confirmed','Recovered','Deaths','Active'],var_name='Case', value_name='Count')
#temp_new

In [None]:
data1 = usa_df_data.tail(1)
data2 = india_df_data.tail(1)
temp4 = [data1,data2]
temp4 = pd.concat(temp4)
temp_new4 = temp4.melt(id_vars=['Country_Region'], value_vars=["Confirmed","Recovered","Active","Deaths"],var_name='Case', value_name='Count')
#temp_new4



# India vs USA Reported Cases till Date

In [None]:
plt.figure(figsize=(10,8))
sns.barplot(x = "Case",y="Count",data=temp_new4,hue="Country_Region",ci=None);


## India vs USA growth of COVID19 over time

### Confirmed Cases

In [None]:
data_df1 = temp_new[temp_new["Case"]=='Confirmed']
fig = px.line(data_df1, x="Date", y="Count", color='Country_Region', 
                  height=600, title='Confirmed', 
                  color_discrete_sequence = ["black","lightgrey"])
fig.update_layout(title="India vs USA Covid-19 Confirmed Cases", xaxis_title="", yaxis_title="")
fig.show()

### Active Cases

In [None]:
data_df1 = temp_new[temp_new["Case"]=='Active']
fig = px.line(data_df1, x="Date", y="Count", color='Country_Region', 
                  height=600, 
                  color_discrete_sequence = ["olive","darkseagreen"])
fig.update_layout(title="India vs USA Covid-19 Active Cases", xaxis_title="", yaxis_title="")
fig.show()

### Recovered Cases

In [None]:
data_df1 = temp_new[temp_new["Case"]=='Recovered']
fig = px.line(data_df1, x="Date", y="Count", color='Country_Region', 
                  height=600, 
                  color_discrete_sequence = ["green","yellowgreen"])
fig.update_layout(title="India vs USA Covid-19 Recovered Cases", xaxis_title="", yaxis_title="")
fig.show()

### Deaths Cases

In [None]:
data_df1 = temp_new[temp_new["Case"]=='Deaths']
fig = px.line(data_df1, x="Date", y="Count", color='Country_Region', 
                  height=600, 
                  color_discrete_sequence = ["red","orange"])
fig.update_layout(title="India vs USA Covid-19 Deaths Cases", xaxis_title="", yaxis_title="")
fig.show()

## Daily Cases Wise

In [None]:
temp2 = [usa_df_data,india_df_data]
temp2 = pd.concat(temp2)
temp_new2 = temp2.melt(id_vars=['Country_Region','Date'], value_vars=['New Cases','New Deaths'],var_name='New Case', value_name='Count')
#temp_new2

### New Confirmed Cases

In [None]:
data_df1 = temp_new2[temp_new2["New Case"]=='New Cases']
fig = px.line(data_df1, x="Date", y="Count", color='Country_Region', 
                  height=600,
                  color_discrete_sequence = ["black","grey"])
fig.update_layout(title="India vs USA Covid-19 Daily New Confirmed Cases", xaxis_title="", yaxis_title="")
fig.show()

### New Deaths Cases

In [None]:
df1 = temp_new2[temp_new2["New Case"]=='New Deaths']
fig = px.line(df1, x="Date", y="Count", color='Country_Region', 
                  height=600,
                  color_discrete_sequence = ["brown","lightcoral"])
fig.update_layout(title="India vs USA Covid-19 Daily New Deaths Cases", xaxis_title="", yaxis_title="")
fig.show()

# India vs USA Covid-19 Mortality Rate (per 100)

In [None]:
temp3 = [usa_df_data,india_df_data]
temp3 = pd.concat(temp3)
temp_new3 = temp3.melt(id_vars=['Country_Region','Date'], value_vars=["Mortality Rate(per 100)","Recovered Rate(per 100)","Incidence Rate(per 100)"],var_name='Cases Rates', value_name='Count')
#temp_new3

In [None]:
df1 = temp_new3[temp_new3["Cases Rates"]=='Mortality Rate(per 100)']
fig = px.line(df1, x="Date", y="Count", color='Country_Region', 
                  height=600, 
                  color_discrete_sequence = ["deeppink","crimson"])
fig.update_layout(title="India vs USA Covid-19 Mortality Rate (per 100)", xaxis_title="", yaxis_title="")
fig.show()

## India vs USA Covid-19 Recovered Rate (per 100)

In [None]:
df1 = temp_new3[temp_new3["Cases Rates"]=='Recovered Rate(per 100)']
fig = px.line(df1, x="Date", y="Count", color='Country_Region', 
                  height=600, 
                  color_discrete_sequence = ["forestgreen","darkseagreen"])
fig.update_layout(title="India vs USA Covid-19 Recovered Rate (per 100)", xaxis_title="", yaxis_title="")
fig.show()

## India vs USA Covid-19 Incidence Rate (per 100)

In [None]:
df1 = temp_new3[temp_new3["Cases Rates"]=='Incidence Rate(per 100)']
fig = px.line(df1, x="Date", y="Count", color='Country_Region', 
                  height=600, 
                  color_discrete_sequence = ["lightseagreen","blue"])
fig.update_layout(title="India vs USA Covid-19 Incidence Rate (per 100)", xaxis_title="", yaxis_title="")
fig.show()


# INDIA Covid19

In [None]:
#import dataset
india_df = pd.read_csv('../input/covid19-in-india/covid_19_india.csv')

#Drop columns that are not needed
india_df.drop(['ConfirmedIndianNational', 'ConfirmedForeignNational','Sno'], inplace=True, axis=1)

#Rename Cured to Recovered
india_df.rename(columns = {'Cured':'Recovered'}, inplace = True) 

#india_df.tail()

In [None]:
#Date format change
india_df['Date'] = pd.to_datetime(india_df['Date'],
                                  format='%d/%m/%y').dt.date
   

In [None]:
#Add Active Cases
india_df['Active']=india_df['Confirmed']-india_df['Recovered']-india_df['Deaths']
india_df['Active'] = india_df['Active'].astype(int)

In [None]:
#Rename State/UnionTerritory to Province_State
india_df.rename(columns = {'State/UnionTerritory':'Province_State'}, inplace = True) 

#india_df

In [None]:
#Add New Cases
india_df['Prev_Confirmed'] = india_df.groupby('Province_State')['Confirmed'].shift(1)
india_df['New Cases'] = india_df['Confirmed'] - india_df['Prev_Confirmed']
india_df.drop('Prev_Confirmed',inplace = True,axis=1)
india_df['New Cases'].fillna(0, inplace=True)

#Add New Recovered Cases
india_df['Prev_Recovered'] = india_df.groupby('Province_State')['Recovered'].shift(1)
india_df['New Recovered'] = india_df['Recovered'] - india_df['Prev_Recovered']
india_df.drop('Prev_Recovered',inplace = True,axis=1)
india_df['New Recovered'].fillna(0, inplace=True)

#Add New Deaths Cases
india_df['Prev_Deaths'] = india_df.groupby('Province_State')['Deaths'].shift(1)
india_df['New Deaths'] = india_df['Deaths'] - india_df['Prev_Deaths']
india_df.drop('Prev_Deaths',inplace = True,axis=1)
india_df['New Deaths'].fillna(0, inplace=True)

#Type Change into integer
india_df['New Deaths'] = india_df['New Deaths'].astype(int)
india_df['New Cases'] = india_df['New Cases'].astype(int)
india_df['New Recovered'] = india_df['New Recovered'].astype(int)

#Add Deaths/Recovered, Mortality & Recovered Rate(per 100)
india_df["Mortality Rate(per 100)"] = np.round(100*india_df["Deaths"]/india_df["Confirmed"],2)
india_df["Recovered Rate(per 100)"] = np.round(100*india_df["Recovered"]/india_df["Confirmed"],2)
india_df["Deaths/Recovered(per 100)"] = np.round(100*india_df["Deaths"]/india_df["Recovered"],2)
#india_df.isnull().sum()

In [None]:
#Remove Nan Value
india_df['Mortality Rate(per 100)'] = india_df['Mortality Rate(per 100)'].replace(np.nan, 0)
india_df['Recovered Rate(per 100)'] =india_df['Recovered Rate(per 100)'].replace(np.nan, 0)
india_df['Deaths/Recovered(per 100)'] = india_df['Deaths/Recovered(per 100)'].replace(np.nan, 0)
india_df.isnull().sum()


In [None]:
#Drop Unknown &  Unassigned
df_deffer = india_df['Province_State']=='Unknown' 
india_df.drop(india_df[df_deffer].index,inplace=True)
df_deffer = india_df['Province_State']=='Unassigned' 
india_df.drop(india_df[df_deffer].index,inplace=True)
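
The two drops above remove placeholder rows that do not belong to a real state. As a sketch on made-up rows, the same filtering can be written in one step with a boolean mask and `isin`; the frame `states` is hypothetical.

```python
import pandas as pd

# Made-up state rows, including the two placeholder labels.
states = pd.DataFrame({
    "Province_State": ["Kerala", "Unknown", "Goa", "Unassigned"],
    "Confirmed": [5, 1, 2, 3],
})

# Keep every row whose label is not one of the placeholders.
placeholders = ["Unknown", "Unassigned"]
cleaned = states[~states["Province_State"].isin(placeholders)].reset_index(drop=True)
print(cleaned)
```

Listing the labels in one place makes it easy to add further placeholders later without repeating the drop logic.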


In [None]:
india_data_json = requests.get('https://api.rootnet.in/covid19-in/unofficial/covid19india.org/statewise').json()
df_india_latest = pd.json_normalize(india_data_json['data']['statewise'])  # top-level function since pandas 1.0
df_deffer = df_india_latest['state']=='State Unassigned' 
df_india_latest.drop(df_india_latest[df_deffer].index,inplace=True)

df_india_latest = df_india_latest.set_index("state")
df_india_latest["Mortality Rate(per 100)"]= np.round(np.nan_to_num(100*df_india_latest["deaths"]/df_india_latest["confirmed"]),2)
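
To illustrate what the flattening step does without calling the API, here is a sketch on a hand-written payload. The nesting (`data` then `statewise`) and the key names mirror those used in the cell above, but the payload itself and its values are made up.

```python
import pandas as pd

# Hand-written stand-in for the API response (values are made up).
payload = {
    "data": {
        "statewise": [
            {"state": "Kerala", "confirmed": 100, "recovered": 80, "deaths": 2, "active": 18},
            {"state": "Goa", "confirmed": 50, "recovered": 45, "deaths": 1, "active": 4},
        ]
    }
}

# One row per list entry, one column per key.
statewise = pd.json_normalize(payload["data"]["statewise"]).set_index("state")
statewise["Mortality Rate(per 100)"] = (100 * statewise["deaths"] / statewise["confirmed"]).round(2)
print(statewise)
```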

## Reported Cases till Date

In [None]:
total = df_india_latest.sum()
total.name = "Total"
df_t = pd.DataFrame(total,dtype=float).transpose()
df_t["Mortality Rate(per 100)"] = np.round(100*df_t["deaths"]/df_t["confirmed"],2)
df_t.style.background_gradient(cmap='Purples',axis=1).format("{:.2f}").format("{:.0f}",subset=["confirmed","deaths","recovered","active"])

In [None]:
#df_india_latest

In [None]:
#Remove State Unassigned state
#df_deffer = df_india_latest['state']=='State Unassigned' 
#df_india_latest.drop(df_india_latest[df_deffer].index,inplace=True)
#df_india_days

## Statewise Insights

In [None]:
df_india_latest["Mortality Rate(per 100)"]= np.round(np.nan_to_num(100*df_india_latest["deaths"]/df_india_latest["confirmed"]),2)

df_india_latest.style.background_gradient(cmap='Blues',subset=["confirmed"])\
                        .background_gradient(cmap='Reds',subset=["deaths"])\
                        .background_gradient(cmap='Greens',subset=["recovered"])\
                        .background_gradient(cmap='Purples',subset=["active"])\
                        .background_gradient(cmap='YlOrBr',subset=["Mortality Rate(per 100)"])\
                        .format("{:.2f}",subset=["Mortality Rate(per 100)"])

In [None]:
india_df['Date'] = pd.to_datetime(india_df['Date'],format='%Y-%m-%d')
india_df.reset_index(drop=True,inplace=True) 
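#Sum only the numeric columns; text columns such as Province_State cannot be summed meaningfully
df_india_days = india_df.groupby('Date').sum(numeric_only=True).reset_index()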
#india_df.head()

In [None]:
#df_india_days

## Growth of COVID-19 Cases in India

### Overall Cases

In [None]:
fig, ax = plt.subplots(figsize=(15, 10))
#Draw every series on the same Axes instead of rebinding `fig`
#sns.lineplot(x="date",y="new_cases",data=df_india,color="m",label="New Cases",ax=ax)
sns.lineplot(x="Date",y="Active",data=df_india_days,color="y",label="Active",ax=ax)
sns.lineplot(x="Date",y="Recovered",data=df_india_days,color="green",label="Recovered",ax=ax)
sns.lineplot(x="Date",y="Deaths",data=df_india_days,color="r",label="Deaths",ax=ax)
sns.lineplot(x="Date",y="Confirmed",data=df_india_days,color="c",label="Confirmed",ax=ax)
ax.set_xlabel('Date\n',size=15,color='#4bb4f2')
ax.set_ylabel('Number of Cases\n',size=15,color='#4bb4f2')
ax.set_title('India Covid-19 Cases',size=25,color='navy')
#Show plain numbers on the y-axis rather than scientific notation
ax.ticklabel_format(style='plain', axis='y',useOffset=False)

In [None]:
df1 = df_india_days.melt(id_vars='Date', value_vars=['New Cases','New Deaths','New Recovered'], 
                 var_name='Cases', value_name='Cases Count')
fig = px.line(df1, x="Date", y="Cases Count",color='Cases')
fig.update_layout(title="India Covid-19 New Cases", xaxis_title="", yaxis_title="")
fig.show()
#df_india_days


# State Wise

In [None]:
fig = px.line(india_df,x='Date', y='Confirmed', color='Province_State',title='India growth COVID19 Cases ')
fig.show()
#india_df.head()

In [None]:
#df_india_latest.head()

# Current Share of India COVID19 Cases

In [None]:
def plot_treemap(col):
    fig = px.treemap(df_india_latest, path=["state"], values=col, height=700,
                 title=col, color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.data[0].textinfo = 'label+text+value'
    fig.show()

In [None]:
df_india_latest.reset_index(inplace=True) 
#df_india_latest.head()

In [None]:
plot_treemap("confirmed")

In [None]:
plot_treemap("recovered")

In [None]:
r = requests.get(url='https://raw.githubusercontent.com/geohacker/india/master/state/india_telengana.geojson')
geojson = r.json()

def change_state_name(state):
    if state == 'Odisha':
        return 'Orissa'
    elif state == 'Telengana':
        return 'Telangana'
    return state

In [None]:
#Rename states so they match the names used in the GeoJSON
df_india_latest['state'] = df_india_latest['state'].map(change_state_name)
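
A choropleth silently leaves a region blank when its name in the data does not match any feature name in the GeoJSON, so it can help to list the mismatches before plotting. The sketch below assumes that feature names are stored under `properties.NAME_1`, the key this notebook passes as `featureidkey`. The function name `unmatched_states` and the miniature GeoJSON are hypothetical and serve only as illustration.

```python
def unmatched_states(state_names, geojson, name_key="NAME_1"):
    """Return the state names that have no feature of the same name in the GeoJSON."""
    feature_names = {
        feature["properties"][name_key] for feature in geojson["features"]
    }
    return sorted(set(state_names) - feature_names)

# Hypothetical miniature GeoJSON with three features and no geometry.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME_1": "Orissa"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME_1": "Telangana"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME_1": "Kerala"}, "geometry": None},
    ],
}

# 'Odisha' is reported because the toy GeoJSON only knows the older name 'Orissa'.
missing = unmatched_states(["Odisha", "Telangana", "Kerala"], toy_geojson)
print(missing)
```

With the real data, the call would be along the lines of `unmatched_states(df_india_latest['state'], geojson)`. Any name it returns is a candidate for another rule in `change_state_name`.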


In [None]:
#df_india_latest.head()

# Top 10 States

In [None]:
def plot_hbar(df, col, n, hover_data=None):  #None avoids a shared mutable default list
    fig = px.bar(df.sort_values(col).tail(n), 
                 x=col, y="state", color='state',  
                 text=col, orientation='h', width=700, hover_data=hover_data,
                 color_discrete_sequence = px.colors.qualitative.Dark24)
    fig.update_layout(title=col, xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()




### Bar Plot Analysis

In [None]:
plot_hbar(df_india_latest, 'confirmed', 10)
#df_india_latest

In [None]:
plot_hbar(df_india_latest, 'recovered', 10)

In [None]:
plot_hbar(df_india_latest, 'deaths', 10)

# Geospatial Analysis

In [None]:
#columns = ['state', 'confirmed', 'recovered','deaths','active']
#df_india_latest = df_india_latest[columns]
df_india_latest.sort_values('confirmed',inplace=True, ascending=False)
df_india_latest.reset_index(drop=True,inplace=True)

df_india_latest.style.background_gradient(cmap='viridis')

fig = px.choropleth(df_india_latest, geojson=geojson, color="confirmed",
                    locations="state", featureidkey="properties.NAME_1",
                    hover_data=["active",'recovered','deaths'],
                    color_continuous_scale=px.colors.sequential.Viridis,
                    title='India: Total Current cases per state'
                   )
fig.update_geos(fitbounds="locations", visible=True)
fig.update_geos(projection_type="orthographic")
fig.update_layout(height=600,margin={"r":0,"t":30,"l":0,"b":30})
fig.show()


In [None]:
df_india_latest["Recovered_Rate(per 100)"] = np.round(100*df_india_latest["recovered"]/df_india_latest["confirmed"],2)
df_india_latest["Deaths/Recovered(per 100)"] = np.round(100*df_india_latest["deaths"]/df_india_latest["recovered"],2)

In [None]:
#df_india_latest.reset_index(inplace=True)
#df_india_latest

# India Confirmed vs Deaths (Scale in log10)

In [None]:
fig = px.scatter(df_india_latest.sort_values('deaths', ascending=False).iloc[:20, :], 
                 x='confirmed', y='deaths', color='state', size='confirmed', 
                 height=700, text='state', log_x=True, log_y=True, 
                 title='Confirmed vs Deaths (Scale in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

## India Confirmed vs Recovered (Scale in log10)

In [None]:
fig = px.scatter(df_india_latest.sort_values('recovered', ascending=False).iloc[:20, :], 
                 x='confirmed', y='recovered', color='state', size='confirmed', 
                 height=700, text='state', log_x=True, log_y=True, 
                 title='Confirmed vs Recovered (Scale in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

## India Recovered vs Deaths (Scale in log10)

In [None]:
fig = px.scatter(df_india_latest.sort_values('deaths', ascending=False).iloc[:20, :], 
                 x='recovered', y='deaths', color='state', size='recovered', 
                 height=700, text='state', log_x=True, log_y=True, 
                 title='Recovered vs Deaths (Scale in log10)')
fig.update_traces(textposition='top center')
fig.update_layout(showlegend=False)
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

# Prediction

## Prediction Curve for India Confirmed Cases (30 Days)

In [None]:
#Prophet expects the columns 'ds' (dates) and 'y' (values to forecast)
data = df_india_days[['Date','Confirmed']].rename(columns={'Date':'ds','Confirmed':'y'})

prop=Prophet()
prop.fit(data)
future=prop.make_future_dataframe(periods=30)
prop_forecast=prop.predict(future)
forecast = prop_forecast[['ds','yhat']].tail(30)

fig = plot_plotly(prop, prop_forecast)
fig = prop.plot(prop_forecast,xlabel='Date',ylabel='Confirmed Cases (lakhs)')

## Prediction Curve for India Death Cases (30 Days)

In [None]:
#Prophet expects the columns 'ds' (dates) and 'y' (values to forecast)
data = df_india_days[['Date','Deaths']].rename(columns={'Date':'ds','Deaths':'y'})

prop=Prophet()
prop.fit(data)
future=prop.make_future_dataframe(periods=30)
prop_forecast=prop.predict(future)
forecast = prop_forecast[['ds','yhat']].tail(30)

fig = plot_plotly(prop, prop_forecast)
fig = prop.plot(prop_forecast,xlabel='Date',ylabel='Death Cases')
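
The forecasts above are fitted on the full history, so nothing here indicates how far off a 30-day prediction tends to be. One common check, not performed in this notebook, is to hold out the most recent 30 days, fit on the rest, and compare the predicted `yhat` values with the actual counts. The sketch below shows only the comparison step, using the mean absolute percentage error (MAPE). The function name `mape` and the numbers are illustrative assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, skipping points where the actual value is 0."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is 0")
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Made-up held-out counts and the forecasts for the same days.
actual = [100, 200, 400]
predicted = [110, 180, 400]

# Individual errors are 10%, 10% and 0%, so the mean is about 6.67%.
score = mape(actual, predicted)
print(f"MAPE: {score:.2f}%")
```

With Prophet, this would amount to fitting on `data.iloc[:-30]`, predicting 30 periods ahead, and passing `data['y'].tail(30)` together with the last 30 `yhat` values to `mape`.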

## Connect with me! 🌐

[<img target="_blank" src="https://img.icons8.com/bubbles/100/000000/linkedin.png" title="LinkedIn" align="left">](https://www.linkedin.com/in/rajabhijeet22/)
[<img target="_blank" src="https://img.icons8.com/bubbles/100/000000/github.png" title="Github"  align="left">](https://github.com/abhijeetraj22)[<img target="_blank" src="https://img.icons8.com/bubbles/100/000000/instagram-new.png" title="Instagram"  align="left">](https://www.instagram.com/abhijeet_raj_/?hl=en)[<img target="_blank" src="https://img.icons8.com/bubbles/100/000000/twitter.png" title="Twitter"  align="left">](https://twitter.com/abhijeet_raj_/)

## Feedback 
- Your feedback is much appreciated
- Comment if you have any doubts or find any errors in the notebook
- Please UPVOTE if you LIKE this notebook
