## **COVID INFO AND ANALYSIS**

## Overview 
*(Source: WHO)*

Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus.

Most people infected with the virus will experience mild to moderate respiratory illness and recover without requiring special treatment. However, some will become seriously ill and require medical attention. Older people and those with underlying medical conditions like cardiovascular disease, diabetes, chronic respiratory disease, or cancer are more likely to develop serious illness. Anyone can get sick with COVID-19 and become seriously ill or die at any age. 

The best way to prevent and slow down transmission is to be well informed about the disease and how the virus spreads. Protect yourself and others from infection by staying at least 1 metre apart from others, wearing a properly fitted mask, and washing your hands or using an alcohol-based rub frequently. Get vaccinated when it’s your turn and follow local guidance.

The virus can spread from an infected person’s mouth or nose in small liquid particles when they cough, sneeze, speak, sing or breathe. These particles range from larger respiratory droplets to smaller aerosols. It is important to practice respiratory etiquette, for example by coughing into a flexed elbow, and to stay home and self-isolate until you recover if you feel unwell.

This notebook attempts to show how the virus spread across the world, using case, testing and vaccination data.

# Initialization

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


# Importing Libraries

In [None]:
#Data Processing
import pandas as pd
import numpy as np

#Data Visualisation
import plotly.express as px

#Machine Learning Libraries
import sklearn 
from sklearn import linear_model
from sklearn.utils import shuffle

#Miscellaneous
import os
import warnings
warnings.filterwarnings('ignore')

# Reading DataSet

In [None]:
! aws s3 ls s3://databeanstalk-community/data/covid-19/

In [None]:
def read_data(path,file):
     return pd.read_csv(path+"/"+file)
path = 's3://databeanstalk-community/data/covid-19'
world = read_data(path,'worldometer_data.csv')
india = read_data(path,'covid_19_india.csv')
state = read_data(path,'StatewiseTestingDetails.csv')
daily = read_data(path,'nation_level_daily.csv')
vac_data = read_data(path,'country_vaccinations.csv')
pop_data = read_data(path,'worldometer_data.csv')
vac_manu = read_data(path,'country_vaccinations_by_manufacturer.csv')
state_vac=read_data(path,'covid_vaccine_statewise.csv')
daily_records=read_data(path,'day_wise.csv')
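
Before cleaning, it can help to check how large each loaded frame is and how many values are missing. The following is a minimal sketch of such a check; `summarize_frames` and the toy frames are hypothetical names introduced here for illustration, and the toy values are made up rather than taken from the CSV files.

```python
import pandas as pd


def summarize_frames(frames):
    """Return one row per DataFrame: row count, column count, missing cells."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "dataset": name,
            "rows": df.shape[0],
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("dataset")


# Toy frames standing in for the CSVs read above; the values are made up.
toy_frames = {
    "world": pd.DataFrame({"Country/Region": ["A", "B"], "TotalCases": [10, None]}),
    "daily": pd.DataFrame({"Date": ["2021-01-01"], "Confirmed": [5]}),
}
summary = summarize_frames(toy_frames)
print(summary)
```

In the notebook the same function could be called with the real frames, for example `summarize_frames({'world': world, 'india': india, 'state': state})`.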

# Data Cleaning 
## Making the data easier to work with

In [None]:
# For world Vaccination Dataset
# One frame per country, keeping only the rows that have a daily_vaccinations value
usa_vac = vac_data[vac_data['country'] == 'United States'].dropna(subset=['daily_vaccinations'])
uk_vac = vac_data[vac_data['country'] == 'United Kingdom'].dropna(subset=['daily_vaccinations'])
ger_vac = vac_data[vac_data['country'] == 'Germany'].dropna(subset=['daily_vaccinations'])
ita_vac = vac_data[vac_data['country'] == 'Italy'].dropna(subset=['daily_vaccinations'])
fra_vac = vac_data[vac_data['country'] == 'France'].dropna(subset=['daily_vaccinations'])
chi_vac = vac_data[vac_data['country'] == 'China'].dropna(subset=['daily_vaccinations'])
rus_vac = vac_data[vac_data['country'] == 'Russia'].dropna(subset=['daily_vaccinations'])
isr_vac = vac_data[vac_data['country'] == 'Israel'].dropna(subset=['daily_vaccinations'])
uae_vac = vac_data[vac_data['country'] == 'United Arab Emirates'].dropna(subset=['daily_vaccinations'])
can_vac = vac_data[vac_data['country'] == 'Canada'].dropna(subset=['daily_vaccinations'])
jpn_vac = vac_data[vac_data['country'] == 'Japan'].dropna(subset=['daily_vaccinations'])
ind_vac = vac_data[vac_data['country'] == 'India'].dropna(subset=['daily_vaccinations'])
ino_vac = vac_data[vac_data['country'] == 'Indonesia'].dropna(subset=['daily_vaccinations'])
mal_vac = vac_data[vac_data['country'] == 'Malaysia'].dropna(subset=['daily_vaccinations'])
ban_vac = vac_data[vac_data['country'] == 'Bangladesh'].dropna(subset=['daily_vaccinations'])
nig_vac = vac_data[vac_data['country'] == 'Nigeria'].dropna(subset=['daily_vaccinations'])
phi_vac = vac_data[vac_data['country'] == 'Philippines'].dropna(subset=['daily_vaccinations'])
vie_vac = vac_data[vac_data['country'] == 'Vietnam'].dropna(subset=['daily_vaccinations'])
egy_vac = vac_data[vac_data['country'] == 'Egypt'].dropna(subset=['daily_vaccinations'])
pak_vac = vac_data[vac_data['country'] == 'Pakistan'].dropna(subset=['daily_vaccinations'])
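
The per-country frames above can also be built in one pass and stored in a dictionary keyed by country name. The sketch below illustrates the idea on a toy frame; `split_by_country` and `toy_vac` are hypothetical names, and the values are made up. It also shows that a name absent from the `country` column gives an empty frame without raising an error, so a misspelled country is easy to miss.

```python
import pandas as pd

# Toy stand-in for vac_data; the values are made up.
toy_vac = pd.DataFrame({
    "country": ["India", "India", "Italy", "Italy"],
    "date": ["2021-01-16", "2021-01-17", "2021-01-16", "2021-01-17"],
    "daily_vaccinations": [None, 100.0, 50.0, None],
})


def split_by_country(df, countries):
    """Map each requested country to its rows that have a daily_vaccinations value."""
    cleaned = df.dropna(subset=["daily_vaccinations"])
    return {name: cleaned[cleaned["country"] == name] for name in countries}


vac_by_country = split_by_country(toy_vac, ["India", "Italy", "Philippines"])
for name, rows in vac_by_country.items():
    print(name, len(rows))
```

A quick `len(rows)` check like the one in the loop is a cheap way to spot countries that matched no rows.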


In [None]:
#For Indian Vaccination Dataset
df1=state
df3=india
df2=state_vac
df2 = df2.rename(columns= {'Updated On':'Date','Total Doses Administered':'TotalDoses','Male(Individuals Vaccinated)':'Male','Female(Individuals Vaccinated)':'Female', 'Total Individuals Vaccinated':'TotalVaccinated',' Covaxin (Doses Administered)':'Covaxin','CoviShield (Doses Administered)':'CoviShield','Sputnik V (Doses Administered)':'Sputnik'})
df2.Date = pd.to_datetime(df2.Date, format="%d/%m/%Y")
# Drop the national total so that only states and union territories remain
df2 = df2[df2['State'] !='India']
# Vaccination snapshot for 9 August 2021
df2_2=df2[df2['Date']=="2021-08-09"]
# Case snapshot for 11 August 2021
df2_1 = df3[df3['Date']=='2021-08-11']
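
The explicit `format="%d/%m/%Y"` in the cell above implies that the `Updated On` column stores day-first dates. The short sketch below, using two made-up date strings, shows why the format matters: the same text parses to a different day when it is read month-first.

```python
import pandas as pd

# Two made-up date strings that are valid under both readings.
raw_dates = pd.Series(["09/08/2021", "11/08/2021"])

day_first = pd.to_datetime(raw_dates, format="%d/%m/%Y")
month_first = pd.to_datetime(raw_dates, format="%m/%d/%Y")

print(day_first.dt.strftime("%Y-%m-%d").tolist())
print(month_first.dt.strftime("%Y-%m-%d").tolist())
```

Neither call raises an error, so a wrong format would only show up later as filters such as `df2['Date']=="2021-08-09"` returning unexpected rows.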

# Data Visualization

## World Dataset

In [None]:
features=['TotalCases','TotalDeaths','TotalRecovered','ActiveCases']
for i in features:
    fig=px.treemap(world.iloc[0:25],
               values=i,
               path=['Country/Region'],
               template='plotly_dark',
               title="Tree Map depicting Impact of Covid-19 w.r.t {}".format(i))
    fig.show()

In [None]:
px.line(daily_records,
       x='Date',
       y=['Confirmed', 'Deaths', 'Recovered','Active'],
       template='plotly_dark',
       title='Daily trends of Covid-19 cases',
       labels={'Date':'Month','value':'Statistics'})

In [None]:
fig2=px.bar(world.iloc[0:20][::-1],
      y='Country/Region',
      x=['TotalCases','TotalRecovered', 'ActiveCases','TotalDeaths','Serious,Critical'],
      template='plotly_dark',
      title='Severely Hit Countries')
fig2.update_xaxes(tickangle=270)
fig2.show()

In [None]:
px.pie(world.iloc[0:20],
       names='Country/Region',
       values='TotalCases',
      template='plotly_dark',
      title='Distribution of Total Cases ')

## Indian Dataset

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

grouped_data=read_data(path,'full_grouped.csv')
grouped_data.head()

def country_visualisations(df,country):
    data_group=df[df['Country/Region']==country]
    data=data_group.loc[:,['Date','Confirmed','Deaths','Recovered','Active']]
    figure1=make_subplots(rows=1,cols=4,subplot_titles=('Confirmed','Active','Recovered','Deaths'))
    
    figure1.add_trace(go.Scatter(name='Confirmed',x=data['Date'],y=data['Confirmed']),row=1,col=1)
    
    figure1.add_trace(go.Scatter(name='Active',x=data['Date'],y=data['Active']),row=1,col=2)
        
    figure1.add_trace(go.Scatter(name='Recovered',x=data['Date'],y=data['Recovered']),row=1,col=3)
    
    figure1.add_trace(go.Scatter(name='Deaths',x=data['Date'],y=data['Deaths']),row=1,col=4)
    
    figure1.update_layout(height=500,
                         width=2000,
                         title_text='Recorded Cases in {}'.format(country),
                         template='plotly_dark')
    figure1.show()

In [None]:
country_visualisations(grouped_data,'India')
country_visualisations(grouped_data,'Peru')

## World Vaccination Situation 

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=usa_vac['date'], y=usa_vac['daily_vaccinations'], mode='lines+markers', name='USA'))
fig.add_trace(go.Scatter(x=uk_vac['date'],y=uk_vac['daily_vaccinations'], mode='lines+markers', name='UK'))
fig.add_trace(go.Scatter(x=ger_vac['date'],y=ger_vac['daily_vaccinations'], mode='lines+markers', name='Germany'))
fig.add_trace(go.Scatter(x=ind_vac['date'],y=ind_vac['daily_vaccinations'], mode='lines+markers', name='India'))
fig.update_layout(title='Comparison of Daily Vaccinations' , template='plotly_dark' )
fig.show()

## Indian Situation

### Testing Situation In India

In [None]:
state.Date = pd.to_datetime(state.Date, format="%Y/%m/%d")
fig = px.line(state, x='Date', y='TotalSamples', color='State', title='Total number of samples collected for Covid-19 testing(Statewise)' , template="plotly_dark")

fig.show()

### Result of Testing


In [None]:
fig = px.bar(df2_1, x='State/UnionTerritory', y=['Confirmed','Cured','Deaths'], template="plotly_dark")
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.show()

### Indian Vaccination Status

In [None]:
fig = px.bar(df2_2, x='State', y='TotalDoses',title='Total Doses (Jan-Aug)',template="plotly_dark")
fig.update_traces(textposition='outside')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_xaxes(tickfont=dict(size=14))
fig.update_yaxes(tickfont=dict(size=14))
fig.show()

In [None]:
fig = px.bar(df2_2, x="State", y=["Covaxin", "CoviShield",'TotalDoses'], template="plotly_dark")

fig.update_layout(barmode='stack',legend_orientation="h",legend=dict(x= 0.3, y=1.0),
                  xaxis={'categoryorder':'total descending'},
                 title_text='<b>Covid-19 Total Vaccinations in India according to type of vaccine</b>',
                  title_x=0.5,
                  width= 1100,
                  height= 500
                 )
fig.update_xaxes(tickfont=dict(size=14))
fig.update_yaxes(tickfont=dict(size=14))
fig.show()
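
The chart above stacks `TotalDoses` on top of `Covaxin` and `CoviShield`. If `TotalDoses` already includes the doses of each vaccine, which the column names suggest but the notebook does not confirm, each bar ends up roughly twice as tall as the state's real total. A minimal sketch of one alternative is to reshape only the per-vaccine columns into long format before plotting. The frame `toy_doses` and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df2_2; the values are made up.
toy_doses = pd.DataFrame({
    "State": ["Kerala", "Goa"],
    "Covaxin": [10, 2],
    "CoviShield": [90, 8],
    "TotalDoses": [100, 10],
})

# One row per (State, Vaccine) pair, leaving TotalDoses out of the stack.
long_doses = toy_doses.melt(
    id_vars="State",
    value_vars=["Covaxin", "CoviShield"],
    var_name="Vaccine",
    value_name="Doses",
)

# Height each stacked bar would have: the sum of the per-vaccine doses.
stacked_height = long_doses.groupby("State")["Doses"].sum()
print(stacked_height)
```

The long frame can then be passed to `px.bar` with `x='State'`, `y='Doses'` and `color='Vaccine'`, which stacks the vaccines by default.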

# Machine Learning Part

## Attempt At Linear Regression 

In [None]:
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
# Leave Maharashtra out of the regression data
df2_1=df2_1[df2_1['State/UnionTerritory']!='Maharashtra']
states_clubbed=df2_1[["Confirmed","Cured","Deaths"]]
predict="Deaths"
# Features: Confirmed and Cured; target: Deaths
X=np.array(states_clubbed.drop(columns=predict))
y=np.array(states_clubbed[predict])
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.25)
linear = LinearRegression()
linear.fit(X_train,y_train)
Y_pred = linear.predict(X_test)
# R^2 on the held-out states, then on the training states
print(linear.score(X_test, y_test))
print(linear.score(X_train,y_train))
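
Because `train_test_split` shuffles the rows at random, the two scores printed above change from run to run, and with only a few dozen states the change can be large. The sketch below shows one way to make the split repeatable with `random_state` and to report the mean absolute error next to R². It runs on synthetic data: the `demo` frame is generated from made-up numbers and is not the state-level snapshot.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the state-level snapshot; all numbers are made up.
rng = np.random.default_rng(0)
confirmed = rng.integers(10_000, 1_000_000, size=40)
deaths = (confirmed * 0.013 + rng.normal(0, 200, size=40)).round().clip(min=0)
cured = confirmed - deaths - rng.integers(0, 5_000, size=40)
demo = pd.DataFrame({"Confirmed": confirmed, "Cured": cured, "Deaths": deaths})

X_demo = demo[["Confirmed", "Cured"]]
y_demo = demo["Deaths"]

# A fixed random_state gives the same train/test rows on every run.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

demo_r2 = r2_score(y_te, pred)
demo_mae = mean_absolute_error(y_te, pred)
print("R^2 on held-out rows:", demo_r2)
print("Mean absolute error on held-out rows:", demo_mae)
```

The mean absolute error is in the same unit as the target (deaths), which makes it easier to judge than R² alone.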


In [None]:
fig = px.scatter(df2_1, x='Cured', y='Deaths', template="plotly_dark")
fig.show()