# CORONA VIRUS DATA ANALYSIS. 

This notebook aims uses publicly available data made available by the datameet group. They scrape data from trustable government sites as mentioned on their github. In addition to this they have also done a great job at scraping data from non governmental sites like twitter , to collect data on the number of non coronavirus deaths that are taking place in India.

## Data source

https://github.com/datameet/covid19/tree/master/data

## API used to fetch data

https://api.github.com <br>
Accept: application/vnd.github.v3+json <br>
GET /repos/:owner/:repo/contents/:path <br>

In [150]:
import requests

In [151]:
help(requests.get)

Help on function get in module requests.api:

get(url, params=None, **kwargs)
    Sends a GET request.
    
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response



In [486]:
response = requests.get('https://api.github.com/repos/datameet/covid19/contents/data',params = {'Accept':'application/vnd.github.v3+json'})

In [487]:
downloads = {file['name']:requests.get(file['download_url']) for file in eval(response.text)}
downloads

{'all_totals.json': <Response [200]>,
 'icmr_testing_status.json': <Response [200]>,
 'mohfw.json': <Response [200]>,
 'non_virus_deaths.json': <Response [200]>,
 'total_confirmed_cases.json': <Response [200]>}

In [488]:
import pandas as pd
from io import StringIO
import numpy as np
import json

In [489]:
mohfw = pd.read_json(StringIO(json.dumps([obj['value'] for obj in json.loads(downloads['mohfw.json'].text)['rows']])),orient = 'records')

icmr_testing = pd.read_json(StringIO(json.dumps([obj['value'] for obj in json.loads(downloads['icmr_testing_status.json'].text)['rows']])),orient = 'records')

non_virus_deaths = pd.read_json(StringIO(json.dumps([obj['value'] for obj in json.loads(downloads['non_virus_deaths.json'].text)['rows']])),orient = 'records')

total_confirmed_cases = pd.read_json(StringIO(json.dumps([{'date':obj['key'][0],'cases':obj['value']} for obj in json.loads(downloads['total_confirmed_cases.json'].text)['rows']])),orient = 'records')

all_totals = pd.read_json(StringIO(json.dumps([{'day':obj['key'][0],'type':obj['key'][1],'value':obj['value']} for obj in json.loads(downloads['all_totals.json'].text)['rows']])),orient = 'records')

In [490]:
all_set_totals = all_totals.set_index('day').pivot(columns = 'type').reset_index()

In [158]:
!pip install plotly

Collecting plotly
[?25l  Downloading https://files.pythonhosted.org/packages/15/90/918bccb0ca60dc6d126d921e2c67126d75949f5da777e6b18c51fb12603d/plotly-4.6.0-py2.py3-none-any.whl (7.1MB)
[K     |████████████████████████████████| 7.2MB 355kB/s eta 0:00:01
Collecting retrying>=1.3.3 (from plotly)
  Downloading https://files.pythonhosted.org/packages/44/ef/beae4b4ef80902f22e3af073397f079c96969c69b2c7d52a57ea9ae61c9d/retrying-1.3.3.tar.gz
Building wheels for collected packages: retrying
  Building wheel for retrying (setup.py) ... [?25ldone
[?25h  Created wheel for retrying: filename=retrying-1.3.3-cp37-none-any.whl size=11429 sha256=b27e970bdfc908c71f1ac6a47a5ff2ed4e66ef8372e685f745192a14416b8d97
  Stored in directory: /home/akash/.cache/pip/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
Successfully built retrying
Installing collected packages: retrying, plotly
Successfully installed plotly-4.6.0 retrying-1.3.3


### What does a graph of the number of cases per day look like in India?

In [491]:
import plotly.graph_objs as go

In [492]:
fig = go.Figure(data = [go.Scatter(name = 'no. of cases',
                              x = total_confirmed_cases['date'],
                              y = total_confirmed_cases['cases'],
                             marker = {'opacity':1,'showscale':False})],
                
               layout = go.Layout(title = 'Total number of cases in India over time.',
                                  xaxis = {'title':'Date'},
                                  yaxis = {'title':'No. of cases'},
                                  plot_bgcolor = 'white',
                                 showlegend = False
                                )
               )
fig.show()

### What does a graph of the growth of cases per day look like in India?

In [493]:
total_confirmed_cases['growth'] = total_confirmed_cases['cases']

In [494]:
for i in range(total_confirmed_cases.shape[0]-1,0,-1):
    total_confirmed_cases['growth'].loc[i] = total_confirmed_cases['growth'].loc[i] - total_confirmed_cases['growth'].loc[i-1] 
    

In [495]:
total_confirmed_cases.head(10)

Unnamed: 0,date,cases,growth
0,2020-01-30 13:33:00+05:30,1,1
1,2020-02-02 10:39:00+05:30,2,1
2,2020-02-03 12:13:00+05:30,3,1
3,2020-03-02 14:28:00+05:30,5,2
4,2020-03-03 19:36:00+05:30,6,1
5,2020-03-10 12:00:00+05:30,47,41
6,2020-03-11 17:30:00+05:30,60,13
7,2020-03-12 11:00:00+05:30,73,13
8,2020-03-12 18:00:00+05:30,74,1
9,2020-03-13 10:00:00+05:30,75,1


In [496]:
fig = go.Figure(data = [go.Scatter(name = 'no. of cases per day',
                              x = total_confirmed_cases['date'],
                              y = total_confirmed_cases['growth'],
                             marker = {'opacity':1,'showscale':False}),
                        go.Scatter(name = 'Moving average',
                              x = total_confirmed_cases['date'],
                              y = total_confirmed_cases['growth'].rolling(7).mean(),
                             marker = {'opacity':1,'showscale':False})
                       ],
                
               layout = go.Layout(title = 'Growth in number of cases in India over per day.',
                                  xaxis = {'title':'Date'},
                                  yaxis = {'title':'Growth in no. of cases'},
                                  plot_bgcolor = 'white',
                                 showlegend = False
                                )
               )
fig.show()

Number of cases are not increasing every day. It fluctuates quite a lot , this however could depend on a lot of things , like number of tests done that day . The general trend is however upwards.

### Total number of non virus deaths

In [497]:
non_virus_deaths.deaths.sum()

137

### Total number of virus deaths

In [498]:
all_set_totals.value.death.iloc[-1:]

68    149
Name: death, dtype: int64

In [499]:
non_virus_modified = pd.get_dummies(non_virus_deaths.explode('reason'),columns = ['reason'],prefix = None)

In [500]:
non_virus_modified.rename(columns=lambda x: x.replace('reason_',''), inplace=True)

In [501]:
non_virus_modified.columns

Index(['_id', '_rev', 'type', 'location', 'district', 'state', 'incident_date',
       'deaths', 'source_date', 'source_link', 'source',
       'Access to care denied', 'Asked to leave after lockdown',
       'Asphyxiation', 'Assault', 'Buried under snow', 'Death',
       'Death by waiting in the sun', 'Dehydration', 'Delay in treatment',
       'Died under mysterious circumstances', 'Drank aftershave',
       'Exhausation', 'Exhaustion', 'Farmer suicide', 'Fear of infection',
       'Fear of lockdown', 'Fear of police after escaping quaratine',
       'Forest fire', 'Front-line medical staff work stress', 'Got sick',
       'Had cancer', 'Hate crime', 'Health deterioration', 'Home qurantine',
       'Icu closed', 'Lack of transportation', 'Lack of treatment',
       'Lack of work & food', 'Lathicharge',
       'Left untreated by doctor due to lockdown', 'Lockdown',
       'Lockdown caused loss of livelihood', 'Lonliness',
       'Lost balance of the bike and hit divider while going ho

In [502]:
reasons = ['Access to care denied', 'Asked to leave after lockdown',
       'Asphyxiation', 'Assault', 'Buried under snow', 'Death',
       'Death by waiting in the sun', 'Dehydration', 'Delay in treatment',
       'Died under mysterious circumstances', 'Drank aftershave',
       'Exhausation', 'Exhaustion', 'Farmer suicide', 'Fear of infection',
       'Fear of lockdown', 'Fear of police after escaping quaratine',
       'Forest fire', 'Front-line medical staff work stress', 'Got sick',
       'Had cancer', 'Hate crime', 'Health deterioration', 'Home qurantine',
       'Icu closed', 'Lack of transportation', 'Lack of treatment',
       'Lack of work & food', 'Lathicharge',
       'Left untreated by doctor due to lockdown', 'Lockdown',
       'Lockdown caused loss of livelihood', 'Lonliness',
       'Lost balance of the bike and hit divider while going home',
       'Medical emergency', 'Medical negligance', 'Migration',
       'No staff around', 'Police beating', 'Police brutality', 'Quarantine',
       'Road accident', 'Road blocked', 'Roadblock', 'Starvation', 'Stigma',
       'Stray dog snatched a newborn', 'Stuck under a pit during migration',
       'Sucide', 'Suicide', 'Suspected patient',
       'Travelled by moped with family',
       'Unclear - financial issues (not connected to covid directly) or isolation',
       'Walking', 'Withdrawal', 'Withdrawal symptoms', '\withdrawal']

for column in reasons:
    non_virus_modified[column] = non_virus_modified[column].mul(non_virus_modified['deaths'])

In [503]:
non_virus_modified.columns[11:]

Index(['Access to care denied', 'Asked to leave after lockdown',
       'Asphyxiation', 'Assault', 'Buried under snow', 'Death',
       'Death by waiting in the sun', 'Dehydration', 'Delay in treatment',
       'Died under mysterious circumstances', 'Drank aftershave',
       'Exhausation', 'Exhaustion', 'Farmer suicide', 'Fear of infection',
       'Fear of lockdown', 'Fear of police after escaping quaratine',
       'Forest fire', 'Front-line medical staff work stress', 'Got sick',
       'Had cancer', 'Hate crime', 'Health deterioration', 'Home qurantine',
       'Icu closed', 'Lack of transportation', 'Lack of treatment',
       'Lack of work & food', 'Lathicharge',
       'Left untreated by doctor due to lockdown', 'Lockdown',
       'Lockdown caused loss of livelihood', 'Lonliness',
       'Lost balance of the bike and hit divider while going home',
       'Medical emergency', 'Medical negligance', 'Migration',
       'No staff around', 'Police beating', 'Police brutality', 'Quar

In [504]:
non_virus_modified['Suicide'] = non_virus_modified['Suicide'].add(non_virus_modified['Sucide'])
non_virus_modified['Withdrawal'] = non_virus_modified['Withdrawal'].add(non_virus_modified['Withdrawal symptoms'])
non_virus_modified['Withdrawal'] = non_virus_modified['Withdrawal'].add(non_virus_modified['\withdrawal'])

In [505]:
non_virus_modified.drop(columns = ['Suicide','Withdrawal symptoms','\withdrawal'],inplace = True)

In [506]:
non_virus_modified[non_virus_modified.columns[11:]].head(5)

Unnamed: 0,Access to care denied,Asked to leave after lockdown,Asphyxiation,Assault,Buried under snow,Death,Death by waiting in the sun,Dehydration,Delay in treatment,Died under mysterious circumstances,...,Starvation,Stigma,Stray dog snatched a newborn,Stuck under a pit during migration,Sucide,Suspected patient,Travelled by moped with family,Unclear - financial issues (not connected to covid directly) or isolation,Walking,Withdrawal
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [507]:
causes_and_deaths = non_virus_modified[non_virus_modified.columns[11:]].sum()

In [508]:
fig = go.Figure(data=[go.Pie(labels = causes_and_deaths.index, values=causes_and_deaths)])
fig.show()

This graph tells us that 15% (37) of the non virus deaths reported in India are due to lockdown or migration. This can prove to be a seperate problem to India where relaxing lockdown policy would become necessary if this trend continues within the next 2 weeks.

### What does the number of positive to testing ratio look like ? 

In [509]:
us_response = requests.get('https://covidtracking.com/api/us/daily.csv')

In [510]:
us_data = pd.read_csv(StringIO(us_response.text))

In [511]:
us_tests = us_data.positive / (us_data.total + us_data.pending + us_data.negative)
india_tests = icmr_testing.confirmed_positive / icmr_testing.samples
india_tests.index = india_tests.index + 9

In [512]:
fig = go.Figure(data = [go.Scatter(name = 'US positive to test ratio',
                              x = list(us_tests.index),
                              y = list(us_tests)[::-1],
                             marker = {'opacity':1,'showscale':False}),
                        go.Scatter(name = 'India positive to test ratio',
                              x = list(india_tests.index),
                              y = india_tests,
                             marker = {'opacity':1,'showscale':False})
                       ],
                
               layout = go.Layout(title = 'US vs India positive to test ratio',
                                  xaxis = {'title':'Date'},
                                  yaxis = {'title':'%Tested positive'},
                                 showlegend = False
                                )
               )
fig.show()