# COVID-19 DEATH RATE INDIA

This Jupyter notebook tests the parts of the program that assemble data
from the `Ministry of Health & Family Welfare India` into a COVID-19 spread summary.

STANDARDS
 - Date_Format  `DD-MM-YYYY`
 
VERSION 2.0
 -  22 MAR 2020
 - Source site format changed

VERSION 1.0
 - 19 MAR 2020
 - Base program
 - Column values hardcoded due to heterogeneous data

In [1]:
# libraries for the soup
import pandas as pd     # creating structured data
import requests         # fetching the page containing data
import re               # dissecting data
from bs4 import BeautifulSoup   # parsing the page to ease data extraction

In [2]:
# checking connections
url = "https://www.mohfw.gov.in/"
try:
    response = requests.get(url, timeout=30)   # timeout keeps the cell from hanging forever
except requests.exceptions.RequestException:
    print("Err Establishing Connection, Check connectivity.")
    raise SystemExit(1)
print("Checking connections ...", response)
if response.status_code == 200:   # compare the status code directly instead of searching the repr
    print("CONN OK")
else:
    print("CONN ERR\n EXIT")
    raise SystemExit(1)
# parsing retrieved html with beautiful soup
crude_data = BeautifulSoup(response.text, 'html.parser')

Checking connections ... <Response [200]>
CONN OK


### META INFO *snapshot*
![ref img: /log/screenshots](data_src_22_03_2020.png)

### STATE-WISE INFORMATION *snapshot*
![ref img: /log/screenshots](data_src_22_03_2020_2.png)

## 1. Extracting META INFO

>Each `META_DATA` set contains:
- Total number of passengers screened at airport
- Total number of Active COVID 2019 cases across India
- Total number of Discharged/Cured COVID 2019 cases across India
- Total number of Migrated COVID-19 Patient
- Total number of Deaths due to COVID 2019 across India
- remarks, *example* `(*including foreign nationals, as on 19.03.2020 at 05:00 PM)`
    - (self-created column)
    - stored exactly as it appears on the source page
    - parsed to extract date and time information

Each such set of data is an `observation`.

In [21]:
# Dictionary for storing set of observations
observation = dict()

In [23]:
# Extracting information of the META_DATA set
# Block last inspected date 22-03-2020
block = crude_data.find_all('div', attrs = {'class': 'iblock_text'})

for row in block:
    val = row.find('span').text
    col = row.find('div').text
    observation[col.strip()] = [ int(val.replace(',', '')), ]   # Indian number system uses commas to separate lakhs and thousands
#print(observation)

{'Passengers screened at airport': [1490303], 'Active COVID 2019 cases *': [295], 'Cured/discharged cases': [23], 'Death cases': [5], 'Migrated COVID-19 Patient': [1]}
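
The comma handling in the cell above can be isolated into a small helper that also rejects unexpected text. This is a minimal sketch, assuming the counts on the page are always non-negative whole numbers; `parse_indian_number` is a hypothetical name and not part of the original program.

```python
def parse_indian_number(text: str) -> int:
    """Convert a count such as '14,90,303' (lakh/thousand grouping) to an int."""
    # Hypothetical helper: removes surrounding whitespace and every comma,
    # so Indian ('14,90,303') and Western ('1,490,303') grouping both work.
    cleaned = text.strip().replace(",", "")
    if not cleaned.isdigit():
        raise ValueError(f"not a plain count: {text!r}")
    return int(cleaned)


print(parse_indian_number("14,90,303"))   # 1490303
print(parse_indian_number(" 295 "))       # 295
```

Raising on non-numeric text makes a layout change on the source page fail loudly instead of storing a wrong value.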


In [30]:
# Extracting date, time, remarks
remark = crude_data.find('div', attrs = {'class': 'content newtab'}).find('p').text
meta_date = re.findall("[0-3][0-9][.][0-1][0-9][.]202[0-9]", remark)    # DD.MM.YYYY, dot-separated as on the source page
meta_time = re.findall("[0-9][0-9]:[0-9][0-9] [AP]M", remark)              # 12-hours HH:MM AM/PM
observation['date'] = meta_date
observation['time'] = meta_time
observation['remark'] = [remark]
#print(observation)

{'Passengers screened at airport': [1490303], 'Active COVID 2019 cases *': [295], 'Cured/discharged cases': [23], 'Death cases': [5], 'Migrated COVID-19 Patient': [1], 'date': ['22.03.2020'], 'time': ['11:45 AM'], 'remark': ['(*including foreign nationals, as on 22.03.2020 at 11:45 AM)']}
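
The date is stored with dots (`22.03.2020`), while the STANDARDS section names `DD-MM-YYYY`. The sketch below shows one way to turn a remark into a `datetime` and then format it per that standard. It assumes the remark always contains the phrase `<date> at <time>` and that `%p` resolves `AM`/`PM` in the current locale; `REMARK_PATTERN` and `parse_remark` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed shape of the remark: "... as on DD.MM.YYYY at HH:MM AM/PM)"
REMARK_PATTERN = re.compile(
    r"(?P<date>\d{2}\.\d{2}\.\d{4}) at (?P<time>\d{2}:\d{2} [AP]M)"
)


def parse_remark(remark: str) -> datetime:
    """Return the timestamp embedded in a source-page remark."""
    match = REMARK_PATTERN.search(remark)
    if match is None:
        raise ValueError(f"no date/time found in remark: {remark!r}")
    stamp = f"{match['date']} {match['time']}"
    return datetime.strptime(stamp, "%d.%m.%Y %I:%M %p")


stamp = parse_remark("(*including foreign nationals, as on 22.03.2020 at 11:45 AM)")
print(stamp.strftime("%d-%m-%Y"))   # 22-03-2020
print(stamp.strftime("%H:%M"))      # 11:45
```

A real `datetime` also sorts chronologically, which the `DD.MM.YYYY` strings do not.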


In [31]:
# Loading past observations
file_meta = "covid_meta.csv"
try:
    df_meta = pd.read_csv(file_meta)
except FileNotFoundError:
    print("File 'covid_meta.csv' not found. CREATING")
    df_meta = pd.DataFrame()

In [32]:
# Appending new observation with old
df_tmp = pd.DataFrame(observation)
df_meta = pd.concat([df_meta, df_tmp], sort=False)   # DataFrame.append was removed in pandas 2.0
df_meta.to_csv(file_meta, index = False)    # Writing to file

In [None]:
print("META FILE: OK")
print("PREVIEW")

In [33]:
# Preview information

df = pd.read_csv(file_meta)
df.set_index(['date', 'time'], inplace= True)
df.tail()


| date | time | remark | Total number of passengers screened at airport | Total number of Active COVID 2019 cases across India * | Total number of Discharged/Cured COVID 2019 cases across India * | Total number of Migrated COVID-19 Patient * | Total number of Deaths due to COVID 2019 across India * | Passengers screened at airport | Active COVID 2019 cases * | Cured/discharged cases | Death cases | Migrated COVID-19 Patient |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20.03.2020 | 09:00 AM | (*including foreign nationals, as on 20.03.202... | 1431734.0 | 171.0 | 19.0 | 1.0 | 4.0 | NaN | NaN | NaN | NaN | NaN |
| 20.03.2020 | 05:00 PM | (*including foreign nationals, as on 20.03.202... | 1459993.0 | 196.0 | 22.0 | 1.0 | 4.0 | NaN | NaN | NaN | NaN | NaN |
| 21.03.2020 | 09:00 AM | (*including foreign nationals, as on 21.03.202... | 1459993.0 | 231.0 | 22.0 | 1.0 | 4.0 | NaN | NaN | NaN | NaN | NaN |
| 21.03.2020 | 04:45 PM | (*including foreign nationals, as on 21.03.202... | 1490303.0 | 256.0 | 22.0 | 1.0 | 4.0 | NaN | NaN | NaN | NaN | NaN |
| 22.03.2020 | 11:45 AM | (*including foreign nationals, as on 22.03.202... | NaN | NaN | NaN | NaN | NaN | 1490303.0 | 295.0 | 23.0 | 5.0 | 1.0 |
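
The preview shows the effect of the source format change noted under VERSION 2.0: the observation of 22.03.2020 landed in five new columns and left the older columns empty. One way to keep a single set of columns is to rename the scraped labels before the DataFrame is built. This is a minimal sketch. The `LABEL_ALIASES` mapping and the `harmonize` helper are hypothetical, and pairing each new label with an old one is an assumption based only on the column names in the preview.

```python
# Assumed correspondence between the labels scraped on 22.03.2020 (keys)
# and the column names already present in covid_meta.csv (values).
LABEL_ALIASES = {
    "Passengers screened at airport":
        "Total number of passengers screened at airport",
    "Active COVID 2019 cases *":
        "Total number of Active COVID 2019 cases across India *",
    "Cured/discharged cases":
        "Total number of Discharged/Cured COVID 2019 cases across India *",
    "Death cases":
        "Total number of Deaths due to COVID 2019 across India *",
    "Migrated COVID-19 Patient":
        "Total number of Migrated COVID-19 Patient *",
}


def harmonize(observation: dict) -> dict:
    """Return a copy of `observation` with known labels renamed.

    Keys without an alias (date, time, remark) pass through unchanged.
    """
    return {LABEL_ALIASES.get(key, key): value for key, value in observation.items()}


new_style = {"Death cases": [5], "Migrated COVID-19 Patient": [1], "date": ["22.03.2020"]}
print(harmonize(new_style))
```

With such a helper, `pd.DataFrame(harmonize(observation))` could stand in for `pd.DataFrame(observation)` in the append cell.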


## 2. Extracting state wise information

`observations` is a list of `observation` dictionaries, each built from the following columns of the source table:

1. S.No. 
2. Name of State / UT
3. Total Confirmed cases (Indian National)
4. Total Confirmed cases ( Foreign National )
5. Cured/Discharged/Migrated
6. Death

In [37]:
rows = crude_data.find('div', attrs = {'class': 'content newtab'}).find_all('tr')
rows[1]

<tr>
<td align="'centre" width="47">1</td>
<td align="'centre" valign="bottom" width="83">Andhra Pradesh</td>
<td align="'centre" valign="bottom" width="91">3</td>
<td align="'centre" valign="top" width="90">0</td>
<td align="'centre" valign="top" width="83">0</td>
<td align="'centre" valign="top" width="83">0</td>
</tr>

In [38]:
# Extracting each observation and appending to observations
rows = crude_data.find('div', attrs = {'class': 'content newtab'}).find_all('tr')
observations = []
for row in rows[1:-1]:    # index 0 is the header row; the last row holds the summed totals
    observation = {}
    values = row.text.strip('\n')
    values = values.replace(",", '')
    values_list  = values.split('\n')
    observation['date'] = meta_date[0]
    observation['time'] = meta_time[0]
    observation['Name of State / UT'] = str(values_list[1])
    observation['Total Confirmed cases (Indian National)'] = int(values_list[2])
    observation['Total Confirmed cases ( Foreign National )'] = int(values_list[3])
    observation['Cured/Discharged/Migrated'] = int(values_list[4])
    observation['Death'] = int(values_list[5])
    observations.append(observation)
#print(observation)

{'date': '22.03.2020', 'time': '11:45 AM', 'Name of State / UT': 'West Bengal', 'Total Confirmed cases (Indian National)': 4, 'Total Confirmed cases ( Foreign National )': 0, 'Cured/Discharged/Migrated': 0, 'Death': 0}
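
Splitting `row.text` on newlines depends on the exact whitespace between the `<td>` tags. A sturdier variant works on the list of cell texts, which could be collected with `[td.get_text(strip=True) for td in row.find_all('td')]`. The sketch below covers only the conversion step. It assumes every state row has exactly six cells in the order listed above; `STATE_COLUMNS` and `row_to_observation` are hypothetical names.

```python
STATE_COLUMNS = [
    "Name of State / UT",
    "Total Confirmed cases (Indian National)",
    "Total Confirmed cases ( Foreign National )",
    "Cured/Discharged/Migrated",
    "Death",
]


def row_to_observation(cells, date, time):
    """Build one state-wise observation from the six cell texts of a table row."""
    if len(cells) != 6:
        raise ValueError(f"expected 6 cells, got {len(cells)}: {cells!r}")
    # The first cell is the serial number, which is not stored.
    _serial, name, *counts = (cell.strip() for cell in cells)
    observation = {"date": date, "time": time, STATE_COLUMNS[0]: name}
    for column, count in zip(STATE_COLUMNS[1:], counts):
        observation[column] = int(count.replace(",", ""))
    return observation


print(row_to_observation(["1", "Andhra Pradesh", "3", "0", "0", "0"],
                         "22.03.2020", "11:45 AM"))
```

Checking the cell count first turns a changed table layout into an explicit error rather than an `IndexError` or a silently shifted column.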


The summed (total) row of the table is skipped here; day-wise totals can be derived from the state rows when performing data analysis.
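
Since the table's total row is not stored, totals have to be recomputed from the state rows. The sketch below sums the count columns per `(date, time)` snapshot over a list of `observation` dictionaries. `COUNT_COLUMNS` and `daily_totals` are hypothetical names, and the sample input holds only two rows copied from the notebook's preview, not the full table.

```python
from collections import defaultdict

COUNT_COLUMNS = [
    "Total Confirmed cases (Indian National)",
    "Total Confirmed cases ( Foreign National )",
    "Cured/Discharged/Migrated",
    "Death",
]


def daily_totals(observations):
    """Sum the count columns over all states for each (date, time) snapshot."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_COLUMNS, 0))
    for obs in observations:
        key = (obs["date"], obs["time"])
        for column in COUNT_COLUMNS:
            totals[key][column] += obs[column]
    return dict(totals)


sample = [
    {"date": "22.03.2020", "time": "11:45 AM", "Name of State / UT": "Uttar Pradesh",
     "Total Confirmed cases (Indian National)": 24,
     "Total Confirmed cases ( Foreign National )": 1,
     "Cured/Discharged/Migrated": 9, "Death": 0},
    {"date": "22.03.2020", "time": "11:45 AM", "Name of State / UT": "West Bengal",
     "Total Confirmed cases (Indian National)": 4,
     "Total Confirmed cases ( Foreign National )": 0,
     "Cured/Discharged/Migrated": 0, "Death": 0},
]
print(daily_totals(sample))
```

For these two rows the snapshot totals are 28 Indian nationals, 1 foreign national, 9 cured/discharged/migrated and 0 deaths.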

In [42]:
# Loading past observations
file_data = "covid.csv"
try:
    df = pd.read_csv(file_data)
except FileNotFoundError:
    print("File 'covid.csv' not found. CREATING")
    df = pd.DataFrame()

In [43]:
# Appending new observation with old
df_tmp = pd.DataFrame(observations)
df = pd.concat([df, df_tmp], sort=False)   # DataFrame.append was removed in pandas 2.0
df.to_csv(file_data, index = False)    # Writing to file

In [44]:
print("DATA : OK")

DATA : OK


In [45]:
# Preview information

df = pd.read_csv(file_data)
df.set_index(['date', 'time'], inplace= True)
df.tail()



| date | time | Cured/Discharged/Migrated | Death | Name of State / UT | Total Confirmed cases ( Foreign National ) | Total Confirmed cases (Indian National) |
|---|---|---|---|---|---|---|
| 22.03.2020 | 11:45 AM | 0 | 0 | Jammu and Kashmir | 0 | 4 |
| 22.03.2020 | 11:45 AM | 0 | 0 | Ladakh | 0 | 13 |
| 22.03.2020 | 11:45 AM | 9 | 0 | Uttar Pradesh | 1 | 24 |
| 22.03.2020 | 11:45 AM | 0 | 0 | Uttarakhand | 0 | 3 |
| 22.03.2020 | 11:45 AM | 0 | 0 | West Bengal | 0 | 4 |
