# COVID-19 DEATH RATE INDIA

This Jupyter notebook tests the parts of the program that assemble the data
published by the `Ministry of Health & Family Welfare India` into a COVID-19 spread summary.

STANDARDS
 - Date_Format  `DD-MM-YYYY`
 
VERSION 3.0
 - 31 MAR 2020
 - Source site format changed
 - Date format changed
 
VERSION 2.0
 - 22 MAR 2020
 - Source site format changed

VERSION 1.0
 - 19 MAR 2020
 - Base program
 - Column values hardcoded due to heterogeneous data

In [7]:
# libraries for the soup
import pandas as pd     # creating structured data
import requests         # fetching the page containing data
import re               # dissecting data
from bs4 import BeautifulSoup   # parsing the page to ease data extraction

In [8]:
# checking connections
url = "https://www.mohfw.gov.in/"
try:
    response = requests.get(url, timeout=30)    # timeout so a stalled request cannot hang the cell
    print("Checking connections ...", response)
except requests.exceptions.RequestException:
    print("Err Establishing Connection, Check connectivity.")
    raise SystemExit(1)
if response.status_code == 200:    # compare the status code directly instead of searching the repr
    print("CONN OK")
else:
    print("CONN ERR\n EXIT")
    raise SystemExit(1)
# parsing retrieved html with beautiful soup
crude_data = BeautifulSoup(response.text, 'html.parser')

Checking connections ... <Response [200]>
CONN OK


### META INFO *snapshot*
![ref img: /log/screenshots](data_src_22_03_2020.png)

### STATE-WISE INFORMATION *snapshot*
![ref img: /log/screenshots](data_src_22_03_2020_2.png)

## 1. Extracting META INFO

>Each `META_DATA` set contains:
- Total number of passengers screened at airport
- Total number of Active COVID 2019 cases across India
- Total number of Discharged/Cured COVID 2019 cases across India
- Total number of Migrated COVID-19 Patient
- Total number of Deaths due to COVID 2019 across India
- remarks, *example* `(*including foreign nationals, as on 19.03.2020 at 05:00 PM)`
    - (self-created column)
    - data as it appears on the source page
    - this is parsed to extract date and time information

Each set of the above data is an `observation`.

In [9]:
# Dictionary for storing set of observations
observation = dict()

In [10]:
# Extracting date, time, remarks
remark = crude_data.find('div', attrs = {'class': 'status-update'}).find('span').text
meta_date = re.findall("[0-3][0-9] [a-zA-Z]* 202[0-9]", remark)    # matches 'DD Month YYYY', e.g. '31 March 2020'
meta_time = re.findall(" [0-9][0-9]:[0-9][0-9] ", remark)              # 24-hour HH:MM; the surrounding spaces stay in the match
observation['date'] = meta_date
observation['time'] = meta_time
observation['remark'] = [remark]
print(observation)

{'date': ['31 March 2020'], 'time': [' 20:30 '], 'remark': ['as on : 31 March 2020, 20:30 GMT+5:30']}
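
The extracted values (`'31 March 2020'`, `' 20:30 '`) do not yet follow the `DD-MM-YYYY` standard declared at the top, and older rows use `29.03.2020` with a 12-hour time. A minimal sketch of one way to normalise them, assuming only the layouts that appear in this notebook; `normalize_date`, `normalize_time` and `KNOWN_DATE_FORMATS` are hypothetical helpers, not part of the original program:

```python
from datetime import datetime

# Date layouts seen in this notebook; extend the tuple if the site changes again.
KNOWN_DATE_FORMATS = ("%d %B %Y", "%d.%m.%Y", "%d-%m-%Y")


def normalize_date(raw):
    """Return the date as DD-MM-YYYY, or None if no known layout matches."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue
    return None


def normalize_time(raw):
    """Return the time as 24-hour HH:MM, accepting '20:30' or '07:30 PM'."""
    text = raw.strip()
    for fmt in ("%H:%M", "%I:%M %p"):
        try:
            return datetime.strptime(text, fmt).strftime("%H:%M")
        except ValueError:
            continue
    return None


print(normalize_date("31 March 2020"))   # 31-03-2020
print(normalize_date("29.03.2020"))      # 29-03-2020
print(normalize_time(" 20:30 "))         # 20:30
print(normalize_time("07:30 PM"))        # 19:30
```

Month names and AM/PM markers are parsed with the interpreter's current locale, so this assumes the default English locale.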


In [11]:
# Extracting information of the META_DATA set
# Block last inspected on 22-03-2020
block = crude_data.find('div', attrs = {'class': 'site-stats-count'}).find_all('li')

for row in block[:-1]:
    val = row.find('strong').text
    col = row.find('span').text
    try:
        observation[col.strip()] = [ int(val.replace(',', '')), ]   # Indian number system uses <comma> as separator for lakh, thousands
    except ValueError :
        observation[col.strip()] = [ val, ]   # keep the raw text when the value is not a plain integer

print(observation)

{'date': ['31 March 2020'], 'time': [' 20:30 '], 'remark': ['as on : 31 March 2020, 20:30 GMT+5:30'], 'Active Cases': [1238], 'Cured / Discharged': [123], 'Deaths': [35], 'Migrated': [1]}
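
The `int(val.replace(',', ''))` step handles comma-grouped counts but falls back to raw text when a footnote mark is attached, as in the `1397#` value that appears later in this notebook. A small sketch of a more tolerant parser, assuming every count is a non-negative integer; `parse_count` is a hypothetical helper:

```python
import re


def parse_count(raw):
    """Convert a displayed count such as '15,24,266' or '1397#' to an int.

    Returns None when the text contains no digits at all.
    """
    digits = re.sub(r"[^0-9]", "", raw)   # drop commas, footnote marks and whitespace
    return int(digits) if digits else None


print(parse_count("15,24,266"))  # 1524266
print(parse_count("1397#"))      # 1397
print(parse_count(""))           # None
```

Because every non-digit character is discarded, this would also mangle decimals or negative numbers, so it only suits plain case counts.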


In [46]:
# Loading past observations
file_meta = "covid_meta.csv"
try:
    df_meta = pd.read_csv(file_meta)
except FileNotFoundError:
    print("File 'covid_meta.csv' not found. CREATING")
    df_meta = pd.DataFrame()

In [47]:
# Appending new observation with old
df_tmp = pd.DataFrame(observation)
df_meta = pd.concat([df_meta, df_tmp], sort=False)    # DataFrame.append was removed in pandas 2.0
df_meta.to_csv(file_meta, index = False)    # Writing to file

In [8]:
print("META FILE: OK")
print("PREVIEW")

META FILE: OK
PREVIEW


In [48]:
# Preview information

df = pd.read_csv(file_meta)
df.set_index(['date', 'time'], inplace= True)
df.tail()


| date | time | remark | Total number of passengers screened at airport | Total number of Active COVID 2019 cases across India * | Total number of Discharged/Cured COVID 2019 cases across India * | Total number of Migrated COVID-19 Patient * | Total number of Deaths due to COVID 2019 across India * | Passengers screened at airport | Active COVID 2019 cases * | Cured/discharged cases | Death cases | Migrated COVID-19 Patient | Active COVID 2019 cases # | Active Cases | Cured / Discharged | Deaths | Migrated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29.03.2020 | 07:30 PM | (*Including 48 foreign Nationals, as on 29.03.... | NaN | NaN | NaN | NaN | NaN | 1524266.0 | 901.0 | 95.0 | 27.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 30.03.2020 | 10:30 AM | (*Including 49 foreign Nationals, as on 30.03.... | NaN | NaN | NaN | NaN | NaN | 1524266.0 | 942.0 | 99.0 | 29.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 30.03.2020 | 09:30 PM | (*Including 49 foreign Nationals, as on 30.03.... | NaN | NaN | NaN | NaN | NaN | 1524266.0 | NaN | 101.0 | 32.0 | 1.0 | 1117.0 | NaN | NaN | NaN | NaN |
| 30.03.2020 | 09:30 PM | (*Including 49 foreign Nationals, as on 30.03.... | NaN | NaN | NaN | NaN | NaN | 1524266.0 | NaN | 101.0 | 32.0 | 1.0 | 1117.0 | NaN | NaN | NaN | NaN |
| 31 March 2020 | 20:30 | as on : 31 March 2020, 20:30 GMT+5:30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1238.0 | 123.0 | 35.0 | 1.0 |
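
The preview shows the same measure spread over several columns because the site's labels changed between versions. One possible way to keep a single column per measure is to rename the labels before storing an observation. This is only a sketch: the short names on the right and the `canonicalize` helper are hypothetical, and it assumes the `*` and `#` variants of a label describe the same quantity.

```python
# Labels as they appear in the preview, mapped to hypothetical short names.
CANONICAL_COLUMNS = {
    "Total number of passengers screened at airport": "passengers_screened",
    "Passengers screened at airport": "passengers_screened",
    "Total number of Active COVID 2019 cases across India *": "active",
    "Active COVID 2019 cases *": "active",
    "Active COVID 2019 cases #": "active",
    "Active Cases": "active",
    "Total number of Discharged/Cured COVID 2019 cases across India *": "cured",
    "Cured/discharged cases": "cured",
    "Cured / Discharged": "cured",
    "Total number of Migrated COVID-19 Patient *": "migrated",
    "Migrated COVID-19 Patient": "migrated",
    "Migrated": "migrated",
    "Total number of Deaths due to COVID 2019 across India *": "deaths",
    "Death cases": "deaths",
    "Deaths": "deaths",
}


def canonicalize(observation):
    """Return a copy of the observation with known labels renamed.

    Unknown labels are kept (stripped) so that nothing is silently dropped.
    """
    return {
        CANONICAL_COLUMNS.get(key.strip(), key.strip()): value
        for key, value in observation.items()
    }


old_style = {"Active COVID 2019 cases *": [901], "Death cases": [27]}
new_style = {"Active Cases": [1238], "Deaths": [35]}
print(canonicalize(old_style))  # {'active': [901], 'deaths': [27]}
print(canonicalize(new_style))  # {'active': [1238], 'deaths': [35]}
```

Applying such a mapping before `pd.DataFrame(observation)` would let old and new observations share columns, at the cost of maintaining the mapping whenever the site changes its wording.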


## 2. Extracting state wise information

`observations` is a list of `observation` entries, each holding the following set of information.

1. S.No.
2. Name of State / UT
3. Total Confirmed cases (Indian National)
4. Total Confirmed cases ( Foreign National )
5. Cured/Discharged/Migrated
6. Death

In [12]:
# Extracting each observation and appending to observations
rows = crude_data.find('section', attrs= {'id': 'state-data'}).find('table', attrs = {'class': 'table table-striped'}).find_all('tr')
observations = []
for row in rows[1:-1]:    # index 0 is the header; the last row holds the summed info (total)
    observation = {}
    values = row.text.strip('\n')
    values = values.replace(",", '')
    values_list  = values.split('\n')
    observation['date'] = meta_date[0]
    observation['time'] = meta_time[0]
    observation['Name of State / UT'] = str(values_list[1])
    observation['Total Confirmed cases (Indian National)'] = (values_list[2])
    observation['Cured/Discharged/Migrated'] = (values_list[3])
    observation['Death'] = (values_list[4])
    observations.append(observation)
print(observation)    # shows only the last parsed row; a non-state row here means the slice needs adjusting

{'date': '31 March 2020', 'time': ' 20:30 ', 'Name of State / UT': '1397#', 'Total Confirmed cases (Indian National)': '', 'Cured/Discharged/Migrated': '124', 'Death': ''}
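
The printed observation has `1397#` as the state name and empty counts, which suggests that a totals or footnote row passed the `rows[1:-1]` slice. A hedged sketch of a content-based guard that does not rely on row position; `parse_state_row` is a hypothetical helper, and the cell layout `[S.No., name, confirmed, cured, deaths]` is assumed from the indices used in the loop above:

```python
def parse_state_row(cells):
    """Map one table row (a list of cell strings) to a dict.

    Returns None for rows that do not look like a state entry:
    the name must contain a letter and all three counts must be digits.
    """
    if len(cells) < 5:
        return None
    name = cells[1].strip()
    counts = [c.strip().replace(",", "") for c in cells[2:5]]
    if not any(ch.isalpha() for ch in name):
        return None
    if not all(c.isdigit() for c in counts):
        return None
    confirmed, cured, deaths = (int(c) for c in counts)
    return {
        "Name of State / UT": name,
        "Total Confirmed cases (Indian National)": confirmed,
        "Cured/Discharged/Migrated": cured,
        "Death": deaths,
    }


# The serial number and the first cell of the totals row are placeholders.
print(parse_state_row(["1", "Telengana", "79", "1", "1"]))
print(parse_state_row(["Total", "1397#", "", "124", ""]))   # None
```

Inside the loop, a row would then be appended only when the helper returns a dict, so the result no longer depends on how many trailing rows the page carries.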


Day-wise totals are to be drawn when performing data analysis.

In [13]:
# Loading past observations
file_data = "covid.csv"
try:
    df = pd.read_csv(file_data)
except FileNotFoundError:
    print("File 'covid.csv' not found. CREATING")
    df= pd.DataFrame()

In [14]:
# Appending new observation with old
df_tmp = pd.DataFrame(observations)
df = pd.concat([df, df_tmp], sort=False)    # DataFrame.append was removed in pandas 2.0
df.to_csv(file_data, index = False)    # Writing to file

In [14]:
print("DATA : OK")

DATA : OK


In [15]:
# Preview information

df = pd.read_csv(file_data)
df.set_index(['date', 'time'], inplace= True)
df.tail()



| date | time | Cured/Discharged/Migrated | Death | Name of State / UT | Total Confirmed cases ( Foreign National ) | Total Confirmed cases (Indian National) |
|---|---|---|---|---|---|---|
| 31 March 2020 | 20:30 | 1.0 | 1.0 | Telengana | NaN | 79.0 |
| 31 March 2020 | 20:30 | 2.0 | 0.0 | Uttarakhand | NaN | 7.0 |
| 31 March 2020 | 20:30 | 14.0 | 0.0 | Uttar Pradesh | NaN | 101.0 |
| 31 March 2020 | 20:30 | 0.0 | 2.0 | West Bengal | NaN | 26.0 |
| 31 March 2020 | 20:30 | 124.0 | NaN | 1397# | NaN | NaN |
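
Day-wise totals can be derived from the state-wise rows during analysis instead of being scraped. A minimal sketch with pandas, using only the five rows from the preview above (so the sums are not national totals); dropping rows whose state name contains no letter is an assumption made here to exclude the `1397#` row:

```python
import pandas as pd

# The five rows shown in the preview, as they come back from the CSV.
sample = pd.DataFrame(
    {
        "date": ["31 March 2020"] * 5,
        "Name of State / UT": ["Telengana", "Uttarakhand", "Uttar Pradesh", "West Bengal", "1397#"],
        "Total Confirmed cases (Indian National)": [79.0, 7.0, 101.0, 26.0, None],
        "Cured/Discharged/Migrated": [1.0, 2.0, 14.0, 0.0, 124.0],
        "Death": [1.0, 0.0, 0.0, 2.0, None],
    }
)

count_cols = [
    "Total Confirmed cases (Indian National)",
    "Cured/Discharged/Migrated",
    "Death",
]

# Keep only rows whose name contains a letter, then sum the counts per date.
states = sample[sample["Name of State / UT"].str.contains("[A-Za-z]")]
daily = states.groupby("date")[count_cols].sum()
print(daily)
```

For these rows the sums are 213 confirmed, 17 cured and 3 deaths. Without the filter, the `124.0` from the stray row would be added to the cured total.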
