# [Visualizing India's COVID19 vaccination progress](https://www.moad.computer/blog/visualizing-indias-covid19-vaccination-progress)
## Author: [Dr. Rahul Remanan](https://www.linkedin.com/in/rahulremanan/)
## CEO, [Moad Computer](https://moad.computer)

[**Run this in Google Colab**](https://colab.research.google.com/github/MoadComputer/covid19-visualization/blob/main/examples/Visualizing_India_COVID19_vaccination_progress.ipynb)

This interactive dashboard visualizes India's COVID19 vaccination progress. It is built using [Altair](https://altair-viz.github.io/), a declarative visualization library for Python.

The state-wise vaccination statistics are sourced from the [Indian government's Ministry of Health and Family Welfare (MoHFW) website](https://mohfw.gov.in).

This notebook is divided into two parts:


*   Part 01 -- Demo visualization using Altair
*   Part 02 -- Visualizing vaccination statistics of India



In [None]:
DATA_UPDATE_DATE = '12-April-2022'
DATA_SOURCE = 'https://mohfw.gov.in'

# Setup

## Helper functions

In [None]:
#@title
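def colab_mode():
  # True when running inside Google Colab
  try:
    from google import colab
    return True
  except ImportError:
    return False

def kaggle_mode():
  # True when running inside a Kaggle notebook
  try:
    import kaggle_datasets
    return True
  except ImportError:
    return False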

def apply_corrections(input_df):
  # Strip every character that is not a letter or a space (e.g. '*', '#', '&')
  input_df['state'] = input_df['state'].astype(str).str.replace(
    r'[^A-Za-z ]+', '', regex=True)
  # Map known misspellings and legacy UT names to one canonical spelling
  dnhdd = 'Dadra and Nagar Haveli and Daman and Diu'
  corrections = {
    'Karanataka': 'Karnataka',
    'Himanchal Pradesh': 'Himachal Pradesh',
    'Telengana': 'Telangana',
    'Dadra and Nagar Haveli': dnhdd,
    'Dadar Nagar Haveli': dnhdd,
    'Dadra Nagar Haveli': dnhdd,
    'Daman  Diu': dnhdd,  # 'Daman & Diu' after the '&' has been stripped
    'Daman and Diu': dnhdd}
  input_df['state'] = input_df['state'].replace(corrections)
  return input_df

def json_writer(json_input, json_output='output.json'):
  with open(json_output, 'w') as f:
    f.write(json_input)

def custom_tooltips():
  return [{'field' :'properties.state', 
           'type'  :'nominal', 
           'title' : 'State'}, 
          {'field' :'properties.fully_vaccinated_percentage', 
           'type'  :'quantitative',
           'title' :'Fully vaccinated (%)'},
          {'field' :'properties.partly_vaccinated_percentage', 
           'type'  :'quantitative',
           'title' :'At least one dose (%)'},
          {'field' :'properties.dose_1', 
           'type'  :'quantitative',
           'title' :'1st dose administered'},
          {'field' :'properties.dose_2', 
           'type'  :'quantitative',
           'title' :'2nd dose administered'},
          {'field' :'properties.population', 
           'type'  :'quantitative',
           'title' :'Total population'},
          {'field' :'properties.update_date', 
           'type'  :'nominal',
           'title' :'Updated on'},
          {'field' :'properties.data_source', 
           'type'  :'nominal',
           'title' :'Data from'}]
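`apply_corrections` works in two steps: it strips every character outside `[A-Za-z ]`, then maps known misspellings to a canonical name. The sketch below reproduces both steps with the standard `re` module on a few hypothetical raw names. Note that stripping `&` leaves a double space, so a lookup keyed on the literal `'Daman & Diu'` cannot match once the first step has run. The `CANONICAL` dictionary and `normalize_state` helper are illustrative only and are not the notebook's full correction list.

```python
import re

# Illustrative subset of the spelling fixes (hypothetical helper, not the notebook's list)
CANONICAL = {
    'Karanataka': 'Karnataka',
    'Telengana': 'Telangana',
    'Daman  Diu': 'Dadra and Nagar Haveli and Daman and Diu',  # note the double space
}

def normalize_state(raw):
    # Step 1: drop digits, punctuation and symbols such as '*', '#' or '&'
    cleaned = re.sub('[^A-Za-z ]+', '', str(raw))
    # Step 2: map a known variant to its canonical spelling, else keep it as-is
    return CANONICAL.get(cleaned, cleaned)

for raw in ['Karanataka*', 'Telengana#', 'Daman & Diu', 'Kerala']:
    print(repr(raw), '->', repr(normalize_state(raw)))
```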

In [None]:
if colab_mode():
  !python3 -m pip install -q git+https://github.com/altair-viz/altair
  !python3 -m pip install -q geopandas
if kaggle_mode():
  !python3 -m pip install -q vega_datasets
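The install cell relies on `colab_mode()` and `kaggle_mode()`, which try to import a platform-specific module and treat failure as "not on that platform". As a sketch of an alternative, `importlib.util.find_spec` checks whether a module can be found without running the module itself. `module_available` is a hypothetical helper name, not part of the notebook.

```python
import importlib.util

def module_available(name):
    # find_spec locates a module without importing it (for a dotted name, only
    # the parent package is imported). It returns None when the module is missing,
    # and raises ModuleNotFoundError when the parent package itself is missing.
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False

# The same probes the notebook uses; results depend on where this runs
print('Colab:', module_available('google.colab'))
print('Kaggle:', module_available('kaggle_datasets'))
```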

# Import libraries

In [None]:
import re, json, geopandas, altair as alt, numpy as np, pandas as pd
from tqdm import tqdm
from vega_datasets import data

# Part 01 -- [Demo visualization of US airport locations](https://github.com/altair-viz/altair/blob/master/altair/examples/airports.py)
This example uses Altair to show how statistics can be interactively superimposed on the corresponding geographical data.

It is adapted from the official airports example in the Altair library.

To minimize external data dependencies, it uses the airport locations and the US map bundled with the `vega_datasets` package.

In [None]:
airports = data.airports.url
states = alt.topo_feature(data.us_10m.url, feature='states')

# The state-wise map of the US as background
background = alt.Chart(states).mark_geoshape(
    fill='lightblue',
    stroke='white'
).properties(
    width=500,
    height=300
).project('albersUsa')

# Overlay airport counts on the background
points = alt.Chart(airports).transform_aggregate(
    latitude='mean(latitude)',
    longitude='mean(longitude)',
    count='count()',
    groupby=['state']
).mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.Size('count:Q', title='Number of Airports'),
    color=alt.value('red'),
    tooltip=['state:N', 'count:Q']
).properties(
    title='Number of airports in each US state'
)

background + points
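The `transform_aggregate` call reduces the airport table to one row per state. The mean latitude and longitude position each circle, and the row count sets its size. The equivalent aggregation in pandas, run on a few hypothetical airport rows with made-up coordinates, shows what the chart actually plots:

```python
import pandas as pd

# Hypothetical airport rows; the real chart reads the vega_datasets airports table
airports_df = pd.DataFrame({
    'state': ['CA', 'CA', 'NV'],
    'latitude': [34.0, 38.0, 36.0],
    'longitude': [-118.0, -122.0, -115.0],
})

# Mirrors transform_aggregate(latitude='mean(latitude)', longitude='mean(longitude)',
#                             count='count()', groupby=['state'])
per_state = airports_df.groupby('state', as_index=False).agg(
    latitude=('latitude', 'mean'),
    longitude=('longitude', 'mean'),
    count=('latitude', 'size'),
)
print(per_state)
```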

# Part 02 -- Visualizing COVID19 vaccination progress in India

This interactive dashboard visualizes India's progress in COVID19 vaccinations. It is built using Altair and [GeoPandas](https://geopandas.org/en/stable/).

The state-wise vaccination statistics are sourced from the [Indian government's Ministry of Health and Family Welfare website](https://mohfw.gov.in).

## Read map data and vaccination statistics
The state-wise map data and the latest vaccination status for India are obtained from the Indian government's Ministry of Health and Family Welfare website and cached in the project's [GitHub repo](https://github.com/MoadComputer/covid19-visualization).

In [None]:
REPO_URL = 'https://raw.githubusercontent.com/MoadComputer/covid19-visualization/main/data'

India_statewise = geopandas.read_file(
  f'{REPO_URL}/GeoJSON_assets/India.geojson')
India_stats = pd.read_csv(
  f'{REPO_URL}/Coronavirus_stats/India/Population_stats_India_statewise.csv')
covid19_data = pd.read_csv(
  f'{REPO_URL}/Coronavirus_stats/India/COVID19_India_statewise.csv')
preds_df = pd.read_csv(
  f'{REPO_URL}/Coronavirus_stats/India/experimental/output_preds.csv')
India_vaccinations = pd.read_csv(
  f'{REPO_URL}/Coronavirus_stats/India/COVID19_vaccinations_India_statewise.csv')

India_statewise = apply_corrections(India_statewise)
India_stats = apply_corrections(India_stats)
India_vaccinations = India_vaccinations[India_vaccinations.state != 'Miscellaneous']
India_vaccinations = apply_corrections(India_vaccinations)

covid19_data = apply_corrections(covid19_data)

In [None]:
json_writer(India_statewise.to_json(), 'India.geojson')

## Pre-process

In [None]:
covid19_data = pd.merge(covid19_data, India_stats, on='state', how='left')
covid19_data = pd.merge(covid19_data, India_vaccinations, on='state', how='left')
covid19_data = pd.merge(India_statewise, covid19_data, on='state', how='left')
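All three merges use `how='left'`. Every row of the left table is kept, and a state that is missing from the right table gets `NaN` in the new columns. The final merge therefore keeps every state polygon from `India_statewise`, even where no statistics match. Here is a toy illustration with hypothetical numbers:

```python
import pandas as pd

# Hypothetical tables: 'Ladakh' has a map entry but no vaccination row
shapes = pd.DataFrame({'state': ['Kerala', 'Ladakh']})
doses = pd.DataFrame({'state': ['Kerala'], 'dose_1': [100], 'dose_2': [80]})

# Left merge keeps both map rows; Ladakh's dose columns become NaN (and float dtype)
merged = pd.merge(shapes, doses, on='state', how='left')
print(merged)
```

Rows like `Ladakh` in this example are the reason the notebook fills missing values with zero before plotting.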

## Automated error correction
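This is a simple but effective error-correction step. It catches an obvious mistake in the MoHFW COVID19 vaccination statistics for India: a state that reports more second doses than first doses. Nobody can receive a second dose without a first, so such an entry is treated as having its two dose columns swapped, and the columns are switched back.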

In [None]:
def vac_dose_ecc(input_df):
  for s in tqdm(input_df['state']):
    dose_1 = input_df.loc[input_df['state']==s, 'dose_1'].to_numpy()[0]
    dose_2 = input_df.loc[input_df['state']==s, 'dose_2'].to_numpy()[0]
    # Skip states without vaccination data (NaN after the left merge)
    if pd.isna(dose_1) or pd.isna(dose_2):
      continue
    dose_1, dose_2 = int(dose_1), int(dose_2)
    # More second doses than first doses means the columns were swapped
    if dose_1 < dose_2:
      print(f'\nFound an entry for: {s} with,'
            f'\n\t second doses ({dose_2}) greater than first doses ({dose_1}) ...')
      print('Whoops!!! Seems like a mathematical impossibility ...')
      print(f'Autoswitching dose 1 and dose 2 columns for: {s} ...')
      input_df.loc[input_df['state']==s, 'dose_1'] = dose_2
      input_df.loc[input_df['state']==s, 'dose_2'] = dose_1
  return input_df

covid19_data = vac_dose_ecc(covid19_data)
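`vac_dose_ecc` looks up each state separately. The same rule can also be applied to all rows at once with a boolean mask. Below is a sketch on a hypothetical two-state table. The assignment goes through `.to_numpy()` because assigning a DataFrame would be aligned by column label, which would leave the values unswapped.

```python
import pandas as pd

# Hypothetical figures: 'State B' has its dose columns swapped at the source
doses = pd.DataFrame({
    'state': ['State A', 'State B'],
    'dose_1': [1000, 300],
    'dose_2': [800, 900],
})

# A second dose cannot precede a first one, so dose_2 > dose_1 flags a swapped row
swapped = doses['dose_2'] > doses['dose_1']
# .to_numpy() drops the column labels so the values really trade places
doses.loc[swapped, ['dose_1', 'dose_2']] = (
    doses.loc[swapped, ['dose_2', 'dose_1']].to_numpy())
print(doses)
```

Rows with `NaN` doses compare as `False`, so the mask leaves them untouched without needing an explicit check.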

In [None]:
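# Share of each state's population with two doses and with at least one dose;
# NaN is filled before the integer cast, which cannot represent missing values
covid19_data['fully_vaccinated_percentage'] = (
    (covid19_data['dose_2']/covid19_data['population'])*100).fillna(0).astype(np.uint8)
covid19_data['partly_vaccinated_percentage'] = (
    (covid19_data['dose_1']/covid19_data['population'])*100).fillna(0).astype(np.uint8)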

covid19_data = covid19_data.fillna(0)

covid19_data['update_date'] = DATA_UPDATE_DATE
covid19_data['data_source'] = DATA_SOURCE
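Both percentages are divided by the same population figure. As a result, `partly_vaccinated_percentage` counts everyone with at least one dose, including fully vaccinated people, and not only those with exactly one dose. The `np.uint8` cast also truncates toward zero instead of rounding. Here is a worked example in plain Python for a hypothetical state:

```python
# Hypothetical state: 1,000,000 people, 853,000 first doses, 648,000 second doses
population, dose_1, dose_2 = 1_000_000, 853_000, 648_000

# int() truncates toward zero, as astype(np.uint8) does for values in range
fully = int(dose_2 / population * 100)                      # 64.8 -> 64
at_least_one = int(dose_1 / population * 100)               # 85.3 -> 85
one_dose_only = int((dose_1 - dose_2) / population * 100)   # 20.5 -> 20

print(fully, at_least_one, one_dose_only)
```

Rounding would report 65% fully vaccinated for the same state. The notebook shows the truncated value.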

In [None]:
covid19_data.head(2)

## Set Altair rendering options

In [None]:
alt.renderers.enable(embed_options={'actions': False})

## Plot vaccine statistics on the state-wise map of India

In [None]:
map_data = alt.Data(
    values=covid19_data.to_json(), format=alt.DataFormat(property='features', type='json'))
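`to_json()` on a GeoDataFrame returns a GeoJSON `FeatureCollection` as a string. `alt.DataFormat(property='features', type='json')` tells Vega-Lite to use the `features` array as the data rows. Each feature keeps its attribute columns under a `properties` key, so the encodings and tooltips refer to fields such as `properties.state`. The sketch below uses only the standard library, a simplified single-feature collection, and a hypothetical `resolve_field` helper that imitates the dotted-field lookup:

```python
import json

# Simplified, hypothetical stand-in for the output of GeoDataFrame.to_json()
geojson_str = json.dumps({
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'properties': {'state': 'Kerala', 'fully_vaccinated_percentage': 64},
        'geometry': {'type': 'Point', 'coordinates': [76.3, 10.5]},
    }],
})

def resolve_field(row, dotted_field):
    # Walk a dotted field name such as 'properties.state' through nested dicts
    value = row
    for key in dotted_field.split('.'):
        value = value[key]
    return value

# property='features' means each element of this list becomes one data row
rows = json.loads(geojson_str)['features']
print(resolve_field(rows[0], 'properties.state'))
```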

In [None]:
vac_plot = alt.Chart(map_data).mark_geoshape().encode( 
    color={'field' : 'properties.fully_vaccinated_percentage', 
           'type'  : 'quantitative', 
           'title' : 'Fully vaccinated (%)',
           'scale' : alt.Scale(scheme='greens')}, 
    tooltip=custom_tooltips(),
).properties(width=400, height=480
).project('mercator')

In [None]:
vac_plot

## Save plot as HTML
The plot is saved as a standalone HTML file that can be used for web deployment. Only the export action is enabled in the chart's actions menu.

In [None]:
vac_plot.save('India_vaccination.html', embed_options={'actions': {
    'export': True,
    'source': False,
    'compiled': False,
    'editor': False
  }})