### [Source covid19](https://www.covid19india.org/)
https://api.covid19india.org/

Recommended approach from the docs: JSON parsing of the v4 endpoints
		
| Status | Link to API | Description |
| --- | --- | --- |
| <img src=https://github.githubassets.com/images/icons/emoji/unicode/1f49a.png width="20"> | https://api.covid19india.org/v4/min/timeseries.min.json | Daily numbers across C,R,D and Tested per state (historical data) |
| <img src=https://github.githubassets.com/images/icons/emoji/unicode/1f49a.png width="20"> | https://api.covid19india.org/v4/min/data.min.json | Current day numbers across districts and states |
| <img src=https://github.githubassets.com/images/icons/emoji/unicode/1f49a.png width="20"> | https://api.covid19india.org/v4/min/data-all.min.json | Per day numbers across districts and states - consider using timeseries in place of this. This is a huge file and is a mix of timeseries and data.min.json |

**Doc Note**: *Please consider using the above endpoints for all your data needs. All the data we show on the website is fuelled by the above endpoints.*

#### Time-series structure
Per state level time-series (*conf., rec., dec., tested, vacc.*)

https://api.covid19india.org/documentation/timeseries.min.html

In [None]:
import requests
import pandas as pd
import time

In [None]:
url = "https://api.covid19india.org/v4/min/timeseries.min.json"
response_ts = requests.get(url)

In [None]:
# read json and normalize
start_time = time.time()
wide_ts_df = pd.json_normalize(response_ts.json())
total_sec = time.time() - start_time
print(f"{round(total_sec,1)} secs execution")

In [None]:
# build long format from column names structure (renames as desired)
long_ts_df = wide_ts_df.columns.str.split(".", expand=True).droplevel(1).to_frame(
    index=False, name=["state", "time_period", "obs_type", "obs_cat"]
)

In [None]:
# add values from series
long_ts_df["val"] = wide_ts_df.values[0]
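
The wide-to-long step can be checked on a toy payload before running it on the full response. The sketch below assumes the nesting `state > "dates" > date > obs_type > obs_cat` that the column names above imply; the values are made up for illustration.

```python
import pandas as pd

# toy payload mimicking the assumed time-series nesting (values are made up)
toy_ts = {
    "AN": {"dates": {"2020-04-09": {"delta": {"recovered": 10},
                                    "total": {"confirmed": 11}}}},
    "KL": {"dates": {"2020-04-09": {"total": {"confirmed": 357}}}},
}

# one row, one column per leaf, e.g. "AN.dates.2020-04-09.delta.recovered"
toy_wide = pd.json_normalize(toy_ts)

# split the dotted names, drop the constant "dates" level, keep four columns
toy_long = toy_wide.columns.str.split(".", expand=True).droplevel(1).to_frame(
    index=False, name=["state", "time_period", "obs_type", "obs_cat"]
)
# the single row of the wide frame holds the values, in column order
toy_long["val"] = toy_wide.values[0]
print(toy_long)
```

Each leaf of the nested JSON becomes one row of the long table, which is why a single-row wide frame is enough to carry all the values.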

**Notes**

- a missing `delta` key should take the nearest previous value, e.g. `AN, 2020-04-10, delta, recovered` not being present means the `AN, 2020-04-09, delta, recovered: 10` value has not changed
- `delta7` is described as a "*7-day moving average*" --> calculations confirmed it is the **sum** of the last 7 days rather than the **average**
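
The `delta7` observation can be verified on a toy series. This sketch assumes that a missing `delta` key stands for a day with no new observations (a daily delta of zero); the dates and values are invented for illustration.

```python
from datetime import date, timedelta

# sparse daily deltas: 2020-04-10 has no key (assumed to mean zero new cases)
sparse_delta = {
    "2020-04-08": 10, "2020-04-09": 4, "2020-04-11": 6, "2020-04-12": 3,
    "2020-04-13": 5, "2020-04-14": 2, "2020-04-15": 8, "2020-04-16": 1,
}

first = date.fromisoformat(min(sparse_delta))
last = date.fromisoformat(max(sparse_delta))
days = [(first + timedelta(days=i)).isoformat()
        for i in range((last - first).days + 1)]

# fill the gaps with zero so that every calendar day is present
filled = [sparse_delta.get(day, 0) for day in days]

last7_sum = sum(filled[-7:])   # what delta7 turned out to be
last7_avg = last7_sum / 7      # what a "moving average" would give
print(f"sum: {last7_sum}, avg: {last7_avg:.2f}")
```

Comparing both numbers against the reported `delta7` for the latest date is enough to tell a sum from an average.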

In [None]:
long_ts_df.loc[30:40]

In [None]:
print(f"Total data points number: {len(long_ts_df.state)}")
states = long_ts_df.state.unique()
print(f"{len(long_ts_df.state.unique())} states:\n{states}")
types = long_ts_df.obs_type.unique()
print(f"obs_type:\n{types}")
categs = long_ts_df.obs_cat.unique()
print(f"obs_cat:\n{categs}")

##### Time-series data vis

In [None]:
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash
import plotly.express as px

In [None]:
# detect proxy configuration for JupyterHub or Binder
JupyterDash.infer_jupyter_proxy_config()

In [None]:
# dropdowns: state, obs_type, obs_cat, time_period
dd_st = dcc.Dropdown(
    id="my_st",
    options=[
        {"label": value, "value": key}
        for key, value in zip(states, states)
    ],
    value='AN'
)
dd_type = dcc.Dropdown(
    id="my_typ",
    options=[
        {"label": value, "value": key}
        for key, value in zip(types, types)
    ],
    value='delta7'
)
dd_cat = dcc.Dropdown(
    id="my_cat",
    options=[
        {"label": value, "value": key}
        for key, value in zip(categs, categs)
    ],
    value='confirmed'
)
time_ps = sorted(long_ts_df.time_period.unique(), reverse=True)
dd_time = dcc.Dropdown(
    id="my_time",
    options=[
        {"label": value, "value": key}
        for key, value in zip(time_ps, time_ps)
    ],
    value='2021-05-01'
)

In [None]:
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
# Build App
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)

In [None]:
# App Layout
app.layout = html.Div([
    html.H2("Indian States Covid Time-Series"),
    html.H6("Browse by State, type and category of obs_values and cut-off time"),
    html.Div([
        html.Div(
            ["Select State", dd_st],
            style={'width': '24%', 'display': 'inline-block'},
        ),
        html.Div(
            ["Select type", dd_type],
            style={'width': '24%', 'display': 'inline-block'},
        ),
        html.Div(
            ["Select category", dd_cat],
            style={'width': '24%', 'display': 'inline-block'},
        ),
        html.Div(
            ["Select cut-off time", dd_time],
            style={'width': '24%', 'display': 'inline-block'},
        ),
    ]),
    html.Br(),
    dcc.Graph(id='time-series')
])

In [None]:
# Define callback to update graph
@app.callback(
    Output("time-series", "figure"),
    Input("my_st", "value"),
    Input("my_typ", "value"),
    Input("my_cat", "value"),
    Input("my_time", "value"),
)
def query_2_plot(state, obs_type, obs_cat, co_time):
    # return all times if co_time is None: an empty string sorts before any date
    co_time = co_time if co_time else ""
    # don't return plot if missing values for query
    if any([not state, not obs_type, not obs_cat]):
        return {}
    else:
        query = "state == @state & obs_type == @obs_type & obs_cat == @obs_cat & time_period > @co_time"
        fig = px.line(
            long_ts_df.query(query),
            x="time_period",
            y="val",
            line_shape="spline",
        ).update_traces(mode="lines+markers")
        return fig

In [None]:
# Run app and display result inline in the notebook
app.run_server(mode='inline')

#### "Data-all" data structure
Described as: *Per day numbers across states and districts - consider using timeseries in place of this. This is a huge file and is a mix of time-series and current day data*

No documentation at https://api.covid19india.org/

**Note**: the time-series data does not go down to district level as described. Are the state time-series and current day data enough?
- Actually, I would need the time-series at district level to compute *delta14_7* $\rightarrow$ exploration below

In [None]:
url = "https://api.covid19india.org/v4/min/data-all.min.json"
response_all = requests.get(url)

In [None]:
# our aim here --> districts delta confirmed if present for the previous week --> delta14_7
def conf_ds_deltaX_Y(json_resp, x=14, y=7):
    '''
    Running json_normalize on the full json_resp is too slow to be practical,
    so extract only 'delta confirmed' for all districts within a range of days
    (default: the previous week, i.e. between 14 and 7 days before the latest date)
    :param json_resp: json response from 'data-all' Covid19 India API
    :param x: lower limit, in days before the latest date (integer, exclusive)
    :param y: upper limit, in days before the latest date (integer, inclusive)
    :return: truncated json, all districts data restricted to delta confirmed in the range
    '''
    # reported days series
    dates = pd.Series(list(json_resp.keys()))
    # latest reported date assumed equal to all districts
    last_date = pd.to_datetime(dates).max()
    # filter range of days
    cut_date_0 = (last_date - pd.to_timedelta(x, unit='d')).strftime('%Y-%m-%d')
    cut_date_1 = (last_date - pd.to_timedelta(y, unit='d')).strftime('%Y-%m-%d')
    filter_x_y = (dates > cut_date_0) & (dates <= cut_date_1)
    # loop through range of days and return data in nested dictionary
    trunc_json = {}
    for day in dates[filter_x_y]:
        trunc_json[day] = {}
        for st in json_resp[day]:
            if 'districts' in json_resp[day][st]:
                trunc_json[day][st] = {}
                for ds in json_resp[day][st]['districts']:
                    if 'delta' in json_resp[day][st]['districts'][ds]:
                        if 'confirmed' in json_resp[day][st]['districts'][ds]['delta']:
                            trunc_json[day][st][ds] = json_resp[day][st]['districts'][ds]['delta']['confirmed']
    return trunc_json
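
The day window used by `conf_ds_deltaX_Y` can be checked in isolation. The helper below is a hypothetical, standard-library restatement of the same filter (`last - x < day <= last - y`), applied to 21 invented consecutive dates.

```python
from datetime import date, timedelta

def window_dates(iso_dates, x=14, y=7):
    """Return the ISO dates d with last - x < d <= last - y."""
    last = max(date.fromisoformat(d) for d in iso_dates)
    lower = (last - timedelta(days=x)).isoformat()
    upper = (last - timedelta(days=y)).isoformat()
    # ISO 'YYYY-MM-DD' strings compare in chronological order
    return [d for d in sorted(iso_dates) if lower < d <= upper]

# 21 consecutive days: 2021-05-01 to 2021-05-21
days = [(date(2021, 5, 1) + timedelta(days=i)).isoformat() for i in range(21)]
selected = window_dates(days)
print(selected[0], selected[-1], len(selected))
```

With the defaults, the window covers exactly seven days and ends one week before the latest reported date, which is what `delta14_7` is meant to capture.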

In [None]:
start_time = time.time()
trunc_json = conf_ds_deltaX_Y(response_all.json())
total_sec = time.time() - start_time
print(f"{round(total_sec,1)} secs execution")
# normalize truncated json with range of days
wide_ds_range_df = pd.json_normalize(trunc_json, sep='//')

In [None]:
# build long format from column names (renames as desired)
long_ds_range_df = wide_ds_range_df.columns.str.split("//", expand=True).to_frame(
    index=False, name=["time_period", "state", "district"]
)
# add delta confirmed values from series
long_ds_range_df["val"] = wide_ds_range_df.values[0]

In [None]:
long_ds_range_df

#### Current day data structure
State and details as of the current day: *contains information about districts*

https://api.covid19india.org/documentation/v4_data.html

In [None]:
url = "https://api.covid19india.org/v4/min/data.min.json"
response_data = requests.get(url)

##### State data
We here parse data at state level

In [None]:
# filter state metadata and districts out from json data (parse the response once)
data_json = response_data.json()
json_st = {
    key_1: {
        key_2: val_2
        for key_2, val_2 in val_1.items() if key_2 not in ['districts', 'meta']
    } for key_1, val_1 in data_json.items()
}

In [None]:
# read json_st and normalize
wide_st_df = pd.json_normalize(json_st)
# build long format from column names structure (renames as desired)
long_st_df = wide_st_df.columns.str.split(".", expand=True).to_frame(
    index=False, name=["state", "obs_type", "obs_cat"]
)

In [None]:
# add values from series
long_st_df["val"] = wide_st_df.values[0]

**Notes**

- Doc caveat: any **obs_cat** category under key `delta` won't be present if a state/district doesn't see a change in such category (eg: `recovered`) for the current day
- Could any state/district not be even reported for the current day?
- research `delta21_14` meaning
- compute the `delta14_7` for situation analysis

###### Use time-series to compute delta14_7 for states

In [None]:
# filter states delta confirmed for the previous week --> delta14_7
def conf_st_deltaX_Y(st_ts_df, x=14, y=7):
    '''
    :param st_ts_df: state data timeseries Covid19 India API
    :param x: lower limit number of days (integer)
    :param y: upper limit number of days (integer)
    :return: dataframe to append (current day state data structure)
    '''
    # latest reported date assumed equal to all states/obs_types/obs_cat
    last_date = pd.to_datetime(st_ts_df.time_period).max()
    # filter range of days
    cut_date_0 = (last_date - pd.to_timedelta(x, unit='d')).strftime('%Y-%m-%d')
    cut_date_1 = (last_date - pd.to_timedelta(y, unit='d')).strftime('%Y-%m-%d')
    # obs_cat is confirmed
    obs_cat = 'confirmed'
    # query state timeseries (delta confirmed in range of days)
    query = "obs_type == 'delta' & obs_cat == @obs_cat  & time_period > @cut_date_0 & time_period <= @cut_date_1"
    # deltaX_Y calculated
    deltaX_Y_calc = st_ts_df.query(query).groupby('state').agg({'val': 'sum'}).reset_index()
    # obs_type is deltaX_Y
    obs_type = f"delta{x}_{y}"
    # fill cols obs_cat, obs_type with constants (match current day state data structure)
    deltaX_Y_calc['obs_cat'] = obs_cat
    deltaX_Y_calc['obs_type'] = obs_type
#     # reorder column 'val'
#     val_c = deltaX_Y_calc.val
#     deltaX_Y_calc.drop('val', axis = 1, inplace = True)
#     deltaX_Y_calc.insert(3, 'val', val_c)
    return deltaX_Y_calc

In [None]:
# # function test OK: match delta7 with delta7_0
# query_d7c = "obs_type == 'delta7' & obs_cat == 'confirmed'"
# pd.concat(
#     [
#         conf_st_deltaX_Y(long_ts_df, x=7, y=0).set_index('state'),
#         long_st_df.query(query_d7c).set_index('state').val,
#     ], axis = 1
# ).reset_index()

In [None]:
# pandas concat works with different column order (keeps first)
long_st_df = pd.concat([long_st_df, conf_st_deltaX_Y(long_ts_df)], ignore_index=True)
long_st_df
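
The remark about column order can be confirmed on two one-row frames. This is a minimal sketch with invented values; the second frame mimics the column order returned by `conf_st_deltaX_Y`.

```python
import pandas as pd

current = pd.DataFrame(
    {"state": ["AN"], "obs_type": ["delta7"], "obs_cat": ["confirmed"], "val": [12]}
)
# same columns in a different order, as a groupby-based helper would return
computed = pd.DataFrame(
    {"state": ["AN"], "val": [30], "obs_cat": ["confirmed"], "obs_type": ["delta14_7"]}
)

# concat aligns columns by name and keeps the order of the first frame
combined = pd.concat([current, computed], ignore_index=True)
print(combined)
```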

In [None]:
print(f"{len(long_st_df.state.unique())} states:")
print(long_st_df.state.unique())

##### State metadata
Metadata at state level, **important** information here: population of the state (based on NCP projections)

To join eventually into state data

In [None]:
# filter state metadata from json data (parse the response once)
data_json = response_data.json()
json_meta_st = {
    key_1: {
        key_2: val_2
        for key_2, val_2 in val_1.items() if key_2 == 'meta'
    } for key_1, val_1 in data_json.items()
}

In [None]:
# read json_meta_st and normalize
wide_meta_st_df = pd.json_normalize(json_meta_st, max_level=2)
# build temporary long format from column names
long_meta_st_df = wide_meta_st_df.columns.str.split(".", expand=True).droplevel(1).to_frame(
    index=False, name=["state", "column"]
)
long_meta_st_df["val"] = wide_meta_st_df.values[0]
# pivot temporary long into state metadata table
meta_st_df = long_meta_st_df.pivot(index='state', columns='column', values='val').reset_index()
# delete index name `column` from pivot
meta_st_df.rename_axis(None, axis=1, inplace=True)

In [None]:
# unnest state metadata tested column
tested_df = meta_st_df.tested.apply(pd.Series).rename(
    columns={"date": "test_date", "source": "test_source"}
)
# concat back to metadata
meta_st_df = pd.concat([meta_st_df, tested_df], axis = 1).drop('tested', axis = 1)

In [None]:
# unnest state metadata vaccinated column if present
if 'vaccinated' in meta_st_df.columns:
    vac_df = meta_st_df.vaccinated.apply(pd.Series).rename(
        columns={"date": "vaccinated_date", "source": "vaccinated_source"}
    )
    # concat back to metadata
    meta_st_df = pd.concat([meta_st_df, vac_df], axis = 1).drop('vaccinated', axis = 1)

In [None]:
meta_st_df

- Export State Metadata (code commented to deploy in Binder)

In [None]:
# file_path = "./excel/state_meta.xlsx"
# meta_st_df.to_excel(file_path, index=False)

##### Compare State delta7 with $\sum{(\mathrm{time-series})}$

In [None]:
# exercise on state level (filter delta confirmed)
set_type = 'delta'
query = "obs_type == @set_type & obs_cat == 'confirmed'"
d_conf_ts = long_ts_df.query(query)
# cutting date query
max_time = pd.to_datetime(d_conf_ts.time_period).max()
cut_date = (max_time - pd.to_timedelta(7, unit='d')).strftime('%Y-%m-%d')
query_t = "time_period > @cut_date"
# delta7 computed by hand
delta7_calc = d_conf_ts.query(query_t).groupby('state').agg({'val': 'sum'}).rename(
    columns={"val": "beto_calc"}
)
# add delta7 from both API state time-series and current
# max_time = max_time - pd.to_timedelta(1, unit='d')
query_ts = "obs_type == 'delta7' & obs_cat == 'confirmed' & time_period == @max_time.strftime('%Y-%m-%d')"
set_type = 'delta7'
comp_delta7 = pd.concat(
    [
        delta7_calc,
        long_ts_df.query(query_ts).set_index('state').val,
        long_st_df.query(query).set_index('state').val,
    ], axis = 1
).reset_index()
comp_delta7.columns = [comp_delta7.columns[0], comp_delta7.columns[1], 'val_ts', 'val_st']
comp_delta7
# print(comp_delta7)
# max_time = max_time + pd.to_timedelta(1, unit='d')
# print(long_ts_df.query(query_ts))

##### Compare State delta21_14 with $\sum{(\mathrm{time-series})}$

In [None]:
# exercise on state level (filter delta confirmed)
set_type = 'delta'
query = "obs_type == @set_type & obs_cat == 'confirmed'"
d_conf_ts = long_ts_df.query(query)
# cutting date query
max_time = pd.to_datetime(d_conf_ts.time_period).max()
cut_date_0 = (max_time - pd.to_timedelta(21, unit='d')).strftime('%Y-%m-%d')
cut_date_1 = (max_time - pd.to_timedelta(14, unit='d')).strftime('%Y-%m-%d')
query_t = "time_period > @cut_date_0 & time_period <= @cut_date_1"
# delta21_14 computed by hand
delta21_14_calc = d_conf_ts.query(query_t).groupby('state').agg({'val': 'sum'}).rename(
    columns={"val": "beto_calc"}
)
# add delta21_14 from API state current
max_time = max_time - pd.to_timedelta(1, unit='d')
set_type = 'delta21_14'
comp_delta21_14 = pd.concat(
    [
        delta21_14_calc,
        long_st_df.query(query).set_index('state').val,
    ], axis = 1
).reset_index().rename(columns={"val": "val_st"})
comp_delta21_14
# print(comp_delta21_14)
# (abs(comp_delta21_14.beto_calc-comp_delta21_14.val_st)/comp_delta21_14.val_st*100).median()

##### District data
Eventually join into state data and metadata

In [None]:
# filter district data and metadata from json data (parse the response once)
data_json = response_data.json()
json_ds = {
    key_1: {
        key_2: val_2
        for key_2, val_2 in val_1.items() if key_2 == 'districts'
    } for key_1, val_1 in data_json.items()
}

In [None]:
# read json_ds and normalize - use a custom separator: district names contain dots!
start_time = time.time()
wide_ds_df = pd.json_normalize(json_ds, max_level=4, sep='//')
total_sec = time.time() - start_time
print(f"{round(total_sec,1)} secs execution")
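
Why the custom separator matters can be seen on a single key. The `flatten` helper below is a hypothetical stand-in that only mimics how nested keys are joined into column names; the district entry and its value are illustrative.

```python
def flatten(nested, sep, prefix=""):
    """Join nested dict keys into flat path strings using `sep`."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat

# a district name that itself contains dots (value is made up)
toy = {"AP": {"districts": {"Y.S.R. Kadapa": {"delta": {"confirmed": 5}}}}}

dot_parts = next(iter(flatten(toy, "."))).split(".")
safe_parts = next(iter(flatten(toy, "//"))).split("//")
print(len(dot_parts), dot_parts)
print(len(safe_parts), safe_parts)
```

Splitting on `.` breaks the district name into several pieces and misaligns the levels, while `//` keeps the expected five levels.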

In [None]:
# build long format from column names (renames as desired)
long_ds_df = wide_ds_df.columns.str.split("//", expand=True).droplevel(1).to_frame(
    index=False, name=["state", "district", "obs_type", "obs_cat"]
)
# add values from series
long_ds_df["val"] = wide_ds_df.values[0]

In [None]:
# filter metadata in temporary long format
filter_meta = long_ds_df.obs_type == 'meta'
long_meta_ds_df = long_ds_df[filter_meta]
# district data in long format (drop metadata)
long_data_ds_df = long_ds_df.drop(long_meta_ds_df.index)

###### Use data-all to compute delta14_7 for districts

In [None]:
# use data-all range dataframe to compute delta14_7
ds_delta_14_7 = long_ds_range_df.groupby(['state', 'district']).agg({'val': 'sum'}).reset_index()
# fill cols obs_cat, obs_type with constants (match current day district data structure)
ds_delta_14_7['obs_cat'] = 'confirmed'
ds_delta_14_7['obs_type'] = 'delta14_7'

In [None]:
# pandas concat works with different column order (keeps first)
long_data_ds_df = pd.concat([long_data_ds_df, ds_delta_14_7], ignore_index=True)
long_data_ds_df

In [None]:
# # function test OK: match delta7 with delta7_0 in districts
# query_d7c = "obs_type == 'delta7' & obs_cat == 'confirmed'"
# trunc_json = conf_ds_deltaX_Y(response_all.json(), 7, 0)
# wide_ds_7_0_df = pd.json_normalize(trunc_json, sep='//')
# long_ds_7_0_df = wide_ds_7_0_df.columns.str.split("//", expand=True).to_frame(
#     index=False, name=["time_period", "state", "district"]
# )
# long_ds_7_0_df["val"] = wide_ds_7_0_df.values[0]
# pd.concat(
#     [
#         long_ds_7_0_df.groupby(['state', 'district']).agg({'val': 'sum'}),
#         long_data_ds_df.query(query_d7c).set_index(['state', 'district']).val,
#     ], axis = 1
# ).reset_index()

##### District metadata
Metadata at district level, **important** and **outdated** information: population of the district (based on 2011 census)

**Note**: district names could be repeated among states

To join eventually into state data and metadata
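
Because district names can repeat across states, a join on the district name alone would duplicate rows. The sketch below illustrates this with two same-named districts and invented numbers.

```python
import pandas as pd

# two districts sharing a name in different states (values are made up)
data = pd.DataFrame({
    "state": ["BR", "MH"],
    "district": ["Aurangabad", "Aurangabad"],
    "val": [15, 240],
})
meta = pd.DataFrame({
    "state": ["BR", "MH"],
    "district": ["Aurangabad", "Aurangabad"],
    "population": [2_500_000, 3_700_000],
})

# joining on the name alone matches every row with both metadata rows
by_name = data.merge(meta, on="district", how="left")
# the composite key keeps one metadata row per district
by_key = data.merge(meta, on=["state", "district"], how="left")
print(len(by_name), len(by_key))
```

This is the reason for using `["state", "district"]` as the join key at district level.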

In [None]:
# pivot temporary long into district metadata table
meta_ds_df = long_meta_ds_df.drop(columns='obs_type').set_index(
    ['state', 'district', 'obs_cat']
).unstack(level=-1).reset_index(col_level=1).droplevel(level=0, axis=1).rename_axis(None, axis=1)

In [None]:
# unnest district tested column
ds_tested_df = meta_ds_df.tested.apply(pd.Series).drop(0, axis = 1).rename(
    columns={"date": "test_date", "source": "test_source"}
)
# concat back to metadata
meta_ds_df = pd.concat([meta_ds_df, ds_tested_df], axis = 1).drop('tested', axis = 1)

In [None]:
# unnest district vaccinated column
ds_vac_df = meta_ds_df.vaccinated.apply(pd.Series).drop(0, axis = 1).rename(
    columns={"date": "vaccinated_date"}
)
# concat back to metadata
meta_ds_df = pd.concat([meta_ds_df, ds_vac_df], axis = 1).drop('vaccinated', axis = 1)

In [None]:
meta_ds_df

- Export District Metadata (code commented to deploy in Binder)

In [None]:
# file_path = "./excel/district_meta.xlsx"
# meta_ds_df.to_excel(file_path, index=False)

#### EDA: hospital bed occupancy
Hospital bed occupancy as reported in state bulletins (`csv` file from):

https://api.covid19india.org/csv/latest/statewise_tested_numbers_data.csv

In [None]:
csv_url = "https://api.covid19india.org/csv/latest/statewise_tested_numbers_data.csv"
st_bulletin_df = pd.read_csv(csv_url, dtype=str)

In [None]:
# possible columns to explore
bed_occup_col = [
    'People on ICU Beds',
    'Total Num ICU Beds',
    'Beds Occupied(Normal/Isolation)',
    'Total Num Beds (Normal/Isolation)',
    'People on O2 Beds',
    'Total Num of O2 Beds',
    'People on Ventilator',
    'Total Num Ventilators',
]
st_bulletin_df.columns

#### Current week data vis
Based on these situation analysis indicators:

|  |  |
| --- | --- |
| <img src="https://drive.google.com/uc?export=view&id=1X1hVR5y00vprU1jFT20nSP3Jc41jVsWY" width="200"> | <img src="https://drive.google.com/uc?export=view&id=1saMjeevjiVlv_Dq7BNRNUgKdOApjwFeS" width="200"> |
| <img src="https://drive.google.com/uc?export=view&id=10frXzVNHFAFNW1GrErj3QwKZRZGPZl9A" width="200"> | <img src="https://drive.google.com/uc?export=view&id=1AdDqL3kVyjaepYR8N9t6Q5Y2iwaXCkYK" width="200"> |

In [None]:
# dropdowns: state/district, situation indicators
geo_level = ['State', 'District']
dd_level = dcc.Dropdown(
    id="my_level",
    options=[
        {"label": value, "value": key}
        for key, value in zip(geo_level, geo_level)
    ],
    value='State'
)
sit_ind = [
    'Case Incidence',
    'Percent change in cases',
    'Test Positivity Rate (TPR)',
    'Case Fatality Ratio (CFR)',
]
dd_ind = dcc.Dropdown(
    id="my_ind",
    options=[
        {"label": value, "value": key}
        for key, value in zip(sit_ind, sit_ind)
    ],
    value='Case Incidence'
)

In [None]:
# Build App: current day
app_c = JupyterDash(__name__, external_stylesheets=external_stylesheets)
# App Layout
app_c.layout = html.Div([
    html.H2("Situation Analysis Framework"),
    html.H6("Switch State/District and select Indicator"),
    html.Div([
        html.Div(
            ["Switch:", dd_level],
            style={'width': '30%', 'display': 'inline-block'},
        ),
        html.Div(
            ["Situation Indicator:", dd_ind],
            style={'width': '65%', 'display': 'inline-block'},
        ),
    ]),
    html.Br(),
    dcc.Graph(id='bar-plot')
])

In [None]:
# Define callback to update graph
@app_c.callback(
    Output("bar-plot", "figure"),
    Input("my_level", "value"),
    Input("my_ind", "value"),
)
def plot_indicator(geo_lev, indicator):
    # don't return plot if any missing values
    if any([not geo_lev, not indicator]):
        return {}
    else:
        # data/metadata level
        data = long_st_df if geo_lev == 'State' else long_data_ds_df
        meta = meta_st_df if geo_lev == 'State' else meta_ds_df
        # left join data/meta
        key_join = "state" if geo_lev == 'State' else ["state", "district"]
        data_meta_df = data.merge(meta, on=key_join, how="left", sort=False)
        query = "obs_cat == 'confirmed'"
        df = data_meta_df.query(query).set_index(key_join)
        obs_d07 = df.obs_type == 'delta7'
        query_t = "obs_cat == 'tested'"
        df_t = data_meta_df.query(query_t).set_index(key_join)
        obs_t_d07 = df_t.obs_type == 'delta7'
        query_d = "obs_cat == 'deceased'"
        df_d = data_meta_df.query(query_d).set_index(key_join)
        obs_d_d07 = df_d.obs_type == 'delta7'
        
        if "change" in indicator:
            obs_d14 = df.obs_type == 'delta14_7'
            # assumes no delta zeros or instead Inf will result
            ind_calc = (df.val[obs_d07] - df.val[obs_d14]) / df.val[obs_d07] * 100
        elif "Incidence" in indicator:
            # newly confirmed per million population (per week --> delta7)
            ind_calc = df.val[obs_d07] * 1e6 / df.population[obs_d07]
        elif "Fatality" in indicator:
            # weekly deaths over weekly confirmed (delta7), in percent
            # assumes no delta zeros or instead Inf will result
            ind_calc = df_d.val[obs_d_d07] / df.val[obs_d07] * 100
        else:
            # test positivity rate (per week --> delta7)
            # assumes no delta zeros or instead Inf will result
            ind_calc = df.val[obs_d07] / df_t.val[obs_t_d07] * 100
        
        fig = px.bar(
                ind_calc.reset_index().rename(columns={0: "val"}),
                x=geo_lev.lower(),
                y="val",
            ).update_layout(xaxis={'categoryorder':'total descending'})
        return fig
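
The callback assumes that no denominator is zero. A minimal sketch of guarded versions of two indicators follows; the function names are hypothetical, the formulas mirror the ones in the callback, and the input numbers are toy values.

```python
def safe_ratio(numerator, denominator, scale=100.0):
    """Return numerator / denominator * scale, or None when undefined."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator * scale

def case_incidence(delta7_confirmed, population):
    # newly confirmed cases per million population over the last week
    return safe_ratio(delta7_confirmed, population, scale=1e6)

def percent_change(delta7_confirmed, delta14_7_confirmed):
    # same formula as the callback: change relative to the latest week
    if delta7_confirmed is None or delta14_7_confirmed is None:
        return None
    return safe_ratio(delta7_confirmed - delta14_7_confirmed, delta7_confirmed)

print(case_incidence(3894, 1_192_000))
print(percent_change(120, 100))
print(percent_change(0, 25))  # zero denominator gives None instead of Inf
```

Returning `None` instead of `Inf` lets a plot simply skip the affected states or districts.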

In [None]:
# Run app and display result inline in the notebook
app_c.run_server(mode='inline')

- Case Incidence for external analysis (excel export commented to deploy in Binder)

In [None]:
# left join data/meta
data_meta_df = long_st_df.merge(meta_st_df, on="state", how="left", sort=False)
query = "obs_cat == 'confirmed'"
df = data_meta_df.query(query).set_index("state")
obs_d07 = df.obs_type == 'delta7'
# newly confirmed per million population (per week --> delta7)
ind_calc = df.val[obs_d07] * 1e6 / df.population[obs_d07]
st_inc_df = (
    pd.concat([ind_calc, df[['val', 'population']][obs_d07]], axis = 1)
).reset_index().rename(columns={0: "case_inc", "val": "delta7"})

In [None]:
# left join data/meta
data_meta_df = long_data_ds_df.merge(meta_ds_df, on=["state", "district"], how="left", sort=False)
query = "obs_cat == 'confirmed'"
df = data_meta_df.query(query).set_index(["state", "district"])
obs_d07 = df.obs_type == 'delta7'
# newly confirmed per million population (per week --> delta7)
ind_calc = df.val[obs_d07] * 1e6 / df.population[obs_d07]
ds_inc_df = (
    pd.concat([ind_calc, df[['val', 'population']][obs_d07]], axis = 1)
).reset_index().rename(columns={0: "case_inc", "val": "delta7"})

In [None]:
# file_path = "./excel/case_incidence.xlsx"
# with pd.ExcelWriter(file_path) as writer:
#     st_inc_df.to_excel(writer, sheet_name='state', index=False)
#     ds_inc_df.to_excel(writer, sheet_name='district', index=False)

##### Drafts test/check
Commented code

In [None]:
# # quick in-place test
# geo_lev = 'State' # 'District' #
# indicator = 'Case Incidence' # 'Percent change in cases' #
# # data/metadata level
# data = long_st_df if geo_lev == 'State' else long_data_ds_df
# meta = meta_st_df if geo_lev == 'State' else meta_ds_df
# # left join data/meta
# key_join = "state" if geo_lev == 'State' else ["state", "district"]
# data_meta_df = data.merge(meta, on=key_join, how="left", sort=False)
# query = "obs_cat == 'confirmed'"
# df = data_meta_df.query(query).set_index(key_join)
# obs_d07 = df.obs_type == 'delta7'
# if "change" in indicator:
#     obs_d14 = df.obs_type == 'delta21_14'
#     # assumes no delta zeros or instead Inf will result
#     ind_calc = (df.val[obs_d07] - df.val[obs_d14]) / df.val[obs_d07] * 100
# else:
#     # newly confirmed per million population (per week)
#     ind_calc = df.val[obs_d07] * 7e6 / df.population[obs_d07]

In [None]:
# long_st_df[long_st_df.state == 'MZ']
# meta_st_df[meta_st_df.state == 'MZ']
# 3894*7 / 1192000

### [Source CoWIN](https://dashboard.cowin.gov.in/)
Is the API documented?

#### Yves shared link 1
https://api.cowin.gov.in/api/v1/reports/v2/getPublicReports?state_id=&district_id=&date=2021-07-15

- Check out structure

In [None]:
# API parameters
st_id = ""
ds_id = ""
date = "2021-07-21"
url = "https://api.cowin.gov.in/api/v1/reports/v2/getPublicReports"
api_param = {
    "state_id": st_id,
    "district_id": ds_id,
    "date": date,
}
response_cowi = requests.get(url, params=api_param)
response_cowi.url

In [None]:
# keys in data_structure levels
if response_cowi.status_code == 200:
    keys_1 = [key for key in response_cowi.json()]
    print(f"Keys @level 1:\n{keys_1}")
    keys_2 = []
    for key in keys_1:
        # check keys for dicts or list of dicts
        if isinstance(response_cowi.json()[key], dict):
            keys_2.append(list(response_cowi.json()[key].keys()))
        elif isinstance(response_cowi.json()[key], list):
            keys_list = []
            for elem in response_cowi.json()[key]:
                keys_list.append(list(elem.keys()))
            keys_2.append(keys_list)
        else:
            keys_2.append('None')
    print(f"Keys @level 2:\n{keys_2}")

In [None]:
# check if nested info at level 2
data_types = []
for i, key in enumerate(keys_1):
    if isinstance(keys_2[i], list):
        for j, elem in enumerate(keys_2[i]):            
            # check not list of list
            if not isinstance(elem, list):
                data = response_cowi.json()[key][elem]
                data_types.append(type(data))
#                 print(type(data))
            else:
                for key_2 in elem:
                    data = response_cowi.json()[key][j][key_2]
                    data_types.append(type(data))
#                     print(type(data))

In [None]:
print(set(data_types))
data_types.count(dict)
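
The two-level inspection above can be generalized with a small recursive walker. This is a sketch under assumptions: `describe` is a hypothetical helper, only the first element of each list is inspected, and the toy payload reuses top-level key names seen in the response while its nested keys and values are invented.

```python
def describe(node, path="", max_depth=3):
    """List (path, type name) pairs for the leaves of a nested JSON object."""
    rows = []
    if isinstance(node, dict) and max_depth > 0:
        for key, value in node.items():
            rows += describe(value, f"{path}/{key}", max_depth - 1)
    elif isinstance(node, list) and node and max_depth > 0:
        # assume list elements share one structure: inspect the first only
        rows += describe(node[0], f"{path}[]", max_depth - 1)
    else:
        rows.append((path, type(node).__name__))
    return rows

# toy payload: nested keys and values are invented for illustration
toy = {
    "topBlock": {"sites": {"total": 10}},
    "last7DaysRegistration": [{"reg_date": "2021-07-15", "total": 5}],
    "aefiPercentage": 0.01,
}
rows = describe(toy)
for path, type_name in rows:
    print(f"{path}: {type_name}")
```

Applied to a real response, `describe(response_cowi.json())` would list every path down to `max_depth` levels in a single pass.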

- `topBlock` Extraction

In [None]:
# read json and normalize
wide_top_df = pd.json_normalize(response_cowi.json()['topBlock'])
long_top_df = wide_top_df.columns.str.split(".", expand=True).to_frame(
    index=False, name=["obs_type", "obs_cat"]
)
long_top_df["val"] = wide_top_df.values[0]
long_top_df.set_index(["obs_type", "obs_cat"])

- `vaccinationDoneByTime` Extraction

Doesn't look relevant for our analysis

In [None]:
# read json and normalize
vac_by_time_df = pd.json_normalize(response_cowi.json()['vaccinationDoneByTime'])
vac_by_time_df

- `last7DaysRegistration` Extraction

Doesn't look relevant for our analysis

In [None]:
# read json and normalize
reg7_df = pd.json_normalize(response_cowi.json()['last7DaysRegistration'])
reg7_df

- `last30DaysAefi` Extraction

**AEFI**: Adverse event following immunization

Doesn't look relevant for our analysis

In [None]:
# read json and normalize
aefi30_df = pd.json_normalize(response_cowi.json()['last30DaysAefi'])
aefi30_df

- `last5daySessionStatus` Extraction

Doesn't look relevant for our analysis, *data length doesn't match key name*

In [None]:
# read json and normalize
ses5_df = pd.json_normalize(response_cowi.json()['last5daySessionStatus'])
ses5_df

- `getBeneficiariesGroupBy` Extraction

Data at state - *name not code* - level: `state_id` could be tested as API parameter if required

<!-- Doesn't look relevant for our analysis, *data length doesn't match key name* -->

In [None]:
# read json and normalize
ben_df = pd.json_normalize(response_cowi.json()['getBeneficiariesGroupBy'])
ben_df

- `aefiPercentage`: is this for the day or the total among time-series?

In [None]:
print(f"{response_cowi.json()['aefiPercentage']} %")

#### Yves shared link 2
https://api.cowin.gov.in/api/v1/reports/v2/getVacPublicReports?state_id=&district_id=&date=2021-07-15

- Check out structure

In [None]:
# API parameters
st_id = ""
ds_id = ""
date = "2021-07-21"
url = "https://api.cowin.gov.in/api/v1/reports/v2/getVacPublicReports"
api_param = {
    "state_id": st_id,
    "district_id": ds_id,
    "date": date,
}
response_cowi = requests.get(url, params=api_param)
response_cowi.url

In [None]:
# keys in data_structure levels
if response_cowi.status_code == 200:
    keys_1 = [key for key in response_cowi.json()]
    print(f"Keys @level 1:\n{keys_1}")
    keys_2 = []
    for key in keys_1:
        # check keys for dicts or list of dicts
        if isinstance(response_cowi.json()[key], dict):
            keys_2.append(list(response_cowi.json()[key].keys()))
        elif isinstance(response_cowi.json()[key], list):
            keys_list = []
            for elem in response_cowi.json()[key]:
                keys_list.append(list(elem.keys()))
            keys_2.append(keys_list)
        else:
            keys_2.append('None')
    print(f"Keys @level 2:\n{keys_2}")

In [None]:
# check if nested info at level 2
data_types = []
for i, key in enumerate(keys_1):
    if isinstance(keys_2[i], list):
        for j, elem in enumerate(keys_2[i]):            
            # check not list of list
            if not isinstance(elem, list):
                data = response_cowi.json()[key][elem]
                data_types.append(type(data))
#                 print(type(data))
            else:
                for key_2 in elem:
                    data = response_cowi.json()[key][j][key_2]
                    data_types.append(type(data))
#                     print(type(data))

In [None]:
print(set(data_types))
data_types.count(dict)