# Roll Your Own Analysis: ENCV Data

This notebook will demonstrate how to download data from the ENCV REST API and produce a few basic visualizations by performing the following actions:

1. <a href="#Downloading-ENCV-Data">Downloading ENCV data</a>, 
2. <a href="#Preparing-ENCV-Data">Preparing ENCV data</a>, and
3. <a href="#Visualizing-ENCV-Data">Visualizing ENCV data</a>.

Be sure to have your full API key ready before proceeding. These keys are managed at the public health authority level.

## Downloading ENCV Data

This section shows how to request ENCV data with your full API key so that you can analyze it further. Begin by following the steps below.

1) Import the Python `requests` library and construct the header that carries the API key.

In [None]:
import requests

header = {"X-Api-Key": "INSERT_API_KEY_HERE"}  # replace the placeholder with your full API key
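
A key typed directly into a notebook is easy to share by accident. As a minimal sketch, the header could instead be built from an environment variable. The variable name `ENCV_API_KEY` and the helper `build_header` are assumptions made for this illustration; they are not defined by the ENCV service.

```python
import os

def build_header(env_var="ENCV_API_KEY"):
    """Build the request header from an environment variable (hypothetical name)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable to your full API key.")
    return {"X-Api-Key": api_key}
```

With the variable set before the notebook is launched, `header = build_header()` would replace the assignment above.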

2) Request the desired data. Unlike the ENPA API, the ENCV API takes no parameters. The response contains the last 90 days of data for your jurisdiction. Data is erased after 90 days, so be sure to request and back up your data regularly.

In [None]:
ENCVrequest = requests.get(
    "https://adminapi.verification.apollo-project.org/api/stats/realm.json",
    headers=header,
)
print("Reason: ", ENCVrequest.reason, "\nStatus Code: ", ENCVrequest.status_code)

A _Status Code_ of 200 indicates that the data was successfully retrieved!
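
When the status code is not 200, a short hint can make the cause easier to find. The sketch below maps a code to a message using generic HTTP semantics only. Which codes the ENCV server actually returns is not documented in this notebook, so treat the messages as assumptions; `summarize_status` is a hypothetical helper.

```python
def summarize_status(status_code):
    """Map an HTTP status code to a short, human-readable hint."""
    if status_code == 200:
        return "OK: data retrieved"
    if status_code in (401, 403):
        # Generic HTTP meaning: the server did not accept the credentials
        return "Authentication problem: check the X-Api-Key header"
    if status_code == 429:
        return "Too many requests: wait before retrying"
    if 500 <= status_code < 600:
        return "Server error: try again later"
    return f"Unexpected status {status_code}"
```

For example, `print(summarize_status(ENCVrequest.status_code))`. To stop the notebook on any error status instead, `ENCVrequest.raise_for_status()` from the `requests` library raises an exception for 4xx and 5xx responses.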

Please note that no debiasing is necessary for ENCV data, so you can begin preparing the data and creating visualizations.
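
Because data is erased after 90 days, it may help to save each download to disk. The following is a minimal sketch, assuming one dated JSON file per download is an acceptable backup. The folder name `encv_backups`, the file naming scheme, and the helper `backup_encv` are illustrative choices, not part of the ENCV API.

```python
import json
from datetime import date
from pathlib import Path

def backup_encv(payload, directory="encv_backups", today=None):
    """Write the parsed API response to a dated JSON file and return its path."""
    today = today or date.today()
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path = folder / f"encv_{today.isoformat()}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return path
```

After a successful request, `backup_encv(ENCVrequest.json())` would write a file such as `encv_backups/encv_2022-03-05.json`. Running it again on the same day overwrites that day's file.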

## Preparing ENCV Data

This section will describe how to prepare the data downloaded from the ENCV API. Begin by following the steps below.

1) Convert the downloaded data from string to dictionary form using the `json` library, so it can be easily manipulated.

In [None]:
import json
convertedENCV = json.loads(ENCVrequest.text)

Here is what the data look like.

In [None]:
convertedENCV

For better context, here is a data dictionary along with the corresponding ENPA equivalent metrics.

*Note: Please download and view this notebook in an environment compatible with .ipynb files to view the table properly.*

| Column Name | Type | Default | Definition | ENPA Equivalent | Notes
| :-: | :-: | :-: | :-- | :-: | :-- |
code_claim_age_distribution | `int[]` | N/A | Shows the distribution of time from code issue to claim. |  | Buckets are: 1m, 5m, 15m, 30m, 1h, 2h, 3h, 6h, 12h, 24h, >24h
code_claim_mean_age_seconds | `int` | 0 | The mean time in seconds for codes to be claimed. |  | 
codes_claimed | `int` | 0 | The number of successful claims. Codes can only be claimed by the end user using the API. Typically, the iOS or Android application is responsible for claiming the code. | Codes Verified | Refers to the number of codes that a user validated on their mobile device before the code expired. The delta between codes issued and codes claimed can be used as a rudimentary measure of adoption.
codes_invalid | `int` | 0 | The number of codes that were rejected by the system. This includes codes with typographical errors and codes that have expired. |  | A large number of invalid codes likely corresponds to short codes expiring. The timeout for short codes can be adjusted.
codes_invalid_by_os | `bigint[]` | N/A | An array indexed by the controller.OperatingSystem enum (unknown_os, ios, android). |  | 
codes_issued | `int` | 0 | The total number of codes issued, both by the health authority and in response to self-report requests from user devices. Codes can be issued via the ENCV web interface or via the API; both types are included in this metric. |  | The number of codes issued does not necessarily correspond to the number of patients notified. A single patient could be notified multiple times, while another patient might never receive a notification because of an SMS error.
date | `date` | N/A | The date for which the statistics are valid. |  | Unlike many ENPA metrics, there is only one entry for each day. The date is in Zulu format (e.g., 2022-03-05T00:00:00Z).
realm_id | `int` | N/A | The identification number for the realm. |  | 
tokens_claimed | `int` | 0 | The total number of users that successfully claimed a verification code and then consented to key release. | Keys Uploaded | The delta between codes claimed and tokens claimed indicates users that enter a valid verification code but don't get through the consent to share data screen.
tokens_invalid | `int` | 0 | The number of tokens which failed to exchange due to a user error. |  | This includes User Report Tokens Claimed.
user_report_tokens_claimed | `int` | 0 | The number of tokens claimed that represent a user-initiated request. |  | This sum is also included in Tokens Claimed.
user_reports_claimed | `int` | 0 | The specific number of codes that were claimed because the user initiated a self-report request. |  | 
user_reports_issued | `int` | 0 | The specific number of codes that were issued because the user initiated a self-report request. |  | These numbers are also included in the totals for codes issued and codes claimed. Subtracting user reports issued from codes issued gives the number of codes issued by the health authority.

Here is the above table summarized in a flowchart. Note that all User Report metrics represent a subset of their corresponding metric. For example, User Reports Claimed is a subset of Codes Claimed.

![ENCVFlow](https://raw.githubusercontent.com/c19hcc/enpa-pha-jupyternotebooks/main/images/ENCVFlow.jpg)
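
The differences described in the Notes column can be computed directly from one day's `data` dictionary. This is a minimal sketch: the helper `derived_metrics` and its output names are hypothetical, and the numbers in `example_day` are made up for illustration rather than taken from real ENCV data.

```python
def derived_metrics(day):
    """Compute a few differences and ratios from one day's 'data' dictionary."""
    issued = day["codes_issued"]
    claimed = day["codes_claimed"]
    return {
        # Codes issued by the health authority (total minus self-report codes)
        "pha_codes_issued": issued - day["user_reports_issued"],
        # Codes that were issued but never claimed
        "unclaimed_codes": issued - claimed,
        # Users who claimed a code but did not get through the consent screen
        "consent_dropoff": claimed - day["tokens_claimed"],
        # Share of issued codes that were claimed (None when nothing was issued)
        "claim_rate": claimed / issued if issued else None,
    }

# Made-up values for illustration only
example_day = {
    "codes_issued": 120,
    "codes_claimed": 90,
    "tokens_claimed": 75,
    "user_reports_issued": 20,
}
print(derived_metrics(example_day))
```

On the downloaded data, the same function could be applied to `convertedENCV['statistics'][0]['data']`.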

2) Let's extract a few metrics to create some graphs. In this notebook, we will focus on Codes Claimed, Tokens Claimed, and the Code Claim Distribution. We will create a dataframe with each record representing a single day.

In [None]:
import pandas as pd
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Initialize Lists for Data Storage
dates, codesClaimed, tokensClaimed = [], [], []
ageDist1Min, ageDist5Mins, ageDist15Mins, totalTime = [], [], [], []

for stat in convertedENCV['statistics']:
    data = stat['data']
    ageDist = data['code_claim_age_distribution']  # counts per bucket: 1m, 5m, 15m, ..., >24h

    # Keep only the YYYY-MM-DD part of the Zulu timestamp
    dates.append(datetime.strptime(stat['date'][0:10], "%Y-%m-%d"))
    codesClaimed.append(data['codes_claimed'])
    tokensClaimed.append(data['tokens_claimed'])

    # Cumulative counts of codes claimed within 1, 5, and 15 minutes
    ageDist1Min.append(ageDist[0])
    ageDist5Mins.append(ageDist[0] + ageDist[1])
    ageDist15Mins.append(ageDist[0] + ageDist[1] + ageDist[2])

    # Total number of claims across all 11 buckets
    totalTime.append(sum(ageDist[:11]))
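
As an alternative to filling lists by hand, pandas can flatten the nested records in one call. The sketch below uses `pd.json_normalize` on a small made-up payload shaped like the response used above (a `statistics` list whose items hold `date` and `data`). The values in `sample` are illustrative, not real data.

```python
import pandas as pd

# Made-up payload with the same shape as the parsed API response
sample = {
    "statistics": [
        {"date": "2022-03-05T00:00:00Z",
         "data": {"codes_claimed": 9, "tokens_claimed": 7,
                  "code_claim_age_distribution": [4, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0]}},
        {"date": "2022-03-04T00:00:00Z",
         "data": {"codes_claimed": 4, "tokens_claimed": 3,
                  "code_claim_age_distribution": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1]}},
    ]
}

# Nested keys become dotted column names such as 'data.codes_claimed';
# the distribution stays a list in a single column
flat = pd.json_normalize(sample["statistics"])
flat["date"] = pd.to_datetime(flat["date"])  # parses the Zulu timestamp as UTC
print(flat[["date", "data.codes_claimed", "data.tokens_claimed"]])
```

Replacing `sample` with `convertedENCV` would give one row per day with every metric as a column, which can be convenient when more than a few metrics are needed.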

3) Let's store our data in a dataframe for easy graphing and compute, for each day, the cumulative percentage of codes claimed within 1, 5, and 15 minutes of being issued (the first three buckets of the Code Claim Distribution).

In [None]:
df = pd.DataFrame(list(zip(dates, codesClaimed, tokensClaimed, ageDist1Min, ageDist5Mins, ageDist15Mins, totalTime)), 
                  columns = ['Date', 'CodesClaimed', 'TokensClaimed', 'age1Min', 'age5Mins', 'age15Mins', 'totalTime'])

df['1MinProp'] = df['age1Min'] / df['totalTime'] * 100
df['5MinsProp'] = df['age5Mins'] / df['totalTime'] * 100
df['15MinsProp'] = df['age15Mins'] / df['totalTime'] * 100
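
The same cumulative calculation can be extended to every bucket. This is a minimal sketch that assumes the 11 buckets arrive in the order listed in the data dictionary and that each label marks the upper bound of its bucket. The helper `cumulative_percentages` is hypothetical and the sample counts are made up.

```python
from itertools import accumulate

# Bucket labels in the order given in the data dictionary
BUCKET_LABELS = ["1m", "5m", "15m", "30m", "1h", "2h", "3h", "6h", "12h", "24h", ">24h"]

def cumulative_percentages(distribution):
    """Return the cumulative share (in percent) of claims up to each bucket."""
    total = sum(distribution)
    if total == 0:
        # No claims that day: report zeros rather than dividing by zero
        return {label: 0.0 for label in BUCKET_LABELS}
    running = accumulate(distribution)  # running totals: b0, b0+b1, b0+b1+b2, ...
    return {label: 100 * count / total for label, count in zip(BUCKET_LABELS, running)}

# Made-up counts for illustration only
sample_distribution = [5, 3, 2, 0, 0, 0, 0, 0, 0, 0, 10]
print(cumulative_percentages(sample_distribution))
```

The `1m`, `5m`, and `15m` entries correspond to the `1MinProp`, `5MinsProp`, and `15MinsProp` columns computed above.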

Here is what the dataframe looks like.

In [None]:
df

## Visualizing ENCV Data

This section will describe how to prepare and visualize the data downloaded from ENCV's REST API. Begin by following the steps below.

1) Consider how many days of data you want the graph to include. The maximum is 90 days. Change the `duration` variable to specify the duration of your graph.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# Be sure lastDate + duration does not exceed the number of days available (at most 90)
lastDate = x = 0 # how many days ago is the last day to graph (0 = today)
duration = 14 # how many days to graph, counting back from lastDate
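
The positional settings above depend on the order of the rows. As a sketch of an alternative, the window can be chosen by calendar date instead. The helper `select_window` is hypothetical, and `demo` is a made-up dataframe that mimics the `Date` and `CodesClaimed` columns of `df`.

```python
import pandas as pd

def select_window(frame, end_date, days):
    """Return the rows whose Date falls in the `days`-day window ending on end_date."""
    end = pd.Timestamp(end_date)
    start = end - pd.Timedelta(days=days - 1)
    mask = (frame["Date"] >= start) & (frame["Date"] <= end)
    # Sort chronologically so the result plots left to right regardless of input order
    return frame.loc[mask].sort_values("Date")

# Made-up data for illustration only
demo = pd.DataFrame({
    "Date": pd.date_range("2022-03-01", periods=10),
    "CodesClaimed": list(range(10)),
})
window = select_window(demo, "2022-03-10", 3)
print(window)
```

Calling `select_window(df, "2022-03-10", 14)` would return the 14 days ending on that date, and the result could be passed to the plotting calls in place of the `[x:n]` slices.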

2) Run the code cell below to construct the Codes Claimed line chart. The dataframe, `df`, is already organized, so it is easy to plot the data.

In [None]:
n = x + duration

fig, ax = plt.subplots()
ax.plot(df['Date'][x:n], df['CodesClaimed'][x:n], linewidth = 3)
fig.suptitle('Number of Codes Claimed', fontsize = '16')

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_ylim(0, max(df['CodesClaimed'][x:n])*1.05)

fig.set_size_inches(9.66, 6) # golden ratio 
ax.set_xlabel('Date', fontsize = '14')
ax.set_ylabel('Number of Codes Claimed', fontsize = '14');

In [None]:
# Save the above figure to your current directory
# Label the file with the earliest and latest dates actually plotted (rows x to n-1)
plottedDates = df['Date'][x:n]
startDate = plottedDates.min().strftime("%Y-%m-%d")
endDate = plottedDates.max().strftime("%Y-%m-%d")
fig.savefig(f'CodesClaimed_{startDate}_to_{endDate}.pdf', format='pdf', dpi=400)

3) Run the code cell below to construct the Tokens Claimed line chart. 

In [None]:
fig, ax = plt.subplots()
ax.plot(df['Date'][x:n], df['TokensClaimed'][x:n], linewidth = 3)
fig.suptitle('Number of Tokens Claimed', fontsize = '16')

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_ylim(0, max(df['TokensClaimed'][x:n])*1.05)

fig.set_size_inches(9.66, 6)
ax.set_xlabel('Date', fontsize = '14')
ax.set_ylabel('Number of Tokens Claimed', fontsize = '14');

In [None]:
# Save the above figure to your current directory
fig.savefig(f'TokensClaimed_{startDate}_to_{endDate}.pdf', format='pdf', dpi=400)

4) Run the code cell below to construct the Codes Claimed Age Distribution comparative bar charts.

In [None]:
fig,ax = plt.subplots(nrows=3, ncols=1)
fig.suptitle('Code Claim Age Distribution')
fig.tight_layout()

fig.set_figheight(9.66)
fig.set_figwidth(6)
ax[0].bar(df['Date'][x:n], df['1MinProp'][x:n], color = 'lightgrey')
ax[1].bar(df['Date'][x:n], df['5MinsProp'][x:n], color = 'lightgrey')
ax[2].bar(df['Date'][x:n], df['15MinsProp'][x:n], color = 'lightgrey')

ax[0].set_title('0-1 Minutes Codes Claimed')
ax[1].set_title('0-5 Minutes Codes Claimed')
ax[2].set_title('0-15 Minutes Codes Claimed')

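from matplotlib.ticker import PercentFormatter

for i in range(0, 3):
    ax[i].spines['right'].set_visible(False)
    ax[i].spines['top'].set_visible(False)
    ax[i].set_ylim(0, 100)
    # Format the y-axis ticks as whole-number percentages
    ax[i].yaxis.set_major_formatter(PercentFormatter(decimals=0))
    ax[i].set_xticks(df['Date'][x:n][::4])

# Re-apply the layout now that the figure is sized and the titles are set
fig.tight_layout();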

In [None]:
# Save the above figure to your current directory
fig.savefig(f'CodesClaimedDistribution_{startDate}_to_{endDate}.pdf', format='pdf', dpi=400)