# Roll Your Own Analysis

This notebook will demonstrate how to download raw data from the ENPA REST API, debias it, and display a basic visualization by performing the following actions:

1. <a href="#Downloading-Raw-ENPA-Data">Downloading raw ENPA data</a>, 
2. <a href="#Debiasing-Raw-Data">Debiasing raw data</a>, and 
3. <a href="#Visualizing-the-Debiased-Data">Visualizing the debiased data</a>.

Be sure to have your full API key ready before proceeding.

## Downloading Raw ENPA Data

This section describes how to use your full API key to download raw ENPA data for further analysis. Begin by following the steps below.

1) Navigate to the *API Docs* tab within the ENPA portal, and select *Authorize*.

![Step4](https://raw.githubusercontent.com/c19hcc/enpa-pha-jupyternotebooks/main/images/Step4.png)

2) Input your *Full Key* into the dialog box and select *Authorize*. Note that the full API key must be included as an `X-Api-Key` header in every request.

![Step5W](https://raw.githubusercontent.com/c19hcc/enpa-pha-jupyternotebooks/main/images/Step5W.png)

3) Import the Python `requests` library and construct the header that carries the API key.

In [None]:
import requests

myFullKey = ""  # Insert your full key here

header = {"X-Api-Key": myFullKey}
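
Pasting the key into the notebook works, but it also makes it easy to share the key by accident. As one possible alternative, the sketch below reads the key from an environment variable. The variable name `ENPA_FULL_KEY` and the helper `build_header` are assumptions chosen for illustration and are not part of the ENPA API.

```python
import os

def build_header(env_var="ENPA_FULL_KEY"):
    """Return the request header, reading the key from an environment variable."""
    # ENPA_FULL_KEY is a hypothetical variable name; use whatever name suits your setup
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your full API key.")
    return {"X-Api-Key": key}
```

With the variable set, `header = build_header()` can stand in for the two assignments above.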

4) Specify parameters and request the desired data.
* `datasets`: Choose one of the following datasets to download: `notification`, `notificationInteractions`, `riskParameters`, `codeVerified`, `keysUploaded`, `beaconCount`, `dateExposure`, `dateExposure14d`, `dateExposureV2`, `keysUploadedVaccineStatus`, `keysUploadedVaccineStatus14d`, `keysUploadedVaccineStatusV2`, `keysUploadedWithReportType`, `keysUploadedWithReportType14d`, `periodicExposureNotification`, `periodicExposureNotification14d`, `secondaryAttack`, `secondaryAttack14d`, or `userRisk`.
* `raw`: Specify `True` to return the raw, biased data or `False` to return the debiased data.
* `start_date`: Specify a start date to download data (must be in YYYY-MM-DD format); if omitted, the default is 90 days before either the end date (if specified) or the current date.
* `end_date`: Specify an end date to download data (must be in YYYY-MM-DD format); if omitted, the default is the current date. 
* `country`: Input an ISO 3166-1 code for the country (e.g., `US`).
* `state`: Input an ISO 3166-2 code for the subdivision, state, or province (e.g., `US-VA`).

In [None]:
parameters = {
    "datasets": "notification",
    "raw": True,
    "start_date": "2022-01-01",
    "end_date": "2022-01-31",
    "country": "US",
    "state": "US-VA"
}
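
Because the dates in step 4 must be in YYYY-MM-DD format, it can help to validate them before sending the request. The sketch below is one way to assemble the parameters. The helper `build_parameters` is hypothetical. It leaves out any date that is not supplied so that the API defaults described above apply.

```python
from datetime import datetime

def build_parameters(dataset, start_date=None, end_date=None, raw=True, **location):
    """Assemble query parameters, validating dates and omitting unset ones."""
    params = {"datasets": dataset, "raw": raw}
    parsed = {}
    for name, value in (("start_date", start_date), ("end_date", end_date)):
        if value is not None:
            # strptime raises ValueError when the string is not a valid YYYY-MM-DD date
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
            params[name] = parsed[name].isoformat()
    if len(parsed) == 2 and parsed["start_date"] > parsed["end_date"]:
        raise ValueError("start_date must not be after end_date")
    params.update(location)  # e.g. country="US", state="US-VA"
    return params

print(build_parameters("notification", "2022-01-01", "2022-01-31",
                       country="US", state="US-VA"))
```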

In [None]:
notificationRequest = requests.get("https://api.enpa-pha.io/api/public/v2/enpa-data",
                      headers = header, params = parameters)
print("Reason: ", notificationRequest.reason, "\nStatus Code: ", notificationRequest.status_code)

A *Status Code* of 200 indicates that the data was successfully retrieved!

5) In addition to requesting the `notification` dataset, we also need to request the `userRisk` dataset to help correct some of the Apple iOS values in the raw data. Use the same parameters as you did in step 4, but change the `datasets` parameter to `userRisk`.

In [None]:
parameters = {
    "datasets": "userRisk",
    "raw": True,
    "start_date": "2022-01-01",
    "end_date": "2022-01-31",
    "country": "US",
    "state": "US-VA"
}

In [None]:
userRiskRequest = requests.get("https://api.enpa-pha.io/api/public/v2/enpa-data",
                      headers = header, params = parameters)
print("Reason: ", userRiskRequest.reason, "\nStatus Code: ", userRiskRequest.status_code)

A *Status Code* of 200 indicates that the data was successfully retrieved!

## Debiasing Raw Data

This section will describe how to prepare and debias the raw data downloaded from ENPA's REST API. Begin by following the steps below.

1) Convert the downloaded data from string to dictionary form using the `json` library, so it can be easily manipulated.

In [None]:
import json
convertedNotifications = json.loads(notificationRequest.text)
convertedUserRisk = json.loads(userRiskRequest.text)
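
If a request fails, the body may not contain the data the later cells expect. The sketch below shows one way to fail early. The helper `parse_enpa_response` is hypothetical, and the only assumption about the payload is the top-level `rawData` key that the rest of this notebook relies on.

```python
import json

def parse_enpa_response(status_code, text):
    """Parse a response body, failing early on errors or an unexpected shape."""
    if status_code != 200:
        raise RuntimeError(f"Request failed with status code {status_code}")
    payload = json.loads(text)
    if "rawData" not in payload:
        raise KeyError("Expected a top-level 'rawData' key in the response")
    return payload

# Stand-in body for illustration only (not real ENPA data)
example = parse_enpa_response(200, '{"rawData": []}')
print(len(example["rawData"]))
```

In this notebook it could be called as `parse_enpa_response(notificationRequest.status_code, notificationRequest.text)`.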

The next two cells show what the data looks like. Our goal is to visualize the total number of notifications over time as a stacked bar chart that distinguishes iOS and Android notifications.

In [None]:
convertedNotifications

In [None]:
convertedUserRisk

2) To debias the data, we need three values from the dictionary structure: `totalCount`, the total number of clients (`total_individual_clients`); `sumPart`, the sum value (the sum of an individual element of the `sum` array(s)); and `epsilon`, the epsilon from the raw value. The debias function is shown below.

In [None]:
import math

def getMostLikelyPopulationCount(totalCount, sumPart, epsilon):
    '''
    Debiases raw sum values (`sumPart`).
    
    :param totalCount: The total number of clients (`total_individual_clients`)
    :param sumPart: The sum value (sum of an individual element of the `sum` array(s))
    :param epsilon: Epsilon from the raw value
    :return: 
        - `mostLikelyPopulation` (float): The debiased (expected) value
        - `standardDeviation` (float): Standard deviation
    '''
    p = 1 / (1 + math.exp(epsilon))
    sqrtPTimesOneMinusP = math.exp(epsilon / 2) / (1 + math.exp(epsilon))
    mostLikelyPopulation = (sumPart - totalCount * p) / (1 - 2 * p)
    standardDeviation = math.sqrt(totalCount) * sqrtPTimesOneMinusP
    return mostLikelyPopulation, standardDeviation
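
The formula can be sanity-checked with a round trip. The sketch below assumes that each client's bit is flipped with probability p = 1 / (1 + e^epsilon) before the values are summed. The notebook does not state this, but it is consistent with the formula. Under that assumption, the expected raw sum for a known true count should debias back to the true count. The numbers are made up, and `debias` repeats the arithmetic of `getMostLikelyPopulationCount` so that the block runs on its own.

```python
import math

def debias(totalCount, sumPart, epsilon):
    # Same arithmetic as getMostLikelyPopulationCount
    p = 1 / (1 + math.exp(epsilon))
    mostLikely = (sumPart - totalCount * p) / (1 - 2 * p)
    stdev = math.sqrt(totalCount) * math.exp(epsilon / 2) / (1 + math.exp(epsilon))
    return mostLikely, stdev

def expected_raw_sum(totalCount, trueCount, epsilon):
    # Assumed noise model: a true 1 stays 1 with probability 1 - p,
    # and a true 0 is reported as 1 with probability p
    p = 1 / (1 + math.exp(epsilon))
    return trueCount * (1 - p) + (totalCount - trueCount) * p

total_clients, true_count, eps = 10000, 200, 8  # made-up values
raw_sum = expected_raw_sum(total_clients, true_count, eps)
estimate, stdev = debias(total_clients, raw_sum, eps)
print(estimate, stdev)
```

The returned standard deviation equals sqrt(totalCount * p * (1 - p)), because sqrt(p * (1 - p)) simplifies to e^(epsilon / 2) / (1 + e^epsilon).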

3) Let's debias the notification data by leveraging the debias function. We will create a dataframe aggregated by date with the debiased Apple and Google Notification data.

In [None]:
from datetime import datetime
import pandas as pd

# Initialize Arrays for Data Storage
datesApple, totalCountApple, sumPartApple, epsilonApple, notificationsApple = [], [], [], [], []
datesGoogle, totalCountGoogle, sumPartGoogle, epsilonGoogle, notificationsGoogle = [], [], [], [], []

for record in convertedNotifications['rawData']:
    # Collect the inputs for debiasing Apple/iOS data
    if record['aggregation_id'] == "com.apple.EN.UserNotification":
        datesApple.append(datetime.strptime(record['aggregation_start_time'][0:10], "%Y-%m-%d"))
        totalCountApple.append(record['total_individual_clients'])  # totalCount parameter for debiasing
        sumPartApple.append(record['sum'][1])  # sumPart parameter for debiasing
        epsilonApple.append(record['epsilon'])  # epsilon parameter for debiasing

    # Collect the inputs for debiasing Google/Android data
    elif record['aggregation_id'] == "PeriodicExposureNotification":
        datesGoogle.append(datetime.strptime(record['aggregation_start_time'][0:10], "%Y-%m-%d"))
        totalCountGoogle.append(record['total_individual_clients'])  # totalCount parameter for debiasing
        sumPartGoogle.append(record['sum'][1])  # sumPart parameter for debiasing
        epsilonGoogle.append(record['epsilon'])  # epsilon parameter for debiasing
        
# Aggregate Notification data by Date
RawAppleData = pd.DataFrame(list(zip(datesApple, totalCountApple, sumPartApple, epsilonApple)), columns = ['Date', 'Total', 'SumPart', 'Epsilon'])        
RawAppleGrouping = RawAppleData.groupby(['Date'])[['Total','SumPart']].sum()    
RawGoogleData = pd.DataFrame(list(zip(datesGoogle, totalCountGoogle, sumPartGoogle, epsilonGoogle)), columns = ['Date', 'Total', 'SumPart', 'Epsilon'])        
RawGoogleGrouping = RawGoogleData.groupby(['Date'])[['Total','SumPart']].sum()    

# Debias the Apple Notification Data (epsilon is fixed at 8 here)
# .iloc selects by position; plain [i] on a date-indexed Series is deprecated in recent pandas
for i in range(RawAppleGrouping.shape[0]):
    (mostLikelyNotifications, standardDeviation) = getMostLikelyPopulationCount(RawAppleGrouping.Total.iloc[i], RawAppleGrouping.SumPart.iloc[i], 8)
    notificationsApple.append(mostLikelyNotifications)

# Debias the Google Notification Data (epsilon is fixed at 8 here)
for i in range(RawGoogleGrouping.shape[0]):
    (mostLikelyNotifications, standardDeviation) = getMostLikelyPopulationCount(RawGoogleGrouping.Total.iloc[i], RawGoogleGrouping.SumPart.iloc[i], 8)
    notificationsGoogle.append(mostLikelyNotifications)

# Clean Up Data
RawAppleGrouping['DebiasedAppleNotifications'] = notificationsApple
RawGoogleGrouping['DebiasedGoogleNotifications'] = notificationsGoogle
DebiasedData = pd.DataFrame(list(zip(RawAppleGrouping.index, RawAppleGrouping.DebiasedAppleNotifications, RawGoogleGrouping.DebiasedGoogleNotifications)), columns = ['Date', 'Apple Notifications', 'Google Notifications'])

Here is what the debiased data looks like:

In [None]:
DebiasedData
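
The debiasing loops above pass a fixed epsilon of 8, although every record carries its own `epsilon`. Debiasing a sum of several records with a single epsilon only makes sense when those records share that value, so a quick check may be worthwhile. The sketch below uses a hypothetical helper, `single_epsilon`, with made-up stand-in records shaped like the entries of `convertedNotifications['rawData']`.

```python
def single_epsilon(records, aggregation_id):
    """Return the one epsilon shared by all records of an aggregation, or raise."""
    values = {r["epsilon"] for r in records if r["aggregation_id"] == aggregation_id}
    if len(values) != 1:
        raise ValueError(f"Expected one epsilon for {aggregation_id}, found {sorted(values)}")
    return values.pop()

# Stand-in records with made-up values
records = [
    {"aggregation_id": "com.apple.EN.UserNotification", "epsilon": 8},
    {"aggregation_id": "com.apple.EN.UserNotification", "epsilon": 8},
    {"aggregation_id": "PeriodicExposureNotification", "epsilon": 8},
]
print(single_epsilon(records, "com.apple.EN.UserNotification"))
```

The value it returns could then replace the literal `8` in the calls to `getMostLikelyPopulationCount`.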

4) Before visualizing the data, one more preparation step is needed: the Apple notification data must be adjusted further to be fully debiased. It is now time to debias and scale the `userRisk` dataset that was requested in Step 5 of the <a href="#Downloading-Raw-ENPA-Data">Downloading Raw ENPA Data</a> section. We start by defining the `debiasAndScale` function, which will perform the heavy lifting for us.

In [None]:
import numpy as np

def debiasAndScale(userRiskData, startPlace, endPlace):
    '''
    Debiases and determines the adjustment ratio from the userRisk dataset.
    This function is run once for each day in the dataset.
    
    :param userRiskData: The userRisk dataset of interest.
    :param startPlace: The index of the first occurrence of a particular day.
    :param endPlace: The index just past the last occurrence of that day (exclusive).
    :return: 
        - `ratio` (float): The adjustment ratio for Apple Notification counts
        - `date` (datetime): The adjustment ratio's corresponding date for bookkeeping purposes
    '''
    newSumPart = np.zeros(len(userRiskData['rawData'][0]['sum']))
    totalClients = []
    debiasedUserRisk = []

    for i in range(startPlace, endPlace):
        totalClients.append(userRiskData['rawData'][i]['total_individual_clients'])
        for j in range(len(userRiskData['rawData'][0]['sum'])):
            newSumPart[j] += userRiskData['rawData'][i]['sum'][j]
        epsilon = userRiskData['rawData'][i]['epsilon']

    # Debias UserRisk
    for i in range(len(userRiskData['rawData'][0]['sum'])):
        sumPart = newSumPart[i]
        (mostLikelyRisk, stdev) = getMostLikelyPopulationCount(sum(totalClients), sumPart, epsilon)
        debiasedUserRisk.append(mostLikelyRisk)

    # Scale UserRisk
    ratio = sum(totalClients)/sum(debiasedUserRisk)
    date = datetime.strptime(userRiskData['rawData'][startPlace]['aggregation_start_time'][0:10], "%Y-%m-%d")
    return (ratio, date)

5) Now run the code to prepare the `userRisk` data and call the `debiasAndScale` function.

In [None]:
import numpy as np 

datesUserRisk = []
ratios = []
consolidatedDates = []
flag = 0

records = convertedUserRisk['rawData']

for i in range(len(records)):
    date = datetime.strptime(records[i]['aggregation_start_time'][0:10], "%Y-%m-%d")

    # A new date closes the previous day's group, which spans records flag..i-1
    if i > 0 and date not in datesUserRisk:
        ratio, returnedDate = debiasAndScale(convertedUserRisk, flag, i)
        ratios.append(ratio)
        consolidatedDates.append(returnedDate)
        flag = i
    datesUserRisk.append(date)

# Close the final day's group so that its last record is included
if records:
    ratio, returnedDate = debiasAndScale(convertedUserRisk, flag, len(records))
    ratios.append(ratio)
    consolidatedDates.append(returnedDate)
        
UserRiskData = pd.DataFrame(list(zip(consolidatedDates, ratios)), columns = ['Date', 'UserRisk Ratio'])

Here is what the `userRisk` ratio data looks like:

In [None]:
UserRiskData
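
The per-day ratio can also be computed by grouping records on their date string, which avoids tracking start and end indices. The sketch below is an alternative formulation, not the notebook's own method. The helper names `debias_value` and `user_risk_ratios` are hypothetical, and it assumes the record fields used above (`aggregation_start_time`, `total_individual_clients`, `sum`, `epsilon`).

```python
import math
from collections import defaultdict

def debias_value(totalCount, sumPart, epsilon):
    # Expected-value part of getMostLikelyPopulationCount
    p = 1 / (1 + math.exp(epsilon))
    return (sumPart - totalCount * p) / (1 - 2 * p)

def user_risk_ratios(raw_records):
    """Map each day (YYYY-MM-DD) to total clients / sum of debiased `sum` elements."""
    groups = defaultdict(list)
    for record in raw_records:
        groups[record["aggregation_start_time"][:10]].append(record)

    ratios = {}
    for day, day_records in sorted(groups.items()):
        clients = sum(r["total_individual_clients"] for r in day_records)
        # Element-wise sum of the per-record `sum` arrays
        element_sums = [sum(parts) for parts in zip(*(r["sum"] for r in day_records))]
        # Like debiasAndScale, use the epsilon of the day's last record
        epsilon = day_records[-1]["epsilon"]
        debiased = [debias_value(clients, s, epsilon) for s in element_sums]
        ratios[day] = clients / sum(debiased)
    return ratios
```

Calling `user_risk_ratios(convertedUserRisk['rawData'])` would return a dictionary keyed by date string, which can be turned into a dataframe if needed.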

6) All that is left is to combine the datasets appropriately. Recall that the `userRisk` adjustment ratio should be applied only to the Apple data, not the Google data. Multiply the Apple notification counts by the adjustment ratio to obtain the corrected data.

In [None]:
MergedData = DebiasedData.merge(UserRiskData, on = 'Date', how = 'left')
MergedData['Corrected Apple Notifications'] = MergedData['Apple Notifications'] * MergedData['UserRisk Ratio']
MergedData['Total Notifications'] = MergedData['Corrected Apple Notifications'] + MergedData['Google Notifications']
MergedData
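
Because the merge uses `how = 'left'`, any date that is missing from `UserRiskData` gets a `NaN` ratio, and that `NaN` carries through to the corrected and total columns. The sketch below uses made-up values standing in for the two dataframes and shows one way to list such dates with the `indicator` option of `merge`.

```python
import pandas as pd

# Made-up values standing in for DebiasedData and UserRiskData
debiased = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
    "Apple Notifications": [100.0, 120.0, 90.0],
})
user_risk = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-02"]),
    "UserRisk Ratio": [1.5, 1.4],
})

# indicator=True adds a `_merge` column describing where each row came from
merged = debiased.merge(user_risk, on="Date", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Date"]
print(unmatched.dt.strftime("%Y-%m-%d").tolist())
```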

## Visualizing the Debiased Data

This section describes how to visualize the debiased data prepared in the previous section. Begin by following the steps below.

1) Run the code below to construct the bar chart. The dataframe, `MergedData`, is already organized, so it is easy to plot the total notifications over time. We can use a stacked bar to show the number of notifications from Apple/iOS devices or Google/Android devices.

2) Next, save the plot to your current directory. The filename will include the date range of the plot (e.g., `TotalNotificationsOverTime_2022-01-01_to_2022-01-31.png`).

In [None]:
from matplotlib import pyplot as plt

# Plotting a Stacked Bar
plt.figure(figsize=(15, 8))
plt.rcParams.update({'font.size': 14})
barWidth = 0.6
plt.bar(MergedData['Date'], MergedData['Corrected Apple Notifications'], barWidth, label = 'iOS')
plt.bar(MergedData['Date'], MergedData['Google Notifications'], barWidth, label = 'Android', bottom = MergedData['Corrected Apple Notifications'])
plt.legend()
plt.xticks(rotation=45)
plt.xlabel("Date")
plt.ylabel("Number of Notifications")
plt.title("Total Notifications Over Time")

# Use positional access for the first and last dates in the plot
startDate = MergedData['Date'].iloc[0].strftime("%Y-%m-%d")
endDate = MergedData['Date'].iloc[-1].strftime("%Y-%m-%d")

plt.savefig(f'TotalNotificationsOverTime_{startDate}_to_{endDate}.png', format='png', dpi=400)
plt.show();