# Covid19 Intervention Scoring using SageMaker
### Primary objectives:
1. Score / weigh effectiveness of each intervention for various countries using a weighted combination of scoring methods
2. Assign a daily aggregated intervention score for each country using the calculated intervention weights - these scores will be used for case count projection
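
The two objectives can be illustrated with a minimal sketch. This is not the code in `intervention_effectiveness_scorer`; the function names, the example scores and the intervention names below are hypothetical, and only the three scoring method names and their default weights are taken from this notebook.

```python
def combine_scores(method_scores, method_weights):
    """Weighted average of per-method scores; weights are normalised by their sum."""
    total = sum(method_weights[m] for m in method_scores)
    if total <= 0:
        raise ValueError("at least one scoring method needs a positive weight")
    return sum(method_scores[m] * method_weights[m] for m in method_scores) / total


def daily_aggregate(intervention_levels, intervention_weights):
    """Sum of each intervention's level on a given day times its weight."""
    return sum(intervention_weights.get(name, 0.0) * level
               for name, level in intervention_levels.items())


# Scoring method names and default weights as used later in this notebook
method_weights = {'fit_stringency_index': 0.5, 'fit_conf_cases': 0.25, 'fit_intv_effect': 0.25}
# Hypothetical scores of a single intervention under each method
method_scores = {'fit_stringency_index': 0.8, 'fit_conf_cases': 0.4, 'fit_intv_effect': 0.2}
combined = combine_scores(method_scores, method_weights)

# Hypothetical intervention weights and one day's intervention levels for one country
weights = {'school_closing': 0.3, 'workplace_closing': 0.2}
levels = {'school_closing': 2, 'workplace_closing': 1}
day_score = daily_aggregate(levels, weights)
```

Normalising by the sum of the weights keeps the combined score on the same scale as the individual scores even when the supplied weights do not add up to 1.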

We appreciate that users might not have the CPU or memory required to run the ML operations locally. This notebook is therefore provided in addition to the standalone notebook (interventions_scorer.ipynb), so that users can offload the compute- and memory-heavy operations to Amazon SageMaker, a cloud-based ML platform from AWS.

We'll use SageMaker Processing to run a processing script in a SageMaker-managed container created from a user-provided Docker image. We start by building the Docker image with all the required Python libraries, our custom Python modules and helper scripts. Once the image is built, we push it to Amazon Elastic Container Registry (ECR) so that SageMaker can locate the image and launch the container from it.

## Dockerize the core simulation modules and push the image to ECR

In [1]:
%%sh

# The name of our algorithm
algorithm_name='covid19-simulation'


account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

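# Log Docker in to ECR ('aws ecr get-login' is deprecated and not available in AWS CLI v2)
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"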

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  39.92MB
Step 1/20 : FROM ubuntu:16.04
 ---> c522ac0d6194
Step 2/20 : RUN apt-get update
 ---> Using cache
 ---> 6436c1466959
Step 3/20 : RUN apt-get -y install software-properties-common python-software-properties
 ---> Using cache
 ---> fb5b4c93ac0b
Step 4/20 : RUN add-apt-repository ppa:deadsnakes/ppa
 ---> Using cache
 ---> b4711930d7f7
Step 5/20 : RUN apt-get update
 ---> Using cache
 ---> 2f4cc0a97c70
Step 6/20 : RUN apt-get install --fix-missing -y wget curl unzip python3.6
 ---> Using cache
 ---> 61feafc1add3
Step 7/20 : RUN wget https://bootstrap.pypa.io/get-pip.py
 ---> Using cache
 ---> 87c9b1a07282
Step 8/20 : RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6
 ---> Using cache
 ---> 20be20bcb620
Step 9/20 : RUN apt purge -y python2.7-minimal
 ---> Using cache
 ---> fcc83b30cf96
Step 10/20 : RUN ln -s /usr/bin/python3.6 /usr/bin/python
 ---> Using cache
 ---> 630188b04f5c
Step 11/20 : RUN apt-get install libgom



## Create a ScriptProcessor object 

In [2]:
import sagemaker
from sagemaker import get_execution_role
from time import gmtime, strftime
from sagemaker.processing import ScriptProcessor, ProcessingInput
import boto3
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.session.Session().region_name
sagemaker_session = sagemaker.Session()
role = 'covid19_sagemaker_exec' #get_execution_role()
ecr_repository = 'covid19-simulation'
tag = ':latest'
uri_suffix = 'amazonaws.com'
if region in ['cn-north-1', 'cn-northwest-1']:
    uri_suffix = 'amazonaws.com.cn'
covid_repository_uri = '{}.dkr.ecr.{}.{}/{}'.format(account_id, region, uri_suffix, ecr_repository + tag)

In [3]:
covid_processor = ScriptProcessor(base_job_name='covid19-simulation',
                                  image_uri=covid_repository_uri,
                                  command=['python'],
                                  role=role,
                                  instance_count=1,
                                  instance_type='ml.r5.xlarge',
                                  max_runtime_in_seconds=1200,
                                  env={'mode': 'python'})

## Derive the Effectiveness Score for different interventions

We used the publicly available data from https://oxcgrtportal.azurewebsites.net/api/CSVDownload for our experiments. Feel free to use any other, more granular data that follows a similar structure.

### Source the data appropriately and upload it to S3 bucket

In our case, we download the latest intervention data as <i>OxCGRT_Download_Full.csv</i> from the URL above into the <i>../data</i> folder before running the subsequent code.

In [4]:
import sys
import os
import urllib.request  # a bare `import urllib` does not guarantee that urllib.request is loaded
sys.path.insert(1, 'src')
import config

# Set this flag to True to download the latest COVID-19 intervention data from its web source.
# Set it to False for subsequent runs on the same day.
LOAD_LATEST_DATA = True

if LOAD_LATEST_DATA:
    url = config.oxcgrt_intervention_data_online
    local_file = os.path.join(config.base_data_dir, config.oxcgrt_intervention_data_offline)
    with urllib.request.urlopen(url) as response, open(local_file, 'wb') as out_file:
        data = response.read()  # a `bytes` object
        out_file.write(data)
    print('Downloaded latest data from: {}'.format(url))

Downloaded latest data from: https://oxcgrtportal.azurewebsites.net/api/CSVDownload
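
The comment on `LOAD_LATEST_DATA` asks for the flag to be switched off manually on repeat runs within the same day. As a rough sketch of how that decision could be automated, the hypothetical helper below (not part of the project's modules) checks whether the local copy was already written today, based on the file's modification time.

```python
import os
from datetime import date, datetime


def downloaded_today(path):
    """Return True if `path` exists and was last modified on today's local date."""
    if not os.path.isfile(path):
        return False
    modified = datetime.fromtimestamp(os.path.getmtime(path)).date()
    return modified == date.today()
```

With this in place, the flag could be set as `LOAD_LATEST_DATA = not downloaded_today(local_file)`, assuming `local_file` is computed before the flag is evaluated.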


In [5]:
import boto3
s3 = boto3.resource('s3')
def copy_to_s3(local_file, s3_path, override=False):
    assert s3_path.startswith('s3://')
    split = s3_path.split('/')
    bucket = split[2]
    path = '/'.join(split[3:])
    buk = s3.Bucket(bucket)
    
    if len(list(buk.objects.filter(Prefix=path))) > 0:
        if not override:
            print('File already exists.\nSet override to upload anyway.\n')
            return
        else:
            print('Overwriting existing file')
    with open(local_file, 'rb') as data:
        print('Uploading file to {}'.format(s3_path))
        buk.put_object(Key=path, Body=data)
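
`copy_to_s3` splits the S3 path by hand on `/`. An alternative sketch using the standard library's `urlparse` is shown here; `split_s3_uri` is a hypothetical helper name and is not used elsewhere in this notebook.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('expected an s3://bucket/key URI, got {!r}'.format(s3_uri))
    # The parsed path starts with '/', which is not part of the object key
    return parsed.netloc, parsed.path.lstrip('/')


bucket, key = split_s3_uri('s3://covid19-sim-dummy/covid19/OxCGRT_Download_Full.csv')
```

Unlike the `assert` in `copy_to_s3`, the explicit `ValueError` is still raised when Python runs with assertions disabled (`-O`).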


In [6]:
#set your S3 bucket name here
bucket_name = 'covid19-sim-dummy'
input_prefix = 'covid19'
input_file_name = 'OxCGRT_Download_Full.csv'
local_path = 'data/input/{}'.format(input_file_name)
s3_data_path = 's3://{}/{}/{}'.format(bucket_name, input_prefix, input_file_name)

s3_output_path = 's3://{}/{}/{}'.format(bucket_name, input_prefix, 'intervention_impact')
buk = s3.Bucket(bucket_name)
# S3 keys are relative to the bucket, so the prefix must not include the bucket name
output_prefix = '{}/{}/'.format(input_prefix, 'intervention_impact')
if len(list(buk.objects.filter(Prefix=output_prefix))) == 0:
    s3_serv = boto3.client('s3')
    s3_serv.put_object(Bucket=bucket_name, Key=output_prefix)
    print('Created directory: {}/{}'.format(bucket_name, output_prefix))
    
print(s3_data_path)

copy_to_s3(local_path, s3_data_path, override=True)

s3://covid19-sim-dummy/covid19/OxCGRT_Download_Full.csv
Overwriting existing file
Uploading file to s3://covid19-sim-dummy/covid19/OxCGRT_Download_Full.csv


### Create the Intervention Scoring script

This script is the entry point to the intervention score computation process. 

In [7]:
%%writefile intervention_scorer.py
import sys
sys.path.insert(1, '/opt/program')
import config
config.sagemaker_run = True
config.base_data_dir = config.base_data_dir_sagemaker
config.base_output_dir = config.base_output_dir_sagemaker 

import intervention_effectiveness_scorer as intv_scorer

import pandas as pd
import numpy as np
import pickle
import time
import os
from datetime import datetime, timedelta

    
if __name__=='__main__':
    
    #parser = argparse.ArgumentParser()
    #args, _ = parser.parse_known_args()
    # Convert command line args into a map of args
    args_iter = iter(sys.argv[1:])
    args = dict(zip(args_iter, args_iter))
    
    #Data source for the whole analysis
    intv_scorer.data_src = args['data_src']
    #Select a country only if it has exceeded the conf_cases_threshold
    intv_scorer.conf_cases_threshold = int(args['conf_cases_threshold'])
    #Select records having confirmed cases >= min_case_threshold
    intv_scorer.min_case_threshold = int(args['min_case_threshold'])
    #window for rolling averages of conf case counts
    intv_scorer.smoothing_window_len = int(args['smoothing_window_len'])
    #number of lags to use for time-series style modeling of conf cases
    intv_scorer.num_lags = int(args['num_lags'])
    #Skip a few recent days' data for potential missing values
    intv_scorer.recent_days_to_skip = int(args['recent_days_to_skip'])
    #median incubation period for Covid19
    intv_scorer.incubation_period = int(args['incubation_period'])
    
    #Export location of intervention scores
    analysis_outcome_export_loc = args['analysis_outcome_export_loc']
    #Export location of weighted & aggregated intervention scores
    aggregated_intervention_scores_export_loc = args['aggregated_intervention_scores_export_loc']
    
    fit_stringency_index = 0.5
    fit_conf_cases = 0.25
    fit_intv_effect = 0.25
    if 'fit_stringency_index' in args:
        fit_stringency_index = float(args['fit_stringency_index'])
    if 'fit_conf_cases' in args:
        fit_conf_cases = float(args['fit_conf_cases'])
    if 'fit_intv_effect' in args:
        fit_intv_effect = float(args['fit_intv_effect'])
    
    intv_scorer.intervention_scoring_methods = {'fit_stringency_index':fit_stringency_index, 
                                    'fit_conf_cases':fit_conf_cases, 
                                    'fit_intv_effect':fit_intv_effect}
      
    if 'selected_countries' in args:
        selected_countries = args['selected_countries']
    
    # Calculating relative weights/importance of different interventions
    data_all, selected_countries, all_country_intv_scores = intv_scorer.score_interventions (selected_countries=None)
    all_country_intv_scores.to_csv(analysis_outcome_export_loc, index=False)
    
    interventions = all_country_intv_scores['intervention'].unique().tolist()
    relevant_cols = ['CountryName', 'CountryCode', 'ConfirmedCases', 'ConfirmedDeaths', 'StringencyIndex'] + interventions
    data_filtered = data_all.loc[data_all['CountryCode'].isin(selected_countries), relevant_cols].copy()
    data_filtered.reset_index(inplace=True)
    data_filtered.fillna(0, inplace=True)
    # Assign an aggregated intervention score for each country, each day
    data_filtered = intv_scorer.assign_weighted_aggregations (data_filtered, all_country_intv_scores, selected_countries)
    data_filtered.to_csv(aggregated_intervention_scores_export_loc)
    

Writing intervention_scorer.py
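
The script reads its command line as alternating names and values via `dict(zip(args_iter, args_iter))`. The sketch below isolates that idiom in a hypothetical helper, `pair_arguments`, and adds a length check, because `zip` over a single shared iterator silently drops a trailing name that has no value.

```python
def pair_arguments(argv):
    """Turn ['name1', 'value1', 'name2', 'value2'] into {'name1': 'value1', 'name2': 'value2'}."""
    if len(argv) % 2 != 0:
        raise ValueError('expected name/value pairs, got {} items'.format(len(argv)))
    # Both zip inputs are the same iterator, so each step consumes a name and then its value
    items = iter(argv)
    return dict(zip(items, items))


args = pair_arguments(['num_lags', '1', 'incubation_period', '5'])
```

All values arrive as strings, which is why the script converts them with `int(...)` or `float(...)` before use.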


### Launch the Intervention scoring on SageMaker 

In [8]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
import shutil

data_src = '/opt/ml/processing/input/OxCGRT_Download_Full.csv'
selected_countries = ''
#Select a country only if it has exceeded the conf_cases_threshold
conf_cases_threshold = 10000
#Select records having confirmed cases >= min_case_threshold
min_case_threshold = 0
#window for rolling averages of conf case counts
smoothing_window_len = 3
#number of lags to use for time-series style modeling of conf cases
num_lags = 1
#Skip a few recent days' data for potential missing values
recent_days_to_skip = 5 
#median incubation period for Covid19
incubation_period = 5

fit_stringency_index = 0.5
fit_conf_cases = 0.5
fit_intv_effect = 0.0

#Export location of intervention scores
analysis_outcome_export_loc = '/opt/ml/processing/out/countries_intervention_impacts.csv'
#Export location of weighted & aggregated intervention scores
aggregated_intervention_scores_export_loc = '/opt/ml/processing/out/countries_aggr_intervention_scores.csv'


covid_processor.run(code='intervention_scorer.py',
                      inputs=[ProcessingInput(
                        source=s3_data_path,
                        input_name='OxCGRT_Download_Full.csv',
                        destination='/opt/ml/processing/input')], 
                      outputs=[ProcessingOutput(output_name='simulation_output',
                                                source='/opt/ml/processing/out',
                                                destination=s3_output_path)],
                      arguments=['data_src', data_src, \
                                 'conf_cases_threshold', str(conf_cases_threshold), 'min_case_threshold', str(min_case_threshold), \
                                 'smoothing_window_len', str(smoothing_window_len), 'num_lags', str(num_lags), \
                                 'recent_days_to_skip', str(recent_days_to_skip), 'incubation_period', str(incubation_period), \
                                 'analysis_outcome_export_loc', analysis_outcome_export_loc, \
                                 'aggregated_intervention_scores_export_loc', aggregated_intervention_scores_export_loc, \
                                 'fit_stringency_index', str(fit_stringency_index), 'fit_conf_cases', str(fit_conf_cases), \
                                 'fit_intv_effect', str(fit_intv_effect)], 
                     logs=True)

preprocessing_job_description = covid_processor.jobs[-1].describe()

preprocessing_job_description



Parameter 'session' will be renamed to 'sagemaker_session' in SageMaker Python SDK v2.



Job Name:  covid19-simulation-2020-08-24-13-05-20-649
Inputs:  [{'InputName': 'OxCGRT_Download_Full.csv', 'S3Input': {'S3Uri': 's3://covid19-sim-dummy/covid19/OxCGRT_Download_Full.csv', 'LocalPath': '/opt/ml/processing/input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-161150306912/covid19-simulation-2020-08-24-13-05-20-649/input/code/intervention_scorer.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'simulation_output', 'S3Output': {'S3Uri': 's3://covid19-sim-dummy/covid19/intervention_impact', 'LocalPath': '/opt/ml/processing/out', 'S3UploadMode': 'EndOfJob'}}]
.........................Countries with more than 10000 confirmed cases: 87
* * * * * * * * * * Scoring inter

* * * * * * * * * * Scoring interventions for country: Saudi Arabia [SAU]
Data dimension for counry SAU is (232, 17)
* * * * * * * * * * Scoring interventions for country: Sudan [SDN]
Data dimension for counry SDN is (232, 17)
* * * * * * * * * * Scoring interventions for country: Senegal [SEN]
Data dimension for counry SEN is (232, 17)
* * * * * * * * * * Scoring interventions for country: Singapore [SGP]
Data dimension for counry SGP is (209, 17)
* * * * * * * * * * Scoring interventions for country: El Salvador [SLV]
Data dimension for counry SLV is (232, 17)
* * * * * * * * * * Scoring interventions for country: Serbia [SRB]
Data dimension for counry SRB is (232, 17)
* * * * * * * * * * Scoring interventions for country: Sweden [SWE]
Data dimension for counry SWE is (201, 17)
* * * * * * * * * * Scoring interventions for country: Turkey [TUR]
D

{'ProcessingInputs': [{'InputName': 'OxCGRT_Download_Full.csv',
   'S3Input': {'S3Uri': 's3://covid19-sim-dummy/covid19/OxCGRT_Download_Full.csv',
    'LocalPath': '/opt/ml/processing/input',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}},
  {'InputName': 'code',
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-161150306912/covid19-simulation-2020-08-24-13-05-20-649/input/code/intervention_scorer.py',
    'LocalPath': '/opt/ml/processing/input/code',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}}],
 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'simulation_output',
    'S3Output': {'S3Uri': 's3://covid19-sim-dummy/covid19/intervention_impact',
     'LocalPath': '/opt/ml/processing/out',
     'S3UploadMode': 'EndOfJob'}}]},
 'ProcessingJobName': 'covid19-simulation-2020-08-24-13-05-20-649',
 'Pr
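
The `arguments` list passed to `covid_processor.run` is written out by hand as alternating names and stringified values. As a sketch of a less error-prone way to build the same shape of list, the hypothetical helper below flattens a dictionary of parameters; it is an illustration only and does not change how the job above was launched.

```python
def to_processing_arguments(params):
    """Flatten {'name': value, ...} into ['name', 'value', ...] with every value as a string."""
    arguments = []
    for name, value in params.items():
        arguments.extend([name, str(value)])
    return arguments


example_arguments = to_processing_arguments({'num_lags': 1, 'fit_conf_cases': 0.5})
```

Dictionaries preserve insertion order in Python 3.7 and later, so the resulting list keeps each name directly in front of its value, which is what the pairing logic in `intervention_scorer.py` expects.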

### Download the results back in your local environment
The intervention scoring results need to be downloaded from S3 because they are required when running the simulation notebook (covid19_simulator_sagemker.ipynb).

In [9]:
!aws s3 cp 's3://covid19-sim-dummy/covid19/intervention_impact/countries_intervention_impacts.csv' ./data/input
!aws s3 cp 's3://covid19-sim-dummy/covid19/intervention_impact/countries_aggr_intervention_scores.csv' ./data/input

download: s3://covid19-sim-dummy/covid19/intervention_impact/countries_intervention_impacts.csv to data/input/countries_intervention_impacts.csv
download: s3://covid19-sim-dummy/covid19/intervention_impact/countries_aggr_intervention_scores.csv to data/input/countries_aggr_intervention_scores.csv
