<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Monitor SageMaker ML with Watson OpenScale

In this notebook, we will use a German Credit dataset to create a logistic regression model using AWS SageMaker. We'll prepare the data and store it in AWS S3, create the model, and deploy the model to the AWS cloud. We'll then score the model.

Contents
 - [1.0 Setup](#setup)
 - [2.0 Load and explore data](#load)
 - [3.0 Create logistic regression model using SageMaker linear-learner algorithm](#model)
 - [4.0 Deploy the SageMaker model in the AWS Cloud](#deploy)
 - [5.0 Score the model](#score)

**Note:** This notebook works correctly with kernel `Python 3.7.x`.

## 1.0 Setup<a id="setup"></a>

Before you use the sample code in this notebook, you must perform the following setup tasks:

- [Create an AWS SageMaker Service](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html), and [get the AWS keys](https://github.com/IBM/monitor-sagemaker-ml-with-watson-openscale#get-aws-keys)
- Install the required Python packages from the PyPI repository

### Package installation

In [None]:
!pip install -U boto3 | tail -n 1
!pip install -U sagemaker | tail -n 1
!pip install -U pandas==1.2.5 | tail -n 1
!pip install -U scikit_learn==0.20.3 | tail -n 1
!pip install -U category_encoders | tail -n 1

## Restart the kernel now to ensure the recently installed packages are used

## 2.0 Load and explore data<a id="load"></a>

In this section you will prepare your data for training with the SageMaker linear-learner algorithm.

- Load data from the GitHub repository
- Explore data
- Prepare training data
- Store training data in S3 Object Storage

### 2.1 Load data from the GitHub repository

In [None]:
import numpy as np
import pandas as pd

In [None]:
data = pd.read_csv('https://raw.githubusercontent.com/IBM/monitor-sagemaker-ml-with-watson-openscale/master/data/credit_risk_training.csv')

### 2.2 Explore data

In [None]:
print("Sample records:")
display(data.head())

print("Label column summary:")
display(data.Risk.value_counts())

### 2.3 Prepare training data

You will use the SageMaker built-in linear-learner algorithm. When the training data is in `text/csv` format, this algorithm expects the first column to be the label.

The label column must also be numeric, so you will recode it.

In [None]:
target = 'Risk'
# Compare dtypes with == / != rather than identity so the check does not rely on dtype objects being singletons
string_features = [nm for nm, ty in zip(data.dtypes.index, data.dtypes.values) if (nm != target) and (ty == np.dtype('O'))]
numeric_features = [nm for nm, ty in zip(data.dtypes.index, data.dtypes.values) if (nm != target) and (ty != np.dtype('O'))]

In [None]:
data_recoded = pd.concat([data[[target]], pd.get_dummies(data[string_features]), data[numeric_features]], axis=1)
data_recoded.replace({target: {'Risk': 1, 'No Risk': 0}}, inplace = True)

In [None]:
train_data_filename = 'credit_risk_training_recoded.csv'
data_recoded.to_csv(path_or_buf = train_data_filename, index = False, header = False)

**Note:** The header row must be omitted, and the first column must be the target.
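You can check both constraints locally before uploading the file. The following is a minimal sketch that uses only the standard library. The function name `check_training_csv` is hypothetical (it is not part of the SageMaker SDK), and the sketch assumes the label was recoded to 0/1 as in the cell above.

```python
import csv

def check_training_csv(path):
    """Return a list of problems found in a headerless, label-first CSV file."""
    problems = []
    with open(path, newline="") as f:
        for row_number, row in enumerate(csv.reader(f), start=1):
            try:
                values = [float(field) for field in row]
            except ValueError:
                # A non-numeric field usually means a header row or an unencoded string feature.
                problems.append("row {}: non-numeric field".format(row_number))
                continue
            if not values or values[0] not in (0.0, 1.0):
                problems.append("row {}: first column is not a 0/1 label".format(row_number))
    return problems
```

Calling `check_training_csv(train_data_filename)` should return an empty list when the file is ready for upload.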

In [None]:
print(data_recoded.columns.tolist())

### 2.4 Store training data in S3 Object Storage

In [None]:
import time
import json
import boto3
import sagemaker

#### 2.4.1 Add AWS credentials

In [None]:
aws_credentials = {'access_key': '***', 
                   'secret_key': '***', 
                   'region_name': '***'}
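Keys typed into a notebook are easy to leak when the notebook is shared. As one alternative, here is a sketch that fills the same dictionary from the standard AWS environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`. The helper name `credentials_from_env` is hypothetical, and the sketch assumes those variables were exported before the kernel was started.

```python
import os

def credentials_from_env(environ=os.environ):
    """Build the aws_credentials dictionary from AWS environment variables."""
    names = {
        'access_key': 'AWS_ACCESS_KEY_ID',
        'secret_key': 'AWS_SECRET_ACCESS_KEY',
        'region_name': 'AWS_DEFAULT_REGION',
    }
    # Fail early with the list of variables that are unset or empty.
    missing = [var for var in names.values() if not environ.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: environ[var] for key, var in names.items()}
```

With the variables set, `aws_credentials = credentials_from_env()` can replace the literal dictionary above.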

In [None]:
import boto3
import sagemaker

session = boto3.session.Session(
    aws_access_key_id = aws_credentials['access_key'],
    aws_secret_access_key = aws_credentials['secret_key'],
    region_name = aws_credentials['region_name']
)
region = session.region_name
sagemaker_session = sagemaker.Session(session)
bucket = sagemaker_session.default_bucket()

s3 = session.resource('s3')

#### 2.4.2 Get bucket name

In [None]:
print('Default bucket: {}'.format(bucket))


**Tip:** You can run the following code, `[bkt.name for bkt in s3.buckets.all()]`, to list all your buckets.

In [None]:
[bkt.name for bkt in s3.buckets.all()]

#### 2.4.3 Replace `bucket_name` with the name of a bucket in your S3 Object Storage, and set the path where the training data will be stored


In [None]:
bucket_name = '*******'
train_data_path = 'credit_risk'

In [None]:
output_data_path = 's3://{}/credit-risk/output'.format(bucket_name)
time_suffix = time.strftime("%Y-%m-%d-%H-%M", time.gmtime())

In [None]:
s3_bucket = s3.Bucket(bucket_name)
s3_bucket.upload_file(Filename = train_data_filename, Key = '{}/{}'.format(train_data_path, train_data_filename))

Let's check whether your data has been uploaded successfully.

In [None]:
for s3_obj in s3_bucket.objects.all():
    if (s3_obj.bucket_name == bucket_name) and (train_data_path in s3_obj.key):
        train_data_uri = 's3://{}/{}'.format(s3_obj.bucket_name, s3_obj.key)
        print(train_data_uri)

<a id="model"></a>
## 3.0 Create logistic regression model using SageMaker linear-learner algorithm

In this section you will learn how to:

- Set up training parameters
- Start training job

### Set up training parameters

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri

sm_client = session.client('sagemaker')

In [None]:
training_image = get_image_uri(session.region_name, 'linear-learner')

iam_client = session.client('iam')
[role_arn, *_] = [role['Arn'] for role in iam_client.list_roles()['Roles'] if 'AmazonSageMaker-ExecutionRole' in role['RoleName'] or  'SagemakerFull' in role['RoleName']]

linear_job_name = 'Credit-risk-linear-learner-' + time_suffix
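The list unpacking used to pick `role_arn` raises a `ValueError` with an unhelpful message when no role name matches. Here is a sketch of the same selection as a helper with an explicit error. `pick_execution_role` is a hypothetical name, the example entries are illustrative only, and the sketch assumes the input has the shape of the `Roles` list returned by `iam_client.list_roles()`.

```python
def pick_execution_role(roles, name_fragments=('AmazonSageMaker-ExecutionRole', 'SagemakerFull')):
    """Return the ARN of the first role whose name contains one of the fragments."""
    for role in roles:
        if any(fragment in role['RoleName'] for fragment in name_fragments):
            return role['Arn']
    raise LookupError('No IAM role name contains any of: ' + ', '.join(name_fragments))

# Illustrative input only; real entries come from iam_client.list_roles()['Roles'].
example_roles = [
    {'RoleName': 'ReadOnly',
     'Arn': 'arn:aws:iam::123456789012:role/ReadOnly'},
    {'RoleName': 'AmazonSageMaker-ExecutionRole-demo',
     'Arn': 'arn:aws:iam::123456789012:role/AmazonSageMaker-ExecutionRole-demo'},
]
print(pick_execution_role(example_roles))
```

Note that `list_roles` returns results in pages, so an account with many roles may need a paginator to see all of them.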

In [None]:
linear_training_params = {
    "AlgorithmSpecification": {
        "TrainingImage": training_image,
        "TrainingInputMode": "File"
    },
    "HyperParameters": {
        "feature_dim": str(data_recoded.shape[1] - 1),
        "mini_batch_size": "100",
        "predictor_type": "binary_classifier",
        "epochs": "10",
        "num_models": "32",
        "loss": "auto"
    },
    "InputDataConfig": [{
        "ChannelName": "train",
        "ContentType": "text/csv", 
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_data_uri,
                "S3DataDistributionType": "ShardedByS3Key"
            }
        }
    }],
    "OutputDataConfig": {"S3OutputPath": output_data_path},
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.c4.xlarge",
        "VolumeSizeInGB": 2
    },
    "RoleArn": role_arn,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 6 * 60
    },
    "TrainingJobName": linear_job_name

}
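Every value under `HyperParameters` is passed as a string, which is why `feature_dim` is wrapped in `str()` above. Here is a small sketch of a helper that converts ordinary Python values. `to_hyperparameters` is a hypothetical name, not a SageMaker API, and the sample values are illustrative.

```python
def to_hyperparameters(values):
    """Convert a dict of Python values to a string-to-string map."""
    converted = {}
    for name, value in values.items():
        if isinstance(value, bool):
            # Lowercasing booleans is an assumption; check the algorithm's documentation.
            converted[name] = str(value).lower()
        else:
            converted[name] = str(value)
    return converted

# Illustrative values only.
print(to_hyperparameters({'feature_dim': 20, 'epochs': 10, 'loss': 'auto'}))
```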

### Start training job

In [None]:
sm_client.create_training_job(**linear_training_params)

In [None]:
try:
    sm_client.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName = linear_job_name)
except Exception:
    print('Training job error.')

train_job_details = sm_client.describe_training_job(TrainingJobName = linear_job_name)
train_job_status = train_job_details['TrainingJobStatus']

if train_job_status == 'Failed':
    print(train_job_details['FailureReason'])
else:
    train_job_arn = train_job_details['TrainingJobArn']
    print(train_job_arn)
    trained_model_uri = train_job_details['ModelArtifacts']['S3ModelArtifacts']
    print(trained_model_uri)
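The waiter hides the polling loop. If you want to see the status while you wait, here is a sketch of an explicit loop. `wait_for_training_job` is a hypothetical helper, `describe` stands for any callable that returns a dictionary with a `TrainingJobStatus` key (for example `lambda: sm_client.describe_training_job(TrainingJobName=linear_job_name)`), and the terminal statuses are assumed to be `Completed`, `Failed`, and `Stopped`.

```python
import time

TERMINAL_STATUSES = {'Completed', 'Failed', 'Stopped'}

def wait_for_training_job(describe, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll describe() until the job reaches a terminal status or max_polls is used up."""
    for _ in range(max_polls):
        details = describe()
        status = details['TrainingJobStatus']
        print('Training job status:', status)
        if status in TERMINAL_STATUSES:
            return details
        sleep(poll_seconds)
    raise TimeoutError('Training job did not finish within the polling limit')
```

The `sleep` parameter exists only so the loop can be exercised without real delays.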

## 4.0 Deploy the SageMaker model in the AWS Cloud<a id="deploy"></a>

In this section you will learn how to:

- Set up deployment parameters
- Create deployment configuration endpoint
- Create online scoring endpoint

### 4.1 Set up deployment parameters

In [None]:
linear_hosting_container = {'Image': training_image, 'ModelDataUrl': trained_model_uri}

create_model_details = sm_client.create_model(
    ModelName = linear_job_name,
    ExecutionRoleArn = role_arn,
    PrimaryContainer = linear_hosting_container)

print(create_model_details['ModelArn'])

### 4.2 Create deployment configuration endpoint

In [None]:
endpoint_config = 'Credit-risk-linear-endpoint-config-' + time_suffix
print(endpoint_config)

create_endpoint_config_details = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config,
    ProductionVariants = [{
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': linear_job_name,
        'VariantName': 'AllTraffic'}])

endpoint_config_details = sm_client.describe_endpoint_config(EndpointConfigName = endpoint_config)
print(endpoint_config_details)

### 4.3 Create online scoring endpoint

In [None]:
scoring_endpoint = 'Credit-risk-endpoint-scoring-' + time_suffix

create_endpoint_details = sm_client.create_endpoint(
    EndpointName = scoring_endpoint,
    EndpointConfigName = endpoint_config)

In [None]:
try:
    sm_client.get_waiter('endpoint_in_service').wait(EndpointName = scoring_endpoint)
except Exception:
    print('Create scoring endpoint error')

scoring_endpoint_details = sm_client.describe_endpoint(EndpointName = scoring_endpoint)
scoring_endpoint_status = scoring_endpoint_details['EndpointStatus']

if scoring_endpoint_status != 'InService':
    # FailureReason is only present when endpoint creation failed
    print(scoring_endpoint_details.get('FailureReason', 'Endpoint status: ' + scoring_endpoint_status))
else:
    print(scoring_endpoint_details['EndpointArn'])

## 5.0 Score the model<a id="score"></a>

In this section you will learn how to score the deployed model.

- Prepare sample data for scoring
- Send payload for scoring

### 5.1 Prepare sample data for scoring

You will use data in `csv` format as the scoring payload. The first column (the label) is removed from the data, and the last 10 training records are selected as the scoring payload.

In [None]:
scoring_data_filename = 'credit_risk_scoring_recoded.csv'

In [None]:
with open(train_data_filename) as f_train:
    with open(scoring_data_filename, 'w') as f_score:
        f_score.writelines([','.join(line.split(',')[1:]) for line in f_train.readlines()[-10:]])
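The cell above splits each line on commas, which works here because every field is numeric. Here is a sketch of the same step with the `csv` module, which also copes with quoted fields. `write_scoring_payload` is a hypothetical helper, and `n_rows` defaults to 10 to match the slice in the cell above.

```python
import csv

def write_scoring_payload(train_path, scoring_path, n_rows=10):
    """Copy the last n_rows of a label-first CSV file without the label column."""
    with open(train_path, newline='') as f_train:
        rows = list(csv.reader(f_train))
    # Guard n_rows <= 0, because rows[-0:] would select every row.
    selected = rows[-n_rows:] if n_rows > 0 else []
    with open(scoring_path, 'w', newline='') as f_score:
        writer = csv.writer(f_score, lineterminator='\n')
        writer.writerows(row[1:] for row in selected)
    return len(selected)
```

The return value is the number of records written, which can be compared with the number of predictions that come back from the endpoint.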

### 5.2 Send payload for scoring

In [None]:
sm_runtime = session.client('runtime.sagemaker')

with open(scoring_data_filename) as f_payload:
    scoring_response = sm_runtime.invoke_endpoint(EndpointName = scoring_endpoint,
                                                  ContentType = 'text/csv',
                                                  Body = f_payload.read().encode())
    
    scored_records = scoring_response['Body'].read().decode()
    print(json.loads(scored_records))
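For a binary classifier, the linear-learner JSON response is expected to contain a `predictions` list whose items carry a `score` and a `predicted_label`. Assuming that shape, here is a sketch that maps the numeric labels back to the original `Risk` and `No Risk` strings. `summarize_predictions` is a hypothetical helper, and the sample response is made up for illustration; it is not real model output.

```python
import json

LABELS = {1: 'Risk', 0: 'No Risk'}  # inverse of the recoding used for training

def summarize_predictions(response_text):
    """Return (label, score) pairs from a linear-learner style JSON response."""
    predictions = json.loads(response_text)['predictions']
    # int() accepts labels serialized either as 0/1 or as 0.0/1.0.
    return [(LABELS[int(p['predicted_label'])], p['score']) for p in predictions]

# Made-up response for illustration only.
sample = '{"predictions": [{"score": 0.12, "predicted_label": 0}, {"score": 0.87, "predicted_label": 1.0}]}'
for label, score in summarize_predictions(sample):
    print(label, score)
```

In the notebook, `summarize_predictions(scored_records)` would apply the same mapping to the endpoint's response.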