# Now, we can start a new training job

We'll send a zip file called **trainingjob.zip**, with the following structure:
 - trainingjob.json (SageMaker training job descriptor)
 - monitoring.json (SageMaker monitoring inputs for data capture, baseline and schedule)
 - assets/deploy-model-prd.yml (CloudFormation template for deploying our model into Production)
 - assets/deploy-model-dev.yml (CloudFormation template for deploying our model into Development)
 - assets/wait-training-job.yml (also added to the zip in the deployment step)

### Create the training job descriptor

This includes some hyperparameters.

In [None]:
hyperparameters = {
    "epochs": 100,
    "batch_size": 128,
}

Next, set the training image and the job name, then build the rest of the descriptor.

In [None]:
import time
import sagemaker
import boto3

sts_client = boto3.client("sts")

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()

prefix='iris-model'
account_id = sts_client.get_caller_identity()["Account"]
region = boto3.session.Session().region_name
training_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, prefix)
roleArn = "arn:aws:iam::{}:role/MLOps".format(account_id)
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = prefix + timestamp

training_params = {}

# Reference to the training Docker image, stored on ECR (https://aws.amazon.com/pt/ecr/)
training_params["AlgorithmSpecification"] = {
    "TrainingImage": training_image,
    "TrainingInputMode": "File"
}

# The IAM role with all the permissions given to Sagemaker
training_params["RoleArn"] = roleArn

# Here Sagemaker will store the final trained model
training_params["OutputDataConfig"] = {
    "S3OutputPath": 's3://{}/{}'.format(bucket, prefix)
}

# This is the config of the instance that will execute the training
training_params["ResourceConfig"] = {
    "InstanceCount": 1,
    "InstanceType": "ml.m4.xlarge",
    "VolumeSizeInGB": 30
}

# The job name. You'll see this name in the Jobs section of the SageMaker console
training_params["TrainingJobName"] = job_name

for i in hyperparameters:
    hyperparameters[i] = str(hyperparameters[i])
    
# Here you will configure the hyperparameters used for training your model.
training_params["HyperParameters"] = hyperparameters

# Training timeout (360000 seconds = 100 hours)
training_params["StoppingCondition"] = {
    "MaxRuntimeInSeconds": 360000
}

# Both channels use FullyReplicated distribution (the full dataset is copied onto each training instance)
training_params["InputDataConfig"] = [{
    "ChannelName": "training",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}/{}/input/training'.format(bucket, prefix),
            "S3DataDistributionType": "FullyReplicated"
        }
    },
    "ContentType": "text/csv",
    "CompressionType": "None"
},{
    "ChannelName": "validation",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}/{}/input/validation'.format(bucket, prefix),
            "S3DataDistributionType": "FullyReplicated"
        }
    },
    "ContentType": "text/csv",
    "CompressionType": "None"
}]
training_params["Tags"] = []
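
Before the descriptor is written to disk, it can help to sanity-check it locally. The sketch below is not part of the original workflow: `check_training_params` is a hypothetical helper, and its list of required keys is an assumption based on the fields this notebook sets, not a complete statement of what the `CreateTrainingJob` API accepts. It checks that the expected top-level keys are present and that every hyperparameter value is a string, since the API takes hyperparameters as a string-to-string map. The `example` dictionary uses made-up values.

```python
# Assumption: these are the top-level keys this notebook fills in.
REQUIRED_KEYS = [
    "AlgorithmSpecification",
    "RoleArn",
    "OutputDataConfig",
    "ResourceConfig",
    "TrainingJobName",
    "StoppingCondition",
]


def check_training_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in params:
            problems.append("missing key: {}".format(key))
    # Hyperparameter values must already be serialized as strings.
    for name, value in params.get("HyperParameters", {}).items():
        if not isinstance(value, str):
            problems.append(
                "hyperparameter {} is not a string: {!r}".format(name, value)
            )
    return problems


# Illustrative descriptor with placeholder values (not real resources).
example = {
    "AlgorithmSpecification": {"TrainingImage": "example-image", "TrainingInputMode": "File"},
    "RoleArn": "arn:aws:iam::123456789012:role/MLOps",
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/iris-model"},
    "ResourceConfig": {"InstanceCount": 1, "InstanceType": "ml.m4.xlarge", "VolumeSizeInGB": 30},
    "TrainingJobName": "iris-model-example",
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    "HyperParameters": {"epochs": "100", "batch_size": "128"},
}

print(check_training_params(example))
```

In the notebook itself, the call would be `check_training_params(training_params)`.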

### Upload training data

Inspect the training and validation sets, then upload them to S3.

In [None]:
!head -2 input/data/training/train.csv

In [None]:
!head -2 input/data/validation/test.csv

In [None]:
train_loc = sagemaker_session.upload_data(path='input/data/training', key_prefix='iris-model/input/training')
val_loc = sagemaker_session.upload_data(path='input/data/validation', key_prefix='iris-model/input/validation')

print('training: {}\nvalidation: {}'.format(train_loc, val_loc))
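
The `head` commands above only show the first two rows of each file. As a slightly stronger check, the following sketch reads a whole CSV and confirms that every row has the same number of columns. `check_csv_shape` is a hypothetical helper, and the two sample rows are illustrative values, not taken from the notebook's data files.

```python
import csv
import io


def check_csv_shape(lines):
    """Return (n_rows, n_cols); raise ValueError if rows differ in column count."""
    reader = csv.reader(lines)
    n_cols = None
    n_rows = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        if n_cols is None:
            n_cols = len(row)  # the first non-empty row sets the expected width
        elif len(row) != n_cols:
            raise ValueError(
                "line {} has {} columns, expected {}".format(
                    reader.line_num, len(row), n_cols
                )
            )
        n_rows += 1
    return n_rows, n_cols


# Illustrative in-memory sample standing in for a real file.
sample = io.StringIO("5.1,3.5,1.4,0.2,0\n4.9,3.0,1.4,0.2,0\n")
shape = check_csv_shape(sample)
print(shape)
```

For a real file, pass an open handle, for example `with open('input/data/training/train.csv', newline='') as f: check_csv_shape(f)`.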

## Create monitoring inputs

Set the data capture configuration for endpoints. The monitoring inputs are:

1. Data capture log output
2. Baseline input location, with the baseline file uploaded to S3
3. Baseline results S3 location
4. Schedule reports S3 location

In [None]:
data_capture_uri = 's3://{}/{}/datacapture'.format(bucket, prefix)
print('data capture uri: {}'.format(data_capture_uri))

Use the output predictions from testing as the baseline file. Make sure this file has a header row.

In [None]:
# Inspect the output predictions (NOTE: if using scientific format these will be treated as strings)
baseline_file = 'output/data/predictions.csv'
!head -2 $baseline_file
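
The two requirements mentioned above (a header row, and no values in scientific notation) can also be checked programmatically. This is a minimal sketch under stated assumptions: `inspect_baseline` is a hypothetical helper, the header test is a heuristic (the first row counts as a header only if none of its cells parses as a number), and the column names in the sample are made up.

```python
import csv
import io
import re

# Matches values such as 1e-05 or -3.2E+4.
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def inspect_baseline(lines):
    """Return (has_header, sci_cells) for a CSV given as an iterable of lines.

    sci_cells is a list of (line_number, value) pairs written in
    scientific notation.
    """
    reader = csv.reader(lines)
    first_row = next(reader)
    # Heuristic: a header row contains no numeric cells.
    has_header = not any(_is_number(cell) for cell in first_row)
    sci_cells = []
    for row in reader:
        for cell in row:
            if SCI_NOTATION.match(cell.strip()):
                sci_cells.append((reader.line_num, cell))
    return has_header, sci_cells


# Illustrative sample with hypothetical column names.
sample = io.StringIO("sepal_length,prediction\n5.1,0\n1e-05,1\n")
has_header, sci_cells = inspect_baseline(sample)
print(has_header, sci_cells)
```

With the real file, the call would be `with open(baseline_file, newline='') as f: inspect_baseline(f)`.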

In [None]:
# Upload the predictions as baseline file
boto3.Session().resource('s3').Bucket(bucket).Object(baseline_file).upload_file(baseline_file)

In [None]:
# Define the S3 locations of the baseline data file and the baseline results
baseline_prefix = prefix + '/baselining'
baseline_results_prefix = baseline_prefix + '/results'

baseline_data_uri = 's3://{}/{}'.format(bucket,baseline_file)
baseline_results_uri = 's3://{}/{}'.format(bucket, baseline_results_prefix)
print('Baseline data file: {}'.format(baseline_data_uri))
print('Baseline results uri: {}'.format(baseline_results_uri))

Let's define the location for the monitoring schedule outputs.

In [None]:
monitoring_reports_uri = 's3://{}/{}/monitoring/reports'.format(bucket, prefix)

print('monitoring reports: {}'.format(monitoring_reports_uri))

Compute a hash of the training job descriptor so that a changed descriptor forces the deployment to update.

In [None]:
import hashlib
import json

training_hash = hashlib.sha256(json.dumps(training_params).encode('utf-8')).hexdigest()
print('training hash: {}'.format(training_hash))
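
In this notebook, `TrainingJobName` contains a timestamp, so the hash changes on every run, which is what forces the update. If the same technique is reused for a descriptor without a timestamp, key order can matter: `json.dumps` serializes keys in insertion order by default, so two equal dictionaries built in a different order may hash differently. The sketch below shows one way to avoid that by sorting keys. `stable_hash` is a hypothetical helper and the two dictionaries are illustrative.

```python
import hashlib
import json


def stable_hash(params):
    """SHA-256 of a canonical JSON form (sorted keys, fixed separators)."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key insertion order.
a = {
    "TrainingJobName": "iris-model-demo",
    "HyperParameters": {"epochs": "100", "batch_size": "128"},
}
b = {
    "HyperParameters": {"batch_size": "128", "epochs": "100"},
    "TrainingJobName": "iris-model-demo",
}

print(stable_hash(a) == stable_hash(b))
```

Note that switching to a canonical form changes the hash value compared with plain `json.dumps(training_params)`, so whichever form is chosen should be used consistently.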

In [None]:
monitoring_params = {
    'TrainSha256': training_hash,
    'DataCaptureUri': data_capture_uri,
    'MonitoringRoleArn': roleArn,
    'BaselineInputUri': baseline_data_uri,
    'BaselineResultsUri': baseline_results_uri,
    'ScheduleReportsUri': monitoring_reports_uri,
    'ScheduleMetricName': 'feature_baseline_drift_class_predictions', # alarm on class predictions drift
    'ScheduleMetricThreshold': str(0.4) # Must serialize parameters as string
}

Create a CloudFormation template package that includes the inputs from trainingjob.json.

Until `AutoPublishCodeSha256` is supported to force Lambda redeployment ([see PR](https://github.com/awslabs/serverless-application-model/pull/1376)), we need to change the Lambda zip contents ourselves.

In [None]:
# TEMP: Write a new file to the API directory to force refresh
with open('../../api/trainingjob.json', 'w') as f:
    json.dump(training_params, f)
with open('../../api/monitoring.json', 'w') as f:
    json.dump(monitoring_params, f)

### Upload deployment artifacts 

Install the [custom resource helper](https://github.com/aws-cloudformation/custom-resource-helper) into the cfn folder

In [None]:
!pip install -t ../../cfn crhelper

Generate the CloudFormation template for the serverless API endpoints. The `package` command uploads the code to the SageMaker bucket.

In [None]:
!aws cloudformation package --template-file ../../assets/deploy-model-prd.yml \
    --output-template-file ../../assets/template-model-prd.yml --s3-bucket $bucket

Verify the template has been generated correctly

In [None]:
!cat ../../assets/template-model-prd.yml

### Start Deployment

Upload a file to S3 to start the deployment pipeline

In [None]:
import boto3
import io
import zipfile
import json

s3 = boto3.client('s3')
sts_client = boto3.client("sts")

session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name

bucket_name = "mlops-%s-%s" % (region, account_id)
key_name = "training_jobs/iris_model/trainingjob.zip"

# (archive name, local path) for each file that goes into the zip
zip_entries = [
    ('trainingjob.json', '../../api/trainingjob.json'),
    ('monitoring.json', '../../api/monitoring.json'),
    ('assets/deploy-model-prd.yml', '../../assets/template-model-prd.yml'),
    ('assets/deploy-model-dev.yml', '../../assets/deploy-model-dev.yml'),
    ('assets/wait-training-job.yml', '../../assets/wait-training-job.yml'),
]

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zf:
    for arcname, path in zip_entries:
        # Use a context manager so each file handle is closed after reading
        with open(path, 'r') as f:
            zf.writestr(arcname, f.read())

# Uploading this object starts the deployment pipeline
s3.put_object(Bucket=bucket_name, Key=key_name, Body=zip_buffer.getvalue())
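
Since the pipeline depends on the exact entry names inside the zip, it may be worth checking the archive before uploading it. The following sketch builds a zip in memory and reports any expected entry that is missing. `build_zip` and `missing_entries` are hypothetical helpers, and the file contents are placeholders; only the entry names come from this notebook.

```python
import io
import zipfile

# Entry names used by this notebook's deployment zip.
EXPECTED = [
    "trainingjob.json",
    "monitoring.json",
    "assets/deploy-model-prd.yml",
    "assets/deploy-model-dev.yml",
    "assets/wait-training-job.yml",
]


def build_zip(files):
    """Build a zip archive in memory from a {archive_name: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for arcname, content in files.items():
            zf.writestr(arcname, content)
    return buf.getvalue()


def missing_entries(zip_bytes, expected):
    """Return the sorted list of expected entry names absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
    return sorted(set(expected) - names)


# Placeholder contents; real files would be read from disk.
payload = build_zip({name: "placeholder" for name in EXPECTED})
print(missing_entries(payload, EXPECTED))
```

In the cell above, the equivalent check would be `missing_entries(zip_buffer.getvalue(), EXPECTED)` just before the `put_object` call.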

### Ok, now open the AWS console in another tab and go to the CodePipeline console to see the status of our build pipeline

> Finally, click here [NOTEBOOK](04_Check%20Progress%20and%20Test%20the%20endpoint.ipynb) to see the progress and test your endpoint

## Appendix: Suggest Baseline

If you want to create your own baseline, you can do so below, or load one from the workflow.

Now that we have the baseline data ready in S3, let's kick off a job to `suggest` constraints. `DefaultModelMonitor.suggest_baseline(..)` starts a `ProcessingJob` that uses a SageMaker-provided Model Monitor container to generate the constraints. Please edit the configuration to fit your needs.

In [None]:
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker import get_execution_role

role = get_execution_role()

my_default_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    wait=True
)
baseline_job = my_default_monitor.latest_baselining_job

In [None]:
# Load a processing job by name (replace the name below with your own job's name)
from sagemaker.processing import ProcessingJob
baseline_job = ProcessingJob.from_processing_name(sagemaker_session, 'mlops-processingjob-2')

In [None]:
status = baseline_job.describe()['ProcessingJobStatus']
if status != 'Completed':
    raise Exception('Processing job not completed, status: {}'.format(status))

baseline_result_uri = baseline_job.outputs[0].destination
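
The check above fails immediately if the job is still running. A polling loop is one alternative; the sketch below is a hypothetical helper, not part of the SageMaker SDK. It takes any callable that returns a dictionary with a `ProcessingJobStatus` key, so it can be tried without AWS access. The terminal states listed are an assumption based on the documented processing job statuses, and the default poll interval and limit are arbitrary.

```python
import time

# Assumption: statuses after which the job will not change any more.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_status(describe, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call describe() until the job reaches a terminal state, then return it."""
    for _ in range(max_polls):
        status = describe()["ProcessingJobStatus"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not finish after {} polls".format(max_polls))


# Stand-in for a real describe call, used only to exercise the loop.
_responses = iter(["InProgress", "InProgress", "Completed"])


def fake_describe():
    return {"ProcessingJobStatus": next(_responses)}


final = wait_for_status(fake_describe, poll_seconds=0, sleep=lambda s: None)
print(final)
```

With a real job, the call would be `wait_for_status(baseline_job.describe)`, followed by the same `Completed` check as above.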

### Explore the generated constraints and statistics

In [None]:
import pandas as pd

# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
schema_df.head(10)

In [None]:
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict["features"])
constraints_df.head(10)