# Now, it is time to start an automated ML pipeline using the MLOps environment

We'll do that by uploading a zip file named **trainingjob.zip** to an S3 bucket. CodePipeline listens to that bucket and starts a job when the file arrives. The zip file has the following structure:
 - trainingjob.json (SageMaker training job descriptor)
 - deployment.json (instructions telling the environment how to deploy and prepare the endpoints)
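The archive can be built entirely in memory. The sketch below mirrors the upload cell in section 1.3 (which writes `trainingjob.json` and `deployment.json`) and reads the archive back to check its structure before anything is sent to S3. The helper names `build_job_archive` and `archive_members` are hypothetical; they are not part of the MLOps environment.

```python
import io
import json
import zipfile


def build_job_archive(training_params: dict, deployment_params: dict) -> bytes:
    """Return the bytes of an in-memory zip holding both JSON descriptors (hypothetical helper)."""
    buffer = io.BytesIO()
    # 'w' creates a fresh archive; ZIP_DEFLATED compresses the JSON text
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("trainingjob.json", json.dumps(training_params))
        zf.writestr("deployment.json", json.dumps(deployment_params))
    return buffer.getvalue()


def archive_members(archive: bytes) -> list:
    """List the file names stored in a zip archive, in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.namelist()


archive = build_job_archive({"TrainingJobName": "iris-model-test"}, {"EndpointPrefix": "iris-model"})
print(archive_members(archive))  # ['trainingjob.json', 'deployment.json']
```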

### 1.1 Let's start by defining the hyperparameters and other attributes

If you ran the previous section **01_CreateAlgorithmContainer** and managed to create a custom container, please change the following variable in the next cell, from:  
```Python
use_xgboost_builtin=True
```
to:
```Python
use_xgboost_builtin=False
```

In [None]:
import sagemaker
import boto3

use_xgboost_builtin=True

sts_client = boto3.client("sts")
account_id = sts_client.get_caller_identity()["Account"]
region = boto3.session.Session().region_name
model_prefix='iris-model'
training_image = None
hyperparameters = None
if use_xgboost_builtin: 
    training_image = sagemaker.image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')
    hyperparameters = {
        "alpha": 0.42495142279951414,
        "eta": 0.4307531922567607,
        "gamma": 1.8028358018081714,
        "max_depth": 10,
        "min_child_weight": 5.925133573560345,
        "num_class": 3,
        "num_round": 30,
        "objective": "multi:softmax",
        "reg_lambda": 10,
        "silent": 0,
    }
else:
    training_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, model_prefix)
    hyperparameters = {
        "max_depth": 11,
        "n_jobs": 5,
        "n_estimators": 120
    }
print(training_image)
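When `use_xgboost_builtin=False`, the cell above assembles the URI of the custom container pushed to Amazon ECR, which follows the `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` pattern. A small sketch of that formatting as a reusable function, assuming a standard AWS partition (China regions use the `amazonaws.com.cn` suffix instead, which this sketch does not handle); `custom_image_uri` is a hypothetical name.

```python
def custom_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI for a standard AWS partition (hypothetical helper)."""
    # Registry host: <account>.dkr.ecr.<region>.amazonaws.com
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


print(custom_image_uri("123456789012", "us-east-1", "iris-model"))
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/iris-model:latest
```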

### 1.2 Then, let's create the training job descriptor

In [None]:
import time
import sagemaker
import boto3

roleArn = "arn:aws:iam::{}:role/MLOps".format(account_id)
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = model_prefix + timestamp
sagemaker_session = sagemaker.Session()

training_params = {}

# Here we set the reference to the training Docker image (built-in XGBoost or our custom container), stored on ECR (https://aws.amazon.com/pt/ecr/)
training_params["AlgorithmSpecification"] = {
    "TrainingImage": training_image,
    "TrainingInputMode": "File"
}

# The IAM role with all the permissions given to SageMaker
training_params["RoleArn"] = roleArn

# Here SageMaker will store the final trained model artifacts
training_params["OutputDataConfig"] = {
    "S3OutputPath": 's3://{}/{}'.format(sagemaker_session.default_bucket(), model_prefix)
}

# This is the config of the instance that will execute the training
training_params["ResourceConfig"] = {
    "InstanceCount": 1,
    "InstanceType": "ml.m4.xlarge",
    "VolumeSizeInGB": 30
}

# The job name. You'll see this name in the Training jobs section of the SageMaker console
training_params["TrainingJobName"] = job_name

# The CreateTrainingJob API requires every hyperparameter value to be a string
for i in hyperparameters:
    hyperparameters[i] = str(hyperparameters[i])

# Here you configure the hyperparameters used for training your model
training_params["HyperParameters"] = hyperparameters

# Training timeout: the job is stopped after 360000 seconds (100 hours)
training_params["StoppingCondition"] = {
    "MaxRuntimeInSeconds": 360000
}

# Both channels use the FullyReplicated distribution type (the whole dataset is copied onto each training instance)
training_params["InputDataConfig"] = []

# Please notice that we're using text/csv for both
# training and validation datasets, given our dataset is formatted as CSV

# Here we set the training dataset
training_params["InputDataConfig"].append({
    "ChannelName": "train",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}/{}/input/train'.format(sagemaker_session.default_bucket(), model_prefix),
            "S3DataDistributionType": "FullyReplicated"
        }
    },
    "ContentType": "text/csv",
    "CompressionType": "None"
})
# Here we set the validation dataset
training_params["InputDataConfig"].append({
    "ChannelName": "validation",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}/{}/input/validation'.format(sagemaker_session.default_bucket(), model_prefix),
            "S3DataDistributionType": "FullyReplicated"
        }
    },
    "ContentType": "text/csv",
    "CompressionType": "None"
})
training_params["Tags"] = []
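The two `InputDataConfig` entries differ only in channel name and S3 prefix, and the `CreateTrainingJob` API expects every hyperparameter value as a string. A minimal sketch factoring both into helpers; `make_s3_channel` and `stringify_hyperparameters` are hypothetical names, and the bucket name is a placeholder. Unlike the loop in the cell above, `stringify_hyperparameters` returns a new dict instead of mutating the original, so re-running cells cannot double-convert anything.

```python
def make_s3_channel(name: str, s3_uri: str, content_type: str = "text/csv") -> dict:
    """Build one InputDataConfig channel that reads every object under an S3 prefix."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                # Copy the full dataset onto every training instance
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
        "CompressionType": "None",
    }


def stringify_hyperparameters(hyperparameters: dict) -> dict:
    """Return a copy with all values converted to str, as CreateTrainingJob requires."""
    return {key: str(value) for key, value in hyperparameters.items()}


base = "s3://my-bucket/iris-model/input"  # hypothetical bucket name
channels = [make_s3_channel("train", f"{base}/train"),
            make_s3_channel("validation", f"{base}/validation")]
print([c["ChannelName"] for c in channels])  # ['train', 'validation']
print(stringify_hyperparameters({"max_depth": 10, "eta": 0.43}))  # {'max_depth': '10', 'eta': '0.43'}
```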

In [None]:
deployment_params = {
    "EndpointPrefix": model_prefix,
    "DevelopmentEndpoint": {
        # we want to enable the endpoint monitoring
        "InferenceMonitoring": True,
        # we will collect 100% of the requests/predictions
        "InferenceMonitoringSampling": 100,
        "InferenceMonitoringOutputBucket": 's3://{}/{}/monitoring/dev'.format(sagemaker_session.default_bucket(), model_prefix),
        # we don't want to enable A/B tests in development
        "ABTests": False,
        # we'll use a basic instance for testing purposes
        "InstanceType": "ml.t2.large",
        "InitialInstanceCount": 1,
        # we don't want high availability/scalability for development
        "AutoScaling": None
    },
    "ProductionEndpoint": {
        # we want to enable the endpoint monitoring
        "InferenceMonitoring": True,
        # we will collect 100% of all the requests/predictions
        "InferenceMonitoringSampling": 100,
        "InferenceMonitoringOutputBucket": 's3://{}/{}/monitoring/prd'.format(sagemaker_session.default_bucket(), model_prefix),
        # we want to do A/B tests in production
        "ABTests": True,
        # we'll use a better instance for production: compute (CPU) optimized
        "InstanceType": "ml.c5.large",
        "InitialInstanceCount": 2,
        "InitialVariantWeight": 0.1,
        # we want elasticity: at least 2 and at most 10 instances behind the endpoint
        # autoscaling targets 200 invocations per instance per minute, adding or removing instances to stay near it
        "AutoScaling": {
            "MinCapacity": 2,
            "MaxCapacity": 10,
            "TargetValue": 200.0,
            "ScaleInCooldown": 30,
            "ScaleOutCooldown": 60,
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        }
    }
}
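The `AutoScaling` block maps naturally onto the Application Auto Scaling target-tracking API. How the MLOps pipeline actually applies it is not shown in this notebook, so the sketch below is an assumption: it only builds the keyword arguments that `register_scalable_target` and `put_scaling_policy` on a `boto3.client('application-autoscaling')` would receive for one endpoint variant. The endpoint, variant and policy names are hypothetical. `SageMakerVariantInvocationsPerInstance` is the average number of invocations per instance per minute, and `rough_instance_count` is a simplified estimate of where target tracking settles that ignores cooldowns and metric smoothing.

```python
import math


def autoscaling_requests(endpoint_name: str, variant_name: str, cfg: dict):
    """Build kwargs for application-autoscaling register_scalable_target / put_scaling_policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    register_kwargs = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": cfg["MinCapacity"],
        "MaxCapacity": cfg["MaxCapacity"],
    }
    policy_kwargs = {
        "PolicyName": f"{endpoint_name}-{variant_name}-invocations",  # hypothetical naming
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cfg["TargetValue"],
            "PredefinedMetricSpecification": {"PredefinedMetricType": cfg["PredefinedMetricType"]},
            "ScaleInCooldown": cfg["ScaleInCooldown"],
            "ScaleOutCooldown": cfg["ScaleOutCooldown"],
        },
    }
    return register_kwargs, policy_kwargs


def rough_instance_count(invocations_per_minute: float, cfg: dict) -> int:
    """Simplified estimate: enough instances to stay at TargetValue, clamped to [Min, Max]."""
    needed = math.ceil(invocations_per_minute / cfg["TargetValue"])
    return max(cfg["MinCapacity"], min(cfg["MaxCapacity"], needed))


cfg = {"MinCapacity": 2, "MaxCapacity": 10, "TargetValue": 200.0,
       "ScaleInCooldown": 30, "ScaleOutCooldown": 60,
       "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"}
register_kwargs, policy_kwargs = autoscaling_requests("iris-model-prd", "model-a", cfg)
print(register_kwargs["ResourceId"])  # endpoint/iris-model-prd/variant/model-a
print(rough_instance_count(1000, cfg))  # 5
```

In the notebook, `cfg` would simply be `deployment_params["ProductionEndpoint"]["AutoScaling"]`. With 1000 invocations per minute and a target of 200, the estimate is 5 instances; light traffic still keeps the floor of 2.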

#### The dataset

The dataset was already uploaded in the exercise **01 - Creating a Classifier Container**, so we just need to start a new automated training/deployment job in our MLOps environment.
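Before starting the job, it can be worth confirming that the training and validation prefixes used in `InputDataConfig` actually contain objects. A minimal sketch, assuming you pass in a boto3 S3 client: it calls `list_objects_v2` with `MaxKeys=1` and checks `KeyCount`. `prefix_has_objects` is a hypothetical helper, and the stub client exists only to show the call shape without AWS access.

```python
def prefix_has_objects(s3_client, bucket: str, prefix: str) -> bool:
    """Return True if at least one object exists under s3://bucket/prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


class StubS3:
    """Stand-in for boto3.client('s3') that knows a fixed set of keys (illustration only)."""

    def __init__(self, keys):
        self.keys = keys

    def list_objects_v2(self, Bucket, Prefix, MaxKeys):
        matches = [k for k in self.keys if k.startswith(Prefix)][:MaxKeys]
        return {"KeyCount": len(matches), "Contents": [{"Key": k} for k in matches]}


stub = StubS3(["iris-model/input/train/iris_train.csv"])  # hypothetical object key
print(prefix_has_objects(stub, "my-bucket", "iris-model/input/train"))       # True
print(prefix_has_objects(stub, "my-bucket", "iris-model/input/validation"))  # False
```

With real credentials you would call it as `prefix_has_objects(boto3.client('s3'), sagemaker_session.default_bucket(), model_prefix + '/input/train')`.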

### 1.3 Alright! Now it's time to start the training process

In [None]:
import boto3
import io
import zipfile
import json

s3 = boto3.client('s3')
sts_client = boto3.client("sts")

session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name
print(region, account_id)
bucket_name = "mlops-%s-%s" % (region, account_id)
key_name = "training_jobs/%s/trainingjob.zip" % model_prefix

# Build the zip archive in memory ('w' creates a new archive)
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zf:
    zf.writestr('trainingjob.json', json.dumps(training_params))
    zf.writestr('deployment.json', json.dumps(deployment_params))

# Uploading the archive to the bucket triggers the MLOps pipeline
s3.put_object(Bucket=bucket_name, Key=key_name, Body=zip_buffer.getvalue())

### Ok, now open the AWS console in another tab and go to the CodePipeline console to see the status of our build pipeline
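If you prefer to poll from the notebook instead of the console, the CodePipeline `get_pipeline_state` API returns the latest execution status of each stage. The sketch below summarizes that response. The pipeline name and stage names are assumptions (look them up in the CodePipeline console), and `sample_state` is made up to show the response shape; it is not real output.

```python
def stage_statuses(pipeline_state: dict) -> dict:
    """Map each stage name to its latest execution status, or 'NotStarted' if it has none."""
    return {
        stage["stageName"]: stage.get("latestExecution", {}).get("status", "NotStarted")
        for stage in pipeline_state.get("stageStates", [])
    }


# With AWS access (pipeline name is hypothetical):
#   import boto3
#   state = boto3.client("codepipeline").get_pipeline_state(name="iris-model-pipeline")
#   print(stage_statuses(state))
sample_state = {
    "pipelineName": "iris-model-pipeline",
    "stageStates": [
        {"stageName": "Source", "latestExecution": {"status": "Succeeded"}},
        {"stageName": "Train", "latestExecution": {"status": "InProgress"}},
        {"stageName": "DeployDev"},
    ],
}
print(stage_statuses(sample_state))
# {'Source': 'Succeeded', 'Train': 'InProgress', 'DeployDev': 'NotStarted'}
```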

> Now, click on [THIS NOTEBOOK](02_Check%20Progress%20and%20Test%20the%20endpoint.ipynb) to see the progress and test your endpoint

# A/B TESTS

If you take a look at the **deployment** parameters, you'll see that we enabled A/B tests for the **Production** endpoint. To try this, deploy the first model into production, then run sections **1.2** and **1.3** again: section 1.2 generates a new timestamped job name, and SageMaker training job names must be unique. Feel free to change some hyperparameter values in section **1.1** first; just re-run the cells from 1.1 through 1.3 so the new values reach the descriptor.

When the second model is published into **Development**, the endpoint will be updated and the model will be replaced without compromising the user experience. This is the natural behavior of a SageMaker endpoint when you update it.

After you approve the deployment into **Production**, the endpoint will be updated and the second model will be added to it as a new variant. Now it's time to execute some **A/B tests**. In the **Progress** notebook (linked above), execute the last cell (test code) to show which model answered your request. Keep sending requests and you'll see the **Production** endpoint using both models, A and B, in the proportion defined by **InitialVariantWeight** in the deployment parameters.
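Each `invoke_endpoint` response from the SageMaker runtime includes an `InvokedProductionVariant` field naming the variant that served the request, so counting that field over many calls shows the traffic split. A minimal sketch; the endpoint name, the variant names and the CSV payload (one iris sample) are assumptions, and the stub runtime client only simulates a roughly 90/10 split so the code runs without AWS access.

```python
import random
from collections import Counter


def tally_variants(runtime_client, endpoint_name: str, payload: str, n_requests: int = 100) -> Counter:
    """Send n_requests to the endpoint and count which production variant answered each one."""
    counts = Counter()
    for _ in range(n_requests):
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name, ContentType="text/csv", Body=payload
        )
        counts[response["InvokedProductionVariant"]] += 1
    return counts


class StubRuntime:
    """Stand-in for boto3.client('sagemaker-runtime') that routes about 10% of calls to model B."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        variant = "model-b" if self.rng.random() < 0.1 else "model-a"
        return {"InvokedProductionVariant": variant, "Body": None}


counts = tally_variants(StubRuntime(), "iris-model-prd", "5.1,3.5,1.4,0.2", n_requests=1000)
print(counts)
```

Against the real endpoint, pass `boto3.client('sagemaker-runtime')` instead of the stub; the counts should approach the ratio of the variant weights as the number of requests grows.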

In a real-life scenario, you can monitor the performance of both models and then adjust the **Weight** of each model, either to complete the transition to the new model (and remove the old one) or to roll back the new deployment.

To adjust the weight of each model (identified by its variant name) in an endpoint, call `update_endpoint_weights_and_capacities`: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.update_endpoint_weights_and_capacities
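SageMaker routes traffic to each variant in proportion to its weight divided by the sum of all variant weights. For example, if the existing variant had weight 1.0 and the new one 0.1 (an assumption; the pipeline's weight for the existing variant isn't shown here), the new model would receive about 9% of requests. The sketch below computes those shares and shows the request shape for `update_endpoint_weights_and_capacities`, here for the final step of a transition where the new model takes all traffic. The endpoint and variant names are hypothetical (check the real ones with `describe_endpoint`), and a stub client records the call so the code runs without AWS.

```python
def traffic_shares(weights: dict) -> dict:
    """Fraction of requests each variant receives: its weight / sum of all weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def set_variant_weights(sm_client, endpoint_name: str, weights: dict):
    """Apply new weights to existing variants; instance counts are left unchanged."""
    return sm_client.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=[
            {"VariantName": name, "DesiredWeight": weight} for name, weight in weights.items()
        ],
    )


class StubSageMaker:
    """Stand-in for boto3.client('sagemaker') that records the request it receives."""

    def __init__(self):
        self.last_request = None

    def update_endpoint_weights_and_capacities(self, **kwargs):
        self.last_request = kwargs
        return {"EndpointArn": "stub"}


print(traffic_shares({"model-a": 1.0, "model-b": 0.1}))  # model-b gets 0.1 / 1.1, about 9%
stub = StubSageMaker()
set_variant_weights(stub, "iris-model-prd", {"model-a": 0.0, "model-b": 1.0})
print(stub.last_request)
```

Shifting weight in a few steps (for example 0.1, then 0.5, then 1.0 for the new variant) lets you watch the metrics at each stage; setting the old variant back to full weight is the rollback path.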