# Lab 5: Create an end to end Machine Learning Workflow using SageMaker Pipeline

## Introduction

In this notebook, you will build a machine learning workflow with SageMaker Pipelines that automates the end-to-end process of data preparation, model training, and deployment for a model that detects fraudulent automobile insurance claims. A SageMaker pipeline is a series of interconnected steps that are defined using the [Pipelines SDK](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html). These steps define the actions that the pipeline takes, and the relationships between the steps are defined using properties. We will use the pipeline steps to configure and create an end-to-end machine learning workflow.

## Index

---

1. [Prerequisites](#Prerequisites)
1. [Architecture](#Architecture)
1. [Creating a machine learning workflow using SageMaker Pipeline](#Creating-a-machine-learing-workflow-using-SageMaker-Pipeline)
1. [Clean-Up](#Clean-Up)

## Prerequisites

---



### Install required third-party libraries. 

The default SageMaker notebook container already has many of the packages we'll need, e.g., pandas, NumPy, and Matplotlib.

The next cell installs the following additional Python packages:
1. `sagemaker`: Python SDK to call the SageMaker API
2. `boto3`: Python SDK to call the AWS API

(If you encounter a warning about pip running as the root user, you can ignore it. Everything runs in a container, so it will not break anything.)

In [None]:
# Cell 1
!python -m pip install -Uq pip
!python -m pip install -q sagemaker==2.117.0 boto3==1.24.62

### Import libraries

In [None]:
# Cell 2
import json
import boto3
import pathlib
import sagemaker
import numpy as np
import pandas as pd
import string
import time

import scripts

from sagemaker.xgboost.estimator import XGBoost
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import CreateModelStep
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.parameters import ParameterInteger, ParameterFloat, ParameterString
from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum,
)
from sagemaker.lambda_helper import Lambda
from sagemaker import image_uris
from scripts.demo_helpers import delete_project_resources

In order to load the variables stored from previous notebooks, we need to run the following storemagic cell command.

In [None]:
# Cell 3
%store -r

### Set region and boto3 config

In [None]:
# Cell 4
# You can change this to a region of your choice
region = sagemaker.Session().boto_region_name
print("Using AWS Region: {}".format(region))

boto3.setup_default_session(region_name=region)
boto_session = boto3.Session(region_name=region)

s3_client = boto3.client("s3", region_name=region)

sagemaker_boto_client = boto_session.client("sagemaker")
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session, sagemaker_client=sagemaker_boto_client
)
sagemaker_role = sagemaker.get_execution_role()

account_id = boto3.client("sts").get_caller_identity()["Account"]

# These variables were stored in the previous notebooks
bucket = sagemaker_session.default_bucket()
print("Bucket: {}".format(bucket))
prefix = "fraud-detect-demo"
print("Prefix: {}".format(prefix))

claims_fg_name = f"{prefix}-claims"
customers_fg_name = f"{prefix}-customers"

In [None]:
# Cell 5
# ======> Tons of output_paths

training_job_output_path = f"s3://{bucket}/{prefix}/training_jobs"
bias_report_output_path = f"s3://{bucket}/{prefix}/clarify-bias"
explainability_output_path = f"s3://{bucket}/{prefix}/clarify-explainability"

train_data_uri = f"s3://{bucket}/{prefix}/data/train/train.csv"
test_data_uri = f"s3://{bucket}/{prefix}/data/test/test.csv"
processing_dir = "/opt/ml/processing"
create_dataset_script_uri = "scripts/create_dataset.py"
pipeline_bias_output_path = f"s3://{bucket}/{prefix}/clarify-output/pipeline/bias"

# ======> variables used for parameterizing the notebook run
flow_instance_count = 1
flow_instance_type = "ml.m5.4xlarge"

train_instance_count = 1
train_instance_type = "ml.m4.xlarge"

deploy_model_instance_type = "ml.m5.xlarge"

## Architecture

---

![End to end pipeline architecture](images/notebooks/e2e-pipeline.png)

## Creating a machine learing workflow using SageMaker Pipeline

---

- [Step 0: Data preparation prerequisite](#Step-0:-Data-preparation-prerequisite)
- [Step 1: Claims Data Wrangler preprocessing step](#Step-1:-Claims-Data-Wrangler-preprocessing-step)
- [Step 2: Customers Data Wrangler preprocessing step](#Step-2:-Customers-Data-Wrangler-preprocessing-step)
- [Step 3: Create dataset and train/test split](#Step-3:-Create-dataset-and-train/test-split)
- [Step 4: Train XGBoost model](#Step-4:-Train-XGBoost-model)
- [Step 5: Model pre-deployment step](#Step-5:-Model-pre-deployment-step)
- [Step 6: Register the model](#Step-6:-Register-the-model)
- [Step 7: Model deployment](#Step-7:-Model-deployment)
- [Step 8: Combine and run the Pipeline steps](#Step-8:-Combine-and-run-the-Pipeline-steps)



In this section you will create a pipeline which trains a new model, persists the model in SageMaker and then adds the model to the registry.

### Step 0: Data preparation prerequisite
If you have successfully completed labs 1 through 4, you can skip this step and proceed with [Step 1](#Step-1:-Claims-Data-Wrangler-preprocessing-step).

#### Create Feature Store
If you choose to run this step, it will take approximately 5 minutes to complete.

The next cell passes the bucket name, the region, and the prefix to the script. Check that they match the values printed earlier.

In [None]:
# Cell 6
!python3 scripts/create_feature_store.py --bucket $bucket --region $region --prefix "fraud-detect-demo"

#### Upload raw data to S3

In [None]:
# Cell 7
s3_client.upload_file(
    Filename="data/claims.csv", Bucket=bucket, Key=f"{prefix}/data/raw/claims.csv"
)
s3_client.upload_file(
    Filename="data/customers.csv", Bucket=bucket, Key=f"{prefix}/data/raw/customers.csv"
)

### Step 1: Claims Data Wrangler preprocessing step

#### Pipeline parameters
An important feature of SageMaker Pipelines is the ability to define the steps ahead of time, but also to be able to change the parameters to those steps at execution without having to re-define the pipeline. 

This can be achieved by using `ParameterInteger`, `ParameterFloat` or `ParameterString` to define an initial value, which can be overridden when you call `pipeline.start(parameters=parameters)` later. Only certain parameters can be defined this way.
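The idea of "declare once, override per execution" can be sketched in plain Python. This is a simplified model for illustration, not the SDK implementation: `StringParam` and `resolve_parameters` are hypothetical names, and rejecting undeclared parameter names is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringParam:
    """Hypothetical stand-in for a pipeline parameter: a name plus a default."""

    name: str
    default_value: str


def resolve_parameters(declared, overrides=None):
    """Return the values one execution would see: defaults, then overrides."""
    overrides = overrides or {}
    known = {p.name for p in declared}
    unknown = set(overrides) - known
    if unknown:
        # Assumption of this sketch: only declared parameters may be overridden.
        raise KeyError(f"Undeclared parameter(s): {sorted(unknown)}")
    return {p.name: overrides.get(p.name, p.default_value) for p in declared}


declared = [
    StringParam("TrainingInstance", "ml.m4.xlarge"),
    StringParam("ModelApprovalStatus", "PendingManualApproval"),
]

# The pipeline definition stays the same; only the per-run values differ.
default_run = resolve_parameters(declared)
custom_run = resolve_parameters(declared, {"TrainingInstance": "ml.m5.xlarge"})
print(default_run)
print(custom_run)
```

The parameter names and default values mirror the ones used in this notebook.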

In [None]:
# Cell 8
train_instance_param = ParameterString(
    name="TrainingInstance",
    default_value="ml.m4.xlarge",
)

model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)

#### Update attributes within the .flow file

##### A quick introduction to `.flow` files

We use Amazon SageMaker Data Wrangler to create transforms on datasets. The transforms are exported as a `.flow` file, which can be shared between teams. These `.flow` files are specialized JSON files that can be read using Python's `json` library. Using these files, you can apply the transforms to a dataset by running a Data Wrangler job. If you wish to learn more about this process, please navigate to the bonus material: Data exploration using Amazon SageMaker Data Wrangler in workshop studio.

We have already provided you with flow templates for the claims and customers datasets. These are `claims_flow_template` and `customers_flow_template`, respectively, under the `outputs` folder. These files contain references to the S3 bucket used when they were created.

<font color="orange">You will now need to change the pre-defined S3 bucket references to your own S3 bucket to use these files</font>

Once the cell below is executed, you can open the `claims.flow` and `customers.flow` files and export the data to S3 or you can continue the guide using the provided `data/claims_preprocessed.csv` and `data/customers_preprocessed.csv` files. The latter is the recommended approach.
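The substitution relies on Python's `string.Template`. As a minimal sketch, assuming the templates use `${bucket}` and `${prefix}` placeholders (the toy template and the bucket name below are hypothetical and far shorter than a real `.flow` file):

```python
import json
import string

# Hypothetical, heavily shortened template; real .flow templates are much larger.
template_text = """
{
  "nodes": [
    {
      "node_id": "example-node",
      "parameters": {
        "dataset_definition": {
          "name": "claims.csv",
          "s3ExecutionContext": {
            "s3Uri": "s3://${bucket}/${prefix}/data/raw/claims.csv"
          }
        }
      }
    }
  ]
}
"""

variables = {"bucket": "my-example-bucket", "prefix": "fraud-detect-demo"}

# Fill in the placeholders first, then parse the result as JSON.
flow = json.loads(string.Template(template_text).substitute(variables))
data_def = flow["nodes"][0]["parameters"]["dataset_definition"]
s3_uri = data_def["s3ExecutionContext"]["s3Uri"]
print(s3_uri)

# substitute() raises KeyError when a placeholder has no value,
# which surfaces a missing bucket or prefix early.
try:
    string.Template(template_text).substitute({"bucket": "only-bucket"})
    missing_name = None
except KeyError as missing:
    missing_name = missing.args[0]
print(missing_name)
```

Using `substitute` rather than `safe_substitute` means an unfilled placeholder fails loudly instead of ending up in the generated flow file.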

In [None]:
# Cell 9
claims_flow_template_file = "outputs/claims_flow_template"

# Open claims_flow_template and substitute the new S3 bucket and prefix
with open(claims_flow_template_file, "r") as f:
    variables = {"bucket": bucket, "prefix": prefix}
    template = string.Template(f.read())
    claims_flow = template.substitute(variables)
    claims_flow = json.loads(claims_flow)

# then write it to outputs/claims.flow 
with open("outputs/claims.flow", "w") as f:
    json.dump(claims_flow, f)


customers_flow_template_file = "outputs/customers_flow_template"

# Next do the same thing with customers_flow_template and substitute the new S3 bucket and prefix
with open(customers_flow_template_file, "r") as f:
    variables = {"bucket": bucket, "prefix": prefix}
    template = string.Template(f.read())
    customers_flow = template.substitute(variables)
    customers_flow = json.loads(customers_flow)

# Write it to the outputs/customers.flow file
with open("outputs/customers.flow", "w") as f:
    json.dump(customers_flow, f)

We'll use `claims.flow` and `customers.flow` files. Both these files should now be available in the folder `outputs`.

#### Define the first Data Wrangler step's inputs

Upload the `claims.flow` file to S3.

In [None]:
# Cell 10
s3_client.upload_file(
    Filename="outputs/claims.flow", Bucket=bucket, Key=f"{prefix}/dataprep-notebooks/claims.flow"
)
print("Claims flow file uploaded to S3")

In [None]:
# Cell 11
# the path to the S3 claims flow file 
claims_flow_uri = f"s3://{bucket}/{prefix}/dataprep-notebooks/claims.flow"

# read in the claims.flow file
with open("outputs/claims.flow", "r") as f:
    claims_flow = json.load(f)

flow_step_inputs = []

# flow file contains the code for each transformation
flow_file_input = sagemaker.processing.ProcessingInput(
    source=claims_flow_uri, destination=f"{processing_dir}/flow", input_name="flow"
)

flow_step_inputs.append(flow_file_input)

# parse the flow file for S3 inputs to the Data Wrangler job
for node in claims_flow["nodes"]:
    if "dataset_definition" in node["parameters"]:
        data_def = node["parameters"]["dataset_definition"]
        name = data_def["name"]
        s3_input = sagemaker.processing.ProcessingInput(
            source=data_def["s3ExecutionContext"]["s3Uri"],
            destination=f"{processing_dir}/{name}",
            input_name=name,
        )
        flow_step_inputs.append(s3_input)

#### Define outputs for first Data Wrangler step

The flow file can be loaded as a Python dictionary. The output name is a string built from the ID of the last node in `claims.flow` and the name of that node's first output.
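A small sketch of how such an output name is assembled, using a hypothetical two-node flow (the node IDs and output names here are made up for illustration):

```python
# Hypothetical two-node flow; real node IDs are generated by Data Wrangler.
toy_flow = {
    "nodes": [
        {"node_id": "node-source", "outputs": [{"name": "default"}]},
        {"node_id": "node-transform", "outputs": [{"name": "default"}]},
    ]
}

# The last node in the list is treated as the final transform of the flow.
last_node = toy_flow["nodes"][-1]
output_name = f"{last_node['node_id']}.{last_node['outputs'][0]['name']}"
print(output_name)  # node-transform.default
```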

In [None]:
# Cell 12
claims_output_name = (
    f"{claims_flow['nodes'][-1]['node_id']}.{claims_flow['nodes'][-1]['outputs'][0]['name']}"
)

print(f"Claims output name is {claims_output_name}")

flow_step_outputs = []

# create the ProcessingOutput
flow_output = sagemaker.processing.ProcessingOutput(
    output_name=claims_output_name,
    feature_store_output=sagemaker.processing.FeatureStoreOutput(feature_group_name=claims_fg_name),
    app_managed=True,
)

flow_step_outputs.append(flow_output)

#### Define processor and processing step

Use a processing workflow step to create a job for data processing. For more information on processing jobs, see [Process Data and Evaluate Model](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html).

A processing step requires a processor, a Python script that defines the processing code, outputs for processing, and job arguments.
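The job argument used for the Data Wrangler steps embeds a JSON document inside single quotes. The sketch below builds such an argument for a hypothetical output name and checks that the quoting round-trips. How the container actually parses the argument is not shown in this notebook, so `shlex` is used here only as a stand-in parser.

```python
import json
import shlex

output_name = "node-transform.default"  # hypothetical output name
output_config = {output_name: {"content_type": "CSV"}}

# Same shape as the job argument built for the processing steps in this lab.
job_argument = f"--output-config '{json.dumps(output_config)}'"
print(job_argument)

# Shell-style splitting yields the flag and the intact JSON payload.
flag, payload = shlex.split(job_argument)
recovered = json.loads(payload)
print(flag, recovered)
```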

In [None]:
# Cell 13
# Pulls the latest data-wrangler container tag, i.e. "1.x"
image_uri = image_uris.retrieve(framework="data-wrangler", region=region)

print("image_uri: {}".format(image_uri))

flow_processor = sagemaker.processing.Processor(
    role=sagemaker_role,
    image_uri=image_uri,
    instance_count=flow_instance_count,
    instance_type=flow_instance_type,
    max_runtime_in_seconds=86400,
)

output_content_type = "CSV"

# Output configuration used as processing job container arguments
claims_output_config = {claims_output_name: {"content_type": output_content_type}}

claims_flow_step = ProcessingStep(
    name="ClaimsDataWranglerProcessingStep",
    processor=flow_processor,
    inputs=flow_step_inputs,
    outputs=flow_step_outputs,
    job_arguments=[f"--output-config '{json.dumps(claims_output_config)}'"],
)

### Step 2: Customers Data Wrangler preprocessing step

Similarly, upload the `customers.flow` file to S3.

In [None]:
# Cell 14
s3_client.upload_file(
    Filename="outputs/customers.flow", Bucket=bucket, Key=f"{prefix}/dataprep-notebooks/customers.flow"
)

print("Customers flow file uploaded to S3")

In [None]:
# Cell 15
# the s3 data uri
customers_flow_uri = f"s3://{bucket}/{prefix}/dataprep-notebooks/customers.flow"

# read in the flow file
with open("outputs/customers.flow", "r") as f:
    customers_flow = json.load(f)

flow_step_inputs = []

# create a ProcessingInput step
# flow file contains the code for each transformation
flow_file_input = sagemaker.processing.ProcessingInput(
    source=customers_flow_uri, destination=f"{processing_dir}/flow", input_name="flow"
)

flow_step_inputs.append(flow_file_input)

# parse the flow file for S3 inputs to the Data Wrangler job
for node in customers_flow["nodes"]:
    if "dataset_definition" in node["parameters"]:
        data_def = node["parameters"]["dataset_definition"]
        name = data_def["name"]
        s3_input = sagemaker.processing.ProcessingInput(
            source=data_def["s3ExecutionContext"]["s3Uri"],
            destination=f"{processing_dir}/{name}",
            input_name=name,
        )
        flow_step_inputs.append(s3_input)

In [None]:
# Cell 16
# output name is created from variables found within the flow file which can be loaded as a dict.
customers_output_name = (
    f"{customers_flow['nodes'][-1]['node_id']}.{customers_flow['nodes'][-1]['outputs'][0]['name']}"
)

print(f"Customers output name is {customers_output_name}")

flow_step_outputs = []

flow_output = sagemaker.processing.ProcessingOutput(
    output_name=customers_output_name,
    feature_store_output=sagemaker.processing.FeatureStoreOutput(
        feature_group_name=customers_fg_name
    ),
    app_managed=True,
)

flow_step_outputs.append(flow_output)

output_content_type = "CSV"

# Output configuration used as processing job container arguments
customers_output_config = {customers_output_name: {"content_type": output_content_type}}

# define a processing step
customers_flow_step = ProcessingStep(
    name="CustomersDataWranglerProcessingStep",
    processor=flow_processor,
    inputs=flow_step_inputs,
    outputs=flow_step_outputs,
    job_arguments=[f"--output-config '{json.dumps(customers_output_config)}'"],
)

### Step 3: Create dataset and train/test split

Next, we will use a SageMaker Pipelines `ProcessingStep` with an `SKLearnProcessor` to split the dataset into training and test sets. You can split the dataset without a processing step, but for large datasets it is recommended to use managed processing jobs.
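For intuition, a train/test split can be sketched with the standard library alone. The contents of `scripts/create_dataset.py` are not shown in this notebook, so the 80/20 ratio, the fixed seed, and the function below are assumptions for illustration only.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut off the first part as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for dataset records
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 8 2
```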

In [None]:
# Cell 17
create_dataset_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=sagemaker_role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="fraud-detection-demo-create-dataset",
    sagemaker_session=sagemaker_session,
)

create_dataset_step = ProcessingStep(
    name="CreateDataset",
    processor=create_dataset_processor,
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="train_data", source="/opt/ml/processing/output/train"
        ),
        sagemaker.processing.ProcessingOutput(
            output_name="test_data", source="/opt/ml/processing/output/test"
        ),
    ],
    job_arguments=[
        "--claims-feature-group-name",
        claims_fg_name,
        "--customers-feature-group-name",
        customers_fg_name,
        "--bucket-name",
        bucket,
        "--bucket-prefix",
        prefix,
        "--region",
        region,
    ],
    code=create_dataset_script_uri,
    depends_on=[claims_flow_step.name, customers_flow_step.name],
)

### Step 4: Train XGBoost model

In this step, we will use Amazon SageMaker's XGBoost algorithm to train on this dataset. Configure an estimator for the XGBoost algorithm and the input dataset. A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model to `model_dir` so that it can be hosted later. You also specify the output path where the trained model is saved. We will use the `ParameterString` `train_instance_param` defined at the beginning of the pipeline. Note that the `instance_type` parameter may be used in multiple places in the pipeline. In this case, the `instance_type` is passed into the estimator.
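The argument handling of such a training script might look like the sketch below. The actual `scripts/xgboost_starter_script.py` is not shown here, so this is an assumption about its shape: hyperparameters arrive as command-line arguments, and the model and data locations default to the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables that SageMaker script mode sets inside the container.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and container paths for a script-mode training job."""
    parser = argparse.ArgumentParser()

    # Hyperparameters; defaults mirror the values used in this notebook.
    parser.add_argument("--max_depth", type=int, default=3)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--objective", type=str, default="binary:logistic")
    parser.add_argument("--num_round", type=int, default=100)

    # Locations inside the training container, with fallbacks for local runs.
    parser.add_argument(
        "--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    parser.add_argument(
        "--train",
        type=str,
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    return parser.parse_args(argv)


# Hyperparameter values are strings on the command line; argparse converts them.
args = parse_args(["--max_depth", "3", "--eta", "0.2", "--num_round", "100"])
print(args.max_depth, args.eta, args.objective, args.num_round)
```

This is why the `hyperparameters` dictionary can hold string values such as `"3"` and `"0.2"`: the script converts them to numeric types when parsing.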

Finally, use the estimator instance to construct a `TrainingStep`, with the `properties` of the prior `ProcessingStep` as its inputs. The training code runs when a pipeline execution reaches this step. This is similar to calling an estimator's `fit` method in the Python SDK.

Pass in the `S3Uri` of the `"train_data"` output channel to the `TrainingStep`. The other `"test_data"` output channel can be used for model evaluation in the pipeline. The `properties` attribute of a pipeline step matches the object model of the corresponding response of a describe call. These properties can be referenced as placeholder values and are resolved at runtime. For example, the `ProcessingStep` `properties` attribute matches the object model of the [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html) response object.
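To illustrate what "placeholder values resolved at runtime" means, the sketch below records an access path instead of holding a real value. `StepProperty` is a hypothetical class written for this example and is not part of the SageMaker SDK; the `{"Get": ...}` form is meant only to resemble the kind of reference that ends up in a pipeline definition.

```python
class StepProperty:
    """Records an access path instead of holding a real value."""

    def __init__(self, path):
        self._path = path

    def __getattr__(self, name):
        # Only called for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        return StepProperty(f"{self._path}.{name}")

    def __getitem__(self, key):
        return StepProperty(f"{self._path}[{key!r}]")

    @property
    def expr(self):
        """Serializable reference that a service could resolve later."""
        return {"Get": self._path}


create_dataset = StepProperty("Steps.CreateDataset")
train_uri = create_dataset.ProcessingOutputConfig.Outputs["train_data"].S3Output.S3Uri
print(train_uri.expr)

# At runtime, the reference is swapped for the value the finished job reported.
runtime_values = {
    train_uri.expr["Get"]: "s3://example-bucket/example-prefix/train"  # made-up URI
}
resolved = runtime_values[train_uri.expr["Get"]]
print(resolved)
```

No S3 location exists when the pipeline is defined, so the definition can only carry the path to where the value will appear once the processing job has run.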


In [None]:
# Cell 18
hyperparameters = {
    "max_depth": "3",
    "eta": "0.2",
    "objective": "binary:logistic",
    "num_round": "100",
}

xgb_estimator = XGBoost(
    entry_point="scripts/xgboost_starter_script.py",
    output_path=training_job_output_path,
    code_location=training_job_output_path,
    hyperparameters=hyperparameters,
    role=sagemaker_role,
    instance_count=train_instance_count,
    instance_type=train_instance_param.default_value,
    framework_version="1.0-1",
)

train_step = TrainingStep(
    name="XgboostTrain",
    estimator=xgb_estimator,
    inputs={
        "train": sagemaker.inputs.TrainingInput(
            s3_data=create_dataset_step.properties.ProcessingOutputConfig.Outputs[
                "train_data"
            ].S3Output.S3Uri
        )
    },
)

### Step 5: Model pre-deployment step


Create a SageMaker model from the training output so that it can be deployed later. Specifically, pass in the `S3ModelArtifacts` from the `properties` of the `TrainingStep` (`train_step`). The `TrainingStep` `properties` attribute matches the object model of the [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html) response object.

In [None]:
# Cell 19
model = sagemaker.model.Model(
    name="fraud-detection-demo-pipeline-xgboost",
    image_uri=train_step.properties.AlgorithmSpecification.TrainingImage,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sagemaker_session,
    role=sagemaker_role,
)

inputs = sagemaker.inputs.CreateModelInput(instance_type="ml.m4.xlarge")

create_model_step = CreateModelStep(name="ModelPreDeployment", model=model, inputs=inputs)

### Step 6: Register the model
In this step you will use the ParameterString `model_approval_status` defined at the outset of the pipeline code.

Use the estimator instance specified in the training step to construct an instance of `RegisterModel`. The result of executing `RegisterModel` in a pipeline is a model package. A model package is an abstraction of reusable model artifacts that packages all ingredients required for inference. Primarily, it consists of an inference specification that defines the inference image to use along with an optional model weights location.

A model package group is a collection of model packages. A model package group can be created for a specific ML business problem, and new versions of the model packages can be added to it. Typically, customers are expected to create a ModelPackageGroup for a SageMaker pipeline so that model package versions can be added to the group for every SageMaker Pipeline run.

The construction of `RegisterModel` is similar to an estimator instance's `register` method in the Python SDK.

As in the previous step, pass in the `S3ModelArtifacts` from the `properties` of the `TrainingStep` (`train_step`).

In [None]:
# Cell 20
mpg_name = prefix


register_step = RegisterModel(
    name="XgboostRegisterModel",
    estimator=xgb_estimator,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=mpg_name,
    approval_status=model_approval_status,
)

### Step 7: Model deployment

Once the model is registered, the next step is deploying the model. In a typical MLOps scenario, the model building pipeline is separate from the model deployment pipeline. However, using a Lambda function step, you can choose to deploy the model as part of the SageMaker pipeline.

The SageMaker SDK provides a Lambda helper class that can be used to create a Lambda function. This function is provided to the Lambda step for invocation via the pipeline. Alternatively, a predefined Lambda function can be provided to the Lambda step.

In this step, you will work with a pre-created Lambda function that was already provisioned by the CloudFormation template.
The SageMaker Execution Role requires the policy `AmazonSageMakerPipelinesIntegrations` to create the Lambda function, and the Lambda function needs a role with policies allowing creation of a SageMaker endpoint.
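The code of the pre-created Lambda function is not shown in this notebook. As a rough sketch of what such a deployment handler could look like, the example below creates an endpoint configuration and an endpoint from the step inputs. The `sm_client` argument, the `instance_type` default, the variant name, and the `RecordingClient` test double are assumptions added so the handler can be exercised without AWS access.

```python
import json


def lambda_handler(event, context, sm_client=None):
    """Create an endpoint config and an endpoint from the Lambda step inputs."""
    if sm_client is None:
        import boto3  # imported lazily so the dry run below needs no AWS access

        sm_client = boto3.client("sagemaker")

    sm_client.create_endpoint_config(
        EndpointConfigName=event["endpoint_config_name"],
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": event["model_name"],
                "InitialInstanceCount": 1,
                # Assumed default; the real function may choose differently.
                "InstanceType": event.get("instance_type", "ml.m5.xlarge"),
            }
        ],
    )
    sm_client.create_endpoint(
        EndpointName=event["endpoint_name"],
        EndpointConfigName=event["endpoint_config_name"],
    )
    # Keys correspond to the LambdaOutput names declared for the step.
    return {
        "statusCode": 200,
        "body": json.dumps("Created endpoint"),
        "other_key": "example_value",
    }


class RecordingClient:
    """Test double that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_endpoint_config(self, **kwargs):
        self.calls.append(("create_endpoint_config", kwargs))

    def create_endpoint(self, **kwargs):
        self.calls.append(("create_endpoint", kwargs))


# Dry run with made-up names; nothing is created in AWS.
fake = RecordingClient()
event = {
    "model_name": "demo-model",
    "endpoint_config_name": "demo-endpoint-config",
    "endpoint_name": "demo-endpoint",
}
result = lambda_handler(event, None, sm_client=fake)
print(result["statusCode"], [name for name, _ in fake.calls])
```

The `event` keys match the `inputs` dictionary passed to the Lambda step, and the returned keys match its declared outputs.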



#### Copy Lambda ARN from CloudFormation Outputs

<font color="orange">Note: Please open the [CloudFormation console](https://console.aws.amazon.com/cloudformation/home) and copy the Lambda ARN of the generated function (under the Outputs tab). </font>
    
<font color="orange">This is a mandatory step to proceed further.</font>

In [None]:
# Cell 21
function_arn="ADD YOUR ARN HERE"

We will define the Lambda step in the next block. After defining object names, we use the pre-created Lambda function by passing its ARN.

In [None]:
# Cell 22
# Use the current time to define unique names for the resources created
current_time = time.strftime("%m-%d-%H-%M-%S", time.localtime())

model_name = create_model_step.properties.ModelName
endpoint_config_name = "fraud-endpoint-config" + current_time
endpoint_name = "fraud-endpoint-" + current_time
function_name = "sagemaker-fraud-lambda-step" + current_time

# The dictionary returned by the Lambda function is captured by LambdaOutput; each key in the dictionary
# corresponds to a LambdaOutput
output_param_1 = LambdaOutput(output_name="statusCode", output_type=LambdaOutputTypeEnum.String)
output_param_2 = LambdaOutput(output_name="body", output_type=LambdaOutputTypeEnum.String)
output_param_3 = LambdaOutput(output_name="other_key", output_type=LambdaOutputTypeEnum.String)

# The inputs provided to the Lambda function can be retrieved via the `event` object within the `lambda_handler` function
deploy_step = LambdaStep(
    name="ModelDeploy",
    lambda_func=Lambda(
        function_arn=function_arn
    ),
    inputs={
        "model_name": model_name,
        "endpoint_config_name": endpoint_config_name,
        "endpoint_name": endpoint_name,
    },
    outputs=[output_param_1, output_param_2, output_param_3],
)

### Step 8: Combine and run the Pipeline steps

Listing the parameters and steps in execution order makes the pipeline easier to reason about, but it is not required. The pipeline service derives the execution order from the dependencies between steps.

In this section, we combine the steps into a pipeline so it can be executed.

A pipeline requires a `name`, `parameters`, and `steps`. Names must be unique within an `(account, region)` pair.

**Note**

* All the parameters used in the definitions must be present.
* Steps passed into the pipeline do not have to be listed in the order of execution. The SageMaker Pipelines service resolves the data dependency DAG to determine the order in which the steps run.
* Steps must be unique across the pipeline step list and all condition step if/else lists.
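How an execution order can be derived from dependencies alone is sketched below with Python's `graphlib` (Python 3.9+). The step names follow this notebook, and the dependency map is written out by hand from the `depends_on` and property references used in the steps above. It is an illustration of DAG resolution, not the service's actual scheduler.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
dependencies = {
    "ClaimsDataWranglerProcessingStep": set(),
    "CustomersDataWranglerProcessingStep": set(),
    "CreateDataset": {
        "ClaimsDataWranglerProcessingStep",
        "CustomersDataWranglerProcessingStep",
    },
    "XgboostTrain": {"CreateDataset"},
    "ModelPreDeployment": {"XgboostTrain"},
    "XgboostRegisterModel": {"XgboostTrain"},
    "ModelDeploy": {"ModelPreDeployment"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for position, step in enumerate(order, start=1):
    print(position, step)
```

Several valid orders exist (for example, the two Data Wrangler steps are independent of each other), which is why the order of the `steps` list does not matter.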

In [None]:
# Cell 23
pipeline_name = "FraudDetectDemo"

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[train_instance_param, model_approval_status],
    steps=[
        claims_flow_step,
        customers_flow_step,
        create_dataset_step,
        train_step,
        create_model_step,
        register_step,
        deploy_step,
    ],
)

## Submit the pipeline definition to the SageMaker Pipelines service
Note: If an existing pipeline has the same name it will be overwritten.

In [None]:
# Cell 24
pipeline.upsert(role_arn=sagemaker_role)

## View the entire pipeline definition
Viewing the pipeline definition with all the string variables interpolated may help debug pipeline bugs. It is an optional step for this workshop and commented out here. If you want to see how the pipeline is constructed, you can uncomment the code.

In [None]:
# Cell 25
# json.loads(pipeline.describe()["PipelineDefinition"])

## Run the pipeline
Note: This will take about 15 minutes to complete. You can watch the progress of the pipeline job in your SageMaker Studio Components panel. Please feel free to explore the additional resources section in the workshop studio while this step is executing.

To check the visual status of the pipeline:

1. Click the Home folder pointed to by the arrow and click on `Pipelines`.
2. You will see the available pipelines in the table on the right.
3. Click on `FraudDetectDemo`.


![pipeline-list](images/notebooks/pipeline-navigate.png)

Next, you will see the executions listed on the next page. Double-click on the Status `executing` to be taken to the graph representation.


![pipeline-execution](images/notebooks/pipeline-executions.png)

In [None]:
# Cell 26
# Special pipeline parameters can be defined or changed here
parameters = {"TrainingInstance": deploy_model_instance_type}

In [None]:
# Cell 27
start_response = pipeline.start(parameters=parameters)

In [None]:
# Cell 28
start_response.wait(delay=60, max_attempts=500)
start_response.describe()


After completion, the pipeline graph will look something like this:

![Pipeline-success](images/notebooks/pipeline-success.png)

## Clean Up
----
After running the lab, you should remove the resources which were created. You can also delete all the objects in the project's S3 directory by passing the keyword argument `delete_s3_objects=True`.

In [None]:
# Cell 29
delete_project_resources(
    sagemaker_boto_client=sagemaker_boto_client,
    pipeline_name=pipeline_name,
    mpg_name=mpg_name,
    prefix=prefix,
    delete_s3_objects=False,
    bucket_name=bucket,
)

**Congratulations!** 

You have successfully completed lab 5.