# Part 4: Create an End to End Pipeline

<div class="alert alert-warning"> <h4><strong>🛑 PRE-REQUISITE</strong></h4>
Before you can run this notebook, you must first run the following notebooks included in this project:
    <ul>
        <li><a href="./1-data-analysis-prep.ipynb">1-data-analysis-prep.ipynb</a></li>
        <li><a href="./2-afd-model-setup.ipynb">2-afd-model-setup.ipynb</a></li>
        <li><a href="./3-afd-model-train-deploy.ipynb">3-afd-model-train-deploy.ipynb</a></li>
        <li><a href="./4-0-custom-container.ipynb">4-0-custom-container.ipynb</a></li>
    </ul>
    Also ensure that you have the latest version of the SageMaker Python SDK before proceeding, by running the code cell below. Once the SageMaker SDK is updated, restart the kernel using the menu "Kernel" > "Restart Kernel".
</div>

In [None]:
!pip install --upgrade sagemaker

## Overview <a id='overview'></a>

* [Notebook 1: Data Preparation, Process, and Store Features](./1-data-analysis-prep.ipynb)
* [Notebook 2: Amazon Fraud Detector Model Setup](./2-afd-model-setup.ipynb)
* [Notebook 3: Model training, deployment, real-time and batch inference](./3-afd-model-train-deploy.ipynb)
* **[Notebook 4: Create an end-to-end pipeline](./4-afd-pipeline.ipynb)**
    * [Introduction](#intro)
    * [Setup notebook](#setup)
    * [Setup Pipeline Parameters & Steps](#pipeline)
        * **Step 1:** [Signup attempts Data Wrangler Preprocessing Step](#step1)
        * **Step 2:** [Outcomes Data Wrangler Preprocessing Step](#step2)
        * **Step 3:** [Create Training Data Set Step](#step3)
        * **Step 4:** [Train Amazon Fraud Detector Model Step](#step4)
        * **Step 5:** [Check AUC Metric (Area Under the ROC Curve) Condition](#step5)
        * **Step 6:** [Activate Amazon Fraud Detector Model Step](#step6)        
        * **Step 7:** [Setup Amazon Fraud Detector Model detector Step](#step7)
    * [Combine the Pipeline Steps and Run](#define-pipeline)
    * [Delete Pipeline and Cleanup (Optional)](#delete)

### 1. Introduction <a id="intro"></a>
___
<a href="#overview">overview</a>

In this notebook, we will build a [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) that automates the entire end-to-end process. Recall that we initially performed all the steps manually, experimenting as a data scientist would: testing each segment hands-on and determining, for example, which transformations should be applied to the features and which features should be added to the training data file. Now we will automate these steps, and perhaps pass on the responsibility to an ML Engineer or MLOps role.

<img src="images/nb4.png" width="800" height="800"/>


### 2. Setup <a id="setup"></a>
----
<a href="#overview">overview</a>

As part of setup, we will import necessary libraries.

In [1]:
import json
import boto3
import pathlib
import sagemaker
import numpy as np
import pandas as pd
import awswrangler as wr

# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this use)
from IPython.display import display, HTML, clear_output, JSON

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import CreateModelStep
from sagemaker.processing import ScriptProcessor  # ScriptProcessor is defined in sagemaker.processing
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.parameters import ParameterInteger, ParameterFloat, ParameterString
from sagemaker.workflow.properties import PropertyFile

#### 2.1 Set region and boto3 config

In [2]:
# The region is read from the current SageMaker session; assign `region` explicitly to use a different one
region = sagemaker.Session().boto_region_name
print("Using AWS Region: {}".format(region))

boto3.setup_default_session(region_name=region)
boto_session = boto3.Session(region_name=region)

s3_client = boto3.client('s3', region_name=region)

sagemaker_boto_client = boto_session.client('sagemaker')
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_boto_client)
sagemaker_role = sagemaker.get_execution_role()

account_id = boto3.client('sts').get_caller_identity()["Account"]

Using AWS Region: us-east-2


#### 2.2 Initialize variables

In [3]:
# Let's pull some of the variables from the cache
%store -r MODEL_NAME
%store -r DETECTOR_NAME
%store -r S3_FILE_LOC
%store -r DATA_ACCESS_ROLE_ARN

%store -r signups_fg_name 
%store -r outcomes_fg_name
%store -r signup_attempts_table
%store -r signup_outcomes_table
%store -r afd_database_name
%store -r afd_bucket
%store -r afd_prefix

processing_dir = "/opt/ml/processing"
create_dataset_script_uri = f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/code/create_dataset.py'
train_model_script_uri = f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/code/train_afd.py'
activate_model_script_uri = f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/code/activate_afd.py'
setup_detector_script_uri = f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/code/setup_detector.py'

#======> variables used for parameterizing the notebook run
flow_instance_count = 1
flow_instance_type = "ml.m5.4xlarge"

train_instance_count = 1
train_instance_type = "ml.t3.medium"

### 3. Pipeline Parameters <a id="pipeline"></a>
---

An important feature of SageMaker Pipelines is the ability to define the steps ahead of time and still change the parameters to those steps at execution time, without having to redefine the pipeline. This is achieved by using `ParameterInteger`, `ParameterFloat` or `ParameterString` to define a value upfront, which can then be overridden when you call `pipeline.start(parameters=parameters)` later. Only certain parameters can be defined this way. Check out the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html) to learn more about Pipeline Parameters.

In [4]:

model_name_param = ParameterString(
    name="AFDModelName",
    default_value=MODEL_NAME,
)

detector_name_param = ParameterString(
    name="AFDDetectorName",
    default_value=DETECTOR_NAME,
)

data_role_param = ParameterString(
    name="DataAccessRoleARN",
    default_value=DATA_ACCESS_ROLE_ARN
)

data_path_param = ParameterString(
    name="TrainDataS3Path",
    default_value=f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/train-data'
)

auc_threshold_param = ParameterFloat(
    name="AFDAUCThreshold",
    default_value=0.75
)
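
As a rough mental model (not the SDK's implementation), parameter resolution at execution time behaves like a dictionary merge: each parameter keeps its default unless the caller supplies an override under the same name. The helper `resolve_parameters` and the placeholder default values below are hypothetical; only the parameter names come from the cell above.

```python
def resolve_parameters(defaults, overrides=None):
    """Return the effective parameter values for one execution.

    Hypothetical helper that mimics, in plain Python, how defaults and
    execution-time overrides combine. It is not part of the SageMaker SDK.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # This sketch treats names that were never defined as an error.
        raise KeyError(f"Unknown pipeline parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# Placeholder defaults; the names match the parameters defined above.
defaults = {
    "AFDModelName": "example_model",
    "AFDDetectorName": "example_detector",
    "AFDAUCThreshold": 0.75,
}

effective = resolve_parameters(defaults, {"AFDAUCThreshold": 0.9})
print(effective["AFDAUCThreshold"])  # 0.9
print(effective["AFDModelName"])     # example_model
```

Only the threshold changes for that one execution; the stored pipeline definition and its defaults stay as they were.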

### Step 1: Signup attempts Data Wrangler Preprocessing Step <a id='step1'></a>
---
<a href="#overview">overview</a>

Recall that in the first notebook we processed our raw `signup_attempts.csv` file using the `signup_attempts.flow` file. The flow file is a SageMaker Data Wrangler construct in which we defined all the necessary transformations. Once the flow file was ready, we executed it (by dynamically generating a Jupyter notebook using the "Export to S3" option in the flow) to generate our `signup_attempts_preprocessed.csv` file. In this step, we will use the same flow file, but instead of running it manually, we will set it up so that the SageMaker pipeline does the job for us.

#### Upload flow to S3
The flow file becomes an input to the first step and, as such, needs to be in S3. You may reuse the S3 location of the flow file from the first notebook; however, it is recommended that you store pipeline-specific artifacts under a separate bucket or prefix.

In [5]:
s3_client.upload_file(Filename='signup_attempts.flow', Bucket=afd_bucket, Key=f'{afd_prefix}/afd-pipeline/dataprep-notebooks/signup_attempts.flow')
signups_flow_uri = f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/dataprep-notebooks/signup_attempts.flow'
print(f"signup_attempts flow file uploaded to S3")

signup_attempts flow file uploaded to S3


#### Define the first Data Wrangler step's inputs

In [6]:
with open('signup_attempts.flow', 'r') as f:
    signups_flow = json.load(f)

flow_step_inputs_1 = []

# flow file contains the code for each transformation
flow_file_input = sagemaker.processing.ProcessingInput(
    source=signups_flow_uri,            
    destination=f"{processing_dir}/flow", 
    input_name='flow')

flow_step_inputs_1.append(flow_file_input)

# parse the flow file for S3 inputs to the Data Wrangler job
for node in signups_flow["nodes"]:
    if "dataset_definition" in node["parameters"]:
        data_def = node["parameters"]["dataset_definition"]
        name = data_def["name"]
        s3_input = sagemaker.processing.ProcessingInput(
            source=data_def["s3ExecutionContext"]["s3Uri"], 
            destination=f'{processing_dir}/{name}', 
            input_name=name)
        flow_step_inputs_1.append(s3_input)
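
To see what the loop above extracts without touching S3, here is a small sketch that runs the same traversal over a made-up flow document. The `sample_flow` contents, the bucket name and the helper `extract_s3_inputs` are illustrative assumptions; only the key names (`nodes`, `parameters`, `dataset_definition`, `name`, `s3ExecutionContext`, `s3Uri`) are taken from the cell above.

```python
# Made-up flow document with the same shape the loop above relies on.
sample_flow = {
    "nodes": [
        {
            "node_id": "node-1",
            "parameters": {
                "dataset_definition": {
                    "name": "signup_attempts.csv",
                    "s3ExecutionContext": {
                        "s3Uri": "s3://example-bucket/raw/signup_attempts.csv"
                    },
                }
            },
        },
        # Nodes without a dataset_definition (for example transforms) are skipped.
        {"node_id": "node-2", "parameters": {"operator_parameters": {}}},
    ]
}


def extract_s3_inputs(flow, processing_dir="/opt/ml/processing"):
    """Return (input_name, s3_uri, container_path) for every S3 source node."""
    inputs = []
    for node in flow["nodes"]:
        data_def = node.get("parameters", {}).get("dataset_definition")
        if data_def is None:
            continue
        name = data_def["name"]
        s3_uri = data_def["s3ExecutionContext"]["s3Uri"]
        inputs.append((name, s3_uri, f"{processing_dir}/{name}"))
    return inputs


s3_inputs = extract_s3_inputs(sample_flow)
for name, uri, path in s3_inputs:
    print(f"{name}: {uri} -> {path}")
```

Each tuple corresponds to one `ProcessingInput`: the S3 object is downloaded into the container under the processing directory, in a folder named after the dataset.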

#### Define outputs for the first Data Wrangler step

In [7]:
signups_output_name = f"{signups_flow['nodes'][-1]['node_id']}.{signups_flow['nodes'][-1]['outputs'][0]['name']}"

flow_step_outputs_1 = []

flow_output = sagemaker.processing.ProcessingOutput(
    output_name=signups_output_name,
    feature_store_output=sagemaker.processing.FeatureStoreOutput(
        feature_group_name=signups_fg_name), 
    app_managed=True)

flow_step_outputs_1.append(flow_output)
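
The output name built above has the form `<node_id>.<output name>` and is taken from the last node in the flow file. The sketch below shows the same expression on a made-up flow; the node IDs and the helper `flow_output_name` are hypothetical.

```python
def flow_output_name(flow):
    """Build '<node_id>.<output name>' from the last node of a flow document."""
    last_node = flow["nodes"][-1]
    return f"{last_node['node_id']}.{last_node['outputs'][0]['name']}"


# Made-up flow with two nodes; only the last one determines the output name.
sample_flow = {
    "nodes": [
        {"node_id": "aaaa-1111", "outputs": [{"name": "default"}]},
        {"node_id": "bbbb-2222", "outputs": [{"name": "default"}]},
    ]
}

output_name = flow_output_name(sample_flow)
print(output_name)  # bbbb-2222.default
```

Because the expression always reads the final node, it assumes that the node whose output should be exported comes last in the flow file.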

### Step 2: Outcomes Data Wrangler Preprocessing Step <a id='step2'></a>
---
<a href="#overview">overview</a>

We will repeat the same process for processing the `signup_outcomes.csv` file using the `signup_outcomes.flow` file.

In [8]:
s3_client.upload_file(Filename='signup_outcomes.flow', Bucket=afd_bucket, Key=f'{afd_prefix}/afd-pipeline/dataprep-notebooks/signup_outcomes.flow')
outcomes_flow_uri = f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/dataprep-notebooks/signup_outcomes.flow'
print(f"signup_outcomes flow file uploaded to S3")

signup_outcomes flow file uploaded to S3


In [9]:
with open('signup_outcomes.flow', 'r') as f:
    outcomes_flow = json.load(f)
    
flow_step_inputs_2 = []

# flow file contains the code for each transformation
flow_file_input = sagemaker.processing.ProcessingInput(
    source=outcomes_flow_uri,            
    destination=f"{processing_dir}/flow", 
    input_name='flow')

flow_step_inputs_2.append(flow_file_input)

# parse the flow file for S3 inputs to the Data Wrangler job
for node in outcomes_flow["nodes"]:
    if "dataset_definition" in node["parameters"]:
        data_def = node["parameters"]["dataset_definition"]
        name = data_def["name"]
        s3_input = sagemaker.processing.ProcessingInput(
            source=data_def["s3ExecutionContext"]["s3Uri"], 
            destination=f'{processing_dir}/{name}', 
            input_name=name)
        flow_step_inputs_2.append(s3_input)

In [10]:
outcomes_output_name = f"{outcomes_flow['nodes'][-1]['node_id']}.{outcomes_flow['nodes'][-1]['outputs'][0]['name']}"

flow_step_outputs_2 = []

flow_output = sagemaker.processing.ProcessingOutput(
    output_name=outcomes_output_name,
    feature_store_output=sagemaker.processing.FeatureStoreOutput(
        feature_group_name=outcomes_fg_name), 
    app_managed=True)

flow_step_outputs_2.append(flow_output)


#### Finally, define the processor and processing steps for the Signups and Outcomes flows

In [11]:
# You can find the proper image uri by exporting your Data Wrangler flow to a pipeline notebook
# =================================
image_uri = "415577184552.dkr.ecr.us-east-2.amazonaws.com/sagemaker-data-wrangler-container:1.0.2"

flow_processor = sagemaker.processing.Processor(
    role=sagemaker_role, 
    image_uri=image_uri, 
    instance_count=flow_instance_count, 
    instance_type=flow_instance_type, 
    max_runtime_in_seconds=86400)

# Signups data flow step
signups_flow_step = ProcessingStep(
    name='Step1SignupsDataWranglerProcessing', 
    processor=flow_processor, 
    inputs=flow_step_inputs_1, 
    outputs=flow_step_outputs_1)

# Outcomes data flow step
outcomes_flow_step = ProcessingStep(
    name='Step2OutcomesDataWranglerProcessing', 
    processor=flow_processor, 
    inputs=flow_step_inputs_2, 
    outputs=flow_step_outputs_2,
    depends_on=['Step1SignupsDataWranglerProcessing'])

### Step 3: Create Training Data Set Step <a id='step3'></a>
---
<a href="#overview">overview</a>

In this step you will query the Feature Store offline datastore to generate the training dataset. Recall that in the first notebook you created two feature groups for the signups and outcomes raw data sets using the pre-processed files. We will look up the details of those feature groups, such as the Athena database name, table names and column names, and construct an Athena query to generate our final training dataset. This is done using the provided script [`create_dataset.py`](./scripts/create_dataset.py) under the `scripts` directory in this project. We will upload this script to an S3 location and refer to that location in our `ProcessingStep`.

We will define all the subsequent steps using the SageMaker Processing `ScriptProcessor`, since we will be running external scripts within each of those steps. For more information on `ScriptProcessor`, see the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-container-run-scripts.html).

In [12]:

%store -r CONTAINER_IMAGE_URI

create_dataset_processor = ScriptProcessor(command=['python3'],
                                           image_uri=CONTAINER_IMAGE_URI,
                                           role=sagemaker_role,
                                           instance_count=flow_instance_count,
                                           instance_type=flow_instance_type)


In [13]:
s3_client.upload_file(Filename='./scripts/create_dataset.py', Bucket=afd_bucket, Key=f'{afd_prefix}/afd-pipeline/code/create_dataset.py')

create_dataset_step = ProcessingStep(
    name='Step3CreateAFDTrainingDataset',
    processor=create_dataset_processor,
    outputs=[sagemaker.processing.ProcessingOutput(output_name='train_data', 
                                                   source='/opt/ml/processing/output/train', 
                                                   destination=data_path_param),
             sagemaker.processing.ProcessingOutput(output_name='train_schema', 
                                                   source='/opt/ml/processing/output/schema', 
                                                   destination=f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/train-schema')],
    job_arguments=["--signups-feature-group-name", signups_fg_name,
                   "--outcomes-feature-group-name", outcomes_fg_name,
                   "--region", region,
                   "--bucket-name", afd_bucket,
                   "--bucket-prefix", afd_prefix],
    code=create_dataset_script_uri,
    depends_on=['Step2OutcomesDataWranglerProcessing'])

### Step 4: Train Amazon Fraud Detector Model Step <a id='step4'></a>
---
<a href="#overview">overview</a>

We will create a script processor similar to the one in the previous step. The script in this step uses the dataset and the data schema files generated by the previous step to train our Amazon Fraud Detector model. The model training call is asynchronous, so the script waits for the process to complete and then stores the response information from the model training in a file. The response contains all the information required for activating the model in a later step. The Amazon Fraud Detector training script is included under the scripts folder as [`train_afd.py`](./scripts/train_afd.py). This script does the following:

* Trains the AFD Model with the data generated in the previous step
* Generates training response data to be used by subsequent steps
* Generates a property file with the AUC Metric value of the model (more on this later in Step 5)
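
The wait-then-record pattern described above can be sketched in plain Python. This is not the contents of `train_afd.py`: the helpers `wait_for_status` and `write_auc_property_file` are hypothetical, the status strings are illustrative, and the status lookup is replaced by a stand-in so the example runs offline. The file name `train_auc.json` and the `auc_metric` key match what the pipeline reads in later cells.

```python
import json
import pathlib
import tempfile
import time


def wait_for_status(get_status, done_states, poll_seconds=30, timeout_seconds=3600):
    """Poll get_status() until it returns a value in done_states or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still in status {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


def write_auc_property_file(auc, output_dir):
    """Write train_auc.json containing the 'auc_metric' key."""
    out_dir = pathlib.Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "train_auc.json"
    path.write_text(json.dumps({"auc_metric": auc}))
    return path


# Stand-in for a real status lookup: two in-progress polls, then completion.
fake_statuses = iter(["TRAINING_IN_PROGRESS", "TRAINING_IN_PROGRESS", "TRAINING_COMPLETE"])
final_status = wait_for_status(
    lambda: next(fake_statuses), {"TRAINING_COMPLETE"}, poll_seconds=0
)

# Inside the processing container the directory would be /opt/ml/processing/auc.
with tempfile.TemporaryDirectory() as tmp:
    property_path = write_auc_property_file(0.93, tmp)
    written = json.loads(property_path.read_text())

print(final_status, written)
```

In a real script, `done_states` would also need to include any failure status so that the loop stops instead of polling until the timeout.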

In [14]:

afd_train_processor = ScriptProcessor(command=['python3'],
                                      image_uri=CONTAINER_IMAGE_URI,
                                      role=sagemaker_role,
                                      instance_count=train_instance_count,
                                      instance_type=train_instance_type)


In [15]:
s3_client.upload_file(Filename='./scripts/train_afd.py', Bucket=afd_bucket, Key=f'{afd_prefix}/afd-pipeline/code/train_afd.py')

#define the property file which will store the AUC metric from the outcome of the model training

training_response = PropertyFile(
    name="AUCPropertyFile",
    output_name="training_auc",
    path="train_auc.json"           # the property file generated by the train_afd.py that the Pipeline will index and keep track of to evaluate later
)

afd_train_processingstep = ProcessingStep(name="Step4AFDModelTrainProcess",
                                          processor=afd_train_processor,
                                          job_arguments=["--region", region,
                                                         "--s3-file-loc", f'{data_path_param}/afd_training_data.csv',
                                                         "--data-access-role", data_role_param,
                                                         "--model-name", model_name_param],
                                          inputs=[sagemaker.processing.ProcessingInput(source=create_dataset_step.properties.ProcessingOutputConfig.Outputs["train_schema"].S3Output.S3Uri,
                                                                                       destination='/opt/ml/processing/schema')],
                                          outputs=[sagemaker.processing.ProcessingOutput(output_name='training_response', 
                                                                                         source='/opt/ml/processing/output',
                                                                                         destination=f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/train-response'),
                                                   sagemaker.processing.ProcessingOutput(output_name='training_auc', 
                                                                                         source='/opt/ml/processing/auc',
                                                                                         destination=f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/train-response')],
                                          property_files=[training_response],
                                          code=train_model_script_uri
                                         )



### Step 5: Check AUC Metric (Area Under the ROC Curve) Condition <a id='step5'></a>
---
<a href="#overview">overview</a>

When AFD training completes, it generates [training metrics](https://docs.aws.amazon.com/frauddetector/latest/api/API_TrainingMetrics.html) such as the Area Under the ROC Curve (AUC). AUC values range from `0` to `1`; the closer the value is to `1`, the better the model separates fraudulent events from legitimate ones. To learn more about ROC/AUC, see this [article](https://www.sciencedirect.com/science/article/pii/S1556086415306043). Typically, a human reviews the metric and decides whether the AUC value is acceptable for activating the model, which is what we did in the third notebook. Here, we will instead use Pipeline Conditions to decide automatically whether to progress to the next step and activate the model. Step 4, the training step, generates the AUC metric property file, which is read and evaluated in this step. We treat an AUC equal to or above a threshold of `0.75` as acceptable. The high-level logic is `if auc >= 0.75 then activate_model`.

We defined the AUC threshold as a Pipeline parameter `auc_threshold_param` with a default value of `0.75` at the beginning of this notebook. We will use this parameter in our Pipeline Condition to decide whether to proceed to the next step, `afd_activate_processingstep` (defined in the next section, Step 6), which activates the trained AFD Model.

The following code cell defines only the condition itself. The `ConditionStep` that ties this condition to `afd_activate_processingstep` is created in Step 6, once the activation step has been defined; the activation step then runs only if the condition evaluates to `true`.

To create a _condition_ within the SageMaker Pipeline, we will use a [ConditionStep](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition). You can define conditions such as `equals to`, `greater than`, `less than`, `less than or equals to`, `greater than or equals to` and so on with a Condition step, and then execute other processing step(s) when the condition is true or false using an `if...else` pattern. In our case we want to use the `ConditionGreaterThanOrEqualTo` condition to check the AUC metric and only activate the model (by running the activate model step) if the metric is greater than or equal to the pre-defined threshold.

In [16]:
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ( ConditionStep, JsonGet )

condition_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(                           #the left value of the evaluation expression
        step=afd_train_processingstep,      #the step from which the property file will be grabbed
        property_file=training_response,    #the property file instance that was created earlier in Step 4
        json_path="auc_metric"              #the JSON path of the property within the property file train_auc.json (refer train_afd.py line 71)
    ),
    right=auc_threshold_param               #the right value of the evaluation expression, i.e. the AUC threshold
)
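
Conceptually, the condition defined above amounts to reading one value from the property file and comparing it with the threshold. The plain-Python sketch below mirrors that logic offline; `should_activate` is a hypothetical helper, and the JSON strings stand in for the contents of `train_auc.json`. For a top-level key such as `auc_metric`, a dictionary lookup is enough to imitate the `json_path` lookup.

```python
import json


def should_activate(property_file_text, threshold, json_path="auc_metric"):
    """Return True when the AUC stored in the property file meets the threshold."""
    auc = json.loads(property_file_text)[json_path]
    return float(auc) >= float(threshold)


print(should_activate('{"auc_metric": 0.93}', 0.75))  # True
print(should_activate('{"auc_metric": 0.75}', 0.75))  # True (the comparison is inclusive)
print(should_activate('{"auc_metric": 0.60}', 0.75))  # False
```

The middle case is the reason for choosing a greater-than-or-equal condition: a model that lands exactly on the threshold is still activated.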



### Step 6: Activate Amazon Fraud Detector Model Step <a id='step6'></a>
---
<a href="#overview">overview</a>

This step will activate the trained AFD Model if the AUC threshold is met in the condition step (whose condition we defined in the previous step). This step uses the script `activate_afd.py` included in the `scripts` directory.

In [17]:

afd_activate_processor = ScriptProcessor(command=['python3'],
                                         image_uri=CONTAINER_IMAGE_URI,
                                         role=sagemaker_role,
                                         instance_count=train_instance_count,
                                         instance_type=train_instance_type)


In [18]:
s3_client.upload_file(Filename='./scripts/activate_afd.py', Bucket=afd_bucket, Key=f'{afd_prefix}/afd-pipeline/code/activate_afd.py')

afd_activate_processingstep = ProcessingStep(name="Step6AFDModelActivateProcess",
                                          processor=afd_activate_processor,
                                          job_arguments=["--region", region],
                                          inputs=[sagemaker.processing.ProcessingInput(source=afd_train_processingstep.properties.ProcessingOutputConfig.Outputs["training_response"].S3Output.S3Uri,
                                                                                       destination='/opt/ml/processing/input')],
                                          outputs=[sagemaker.processing.ProcessingOutput(output_name='activation_response', 
                                                                                         source='/opt/ml/processing/output',
                                                                                         destination=f's3://{afd_bucket}/{afd_prefix}/afd-pipeline/train-response')],
                                          code=activate_model_script_uri
                                         )



Now that we have the model activation step defined, we can set up the condition step using the condition we created in Step 5 to check the AUC metric. If the metric is greater than or equal to the threshold (`0.75` by default), the pipeline will kick off the model activation step; otherwise the pipeline ends there.

In [19]:
#Define the condition step
auc_condition_step = ConditionStep(
    name="Step5CheckAUCThreshold",
    conditions=[condition_gte],             #the greater than equal to condition defined above
    if_steps=[afd_activate_processingstep], #if the condition evaluates to true then the Step 6 processing step will be executed
    else_steps=[]                           #there are no else steps so we will keep it empty
)

### Step 7: Setup Amazon Fraud Detector Model detector Step <a id='step7'></a>
---
<a href="#overview">overview</a>

This is the final step, and it will be executed after the AFD model has been activated based on the condition check. In this step, we will do a few things:
* Setup Outcomes (if they don't exist already)
* Setup rules based on the metrics generated during the model training stage and map them to the outcomes
* Setup a new Detector Version using the new list of rules created above

We will not activate this detector version; however, activating the new detector version is certainly a step that can be added to the pipeline. Note that activating the detector version is akin to deploying it as the latest version in production (you can only have one detector version in `ACTIVE` status at a time, so activating this version will deactivate your previous detector version and may inadvertently break your application's code).

In certain cases, you may not want to activate a new detector version created by this automated pipeline and may want to analyze the results of the training further before activating the new detector version manually, or you may want to wait until your next production release cycle etc. The detector version created in this step will continue to remain in `DRAFT` status until a manual action is taken either via the Amazon Fraud Detector console or using the `update_detector_version_status` API.
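
For reference, the manual promotion mentioned above could look like the following sketch. The wrapper `activate_detector_version` and the recording stub are hypothetical and exist only so the example runs without AWS access; the call it makes, `update_detector_version_status` with `detectorId`, `detectorVersionId` and `status`, is the boto3 Amazon Fraud Detector operation named in the text.

```python
def activate_detector_version(client, detector_id, detector_version_id):
    """Promote a DRAFT detector version to ACTIVE.

    `client` is expected to behave like boto3.client("frauddetector").
    """
    return client.update_detector_version_status(
        detectorId=detector_id,
        detectorVersionId=detector_version_id,
        status="ACTIVE",
    )


class _RecordingClient:
    """Offline stand-in that records the request instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def update_detector_version_status(self, **kwargs):
        self.calls.append(kwargs)
        return {}


stub = _RecordingClient()
activate_detector_version(stub, "example_detector", "2")
print(stub.calls)
```

Against a real account you would pass `boto3.client("frauddetector")` together with your own detector name and the version ID of the draft created by the pipeline; the values used here are placeholders.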

Similar to Steps 4 and 6, we will run a script to set up the new detector version; the script `setup_detector.py` is included in the project `scripts` directory. We will create a new version of the existing detector with the updated rules. The new version of the detector will remain in `DRAFT` status.

In [20]:

setup_detector_processor = ScriptProcessor(command=['python3'],
                                         image_uri=CONTAINER_IMAGE_URI,
                                         role=sagemaker_role,
                                         instance_count=train_instance_count,
                                         instance_type=train_instance_type)


In [21]:
s3_client.upload_file(Filename='./scripts/setup_detector.py', Bucket=afd_bucket, Key=f'{afd_prefix}/afd-pipeline/code/setup_detector.py')

setup_detector_processingstep = ProcessingStep(name="Step7AFDSetupDetectorProcess",
                                          processor=setup_detector_processor,
                                          job_arguments=["--region", region,
                                                         "--detector-name", detector_name_param],
                                          inputs=[sagemaker.processing.ProcessingInput(source=afd_activate_processingstep.properties.ProcessingOutputConfig.Outputs["activation_response"].S3Output.S3Uri,
                                                                                       destination='/opt/ml/processing/input')],
                                          code=setup_detector_script_uri
                                         )

### 4. Combine the Pipeline Steps and Run <a id='define-pipeline'></a>
---
<a href="#overview">overview</a>

Now that we have defined all the processing and condition steps, we will setup the SageMaker Pipeline. Though easier to reason with, the parameters and steps don't need to be in order. The pipeline DAG will parse it out properly.
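
To illustrate why the listing order does not matter, the sketch below models the dependencies between the steps defined in this notebook and derives an execution order from them with the standard library's `graphlib`. This is only a model of the idea, not how the SageMaker service is implemented; the dependency map is written by hand from the `depends_on` arguments, property references and `if_steps` used in the earlier cells.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it must wait for.
dependencies = {
    "Step2OutcomesDataWranglerProcessing": {"Step1SignupsDataWranglerProcessing"},
    "Step3CreateAFDTrainingDataset": {"Step2OutcomesDataWranglerProcessing"},
    "Step4AFDModelTrainProcess": {"Step3CreateAFDTrainingDataset"},
    "Step5CheckAUCThreshold": {"Step4AFDModelTrainProcess"},
    "Step6AFDModelActivateProcess": {
        "Step4AFDModelTrainProcess",
        "Step5CheckAUCThreshold",
    },
    "Step7AFDSetupDetectorProcess": {"Step6AFDModelActivateProcess"},
}

# The order is derived from the dependencies, not from the dictionary order.
execution_order = list(TopologicalSorter(dependencies).static_order())
for step_name in execution_order:
    print(step_name)
```

Shuffling the entries of `dependencies` produces the same order, because every step here has exactly one valid position in the chain.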

In [22]:
afd_pipeline_name = 'AFDPipeline'
%store afd_pipeline_name

pipeline = Pipeline(
    name=afd_pipeline_name,
    parameters=[
        model_name_param, 
        detector_name_param,
        data_role_param,
        data_path_param,
        auc_threshold_param],
    steps=[
        signups_flow_step,
        outcomes_flow_step,
        create_dataset_step,        
        afd_train_processingstep,
        auc_condition_step,
        setup_detector_processingstep
    ])

Stored 'afd_pipeline_name' (str)


#### 4.1 Submit the pipeline definition to the SageMaker Pipeline service
We now have the Pipeline and all of its steps and conditions defined. The next step is to create or update the Pipeline. Note that if an existing pipeline has the same name, it will be overwritten by the `upsert` function.

In [23]:
import botocore

try:
    pipeline.upsert(role_arn=sagemaker_role)
except botocore.exceptions.ClientError as error:
    print(error.response)

Next we will start an execution of the pipeline. This API action can also be invoked from, for example, a Lambda function. You can also override the Pipeline parameters at this point.

In [None]:
# Pipeline parameters defined earlier can be overridden here by name, for example:
# parameters = {'AFDAUCThreshold': 0.8}

In [26]:
# start_response = pipeline.start(parameters=parameters)
start_response = pipeline.start()
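
As a sketch of the Lambda option mentioned above, the handler below starts the pipeline through the boto3 SageMaker client's `start_pipeline_execution` call. The event keys `pipeline_name` and `parameters`, the helper `to_pipeline_parameters` and the optional `sagemaker_client` argument are assumptions made for this example; the client is injectable so the conversion logic can be exercised without AWS access.

```python
def to_pipeline_parameters(overrides):
    """Convert {'Name': value} into the list-of-dicts shape the API expects."""
    return [{"Name": name, "Value": str(value)} for name, value in overrides.items()]


def lambda_handler(event, context, sagemaker_client=None):
    """Start the pipeline, optionally overriding parameters passed in the event."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the helper above works without boto3

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.start_pipeline_execution(
        PipelineName=event.get("pipeline_name", "AFDPipeline"),
        PipelineParameters=to_pipeline_parameters(event.get("parameters", {})),
    )
    return {"executionArn": response["PipelineExecutionArn"]}


print(to_pipeline_parameters({"AFDAUCThreshold": 0.8}))
# [{'Name': 'AFDAUCThreshold', 'Value': '0.8'}]
```

Parameter values are sent as strings in this API shape, which is why the helper converts the float threshold with `str`.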

Once the Pipeline execution starts, you can view the execution status of the pipeline as shown below.

<img src="images/pipeline_exec.png" width="800" height="800"/>

Clicking on the "**Graph**" tab will show the Pipeline DAG (Directed Acyclic Graph). The Pipeline DAG while it's executing will look like below

<img src="images/pipeline.png" width="800" height="800"/>

In [None]:
start_response.wait()
start_response.describe()

### 5. Delete Pipeline [Optional] <a id='delete'></a>
---
<a href="#overview">overview</a>

You may, optionally, delete the pipeline as a cleanup step.

In [None]:
sagemaker_boto_client.delete_pipeline(PipelineName=afd_pipeline_name)