# Training and Registering Models on Amazon SageMaker Pipelines Integrated with Presto


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

---

Customers can use SageMaker Pipelines to build scalable machine learning pipelines that preprocess data and train machine learning models. With SageMaker Pipelines, customers have a toolkit for every part of the machine learning lifecycle that provides deep customizations and tuning options to fit every organization. Customers have the freedom to customize SageMaker Pipelines to specific use cases, but also to create generic machine learning pipelines that can be reused across different use cases.

From a bird's-eye view, a machine learning pipeline usually consists of three general steps: a preprocessing step where the data is transformed, a training step where a machine learning model is trained, and an evaluation step that tests the performance of the trained model. If the model performs well on the objective metric you’re optimizing for, it becomes a candidate model for deployment to one or more environments. These candidate models should be registered in SageMaker Model Registry to catalog and store key metadata for each model version.

--- 

These steps have a lot in common, even across different machine learning use cases. Customers who want training pipelines that can be reused across an organization can use SageMaker Pipelines to create parameterized, generic training pipelines. Parameters let customers pass values into the pipeline at execution time without having to change the pipeline code itself.

**This notebook** demonstrates how SageMaker Pipelines can be used to create a generic binary classification machine learning pipeline using a scikit-learn Random Forest classifier that's reusable across teams, machine learning use cases and even customers in a SaaS system.


### SageMaker Pipelines
Amazon SageMaker Pipelines is a purpose-built, easy-to-use CI/CD service for machine learning. With SageMaker Pipelines, customers can create machine learning workflows with an easy-to-use Python SDK, and then visualize and manage workflows using Amazon SageMaker Studio.


#### SageMaker Pipeline steps and parameters
SageMaker Pipelines works on the concept of steps. The order in which steps are executed is inferred from the dependencies each step has. If a step depends on the output of a previous step, it is not executed until that step has completed successfully.
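The dependency-driven ordering described above can be sketched with a plain topological sort. This is only an illustration using Python's standard library, not how SageMaker resolves the graph internally. The step names are taken from this notebook, and the dependency mapping is an assumption written by hand to mirror the steps defined later.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps whose outputs it consumes.
# Step names follow this notebook; the mapping is hand-written for illustration.
dependencies = {
    "Preprocess-Data": set(),
    "Train-And-Tune-Model": {"Preprocess-Data"},
    "Evaluate-Model": {"Train-And-Tune-Model", "Preprocess-Data"},
    "Accuracy-Condition": {"Evaluate-Model"},
}

# static_order() yields each step only after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because every step here depends on the one before it, only one valid order exists, regardless of the order in which the steps are listed.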

SageMaker Pipeline Parameters are input parameters specified when triggering a pipeline execution. They need to be explicitly defined when creating the pipeline and contain default values.
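As a rough mental model, an execution sees the declared default for every parameter unless a value is supplied when the execution is started. The sketch below imitates that behavior with plain dictionaries. `resolve_parameters` is a hypothetical helper, the `0.7` default is made up, and rejecting undeclared names is an assumption of this sketch rather than a statement about the service.

```python
def resolve_parameters(declared_defaults, overrides=None):
    """Return the values an execution would use: defaults, then overrides."""
    overrides = overrides or {}
    undeclared = sorted(set(overrides) - set(declared_defaults))
    if undeclared:
        # Assumption of this sketch: only declared parameters may be overridden.
        raise KeyError(f"parameters not declared on the pipeline: {undeclared}")
    return {**declared_defaults, **overrides}


declared_defaults = {
    "ModelApprovalStatus": "PendingManualApproval",
    "AccuracyConditionThreshold": 0.7,  # hypothetical default value
}

# Override one parameter for this execution; the other keeps its default.
resolved = resolve_parameters(declared_defaults, {"AccuracyConditionThreshold": 0.8})
print(resolved)
```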

To learn more about the types of steps and parameters supported, see the [SageMaker Pipelines Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html).

#### SageMaker Pipeline DAG

When creating a SageMaker Pipeline, SageMaker creates a Directed Acyclic Graph (DAG) that customers can visualize in Amazon SageMaker Studio. The DAG can be used to track pipeline executions, outputs and metrics. In this notebook, a SageMaker Pipeline with the following DAG is created:

![SageMaker Pipeline Directed Acyclic Graph](images/sm-pipeline-dag.png "SageMaker Pipeline Directed Acyclic Graph")

## Predict customer orders with Random Forest Classifier

### Data

This notebook uses PrestoDB to extract `tpc-h` data through the `tpc-h connector`. Data extraction, preprocessing, and the splitting of data into train, test, and validation datasets all happen in the preprocessing step of this SageMaker pipeline.
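The three-way split mentioned above could look roughly like the following. This is a minimal sketch, not the notebook's actual preprocessing script. `split_rows`, the 70/15/15 ratios, and the seed are all assumptions chosen for illustration.

```python
import random


def split_rows(rows, train_frac=0.7, validation_frac=0.15, seed=42):
    """Shuffle rows reproducibly and cut them into train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


train, validation, test = split_rows(range(100))
print(len(train), len(validation), len(test))
```

Fixing the seed keeps the split stable between pipeline executions, which makes evaluation results easier to compare.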

***To configure PrestoDB on your EC2 instance, see***: [PrestoDB EC2 Connection](https://normanlimxk.com/2020/09/15/creating-a-presto-cluster-on-ec2/)


### Overview 
**Disclaimer** This notebook was created using [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/) and the `Python3(DataScience) kernel`. SageMaker Studio is required for the visualizations of the DAG and model metrics to work.

The purpose of this notebook is to demonstrate how SageMaker Pipelines can be used to create a generic Scikit-Learn training pipeline that preprocesses, trains, tunes, evaluates and registers new machine learning models with the SageMaker model registry, that is reusable across teams, customers and use cases. All scripts to preprocess the data and evaluate the trained model have been prepared in advance and are available here: 

---

This model is a binary classification model created using the scikit-learn `RandomForestClassifier`. It categorizes input data into high value/low value order classes.

Training data: the training data for this model is available via PrestoDB tables and is read into Pandas through the PrestoDB Python client. This data is then read into an Apache Spark dataframe (although the model training happens only using the data in the Pandas dataframe).

* Data is read using queries from PrestoDB and any feature engineering required is done as part of the query itself.

* Note that ingestion of raw data into PrestoDB tables is outside the scope of this project and it is assumed that for the purpose of model training the data can simply be queried from PrestoDB tables.
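To illustrate what "feature engineering as part of the query" can mean, the sketch below derives a binary label inside the SQL statement itself. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere. The real notebook queries PrestoDB, whose client and parameter syntax differ. The column names follow the TPC-H `orders` table, while the threshold and the sample rows are made up.

```python
import sqlite3

# Hypothetical cut-off separating high value from low value orders.
HIGH_VALUE_THRESHOLD = 150000.0

# The label is computed by the query, so no extra labeling pass is needed later.
QUERY = """
SELECT
    o_orderkey,
    o_totalprice,
    CASE WHEN o_totalprice >= ? THEN 1 ELSE 0 END AS high_value
FROM orders
ORDER BY o_orderkey
"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the Presto source
conn.execute("CREATE TABLE orders (o_orderkey INTEGER, o_totalprice REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 173665.47), (2, 46929.18), (3, 193846.25)],  # illustrative rows
)
rows = conn.execute(QUERY, (HIGH_VALUE_THRESHOLD,)).fetchall()
conn.close()
print(rows)
```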


In [None]:
!pip install -U sagemaker --quiet # Ensure correct version of SageMaker is installed

In [None]:
## Import the boto3 and sagemaker libraries needed to initialize the session
import json
import logging
import boto3
import sagemaker
from utils import *
import sagemaker.session
from typing import Dict, List, Optional, Tuple, Union
from sagemaker.workflow.pipeline_context import PipelineSession

In [None]:
## set the logger to track all of the logs as this pipeline runs
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

### Load the `config.yml` file that contains information used across this pipeline

In [None]:
config = load_config('config.yml')
logger.info(json.dumps(config, indent=2))

In [None]:
## initialize the SageMaker session, pipeline session, execution role and default bucket
session = sagemaker.session.Session()
pipeline_session = PipelineSession()
role = config['aws']['sagemaker_execution_role']
session_bucket = session.default_bucket()

logger.info(f"the pipeline bucket being used for this pipeline execution -> {session_bucket}")
logger.info(f"the sagemaker execution role being used across this pipeline -> {role}")

In [None]:
prefix = config['general']['prefix']  # Prefix to S3 artifacts
pipeline_name = config['general']['pipeline_name']  # SageMaker Pipeline name
order_model_group = config['general']['model_group']

logger.info(f"the prefix for the pipeline name -> {prefix}, pipeline name of this execution -> {pipeline_name}, model group name -> {order_model_group}")

In [None]:
import json
from sagemaker.workflow.parameters import ParameterString, ParameterInteger, ParameterFloat

# Convert your list to a JSON string
training_features_str = json.dumps(config['training_params']['training_features'])
logger.info(f"the training features being used for this pipeline --> {training_features_str}")

# Define new pipeline parameters
host_parameter = ParameterString(name="HostParameter", default_value=config['pipeline_parameters']['presto_host'])
port_parameter = ParameterString(name="PortParameter", default_value=config['pipeline_parameters']['port_parameter'])
user_parameter = ParameterString(name="UserParameter", default_value=config['pipeline_parameters']['user_parameter'])
target_parameter = ParameterString(name="Target", default_value=config['training_params']['training_target'])
feature_parameter = ParameterString(name="Feature", default_value=training_features_str)

# Training hyperparameters exposed as pipeline parameters
n_estimators_parameter = ParameterInteger(name="NEstimators", default_value=config['training_args']['n_estimators'])
max_depth_parameter = ParameterInteger(name="MaxDepth", default_value=config['training_args']['max_depth'])
min_samples_split_parameter = ParameterInteger(name="MinSamplesSplit", default_value=config['training_args']['min_samples_split'])
max_features_parameter = ParameterString(name="MaxFeatures", default_value=config['training_args']['max_features'])
model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)

# Log the feature parameter as an array
logger.info(f"the feature parameter being used for training -> {feature_parameter.expr}")
logger.info(f"the host parameter being used from the presto config -> {host_parameter.expr}")
logger.info(f"the port parameter being used from the presto config -> {port_parameter.expr}")
logger.info(f"the user parameter being used from the presto config -> {user_parameter.expr}")

In [None]:
## initialize the order_preprocess_uri 
order_preprocess_uri = session.upload_data(
    path=config['dir_scripts']['preprocess_script'], key_prefix=prefix + "/preprocess/order"
)

## initialize the evaluation script
evaluate_script_uri = session.upload_data(path=config['dir_scripts']['evaluation_script'], key_prefix=prefix + "/evaluate")

logger.info(f"DSG order preprocessing script uploaded to .... {order_preprocess_uri}")
logger.info(f"DSG Evaluation script uploaded to .... {evaluate_script_uri}")

<a id='parameters'></a>

### Pipeline input parameters

Pipeline Parameters are input parameters specified when triggering a pipeline execution. They need to be explicitly defined when creating the pipeline and contain default values.

Create parameters for the inputs to the pipeline. In this case, parameters will be used for:
- `ModelGroup` - Which registry to register the trained model with.
- `InputData` - S3 URI to pipeline input data.
- `PreprocessScript` - S3 URI to python script to preprocess the data.
- `EvaluateScript` - S3 URI to python script to evaluate the trained model.
- `MaxiumTrainingJobs` - How many training jobs to allow when hyperparameter tuning the model
- `MaxiumParallelTrainingJobs` - How many training jobs to allow in parallel when hyperparameter tuning the model.
- `AccuracyConditionThreshold` - Only register models with the model registry if they have at least this classification accuracy.
- `ProcessingInstanceType` - What EC2 instance type to use for processing.
- `TrainingInstanceType` - What EC2 instance type to use for training.

In [None]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)
# To what Registry to register the model and its versions.
model_registry_package = ParameterString(name="ModelGroup", default_value="default-registry")

# S3 URI to preprocessing script
preprocess_script = ParameterString(
    name="PreprocessScript", default_value=order_preprocess_uri
)

# S3 URI to evaluation script
evaluate_script= ParameterString(
    name="EvaluateScript", default_value=evaluate_script_uri
)

# Maximum number of training jobs to allow in the HP tuning
max_training_jobs = ParameterInteger(name="MaxiumTrainingJobs", default_value=config['input_params']['maxium_training_jobs'])

# Maximum number of training jobs to run in parallel in the HP tuning
max_parallel_training_jobs = ParameterInteger(name="MaxiumParallelTrainingJobs", default_value=config['input_params']['maxium_parallel_training_jobs'])

# Accuracy threshold to decide whether or not to register the model with Model Registry
accuracy_condition_threshold = ParameterFloat(name="AccuracyConditionThreshold", default_value=config['input_params']['accuracy_condition_threshold'])

# What instance type to use for processing.
processing_instance_type = ParameterString(
    name="ProcessingInstanceType", default_value=config['input_params']['processing_instance_type']
)

# What instance type to use for training.
training_instance_type = ParameterString(name="TrainingInstanceType", default_value=config['input_params']['training_instance_type'])

<a id='preprocess'></a>

## Preprocess data step
In the first step, an `SKLearnProcessor` is created and then used in a `ProcessingStep` that runs the preprocessing script.

In [None]:
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.workflow.functions import Join
from sagemaker.workflow.execution_variables import ExecutionVariables

# Create SKlearn processor object,
# The object contains information about what instance type to use, the IAM role to use etc.
# A managed processor comes with a preconfigured container, so only specifying version is required.
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1", role=role, instance_type=processing_instance_type, instance_count=1, 
    tags=config['pipeline_tags']['sklearn_processor_tags']
)
# Use the sklearn_processor in a SageMaker Pipelines ProcessingStep
# Configure the ProcessingStep
step_preprocess_data = ProcessingStep(
    name="Preprocess-Data",
    processor=sklearn_processor,
    inputs=[],  # No static inputs required as data fetching is part of the script
    outputs=[
        ProcessingOutput(
            output_name="train",
            source="/opt/ml/processing/train",
            destination=Join(
                on="/",
                values=[
                    "s3://{}".format(session_bucket),
                    prefix,
                    ExecutionVariables.PIPELINE_EXECUTION_ID,
                    "train",
                ],
            ),
        ),
        ProcessingOutput(
            output_name="validation",
            source="/opt/ml/processing/validation",
            destination=Join(
                on="/",
                values=[
                    "s3://{}".format(session_bucket),
                    prefix,
                    ExecutionVariables.PIPELINE_EXECUTION_ID,
                    "validation",
                ],
            ),
        ),
        ProcessingOutput(
            output_name="test",
            source="/opt/ml/processing/test",
            destination=Join(
                on="/",
                values=[
                    "s3://{}".format(session_bucket),
                    prefix,
                    ExecutionVariables.PIPELINE_EXECUTION_ID,
                    "test",
                ],
            ),
        ),
    ],
    code = config['dir_scripts']['preprocess_script'],
    job_arguments=[
        "--host", host_parameter,
        "--port", port_parameter,
        "--user", user_parameter,
    ]
)
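The `job_arguments` above reach the preprocessing script as command-line arguments. A script could read them along the following lines. This is a sketch under assumptions, not the notebook's actual `preprocess_script`. The `parse_args` helper, the default port `8080`, and the example host and user are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the Presto connection arguments passed to the processing job."""
    parser = argparse.ArgumentParser(description="Preprocessing job arguments")
    parser.add_argument("--host", type=str, required=True)
    # Arguments arrive as strings, so the port is converted explicitly.
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--user", type=str, required=True)
    return parser.parse_args(argv)


# Simulate the argument list the processing container would receive.
args = parse_args(["--host", "presto.example.internal", "--port", "8080", "--user", "analyst"])
print(args.host, args.port, args.user)
```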

<a id='train'></a>

## Train model step
In the second step, the train and test outputs from the previous processing step are used to train and tune a model.

In [None]:
from sagemaker.inputs import TrainingInput
from sagemaker.estimator import Estimator
from sklearn.metrics import roc_auc_score
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter, CategoricalParameter
from sagemaker.workflow.steps import TuningStep
from sagemaker.sklearn.estimator import SKLearn

FRAMEWORK_VERSION = "0.23-1"

# Fetch container to use for training
image_uri = sagemaker.image_uris.retrieve(
    framework="sklearn",
    region=config['aws']['region'],
    version=FRAMEWORK_VERSION,
    py_version="py3",
    instance_type=config['input_params']['processing_instance_type'],
)
print(image_uri)

sklearn_estimator = SKLearn(
    entry_point=config['dir_scripts']['training_script'],
    role=role,
    instance_count=1,
    instance_type=config['input_params']['training_instance_type'],
    framework_version=FRAMEWORK_VERSION,
    base_job_name="rf-scikit",
    hyperparameters={
        "n_estimators": config['training_args']['n_estimators'],
        "max_depth": config['training_args']['max_depth'],  
        "features": config['training_params']['training_features'],
        "target": config['training_params']['training_target'],
    },
    tags=config['pipeline_tags']['sklearn_estimator_tags']
)

## objective metric name
metric_definitions = [{'Name': 'validation:auc', 'Regex': r'auc (\S+)'}]

objective_metric_name = "validation:auc"

# Create the hyperparameter tuner object with search ranges for the Random Forest hyperparameters
rf_tuner = HyperparameterTuner(
                estimator=sklearn_estimator,
                objective_metric_name=objective_metric_name,
                hyperparameter_ranges={
                    "n_estimators": IntegerParameter(10, 150),
                    "max_depth": IntegerParameter(3, 20),
                    "min_samples_split": IntegerParameter(2, 10),
                    "max_features": CategoricalParameter(["sqrt", "log2"])
                },
                max_jobs=config['input_params']['maxium_training_jobs'], ## reducing this for testing purposes
                metric_definitions=metric_definitions,
                max_parallel_jobs=config['input_params']['maxium_parallel_training_jobs'], ## reducing this for testing purposes
)


step_tuning = TuningStep(
    name="Train-And-Tune-Model",
    tuner=rf_tuner,
    inputs={
        "train": TrainingInput(
            s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "test": TrainingInput(
            s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)
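The tuner finds the objective metric by applying the `Regex` from `metric_definitions` to the training job's log output, so the training script has to print the metric in a matching form. The sketch below applies the same pattern to sample log lines. `extract_metric` and the log lines are illustrative assumptions, not part of the training script.

```python
import re

# Same pattern as in metric_definitions above.
METRIC_REGEX = r"auc (\S+)"


def extract_metric(log_line):
    """Return the captured metric value as a float, or None if no match."""
    match = re.search(METRIC_REGEX, log_line)
    return float(match.group(1)) if match else None


matched = extract_metric("validation auc 0.9312")
unmatched = extract_metric("finished fitting the model")
print(matched, unmatched)
```

A line such as `auc: 0.93` would not match, because the pattern expects a space directly after `auc`.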

## Evaluate model step
---

When a model is trained, it's common to evaluate the model on unseen data before registering it with the model registry. This ensures the model registry isn't cluttered with poorly performing model versions.

In [None]:
from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.properties import PropertyFile

# Create ScriptProcessor object.
# The object contains information about what container to use, what instance type etc.
evaluate_model_processor = ScriptProcessor(
    image_uri=image_uri,
    command=["python3"],
    instance_type=processing_instance_type,
    instance_count=1,
    role=role,
)

# Create a PropertyFile
# A PropertyFile is used to be able to reference outputs from a processing step, for instance to use in a condition step.
# For more information, visit https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-propertyfile.html
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)


step_evaluate_model = ProcessingStep(
    name="Evaluate-Model",
    processor=evaluate_model_processor,
    inputs=[
        ProcessingInput(
            source=step_tuning.get_top_model_s3_uri(top_k=0, s3_bucket=session_bucket),
            destination="/opt/ml/processing/model",
            input_name="model.tar.gz" 
        ),
        ProcessingInput(
            source=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
            input_name="test.csv" 
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination=Join(
                on="/",
                values=[
                    "s3://{}".format(session_bucket),
                    prefix,
                    ExecutionVariables.PIPELINE_EXECUTION_ID,
                    "evaluation",
                ]
            )
        )
    ],
    code = config['dir_scripts']['evaluation_script'],
    property_files=[evaluation_report],
    job_arguments=[
        "--target", target_parameter,
        "--features", feature_parameter,
    ]
)
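The condition step later reads `binary_classification_metrics.accuracy.value` from `evaluation.json`, so the evaluation script needs to write a report with that nesting. A minimal sketch of such a writer follows. `write_evaluation_report` and the accuracy value are hypothetical, and a temporary directory stands in for `/opt/ml/processing/evaluation`.

```python
import json
import os
import tempfile


def write_evaluation_report(accuracy, output_dir):
    """Write evaluation.json with the nesting the condition step expects."""
    report = {"binary_classification_metrics": {"accuracy": {"value": accuracy}}}
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "evaluation.json")
    with open(path, "w") as handle:
        json.dump(report, handle)
    return path


# In the processing job, output_dir would be /opt/ml/processing/evaluation.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = write_evaluation_report(0.91, tmp_dir)
    with open(report_path) as handle:
        loaded = json.load(handle)

print(loaded)
```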

<a id='register'></a>

## Register model step
If the trained model meets the model performance requirements, a new model version is registered with the model registry for further analysis. To attach model metrics to the model version, create a [ModelMetrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html) object using the evaluation report created in the evaluation step. Then, create the RegisterModel step.


In [None]:
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.step_collections import RegisterModel

# Create ModelMetrics object using the evaluation report from the evaluation step
# A ModelMetrics object contains metrics captured from a model.
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=Join(
            on="/",
            values=[
                step_evaluate_model.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"][
                    "S3Uri"
                ],
                "evaluation.json",
            ],
        ),
        content_type="application/json",
    )
)



# Create a RegisterModel step, which registers the model with SageMaker Model Registry.
step_register_model = RegisterModel(
    name="Register-Model",
    estimator=sklearn_estimator,
    model_data=step_tuning.get_top_model_s3_uri(top_k=0, s3_bucket=session_bucket),
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge", "ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_registry_package,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
    tags=config['pipeline_tags']['register_model_tags']
    
)

<a id='condition'></a>

## Accuracy condition step
Adding conditions to the pipeline is done with a ConditionStep.
In this case, we only want to register the new model version with the model registry if the new model meets an accuracy condition.

In [None]:
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import Join

step_fail = FailStep(
    name="AccuracyThresholdFailed",
    error_message=Join(on=" ", values=["Execution failed due to Accuracy <", accuracy_condition_threshold]),
)

In [None]:
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet

# Create accuracy condition to ensure the model meets performance requirements.
# Models with a test accuracy lower than the condition will not be registered with the model registry.
cond_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_evaluate_model.name,
        property_file=evaluation_report,
        json_path="binary_classification_metrics.accuracy.value",
    ),
    right=accuracy_condition_threshold,
)

# Create a SageMaker Pipelines ConditionStep, using the condition above.
# Enter the steps to perform if the condition returns True / False.
step_cond = ConditionStep(
    name="Accuracy-Condition",
    conditions=[cond_gte],
    if_steps=[step_register_model],
    else_steps=[step_fail],  # fail the execution when the accuracy threshold is not met
)
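Conceptually, the condition step looks up a dotted path in the evaluation report and branches on the comparison. The sketch below mirrors that logic in plain Python. `json_get` and `next_step` are hypothetical helpers that only imitate `JsonGet` and `ConditionStep`, and the report contents are made up.

```python
from functools import reduce


def json_get(document, json_path):
    """Follow a dotted path such as 'a.b.c' through nested dictionaries."""
    return reduce(lambda node, key: node[key], json_path.split("."), document)


def next_step(report, threshold):
    """Pick the branch the condition would take for this report."""
    accuracy = json_get(report, "binary_classification_metrics.accuracy.value")
    return "Register-Model" if accuracy >= threshold else "AccuracyThresholdFailed"


report = {"binary_classification_metrics": {"accuracy": {"value": 0.91}}}
print(next_step(report, threshold=0.8))
print(next_step(report, threshold=0.95))
```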

<a id='orchestrate'></a>

## Pipeline Creation: Orchestrate all steps

Now that all pipeline steps are defined, they can be combined into a single pipeline.

In [None]:
from sagemaker.workflow.pipeline import Pipeline

# Create a SageMaker Pipeline.
# Each parameter for the pipeline must be set as a parameter explicitly when the pipeline is created.
# Also pass in each of the steps created above.
# Note that the order of execution is determined from each step's dependencies on other steps,
# not on the order they are passed in below.
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        training_instance_type,
        preprocess_script,
        evaluate_script,
        accuracy_condition_threshold,
        model_registry_package,
        max_parallel_training_jobs,
        max_training_jobs,
        host_parameter,
        port_parameter,
        user_parameter,
        target_parameter, 
        feature_parameter,
        model_approval_status,
        
    ],
    steps=[
            step_preprocess_data, 
            step_tuning, 
            step_evaluate_model, 
            step_cond],
)

In [None]:
# Submit pipeline
pipeline_upsert_tags = config['pipeline_tags']['pipeline_upsert_tags']
print(pipeline_upsert_tags)
pipeline.upsert(role_arn=role, tags=pipeline_upsert_tags)

## Start pipeline with different parameters.
Now that the pipeline is created, it can be started with custom parameters making the pipeline agnostic to who is triggering it, but also to the scripts and data used. The pipeline can be started using the CLI, the SageMaker Studio UI or the SDK and below there is a screenshot of what it looks like in the SageMaker Studio UI.

#### Starting the pipeline with the SDK
In the example below, the pipeline is triggered for the order classification problem, passing its preprocessing script, evaluation script, tuning limits, accuracy threshold and model group as parameters.

In [None]:
# Start the pipeline with the order preprocessing and evaluation scripts
execution = pipeline.start(
    execution_display_name=config['pipeline_parameters']['execution_display_name'],
    parameters=dict(
        PreprocessScript=order_preprocess_uri,
        EvaluateScript=evaluate_script_uri,
        AccuracyConditionThreshold=config['input_params']['accuracy_condition_threshold'],
        MaxiumParallelTrainingJobs=config['input_params']['maxium_parallel_training_jobs'],
        MaxiumTrainingJobs=config['input_params']['maxium_training_jobs'],
        ModelGroup=order_model_group,
    ),
)

In [None]:
execution.describe()

In [None]:
execution.wait()

In [None]:
execution.list_steps()
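`execution.list_steps()` returns one dictionary per step, including a `StepName` and a `StepStatus`. A small helper can condense that into a quick overview. The sketch below runs on hand-written sample data rather than a real execution. `summarize_steps`, `failed_steps`, and the sample entries are assumptions for illustration.

```python
def summarize_steps(steps):
    """Map each step name to its status."""
    return {step["StepName"]: step["StepStatus"] for step in steps}


def failed_steps(steps):
    """Return the names of steps that ended in the Failed status."""
    return [step["StepName"] for step in steps if step["StepStatus"] == "Failed"]


# Hand-written sample shaped like the entries returned by list_steps().
sample_steps = [
    {"StepName": "Preprocess-Data", "StepStatus": "Succeeded"},
    {"StepName": "Train-And-Tune-Model", "StepStatus": "Succeeded"},
    {"StepName": "Evaluate-Model", "StepStatus": "Succeeded"},
    {"StepName": "AccuracyThresholdFailed", "StepStatus": "Failed"},
]

summary = summarize_steps(sample_steps)
print(summary)
print(failed_steps(sample_steps))
```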

#### Now that the model is registered, access it manually in the SageMaker Studio Model Registry console or programmatically in the next notebook, approve it, and run the second part of this solution: the Batch Transform step

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-pipeline-parameterization|parameterized-pipeline.ipynb)
