# Using the SageMaker @step decorator to convert Python functions for creating a custom Amazon Bedrock model into a SageMaker pipeline

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

---

> *This notebook has been tested with the **`Python 3`** kernel in SageMaker Studio (JupyterLab version).*

We will fine-tune the [Amazon Titan Text Lite](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-text-models.html) model provided by Amazon Bedrock for a summarization use case. The training data is a dataset of CNN news articles and their summaries. The dataset, called [cnn_dailymail v3.0](https://huggingface.co/datasets/cnn_dailymail), is available from Hugging Face.

A *config.yaml* file can be found in the same folder as this notebook. This file includes properties that are passed to the @step decorator.

<div class="alert alert-block alert-warning">
<b>Warning:</b> The last section of this notebook cleans up the resources created during fine-tuning and testing, including the Bedrock provisioned throughput that is needed to access the fine-tuned custom model. You will continue to incur AWS charges until you run the cleanup step.
</div>

In [1]:
!pip install -r requirements.txt

Collecting sagemaker<3,>=v2.211.0 (from -r requirements.txt (line 3))
  Downloading sagemaker-2.215.0-py3-none-any.whl.metadata (14 kB)
Collecting docker (from sagemaker<3,>=v2.211.0->-r requirements.txt (line 3))
  Using cached docker-7.0.0-py3-none-any.whl.metadata (3.5 kB)
Downloading sagemaker-2.215.0-py3-none-any.whl (1.5 MB)
Using cached docker-7.0.0-py3-none-any.whl (147 kB)
Installing collected packages: docker, sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.198.1
    Uninstalling sagemaker-2.198.1:
      Successfully uninstalled sagemaker-2.198.1
Successfully installed docker-7.0.0 sagemaker-2.215.0


In [2]:
# restart kernel for the packages installed above to take effect
from IPython.core.display import HTML

HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [3]:
from datasets import load_dataset
from itertools import islice
import pandas as pd
import sagemaker
import jsonlines
import warnings

warnings.filterwarnings("ignore")
import json
import os
import sys
import boto3
import time
import pprint
import random
import yaml
from sagemaker.workflow.function_step import step
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from datetime import datetime
from botocore.exceptions import ClientError

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [4]:
# Set path to config file "config.yaml"
# The config.yaml file contains the arguments that are passed to the step decorator functions.
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = os.getcwd()

## Setup

1. This notebook uses the default S3 bucket for the user. The default Amazon S3 bucket follows the naming pattern s3://sagemaker-{Region}-{your-account-id}. It is automatically created if it does not exist.

2. This notebook uses the default IAM role for the user. If your Studio user role does not have AWS administrator access, you will need to add the necessary permissions to the role. These include:
    - [create a training job](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms)
    - [Access to Bedrock models](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html)
    - [Customize Amazon Bedrock model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html)
    - [Access to SageMaker Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-access.html)
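
The bucket naming pattern in item 1 can be made concrete with a small sketch. The helper `default_bucket_name` is hypothetical and is not part of the SageMaker SDK (the notebook obtains the real name from `sagemaker_session.default_bucket()`); the account ID is a placeholder.

```python
def default_bucket_name(region: str, account_id: str) -> str:
    """Build a name following the sagemaker-{Region}-{your-account-id} pattern."""
    return f"sagemaker-{region}-{account_id}"


# Placeholder values, for illustration only.
print(default_bucket_name("us-east-1", "123456789012"))
```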


In [5]:
sagemaker_session = sagemaker.session.Session()
region = sagemaker_session.boto_region_name

# get the default bucket and IAM role for the user
bucket_name = sagemaker_session.default_bucket()
role_arn = sagemaker.get_execution_role()

print(f"IAM role: {role_arn}")
print(f"S3 bucket: {bucket_name}")

sagemaker.config INFO - Fetched defaults config from location: /home/sagemaker-user/blog
IAM role: arn:aws:iam::095351214964:role/service-role/AmazonSageMaker-ExecutionRole-20200130T133110
S3 bucket: sagemaker-us-east-1-095351214964


In [6]:
# Let's look at the contents of config.yaml.
# The properties in config.yaml are passed into the @step function.
# Print the contents of config.yaml.
# Notice that each pipeline step runs on ml.c5.2xlarge, as specified in the InstanceType property.
with open("./config.yaml", "r") as f:
    config = yaml.safe_load(f)
    print(yaml.dump(config, default_flow_style=False))

SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        CustomFileFilter:
          IgnoreNamePatterns:
          - '*.ipynb'
        Dependencies: ./requirements.txt
        IncludeLocalWorkDir: true
        InstanceType: ml.c5.2xlarge
SchemaVersion: '1.0'
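
The nesting in this file corresponds to the dotted key paths that the SDK logs when it applies a value, for example `SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType`. The sketch below flattens the same structure to make that mapping visible. `flatten_keys` is a hypothetical helper written for this illustration, not an SDK function, and the config is written as a dict so that the example needs only the standard library.

```python
def flatten_keys(config: dict, prefix: str = "") -> dict:
    """Flatten a nested dict into {dotted.key.path: value}."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_keys(value, path))
        else:
            flat[path] = value
    return flat


# Same structure as the config.yaml printed above, written as a dict.
config = {
    "SageMaker": {
        "PythonSDK": {
            "Modules": {
                "RemoteFunction": {
                    "CustomFileFilter": {"IgnoreNamePatterns": ["*.ipynb"]},
                    "Dependencies": "./requirements.txt",
                    "IncludeLocalWorkDir": True,
                    "InstanceType": "ml.c5.2xlarge",
                }
            }
        }
    },
    "SchemaVersion": "1.0",
}

for path, value in flatten_keys(config).items():
    print(f"{path} = {value}")
```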



In [7]:
from datasets import load_dataset

instruction = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

instruction:

Summarize the news article provided below.

input:

"""


def add_prompt_to_data(dataset):
    # Need to add prompt to the dataset in the format that is
    # required for fine tuning by the Titan Text Lite model.
    datapoints = []

    for datapoint in dataset:
        # Add instruction prompt to each CNN article
        # and add prefix 'response:' to the article summary.
        temp_dict = {}
        temp_dict["prompt"] = instruction + datapoint["article"]
        temp_dict["completion"] = "response:\n\n" + datapoint["highlights"]
        datapoints.append(temp_dict)
    return datapoints


#### Define step for downloading the dataset
@step(
    name="data-load-step",
    keep_alive_period_in_seconds=300,
)
def data_load(ds_name: str, ds_version: str) -> tuple:
    dataset = load_dataset(ds_name, ds_version)

    # the dataset includes data for training, validation, and test.
    # The raw dataset includes the article and its summary.
    # We need to format each row with the LLM prompt.
    datapoints_train = add_prompt_to_data(dataset["train"])
    datapoints_valid = add_prompt_to_data(dataset["validation"])
    datapoints_test = add_prompt_to_data(dataset["test"])

    print(f"Number of training rows: {len(datapoints_train)}")
    print(f'\nTraining prompt: {datapoints_train[0]["prompt"]}')
    print(f'\nTraining Completion: {datapoints_train[0]["completion"]}')

    print(f"\nNumber of validation rows: {len(datapoints_valid)}")
    print(f'\nValidation prompt: {datapoints_valid[0]["prompt"]}')
    print(f'\nValidation Completion: {datapoints_valid[0]["completion"]}')

    print(f"\nNumber of test rows: {len(datapoints_test)}")
    print(f'\nTest prompt: {datapoints_test[0]["prompt"]}')
    print(f'\nTest Completion: {datapoints_test[0]["completion"]}')

    return datapoints_train, datapoints_valid, datapoints_test
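
To make the record format concrete, here is a minimal sketch of what `add_prompt_to_data` produces for a single row. The article and summary are made up, and the instruction string is a shortened stand-in for the one defined in the cell above.

```python
import json

# Shortened stand-in for the notebook's `instruction` string.
instruction = "instruction:\n\nSummarize the news article provided below.\n\ninput:\n\n"

# A made-up datapoint with the two fields used from the dataset.
datapoint = {
    "article": "A storm closed the harbor on Monday.",
    "highlights": "Harbor closed by storm.",
}

record = {
    "prompt": instruction + datapoint["article"],
    "completion": "response:\n\n" + datapoint["highlights"],
}

# Serialized as a single line, which is the JSONL layout written later in the pipeline.
line = json.dumps(record)
print(line)
```

Because `json.dumps` escapes the embedded newlines, each record stays on one physical line even though the prompt itself spans several.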

In [8]:
# Restrict the number of rows and row length
def reduce_dataset_size(data, max_row_length, max_rows):
    datapoints = []
    for datapoint in data:
        if len(datapoint["prompt"] + datapoint["completion"]) <= max_row_length:
            datapoints.append(datapoint)
    random.shuffle(datapoints)
    datapoints = datapoints[:max_rows]
    print(f"\nData set size: {len(datapoints)}")

    return datapoints


#### Define step for splitting the dataset into training, validation, and testing
# Restrict the size of each row to 3000 characters (prompt plus completion).
# We also select 100 rows for training, 10 for validation, and 5 for testing
# to keep computation costs low for this example
@step(
    name="data-split-step",
    keep_alive_period_in_seconds=300,
)
def data_split(step_load_result: tuple) -> tuple:
    train_lines = reduce_dataset_size(step_load_result[0], 3000, 100)
    validation_lines = reduce_dataset_size(step_load_result[1], 3000, 10)
    test_lines = reduce_dataset_size(step_load_result[2], 3000, 5)

    print(f"\nNumber of training rows: {len(train_lines)}")
    print(f"\nNumber of validation rows: {len(validation_lines)}")
    print(f"\nNumber of test rows: {len(test_lines)}")

    return train_lines, validation_lines, test_lines
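
The filter in `reduce_dataset_size` applies `len()` to a Python string, so the limit is a character count. The sketch below shows the same filter-then-sample idea on toy rows. `filter_and_sample` is a hypothetical name, and the seeded `random.Random` instance is an addition for this sketch (the notebook uses the unseeded module-level `random.shuffle`) so that the sample is repeatable.

```python
import random


def filter_and_sample(rows, max_chars, max_rows, seed=0):
    """Keep rows whose prompt+completion fits in max_chars, then sample max_rows."""
    kept = [r for r in rows if len(r["prompt"] + r["completion"]) <= max_chars]
    # Seeded local generator: an assumption of this sketch, not in the notebook.
    random.Random(seed).shuffle(kept)
    return kept[:max_rows]


rows = [
    {"prompt": "a" * 10, "completion": "b" * 5},  # 15 characters: kept
    {"prompt": "a" * 40, "completion": "b" * 5},  # 45 characters: dropped
    {"prompt": "a" * 12, "completion": "b" * 8},  # 20 characters: kept
]

sample = filter_and_sample(rows, max_chars=20, max_rows=1)
print(len(sample), len(sample[0]["prompt"] + sample[0]["completion"]))
```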

In [9]:
# Upload the training, validation, and test files to S3
def upload_file_to_s3(bucket_name: str, file_names: tuple, s3_key_names: tuple):
    import boto3

    s3_client = boto3.client("s3")
    for i in range(len(file_names)):
        s3_client.upload_file(file_names[i], bucket_name, s3_key_names[i])


# Save the training, validation, and test files in jsonl format
# to the local file system
def write_jsonl_file(abs_path: str, file_name: str, data) -> str:
    saved_file_path = f"{abs_path}/{file_name}"

    with jsonlines.open(saved_file_path, "w") as writer:
        for line in data:
            writer.write(line)

    return saved_file_path


# Save the s3 uri for test data in SSM.
def save_s3_uri_in_SSM(parameter_name, parameter_value):
    ssm_client = boto3.client("ssm")
    response = ssm_client.put_parameter(
        Name=parameter_name, Value=parameter_value, Type="String", Overwrite=True
    )


#### Define step for uploading the training, validation, and test data to S3
@step(
    name="data-upload-to-s3-step",
    keep_alive_period_in_seconds=300,
)
# Convert the data to jsonl format and upload to S3.
def data_upload_to_s3(data_split_response: tuple, bucket_name: str) -> tuple:
    dataset_folder = "fine-tuning-datasets"

    if not os.path.exists(dataset_folder):
        # Create the directory
        os.makedirs(dataset_folder)
        print(f"Directory {dataset_folder} created successfully!")
    else:
        print(f"Directory {dataset_folder} already exists!")

    abs_path = os.path.abspath(dataset_folder)
    print(f"\nDataset folder path: {abs_path}")

    print(type(data_split_response[0]))
    train_file = write_jsonl_file(abs_path, "train-cnn.jsonl", data_split_response[0])
    val_file = write_jsonl_file(abs_path, "validation-cnn.jsonl", data_split_response[1])
    test_file = write_jsonl_file(abs_path, "test-cnn.jsonl", data_split_response[2])

    file_names = train_file, val_file, test_file

    s3_keys = (
        f"{dataset_folder}/train/train-cnn.jsonl",
        f"{dataset_folder}/validation/validation-cnn.jsonl",
        f"{dataset_folder}/test/test-cnn.jsonl",
    )
    print(s3_keys)

    upload_file_to_s3(bucket_name, file_names, s3_keys)

    # save test file S3 uri for use later while testing the model
    save_s3_uri_in_SSM("s3_test_uri", f"s3://{bucket_name}/{s3_keys[2]}")

    # return the s3 uris for data files
    return (
        f"s3://{bucket_name}/{s3_keys[0]}",
        f"s3://{bucket_name}/{s3_keys[1]}",
        f"s3://{bucket_name}/{s3_keys[2]}",
    )
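
The upload step writes each split as a JSONL file and returns its S3 URI. The sketch below reproduces the file layout with the standard library only (the notebook itself uses the `jsonlines` package) and checks that the rows survive a round trip. `write_jsonl` and the bucket name `example-bucket` are placeholders for this illustration; nothing is uploaded.

```python
import json
import os
import tempfile


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


rows = [{"prompt": "p1", "completion": "c1"}, {"prompt": "p2", "completion": "c2"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train-cnn.jsonl")
    write_jsonl(path, rows)
    with open(path, encoding="utf-8") as f:
        round_trip = [json.loads(line) for line in f]

# Placeholder bucket; the key mirrors the layout used in the step above.
bucket = "example-bucket"
s3_uri = f"s3://{bucket}/fine-tuning-datasets/train/train-cnn.jsonl"
print(round_trip == rows, s3_uri)
```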

In [10]:
#### Define step for custom training the model
@step(
    name="model-training-step",
    keep_alive_period_in_seconds=300,
)
def train(
    custom_model_name: str, training_job_name: str, step_data_upload_to_s3_result: tuple
) -> str:
    # Define the hyperparameters for fine-tuning Titan text model
    hyper_parameters = {
        "epochCount": "2",
        "batchSize": "1",
        "learningRate": "0.00003",
    }

    # Specify your data path for training, validation(optional) and output
    training_data_config = {"s3Uri": step_data_upload_to_s3_result[0]}
    print(f"Training data config: {training_data_config}")

    validation_data_config = {
        "validators": [
            {
                # "name": "validation",
                "s3Uri": step_data_upload_to_s3_result[1]
            }
        ]
    }
    print(f"Validation data config: {validation_data_config}")

    output_data_config = {
        "s3Uri": f"s3://{bucket_name}/fine-tuning-datasets/outputs/output-{custom_model_name}"
    }

    bedrock = boto3.client(service_name="bedrock")

    print("Start training....")

    # Create the customization job
    training_job_response = bedrock.create_model_customization_job(
        customizationType="FINE_TUNING",
        jobName=training_job_name,
        customModelName=custom_model_name,
        roleArn=role_arn,
        baseModelIdentifier="amazon.titan-text-lite-v1:0:4k",
        hyperParameters=hyper_parameters,
        trainingDataConfig=training_data_config,
        validationDataConfig=validation_data_config,
        outputDataConfig=output_data_config,
    )
    print(training_job_response)

    job_status = bedrock.get_model_customization_job(jobIdentifier=training_job_name)["status"]
    print(job_status)

    while job_status == "InProgress":
        time.sleep(60)
        job_status = bedrock.get_model_customization_job(jobIdentifier=training_job_name)["status"]
        print(job_status)

    fine_tune_job = bedrock.get_model_customization_job(jobIdentifier=training_job_name)
    pprint.pp(fine_tune_job)
    output_job_name = "model-customization-job-" + fine_tune_job["jobArn"].split("/")[-1]
    print(f"output_job_name: {output_job_name}")

    model_id = bedrock.get_custom_model(modelIdentifier=custom_model_name)["modelArn"]

    print(f"Model id: {model_id}")
    return model_id
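
The training step polls the job status until it is no longer `InProgress`. A polling loop of this kind can also bound the number of attempts and inspect the final status before continuing. The sketch below shows that pattern with a fake status source in place of the Bedrock call. `wait_for_terminal_status` is a hypothetical helper, and the statuses `Completed` and `Failed` are assumed here to be the terminal values of interest.

```python
import time


def wait_for_terminal_status(get_status, in_progress=("InProgress",), delay=0.0, max_attempts=100):
    """Poll get_status() until it leaves the in-progress states or attempts run out."""
    for _ in range(max_attempts):
        status = get_status()
        if status not in in_progress:
            return status
        time.sleep(delay)
    raise TimeoutError(f"still in progress after {max_attempts} attempts")


# Stand-in for bedrock.get_model_customization_job(...)["status"].
fake_statuses = iter(["InProgress", "InProgress", "Failed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses))

if final_status != "Completed":
    print(f"Job ended with status {final_status}; there is no custom model to look up")
```

In the pipeline step, `delay` would be on the order of the 60 seconds used above. It defaults to zero here only so that the sketch runs instantly.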

In [11]:
#### Define step for creating Provisioned throughput for the custom model
@step(
    name="create-provisioned-throughput-step",
    keep_alive_period_in_seconds=300,
)
def create_prov_thruput(model_id: str, provisioned_model_name: str) -> str:
    bedrock = boto3.client(service_name="bedrock")

    provisioned_model_id = bedrock.create_provisioned_model_throughput(
        modelUnits=1, provisionedModelName=provisioned_model_name, modelId=model_id
    )["provisionedModelArn"]

    status = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_id)[
        "status"
    ]

    print(status)

    while status == "Creating":
        time.sleep(60)
        status = bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_id)[
            "status"
        ]
        print(status)
        time.sleep(60)

    return provisioned_model_id

In [12]:
# Test the custom model


def get_ssm_parameter(parameter_name):
    ssm_client = boto3.client("ssm")
    response = ssm_client.get_parameter(Name=parameter_name, WithDecryption=True)

    return response["Parameter"]["Value"]


#### Define step for testing the custom model
@step(
    name="model-testing-step",
    keep_alive_period_in_seconds=300,
)
def test_model(provisioned_model_id: str) -> tuple:
    s3_uri = get_ssm_parameter("s3_test_uri")

    # Split the s3 uri into bucket name and key
    s3_bucket = s3_uri.split("/")[2]
    s3_key = "/".join(s3_uri.split("/")[3:])
    print(f"s3_bucket : {s3_bucket}, s3_key: {s3_key}")

    # download the test file
    s3 = boto3.client("s3")

    s3.download_file(s3_bucket, s3_key, "test-cnn.jsonl")

    # Invoke the model
    with open("test-cnn.jsonl") as f:
        lines = f.read().splitlines()

    test_prompt = json.loads(lines[0])["prompt"]
    reference_summary = json.loads(lines[0])["completion"]
    pprint.pp(test_prompt)
    print(reference_summary)

    prompt = f"""
            {test_prompt}
            """
    body = json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 2048,
                "stopSequences": ["User:"],
                "temperature": 0,
                "topP": 0.9,
            },
        }
    )

    accept = "application/json"
    contentType = "application/json"

    bedrock_runtime = boto3.client(service_name="bedrock-runtime")

    fine_tuned_response = bedrock_runtime.invoke_model(
        body=body, modelId=provisioned_model_id, accept=accept, contentType=contentType
    )

    fine_tuned_response_body = json.loads(fine_tuned_response.get("body").read())
    summary = fine_tuned_response_body["results"][0]["outputText"]

    print("Fine tuned model response:", summary)
    print("\nReference summary from test data: ", reference_summary)
    return prompt, summary
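
The testing step splits the S3 URI into bucket and key by position. An alternative sketch using `urllib.parse` from the standard library is shown below. `split_s3_uri` is a hypothetical helper and the URI is a placeholder. It gives the same result as the positional split for well-formed URIs and raises an error for anything that is not an `s3://` URI.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> tuple:
    """Return (bucket, key) for an s3://bucket/key URI."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri}")
    return parsed.netloc, parsed.path.lstrip("/")


uri = "s3://example-bucket/fine-tuning-datasets/test/test-cnn.jsonl"
bucket, key = split_s3_uri(uri)
print(bucket, key)
```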

In [13]:
#### Create the SageMaker pipeline
# You can see the multi-step directed acyclic graph (DAG) in the Studio UI as a pipeline

pipeline_name = "bedrock-fine-tune-pipeline"

ts = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
custom_model_name = f"finetuned-model-{ts}"
training_job_name = f"model-finetune-job-{ts}"
provisioned_model_name = f"summarization-model-{ts}"

param1 = ParameterString(name="ds_name", default_value="cnn_dailymail")
param2 = ParameterString(name="ds_version", default_value="3.0.0")

data_load_response = data_load(param1, param2)

data_split_response = data_split(data_load_response)

data_upload_to_s3_response = data_upload_to_s3(data_split_response, bucket_name)

train_response = train(custom_model_name, training_job_name, data_upload_to_s3_response)

create_prov_thruput_response = create_prov_thruput(train_response, provisioned_model_name)

test_model_response = test_model(create_prov_thruput_response)

pipeline = Pipeline(name=pipeline_name, steps=[test_model_response], parameters=[param1, param2])
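
Only the final step is passed to `Pipeline`, yet all six steps become part of the pipeline, because each step's inputs reference the outputs of the step before it. The sketch below is a toy model of that idea and not the SDK's implementation. `ToyStep` and `collect_steps` are hypothetical names that only illustrate how the upstream steps can be discovered by walking back from the leaf.

```python
class ToyStep:
    """Stand-in for a delayed @step result: a name plus the step results it consumes."""

    def __init__(self, name, *inputs):
        self.name = name
        # Plain values (strings, parameters) are not dependencies; other steps are.
        self.inputs = [i for i in inputs if isinstance(i, ToyStep)]


def collect_steps(leaf):
    """Walk upstream from the leaf and return every reachable step name, leaf first."""
    ordered, seen, stack = [], set(), [leaf]
    while stack:
        current = stack.pop()
        if current.name in seen:
            continue
        seen.add(current.name)
        ordered.append(current.name)
        stack.extend(current.inputs)
    return ordered


load_s = ToyStep("data-load-step", "cnn_dailymail", "3.0.0")
split_s = ToyStep("data-split-step", load_s)
upload_s = ToyStep("data-upload-to-s3-step", split_s, "bucket")
train_s = ToyStep("model-training-step", "model-name", "job-name", upload_s)
provision_s = ToyStep("create-provisioned-throughput-step", train_s, "provisioned-name")
testing_s = ToyStep("model-testing-step", provision_s)

print(collect_steps(testing_s))
```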

In [14]:
pipeline.upsert(role_arn)

sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType


2024-04-12 21:36:37,429 sagemaker.remote_function INFO     Uploading serialized function code to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/model-testing-step/2024-04-12-21-36-35-895/function
2024-04-12 21:36:37,547 sagemaker.remote_function INFO     Uploading serialized function arguments to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/model-testing-step/2024-04-12-21-36-35-895/arguments
2024-04-12 21:36:37,851 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmpv41q6gtg/requirements.txt'
2024-04-12 21:36:37,912 sagemaker.remote_function INFO     Successfully uploaded dependencies and pre execution scripts to 's3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/model-testing-step/2024-04-12-21-36-35-895/pre_exec_script_and_dependencies'
2024-04-12 21:36:38,000 sagemaker.remote_function INFO     Copied user workspace to '/tmp/tmpse97qmlu/temp_workspace/sagemaker_remote_function_workspace'

sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType


2024-04-12 21:36:40,298 sagemaker.remote_function INFO     Uploading serialized function code to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/create-provisioned-throughput-step/2024-04-12-21-36-35-895/function
2024-04-12 21:36:40,411 sagemaker.remote_function INFO     Uploading serialized function arguments to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/create-provisioned-throughput-step/2024-04-12-21-36-35-895/arguments
2024-04-12 21:36:40,487 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmpnj3veih_/requirements.txt'
2024-04-12 21:36:40,519 sagemaker.remote_function INFO     Successfully uploaded dependencies and pre execution scripts to 's3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/create-provisioned-throughput-step/2024-04-12-21-36-35-895/pre_exec_script_and_dependencies'


sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType


2024-04-12 21:36:41,695 sagemaker.remote_function INFO     Uploading serialized function code to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/model-training-step/2024-04-12-21-36-35-895/function
2024-04-12 21:36:41,792 sagemaker.remote_function INFO     Uploading serialized function arguments to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/model-training-step/2024-04-12-21-36-35-895/arguments
2024-04-12 21:36:41,912 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmpzxpgiqlm/requirements.txt'
2024-04-12 21:36:41,983 sagemaker.remote_function INFO     Successfully uploaded dependencies and pre execution scripts to 's3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/model-training-step/2024-04-12-21-36-35-895/pre_exec_script_and_dependencies'


sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType


2024-04-12 21:36:43,162 sagemaker.remote_function INFO     Uploading serialized function code to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-upload-to-s3-step/2024-04-12-21-36-35-895/function
2024-04-12 21:36:43,346 sagemaker.remote_function INFO     Uploading serialized function arguments to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-upload-to-s3-step/2024-04-12-21-36-35-895/arguments
2024-04-12 21:36:43,465 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmp7ujlj15s/requirements.txt'
2024-04-12 21:36:43,528 sagemaker.remote_function INFO     Successfully uploaded dependencies and pre execution scripts to 's3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-upload-to-s3-step/2024-04-12-21-36-35-895/pre_exec_script_and_dependencies'


sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType


2024-04-12 21:36:44,700 sagemaker.remote_function INFO     Uploading serialized function code to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-split-step/2024-04-12-21-36-35-895/function
2024-04-12 21:36:44,781 sagemaker.remote_function INFO     Uploading serialized function arguments to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-split-step/2024-04-12-21-36-35-895/arguments
2024-04-12 21:36:44,891 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmp9p4gw5b6/requirements.txt'
2024-04-12 21:36:44,919 sagemaker.remote_function INFO     Successfully uploaded dependencies and pre execution scripts to 's3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-split-step/2024-04-12-21-36-35-895/pre_exec_script_and_dependencies'


sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.Dependencies
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.IncludeLocalWorkDir
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.CustomFileFilter.IgnoreNamePatterns
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.RemoteFunction.InstanceType


2024-04-12 21:36:46,092 sagemaker.remote_function INFO     Uploading serialized function code to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-load-step/2024-04-12-21-36-35-895/function
2024-04-12 21:36:46,213 sagemaker.remote_function INFO     Uploading serialized function arguments to s3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-load-step/2024-04-12-21-36-35-895/arguments
2024-04-12 21:36:46,292 sagemaker.remote_function INFO     Copied dependencies file at './requirements.txt' to '/tmp/tmpx3pmqpqv/requirements.txt'
2024-04-12 21:36:46,319 sagemaker.remote_function INFO     Successfully uploaded dependencies and pre execution scripts to 's3://sagemaker-us-east-1-095351214964/bedrock-fine-tune-pipeline/data-load-step/2024-04-12-21-36-35-895/pre_exec_script_and_dependencies'


{'PipelineArn': 'arn:aws:sagemaker:us-east-1:095351214964:pipeline/bedrock-fine-tune-pipeline',
 'ResponseMetadata': {'RequestId': '8de6e516-fdbf-4d34-bc19-4b61a6cb6474',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '8de6e516-fdbf-4d34-bc19-4b61a6cb6474',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '94',
   'date': 'Fri, 12 Apr 2024 21:36:46 GMT'},
  'RetryAttempts': 0}}

In [15]:
execution = pipeline.start()

In [16]:
execution.describe()

{'PipelineArn': 'arn:aws:sagemaker:us-east-1:095351214964:pipeline/bedrock-fine-tune-pipeline',
 'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:095351214964:pipeline/bedrock-fine-tune-pipeline/execution/l040kjgtiq4n',
 'PipelineExecutionDisplayName': 'execution-1712957806959',
 'PipelineExecutionStatus': 'Executing',
 'CreationTime': datetime.datetime(2024, 4, 12, 21, 36, 46, 908000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2024, 4, 12, 21, 36, 46, 908000, tzinfo=tzlocal()),
 'CreatedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:095351214964:user-profile/d-ndkfwlyrojeq/blog',
  'UserProfileName': 'blog',
  'DomainId': 'd-ndkfwlyrojeq',
  'IamIdentity': {'Arn': 'arn:aws:sts::095351214964:assumed-role/AmazonSageMaker-ExecutionRole-20200130T133110/SageMaker',
   'PrincipalId': 'AROARMM3ACN2NE2XC3HPY:SageMaker'}},
 'LastModifiedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-east-1:095351214964:user-profile/d-ndkfwlyrojeq/blog',
  'UserProfileName': 'blog',
  'D

In [17]:
%%time
execution.wait(delay=60, max_attempts=250)

CPU times: user 1.44 s, sys: 87.4 ms, total: 1.53 s
Wall time: 1h 31min 17s
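
As a quick check on the wait settings: assuming the waiter polls once every `delay` seconds for at most `max_attempts` polls, the arguments above allow roughly 60 × 250 seconds of waiting. The sketch below compares that bound with the wall time reported for this run.

```python
from datetime import timedelta

delay_seconds = 60
max_attempts = 250

# Approximate upper bound on how long the wait call keeps polling (assumption stated above).
max_wait = timedelta(seconds=delay_seconds * max_attempts)

# Wall time reported by %%time for the run shown above.
observed = timedelta(hours=1, minutes=31, seconds=17)

print(max_wait, observed < max_wait)
```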


In [18]:
execution.list_steps()

[{'StepName': 'model-testing-step',
  'StepDisplayName': '__main__.test_model',
  'StartTime': datetime.datetime(2024, 4, 12, 23, 4, 43, 688000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2024, 4, 12, 23, 7, 33, 776000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:095351214964:training-job/pipelines-l040kjgtiq4n-model-testing-step-pr4gGsj2Rt'}},
  'AttemptCount': 1},
 {'StepName': 'create-provisioned-throughput-step',
  'StepDisplayName': '__main__.create_prov_thruput',
  'StartTime': datetime.datetime(2024, 4, 12, 22, 49, 35, 654000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2024, 4, 12, 23, 4, 42, 774000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:095351214964:training-job/pipelines-l040kjgtiq4n-create-provisioned-t-xDN4wVqlsC'}},
  'AttemptCount': 1},
 {'StepName': 'model-training-step',
  'StepDisplayName': '__main__.train',
  

In [19]:
print(execution.result(step_name="model-testing-step"))

('\n            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\ninstruction:\n\nSummarize the news article provided below.\n\ninput:\n\n(CNN)Remains of up to nearly 400 unaccounted for service members tied to the USS Oklahoma at Pearl Harbor will be exhumed this year, the Defense Department announced Tuesday. The hope is that most of the battleship\'s sailors and Marines can be identified. "The secretary of defense and I will work tirelessly to ensure your loved one\'s remains will be recovered, identified, and returned to you as expeditiously as possible, and we will do so with dignity, respect and care," Deputy Secretary of Defense Bob Work said in a statement. "While not all families will receive an individual identification, we will strive to provide resolution to as many families as possible." The USS Oklahoma sank when it was hit by torpedoes on December 7, 1941, durin

## Cleanup
Delete the resources that were created to stop incurring charges.

In [20]:
bedrock = boto3.client(service_name="bedrock")

# delete Bedrock provisioned throughput
provisioned_model_id = execution.result(step_name="create-provisioned-throughput-step")
try:
    bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)
except ClientError as e:
    print(e.response["Error"]["Code"])
else:
    # report success only when the delete call did not raise
    print(f"Provisioned throughput deleted for model: {provisioned_model_id}")

# delete the custom model
custom_model_id = execution.result(step_name="model-training-step")
try:
    bedrock.delete_custom_model(modelIdentifier=custom_model_id)
except ClientError as e:
    print(e.response["Error"]["Code"])
else:
    print(f"Custom model {custom_model_id} deleted.")

Provisioned throughput deleted for model: arn:aws:bedrock:us-east-1:095351214964:provisioned-model/fj8dou88yq5q
Custom model arn:aws:bedrock:us-east-1:095351214964:custom-model/amazon.titan-text-lite-v1:0:4k/2zefi5rp4ez1 deleted.


In [21]:
# delete the SSM parameter
ssm_client = boto3.client("ssm")
ssm_client.delete_parameter(Name="s3_test_uri")

{'ResponseMetadata': {'RequestId': '4a830460-d5d5-48bd-94fa-729f0b5dbfcd',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Fri, 12 Apr 2024 23:08:07 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': '4a830460-d5d5-48bd-94fa-729f0b5dbfcd'},
  'RetryAttempts': 0}}

In [22]:
# Delete the SageMaker pipeline
response = pipeline.delete()
print(f'Deleted pipeline {response["PipelineArn"]}')

INFO:sagemaker.workflow.pipeline:If triggers have been setup for this target, they will become orphaned.You will need to clean them up manually via the CLI or EventBridge console.


Deleted pipeline arn:aws:sagemaker:us-east-1:095351214964:pipeline/bedrock-fine-tune-pipeline


In [23]:
# delete objects in S3
def delete_objects_with_prefix(bucket_name, prefix):
    s3 = boto3.client("s3")

    # No Delimiter is passed: with Delimiter="/", keys in nested "folders" are
    # rolled up into CommonPrefixes and never appear in Contents.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=bucket_name, Key=obj["Key"])


delete_objects_with_prefix(bucket_name, "fine-tuning-datasets")
delete_objects_with_prefix(bucket_name, pipeline_name)

print(f"Objects in Bucket {bucket_name} have been deleted.")

Objects in Bucket sagemaker-us-east-1-095351214964 have been deleted.
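
Deleting one object per request is slow when a prefix holds many files. As a hedged alternative, the sketch below groups keys into batches for the S3 `delete_objects` call, which accepts at most 1,000 keys per request. The helper names `chunked` and `delete_keys_in_batches` are hypothetical and not part of the original notebook.

```python
def chunked(items, size=1000):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_keys_in_batches(s3_client, bucket_name, keys):
    """Delete `keys` using DeleteObjects, 1,000 keys per request at most.

    Returns the number of keys that S3 reported as deleted.
    """
    deleted = 0
    for batch in chunked(list(keys)):
        response = s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": False},
        )
        deleted += len(response.get("Deleted", []))
    return deleted
```

The keys would come from the same `list_objects_v2` listing used above, and `s3_client` would be `boto3.client("s3")`. Comparing the returned count with the number of keys passed in is a quick way to spot partial failures.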


## Notebook CI Test Results

This notebook was tested in multiple regions. The results for each region are shown below, except for us-west-2, which appears at the top of the notebook.


![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-pipelines|step-decorator|bedrock-examples|fine_tune_bedrock_step_decorator.ipynb)
