# Deploy the Model

The pipeline executed earlier created a Model Package version in the specified Model Package Group. Note that the model was registered (that is, the Model Package was created) with an approval status of `PendingManualApproval`.

With SageMaker Pipelines, data scientists can register the model as either approved or pending manual approval as part of the CI/CD workflow.

We can also approve the model in the SageMaker Studio UI or programmatically, as shown below.

In [2]:
import psutil

notebook_memory = psutil.virtual_memory()

# psutil reports bytes. A 32 GB m5.2xlarge reports roughly 32.9e9 bytes, so compare against 32 decimal GB.
if notebook_memory.total < 32 * 1000 * 1000 * 1000:
    print('*******************************************')
    print('YOU ARE NOT USING THE CORRECT INSTANCE TYPE')
    print('PLEASE CHANGE INSTANCE TYPE TO  m5.2xlarge ')
    print('*******************************************')
else:
    correct_instance_type = True
    print(notebook_memory)

svmem(total=32890294272, available=29473607680, percent=10.4, used=3056504832, free=3299409920, active=1723940864, inactive=24283176960, buffers=2768896, cached=26531610624, shared=1163264, slab=2821099520)
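
The `total` field in the output above is a byte count. As a rough sketch of how that number relates to the instance's advertised 32 GB, the snippet below converts it to decimal gigabytes and binary gibibytes. The helper names `bytes_to_gb` and `bytes_to_gib` are made up for illustration and are not part of `psutil`.

```python
# Hypothetical helpers (not part of psutil) for reading the byte counts above.
GB = 1000 ** 3   # decimal gigabyte
GIB = 1024 ** 3  # binary gibibyte


def bytes_to_gb(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GB


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gibibytes."""
    return n_bytes / GIB


total = 32890294272  # `total` copied from the svmem output above
print("{:.1f} GB, {:.1f} GiB".format(bytes_to_gb(total), bytes_to_gib(total)))
```

Because the reported total is below 32 GiB, a memory threshold expressed in binary units would be stricter than one expressed in decimal units.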


In [3]:
from botocore.exceptions import ClientError

import os
import logging
import boto3
import sagemaker
import pandas as pd

sess = sagemaker.Session()
bucket = sess.default_bucket()
region = boto3.Session().region_name

import botocore.config

config = botocore.config.Config(
    user_agent_extra='dsoaws/1.0'
)

sm = boto3.Session().client(service_name="sagemaker", 
                            region_name=region,
                            config=config)

In [4]:
%store -r role

# List Pipeline Execution Steps


In [5]:
%store -r pipeline_name

In [6]:
print(pipeline_name)

GPT3-pipeline-1677353486


In [7]:
%%time

import time
from pprint import pprint

executions_response = sm.list_pipeline_executions(PipelineName=pipeline_name)["PipelineExecutionSummaries"]
pipeline_execution_status = executions_response[0]["PipelineExecutionStatus"]
print(pipeline_execution_status)

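while pipeline_execution_status == "Executing":
    print("Please wait...")
    time.sleep(30)  # pause between polls instead of calling the API in a tight loop
    try:
        executions_response = sm.list_pipeline_executions(PipelineName=pipeline_name)["PipelineExecutionSummaries"]
        pipeline_execution_status = executions_response[0]["PipelineExecutionStatus"]
    except Exception as e:
        # Report the error and retry on the next poll.
        print("Retrying after error: {}".format(e))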

pprint(executions_response)

Succeeded
[{'PipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:079002598131:pipeline/gpt3-pipeline-1677353486/execution/vidmuif2vxzd',
  'PipelineExecutionDisplayName': 'execution-1677353494624',
  'PipelineExecutionStatus': 'Succeeded',
  'StartTime': datetime.datetime(2023, 2, 25, 19, 31, 34, 532000, tzinfo=tzlocal())}]
CPU times: user 14.3 ms, sys: 0 ns, total: 14.3 ms
Wall time: 246 ms
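
The polling pattern in the cell above can be factored into a small reusable helper with a timeout. This is only a sketch: `wait_for_status` and its parameters are hypothetical names, and the status source is passed in as a callable so the helper does not depend on any particular client.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    in_progress: Iterable[str] = ("Executing", "Stopping"),
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it leaves the in-progress states.

    Raises TimeoutError if the status is still in progress after
    roughly timeout_seconds of waiting.
    """
    in_progress = set(in_progress)
    waited = 0.0
    status = get_status()
    while status in in_progress:
        if waited >= timeout_seconds:
            raise TimeoutError("Still {!r} after {:.0f}s".format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status
```

In this notebook, `get_status` could be a lambda that calls `sm.list_pipeline_executions(PipelineName=pipeline_name)` and returns the `PipelineExecutionStatus` of the first summary.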


In [8]:
pipeline_execution_status = executions_response[0]["PipelineExecutionStatus"]
print(pipeline_execution_status)

Succeeded


In [9]:
pipeline_execution_arn = executions_response[0]["PipelineExecutionArn"]
print(pipeline_execution_arn)

arn:aws:sagemaker:us-east-1:079002598131:pipeline/gpt3-pipeline-1677353486/execution/vidmuif2vxzd


In [10]:
from pprint import pprint

steps = sm.list_pipeline_execution_steps(PipelineExecutionArn=pipeline_execution_arn)

pprint(steps)

{'PipelineExecutionSteps': [{'AttemptCount': 0,
                             'EndTime': datetime.datetime(2023, 2, 25, 19, 58, 46, 237000, tzinfo=tzlocal()),
                             'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-1:079002598131:model/pipelines-vidmuif2vxzd-createmodel-rxzbsqpj0k'}},
                             'StartTime': datetime.datetime(2023, 2, 25, 19, 58, 44, 405000, tzinfo=tzlocal()),
                             'StepName': 'CreateModel',
                             'StepStatus': 'Succeeded'},
                            {'AttemptCount': 0,
                             'EndTime': datetime.datetime(2023, 2, 25, 19, 58, 45, 706000, tzinfo=tzlocal()),
                             'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-1:079002598131:model-package/gpt3-reviews-1677353489/1'}},
                             'StartTime': datetime.datetime(2023, 2, 25, 19, 58, 44, 405000, tzinfo=tzlocal()),
                             'StepNam
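
`list_pipeline_execution_steps` returns results in pages. For pipelines with many steps, a sketch like the following could collect all of them by following `NextToken`. The function name `list_all_execution_steps` is an assumption for illustration, and `sm_client` stands for the `sm` client created earlier.

```python
from typing import Any, Dict, List


def list_all_execution_steps(sm_client: Any, pipeline_execution_arn: str) -> List[Dict[str, Any]]:
    """Collect every step of a pipeline execution, following NextToken pagination."""
    steps: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"PipelineExecutionArn": pipeline_execution_arn}
    while True:
        page = sm_client.list_pipeline_execution_steps(**kwargs)
        steps.extend(page.get("PipelineExecutionSteps", []))
        token = page.get("NextToken")
        if not token:
            return steps
        # Request the next page on the following iteration.
        kwargs["NextToken"] = token
```

Usage would look like `all_steps = list_all_execution_steps(sm, pipeline_execution_arn)`.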

# View Registered Model

In [11]:
for execution_step in steps["PipelineExecutionSteps"]:
    if execution_step["StepName"] == "RegisterModel-RegisterModel":
        model_package_arn = execution_step["Metadata"]["RegisterModel"]["Arn"]
        break
print(model_package_arn)

arn:aws:sagemaker:us-east-1:079002598131:model-package/gpt3-reviews-1677353489/1
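
If no step matches the expected name, the loop above leaves `model_package_arn` undefined and the `print` fails with a `NameError`. A slightly more defensive lookup might look like the sketch below. `find_step_arn` is a hypothetical helper, and the sample dictionary only mirrors the shape of the response shown earlier.

```python
from typing import Any, Dict


def find_step_arn(steps_response: Dict[str, Any], step_name: str, metadata_key: str) -> str:
    """Return the ARN recorded in a step's metadata, or raise a descriptive error."""
    all_steps = steps_response.get("PipelineExecutionSteps", [])
    for step in all_steps:
        if step.get("StepName") == step_name:
            return step["Metadata"][metadata_key]["Arn"]
    available = [s.get("StepName") for s in all_steps]
    raise LookupError("No step named {!r}; available steps: {}".format(step_name, available))


# Sample data shaped like the list_pipeline_execution_steps response.
sample_steps = {
    "PipelineExecutionSteps": [
        {"StepName": "CreateModel", "Metadata": {"Model": {"Arn": "arn:example:model/m"}}},
        {"StepName": "RegisterModel-RegisterModel", "Metadata": {"RegisterModel": {"Arn": "arn:example:model-package/g/1"}}},
    ]
}
print(find_step_arn(sample_steps, "RegisterModel-RegisterModel", "RegisterModel"))
```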


In [12]:
model_package_update_response = sm.update_model_package(
    ModelPackageArn=model_package_arn,
    ModelApprovalStatus="Approved",  # Other options are Rejected and PendingManualApproval
)
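
To confirm that the status change took effect, the model package can be described after the update. The following is a minimal sketch, assuming `sm_client` is the `sm` client from earlier; the wrapper name `set_approval_status` is made up for illustration.

```python
from typing import Any

VALID_APPROVAL_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def set_approval_status(sm_client: Any, model_package_arn: str, status: str = "Approved") -> str:
    """Update a model package's approval status and return the status read back."""
    if status not in VALID_APPROVAL_STATUSES:
        raise ValueError("status must be one of {}".format(sorted(VALID_APPROVAL_STATUSES)))
    sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus=status,
    )
    # describe_model_package accepts the ARN in its ModelPackageName parameter.
    described = sm_client.describe_model_package(ModelPackageName=model_package_arn)
    return described["ModelApprovalStatus"]
```

Calling `set_approval_status(sm, model_package_arn)` should return `"Approved"` if the update succeeded.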

# View Created Model

In [13]:
for execution_step in steps["PipelineExecutionSteps"]:
    if execution_step["StepName"] == "CreateModel":
        model_arn = execution_step["Metadata"]["Model"]["Arn"]
        break
print(model_arn)

pipeline_model_name = model_arn.split("/")[-1]
print(pipeline_model_name)

arn:aws:sagemaker:us-east-1:079002598131:model/pipelines-vidmuif2vxzd-createmodel-rxzbsqpj0k
pipelines-vidmuif2vxzd-createmodel-rxzbsqpj0k
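
The model name above is taken from the last path segment of the ARN. For reference, here is a sketch of splitting an ARN into its standard colon-separated fields. `Arn` and `parse_arn` are illustrative names, not an AWS SDK API.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource_type: str
    resource_id: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account:type/id."""
    prefix, partition, service, region, account_id, resource = arn.split(":", 5)
    if prefix != "arn":
        raise ValueError("Not an ARN: {!r}".format(arn))
    # Everything after the first "/" is treated as the resource id.
    resource_type, _, resource_id = resource.partition("/")
    return Arn(partition, service, region, account_id, resource_type, resource_id)


parsed = parse_arn(
    "arn:aws:sagemaker:us-east-1:079002598131:model/pipelines-vidmuif2vxzd-createmodel-rxzbsqpj0k"
)
print(parsed.resource_type, parsed.resource_id)
```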


# Create Model Endpoint from Model Registry
More details here:  https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-deploy.html


In [14]:
import time

timestamp = int(time.time())

model_from_registry_name = "gpt3-model-from-registry-{}".format(timestamp)
print("Model from registry name : {}".format(model_from_registry_name))

model_registry_package_container = {
    "ModelPackageName": model_package_arn,
}

Model from registry name : gpt3-model-from-registry-1677375784


In [15]:
from pprint import pprint

create_model_from_registry_response = sm.create_model(
    ModelName=model_from_registry_name, ExecutionRoleArn=role, PrimaryContainer=model_registry_package_container
)
pprint(create_model_from_registry_response)

{'ModelArn': 'arn:aws:sagemaker:us-east-1:079002598131:model/gpt3-model-from-registry-1677375784',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '97',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sun, 26 Feb 2023 01:43:08 GMT',
                                      'x-amzn-requestid': 'd6eb47b4-b990-4c43-940c-5a4827b9fefd'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'd6eb47b4-b990-4c43-940c-5a4827b9fefd',
                      'RetryAttempts': 0}}


In [16]:
model_from_registry_arn = create_model_from_registry_response["ModelArn"]
model_from_registry_arn

'arn:aws:sagemaker:us-east-1:079002598131:model/gpt3-model-from-registry-1677375784'

In [17]:
endpoint_config_name = "gpt3-model-from-registry-epc-{}".format(timestamp)
print(endpoint_config_name)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.m5.4xlarge",
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
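            # Note: this deploys the model created by the pipeline's CreateModel step.
            # Use `model_from_registry_name` instead to deploy the model created from the registry above.
            "ModelName": pipeline_model_name,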
            "VariantName": "AllTraffic",
        }
    ],
)

gpt3-model-from-registry-epc-1677375784


In [18]:
%store -d pipeline_endpoint_name

In [19]:
# Pick up the existing `pipeline_endpoint_name` if it was already created

%store -r pipeline_endpoint_name

no stored variable or alias pipeline_endpoint_name


In [20]:
# try:
#     print("Using existing Pipeline EndpointName: {}".format(pipeline_endpoint_name))
# except NameError:
timestamp = int(time.time())
pipeline_endpoint_name = "gpt3-model-from-registry-ep-{}".format(timestamp)
print("Created Pipeline EndpointName={}".format(pipeline_endpoint_name))

create_endpoint_response = sm.create_endpoint(
    EndpointName=pipeline_endpoint_name, EndpointConfigName=endpoint_config_name
)
print(create_endpoint_response["EndpointArn"])

Created Pipeline EndpointName=gpt3-model-from-registry-ep-1677375788
arn:aws:sagemaker:us-east-1:079002598131:endpoint/gpt3-model-from-registry-ep-1677375788


In [21]:
%store pipeline_endpoint_name

Stored 'pipeline_endpoint_name' (str)


In [22]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(
            region, pipeline_endpoint_name
        )
    )
)

# _Wait Until the Endpoint is Deployed_
_Note:  This will take a few minutes.  Please be patient._

In [23]:
%%time

waiter = sm.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=pipeline_endpoint_name)

CPU times: user 63.9 ms, sys: 18.9 ms, total: 82.8 ms
Wall time: 3min
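
The waiter's polling interval and attempt limit can be set through `WaiterConfig`, which bounds the total wait at roughly `Delay * MaxAttempts` seconds. The sketch below wraps this in a hypothetical helper, `wait_for_endpoint`; the default values are illustrative assumptions.

```python
from typing import Any


def wait_for_endpoint(
    sm_client: Any,
    endpoint_name: str,
    delay_seconds: int = 30,
    max_attempts: int = 60,
) -> str:
    """Block until the endpoint is in service, then return its reported status.

    The waiter gives up after about delay_seconds * max_attempts seconds.
    """
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
    )
    return sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```

With the notebook's variables, this would be called as `wait_for_endpoint(sm, pipeline_endpoint_name)`.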


# Generate a sample review

In [24]:
print(pipeline_endpoint_name)

gpt3-model-from-registry-ep-1677375788


In [25]:
import json

from sagemaker import Predictor

predictor = Predictor(
    endpoint_name=pipeline_endpoint_name,
    sagemaker_session=sess,
)

### Advanced features

***
This model also supports several advanced parameters at inference time:

* **max_length:** The model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** The model ensures that no sequence of `no_repeat_ngram_size` words is repeated in the output. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear, and a lower temperature favors high-probability words. As `temperature` approaches 0, decoding becomes greedy. If specified, it must be a positive float.
* **early_stopping:** If True, text generation finishes when all beam hypotheses reach the end-of-sentence token. If specified, it must be a boolean.
* **do_sample:** If True, the next word is sampled according to its likelihood. If specified, it must be a boolean.
* **top_k:** At each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** At each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **seed:** Fixes the random state for reproducibility. If specified, it must be an integer.

We may specify any subset of these parameters when invoking an endpoint. Next, we show an example of how to invoke the endpoint with these arguments.

***
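
The constraints listed above can be checked on the client before a request is sent. The following is a partial sketch that validates only a few of them. `build_payload` is a hypothetical helper, and the endpoint's own validation remains authoritative.

```python
import json
from typing import Any

ALLOWED_PARAMS = {
    "max_length", "num_return_sequences", "num_beams", "no_repeat_ngram_size",
    "temperature", "early_stopping", "do_sample", "top_k", "top_p", "seed",
}


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def build_payload(text_inputs: str, **params: Any) -> str:
    """Build the JSON request body, checking a subset of the documented constraints."""
    unknown = sorted(set(params) - ALLOWED_PARAMS)
    if unknown:
        raise ValueError("Unsupported parameters: {}".format(unknown))
    for name in ("max_length", "num_return_sequences", "num_beams", "top_k"):
        if name in params and not _is_positive_int(params[name]):
            raise ValueError("{} must be a positive integer".format(name))
    if "top_p" in params and not 0 <= params["top_p"] <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if "temperature" in params and not params["temperature"] > 0:
        raise ValueError("temperature must be positive")
    beams = params.get("num_beams")
    sequences = params.get("num_return_sequences")
    if beams is not None and sequences is not None and beams < sequences:
        raise ValueError("num_beams must be >= num_return_sequences")
    return json.dumps({"text_inputs": text_inputs, **params})


print(build_payload("Write a review for Norton Antivirus", max_length=100, top_k=50, top_p=0.9, do_sample=True))
```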

In [None]:
import json

# Build the JSON request body with json.dumps rather than a hand-written string.
prompt = json.dumps(
    {"text_inputs": "Write a review for Norton Antivirus", "max_length": 100, "top_k": 50, "top_p": 0.9, "do_sample": True}
)

response = predictor.predict(
    prompt,
    initial_args={
        "ContentType": "application/json",
        "Accept": "application/json",
    },
)

print("Response: {}".format(response.decode("utf-8")))

In [63]:
import json

# Build the JSON request body with json.dumps rather than a hand-written string.
prompt = json.dumps(
    {"text_inputs": "Write a review for Turbo Tax", "max_length": 100, "top_k": 50, "top_p": 0.9, "do_sample": True}
)

response = predictor.predict(
    prompt,
    initial_args={
        "ContentType": "application/json",
        "Accept": "application/json",
    },
)

print("Response: {}".format(response.decode("utf-8")))

Response: {"generated_texts": ["Write a review for Turbo Tax Software and I received a copy of the free trial.  I was pleased with how much more comprehensive the software was.  I wanted to use it more on my website and the site is perfect, and the Turbo Tax Software is really easy to use.<br /><br />Quick and easy.  Easy to use, easy to download.  Great price!<br /><br />This product worked very well, without any problems.  I will be using it again.I bought this from Amazon and it"]}
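
The response body is JSON with a `generated_texts` list, and each generated text begins with the prompt and may contain `<br />` markup from the training reviews. A small post-processing sketch is shown below. `extract_generated_texts` is a hypothetical helper, and the cleanup rules are assumptions about what a reader might want rather than anything the endpoint requires.

```python
import json
from typing import List


def extract_generated_texts(response_body: bytes, prompt: str = "") -> List[str]:
    """Decode the endpoint response and tidy each generated text.

    Strips the echoed prompt (if given) and converts <br /> tags to newlines.
    """
    texts = json.loads(response_body.decode("utf-8"))["generated_texts"]
    cleaned = []
    for text in texts:
        if prompt and text.startswith(prompt):
            text = text[len(prompt):]
        cleaned.append(text.replace("<br />", "\n").strip())
    return cleaned
```

With the cell above, this could be used as `extract_generated_texts(response, prompt="Write a review for Turbo Tax")`.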


# Release Resources

In [27]:
# sm.delete_endpoint(
#      EndpointName=pipeline_endpoint_name
# )
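
Deleting the endpoint stops the hosting instances, but the endpoint configuration and model objects remain. A fuller cleanup could look like the sketch below. `delete_endpoint_resources` is a hypothetical helper, and `sm_client` stands for the `sm` client used throughout this notebook.

```python
from typing import Any, List, Optional


def delete_endpoint_resources(
    sm_client: Any,
    endpoint_name: str,
    endpoint_config_name: Optional[str] = None,
    model_name: Optional[str] = None,
) -> List[str]:
    """Delete an endpoint and, optionally, its endpoint config and model.

    Returns the names of the resources for which a delete call was issued.
    """
    deleted = []
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    deleted.append(endpoint_name)
    if endpoint_config_name:
        sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
        deleted.append(endpoint_config_name)
    if model_name:
        sm_client.delete_model(ModelName=model_name)
        deleted.append(model_name)
    return deleted
```

For this notebook, the call would be `delete_endpoint_resources(sm, pipeline_endpoint_name, endpoint_config_name, model_from_registry_name)`.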

In [28]:
# %%html

# <p><b>Shutting down your kernel for this notebook to release resources.</b></p>
# <button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>

# <script>
# try {
#     els = document.getElementsByClassName("sm-command-button");
#     els[0].click();
# }
# catch(err) {
#     // NoOp
# }
# </script>