# Deploy the Model

The pipeline run created a Model Package version in the specified Model Package Group. Note that the model was registered with the approval status `PendingManualApproval`.

With SageMaker Pipelines, data scientists can register a model with an approval status of either `Approved` or `PendingManualApproval` as part of the CI/CD workflow.

We can also approve the model using the SageMaker Studio UI or programmatically as shown below.
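
For the programmatic route, a minimal sketch using the boto3 SageMaker client might look like the following. The helper names `latest_model_package_arn` and `approve_model_package` are illustrative (they are not part of any library), the group name in the usage comment is a placeholder, and the sketch assumes the caller has permission to list and update model packages.

```python
def latest_model_package_arn(sm_client, model_package_group_name):
    """Return the ARN of the most recently created package in the group."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    if not summaries:
        raise ValueError(f"No model packages found in {model_package_group_name}")
    return summaries[0]["ModelPackageArn"]


def approve_model_package(sm_client, model_package_arn):
    """Set the approval status to Approved and return the status read back."""
    sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
    )
    # Read the status back to confirm that the update took effect.
    description = sm_client.describe_model_package(ModelPackageName=model_package_arn)
    return description["ModelApprovalStatus"]


# Hypothetical usage (requires AWS credentials; the group name is a placeholder):
# import boto3
# sm = boto3.client("sagemaker")
# arn = latest_model_package_arn(sm, "my-model-package-group")
# print(approve_model_package(sm, arn))
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub client before pointing them at a real account.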

![Pipeline](img/generative_ai_pipeline_rlhf_plus.png)

In [None]:
import psutil

notebook_memory = psutil.virtual_memory()
print(notebook_memory)

# Record whether the instance meets the 32 GB memory requirement
correct_instance_type = notebook_memory.total >= 32 * 1000 * 1000 * 1000

if not correct_instance_type:
    print('*******************************************')
    print('YOU ARE NOT USING THE CORRECT INSTANCE TYPE')
    print('PLEASE CHANGE INSTANCE TYPE TO  m5.2xlarge ')
    print('*******************************************')

In [None]:
import logging
import os

import boto3
import pandas as pd
import sagemaker
from botocore.exceptions import ClientError

sess = sagemaker.Session()
bucket = sess.default_bucket()
region = boto3.Session().region_name

import botocore.config

config = botocore.config.Config(
    user_agent_extra='dsoaws/2.0'
)

sm = boto3.Session().client(service_name="sagemaker", 
                            region_name=region,
                            config=config)

# Retrieve model endpoint


In [None]:
%store -r pipeline_endpoint_name

In [None]:
try:
    pipeline_endpoint_name
except NameError:
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] Please run previous notebooks before you continue.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

In [None]:
print(pipeline_endpoint_name)

In [None]:
from IPython.display import display, HTML

# Render a link to the endpoint's page in the SageMaker console
display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker HTTPS Endpoint</a></b>'.format(
            region, pipeline_endpoint_name
        )
    )
)

# _Wait Until the Endpoint is Deployed_
_Note:  This will take a few minutes.  Please be patient._

In [None]:
%%time

waiter = sm.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=pipeline_endpoint_name)
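
The boto3 waiter also accepts a `WaiterConfig` argument (with `Delay` and `MaxAttempts` keys) to tune how long it polls. When more visibility into the deployment is useful, a hand-written polling loop is one alternative. The sketch below is an illustration under assumptions: `wait_for_endpoint` is a hypothetical helper, the default delay and attempt count are arbitrary, and only the `Failed` status is treated as terminal.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, delay_seconds=30, max_attempts=60,
                      sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError if
    it is still not InService after max_attempts polls.
    """
    for attempt in range(1, max_attempts + 1):
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        print(f"Attempt {attempt}: {status}")
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        sleep(delay_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_attempts} attempts")


# Hypothetical usage with the client and endpoint name from this notebook:
# wait_for_endpoint(sm, pipeline_endpoint_name)
```

The `sleep` parameter exists only so the loop can be tested without real delays.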

# _Wait Until the Endpoint ^^ Above ^^ is Deployed_

# Zero Shot Inference

In [None]:
import json
from sagemaker import Predictor

zero_shot_prompt = """Summarize the following conversation.

#Person1#: Tom, I've got good news for you.
#Person2#: What is it?
#Person1#: Haven't you heard that your novel has won The Nobel Prize?
#Person2#: Really? I can't believe it. It's like a dream come true. I never expected that I would win The Nobel Prize!
#Person1#: You did a good job. I'm extremely proud of you.
#Person2#: Thanks for the compliment.
#Person1#: You certainly deserve it. Let's celebrate!

Summary:"""
predictor = Predictor(
    endpoint_name=pipeline_endpoint_name,
    sagemaker_session=sess,
)

# Send the prompt as UTF-8 text and request a JSON response
response = predictor.predict(
    zero_shot_prompt.encode("utf-8"),
    initial_args={
        "ContentType": "application/x-text",
        "Accept": "application/json",
    },
)
response_json = json.loads(response.decode("utf-8"))
print(response_json)
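
The same request can also be made without the SageMaker Python SDK, through the lower-level `sagemaker-runtime` client in boto3. The following is a sketch under assumptions: `invoke_text_endpoint` is a hypothetical helper, and it assumes the endpoint accepts `application/x-text` input and returns JSON, as in the cell above.

```python
import json


def invoke_text_endpoint(runtime_client, endpoint_name, prompt):
    """Send a plain-text prompt to the endpoint and return the parsed JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-text",
        Accept="application/json",
        Body=prompt.encode("utf-8"),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read().decode("utf-8"))


# Hypothetical usage (requires AWS credentials and a deployed endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(invoke_text_endpoint(runtime, pipeline_endpoint_name, zero_shot_prompt))
```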

# Make many predictions and find the range of labels returned from this probabilistic (non-deterministic) generative model

## _THIS MAY TAKE A FEW MINUTES.  PLEASE BE PATIENT._

In [None]:
# set_of_responses_for_prompt = {}
# set_of_responses_for_prompt[zero_shot_prompt] = set()

# for i in range(100):
#     response = predictor.predict(zero_shot_prompt,
#             {
#                 "ContentType": "application/x-text",
#                 "Accept": "application/json",
#             },
#     )

#     response_json = json.loads(response.decode('utf-8'))
#     response_label = response_json['generated_text']
# #    print(response_label)
# #    print('** EXPECTED RESPONSE **: {}'.format(prompt2['label']))
    
#     set_of_responses_for_prompt[zero_shot_prompt].add(response_label)

# print('Total responses from the model for prompt: {}'.format(zero_shot_prompt))
# print(set_of_responses_for_prompt[zero_shot_prompt])
# print('\n')

In [None]:
# %store set_of_responses_for_prompt

# Advanced inference parameters

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output; a lower temperature favors high-probability words. As `temperature` approaches 0, decoding becomes greedy. If specified, it must be a positive float.
* **early_stopping:** If True, text generation finishes when all beam hypotheses reach the end-of-sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word according to its likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **seed:** Fix the randomized state for reproducibility. If specified, it must be an integer.
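
To make the sampling parameters above more concrete, here is a toy sketch of how `temperature`, `top_k`, `top_p`, and `seed` could interact when picking one next token from a list of scores. It is a simplified illustration in pure Python, not the model's actual decoding code; the function names and the example scores are made up for this sketch.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - largest) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def _renormalize(pairs):
    total = sum(p for _, p in pairs)
    return [(index, p / total) for index, p in pairs]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Keep the top_k most likely tokens, then the smallest prefix of those
    whose cumulative probability reaches top_p. Returns {index: probability}."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = _renormalize(ranked[:top_k])
    if top_p is not None:
        kept, cumulative = [], 0.0
        for index, p in ranked:
            kept.append((index, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    return dict(_renormalize(ranked))


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Sample one token index; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
print(softmax_with_temperature(toy_logits, temperature=1.0))
print(softmax_with_temperature(toy_logits, temperature=0.1))
print(filter_top_k_top_p(softmax_with_temperature(toy_logits), top_k=2))
print(sample_next_token(toy_logits, top_k=3, top_p=0.9, seed=42))
```

With `top_k=1` the sketch always returns the highest-scoring token, which mirrors the greedy behavior described for very low temperatures.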

We may specify any subset of the parameters above when invoking an endpoint. Next, we show an example of how to invoke the endpoint with these arguments.

***

In [None]:
# import json

# payload = {
#     "text_inputs": zero_shot_prompt,
#     "num_return_sequences": 1,
#     "top_k": 50,
#     "top_p": 0.9,
#     "do_sample": True,
# }


# def query_endpoint_with_json_payload(predictor, payload):
#     """Query the model predictor with json payload."""

#     encoded_payload = json.dumps(payload).encode("utf-8")

#     query_response = predictor.predict(
#         encoded_payload,
#         {
#             "ContentType": "application/json",
#             "Accept": "application/json",
#         },
#     )
#     return query_response


# def parse_response_multiple_texts(query_response):
#     """Parse response and return the generated texts."""

#     model_predictions = json.loads(query_response)
#     generated_texts = model_predictions["generated_texts"]
#     return generated_texts


# query_response = query_endpoint_with_json_payload(predictor, payload)
# generated_texts = parse_response_multiple_texts(query_response)

# newline, bold, unbold = "\n", "\033[1m", "\033[0m"
# print(f"Input text: {zero_shot_prompt}{newline}" f"Generated text: {bold}{generated_texts}{unbold}{newline}")

# print('** EXPECTED RESPONSE **: {}'.format(prompt0['label']))

# Release Resources

In [None]:
# sm.delete_endpoint(
#      EndpointName=pipeline_endpoint_name
# )
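
Deleting the endpoint alone leaves its endpoint configuration and model behind. A fuller cleanup might look like the sketch below, which assumes the endpoint configuration and models are not shared with other endpoints; `delete_endpoint_resources` is a hypothetical helper name.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint together with its endpoint config and models.

    Returns a dict naming everything that was deleted.
    """
    # Look up the dependent resources before deleting the endpoint itself.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if "ModelName" in variant
    ]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }


# Hypothetical usage with the client and endpoint name from this notebook:
# print(delete_endpoint_resources(sm, pipeline_endpoint_name))
```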