###### 0

<div style="background-image: linear-gradient(145deg, rgba(35, 47, 62, 1) 0%, rgba(0, 49, 129, 1) 40%, rgba(32, 116, 213, 1) 60%, rgba(244, 110, 197, 1) 85%, rgba(255, 173, 151, 1) 100%); padding: 1rem 2rem; width: 95%"><img style="width: 60%;" src="images/MLU_logo.png"></div>

# MLU Operationalizing Generative AI with LLMOps 
# <a name="p0">Lab 1: Interacting with the LLM-powered application</a>

In this lab you will familiarize yourself with the [GenAI with Bedrock](https://create.hub.amazon.dev/source-applications/genai-python-lambda) service that you have created via [BuilderHub Create](https://create.hub.amazon.dev). You will send requests with various questions to the API Gateway / AWS Lambda service and get answers back to gain an initial understanding of what the service is capable of. 

Next, you will experiment with Amazon Bedrock directly from this notebook, which you can use as your local playground. This will allow you to test different models and inference settings available in Bedrock, try out prompts and prompting techniques, and obtain responses from the system. In a real development scenario, **you can use notebooks for rapid and easy iterations** prior to deploying changes to the main package. 

Finally, you will implement changes to the main package and deploy them via the CDK CLI of your CDK package. This will update the code in the AWS Lambda function, allowing you to modify the behavior of the live system. Once your changes have been deployed, you will be able to check that your modifications have indeed updated the LLM-powered service. 



## Table of Contents
1. [Familiarize yourself with the deployed service](#1)
2. [Explore LLM capabilities via prompting in Amazon Bedrock](#2)
3. [Implement changes to the system and deploy with CDK](#3)

<br/>
<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        This notebook assumes that you have already completed the following tasks, as described in the Lab Setup Instructions document:<br/>
        <ul>
            <li>Created a <a style="color: lightskyblue;" href="https://create.hub.amazon.dev/source-applications/genai-python-lambda">GenAI with Bedrock</a> application with <code style="color: lightcoral;">{Alias}MLUCourseLLMOps</code> as Clone Name.</li><br/>
            <li>Enabled model access to Claude 3 Haiku and Sonnet, and Mistral 7B Instruct on Amazon Bedrock in your AWS account.</li><br/>
            <li>Created a workspace on your development environment and pulled all packages included in the application.</li><br/>
            <li>Built your Experiment package in your development environment with <code style="color: lightcoral;">brazil-build release</code>.</li>
        </ul>
    </span>
</div>


### Select the Jupyter Kernel

In order to run this notebook, you need to use a Kernel. We will use the Kernel from the Python virtual environment provided with this package. 

**In VSCode**
  - Click on "Select Kernel" on the top right of this window.
  - Click on "Python Environments" on the text input bar at the top of this window.
  - Select the `.venv (Python 3.12.x)` Virtual env, located in `{WorkspaceRoot}/src/{Alias}MLUCourseLLMOpsExperiment/.venv/bin/python`
  - Double check that the Kernel shown on the top right of this window reads `.venv (Python 3.12.x)`


### A note about your development environment

One of the key goals of this lab is to give you an experience as similar as possible to a real LLM operationalization process. 

Besides the data science environment based on Jupyter notebooks, where you can experiment with LLM models, you will also be required to run commands in a terminal window.

Every time you see an instruction to open a terminal and run a command, feel free to choose the terminal you are most comfortable with: 
+ A Terminal window with SSH to your Cloud Desktop, or
+ The VS Code Terminal window. In this case, make sure to open the terminal in the same window that is connected to your Cloud Desktop environment.

***

###### 1
## <a>Part 1 - Familiarize yourself with the deployed service</a>
([Go to top](#0))

Once your pipeline has deployed successfully to the alpha stage, you have an API endpoint that can be used to send prompts to a Large Language Model (LLM) hosted on Amazon Bedrock. This application uses [Retrieval Augmented Generation (RAG)](https://aws.amazon.com/what-is/retrieval-augmented-generation/), a technique that pulls from a vector store to retrieve relevant documents. The retrieved information is inserted into the context that's sent to the model via the prompt, which enhances the quality of the model's response. 

During application creation, we have stored documentation about AWS Lambda in Amazon Kendra, which turns the initial deployed service into an expert on questions related to AWS Lambda.

The diagram below clarifies the components of the application's architecture:

![System_Architecture Image depicting the steps 1-8 described in the text below.](images/BasicGenAI-Tutorial.drawio.png)

1. Client makes an **API request to an API Gateway endpoint**. The request is signed with SigV4. 
2. API Gateway receives the request and invokes the configured **AWS Lambda function** with the request payload.
3. Lambda function handler invokes **Amazon Kendra API** with the input query. 
4. Amazon Kendra returns the list of **relevant document chunks**.
5. Lambda function handler builds the **prompt with the input query and the context** (i.e. relevant document chunks) and invokes Bedrock API. 
6. Bedrock API performs the **model inference** on the prompt and returns the results to the function handler.
7. The function builds the **response** structure and returns to API Gateway. 
8. API Gateway returns the **API response to the client**.

Independently from this workflow, Amazon Kendra asynchronously imports the knowledge base from a designated S3 location to its index. 
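Steps 3 to 7 above can be sketched as a plain Python function. This is a minimal illustration and not the actual contents of `handler.py`: the names `answer_question`, `retrieve` and `invoke_model`, as well as the prompt wording, are hypothetical stand-ins for the Kendra and Bedrock calls.

```python
from typing import Callable, List


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    invoke_model: Callable[[str], str],
) -> dict:
    """Hypothetical sketch of steps 3-7: retrieve, build the prompt, invoke, respond."""
    # Steps 3-4: ask the retriever (Amazon Kendra in this lab) for relevant chunks
    chunks = retrieve(question)
    context = "\n".join(chunks) if chunks else "No relevant documents found"
    # Step 5: insert the retrieved context and the question into the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    # Step 6: run model inference (Amazon Bedrock in this lab)
    answer = invoke_model(prompt)
    # Step 7: build the response structure returned to API Gateway
    return {"question": question, "answer": answer}


# Usage with stand-in callables, so no AWS access is needed
def fake_retrieve(query: str) -> List[str]:
    return ["Ephemeral storage can be set between 512 MB and 10,240 MB."]


def fake_model(prompt: str) -> str:
    return f"(model received a prompt of {len(prompt)} characters)"


result = answer_question("How large can /tmp be?", fake_retrieve, fake_model)
print(result)
```

Passing the retriever and the model in as callables keeps the flow easy to test without touching the live services.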

### Getting credentials to connect to the AWS account

Let's test how the deployed application responds to your questions by sending requests to the API running in your alpha AWS account. The API is protected by IAM authentication, which means that we require a credential for an IAM role to sign the request. 

You will use the `MLU-LLMOps-Burner` profile that you have created with [Ada (Authorization and Auditing for Networking)](https://w.amazon.com/bin/view/Public/Ada) during the Lab Setup process. 

Take a look at your `~/.config/ada/profile.json` file, which must contain a profile named `MLU-LLMOps-Burner`:

In [1]:
!cat $HOME/.config/ada/profile.json

{
  "Profiles": [
    {
      "Provider": "conduit",
      "Account": "654654505854",
      "Role": "IibsAdminAccess-DO-NOT-DELETE",
      "Partition": "aws",
      "Profile": "aqd"
    },
    {
      "Provider": "conduit",
      "Account": "961341554577",
      "Region": "us-west-2",
      "Role": "IibsAdminAccess-DO-NOT-DELETE",
      "Partition": "aws",
      "Profile": "MLU-LLMOps-Burner"
    }
  ]
}

If the output above does not contain an entry for a `conduit` profile named `MLU-LLMOps-Burner`, please double check that you have properly created a `MLU-LLMOps-Burner` profile with `ada profile add`. 

Go back to the Lab Setup instructions, visit the `Bootstrap the AWS Burner Account` section and review the steps. 

Let's double check that the created profile can correctly connect to your AWS Burner account. 

**Run the cell below. If you receive a response containing `UserId`, `Account` and `Arn` corresponding to your AWS Burner account, your AWS account access has been properly set up.**

In [2]:
!aws --profile=MLU-LLMOps-Burner sts get-caller-identity

{
    "UserId": "AROA57VDL76IRL6OEPC4F:koachang@MIDWAY.AMAZON.COM",
    "Account": "961341554577",
    "Arn": "arn:aws:sts::961341554577:assumed-role/IibsAdminAccess-DO-NOT-DELETE/koachang@MIDWAY.AMAZON.COM"
}
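The same check can be done from Python. The sketch below assumes you pass in an STS client; the helper names `caller_account` and `is_expected_account` are hypothetical, while `get_caller_identity` is the boto3 call behind the CLI command above.

```python
def caller_account(sts_client) -> str:
    """Return the account ID that the client's credentials resolve to."""
    identity = sts_client.get_caller_identity()
    return identity["Account"]


def is_expected_account(sts_client, expected_account: str) -> bool:
    """Compare the resolved account ID with the one you expect to be using."""
    return caller_account(sts_client) == expected_account


# In the notebook this could be used as follows (requires valid credentials):
#   import boto3
#   sts = boto3.Session(profile_name="MLU-LLMOps-Burner").client("sts")
#   is_expected_account(sts, "<your Burner account ID>")
```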


**Boto3 version**

Before we proceed with the rest of the notebook, let's check the `boto3` version that's available through your virtual environment. The GenAI application runs with `"boto3 >=1.34.143"`. 

In [1]:
# Import boto3 and check version
import boto3 

boto3.__version__

'1.34.153'

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        If the <code style="color: lightcoral;">boto3</code> version displayed above is equal to or higher than <code style="color: lightcoral;">1.34.143</code>, you can ignore this box. If it's lower than <code style="color: lightcoral;">1.34.143</code>, you should follow the steps below to update it:<br/>
        <ul>
            <li>Open file <code style="color: lightcoral;">pyproject.toml</code> located inside your Experiment package.</li><br/>
            <li>Edit section <code style="color: lightcoral;">[project.optional-dependencies]</code> and replace the <code style="color: lightcoral;">boto3</code> line with <code style="color: lightcoral;">"boto3 >=1.34.143"</code>.</li><br/>
            <li>Delete file <code style="color: lightcoral;">requirements.txt</code> from the root folder of the Experiment package by running <code style="color: lightcoral;">rm requirements.txt</code>. Don't worry, this file will be regenerated later.</li><br/>
            <li>From a Terminal in your Cloud Desktop, navigate to your Experiment package and activate the virtual environment by running <code style="color: lightcoral;">source .venv/bin/activate</code>.</li><br/>
            <li>From that same Terminal in your Cloud Desktop, install <code style="color: lightcoral;">pip-tools</code> by running <code style="color: lightcoral;">python -m pip install pip-tools</code>. Then regenerate the lock file <code style="color: lightcoral;">requirements.txt</code> by running <code style="color: lightcoral;">pip-compile --extra=dev</code>. This step might take a few minutes to complete.</li><br/>
            <li>Build the Experiment package again by running <code style="color: lightcoral;">brazil-build release</code>.</li><br/>
            <li>Restart the Kernel of this notebook and re-run all cells up to this point. The <code style="color: lightcoral;">boto3</code> version should now be <code style="color: lightcoral;">>=1.34.143</code>. You can then continue running the cells below.</li>
        </ul>
    </span>
</div>
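If you prefer to check the version programmatically rather than by eye, a small sketch like the one below can help. The helper names are hypothetical. Versions are compared as tuples of integers, because a plain string comparison would wrongly rank `"1.34.99"` above `"1.34.143"`.

```python
import re

MINIMUM_BOTO3 = (1, 34, 143)


def version_tuple(version: str) -> tuple:
    """Turn '1.34.153' into (1, 34, 153); non-numeric suffixes are ignored."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def meets_minimum(version: str, minimum: tuple = MINIMUM_BOTO3) -> bool:
    """Return True when the version is at least the required minimum."""
    return version_tuple(version) >= minimum


# In the notebook: meets_minimum(boto3.__version__)
print(meets_minimum("1.34.153"))
```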


### Find out the URL of your deployed service

To make a request to the deployed GenAI application, you first need to retrieve the live API endpoint URL. We will use [AWS Signature Version 4](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html) to authenticate the request. The Python package [requests-auth-aws-sigv4](https://pypi.org/project/requests-auth-aws-sigv4/) provides an authentication class that adds AWS Signature Version 4 authentication information to each request. 

In [2]:
# Install and import the library that supports AWS SigV4 requests
!pip3 install -q requests-auth-aws-sigv4

import json
import requests
from requests_auth_aws_sigv4 import AWSSigV4
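You do not need to sign requests by hand in this lab, since `AWSSigV4` does it for you. For intuition only, the sketch below shows one step of Signature Version 4: deriving the signing key by chaining HMAC-SHA256 over the date, region and service. The secret key and date used here are placeholders, and the full protocol has further steps (canonical request, string to sign) that are not shown.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of a text message under a binary key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key for one day, region and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder values: never hard-code real credentials
key = derive_signing_key("EXAMPLE-SECRET-KEY", "20240101", "us-west-2", "execute-api")
print(len(key), "byte signing key")
```

Because the key is scoped to a date, region and service, a signature created for `cloudformation` cannot be reused for `execute-api`.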

To retrieve the API endpoint URL, you need to look into the CloudFormation Stacks that BuilderHub Create has created when it cloned the GenAI application. Take a quick look at your [CloudFormation stacks](https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks?filteringText=&filteringStatus=active&viewNested=true) in your AWS account, and you will find several entries named after your cloned package. 

In the next cell you set variable `MAIN_PACKAGE_NAME` to the name of your cloned GenAI application, using the naming convention for this course. Then you make a request to the CloudFormation service to retrieve information about the stack that contains the API endpoint. That is the important piece of information that you will need to make requests in the next section.

In [3]:
# Start boto3 session using the credentials from your AWS Burner account
session = boto3.Session(profile_name="MLU-LLMOps-Burner")

# Retrieve your alias from your $USER variable and assemble the name of your cloned application
# Notice this will only work if you have adhered to the previously-specified naming convention for the cloned application
alias = %env USER
MAIN_PACKAGE_NAME = f"{alias.capitalize()}MLUCourseLLMOps"

# Set up authentication
aws_auth = AWSSigV4("cloudformation", region="us-west-2", session=session)

# URL to request information about a particular stack in CloudFormation
url = f"https://cloudformation.us-west-2.amazonaws.com?Action=DescribeStacks&StackName={MAIN_PACKAGE_NAME}-Service-alpha"

# Request response in JSON format
headers = {"Accept": "application/json"}

# Send request to CloudFormation
r = requests.request("GET", url, auth=aws_auth, headers=headers)

# Locate the relevant key in the response that contains the API URL
outputs = r.json()["DescribeStacksResponse"]["DescribeStacksResult"]["Stacks"][0]["Outputs"]
api_endpoint = [output for output in outputs if output["ExportName"]==f"{MAIN_PACKAGE_NAME}-ApiUrl"][0]["OutputValue"]
display(api_endpoint)

'https://denx1se8e0.execute-api.us-west-2.amazonaws.com/live/'

<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; margin: 15px;" src="images/MLU_question.png" alt="MLU question" width="100" height="100"/>
    <span style="padding: 20px; align: left;">
        <p><b>Troubleshooting</b></p>
        <p>If the cell above throws an error, double check that your AWS credentials are valid. Remember that you're accessing AWS services via your AWS Burner account, and the credentials are managed by the profile <code>MLU-LLMOps-Burner</code> that was created with <code>ada</code>.</p>
        <br/>
    </span>
</div>
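When the lookup of the API URL fails, the cause is not always credentials: a stack output without an `ExportName`, or a cloned application with a different name, raises a `KeyError` or `IndexError` in the list comprehension. The sketch below is one possible, more defensive lookup; the helper name `find_stack_output` is hypothetical and the sample data is made up.

```python
from typing import Dict, List


def find_stack_output(outputs: List[Dict[str, str]], export_name: str) -> str:
    """Return the OutputValue whose ExportName matches, or raise a descriptive error."""
    for output in outputs:
        # .get() tolerates outputs that were not exported by the stack
        if output.get("ExportName") == export_name:
            return output["OutputValue"]
    available = sorted(o.get("ExportName", "<not exported>") for o in outputs)
    raise KeyError(f"No output exported as {export_name!r}; found: {available}")


# Made-up sample shaped like the "Outputs" list of a DescribeStacks response
sample_outputs = [
    {"OutputKey": "Internal", "OutputValue": "ignored"},
    {"OutputKey": "ApiUrl", "OutputValue": "https://example.invalid/live/", "ExportName": "DemoApp-ApiUrl"},
]
print(find_stack_output(sample_outputs, "DemoApp-ApiUrl"))
```

In the notebook, this could be called as `find_stack_output(outputs, f"{MAIN_PACKAGE_NAME}-ApiUrl")`.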

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        The same information about the API URL can be retrieved from the command line with <code style="color: lightcoral;">aws</code> in a Cloud Desktop terminal window.
    <br/><br/>
        Try from the Terminal in your Cloud Desktop! 
    <br/><br/>
        Execute the cell below to get the exact commands that you can run in the command line to access the value of your API endpoint URL.
    </span>
    <br/>
</div>

In [4]:
from IPython.display import Markdown, display

display(Markdown("### Copy this command and run it on your Cloud Desktop terminal to output your API endpoint"))

print(f"aws cloudformation describe-stacks \\\n --profile MLU-LLMOps-Burner \\\n --stack-name {MAIN_PACKAGE_NAME}-Service-alpha \\\n --query 'Stacks[0].Outputs[?ExportName==`{MAIN_PACKAGE_NAME}-ApiUrl`].OutputValue' \\\n --output text | cat ")

### Copy this command and run it on your Cloud Desktop terminal to output your API endpoint

aws cloudformation describe-stacks \
 --profile MLU-LLMOps-Burner \
 --stack-name KoachangMLUCourseLLMOps-Service-alpha \
 --query 'Stacks[0].Outputs[?ExportName==`KoachangMLUCourseLLMOps-ApiUrl`].OutputValue' \
 --output text | cat 


### Make a first request to the deployed API

You can now interact with your GenAI service and send questions to the LLM using the Python library `awscurl`. 

In [5]:
# Import library to make API requests to the deployed service
from awscurl.awscurl import make_request

Let's send a first request to the system, along with a question about AWS Lambda, and observe how the LLM is able to return a correct answer.

In [6]:
question = "What is the maximum size of the ephemeral storage allowed by AWS Lambda?"

credentials = session.get_credentials()

response = make_request(
    uri=api_endpoint,
    headers=headers,
    method="POST",
    service="execute-api",
    data=json.dumps(
        {"question": question}
    ),
    region="us-west-2",
    access_key=credentials.access_key,
    secret_key=credentials.secret_key,
    security_token=credentials.token,
    data_binary=False,
)

display(Markdown(f"**{question}**"))
Markdown(response.json().get("answer", response))

**What is the maximum size of the ephemeral storage allowed by AWS Lambda?**

According to the context provided, the maximum size of the ephemeral storage (the /tmp directory) allowed by AWS Lambda is 10,240 MB (10 GB). The context states that you can configure the size of a function's /tmp directory in the Lambda console, and set a whole number value between 512 MB and 10,240 MB, in 1-MB increments.

Compare the LLM response with the actual answer from the [AWS Lambda documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-lambda-function-ephemeralstorage.html).

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        The same response can be retrieved with <code style="color: lightcoral;">awscurl</code> from a Cloud Desktop terminal window.
    <br/><br/>
    Try from the Terminal in your Cloud Desktop! 
    <br/><br/>
    You can execute the next cell to get the exact command that you can run in the command line to send a request to the live system. 
    </span>
</div>

In [7]:
display(Markdown("### Copy these commands and run them on your Cloud Desktop terminal to send a question to the system"))

print("aws_json=$(ada credentials print --profile MLU-LLMOps-Burner)\nexport AWS_ACCESS_KEY_ID=$(echo $aws_json | jq -r .AccessKeyId) AWS_SECRET_ACCESS_KEY=$(echo $aws_json | jq -r .SecretAccessKey) AWS_SESSION_TOKEN=$(echo $aws_json | jq -r .SessionToken)")

question_json = json.dumps({"question": "What is the maximum size of the ephemeral storage allowed by AWS Lambda?"})

print(f"awscurl {api_endpoint} \\\n --region us-west-2 \\\n --service execute-api \\\n -X POST \\\n --access_key $AWS_ACCESS_KEY_ID \\\n --secret_key $AWS_SECRET_ACCESS_KEY \\\n --security_token $AWS_SESSION_TOKEN \\\n -d '{question_json}' ")

### Copy these commands and run them on your Cloud Desktop terminal to send a question to the system

aws_json=$(ada credentials print --profile MLU-LLMOps-Burner)
export AWS_ACCESS_KEY_ID=$(echo $aws_json | jq -r .AccessKeyId) AWS_SECRET_ACCESS_KEY=$(echo $aws_json | jq -r .SecretAccessKey) AWS_SESSION_TOKEN=$(echo $aws_json | jq -r .SessionToken)
awscurl https://denx1se8e0.execute-api.us-west-2.amazonaws.com/live/ \
 --region us-west-2 \
 --service execute-api \
 -X POST \
 --access_key $AWS_ACCESS_KEY_ID \
 --secret_key $AWS_SECRET_ACCESS_KEY \
 --security_token $AWS_SESSION_TOKEN \
 -d '{"question": "What is the maximum size of the ephemeral storage allowed by AWS Lambda?"}' 


Next, let's define a function to call the live GenAI service with any question. This function will return a generic error string if the request comes back from Lambda with an error message, i.e. a status code other than 200.

In [9]:
def make_live_request(question):
    credentials = session.get_credentials() 
    response = make_request(
        uri=api_endpoint,
        headers=headers,
        method="POST",
        service="execute-api",
        data=json.dumps({"question": question}),
        region="us-west-2",
        access_key=credentials.access_key,
        secret_key=credentials.secret_key,
        security_token=credentials.token,
        data_binary=False,
    )
    if response.status_code != 200:
        return "Request failed. Check the CloudWatch logs in your Lambda application."
    else:
        return response.json().get("answer")
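The generic error string hides the HTTP status code and the response body, which are often the quickest clue when a request fails. As a sketch of an alternative, the hypothetical helper below reports both. It assumes only that the response object exposes `status_code`, `text` and `json()`, as `requests` responses do, and it is demonstrated here with a stand-in object rather than a live call.

```python
from types import SimpleNamespace


def describe_response(response, max_body_chars: int = 200) -> str:
    """Return the answer on success, or the status code and a body excerpt on failure."""
    if response.status_code == 200:
        return response.json().get("answer", "The response did not include an 'answer' field.")
    body = (response.text or "").strip()[:max_body_chars]
    return f"Request failed with HTTP {response.status_code}: {body}"


# Stand-in for a failed response, so the example runs without AWS access
failed = SimpleNamespace(
    status_code=502,
    text='{"message": "Internal server error"}',
    json=lambda: {},
)
print(describe_response(failed))
```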

### Exercise 1


<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_challenge.png" alt="MLU challenge" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Try it yourself!</b></p>
        <p><b>Exercise 1.</b> It is now your turn to <b>interact with the deployed system</b>:</p>
            <ol>
                <li>Try asking other questions about <b>AWS Lambda</b>. Are the answers correct, to the best of your knowledge?</li>
                <li>Ask questions about <b>other AWS systems</b>. How good are the provided responses?</li>
                <li>Inquire about general <b>topics unrelated to AWS</b>. Is the system able to generate correct answers?</li>
            </ol>
        <br/>
    </span>
</div>

In [10]:
############## CODE HERE ####################

make_live_request("When was Amazon Bedrock created?")


############# END OF CODE ###################

"I don't know when Amazon Bedrock was created. The information provided does not contain any details about Amazon Bedrock."

In [11]:
make_live_request("When was Elon Musk born?")

'Request failed. Check the CloudWatch logs in your Lambda application.'

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        If you get <code style="color: lightcoral;">Request failed</code> responses, your deployed AWS Lambda function has encountered an error. 
    <br/><br/>
        You can inspect the errors from your live invocations to AWS Lambda in CloudWatch.<br/>
        Run the cell below to get a direct link to CloudWatch Log Groups for your deployed application.<br/>
        You need to be logged into your Burner account to access them.
    <br/><br/>Locate the latest log stream and inspect the error messages. Can you troubleshoot their cause?
    </span>
</div>


In [12]:
display(Markdown("### Go to the following link to access CloudWatch and inspect potential errors from your application."))

display(
    f"https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups$3FlogGroupNameFilter$3D{MAIN_PACKAGE_NAME}-Service-alpha-{MAIN_PACKAGE_NAME}"
)

### Go to the following link to access CloudWatch and inspect potential errors from your application.

'https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups$3FlogGroupNameFilter$3DKoachangMLUCourseLLMOps-Service-alpha-KoachangMLUCourseLLMOps'
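As an alternative to the console, recent error lines can be pulled with the CloudWatch Logs API. The sketch below is an assumption-laden example: the helper name is hypothetical, the exact log group name must be looked up first (the link above only filters by name), and the filter pattern `ERROR` is a guess at what the Lambda function logs. `filter_log_events` is the boto3 call being used.

```python
import time
from typing import List


def recent_error_events(logs_client, log_group_name: str, minutes: int = 15, limit: int = 20) -> List[str]:
    """Return messages containing 'ERROR' logged during the last few minutes."""
    # CloudWatch Logs expects timestamps in milliseconds since the epoch
    start_ms = int((time.time() - minutes * 60) * 1000)
    response = logs_client.filter_log_events(
        logGroupName=log_group_name,
        startTime=start_ms,
        filterPattern="ERROR",
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]


# In the notebook (requires valid credentials and the real log group name):
#   logs = session.client("logs", region_name="us-west-2")
#   for line in recent_error_events(logs, "<your log group name>"):
#       print(line)
```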

<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_question.png" alt="MLU solution" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Challenge Help</b></p>
        <p><b>Exercise 1.</b> Below we provide you with example questions to interact with the deployed service.</p>
        <p>Remove the <code>#</code> before the <code>load</code> instruction in the next code cell to display the sample solutions.</p>
        <p>You can then re-run the cell to see its output.</p>
        <br/>
    </span>
</div>

In [33]:
# %load solutions/lab1_ex1_solutions.txt

# These questions about Lambda are answered correctly
lambda_questions = [
    "How is AWS Lambda different from EC2?",
    "What happens to an AWS Lambda function if a Lambda layer is deleted?",
    "What architectures does Lambda support?"
]

# Most of these non-Lambda questions about AWS services lead to errors
aws_questions = [
    "What built-in algorithms are supported in SageMaker?",
    "What languages are covered by Amazon Translate?",
    "What file types does Amazon Kendra support?",
]

# These questions unrelated to AWS lead to errors
unrelated_questions = [
    "Which movie won an Oscar in 1995?",
    "What are the differences between alligators and crocodiles?",
    "What's the capital of Island?",
]

display(Markdown("### Questions about AWS Lambda"))
for lambda_q in lambda_questions:
    display(Markdown(f"**{lambda_q}**"))
    display(Markdown(make_live_request(lambda_q)))

display(Markdown("### Questions about AWS Services"))
for aws_q in aws_questions:
    display(Markdown(f"**{aws_q}**"))
    display(Markdown(make_live_request(aws_q)))

display(Markdown("### Questions unrelated to AWS Services"))
for unrelated_q in unrelated_questions:
    display(Markdown(f"**{unrelated_q}**"))
    display(Markdown(make_live_request(unrelated_q)))

### Questions about AWS Lambda

**How is AWS Lambda different from EC2?**

The key differences between AWS Lambda and Amazon EC2 are:

1. Compute management: With AWS Lambda, you don't need to manage any compute infrastructure. Lambda handles the provisioning, scaling, and management of the compute resources. In contrast, with Amazon EC2, you are responsible for provisioning, managing, and scaling the EC2 instances yourself.

2. Pricing model: Lambda charges based on the number of requests and the compute time required to execute your code, whereas EC2 charges based on the duration you run the instances and the instance type.

3. Scalability: Lambda automatically scales up or down based on the incoming workload, while with EC2 you need to manually scale the instances to handle increased demand.

4. Programming model: Lambda functions are event-driven and execute in response to specific events or triggers, while EC2 instances run continuously and you need to manage the application lifecycle.

5. Customization: With EC2, you have full control over the operating system, software stack, and configurations, whereas with Lambda, you are limited to the provided runtimes and cannot customize the underlying operating system.

In summary, Lambda is a serverless compute service that abstracts away the infrastructure management, while EC2 provides more control over the underlying compute resources but requires more operational overhead.

**What happens to an AWS Lambda function if a Lambda layer is deleted?**

If a Lambda layer that is being used by an AWS Lambda function is deleted, the following happens:

- The Lambda function can continue to use the deleted layer version. The function will continue to run as if the layer version still exists.
- However, you cannot create a new function that uses the deleted layer version. Any attempt to add the deleted layer version to a new function will fail.
- If you need to update the layers used by the function, you will have to remove the reference to the deleted layer version and add a new layer version instead.

**What architectures does Lambda support?**

According to the context provided, Lambda supports multiple instruction set architectures. Specifically, the context states that "Lambda provides base images for each of the instruction set architectures and Lambda also provides base images that support both architectures." However, the context also notes that "the image you build for your function must target only one of the architectures. Lambda does not support functions that use multi-architecture container images."

### Questions about AWS Services

**What built-in algorithms are supported in SageMaker?**

Request failed. Check the CloudWatch logs in your Lambda application.

**What languages are covered by Amazon Translate?**

I don't have enough information to answer the question about the languages covered by Amazon Translate. The context provided is about AWS Lambda and does not mention anything about Amazon Translate or the languages it supports.

**What file types does Amazon Kendra support?**

Request failed. Check the CloudWatch logs in your Lambda application.

### Questions unrelated to AWS Services

**Which movie won an Oscar in 1995?**

Request failed. Check the CloudWatch logs in your Lambda application.

**What are the differences between alligators and crocodiles?**

Request failed. Check the CloudWatch logs in your Lambda application.

**What's the capital of Island?**

Request failed. Check the CloudWatch logs in your Lambda application.
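To compare the three question categories at a glance, you could tally how many requests failed in each. The sketch below uses a hypothetical helper, `failure_rate`, and encodes the outcomes recorded above (shortened to "answered" for successful responses); your own run may produce different results.

```python
from typing import List

FAILURE_MESSAGE = "Request failed. Check the CloudWatch logs in your Lambda application."


def failure_rate(answers: List[str]) -> float:
    """Fraction of answers that are the generic failure message."""
    if not answers:
        return 0.0
    failed = sum(1 for answer in answers if answer == FAILURE_MESSAGE)
    return failed / len(answers)


# Outcomes from the recorded run above, one entry per question
recorded = {
    "AWS Lambda": ["answered", "answered", "answered"],
    "Other AWS services": [FAILURE_MESSAGE, "answered", FAILURE_MESSAGE],
    "Unrelated to AWS": [FAILURE_MESSAGE, FAILURE_MESSAGE, FAILURE_MESSAGE],
}
for category, answers in recorded.items():
    print(f"{category}: {failure_rate(answers):.0%} failed")
```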

***

###### 2
## <a> Part 2 - Explore LLM capabilities via prompting in Amazon Bedrock</a>
([Go to top](#0))

Let us now take a closer look at some of the foundational components and techniques that are being used in the GenAI application before extending it any further. In this section we will recreate the logic contained in the main code package and will experiment with modifications to it. 

Take a minute to inspect the code in your `{Alias}MLUCourseLLMOps/src/{alias}_mlu_course_llm_ops/handler.py` file, which contains the core logic to invoke the LLM that powers the GenAI application. 

You can quickly find the source code for that file in the link printed when you execute the next cell. 

In [14]:
display(Markdown("### Click here to see the source code for the `handler.py` file"))

print(f"https://code.amazon.com/packages/{MAIN_PACKAGE_NAME}/blobs/mainline/--/src/{alias}_mlu_course_llm_ops/handler.py")

### Click here to see the source code for the `handler.py` file

https://code.amazon.com/packages/KoachangMLUCourseLLMOps/blobs/mainline/--/src/koachang_mlu_course_llm_ops/handler.py


### Simulating a scientist environment
As mentioned in Module 1, __LLMOps foundations__, scientist roles typically work in environments that allow them to run their experiments comfortably, rather than operating directly in the systems that implement the solution in production.

The next cells exemplify this idea by replicating code that is present in the Lambda function and breaking it out into pieces in this notebook, which simulates what might happen in an LLMOps process.
 
A typical process would be for scientist roles to experiment and test in the notebook. When the model is ready, an SDE role would move this code into production Lambda functions and trigger the pipeline to create the new version. Ideally, a set of tests exists that any model change needs to pass.

__Note__: for smaller teams, it is common that the same person executes both tasks, SDE and scientist, if they have the required skills for both.

First, import the `langchain` library that you will use to simplify the calls to the Bedrock service.

In [15]:
from langchain_aws import ChatBedrock
from langchain.output_parsers.regex import RegexParser
from langchain_core.prompts import PromptTemplate
from langchain.tools import tool

**RAG with Amazon Kendra**

The next cell refers to the RAG components of the application, built on Amazon Kendra. 

We get the Kendra index that is needed to retrieve documents, then define a function that, given a query, returns the list of stored documents that are relevant to it. 

We will explore and extend this part of the system in **Lab 2: Enhancing the capabilities of the RAG system**.

In [31]:
kendra = session.client("kendra", region_name="us-west-2")
response = kendra.list_indices()
KENDRA_INDEX_ID = response["IndexConfigurationSummaryItems"][0]["Id"]


@tool
def retrieve_context(query: str) -> dict:
    """Retrieve the list of documents from Kendra that are relevant to the query"""
    response = kendra.retrieve(
        IndexId=KENDRA_INDEX_ID,
        QueryText=query,
        PageNumber=1,
        PageSize=5,
    )
    documents = response["ResultItems"]
    if not documents:
        return {
            "question": query,
            "context": "No relevant documents found",
        }
    context = "\n".join([document["Content"] for document in documents])
    return {
        "question": query,
        "context": context,
    }
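
To inspect the post-processing step without calling AWS, the sketch below applies the same logic to a hand-written dictionary. The `fake_response` contents and the `build_context` helper are illustrative assumptions; only the `ResultItems` and `Content` keys are taken from the cell above.

```python
# Hypothetical stand-in for a Kendra Retrieve response. Only the fields used
# by retrieve_context ("ResultItems" and "Content") are mimicked; the texts
# are made up for illustration.
fake_response = {
    "ResultItems": [
        {"Content": "First passage about Lambda layers."},
        {"Content": "Second passage about Lambda layers."},
    ]
}


def build_context(response: dict, question: str) -> dict:
    """Mirror the post-processing in retrieve_context without calling AWS."""
    documents = response.get("ResultItems", [])
    if not documents:
        return {"question": question, "context": "No relevant documents found"}
    # Passages are joined with newlines into a single context string.
    context = "\n".join(doc["Content"] for doc in documents)
    return {"question": question, "context": context}


with_hits = build_context(fake_response, "What is a Lambda layer?")
no_hits = build_context({"ResultItems": []}, "Unrelated question")
print(with_hits["context"])
print(no_hits["context"])
```

Note that every retrieved passage is concatenated as-is, so the quality of the final answer depends directly on what the index returns for the query.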

**Bedrock Guardrails for increased security**

Next, let's see how the GenAI application incorporates [Bedrock Guardrails](https://aws.amazon.com/bedrock/guardrails/) to implement safeguards that ensure the application complies with responsible AI policies. 

During the cloning of the GenAI application, a Bedrock guardrail was already created in your AWS Burner account. The AWS Lambda handler includes the guardrail to pre- and post-process all requests to the underlying LLM via the following code:


In [17]:
bedrock_runtime = session.client("bedrock-runtime", region_name="us-west-2")
bedrock = session.client("bedrock", region_name="us-west-2")

response = bedrock.list_guardrails()
GUARDRAIL_ID = response["guardrails"][0]["id"]
GUARDRAIL_VERSION = response["guardrails"][0]["version"]

@tool(infer_schema=False)
def guardrail(content: str) -> str:
    """Guard the content with Bedrock Guardrail. If the content is not flagged by Guardrail,
    forward it to the next tool in chain.
    """
    result = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="INPUT",
        content=[
            {
                "text": {
                    "text": content,
                    "qualifiers": [
                        "guard_content",
                    ],
                }
            },
        ],
    )
    if result["action"] != "NONE":
        print(f"Guardrail ({GUARDRAIL_ID}) intervened ({result['ResponseMetadata']['RequestId']})")
        raise Exception("Content was blocked by guardrail")

    return content
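
The control flow of the guardrail tool can be tried offline with a stub client. This is a minimal sketch: `FakeGuardrailClient` and `guard` are hypothetical names, and the stub only imitates the `action` field of the response (`"NONE"` or `"GUARDRAIL_INTERVENED"`), not the real service behavior.

```python
class FakeGuardrailClient:
    """Hypothetical stub imitating the shape of an apply_guardrail response."""

    def __init__(self, blocked_terms):
        self.blocked_terms = blocked_terms

    def apply_guardrail(self, **kwargs):
        text = kwargs["content"][0]["text"]["text"]
        flagged = any(term in text for term in self.blocked_terms)
        return {"action": "GUARDRAIL_INTERVENED" if flagged else "NONE"}


def guard(client, content: str) -> str:
    """Pass content through unchanged, or raise if the guardrail intervenes."""
    result = client.apply_guardrail(
        guardrailIdentifier="example-id",  # placeholder, not a real guardrail
        guardrailVersion="DRAFT",
        source="INPUT",
        content=[{"text": {"text": content}}],
    )
    if result["action"] != "NONE":
        raise ValueError("Content was blocked by guardrail")
    return content


client = FakeGuardrailClient(blocked_terms=["SECRET"])
print(guard(client, "How do Lambda layers work?"))
try:
    guard(client, "My SECRET key is ...")
except ValueError as error:
    print(error)
```

Because the function returns its input unchanged when nothing is flagged, it can sit at any position in a chain without altering the data that flows through it.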

If you want to see which configuration was used to create the available Bedrock Guardrail, run the following code cell:

In [18]:
bedrock.get_guardrail(guardrailIdentifier=GUARDRAIL_ID)

{'ResponseMetadata': {'RequestId': '0bfc6209-074d-4a47-aed9-dbc0424da990',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sat, 17 Aug 2024 23:53:01 GMT',
   'content-type': 'application/json',
   'content-length': '763',
   'connection': 'keep-alive',
   'x-amzn-requestid': '0bfc6209-074d-4a47-aed9-dbc0424da990'},
  'RetryAttempts': 0},
 'name': 'KoachangMLUCourseLLMOpsGuardrail',
 'guardrailId': 'xn0mzel53o7e',
 'guardrailArn': 'arn:aws:bedrock:us-west-2:961341554577:guardrail/xn0mzel53o7e',
 'version': 'DRAFT',
 'status': 'READY',
 'contentPolicy': {'filters': [{'type': 'PROMPT_ATTACK',
    'inputStrength': 'HIGH',
    'outputStrength': 'NONE'}]},
 'sensitiveInformationPolicy': {'piiEntities': [{'type': 'AWS_ACCESS_KEY',
    'action': 'BLOCK'},
   {'type': 'AWS_SECRET_KEY', 'action': 'BLOCK'}],
  'regexes': []},
 'createdAt': datetime.datetime(2024, 8, 12, 20, 26, 4, tzinfo=tzlocal()),
 'updatedAt': datetime.datetime(2024, 8, 12, 20, 26, 7, 945575, tzinfo=tzlocal()),
 'statusRea

Below you can see an example of an input that would be blocked by the GenAI application via the Bedrock Guardrail:

In [19]:
text_with_sensitive_pii = "Here's auth info for an AWS account: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY."
guardrail(text_with_sensitive_pii)


Guardrail (xn0mzel53o7e) intervened (08678d1f-3b6b-427f-8dfc-47ea6397f644)


Exception: Content was blocked by guardrail

### Amazon Bedrock to access multiple foundation models behind a common API

With [Amazon Bedrock](https://aws.amazon.com/bedrock/), you can access generative AI foundation models from AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API. 

Although Bedrock models can be invoked via the [boto3 `invoke_model` API](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-invoke.html), AWS integrations are available in the [langchain_aws](https://python.langchain.com/v0.2/docs/integrations/platforms/aws/) library that simplify the development of LLM-powered applications built with Amazon Bedrock. 

The code below initializes a Runnable interface to a Bedrock model. Your initial deployed GenAI application is powered by [Claude 3 Haiku](https://www.anthropic.com/news/claude-3-haiku), which is Anthropic's fastest and most affordable model in its class. 

Notice that the initialization sets some [inference parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html) that influence the model's response. Read section [Use inference parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/general-guidelines-for-bedrock-users.html#use-inference-parameters) to learn more about the effect of some commonly used parameters such as `temperature`, `top_k`, `top_p`, maximum new tokens, and end sequence.


In [20]:
llm_claude_haiku = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    credentials_profile_name="MLU-LLMOps-Burner",
    client=bedrock_runtime,
    model_kwargs={
        "max_tokens": 500,
        "temperature": 0.0,
        "top_k": 10,
        "top_p": 1.0,
    },
    cache=False,
)
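
To build intuition for `temperature`, `top_k`, and `top_p`, here is a toy sketch in plain Python. It is not how Bedrock or any particular model implements sampling; the function name, the token scores, and the order in which the filters are applied are all assumptions made for illustration.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    if temperature == 0.0:
        # Greedy decoding: all probability mass goes to the most likely token.
        best = max(logits, key=logits.get)
        return {best: 1.0}

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = sorted(
        ((tok, weight / total) for tok, weight in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to one.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


toy_logits = {"Lambda": 2.0, "EC2": 1.0, "S3": 0.5, "Kendra": -1.0}
greedy = next_token_distribution(toy_logits, temperature=0.0)
top2 = next_token_distribution(toy_logits, temperature=1.0, top_k=2)
nucleus = next_token_distribution(toy_logits, temperature=1.0, top_p=0.5)
print(greedy, top2, nucleus)
```

Lower temperatures and smaller `top_k` / `top_p` values concentrate probability on fewer tokens, which is why the settings in the cell above favor consistent answers.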

We will now copy the prompt that the deployed application uses, which can be found in the `handler.py` file of the main code package. Notice how the prompt contains placeholders for the `context` that will be retrieved from Kendra and the `question` from the user. 

The prompt also instructs the model to use the contextual information and to output its `answer` within proper XML tags, which is useful when parsing the response using LangChain's `RegexParser` utility.

In [27]:
prompt = PromptTemplate.from_template(
    """You act as an AWS Cloud Practitioner and only answer questions about AWS. Read the user's
question supplied within the <question> tags. Then, use the contextual information provided
above within the <context> tags to provide an answer in <answer> tag. Do not repeat the context.
Respond that you don't know if you don't have enough information to answer
<context>
{context}
</context>

<question>
{question}
</question>"""
)

parser = RegexParser(regex=r"(?s)<answer>(.*)</answer>", output_keys=["answer"])
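
The two mechanisms in this cell, placeholder substitution and tag extraction, can be tried with the standard library alone. The sketch below uses `str.format` and `re` as stand-ins for `PromptTemplate` and `RegexParser`; the shortened template, the sample texts, and the `extract_answer` helper are assumptions for illustration.

```python
import re

# Shortened, illustrative template; the real prompt is the one defined above.
template = (
    "Answer inside <answer> tags.\n"
    "<context>\n{context}\n</context>\n\n"
    "<question>\n{question}\n</question>"
)

# Placeholders are replaced by the retrieved context and the user question.
filled = template.format(
    context="Lambda supports x86_64 and arm64.",
    question="What architectures does Lambda support?",
)
print(filled)

# Same pattern as the parser above: (?s) lets "." match newlines as well.
ANSWER_RE = re.compile(r"(?s)<answer>(.*)</answer>")


def extract_answer(model_output: str):
    """Return the text between <answer> tags, or None when the tags are missing."""
    match = ANSWER_RE.search(model_output)
    return match.group(1) if match else None


tagged = extract_answer("<answer>\nx86_64 and arm64\n</answer>")
untagged = extract_answer("I don't have enough information.")
print(repr(tagged), untagged)
```

The second call shows the case to watch for: when the model omits the tags, there is nothing for the pattern to capture.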

Using [LangChain's syntax](https://python.langchain.com/docs/expression_language/get_started#basic-example-prompt-model-output-parser) it's easy to define a basic chain that sends the above prompt to the LLM and parses its response. 

In the code below we piece together all components into a single chain using the `|` symbol, akin to a Unix pipe operator. 

In our chain the following steps are concatenated: 
 1. The user input is first passed to the Bedrock Guardrail, which returns an `action` indicating whether the request is blocked, according to the guardrail configuration. 
 2. Next, the request is used to retrieve relevant context from the Kendra index, which will be used to enhance the LLM response using RAG. 
 3. Both user request and retrieved context are passed into the prompt template to generate the response from the LLM. 
 4. The generated model output is passed to the output parser. The output `parser` is defined as a regex that searches for text inside `<answer></answer>` tags. Notice that in this implementation, if the model's answer is not wrapped within `<answer></answer>` tags, the regex parser won't find any matches.
 5. Finally, the parsed output is passed to the Bedrock Guardrail for potential flagging of the generated LLM response before reaching the user. 

In [28]:
chain_claude_haiku = guardrail | retrieve_context | prompt | llm_claude_haiku | parser | guardrail
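
The `|` composition can be pictured with a few lines of plain Python. This is only a conceptual sketch: the `Step` class and the toy stages are hypothetical and much simpler than LangChain's runnables, but they show how each stage's output becomes the next stage's input.

```python
class Step:
    """Minimal stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b builds a new step that runs a first, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Toy stages, loosely mirroring guardrail | retrieve_context | prompt | llm.
check = Step(lambda text: text.strip())
retrieve = Step(lambda q: {"question": q, "context": "toy context"})
render = Step(lambda d: f"{d['context']} :: {d['question']}")
fake_llm = Step(lambda prompt_text: prompt_text.upper())

toy_chain = check | retrieve | render | fake_llm
result = toy_chain.invoke("  hello ")
print(result)
```

Because every stage only sees the previous stage's output, an exception raised anywhere in the sequence stops the whole chain.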

To execute the chain, we will use the `invoke` method, passing the user's `question` as the input parameter. 

Below we define a function that takes a user question and a LangChain chain as input. The chain is invoked with the question and the answer is returned.

In [29]:
def generate_answer(question, chain):
    answer = chain.invoke(question)
    return answer.strip()

### Exercise 2


<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_challenge.png" alt="MLU challenge" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Try it yourself!</b></p>
        <p><b>Exercise 2.</b> It is time to <b>experiment with Bedrock invocation</b>:</p>
            <ul>
                <li>Repeat some of the questions that you asked the live system in <a href="#1">Part 1</a>, but now invoke Bedrock directly with the <code>generate_answer</code> function.</li>
                <li>Do the answers coincide with those from calling the deployed API endpoint? If not, why?</li>
            </ul>
        <br/>
    </span>
</div>


In [32]:
############## CODE HERE ####################


question = "When was elon musk born?"
display(Markdown(generate_answer(question, chain_claude_haiku)))


############# END OF CODE ###################

I don't have enough information to answer when Elon Musk was born. The context provided is about AWS IAM roles and does not contain any information about Elon Musk's birth date.

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        Setting the temperature to zero does not guarantee a deterministic response. 
    <br/><br/>
        T=0 minimizes randomness in the model's answer by choosing the single most likely next token at every generation step. However, nondeterminism in GPU floating-point operations is believed to occasionally cause generations to diverge.
    <br/><br/>If you're seeing different responses from the live API endpoint and the <code style="color: lightcoral;">generate_answer</code> function, that might be the reason.
    </span>
</div>
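
One mechanism often cited for this effect is that floating-point addition is not associative, so summing the same numbers in a different order can change the result slightly. The sketch below illustrates the idea on a CPU with made-up scores; it is an analogy, not a reproduction of what happens inside a model server.

```python
# Same three numbers, two summation orders.
sum_a = (0.1 + 0.2) + 0.3
sum_b = 0.1 + (0.2 + 0.3)
print(sum_a == sum_b)


def greedy_pick(scores):
    """Return the token with the highest score (the first one wins on ties)."""
    return max(scores, key=scores.get)


# Token "A" gets its score from a sum; token "B" has a fixed score of 0.6.
pick_a = greedy_pick({"B": 0.6, "A": sum_a})
pick_b = greedy_pick({"B": 0.6, "A": sum_b})
print(pick_a, pick_b)
```

When two candidate tokens have nearly identical scores, a difference in the last bit is enough to flip the greedy choice, and every later token then depends on that choice.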


<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_question.png" alt="MLU solution" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Challenge Help</b></p>
        <p><b>Exercise 2.</b> Below we provide you with example questions to experiment with Bedrock invocation.</p>
        <p>Remove the <code>#</code> before the <code>load</code> instruction in the next code cell to display the sample solutions.</p>
        <p>You can then re-run the cell to see its output.</p>
        <br/>
    </span>
</div>

In [25]:
# %load solutions/lab1_ex2_solutions.txt

# Notice how the answers to this question are essentially the same 
# regardless of whether we send a request to the API endpoint or invoke Bedrock from this notebook
question = "What happens to an AWS Lambda function if a Lambda layer is deleted?"
display(Markdown(f"**{question}**"))

display(Markdown("#### API endpoint answer"))
display(Markdown(make_live_request(question)))

display(Markdown("#### Direct invocation"))
display(Markdown(generate_answer(question, chain_claude_haiku)))

# Notice how the request fails when asking the following question
question = "What file types does Amazon Kendra support?"
display(Markdown(f"**{question}**"))

display(Markdown("#### API endpoint answer"))
display(Markdown(make_live_request(question)))

display(Markdown("#### Direct invocation"))
display(Markdown(generate_answer(question, chain_claude_haiku)))

**What happens to an AWS Lambda function if a Lambda layer is deleted?**

#### API endpoint answer

If a Lambda layer that is being used by an AWS Lambda function is deleted, the following happens:

- The Lambda function can continue to use the deleted layer version. The function will still be able to access and use the layer version that it was previously configured with.
- However, you cannot create a new function that uses the deleted layer version. Any new functions or updates to the existing function's layer configuration will not be able to reference the deleted layer version.
- If you need to update the layers used by the function, you will need to specify a different, non-deleted layer version. The function will continue to work as long as it can access at least one of the configured layer versions.

#### Direct invocation

If a Lambda layer that is being used by an AWS Lambda function is deleted, the following happens:

- The Lambda function can continue to use the deleted layer version. The function will continue to run as if the layer version still exists.
- However, you cannot create a new function that uses the deleted layer version. Any attempt to add the deleted layer version to a new function will fail.
- If you need to update the layers used by the function, you will need to remove the reference to the deleted layer version and add a new version of the layer or a different layer.

**What file types does Amazon Kendra support?**

#### API endpoint answer

Request failed. Check the CloudWatch logs in your Lambda application.

#### Direct invocation

ValueError: Could not parse output: I don't have enough information to answer the question about what file types Amazon Kendra supports. The context provided is about AWS IAM policies and access control, and does not contain any information about Amazon Kendra or the file types it supports.

The following solutions file helps diagnose the root cause of the application error.

In [26]:
# %load solutions/lab1_ex2_solutions2.txt

# Analyze the behavior of the system for the second question versus the first question
question_1 = "What happens to an AWS Lambda function if a Lambda layer is deleted?"
question_2 = "What file types does Amazon Kendra support?"

# Notice what comes out of the LLM invocation for both questions:
display(Markdown("**Output of LLM for first question**"))
chain = guardrail | retrieve_context | prompt | llm_claude_haiku
print(chain.invoke(question_1))

display(Markdown("**Output of LLM for second question**"))
chain = guardrail | retrieve_context | prompt | llm_claude_haiku
print(chain.invoke(question_2))

# Notice what happens when the output of the LLM goes through the parser
display(Markdown("**Output of parser for second question**"))
chain = guardrail | retrieve_context | prompt | llm_claude_haiku | parser
print(chain.invoke(question_2))


**Output of LLM for first question**

content='<answer>\nIf a Lambda layer that is being used by an AWS Lambda function is deleted, the following happens:\n\n- The Lambda function can continue to use the deleted layer version. The function will continue to run as if the layer version still exists.\n- However, you cannot create a new function that uses the deleted layer version. Any attempt to add the deleted layer version to a new function will fail.\n- If you need to update the layers used by the function, you will have to remove the reference to the deleted layer version and add a new layer version instead.\n</answer>' additional_kwargs={'usage': {'prompt_tokens': 1380, 'completion_tokens': 125, 'total_tokens': 1505}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'} response_metadata={'usage': {'prompt_tokens': 1380, 'completion_tokens': 125, 'total_tokens': 1505}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'} id='run-ecf9bed7-e9fc-4b69-b53b-120cd7b3e777-

**Output of LLM for second question**

content="I don't have enough information to answer what file types Amazon Kendra supports. The context provided is about AWS IAM policies and access control, and does not contain any information about Amazon Kendra or the file types it supports." additional_kwargs={'usage': {'prompt_tokens': 1614, 'completion_tokens': 51, 'total_tokens': 1665}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'} response_metadata={'usage': {'prompt_tokens': 1614, 'completion_tokens': 51, 'total_tokens': 1665}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0'} id='run-33cdf605-9aa5-4655-b046-99b0447b5876-0' usage_metadata={'input_tokens': 1614, 'output_tokens': 51, 'total_tokens': 1665}


**Output of parser for second question**

ValueError: Could not parse output: I don't have enough information to answer what file types Amazon Kendra supports. The context provided is about AWS IAM policies and access control, and does not contain any information about Amazon Kendra or the file types it supports.
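
One possible way to make the parsing step tolerant of this failure mode is to fall back to the raw model text when the tags are missing. The sketch below uses only the standard library; `parse_answer` is a hypothetical helper, not part of LangChain or of the deployed handler.

```python
import re

ANSWER_PATTERN = re.compile(r"(?s)<answer>(.*)</answer>")


def parse_answer(model_output: str) -> str:
    """Return the tagged answer if present, otherwise the raw text."""
    match = ANSWER_PATTERN.search(model_output)
    if match is None:
        # Assumption: an untagged reply is still worth returning to the user.
        return model_output.strip()
    return match.group(1).strip()


tagged_reply = "<answer>\nLambda keeps using the deleted layer version.\n</answer>"
untagged_reply = "I don't have enough information to answer."
print(parse_answer(tagged_reply))
print(parse_answer(untagged_reply))
```

The trade-off is that untagged text is passed along unchecked, so any preamble the model adds outside the tags would also reach the user.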

***

### Exercise 3


<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_challenge.png" alt="MLU challenge" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Try it yourself!</b></p>
        <p><b>Exercise 3.</b> Now you can improve the way of invoking Bedrock models to try and fix errors returned by the application.</p>
        <p>For the questions that led to a <code>Request failed</code> before, can you think of any modifications to fix the problem?</p>
        Here are some ideas:
            <ul>
                <li><b>Modify the prompt</b> to force the correct behavior, providing examples if needed (few-shot prompting).</li>
                <li>Experiment with <b>other models available in Bedrock</b> such as Claude 3 Sonnet or the Mistral AI models.</li>
                <li>Change the LangChain chain to <b>modify or remove the output parser</b>.</li>
            </ul>
        <br/>
    </span>
</div>


<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        <b>LLM choice</b>
    <br/><br/>
        Your initially deployed GenAI application uses Claude 3 Haiku as a starting point and you can experiment with other models available in Amazon Bedrock. 
        <br/><br/>However, take into account cost and latency issues when choosing larger and/or more costly models. 
        <br/><br/>Your GenAI application deploys an endpoint to AWS Lambda and is thus subject to its timeout limits.
    </span>
</div>
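
When comparing models, it can help to measure how long a call takes and compare it with a time budget. The helper names and the budget value below are hypothetical; choose a budget that matches the timeout configured for your own deployment.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fits_budget(elapsed_seconds: float, budget_seconds: float) -> bool:
    """Check whether a measured latency stays within the chosen budget."""
    return elapsed_seconds <= budget_seconds


# Stand-in workload; in the notebook this could be a call to generate_answer.
result, elapsed = timed_call(sum, [1, 2, 3])
print(result, fits_budget(elapsed, budget_seconds=10.0))  # 10.0 is an arbitrary example
```

Repeating the measurement over several questions gives a more reliable picture than a single call, since latency varies with prompt and answer length.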


In [None]:
############## CODE HERE ####################





############# END OF CODE ###################

<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_question.png" alt="MLU solution" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Challenge Help</b></p>
        <p><b>Exercise 3.</b> Below we provide you with example solutions to fix errors in the Bedrock invocation.</p>
        <p>Remove the <code>#</code> before the <code>load</code> instruction in the next code cell to display the sample solutions.</p>
        <p>You can then re-run the cell to see its output.</p>
        <br/>
    </span>
</div>

In [None]:
# %load solutions/lab1_ex3_solutions.txt

# These questions about Lambda are answered correctly
lambda_questions = [
    "How is AWS Lambda different from EC2?",
    "What happens to an AWS Lambda function if a Lambda layer is deleted?",
    "What architectures does Lambda support?"
]

# Most of these non-Lambda questions about AWS services lead to errors
aws_questions = [
    "What built-in algorithms are supported in SageMaker?",
    "What languages are covered by Amazon Translate?",
    "What file types does Amazon Kendra support?",
]

# These questions unrelated to AWS lead to errors
unrelated_questions = [
    "Which movie won an Oscar in 1995?",
    "What are the differences between alligators and crocodiles?",
    "What's the capital of Iceland?",
]

all_questions = lambda_questions[:2] + aws_questions[:2] + unrelated_questions[:2]

# This prompt for Claude 3 Haiku forces the model to follow a particular output format
prompt_enforce = PromptTemplate.from_template(
    """You act as an AWS Cloud Practitioner and only answer questions about AWS. Read the user's
question supplied within the <question></question> tags. Then, use the contextual information provided
above within the <context></context> tags to provide an answer. Do not repeat the context.
Respond that you don't know if you don't have enough information to answer.

Return your output in <answer></answer> tags as in this example:

<context>
Example context
</context>

<question>
Example question
</question>

<answer>
Example answer
</answer>

Below starts the real task:

<context>
{context}
</context>

<question>
{question}
</question>
""")

chain_claude_haiku_enforce = guardrail | retrieve_context | prompt_enforce | llm_claude_haiku | parser | guardrail

display(Markdown("## Direct invocation with Claude 3 Haiku and modified prompt"))

for question in all_questions:
    display(Markdown(f"**{question}**"))
    display(Markdown(generate_answer(question, chain_claude_haiku_enforce)))


# This calls Claude 3 Sonnet instead of Haiku, without changing the original application prompt

llm_claude_sonnet = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    credentials_profile_name="MLU-LLMOps-Burner",
    client=bedrock_runtime,
    model_kwargs={
        "max_tokens": 500,
        "temperature": 0.0,
        "top_k": 10,
        "top_p": 1.0,
    },
    cache=False,
)

chain_claude_sonnet = guardrail | retrieve_context | prompt | llm_claude_sonnet | parser | guardrail

display(Markdown("---"))
display(Markdown("## Direct invocation with Claude 3 Sonnet and original prompt"))

for question in all_questions:
    display(Markdown(f"**{question}**"))
    display(Markdown(generate_answer(question, chain_claude_sonnet)))


# This calls a Mistral AI model with a Mistral-specific prompt and removes the output parser from the chain

from langchain_aws import BedrockLLM

llm_mistral = BedrockLLM(
    model_id="mistral.mistral-7b-instruct-v0:2",
    credentials_profile_name="MLU-LLMOps-Burner",
    client=bedrock_runtime,
    model_kwargs={
        "max_tokens": 500,
        "temperature": 0.0,
        "top_k": 10,
        "top_p": 1.0,
        "stop": ["</s>"]
    },
    cache=False,
)


prompt_mistral = PromptTemplate.from_template(
    """[INST]You act as an AWS Cloud Practitioner and only answer questions about AWS. Read the user's
question supplied within the <question> tags. Then, use the contextual information provided
above within the <context> tags to provide an answer. Do not repeat the context.
Respond that you don't know if you don't have enough information to answer.

<context>
{context}
</context>

<question>
{question}
</question>

[/INST]
"""
)

# This chain removes the output parser
chain_mistral = guardrail | retrieve_context | prompt_mistral | llm_mistral | guardrail

display(Markdown("---"))
display(Markdown("## Direct invocation with Mistral 7b and modified prompt"))

# Run only a subset of all questions to avoid Throttling error
for question in all_questions[2:]:
    display(Markdown(f"**{question}**"))
    display(Markdown(generate_answer(question, chain_mistral)))
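
An alternative to trimming the question list is to retry throttled calls with exponential backoff. The sketch below is generic: `call_with_backoff` is a hypothetical helper, and `RuntimeError` stands in for whatever throttling exception your client raises.

```python
import time


def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on RuntimeError with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake call that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("throttled")
    return "ok"


delays = []  # record the delays instead of actually sleeping
value = call_with_backoff(flaky, retries=3, base_delay=0.5, sleep=delays.append)
print(value, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test, because the delays can be recorded rather than waited for.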

###### 3
## <a>Part 3 - Implement changes to the system and deploy with CDK</a>
([Go to top](#0))

After experimenting with prompts, models, and other parameters, you might come up with an improved way to invoke the LLM in your system. Eventually, it is time to deploy those changes so that they're available via the API endpoint!

The ultimate goal is to update the CloudFormation template behind the deployed AWS Lambda service. A CloudFormation stack update is triggered by a call to the CloudFormation API with a new template and artifacts such as code assets or data.

There are several ways to trigger such a CloudFormation API call. Two common ways are:
- using Pipelines
- deploying locally via CDK CLI

Both have the same effect. Deploying via Pipelines is recommended for a real workload, as it commits lasting changes to the code repositories. A local CDK deployment is temporary and will eventually be overridden by a later Pipelines deployment, but it is useful for fast tests.

For this first change we will use local deployment via CDK CLI for faster turnaround. Read [How the Brazil CDK Pipelines Work](https://builderhub.corp.amazon.com/docs/pipelines/cdk-guide/concepts-cdk-pipelines.html) if you want to know more about CDK.

<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_challenge.png" alt="MLU challenge" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <br/><p><b>Try it yourself!</b></p>
        <p><b>Exercise 4.</b> After experimenting and testing changes to the LLM interaction in this playground notebook, you can now make changes to the live service by modifying the code that is run by the AWS Lambda function and using the CDK package to deploy the service stack.</p>
        <p>
        To do so, follow these steps: 
        <ol>
            <li>Using VS Code or a Terminal in your Cloud Desktop, navigate to folder <code>{Alias}MLUCourseLLMOps</code> and locate file <code>{Alias}MLUCourseLLMOps/src/{alias}_mlu_course_llm_ops/handler.py</code>.</li><br/>
            <li>Edit the file <code>{Alias}MLUCourseLLMOps/src/{alias}_mlu_course_llm_ops/handler.py</code> with your desired changes, for instance the <code>prompt</code> to the Bedrock model, the actual model, or the chain. This is where you <b>bring the results of your experimentation in the notebook to the actual production code</b>.</li><br/>
            <li>Save the edited file.</li><br/>
            <li>From a terminal in your Cloud Desktop, build the main application package. To do so, execute the cell below to get the exact command that you need to run in the command line to trigger the build.</li>
        </ol>
        </p>
        <br/>
    </span>
</div>

In [36]:
display(Markdown("### Copy these commands and run them on your Cloud Desktop terminal to build the main application package."))

print(f"cd $(brazil-context workspace root)/src/{MAIN_PACKAGE_NAME}")

print("brazil-build release")

### Copy these commands and run them on your Cloud Desktop terminal to build the main application package.

cd $(brazil-context workspace root)/src/KoachangMLUCourseLLMOps
brazil-build release


<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_challenge.png" alt="MLU challenge" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <br/><br/><p>Continue with these steps:</p>
        <ol start="5">
            <li>Switch to the CDK package and deploy the service stack. To do that, execute the cell below to get the exact command that you need to run in the command line to perform the deployment.</li><br/>
        </ol>
        <br/>
    </span>
</div>

In [37]:
display(Markdown("### Copy these commands and run them on your Cloud Desktop terminal to deploy the service stack."))

print(f"cd $(brazil-context workspace root)/src/{MAIN_PACKAGE_NAME}CDK")

print(f"brazil-build && brazil-build run cdk deploy {MAIN_PACKAGE_NAME}-Service-alpha -- --hotswap")

### Copy these commands and run them on your Cloud Desktop terminal to deploy the service stack.

cd $(brazil-context workspace root)/src/KoachangMLUCourseLLMOpsCDK
brazil-build && brazil-build run cdk deploy KoachangMLUCourseLLMOps-Service-alpha -- --hotswap


<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_challenge.png" alt="MLU challenge" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <br/><p>Continue with these steps:</p>
        <ol start="6">
            <li>Wait for the <code>cdk deploy</code> command to complete.</li><br/>
            <li><b>Now ask the same questions from Part 1. Does the behavior of the live system change?</b></li><br/>If done properly, your fix should now prevent any errors coming back from the live system.
        </ol>
        <br/>
    </span>
</div>

In [None]:
############## CODE HERE ####################





############# END OF CODE ###################

<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_question.png" alt="MLU solution" width=15% height=15%/>
    <span style="padding: 20px; align: left;">
        <p><b>Challenge Help</b></p>
        <p><b>Exercise 4.</b> Below we provide you with example solutions to call the current live service.</p>
        <p>Remove the <code>#</code> before the <code>load</code> instruction in the next code cell to display the sample solutions.</p>
        <p>You can then re-run the cell to see its output.</p>
        <br/>
    </span>
</div>

In [None]:
# %load solutions/lab1_ex4_solutions.txt

# These questions about Lambda are answered correctly
lambda_questions = [
    "How is AWS Lambda different from EC2?",
    "What happens to an AWS Lambda function if a Lambda layer is deleted?",
    "What architectures does Lambda support?"
]

# Most of these non-Lambda questions about AWS services lead to errors
aws_questions = [
    "What built-in algorithms are supported in SageMaker?",
    "What languages are covered by Amazon Translate?",
    "What file types does Amazon Kendra support?",
]

# These questions unrelated to AWS lead to errors
unrelated_questions = [
    "Which movie won an Oscar in 1995?",
    "What are the differences between alligators and crocodiles?",
    "What's the capital of Iceland?",
]

all_questions = lambda_questions + aws_questions + unrelated_questions

# This runs all questions from above against the current live service
for question in all_questions:
    display(Markdown(f"**{question}**"))
    display(Markdown(make_live_request(question)))


<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        <h3>Congratulations!</h3>
        You have completed Lab 1 of MLU's course Operationalizing Generative AI with LLMOps.
        <br/>
    </span>
</div>