# Build agentic workflows with Amazon Bedrock and open source frameworks - Introduction

The goal of this workshop is to provide in-depth examples of key concepts and frameworks for Retrieval Augmented Generation (RAG) and agentic applications. We introduce an example use case that serves as a backdrop for curated, prescriptive guidance on RAG and agentic workflows, including libraries and blueprints for some of the top trends in the market today.

## Overview

Through web-scale training, foundation models (FMs) are built to support a wide variety of tasks across a large body of general knowledge. Without being exposed to additional information or further fine-tuning, they suffer from a knowledge cutoff preventing them from reliably completing tasks requiring specific data not available at training time. Furthermore, their inability to call external functions limits their capacity to resolve complex tasks beyond ones that can be solved with their own internal body of knowledge.

In this notebook, we introduce the requirements that lead us to build our **Virtual Travel Agent**. We end by running some **coarse-grained model evaluation** across a subset of the models available in Amazon Bedrock.

### Requirements

In [23]:
!pip3 install langchain-aws --quiet

## Functional requirements

The purpose of the solution is to improve the experience for customers searching for their dream travel destination. To do this, a customer needs the ability to do the following:
- Rapidly get a sense of a given destination from a representative description.
- Discover new destinations based on location, weather or other aspects that may be of interest.
- Book travel dates for a given destination ensuring it does not collide with their other travel.

Before diving deeper into the solution, we begin with some light testing of the various models available in the `us-west-2` region.

We start by importing the boto3 client for the Bedrock Runtime.

In [2]:
import boto3
import os
from IPython.display import Markdown, display

region = 'us-east-1'  # or 'us-west-2'
bedrock = boto3.client(
    service_name = 'bedrock-runtime',
    region_name = region,
)


bedrock_service = boto3.client(
    service_name='bedrock',
    region_name=region,
)
print(boto3.__version__)

1.35.29


#### Validate the connection

We can check that the client works by calling the `list_foundation_models()` method, which lists all the models available for us to use.

In [3]:
bedrock_service.list_foundation_models()

{'ResponseMetadata': {'RequestId': '86ffedd8-4db6-4224-b080-c4d14ccb10c6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Mon, 28 Oct 2024 04:21:24 GMT',
   'content-type': 'application/json',
   'content-length': '31106',
   'connection': 'keep-alive',
   'x-amzn-requestid': '86ffedd8-4db6-4224-b080-c4d14ccb10c6'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': [],
   'inferenceTypesSupported': ['ON_DEMAND'],
   'modelLifecycle': {'status': 'ACTIVE'}},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1:0',
   'modelId': 'amazon.titan-image-generator-v1:0',
   'modelName': 'Titan Image Generator G1',
   'providerName': 'Amazon',
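
The response above is truncated, but each entry in `modelSummaries` carries the fields needed to narrow the list. The following is a minimal sketch of such a filter, not part of the original notebook. The helper name `active_text_models` and the `sample_response` dictionary are hypothetical, and the `IMAGE` modality of the second entry is an assumption because the output above is cut off before that field.

```python
def active_text_models(response):
    """Return the IDs of models that produce text and are currently active."""
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if "TEXT" in summary.get("outputModalities", [])
        and summary.get("modelLifecycle", {}).get("status") == "ACTIVE"
    ]


# Hypothetical, trimmed-down stand-in for the list_foundation_models() response.
sample_response = {
    "modelSummaries": [
        {
            "modelId": "amazon.titan-tg1-large",
            "outputModalities": ["TEXT"],
            "modelLifecycle": {"status": "ACTIVE"},
        },
        {
            "modelId": "amazon.titan-image-generator-v1:0",
            "outputModalities": ["IMAGE"],  # assumed; truncated in the output above
            "modelLifecycle": {"status": "ACTIVE"},
        },
    ]
}

print(active_text_models(sample_response))
```

With a live client, the same helper could be called as `active_text_models(bedrock_service.list_foundation_models())`.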

---

## Common inference parameter definitions

### Randomness and Diversity

Foundation models support the following parameters to control randomness and diversity in the 
response.

**Temperature** – Large language models use probability to construct the words in a sequence. For any 
given next word, there is a probability distribution of options for the next word in the sequence. When 
you set the temperature closer to zero, the model tends to select the higher-probability words. When 
you set the temperature further away from zero, the model may select a lower-probability word.

In technical terms, the temperature modulates the probability density function for the next tokens, 
implementing the temperature sampling technique. This parameter can steepen or flatten the density 
function curve. A lower value results in a steeper curve with more deterministic responses, and a higher 
value results in a flatter curve with more random responses.
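
To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax in plain Python. It is an illustration of the general technique, not Bedrock's internal implementation. The function name `softmax_with_temperature` and the example scores are hypothetical, and the sketch requires a strictly positive temperature (a temperature of 0 is commonly treated as always picking the most probable token, which is not modeled here).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the candidate words "horses", "zebras", "unicorns".
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # steeper curve
hot = softmax_with_temperature(logits, 2.0)   # flatter curve

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

The low-temperature distribution concentrates almost all probability on the top-scoring word, while the high-temperature distribution spreads probability more evenly across the three candidates.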

**Top K** – Temperature defines the probability distribution of potential words, and Top K defines the cutoff beyond which the model no longer selects words. For example, if K=50, the model selects from the 50 most probable words that could be next in a given sequence. This reduces the probability that an unusual word gets selected next in a sequence.

In technical terms, Top K is the number of highest-probability vocabulary tokens to keep for Top-K filtering. This limits the distribution of probable tokens, so the model chooses one of the highest-probability tokens.
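
Top-K filtering can be sketched in a few lines. This is an illustrative toy that works on a small dictionary of probabilities rather than on a real model's vocabulary. The function name `top_k_filter` and the example probabilities are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-word probabilities.
next_word = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}
filtered = top_k_filter(next_word, k=2)
print(sorted(filtered))
```

With `k=2`, the least likely word is removed entirely and the remaining probabilities are rescaled to sum to 1.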

**Top P** – Top P defines a cutoff based on the sum of probabilities of the potential choices. If you set Top P below 1.0, the model considers the most probable options and ignores less probable ones. Top P is similar to Top K, but instead of capping the number of choices, it caps choices based on the sum of their probabilities.

For the example prompt "I hear the hoof beats of," you may want the model to provide "horses," "zebras" or "unicorns" as the next word. If you set the temperature to its maximum, without capping Top K or Top P, you increase the probability of getting unusual results such as "unicorns." If you set the temperature to 0, you increase the probability of "horses." If you set a high temperature but lower Top K or Top P, you increase the probability of "horses" or "zebras," and decrease the probability of "unicorns."
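
The Top P cutoff (often called nucleus sampling) can be sketched the same way. This is a toy illustration under stated assumptions, not Bedrock's implementation. The function name `top_p_filter` and the probabilities assigned to the three words are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-word probabilities for "I hear the hoof beats of".
next_word = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}
narrow = top_p_filter(next_word, 0.5)   # only the single most likely word survives
medium = top_p_filter(next_word, 0.8)   # the two most likely words survive
full = top_p_filter(next_word, 1.0)     # nothing is cut off

print(sorted(narrow), sorted(medium), sorted(full))
```

Lowering `p` removes "unicorns" first and then "zebras", which is why a capped Top P reduces unusual completions even at a high temperature.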

### Length

The following parameters control the length of the generated response.

**Response length** – Configures the minimum and maximum number of tokens to use in the generated 
response.

**Length penalty** – Length penalty optimizes the model to be more concise in its output by penalizing 
longer responses. Length penalty differs from response length as the response length is a hard cut off for 
the minimum or maximum response length.

In technical terms, the length penalty penalizes the model exponentially for lengthy responses. 0.0 
means no penalty. Set a value less than 0.0 for the model to generate longer sequences, or set a value 
greater than 0.0 for the model to produce shorter sequences.

### Repetitions

The following parameters help control repetition in the generated response.

**Repetition penalty (presence penalty)** – Prevents repetitions of the same words (tokens) in responses. 
1.0 means no penalty. Greater than 1.0 decreases repetition.

We use the `ChatBedrock` class from `langchain-aws` to interact with the Bedrock service.

In [4]:
from langchain_aws.chat_models.bedrock import ChatBedrock

modelId = 'anthropic.claude-3-haiku-20240307-v1:0'
llm = ChatBedrock(
    model_id=modelId,
    client=bedrock,
    beta_use_converse_api=True
)
llm.invoke("Help me with my travel needs.").content

"Sure, I'd be happy to help with your travel needs. What kind of information or assistance are you looking for? Some areas I can potentially help with include:\n\n- Trip planning and itinerary suggestions\n- Booking flights, hotels, rental cars, etc.\n- Destination research and recommendations\n- Travel tips and advice\n- Budgeting and cost estimates\n- Visa/passport requirements\n- Packing lists and travel checklists\n- Travel insurance and healthcare considerations\n- Getting around at your destination (transportation options)\n- Navigating language/cultural differences\n\nPlease provide some more details about your specific travel needs and I'll do my best to assist you. Let me know the destination, trip duration, budget, interests, and any other relevant information."

### Converse API

In this notebook, we'll explore the basics of the Converse API in Amazon Bedrock. The Converse and ConverseStream operations provide a unified, structured text API that simplifies invoking Bedrock LLMs, using a universal syntax and message-structured prompts for any of the supported model providers.

To use the Converse API, you call the `Converse` or `ConverseStream` operations to send messages to a model. To call Converse, you require permission for the `bedrock:InvokeModel` operation. To call ConverseStream, you require permission for the `bedrock:InvokeModelWithResponseStream` operation.
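
The two permissions named above could be granted with an IAM policy along the following lines. This is a sketch only: the helper name `converse_policy` is hypothetical, the resource ARN pattern is inferred from the `modelArn` values returned by `list_foundation_models()`, and your organization may require a narrower or differently structured policy.

```python
import json


def converse_policy(region, model_ids):
    """Build an IAM policy document allowing Converse and ConverseStream calls."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",                    # needed for Converse
                    "bedrock:InvokeModelWithResponseStream",  # needed for ConverseStream
                ],
                "Resource": resources,
            }
        ],
    }


policy = converse_policy("us-east-1", ["anthropic.claude-3-haiku-20240307-v1:0"])
print(json.dumps(policy, indent=2))
```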

<h3> Prerequisites </h3>

Before you can use Amazon Bedrock, you must carry out the following steps:

- Sign up for an AWS account (if you don't already have one) and create an IAM role with the necessary permissions for Amazon Bedrock; see [AWS Account and IAM Role](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html#new-to-aws).
- Request access to the foundation models (FM) that you want to use, see [Request access to FMs](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html#getting-started-model-access). 
    
    We used the following foundation models for the examples in this notebook, in the `us-west-2` (Oregon) region.
    
| Provider Name | Foundation Model Name | Model Id |
| ------- | ------------- | ------------- |
| Amazon | Titan Text G1 - Lite | amazon.titan-text-lite-v1 |
| Anthropic | Claude 3.5 Sonnet  | anthropic.claude-3-5-sonnet-20240620-v1:0 |
| Anthropic | Claude 3 Haiku  | anthropic.claude-3-haiku-20240307-v1:0 |
| Meta | Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0 |


<h3> ConverseStream for streaming invocations </h3>

We can also use the Converse API for streaming invocations. In this case we rely on the ConverseStream action.

In [5]:
import sys

def invoke_bedrock_model_stream(client, model_id, prompt, max_tokens=2000, temperature=0, top_p=0.9):
    """Stream a single-turn Converse response and print the text as it arrives."""
    response = client.converse_stream(
        modelId=model_id,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "text": prompt
                    }
                ]
            }
        ],
        inferenceConfig={
            "temperature": temperature,
            "maxTokens": max_tokens,
            "topP": top_p
        }
    )
    # Extract and print the response text in real time.
    for event in response['stream']:
        if 'contentBlockDelta' in event:
            delta = event['contentBlockDelta']['delta']
            # Skip deltas that carry no text.
            sys.stdout.write(delta.get('text', ''))
            sys.stdout.flush()
    # End the streamed output with a newline.
    print("", flush=True)
    
def invoke_bedrock_model_converse(client, model_id, prompt, max_tokens=2000, temperature=0, top_p=0.9):
    """Send a single-turn Converse request, then print the reply, token usage, and stop reason."""
    response = client.converse(
        modelId=model_id,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "text": prompt
                    }
                ]
            }
        ],
        inferenceConfig={
            "temperature": temperature,
            "maxTokens": max_tokens,
            "topP": top_p
        }
    )
    # Print the generated text from the first content block.
    print(response['output']['message']['content'][0]['text'])
    # Report token usage and why generation stopped.
    token_usage = response['usage']
    print(f"Input tokens: {token_usage['inputTokens']}")
    print(f"Output tokens: {token_usage['outputTokens']}")
    print(f"Total tokens: {token_usage['totalTokens']}")
    print(f"Stop reason: {response['stopReason']}")
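
The streaming helper prints text as it arrives but does not report token usage. As a sketch of how that could be captured, the function below accumulates text from `contentBlockDelta` events and reads the stop reason and usage from the `messageStop` and `metadata` events that ConverseStream emits near the end of a stream. The function name `collect_stream` and the `fake_events` list are hypothetical; the fake events stand in for `response['stream']` so the example runs without AWS credentials.

```python
def collect_stream(events):
    """Accumulate text, stop reason, and token usage from ConverseStream events."""
    parts, stop_reason, usage = [], None, {}
    for event in events:
        if "contentBlockDelta" in event:
            # Text deltas carry a "text" key; other delta types are skipped.
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
        elif "metadata" in event:
            usage = event["metadata"].get("usage", {})
    return "".join(parts), stop_reason, usage


# Hypothetical events shaped like a ConverseStream response stream.
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Sure, "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "happy to help."}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 14, "outputTokens": 6, "totalTokens": 20}}},
]

text, stop_reason, usage = collect_stream(fake_events)
print(text)
print(stop_reason, usage)
```

Returning the values instead of printing them makes the helper easier to reuse when comparing several models side by side.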

In [6]:
prompt = ("Help me with my travel needs.")
print(f'Prompt: {prompt}\n')
MODEL_IDS = ['anthropic.claude-3-haiku-20240307-v1:0',]

for i in MODEL_IDS:
    print(f'\n\nModel: {i}')
    invoke_bedrock_model_stream(bedrock, i, prompt)
    # Call again without streaming, just to demonstrate the token usage.
    invoke_bedrock_model_converse(bedrock, i, prompt)

Prompt: Help me with my travel needs.



Model: anthropic.claude-3-haiku-20240307-v1:0
Sure, I'd be happy to help with your travel needs. What kind of information or assistance are you looking for? Some common travel-related topics I can help with include:

- Trip planning and itinerary suggestions
- Booking flights, hotels, rental cars, etc.
- Destination research and recommendations
- Travel tips and advice
- Budgeting and cost estimates
- Travel documentation requirements
- Packing lists and travel checklists
- Local transportation options
- Travel insurance and health/safety information

Let me know the specifics of what you need help with, and I'll do my best to provide useful information and guidance. I have access to a wide range of travel-related data and resources, so I should be able to assist with most travel planning and logistics questions.
Sure, I'd be happy to help with your travel needs. What kind of information or assistance are you looking for? Some common travel-rela

## Introduction to the use case

The use case we model is a travel agent. We will model workflows such as the following:
- External user workflows
- Hotel, airline, and car booking flows
- Cancel booking
- Update booking
- Notifications to the user about updates
- Notifications for existing bookings affected by external events or news items

![Use-case](./assets/intro_to_usecase.png)

This shows multiple agents working together to provide a single point of entry for users.

To perform an initial evaluation, let us simply prompt the model and look at the response. It will be clear that out-of-the-box models cannot handle this use case: the model indicates that it needs more context and information.


In [7]:
prompt = ("Help me with my travel needs. can you find a dream destination near me for skiing ?")
print(f'Prompt: {prompt}\n')
MODEL_IDS = ['anthropic.claude-3-haiku-20240307-v1:0',]

invoke_bedrock_model_stream(bedrock, MODEL_IDS[0], prompt)


Prompt: Help me with my travel needs. can you find a dream destination near me for skiing ?

Certainly! To find a dream skiing destination near you, I'll need to know your location first. Could you please provide me with your city and state or zip code? That will help me identify the best skiing options in your local area or region.
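
One way to supply the missing context is a system prompt, which the Converse API accepts through its `system` parameter. The sketch below only builds the request arguments so it can run without AWS credentials. The helper name `build_converse_request` and the traveler's location are hypothetical and are not part of the original notebook.

```python
def build_converse_request(model_id, prompt, system_prompt=None,
                           max_tokens=2000, temperature=0, top_p=0.9):
    """Build keyword arguments for a single-turn Converse call."""
    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "temperature": temperature,
            "maxTokens": max_tokens,
            "topP": top_p,
        },
    }
    if system_prompt:
        # System prompts are passed as a list of text blocks.
        request["system"] = [{"text": system_prompt}]
    return request


request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",
    "Can you find a dream destination near me for skiing?",
    # Hypothetical context; a real system would look this up for the customer.
    system_prompt="You are a travel agent. The customer lives in Denver, Colorado.",
)
print(request["system"])
# With a live client: response = bedrock.converse(**request)
```

A static system prompt only goes so far. Retrieving customer and destination data at request time is what the RAG and agentic workflows later in the workshop address.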


## Conclusion

In this notebook, we demonstrated simple interactions between LangChain and Bedrock. We tailored model outputs by supplying the model with a system prompt and few-shot examples, both of which help guide behavior.

Now we need to add RAG and other capabilities to this system. Let's build!