# Module 1, Activity 1: Running an AWS-Hosted LLM with LangChain

This notebook demonstrates how to use the boto3 Python SDK together with the abstractions in the LangChain package to work with Amazon Bedrock foundation models.

In [None]:
import boto3
from langchain_aws import BedrockLLM
from langchain_aws import ChatBedrock

In [None]:
session = boto3.session.Session()
region = session.region_name

## Create Bedrock management connection

The Bedrock client is the control-plane interface to Bedrock.  It can list models, check availability, and manage configurations, but it does not run inference.  Invoking a model requires the separate bedrock-runtime client.

In [None]:
bedrock = boto3.client(
    service_name='bedrock',
    region_name=region,
)

## Listing available models

Here we can see which foundation models Bedrock has access to.  However, remember that not all of these are enabled for this workshop.

In [None]:
[model['modelId'] for model in bedrock.list_foundation_models()['modelSummaries']]
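
The full list can be long. As a minimal sketch, the helper below narrows it to models that produce text and are marked active. The function name `text_model_ids` is hypothetical and the sample records are made up for illustration; the field names (`outputModalities`, `modelLifecycle`) follow the shape of the `list_foundation_models` response.

```python
def text_model_ids(model_summaries):
    """Return IDs of models that output text and have an ACTIVE lifecycle status."""
    return [
        m["modelId"]
        for m in model_summaries
        if "TEXT" in m.get("outputModalities", [])
        and m.get("modelLifecycle", {}).get("status") == "ACTIVE"
    ]


# Made-up records in the shape of bedrock.list_foundation_models()["modelSummaries"].
sample_summaries = [
    {"modelId": "example.text-model-v1", "outputModalities": ["TEXT"],
     "modelLifecycle": {"status": "ACTIVE"}},
    {"modelId": "example.image-model-v1", "outputModalities": ["IMAGE"],
     "modelLifecycle": {"status": "ACTIVE"}},
    {"modelId": "example.old-text-model", "outputModalities": ["TEXT"],
     "modelLifecycle": {"status": "LEGACY"}},
]

print(text_model_ids(sample_summaries))
```

With the client created earlier, the call would be `text_model_ids(bedrock.list_foundation_models()['modelSummaries'])`. An active lifecycle status only means the model is not deprecated; it does not mean access has been granted for this workshop.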

## Creating Bedrock runtime client

In this next code block, a dedicated boto3 client for the bedrock-runtime service is created.  This client is responsible for executing runtime operations, such as invoking a model with a given prompt.

In [None]:
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

## Initializing BedrockLLM and invoking a model

Here, the BedrockLLM class from the langchain_aws package is instantiated.  This class serves as a high-level wrapper for interfacing with AWS-hosted LLMs.
The initialization parameters are the model ID (in this case, "amazon.titan-tg1-large") and the region.  No credentials are passed explicitly; boto3 resolves them through its standard credential chain, such as environment variables, a shared credentials file, or an attached IAM role.  Once the instance is created, the invoke method sends a prompt ("What is the recipe of mayonnaise?") to the model.  This is the fundamental workflow: set up the model wrapper, then make a basic invocation call and inspect the model's response.

Try several different prompts here to see what different types of answers you can get!

In [None]:
llm = BedrockLLM(
    model_id="amazon.titan-tg1-large",
    region_name=region,
)

llm.invoke(input='What is the recipe of mayonnaise?')
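
For comparison, here is a rough sketch of the kind of call a wrapper saves you from writing when you talk to a Titan text model through the runtime client directly. The helper names are hypothetical, the request and response fields follow the Amazon Titan Text format, and `FakeRuntimeClient` is a stand-in so the sketch runs without AWS access. It is not LangChain's actual implementation.

```python
import io
import json


def build_titan_request(prompt, max_token_count=512, temperature=0.0):
    """Serialize a prompt into a Titan Text request body (JSON string)."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_token_count,
            "temperature": temperature,
        },
    })


def invoke_titan(runtime_client, prompt, model_id="amazon.titan-tg1-large"):
    """Send the prompt via invoke_model and return the generated text."""
    response = runtime_client.invoke_model(
        modelId=model_id,
        body=build_titan_request(prompt),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"]


class FakeRuntimeClient:
    """Stand-in for the bedrock-runtime client; returns a canned reply."""

    def invoke_model(self, modelId, body, contentType, accept):
        prompt = json.loads(body)["inputText"]
        canned = {"results": [{"outputText": f"(canned reply to: {prompt})"}]}
        return {"body": io.BytesIO(json.dumps(canned).encode("utf-8"))}


print(invoke_titan(FakeRuntimeClient(), "What is the recipe of mayonnaise?"))
```

Passing the real `bedrock_runtime` client in place of `FakeRuntimeClient()` would send the request to Bedrock.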

## Introducing ChatBedrock

BedrockLLM is designed for single-turn, prompt-based interactions: you provide a prompt ("What is the recipe of mayonnaise?") and the model generates an output in one go.  This is fine for simple tasks, but more sophisticated interactions call for chat-like exchanges in which the conversation history is passed along so the model can use context from earlier turns.  Additionally, BedrockLLM does not support every available model; newer models such as Anthropic's Claude 3 Sonnet, used below, are not supported.  ChatBedrock addresses both limitations.

Also note that ChatBedrock returns a message object rather than a plain string: the generated text is in its `content` attribute, alongside metadata such as token usage.

In [None]:
chat_llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
)

chat_llm.invoke(input="What is the recipe of mayonnaise?")
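
To illustrate what carrying context over several turns looks like in practice, the sketch below keeps a running list of `(role, content)` tuples, a message format LangChain chat models accept, and sends the whole list on every turn. The `ask` helper and the `EchoChatModel` stub are hypothetical; the stub only mimics the `invoke(...)` call and the `.content` attribute so the example runs offline.

```python
from types import SimpleNamespace


def ask(chat_model, history, question):
    """Append the question, call the model with the full history, and
    return a new history that also includes the model's reply."""
    messages = history + [("human", question)]
    reply = chat_model.invoke(messages)
    return messages + [("ai", reply.content)]


class EchoChatModel:
    """Offline stand-in: reports how many messages it received."""

    def invoke(self, messages):
        return SimpleNamespace(content=f"I received {len(messages)} message(s).")


history = []
history = ask(EchoChatModel(), history, "What is the recipe of mayonnaise?")
history = ask(EchoChatModel(), history, "Can I make it without eggs?")
print(history[-1])
```

Swapping `EchoChatModel()` for `chat_llm` should let the second question be answered in light of the first, because both turns are sent together.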

## Temperature

Temperature controls the randomness of the model's responses.  Setting it to 0.0 (the minimum) typically produces more deterministic, consistent output, while setting it to 1.0 (the maximum for this model) produces more varied, creative responses.  Experiment with the temperature setting in the following cell to see how the output changes.

In [None]:
chat_llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
    temperature=0.0
)

chat_llm.invoke(input="What is the recipe of mayonnaise?")
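
As intuition for why temperature has this effect, the sketch below applies the textbook softmax-with-temperature formula to a few made-up token scores. This is an assumption about how sampling works in general, not a description of how Bedrock or Claude implements it, and `token_probabilities` is a hypothetical helper.

```python
import math


def token_probabilities(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature.

    A temperature of 0 is treated as greedy decoding: all probability
    goes to the highest-scoring token.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, so repeated runs tend to agree; higher temperatures spread it out, so less likely tokens are picked more often.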

## Limiting the number of tokens returned

The cost of using an LLM depends on how many tokens are sent to and returned by the model.  The `max_tokens` parameter caps the number of tokens the model generates in its response; a response that reaches the cap is cut off rather than summarized.  Limiting the token count is useful when responses must stay concise or fit within strict output size constraints.  Experiment with a few different values to see how the output changes.

In [None]:
chat_llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
    temperature=0.0,
    max_tokens=10
)

chat_llm.invoke(input="What is the recipe of mayonnaise?")
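
To make the link between token counts and cost concrete, here is a minimal sketch of a per-call estimate. The `estimate_cost` helper is hypothetical, the prices are placeholders rather than real Bedrock pricing, and the `usage` dictionary assumes the `input_tokens` and `output_tokens` keys that LangChain chat messages report in `usage_metadata`.

```python
def estimate_cost(usage, input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one call from its token counts."""
    input_cost = usage["input_tokens"] / 1000 * input_price_per_1k
    output_cost = usage["output_tokens"] / 1000 * output_price_per_1k
    return input_cost + output_cost


# Made-up token counts and placeholder prices, for illustration only.
usage = {"input_tokens": 16, "output_tokens": 10}
print(estimate_cost(usage, input_price_per_1k=1.0, output_price_per_1k=2.0))
```

Because `max_tokens` bounds `output_tokens`, it also bounds the output share of this estimate; the input share depends only on the prompt.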