# Amazon Bedrock

Bedrock model inference using boto3 and LangChain.

After some tinkering, it turns out you do not need "Provisioned Throughput". As long as model access is enabled and the right IAM policy is in place, inference via AWS credentials is straightforward.

I will update this once I have a clear picture of how much I am being charged. My early experiment with "Provisioned Throughput" did cost quite a bit. :(

### Pre-requisites

#### AWS Security Credentials

**User Security Token**
- Ensure your user's `Access Key ID` and `Secret Access Key` are active in the AWS Security Credentials console
- Ensure these keys are set up in the respective environment (Colab/Kaggle/VS Code via .env)
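A quick check that the expected variables are present can save a confusing authentication error later. The sketch below is only an illustration: the helper name `missing_aws_vars` is hypothetical, and the variable list assumes the names used in this notebook's setup cell.

```python
import os

# Variable names assumed from this notebook's setup cell; adjust as needed.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All AWS variables are set.")
```

Passing a plain dict instead of `os.environ` makes the check easy to try out without touching the real environment.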

**IAM Policy**

Ensure the following policy is added to the user's IAM Role for model inferencing.

[[Reference](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-prereq.html)]

```python
IAM_POLICY="""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ModelInvocationPermissions",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
                "bedrock:GetInferenceProfile",
                "bedrock:ListInferenceProfiles",
                "bedrock:RenderPrompt",
                "bedrock:GetCustomModel",
                "bedrock:ListCustomModels",
                "bedrock:GetImportedModel",
                "bedrock:ListImportedModels",
                "bedrock:GetProvisionedModelThroughput",
                "bedrock:ListProvisionedModelThroughputs",
                "bedrock:GetGuardrail",
                "bedrock:ListGuardrails",
                "bedrock:ApplyGuardrail"
            ],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3*"
        }
    ]
}
"""
```
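If the policy needs to target a different region or model family, generating the JSON can be less error-prone than editing the string by hand. Below is a minimal sketch, assuming a trimmed action list with only the two invocation actions; the helper name `build_invoke_policy` and its parameters are hypothetical.

```python
import json


def build_invoke_policy(region="us-east-1", model_pattern="anthropic.claude-3*"):
    """Build a minimal invoke-only policy document as a JSON string."""
    resource = f"arn:aws:bedrock:{region}::foundation-model/{model_pattern}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ModelInvocationPermissions",
                "Effect": "Allow",
                # Trimmed for illustration; the full list is in the policy above.
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy, indent=4)


print(build_invoke_policy())
```

Building the document as a dict and serializing it with `json.dumps` guarantees the output is valid JSON, which a hand-edited string does not.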

**Bedrock model access**: ensure access to the desired model(s) is enabled in the Bedrock console.

#### Provisioned Throughput

Provisioned Throughput reserves a higher level of throughput for a model at a fixed cost. A **customized model** requires purchasing Provisioned Throughput before it can be used.

> very important: remember to <u>delete immediately</u> after use (just in case)
> https://aws.amazon.com/bedrock/pricing/

[supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-supported.html) by region in AWS
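To avoid leaving Provisioned Throughput running by accident, the cleanup can be scripted. This is a sketch under assumptions, not a tested procedure: it relies on the boto3 `bedrock` control-plane client methods `list_provisioned_model_throughputs` and `delete_provisioned_model_throughput`, the helper name is hypothetical, and deletion may be refused for throughput purchased with a commitment term.

```python
def delete_all_provisioned_throughput(bedrock_client):
    """Delete every Provisioned Throughput the client can list; return their ARNs."""
    arns = []
    kwargs = {}
    # Collect all ARNs first, following pagination tokens.
    while True:
        page = bedrock_client.list_provisioned_model_throughputs(**kwargs)
        for summary in page.get("provisionedModelSummaries", []):
            arns.append(summary["provisionedModelArn"])
        token = page.get("nextToken")
        if not token:
            break
        kwargs = {"nextToken": token}
    # Then delete each one by ARN.
    for arn in arns:
        bedrock_client.delete_provisioned_model_throughput(provisionedModelId=arn)
    return arns


# Usage (needs real credentials, so it is left commented out):
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-east-1")  # not "bedrock-runtime"
# print(delete_all_provisioned_throughput(bedrock))
```

The function takes the client as an argument so it can be exercised with a stub before pointing it at a real account.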




### Libraries Setup

In [42]:
!pip install -q langchain boto3 langchain-aws anthropic

In [9]:
import os
from google.colab import userdata

os.environ['AWS_ACCESS_KEY_ID'] = userdata.get('AWS_ACCESS_KEY_ID')
os.environ['AWS_SECRET_ACCESS_KEY'] = userdata.get('AWS_SECRET_ACCESS_KEY')
os.environ['AWS_REGION'] = userdata.get('AWS_REGION')
os.environ['AWS_ACCOUNT_ID'] = userdata.get('AWS_ACCOUNT_ID')

### Using boto3

In [25]:
import boto3
import json

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1')

prompt = "What is the capital of France?"

# Originally intended for Claude 3.5 Haiku, but that model appeared to be available only with "Provisioned Throughput"
# kwargs = {
#   "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
#   "contentType": "application/json",
#   "accept": "application/json",
#   "body": json.dumps({
#     "anthropic_version": "bedrock-2023-05-31",
#     "max_tokens": 200,
#     "top_k": 250,
#     "stop_sequences": [],
#     "temperature": 1,
#     "top_p": 0.999,
#     "messages": [
#       {
#         "role": "user",
#         "content": [
#           {
#             "type": "text",
#             "text": prompt
#           }
#         ]
#       }
#     ]
#   })
# }

kwargs = {
  "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
  "contentType": "application/json",
  "accept": "application/json",
  "body": json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": prompt
          }
        ]
      }
    ]
  })
}

response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())

from pprint import pprint
pprint(response_body)

{'content': [{'text': 'The capital of France is Paris.', 'type': 'text'}],
 'id': 'msg_bdrk_016wCsryaQDWaW2AriPgs9TK',
 'model': 'claude-3-haiku-20240307',
 'role': 'assistant',
 'stop_reason': 'end_turn',
 'stop_sequence': None,
 'type': 'message',
 'usage': {'input_tokens': 14, 'output_tokens': 10}}


In [26]:
pprint(response_body['content'][0])

{'text': 'The capital of France is Paris.', 'type': 'text'}
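Since `content` is a list of blocks, indexing `[0]` only works when the first block is text. A slightly more defensive way to pull out the answer is sketched below; the helper name `extract_text` is hypothetical, and the sample dict is a shortened copy of the response printed above.

```python
def extract_text(response_body):
    """Concatenate the text of every 'text' block in a Messages API response."""
    return "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )


# Shortened copy of the response shown above.
sample = {
    "content": [{"text": "The capital of France is Paris.", "type": "text"}],
    "stop_reason": "end_turn",
}
print(extract_text(sample))  # prints: The capital of France is Paris.
```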


### Using boto3 with an explicit session

In [28]:
import boto3
session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    region_name=os.environ['AWS_REGION']
)

bedrock_runtime = session.client('bedrock-runtime')

# For LangChain integration
from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=bedrock_runtime,
    model_kwargs={
        "temperature": 0.5,
        "max_tokens": 1000
    }
)

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }]
    })
)
response_body = json.loads(response.get('body').read())
pprint(response_body)

{'content': [{'text': 'Quantum computing is a revolutionary computing '
                      'technology that harnesses the principles of quantum '
                      'mechanics, which govern the behavior of particles at '
                      'the atomic and subatomic levels. In simple terms, '
                      "here's how it works:\n"
                      '\n'
                      '1. Qubits: In classical computing, the basic unit of '
                      'information is a bit, which can exist in either a 0 or '
                      'a 1 state. In quantum computing, the basic unit is '
                      'called a qubit (quantum bit), which can exist in a '
                      'combination of 0 and 1 states simultaneously, a '
                      'phenomenon known as superposition.\n'
                      '\n'
                      '2. Superposition: This superposition allows qubits to '
                      'represent and process multiple states simultaneousl

In [36]:
import boto3
session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    region_name=os.environ['AWS_REGION']
)

bedrock_runtime = session.client('bedrock-runtime')

# For LangChain integration
from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=bedrock_runtime,
    model_kwargs={
        "temperature": 0.5,
        "max_tokens": 1000
    }
)

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{
            "role": "user",
            "content": "Explain GPT transformer in 50 words to a 5 year old."
        }]
    })
)
response_body = json.loads(response.get('body').read())
pprint(response_body)

{'content': [{'text': 'GPT transformer is like a smart machine that can '
                      'understand and create language like humans. It works by '
                      'breaking down texts into small pieces, understanding '
                      'their meanings, and then putting them back together in '
                      "new and creative ways. It's like a puzzle-solving robot "
                      'that can understand and build sentences.',
              'type': 'text'}],
 'id': 'msg_bdrk_01543aaZrCsDBGwi9vZiQBh1',
 'model': 'claude-3-sonnet-20240229',
 'role': 'assistant',
 'stop_reason': 'end_turn',
 'stop_sequence': None,
 'type': 'message',
 'usage': {'input_tokens': 25, 'output_tokens': 61}}


### Using LangChain

In [43]:
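def calculate_cost(input_tokens, output_tokens):
    # Rates are USD per 1,000 tokens, hard-coded on the assumption that they
    # match Claude 3 Sonnet on-demand pricing; confirm on the pricing page.
    input_cost = (input_tokens / 1000) * 0.003
    output_cost = (output_tokens / 1000) * 0.015
    return round(input_cost + output_cost, 4)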



In [46]:
# Usage cost:
cost = calculate_cost(
    response_body['usage']['input_tokens'],
    response_body['usage']['output_tokens']
)
print(f"Cost: ${cost}")

Cost: $0.001
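The hard-coded rates above only fit one model, while this notebook calls both Haiku and Sonnet. A per-model lookup is sketched below. The prices in the table are my assumption of the on-demand rates and may be out of date, and the name `estimate_cost` is hypothetical.

```python
# Assumed on-demand prices in USD per 1,000 tokens: (input, output).
# These are illustrative values; check the Bedrock pricing page before use.
PRICES_PER_1K = {
    "anthropic.claude-3-haiku-20240307-v1:0": (0.00025, 0.00125),
    "anthropic.claude-3-sonnet-20240229-v1:0": (0.003, 0.015),
}


def estimate_cost(model_id, input_tokens, output_tokens):
    """Estimate the USD cost of one call from its token usage."""
    input_price, output_price = PRICES_PER_1K[model_id]
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


# Token counts taken from the Sonnet response shown earlier.
cost = estimate_cost("anthropic.claude-3-sonnet-20240229-v1:0", 25, 61)
print(f"Cost: ${cost:.6f}")
```

Leaving the value unrounded and formatting only at print time keeps very small per-call costs from collapsing to zero when they are summed later.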


In [37]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("human", "{input}")
])

chain = prompt | llm
response = chain.invoke({"input": "Explain serverless architecture in 50 words to 80 year old"})

In [41]:
import json
pprint(response)
print(type(response))

AIMessage(content='Serverless architecture is like having a personal chef who cooks your meals for you. You don\'t have to worry about the kitchen or the equipment, you just tell the chef what you want, and they take care of it. The chef is the "server" and you are the "client" who gets the food (or service) without the hassle.', additional_kwargs={'usage': {'prompt_tokens': 28, 'completion_tokens': 77, 'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'total_tokens': 105}, 'stop_reason': 'end_turn', 'thinking': {}, 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0', 'model_name': 'anthropic.claude-3-haiku-20240307-v1:0'}, response_metadata={'usage': {'prompt_tokens': 28, 'completion_tokens': 77, 'cache_read_input_tokens': 0, 'cache_write_input_tokens': 0, 'total_tokens': 105}, 'stop_reason': 'end_turn', 'thinking': {}, 'model_id': 'anthropic.claude-3-haiku-20240307-v1:0', 'model_name': 'anthropic.claude-3-haiku-20240307-v1:0'}, id='run-265d25e9-d684-4dce-b322-0afb4cf9d4db
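The LangChain response reports usage under different key names than the raw boto3 response (`prompt_tokens` and `completion_tokens` rather than `input_tokens` and `output_tokens`). A small adapter, sketched here with a hypothetical helper name and a plain dict standing in for `response.response_metadata`, makes the token counts available for cost tracking.

```python
def usage_from_metadata(response_metadata):
    """Return (input_tokens, output_tokens) from LangChain-style metadata."""
    usage = response_metadata.get("usage", {})
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)


# Stand-in for response.response_metadata, using the values printed above.
metadata = {
    "usage": {"prompt_tokens": 28, "completion_tokens": 77, "total_tokens": 105},
    "stop_reason": "end_turn",
}
input_tokens, output_tokens = usage_from_metadata(metadata)
print(input_tokens, output_tokens)  # prints: 28 77
```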