# Bedrock Additional Features - Streaming support & Retry 

This notebook covers additional features and controls in Amazon Bedrock: streaming responses and retry mechanisms.

## Pre-requisites

In [None]:
# Check that the Python version is 3.8 or later
import sys
sys.version

## Install Dependencies

In [None]:
!pip install langchain --upgrade
!pip install langchain-community --upgrade

In [None]:
!pip install boto3 --upgrade

## Restart Kernel

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)  

In [1]:
# Check that the Python version is 3.8 or later, which LangChain requires
import sys
sys.version

'3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]'

In [2]:
assert sys.version_info >= (3, 8)

In [3]:
import langchain

In [4]:
langchain.__version__

'0.1.4'

In [5]:
import sagemaker
import boto3
session = boto3.Session()
sagemaker_session = sagemaker.Session()
studio_region = sagemaker_session.boto_region_name 
#sagemaker_session.get_caller_identity_arn()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [6]:
import json

# Create a Bedrock runtime client (used for model inference)
bedrock = boto3.client('bedrock-runtime')
prompt_data = """Command: Write me a blog about making strong business decisions as a leader.\nBlog:"""

## Streaming Response
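Bedrock provides streaming inference for models that support it (see the `responseStreamingSupported` field in each model summary). To run inference with streaming, use the `InvokeModelWithResponseStream` operation (`invoke_model_with_response_stream` in boto3).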

In [7]:
from IPython.display import display_markdown, Markdown, clear_output

# Claude text-completion models expect alternating "\n\nHuman:" and "\n\nAssistant:" turns
prompt_formatted = "\n\nHuman: " + prompt_data + "\n\nAssistant:"
body = json.dumps({"prompt": prompt_formatted, "max_tokens_to_sample": 200})
modelId = "anthropic.claude-instant-v1"
accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
output = []

if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            # Each chunk holds a JSON payload with the next piece of the completion
            chunk_obj = json.loads(chunk.get('bytes').decode())
            text = chunk_obj['completion']
            output.append(text)
            # Redraw the accumulated text so the response appears to grow in place
            clear_output(wait=True)
            display_markdown(Markdown(''.join(output)))

 Here is a draft blog post about making strong business decisions as a leader:

**Making Strong Business Decisions as a Leader** 

As a business leader, one of your most important responsibilities is making strategic decisions that help move your company forward. However, making the right choices is not always straightforward. There are countless variables to consider, and the potential outcomes of any decision are uncertain. So how can you make strong choices that positively impact your business? Here are some tips:

**Gather thorough information.** The foundation for any good decision is having a complete understanding of the situation. Take time to research all relevant facts, figures, and perspectives before making a call. Identify key stakeholders and get their input. Look at past performance metrics, industry trends, consumer behaviors - anything that provides useful context. Leaving no stone unturned reduces risks down the road.

**Consider multiple options.** Don't get tunnel vision focusing on just one path. Brainstorm as many potential alternatives
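The parsing loop above can be factored into a reusable generator that yields text as it arrives, which also makes it testable without calling Bedrock. This is a minimal sketch: `iter_completion_text` is a hypothetical helper, and it assumes each event has the shape `{'chunk': {'bytes': b'<json>'}}` with the text under `completion`, as in the Claude text-completion response used above. Other model families put the text under different keys.

```python
import json
from typing import Any, Iterable, Iterator, Mapping


def iter_completion_text(stream: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Hypothetical helper: yield the text of each chunk in a response stream.

    Assumes Claude text-completion payloads, where the text is under 'completion'.
    """
    for event in stream:
        chunk = event.get("chunk")
        if not chunk:
            # Skip events that carry no 'chunk' payload
            continue
        payload = json.loads(chunk["bytes"].decode("utf-8"))
        yield payload.get("completion", "")


# Offline usage with hand-made events shaped like the real stream
fake_stream = [
    {"chunk": {"bytes": json.dumps({"completion": "Hello"}).encode()}},
    {"chunk": {"bytes": json.dumps({"completion": ", world"}).encode()}},
]
print("".join(iter_completion_text(fake_stream)))  # prints Hello, world
```

With a live response, the same generator would be driven by `iter_completion_text(response.get('body'))`, keeping the display logic separate from the parsing.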

## Bedrock boto3 client API calls with retry
The code snippet below shows how to configure retries through the botocore `Config` object. For more details, see the [botocore Config documentation](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html).

**total_max_attempts**: An integer representing the maximum number of total attempts made on a single request. This includes the initial request, so a value of 1 means that no requests are retried.

**mode**: The retry mode. Possible values are `legacy`, `standard`, and `adaptive`.

- legacy - The pre-existing retry behavior.
- standard - The standardized set of retry rules. This mode defaults to 3 max attempts unless overridden.
- adaptive - Retries with additional client-side throttling.
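To get a feel for how long a call with `total_max_attempts: 10` can block, the sketch below computes the upper bound on the sleep before each retry. It assumes the standard-mode backoff described in the AWS SDKs and Tools Reference Guide (a random factor between 0 and 1, times 2 raised to the retry number minus one, capped at 20 seconds); it illustrates that formula with the random factor set to 1 and is not botocore's own code. `max_retry_delays` is a hypothetical helper.

```python
from typing import List


def max_retry_delays(total_max_attempts: int, base: float = 2.0, cap: float = 20.0) -> List[float]:
    """Hypothetical helper: worst-case sleep (seconds) before each retry.

    Assumes delay = random(0, 1) * base ** (retry - 1), capped at `cap`,
    and takes the random factor as 1 to get the upper bound.
    """
    retries = total_max_attempts - 1  # the initial attempt is not a retry
    return [min(base ** (retry - 1), cap) for retry in range(1, retries + 1)]


delays = max_retry_delays(10)
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0, 20.0, 20.0, 20.0, 20.0]
print(sum(delays))  # 111.0
```

Under these assumptions, nine retries can add close to two minutes of waiting on top of the request time itself, which is worth keeping in mind when choosing a large `total_max_attempts` for interactive use.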


In [8]:
import boto3
from botocore.config import Config

config = Config(
    retries={
        'total_max_attempts': 10,  # Includes the initial attempt, so up to 9 retries
        'mode': 'standard'  # One of: legacy, standard, adaptive
    }
)

# Create a Bedrock control-plane client (list_foundation_models is on 'bedrock', not 'bedrock-runtime')
bedrock = boto3.client('bedrock', config=config)
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': 'cf4abf2d-204e-486f-9387-65e50275dab6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Fri, 26 Jan 2024 20:47:54 GMT',
   'content-type': 'application/json',
   'content-length': '17086',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'cf4abf2d-204e-486f-9387-65e50275dab6'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': [],
   'inferenceTypesSupported': ['ON_DEMAND'],
   'modelLifecycle': {'status': 'ACTIVE'}},
  {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-g1-text-02',
   'modelId': 'amazon.titan-embed-g1-text-02',
   'modelName': 'Titan Text Embeddings v2',
   'providerName': 'Amazon',
   'inp
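The `responseStreamingSupported` field in each model summary shows which models can be used with `invoke_model_with_response_stream`. A minimal sketch for filtering the `list_foundation_models()` response on that field follows; `streaming_model_ids` is a hypothetical helper, and the sample response uses hypothetical model IDs rather than real output.

```python
from typing import Any, Dict, List


def streaming_model_ids(response: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: IDs of models whose summary reports streaming support."""
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        # A missing or False flag means the model does not support streaming
        if summary.get("responseStreamingSupported")
    ]


# Offline example; both entries are hypothetical
sample = {
    "modelSummaries": [
        {"modelId": "example.text-model", "responseStreamingSupported": True},
        {"modelId": "example.embedding-model", "responseStreamingSupported": False},
    ]
}
print(streaming_model_ids(sample))  # ['example.text-model']
```

In the notebook, `streaming_model_ids(bedrock.list_foundation_models())` would apply the same filter to the live response.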

## Retry option when using LangChain
You can pass a Boto3 client configured with retries to LangChain's `Bedrock` LLM through its `client` parameter, so every call LangChain makes uses the same retry settings.

In [9]:
import boto3
from botocore.config import Config
from langchain_community.llms.bedrock import Bedrock

config = Config(
    retries={
        'total_max_attempts': 10,  # Includes the initial attempt, so up to 9 retries
        'mode': 'standard'  # One of: legacy, standard, adaptive
    }
)

# Create a Bedrock runtime client with the retry configuration
bedrock = boto3.client('bedrock-runtime', config=config)

# Pass the client to LangChain's Bedrock LLM so its calls use the same retry settings
llm = Bedrock(
    client=bedrock,
    model_id="amazon.titan-tg1-large",
    model_kwargs={"temperature": 0.5, "maxTokenCount": 100}
)

llm.invoke(prompt_data)

' Making Strong Business Decisions as a Leader\n\nIn the dynamic world of business, making informed and effective decisions is crucial for success. As a leader, you have the responsibility of leading your team toward growth and prosperity. In this blog, we will explore some key principles and strategies for making strong business decisions as a leader.\n\nUnderstand the Situation:\nBefore making any decision, it is essential to gather all relevant information and analyze the situation thoroughly. This includes understanding the market trends, customer needs'

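## Submitting a batch with retry attempts
You can submit a batch of requests concurrently using the retry-enabled client. The example below uses a thread pool to send several prompts in parallel; each call uses the client's retry configuration, and a request that still fails after its retries are exhausted surfaces as an exception.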

In [12]:
from concurrent.futures import ThreadPoolExecutor, as_completed

class ThrottlingException(Exception):
    """Raised when a LangChain LLM call still fails after the client's retries are exhausted."""
    pass

def bedrock_llm_call(prompt):
    try:
        return llm.invoke(prompt)
    except ValueError as ex:
        # LangChain's Bedrock LLM wraps errors raised by the service in a ValueError
        print(f"Received value error. Details: {ex}")
        raise ThrottlingException(f"Retry error encountered. Failed to process prompt: {prompt}") from ex

In [13]:
batch_prompt_data = []

prompt_template = """Command: Write me a blog about making strong business decisions as a leader. Limit your response to {n} sentence(s). \nBlog:"""

for i in range(1,6):
    batch_prompt_data.append(prompt_template.format(n=i))

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to the prompt it was submitted with
    submitted_items = {executor.submit(bedrock_llm_call, d): d for d in batch_prompt_data}

    for i, future in enumerate(as_completed(submitted_items)):
        d = submitted_items[future]
        try:
            resp = future.result()
            print(f"{i + 1}: Prompt: {d} Response: {resp}")
        except ThrottlingException as ex:
            print(f"Received Throttling exception: {ex}")

1: Prompt: Command: Write me a blog about making strong business decisions as a leader. Limit your response to 1 sentence(s). 
Blog: Response:  Making strong business decisions as a leader requires careful consideration of data, intuition, and experience.
2: Prompt: Command: Write me a blog about making strong business decisions as a leader. Limit your response to 2 sentence(s). 
Blog: Response:  Making strong business decisions as a leader requires careful consideration of data, intuition, and a commitment to the organization's values. Leaders must be able to balance short-term goals with long-term vision and adapt to changing market conditions.
3: Prompt: Command: Write me a blog about making strong business decisions as a leader. Limit your response to 3 sentence(s). 
Blog: Response:  Making strong business decisions as a leader requires careful consideration of data, intuition, and a strategic vision. Leaders must be able to weigh the potential risks and rewards of different option
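When a prompt still fails after the client has used all of its attempts, the loop above only prints the error. One option is to collect the failed prompts and re-submit them after a pause. The sketch below is an assumption rather than part of the original workflow: `run_batch` and `flaky_call` are hypothetical names, and the pause between rounds is a simple doubling backoff layered on top of the client-side retries.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def run_batch(
    call: Callable[[str], str],
    prompts: List[str],
    rounds: int = 3,
    max_workers: int = 3,
    base_pause: float = 1.0,
) -> Dict[str, str]:
    """Hypothetical helper: run `call` on each prompt, re-submitting failures.

    Returns {prompt: response} for the prompts that eventually succeeded.
    """
    results: Dict[str, str] = {}
    pending = list(prompts)
    for round_no in range(rounds):
        if not pending:
            break
        failed: List[str] = []
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {executor.submit(call, p): p for p in pending}
            for future in as_completed(futures):
                prompt = futures[future]
                try:
                    results[prompt] = future.result()
                except Exception as ex:  # e.g. ThrottlingException from bedrock_llm_call
                    print(f"Round {round_no + 1}: {prompt!r} failed: {ex}")
                    failed.append(prompt)
        pending = failed
        if pending and round_no < rounds - 1:
            # Pause before the next round: base_pause, then 2x, 4x, ...
            time.sleep(base_pause * 2 ** round_no)
    return results


# Offline usage with a hypothetical stand-in that fails on its first try per prompt
attempts: Dict[str, int] = {}


def flaky_call(prompt: str) -> str:
    attempts[prompt] = attempts.get(prompt, 0) + 1
    if attempts[prompt] == 1:
        raise RuntimeError("simulated throttling")
    return prompt.upper()


print(run_batch(flaky_call, ["a", "b"], base_pause=0.1))
```

In the notebook, `run_batch(bedrock_llm_call, batch_prompt_data)` would take the place of the executor loop. Keeping the round count small avoids multiplying the worst-case wait already introduced by `total_max_attempts`.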

## Bedrock Batch Inference
Amazon Bedrock supports batch inference: you can run many inference requests asynchronously against data stored in an S3 bucket, which is an efficient way to process a large number of requests. You can use batch inference to improve the performance of model inference on large datasets. To perform batch inference, store the input data in S3, create a batch inference job, and retrieve the output.

Batch inference is in preview and requires the Python SDK to be installed. See the documentation page for code samples and details:

https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-example.html
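Batch inference reads its input from JSONL files in S3. The sketch below builds such a file's contents from prompts, assuming the record format described in the Bedrock batch inference documentation (one JSON object per line with a `recordId` and a `modelInput` holding the same request body you would send to `invoke_model`). Verify the format and the model's body schema against the current documentation; `build_batch_records` is a hypothetical helper, and uploading the file and creating the job are not shown.

```python
import json
from typing import Iterable, List


def build_batch_records(prompts: Iterable[str], max_tokens: int = 200) -> List[str]:
    """Hypothetical helper: one JSONL line per prompt for a Claude text-completion model."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            # Assumed: an ID you choose so output rows can be matched to inputs
            "recordId": f"REC{i:08d}",
            # Same body keys as the invoke_model streaming example
            "modelInput": {
                "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                "max_tokens_to_sample": max_tokens,
            },
        }
        lines.append(json.dumps(record))
    return lines


jsonl = "\n".join(build_batch_records(["Write one sentence about leadership."]))
print(jsonl)
```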
