# Bedrock Additional Features - Streaming support & Retry 

This notebook covers two additional Bedrock features: streaming responses and a retry mechanism for API calls.

(This notebook was tested on a SageMaker Studio ml.m5.2xlarge instance with the Data Science 3.0 kernel.)

## Pre-requisites

In [None]:
#Check that the Python version is 3.8 or later, which Langchain requires (only needed if you use Langchain)
import sys
sys.version

## Install Dependencies

In [None]:
!pip install langchain --upgrade

In [None]:
!pip install boto3 --upgrade

## Restart Kernel

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)  

In [2]:
#After the restart, check again that the Python version is 3.8 or later, which Langchain requires
import sys
sys.version

'3.10.6 (main, Oct  7 2022, 20:19:58) [GCC 11.2.0]'

In [3]:
assert sys.version_info >= (3, 8)

In [4]:
import langchain

In [5]:
langchain.__version__

'0.0.314'

In [6]:
import sagemaker
import boto3
session = boto3.Session()
sagemaker_session = sagemaker.Session()
studio_region = sagemaker_session.boto_region_name 
#sagemaker_session.get_caller_identity_arn()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


In [7]:
import json

#Create Bedrock runtime client (model invocation goes through the bedrock-runtime endpoint)
bedrock = boto3.client('bedrock-runtime', 'us-east-1', endpoint_url='https://bedrock-runtime.us-east-1.amazonaws.com')
prompt_data = """Command: Write me a blog about making strong business decisions as a leader.\nBlog:"""

## Streaming Response
Bedrock provides streaming inference for models that support streaming. To run inference with streaming, use the InvokeModelWithResponseStream operation.

In [8]:
from IPython.display import display, display_markdown, Markdown, clear_output

#Claude text-completion prompts use "\n\nHuman:" and "\n\nAssistant:" as turn markers
prompt_formatted = "\n\nHuman: " + prompt_data + "\n\nAssistant:"
body = json.dumps({"prompt": prompt_formatted, "max_tokens_to_sample": 200})
modelId = "anthropic.claude-instant-v1"  
accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
output = []

if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk.get('bytes').decode())
            text = chunk_obj['completion']
            clear_output(wait=True)
            output.append(text)
            display_markdown(Markdown(''.join(output)))

 Here is a draft blog post on making strong business decisions as a leader:

**Making Strong Business Decisions as a Leader**

As a business leader, one of your most important responsibilities is making good decisions that move your company forward in a positive direction. However, making the right calls is not always easy. There are often many factors to consider and potential consequences to weigh. Here are some tips for making strong, well-thought-out business decisions as the leader of your organization.

**Gather all relevant information.** Before making any decision, ensure you have all the facts at hand. Look at data and metrics related to the potential options. Get input from different departments on how choices might impact them. Talk to senior leaders and subject matter experts for their perspectives. The more informed you are, the better equipped you'll feel to choose the best path. 

**Consider short and long-term impacts.** When evaluating options, think about immediate effects but also how decisions could influence
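
The event loop in the streaming cell above can be factored into a small generator, which keeps the parsing separate from the display logic. The following is a minimal sketch, not part of the original notebook: `iter_completion_text` is a hypothetical helper name, and `fake_stream` is made-up data that only mimics the `chunk` / `bytes` / `completion` shape used above so the sketch runs without calling Bedrock.

```python
import json


def iter_completion_text(stream):
    """Yield the 'completion' text carried by each chunk event in a stream.

    Hypothetical helper: it assumes the event shape used in the cell above,
    i.e. {'chunk': {'bytes': b'{"completion": "..."}'}}.
    """
    for event in stream or []:
        chunk = event.get("chunk")
        if not chunk:
            # Events without a 'chunk' key carry no generated text here;
            # a fuller handler may want to inspect them instead of skipping.
            continue
        payload = json.loads(chunk["bytes"].decode("utf-8"))
        yield payload.get("completion", "")


# Stand-in events (made-up data) shaped like the Bedrock stream events above.
fake_stream = [
    {"chunk": {"bytes": json.dumps({"completion": "Hello"}).encode("utf-8")}},
    {"other": {}},
    {"chunk": {"bytes": json.dumps({"completion": ", world"}).encode("utf-8")}},
]

streamed_text = "".join(iter_completion_text(fake_stream))
print(streamed_text)
```

With a real response, the same generator could be fed `response.get('body')`, and the display calls from the cell above would go inside the consuming loop.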

## Bedrock boto3 client API calls with retry
The code snippet below shows how to configure retries through the Botocore `Config` object. For more details, see the [documentation](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html).

**total_max_attempts**: An integer representing the maximum number of total attempts that will be made on a single request. This includes the initial request, so a value of 1 indicates that no requests will be retried.

**mode**: Possible values are legacy, standard, and adaptive.

- legacy - The pre-existing retry behavior.
- standard - The standardized set of retry rules. This will also default to 3 max attempts unless overridden.
- adaptive - Retries with additional client side throttling.
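
To make the meaning of `total_max_attempts` concrete, here is a minimal, self-contained sketch of a retry loop that counts the initial call as attempt 1. It is an illustration only and is not how Botocore implements its retry modes: `TransientError`, `call_with_retries`, `flaky`, and the backoff parameters are all hypothetical names and values chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as throttling."""


def call_with_retries(fn, total_max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn() up to total_max_attempts times, counting the first call.

    Returns (result, attempts_used). Illustrative only; Botocore's own
    retry handling is more involved than this.
    """
    if total_max_attempts < 1:
        raise ValueError("total_max_attempts must be at least 1")
    for attempt in range(1, total_max_attempts + 1):
        try:
            return fn(), attempt
        except TransientError:
            if attempt == total_max_attempts:
                # Attempts exhausted: surface the last error to the caller.
                raise
            # Exponential backoff with jitter before the next attempt.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


calls = {"count": 0}


def flaky():
    """Fail on the first two calls, then succeed (made-up behaviour)."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("throttled")
    return "ok"


# Pass a no-op sleep so the example finishes immediately.
result, attempts_used = call_with_retries(flaky, total_max_attempts=3, sleep=lambda s: None)
print(result, attempts_used)
```

In this sketch, `total_max_attempts=1` means the function is called once and any error is raised immediately, matching the "no requests will be retried" description above.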


In [9]:
import boto3
from botocore.config import Config

config = Config(
    retries={
        'total_max_attempts': 10,  #Total attempts, including the initial request
        'mode': 'standard'  #One of: legacy, standard, adaptive
    }
)

#Create Bedrock (control plane) client with the retry configuration
bedrock = boto3.client('bedrock', 'us-east-1', endpoint_url='https://bedrock.us-east-1.amazonaws.com', config=config)
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': '36288f10-cafd-485e-bbfb-792f3347d87d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Fri, 13 Oct 2023 20:33:41 GMT',
   'content-type': 'application/json',
   'content-length': '5729',
   'connection': 'keep-alive',
   'x-amzn-requestid': '36288f10-cafd-485e-bbfb-792f3347d87d'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': ['FINE_TUNING'],
   'inferenceTypesSupported': ['ON_DEMAND']},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium',
   'modelName': 'Titan Text Embeddings',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities'
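
Each model summary in the output above carries a `responseStreamingSupported` field, which can be used to find the models that accept streaming requests. The helper below is a small sketch under that assumption: `streaming_model_ids` is a hypothetical name, and `sample_response` is a trimmed, partly made-up stand-in (only the first entry mirrors the output above) so the example runs without an AWS call.

```python
def streaming_model_ids(list_models_response):
    """Return the modelIds whose summary reports streaming support."""
    return [
        summary["modelId"]
        for summary in list_models_response.get("modelSummaries", [])
        if summary.get("responseStreamingSupported") is True
    ]


# Trimmed stand-in for a list_foundation_models() response.
# The second and third entries are invented for illustration.
sample_response = {
    "modelSummaries": [
        {"modelId": "amazon.titan-tg1-large", "responseStreamingSupported": True},
        {"modelId": "example.no-streaming-model", "responseStreamingSupported": False},
        {"modelId": "example.field-missing"},
    ]
}

print(streaming_model_ids(sample_response))
```

With a live client, the same function could be applied to the dictionary returned by `bedrock.list_foundation_models()`.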

## Retry option when using Langchain
When creating a Bedrock LLM with Langchain, you can pass in a Boto3 client that was created with the retry configuration.

In [10]:
import boto3
from botocore.config import Config
from langchain.llms.bedrock import Bedrock

config = Config(
    retries={
        'total_max_attempts': 10,  #Total attempts, including the initial request
        'mode': 'standard'  #One of: legacy, standard, adaptive
    }
)

#Create Bedrock runtime client (model invocation goes through the bedrock-runtime endpoint)
bedrock = boto3.client('bedrock-runtime', 'us-east-1', endpoint_url='https://bedrock-runtime.us-east-1.amazonaws.com', config=config)

#Pass the client to create Bedrock
llm = Bedrock(
        client=bedrock,
        model_id="amazon.titan-tg1-large",
        model_kwargs={"temperature": 0.5, "maxTokenCount": 100}
    )

llm(prompt_data)

" Making strong business decisions as a leader requires a combination of strategic thinking, critical analysis, and intuition. Here are some tips to help you make informed and effective decisions:\nDefine your goals and vision: Clearly understand your organization's goals and vision, as well as your personal leadership goals. This will help you align your decisions with the overall mission and direction of the company.\nGather relevant information: Collect as much relevant information as possible about the decision you need to make. This can include financial"

## Submitting a batch with Retry attempts
You can submit a batch of requests concurrently using the client that was configured with retry attempts.

In [11]:
from concurrent.futures import ThreadPoolExecutor, as_completed

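class ThrottlingException(Exception):
    """Raised when the Langchain LLM call fails after retries."""

def bedrock_llm_call(prompt):
    try:
        return llm(prompt)
    except ValueError as ex:
        #The Langchain Bedrock wrapper is expected to surface client errors as ValueError
        print(f"Received value error. Details: {ex}")
        #Chain the original exception so the root cause stays in the traceback
        raise ThrottlingException(f"Retry error encountered. Failed to process data {prompt}.") from ex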

In [12]:
batch_prompt_data = []

prompt_template = """Command: Write me a blog about making strong business decisions as a leader. Limit your response to {n} sentence(s). \nBlog:"""

for i in range(1,6):
    batch_prompt_data.append(prompt_template.format(n=i))

with ThreadPoolExecutor(max_workers=3) as executor:
    #Create KV pairs
    submitted_items = {executor.submit(bedrock_llm_call, d): d for d in batch_prompt_data}

    for i, future in enumerate(as_completed(submitted_items)):
        d = submitted_items[future]
        try:
            resp = future.result()
            print(f"{i +1}: Prompt: {d} Response: {resp}")
        except ThrottlingException as ex:
            print(f"Received Throttling exception: {ex}")

1: Prompt: Command: Write me a blog about making strong business decisions as a leader. Limit your response to 1 sentence(s). 
Blog: Response:  Making strong business decisions as a leader requires careful analysis, strategic planning, and the ability to adapt to changing circumstances.
2: Prompt: Command: Write me a blog about making strong business decisions as a leader. Limit your response to 2 sentence(s). 
Blog: Response:  Making strong business decisions as a leader requires a combination of intuition, strategic thinking, and careful consideration of the facts. Leaders must be able to balance competing priorities, consider the potential risks and rewards of different options, and ultimately make a decision that is in the best interest of the organization. This process can be challenging, but it is essential for the success of any business.
3: Prompt: Command: Write me a blog about making strong business decisions as a leader. Limit your response to 3 sentence(s). 
Blog: Response:
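
Note that `as_completed` yields futures in completion order, so the counter printed in the batch loop above does not necessarily match the order in which the prompts were submitted. If the results need to line up with the input list, one option is to map each future to its prompt index. The sketch below shows this under stated assumptions: `run_batch` is a hypothetical helper, and `fake_llm` is a made-up stand-in for `bedrock_llm_call` so the example runs without Bedrock.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(call, prompts, max_workers=3):
    """Run call(prompt) concurrently and return (results, errors).

    results[i] holds the response for prompts[i], or None if it failed;
    errors maps the index of each failed prompt to its exception.
    """
    results = [None] * len(prompts)
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember each prompt's position so order can be restored later.
        future_to_index = {
            executor.submit(call, prompt): index
            for index, prompt in enumerate(prompts)
        }
        for future in as_completed(future_to_index):
            index = future_to_index[future]
            try:
                results[index] = future.result()
            except Exception as ex:
                errors[index] = ex
    return results, errors


def fake_llm(prompt):
    """Made-up stand-in for an LLM call; fails for prompts containing 'fail'."""
    if "fail" in prompt:
        raise RuntimeError(f"could not process: {prompt}")
    return prompt.upper()


batch_results, batch_errors = run_batch(fake_llm, ["one", "fail two", "three"])
print(batch_results)
print(sorted(batch_errors))
```

Collecting failures by index, rather than only printing them, also makes it straightforward to resubmit just the prompts that failed.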