### Setting up the Bedrock SDK

This section sets up the Bedrock SDK by:

- Downloading and unzipping the preview Python SDK package from a CloudFront URL
- Installing the preview botocore and boto3 wheel packages from the extracted SDK directory

The preview boto3 build provides an API client for running batch inference jobs on Amazon Bedrock. We install the packages here so that later cells can import them, create the client, and define and invoke batch jobs.

Some things to note:

- The `!` prefix runs each line as a shell command from the Jupyter notebook
- The `-q` flag makes `pip` install the wheel packages quietly
- The preview SDK gives access to Bedrock services before general availability
- We'll use boto3, which is built on botocore, to call the Bedrock batch APIs from Python
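
The `$(ls ... | head -1)` substitution in the install commands picks the first wheel file that matches a pattern. If you would rather avoid shell substitution, a rough Python equivalent might look like the sketch below. `find_wheel` and `install_wheel` are hypothetical helpers, not part of the SDK, and the directory name in the usage comment is the one this notebook extracts to.

```python
import glob
import os
import subprocess
import sys


def find_wheel(directory, package):
    """Return the first wheel for `package` in `directory`, in sorted order.

    Hypothetical helper that mirrors `ls <dir>/<package>-*.whl | head -1`.
    """
    matches = sorted(glob.glob(os.path.join(directory, f"{package}-*.whl")))
    if not matches:
        raise FileNotFoundError(f"no {package} wheel found in {directory}")
    return matches[0]


def install_wheel(path):
    """Install a wheel quietly with the interpreter that runs this notebook."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", path])


# Usage sketch (botocore first, since boto3 depends on it):
# sdk_dir = "./bedrock-python-sdk-reinvent"
# for package in ("botocore", "boto3"):
#     install_wheel(find_wheel(sdk_dir, package))
```

Calling `pip` through `sys.executable` installs into the same environment as the running kernel, which a bare `!pip` does not always guarantee.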

In [None]:
# Download and unzip preview SDK
!wget https://d2eo22ngex1n9g.cloudfront.net/Documentation/SDK/bedrock-python-sdk-reinvent.zip
!unzip -qq ./bedrock-python-sdk-reinvent.zip -d ./bedrock-python-sdk-reinvent

# Install preview SDK packages
!pip install -q $(ls ./bedrock-python-sdk-reinvent/botocore-*.whl | head -1)
!pip install -q $(ls ./bedrock-python-sdk-reinvent/boto3-*.whl | head -1)

### Import Python modules 

This code chunk imports Python libraries we'll need for using Bedrock and monitoring batch jobs:

- `boto3` - AWS SDK library to call Bedrock services 
- `botocore` - Underlying AWS SDK library boto3 builds on
- `Config` - Botocore config object for SDK behavior
- `time` - For sleep delays in loops monitoring job status
- `json` - For serializing data to JSON for Bedrock API input/output 
- `uuid` - For generating unique job names 
- `display`, `clear_output` - IPython display functions to refresh notebook output

Key points:

- `boto3` and `botocore` supply the clients and configuration used to call the Bedrock batch APIs
- `json` serializes the batch input and parses the results
- `time` and the IPython display functions provide feedback on the status of the asynchronous job
- `uuid` generates unique batch job names
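
As a small illustration of the `uuid` point, one way to build a unique job name is to append a short random suffix to a fixed prefix. This is a minimal sketch; `make_job_name` is a hypothetical helper, and the 8-character suffix length is an assumption chosen to match the job-creation cell later in this notebook.

```python
import uuid


def make_job_name(prefix):
    """Append 8 random hex characters so repeated runs get distinct names."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


job_name = make_job_name("claude-batch-test")
print(job_name)
```

A short suffix keeps the name readable. Collisions are possible in principle, but unlikely for a handful of test runs.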

In [None]:
import boto3
import botocore
from botocore.client import Config

import time
import json
import uuid

from IPython.display import display, clear_output

### Upload input data to S3

This code handles uploading our input data to an S3 bucket:

- An S3 client is created using the boto3 SDK
- The S3 bucket and input/output keys are defined
- Sample input data is prepared in a list of dictionaries 
- The data contains prompts and parameters for the Claude model
- The data is looped through and converted to JSON lines format
- The JSON lines output is uploaded to the S3 input key 

Key points:

- Batch inference input must be formatted as JSON Lines and stored in S3
- Here we upload a single `input.jsonl` file
- The file contains one JSON record per line, each holding a prompt and its parameters
- The Bedrock batch job consumes this file as model input
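
The JSON Lines step can be sketched on its own, without touching S3. The example below uses two made-up prompts and two hypothetical helpers, `format_claude_prompt` and `to_jsonl`. The blank-line `Human:`/`Assistant:` framing follows Anthropic's documented text-completion convention. Whether the batch API enforces it is an assumption this notebook does not confirm.

```python
import json


def format_claude_prompt(instruction):
    """Wrap an instruction in the Human/Assistant turn format (assumed here)."""
    return f"\n\nHuman: {instruction}\n\nAssistant:"


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Illustrative prompts only; not the notebook's sample data
sample_instructions = ["Write a short welcome email.", "Summarize our refund policy."]

records = [
    {
        "recordId": str(i),
        "modelInput": {
            "prompt": format_claude_prompt(text),
            "max_tokens_to_sample": 300,
        },
    }
    for i, text in enumerate(sample_instructions, start=1)
]

payload = to_jsonl(records)

# Each line parses back to the original record. json.dumps escapes the
# newlines inside the prompt, so a record never spans more than one line.
assert [json.loads(line) for line in payload.splitlines()] == records
```

The resulting `payload` string is what would be passed as `Body` to `s3.put_object`.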

In [None]:
# Set up S3 client
session = boto3.Session()
s3 = session.client('s3')

# Bucket for input/output data 
bucket = "YOUR_BUCKET_NAME_HERE"
input_key = "input.jsonl" 
output_key = "output/"

# Upload sample input data
data = [
  {'recordId': '1', 'modelInput': {'prompt': 'Human: Write an email from Bob, Customer Service Manager, to the customer "John Doe" who provided negative feedback on the service provided by our customer support engineer Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '2', 'modelInput': {'prompt': 'Human: Craft an email response from Amanda, Sales Director, to a customer inquiry about pricing options for a large order Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '3', 'modelInput': {'prompt': 'Human: Write an email from the HR department to a new employee Daniel with orientation and onboarding information Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '4', 'modelInput': {'prompt': 'Human: Compose an email from IT support to employees about a planned network upgrade and potential service disruptions Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '5', 'modelInput': {'prompt': 'Human: Generate an email newsletter from a company CEO to employees highlighting recent company news and achievements Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '6', 'modelInput': {'prompt': 'Human: Draft an email from a manager to her team announcing an upcoming office holiday party with event details Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '7', 'modelInput': {'prompt': 'Human: Create an email template for the marketing team to use when sending lead nurturing campaigns to potential customers Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '8', 'modelInput': {'prompt': 'Human: Compose an out of office auto-reply email for John Smith, Accounting Controller, who will be on vacation next week Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '9', 'modelInput': {'prompt': 'Human: Write a follow up email from Amanda to existing customers announcing a special limited-time sale on products Assistant:', 'max_tokens_to_sample':300}},

  {'recordId': '10', 'modelInput': {'prompt': 'Human: Generate an email from the support team to users regarding an application outage that occurred yesterday Assistant:', 'max_tokens_to_sample':300}}
]

# Serialize each record as one JSON object per line (JSON Lines)
output_data = "".join(json.dumps(row) + "\n" for row in data)

# Upload the JSON Lines payload to the input key
s3.put_object(Body=output_data.encode("utf-8"), Bucket=bucket, Key=input_key)

### Create and monitor batch job

This code handles creating a Bedrock batch job and monitoring its status:

- A Bedrock SDK client is created
- Input and output configs point to the S3 bucket and keys
- A unique job name is generated using the `uuid` module
- The batch job is created with the `bedrock.create_model_invocation_job` API
- Parameters include the IAM role, Claude model ID, job name, and input/output configs
- The returned job ARN is stored to track job status
- A loop checks the status every 5 seconds with `bedrock.get_model_invocation_job`, using the ARN
- IPython display functions clear and update the notebook output
- Once the job shows complete status, the output key is built
- Output JSON lines data is downloaded from S3 
- The output is printed line by line as formatted JSON
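
The "output key is built" step can be isolated into a small function. This sketch assumes the layout used in this notebook, where the output for an input file lands under `<output prefix><job id>/<input file name>.out`. `build_result_key` is a hypothetical helper and the ARN below is a placeholder, not a real job.

```python
def build_result_key(output_prefix, job_arn, input_key):
    """Return the S3 key where the job's output for `input_key` is expected.

    Assumes the layout used in this notebook:
    <output_prefix><job id>/<input file name>.out
    """
    job_id = job_arn.split("/")[-1]        # the part after the last slash of the ARN
    input_name = input_key.split("/")[-1]  # file name without any S3 "folder" prefix
    return f"{output_prefix}{job_id}/{input_name}.out"


# Placeholder ARN for illustration only
example_arn = "arn:aws:bedrock:us-east-1:111122223333:model-invocation-job/abc123"
print(build_result_key("output/", example_arn, "input.jsonl"))
# prints: output/abc123/input.jsonl.out
```

If the job writes its results under a different layout, listing the output prefix with `s3.list_objects_v2` is a safer way to find the file than constructing the key by hand.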

Key points:

- Bedrock APIs are used to submit and monitor async batch jobs 
- A loop tracks status by periodically checking based on the job ARN
- Result data is stored back in the designated S3 output location
- We retrieve the output and print the finished JSON lines result
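
The polling pattern can be written so that it does not depend on AWS at all, which makes it easier to test and to give a timeout. The sketch below is one possible shape, not the notebook's own loop. `wait_for_job` is a hypothetical helper, and the pending status names are the two this notebook checks for.

```python
import time


def wait_for_job(get_status, pending=("Submitted", "InProgress"),
                 poll_seconds=5, timeout_seconds=3600, sleep=time.sleep):
    """Poll `get_status()` until it returns a status outside `pending`.

    `get_status` is any zero-argument callable returning the current status
    string. Raises TimeoutError if the job is still pending after roughly
    `timeout_seconds` of waiting.
    """
    waited = 0
    while True:
        status = get_status()
        if status not in pending:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage sketch with the client and ARN from this notebook:
# final_status = wait_for_job(
#     lambda: bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
# )
```

Returning the final status instead of printing a fixed message lets the caller decide what to do when the job ends in a state other than success.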

In [None]:
# Create Bedrock client  
bedrock = boto3.client('bedrock')

# Input and output config
input_config = {"s3InputDataConfig":{"s3Uri": f"s3://{bucket}/{input_key}"}}  
output_config = {"s3OutputDataConfig":{"s3Uri": f"s3://{bucket}/{output_key}"}}

# Create a unique job name
suffix = str(uuid.uuid4())[:8]
job_name = f"claude-batch-test-{suffix}"

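# Create batch job
# roleArn is account-specific: replace it with an IAM role in your own account
# that Bedrock can assume and that can read and write the S3 locations above
response = bedrock.create_model_invocation_job(
    roleArn="arn:aws:iam::450006286986:role/BedrockBatchInferenceRole",
    modelId="anthropic.claude-v2",
    jobName=job_name,
    inputDataConfig=input_config,
    outputDataConfig=output_config
)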

job_arn = response['jobArn']

# Monitor job status until it leaves the Submitted/InProgress states
while True:
    result = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    status = result['status']

    clear_output(wait=True)
    display(f"Status: {status}")

    if status not in ('Submitted', 'InProgress'):
        break
    time.sleep(5)

# Report the final status; the loop also ends when the job fails or is stopped
print(f"Job finished with status: {status}")

# Build the key of the job's output file: <output_key><job id>/<input name>.jsonl.out
job_id = result['jobArn'].split('/')[1]
result_key = output_key + job_id + '/' + input_key.split('.')[0] + '.jsonl.out'

# Get output data
obj = s3.get_object(Bucket=bucket, Key=result_key)
content = obj['Body'].read().decode('utf-8').strip().split('\n')

clear_output(wait=True)

for line in content:
    print(json.dumps(json.loads(line), indent=4))