# Batch inference with Bedrock

This notebook shows how to run batch inference with Amazon Bedrock.

With batch inference, you can process a large number of prompts more efficiently. According to the [pricing](https://aws.amazon.com/bedrock/pricing/) page, it is also ~50% cheaper than sending the same prompts as individual requests.

Note: Not all models support batch inference. See the [list of supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-supported.html).

## Creating the dataset

First of all, we need to create a .jsonl dataset with the following format:

```{ "recordId" : "00001", "modelInput" : {JSON body} }```

The format of the modelInput JSON object must match the body field for the model that you use in the InvokeModel request.

For this example, we will use an Anthropic Claude model. Let's create some data with the required format:

In [1]:
MAX_TOKENS = 1000
SYSTEM_PROMPT = "Please respond only with emoji."

user_message = {"role": "user", "content": "Hello there! How are you doing today?"}

examples = [
    {
        "recordId": "CALL0000001",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": MAX_TOKENS,
            "messages": [{"role": "user", "content": "Hello there! How are you doing today?"}],
        },
    },
    {
        "recordId": "CALL0000002",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": MAX_TOKENS,
            "messages": [{"role": "user", "content": "My cat is so cute!"}],
        },
    },
    {
        "recordId": "CALL0000003",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": MAX_TOKENS,
            "messages": [{"role": "user", "content": "I'm programming today"}],
        },
    },
]

examples = 100 * examples  # Bedrock requires at least 100 input records per batch job
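
Repeating the list also repeats the `recordId` values. If you want to match each output back to its input, distinct IDs are easier to work with. A minimal sketch of one way to generate them; `build_record` and `unique_examples` are hypothetical names that are not used elsewhere in this notebook:

```python
def build_record(index, text, max_tokens=1000):
    """Return one batch record with a zero-padded recordId (hypothetical helper)."""
    return {
        "recordId": f"CALL{index:07d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": text}],
        },
    }


prompts = [
    "Hello there! How are you doing today?",
    "My cat is so cute!",
    "I'm programming today",
]

# Cycle through the prompts until there are 300 records, each with its own ID.
unique_examples = [
    build_record(i + 1, prompts[i % len(prompts)]) for i in range(300)
]
```

The resulting list has the same shape as `examples`, so it could be written to the .jsonl file in its place.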

In [2]:
!pip install jsonlines

In [3]:
import jsonlines

with jsonlines.open("examples.jsonl", mode="w") as writer:
    writer.write_all(examples)
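
If you would rather not install an extra package, the same file can be produced with the standard library. The sketch below also reads the file back and checks that every line has the two top-level keys described above; `write_jsonl` and `validate_jsonl` are hypothetical helper names, and the check is only a local sanity test, not the validation Bedrock itself performs.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def validate_jsonl(path):
    """Return the number of records; raise ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            record = json.loads(line)  # JSONDecodeError is a ValueError
            missing = {"recordId", "modelInput"} - record.keys()
            if missing:
                raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
            count += 1
    return count
```

For example, `write_jsonl(examples, "examples.jsonl")` followed by `validate_jsonl("examples.jsonl")` should return the number of records written.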

## Upload the dataset to S3

The way batch inference works with Bedrock is as follows:
- You upload (or already have) a dataset with your prompts in an S3 bucket.
- You run the batch inference job. 
- The results are written in an S3 bucket. Then, if needed, you can download them.

Thus, the next step is to upload the jsonl file with the data to S3.

To do so, we first need an authenticated session for our AWS `ml` profile:

In [4]:
import boto3

ml_session = boto3.Session(profile_name="ml", region_name="us-east-1")

In [5]:
s3_client = ml_session.client("s3")

response = s3_client.list_buckets()

for bucket in response["Buckets"]:
    print(bucket["Name"])

ml-rd-bedrock-datasets
ml-rd-bedrock-inference-outputs
ml-rd-mlflow-artifact-storage
ml-rd-query-rewriting-experiments
sagemaker-studio-43ifgnfx4ho
sagemaker-studio-ezt7qs2smmu
sagemaker-us-east-1-879381254630


The S3 bucket ```ml-rd-bedrock-datasets``` has been created as the storage for Bedrock datasets, so we will use it in this example.

In [6]:
from botocore.exceptions import ClientError

try:
    s3_client.upload_file("examples.jsonl", "ml-rd-bedrock-datasets", "examples.jsonl")
except ClientError as e:
    print(e)

Now, if you check the S3 bucket, you'll see our dataset stored as examples.jsonl.
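
The same check can be done from code. A small sketch, assuming the `s3_client` created earlier is passed in; `upload_matches_local` is a hypothetical helper that compares the object size reported by `head_object` with the size of the local file:

```python
import os


def upload_matches_local(s3_client, bucket, key, local_path):
    """Return True if the S3 object has the same size in bytes as the local file."""
    remote_size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return remote_size == os.path.getsize(local_path)


# Example call (requires AWS credentials, so it is left commented out):
# upload_matches_local(s3_client, "ml-rd-bedrock-datasets", "examples.jsonl", "examples.jsonl")
```

A size match is only a quick sanity check; it does not prove the contents are identical.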

## Run the batch inference job

Now, let's run the batch inference job in Bedrock. To keep things organized, the outputs of each job are stored in their own folder, named after the job ID.

We will use the bucket ```ml-rd-bedrock-inference-outputs```.

Let's now define the input and output S3 locations:

In [7]:
input_data_config = {"s3InputDataConfig": {"s3Uri": "s3://ml-rd-bedrock-datasets/examples.jsonl"}}

output_data_config = {"s3OutputDataConfig": {"s3Uri": "s3://ml-rd-bedrock-inference-outputs"}}

And now, let's run the batch job:

In [8]:
bedrock_client = ml_session.client("bedrock")

response = bedrock_client.create_model_invocation_job(
    roleArn=ml_session.client("iam").get_role(
        RoleName="AmazonSageMaker-ExecutionRole-20241203T102031"
    )["Role"]["Arn"],
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    jobName="example-bedrock-batch-inference-job-v8",
    inputDataConfig=input_data_config,
    outputDataConfig=output_data_config,
)

job_arn = response["jobArn"]
print(f"Successfully launched job with ARN: {job_arn}")

Successfully launched job with ARN: arn:aws:bedrock:us-east-1:879381254630:model-invocation-job/i6kuqtep06cn
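
The `-v8` suffix in the job name above suggests it was bumped by hand between runs. If you need a distinct name for every run, one option is to append a UTC timestamp. This is only a sketch: `make_job_name` is a hypothetical helper, and the exact naming rules Bedrock enforces are not checked here.

```python
from datetime import datetime, timezone


def make_job_name(prefix, now=None):
    """Return the prefix followed by a UTC timestamp, e.g. prefix-20250101-120000."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}-{now:%Y%m%d-%H%M%S}"


job_name = make_job_name("example-bedrock-batch-inference-job")
```

The result could then be passed as `jobName` to `create_model_invocation_job`.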


In [11]:
bedrock_client.get_model_invocation_job(jobIdentifier=job_arn)["status"]

'Scheduled'
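
Batch jobs can take a while, so instead of re-running the cell above by hand you may want to poll until the job finishes. A minimal sketch, assuming the `bedrock_client` and `job_arn` from the previous cells; `wait_for_job` is a hypothetical helper, and the set of terminal statuses is an assumption that should be checked against the Bedrock documentation:

```python
import time

# Assumed terminal statuses; verify against the Bedrock API reference.
TERMINAL_STATUSES = {"Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"}


def wait_for_job(bedrock_client, job_arn, poll_seconds=60, max_polls=1440):
    """Poll the job status until it is terminal, then return that status."""
    for _ in range(max_polls):
        status = bedrock_client.get_model_invocation_job(jobIdentifier=job_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_arn} did not finish after {max_polls} polls")


# Example call (requires AWS credentials, so it is left commented out):
# final_status = wait_for_job(bedrock_client, job_arn)
```

The `max_polls` limit keeps the loop from running forever if the job never leaves a non-terminal state.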

Now you can go to the Bedrock UI and check your job under Inference and Assessment > Batch inference.

## (Optional) Download the results

Once the job has finished, we can optionally download the results from S3 to our local filesystem. This step is not recommended for very large datasets, as the download can take a long time and a lot of disk space.

In [None]:
# Results are stored in s3://ml-rd-bedrock-inference-outputs/{job_id}/examples.jsonl.out
job_id = job_arn.split("/")[1]

s3_client.download_file(
    "ml-rd-bedrock-inference-outputs", f"{job_id}/examples.jsonl.out", "examples.jsonl.out"
)
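
Once the file is on disk, the replies can be read back line by line. The sketch below assumes that each output line repeats the `recordId` and adds a `modelOutput` object shaped like the Anthropic Messages response body (a `content` list of text blocks), and that records without a `modelOutput` failed. Both are assumptions to verify against your own output file; `read_batch_output` is a hypothetical helper.

```python
import json


def read_batch_output(path):
    """Map each recordId to the reply text, or None if no modelOutput is present."""
    replies = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            output = record.get("modelOutput")
            if output is None:
                # Assumed: a record without modelOutput was not processed successfully.
                replies[record["recordId"]] = None
                continue
            text_blocks = [b["text"] for b in output["content"] if b.get("type") == "text"]
            replies[record["recordId"]] = "".join(text_blocks)
    return replies


# Example call, once examples.jsonl.out has been downloaded:
# replies = read_batch_output("examples.jsonl.out")
```

Because the result is keyed by `recordId`, repeated IDs in the input will overwrite one another here.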

## More resources

- [Another example from AWS](https://github.com/aws-samples/amazon-bedrock-samples/blob/main/introduction-to-bedrock/batch_api/batch-inference-transcript-summarization.ipynb)