# Amazon Bedrock Application Inference Profile

Amazon Bedrock offers two kinds of inference profiles:
* Cross-region inference profiles – These are predefined by the Bedrock service (system defined). Each one includes multiple Regions to which requests for a model can be routed. This improves resilience and throughput and helps you manage traffic bursts.
* Application inference profiles – These are created by users (user defined). They let you track costs and model usage. You can create an application inference profile that routes model invocation requests to one Region (by copying from a foundation model ARN) or to multiple Regions (by copying from a cross-region inference profile ARN).

An application inference profile gives you the following benefits:

Track usage metrics – When you enable model invocation logging and record to CloudWatch Logs, you can track requests submitted with an application inference profile to view usage metrics.

Use tags to monitor costs – You can attach tags to an application inference profile and track costs for on-demand model invocation requests.

Cross-region inference – Increase your throughput by using a cross-region inference profile as the source when creating the application inference profile, so that invocations are distributed across Regions.
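The two kinds of source described above can be told apart by the resource type in the ARN. The sketch below is a minimal illustration, not part of the Bedrock SDK: `classify_bedrock_arn` and `routing_scope` are hypothetical helper names, and the `inference-profile` and `application-inference-profile` resource-type strings are assumptions that you should confirm against the ARNs returned in your own account.

```python
def classify_bedrock_arn(arn: str) -> str:
    """Return the resource-type segment of a Bedrock ARN, e.g. 'foundation-model'."""
    # ARN layout: arn:partition:service:region:account-id:resource-type/resource-id
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "bedrock":
        raise ValueError(f"Not a Bedrock ARN: {arn!r}")
    resource_type, sep, resource_id = parts[5].partition("/")
    if not sep or not resource_id:
        raise ValueError(f"ARN has no resource id: {arn!r}")
    return resource_type


def routing_scope(source_arn: str) -> str:
    """Describe how a profile copied from source_arn would route requests."""
    resource_type = classify_bedrock_arn(source_arn)
    if resource_type == "foundation-model":
        return "single-region"
    # Assumption: system-defined cross-region profiles use this resource type.
    if resource_type == "inference-profile":
        return "multi-region"
    raise ValueError(f"Unsupported copyFrom source type: {resource_type}")
```

Checking the source ARN this way before calling `create_inference_profile` makes it harder to create a single-Region profile by accident when a multi-Region one was intended.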



### Install dependencies

In [None]:
!pip install --upgrade --force-reinstall boto3 botocore awscli

In [1]:
import boto3
print(boto3.__version__)

1.35.54


In [None]:
import boto3
import sagemaker
import re
import time
import json

session = boto3.Session()
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
studio_region = sagemaker_session.boto_region_name 

bedrock = session.client("bedrock", region_name=studio_region)
br = session.client("bedrock-runtime", region_name=studio_region)

### List application inference profiles

In [None]:
bedrock.list_inference_profiles(typeEquals='APPLICATION')

### List cross region inference profiles

In [None]:
bedrock.list_inference_profiles(typeEquals='SYSTEM_DEFINED')
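A single `list_inference_profiles` call may return only one page of results. The sketch below follows the `nextToken` field until it is absent; `list_all_inference_profiles` is a hypothetical helper name, and it takes the client as an argument so it can be used with the `bedrock` client created earlier.

```python
def list_all_inference_profiles(bedrock_client, profile_type):
    """Collect inference profile summaries of one type across all result pages."""
    summaries = []
    kwargs = {"typeEquals": profile_type}
    while True:
        page = bedrock_client.list_inference_profiles(**kwargs)
        summaries.extend(page.get("inferenceProfileSummaries", []))
        token = page.get("nextToken")
        if not token:
            # No further pages to fetch
            return summaries
        kwargs["nextToken"] = token
```

For example, `list_all_inference_profiles(bedrock, 'SYSTEM_DEFINED')` would return the full list of cross-region profiles visible in the current Region.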

In [54]:
# Keep only active Claude models, with the fields needed later in the notebook
models = [
    {'modelArn': fm['modelArn'], 'modelName': fm['modelName'], 'modelId': fm['modelId']}
    for fm in bedrock.list_foundation_models()['modelSummaries']
    if fm['modelName'].startswith('Claude') and fm['modelLifecycle']['status'] == 'ACTIVE'
]
models

[{'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
  'modelName': 'Claude 3.5 Sonnet v2',
  'modelId': 'anthropic.claude-3-5-sonnet-20241022-v2:0'},
 {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-instant-v1:2:100k',
  'modelName': 'Claude Instant',
  'modelId': 'anthropic.claude-instant-v1:2:100k'},
 {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-instant-v1',
  'modelName': 'Claude Instant',
  'modelId': 'anthropic.claude-instant-v1'},
 {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2:0:18k',
  'modelName': 'Claude',
  'modelId': 'anthropic.claude-v2:0:18k'},
 {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2:0:100k',
  'modelName': 'Claude',
  'modelId': 'anthropic.claude-v2:0:100k'},
 {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2:1:18k',
  'modelName': 'Claude',
  'modelId': 'anthropic.cla

## Set up a single-Region application inference profile
To create an application inference profile for one Region, specify a foundation model's ARN. Usage and costs for requests made to that Region with that model will be tracked. When creating the request, you supply the following parameters:

* inferenceProfileName - Name for the inference profile
* modelSource - For a single Region, specify the ARN of the foundation model in the copyFrom attribute
* description - Description of the inference profile (optional)
* tags - Tags to attach to the inference profile. You can track costs using AWS cost allocation tags. A tag could be your project ID, department ID, or any other dimension by which you want to track cost
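The parameters listed above map directly onto keyword arguments of `create_inference_profile`. The sketch below assembles them from plain Python values; `build_profile_request` is a hypothetical helper name, and the tag keys and values you pass are your own.

```python
def build_profile_request(name, source_arn, tags=None, description=None):
    """Build keyword arguments for bedrock.create_inference_profile."""
    request = {
        "inferenceProfileName": name,
        # copyFrom takes a foundation model ARN or a cross-region profile ARN
        "modelSource": {"copyFrom": source_arn},
    }
    if description:
        request["description"] = description
    if tags:
        # The API expects a list of {'key': ..., 'value': ...} dictionaries
        request["tags"] = [{"key": k, "value": v} for k, v in tags.items()]
    return request
```

It could then be used as `bedrock.create_inference_profile(**build_profile_request(...))`, which keeps the tag dictionary in one place if several profiles share the same cost allocation tags.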

In [46]:
# Claude 3.5 Sonnet v2: look up its ARN in the models list built above

modelId = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
modelARN = [m['modelArn'] for m in models if m['modelId'] == modelId][0]
modelId, modelARN

('anthropic.claude-3-5-sonnet-20241022-v2:0',
 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0')

In [47]:
inf_profile_response = bedrock.create_inference_profile(
    inferenceProfileName='ClaudeSonnetParts',
    description='Application profile for Claude 3.5 Sonnet v2',
    modelSource={
        'copyFrom': modelARN
    },
    tags=[
        {
            'key': 'projectId',
            'value': 'partsUSXRC28'
        },
    ]
)

In [None]:
inf_profile_arn = inf_profile_response['inferenceProfileArn']
inf_profile_arn

In [None]:
bedrock.get_inference_profile(inferenceProfileIdentifier=inf_profile_arn)

In [50]:
inf_profile_id = bedrock.get_inference_profile(inferenceProfileIdentifier=inf_profile_arn)['inferenceProfileId']
inf_profile_id

'0swqzhpsd07r'

### Example usage with Converse API

To use an inference profile, specify the ARN of the inference profile in the modelId field.

In [52]:
from time import time
system_prompt = "You are an expert on AWS services and always provide correct and concise answers."
input_message = "Should I be storing documents in Amazon S3 or EFS for cost effective applications?"
start = time()
response = br.converse(
    modelId=inf_profile_arn,
    system=[{"text": system_prompt}],
    messages=[{
        "role": "user",
        "content": [{"text": input_message}]
    }]
)
end = time()
print(f"Response time: {int(end-start)} second(s)")
print(f"Using Application Inf Profile::Response output: {response['output']['message']['content']}")

Response time: 3 second(s)
Using Application Inf Profile::Response output: [{'text': 'For most cost-effective document storage, Amazon S3 is the better choice compared to EFS because:\n\n1. S3 is significantly cheaper per GB of storage\n2. S3 charges only for actual storage used\n3. S3 has multiple storage tiers (like S3 Standard-IA, Glacier) for further cost optimization\n4. No provisioning of capacity is needed\n\nEFS would be more appropriate when you need:\n- File system semantics (POSIX)\n- Shared file access from multiple EC2 instances\n- Low-latency read/write access\n\nFor typical document storage use cases, S3 is more cost-effective and should be your default choice.'}]
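Besides the message content printed above, a Converse response also carries token usage and latency fields, which are useful when comparing against the usage metrics recorded for the profile. The sketch below pulls them into one dictionary; `summarize_converse_response` is a hypothetical helper name.

```python
def summarize_converse_response(response):
    """Extract the reply text, token usage, and latency from a Converse response."""
    blocks = response["output"]["message"]["content"]
    # Concatenate text blocks only; other block types (e.g. tool use) are skipped
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = response.get("usage", {})
    return {
        "text": text,
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
        "latency_ms": response.get("metrics", {}).get("latencyMs"),
    }
```

Calling `summarize_converse_response(response)` on the result of `br.converse(...)` gives per-request numbers that can be logged alongside the profile ARN.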


### Example usage with Invoke Model

In [57]:
import json
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "temperature": 0.1,
    "top_p": 0.9,
    "system": system_prompt,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"{input_message}",
                }
            ]
        }
    ]
})
accept = 'application/json'
contentType = 'application/json'
start = time()
response = br.invoke_model(body=body, modelId=inf_profile_arn, accept=accept, contentType=contentType)
end = time()
response_body = json.loads(response.get('body').read())
print(f"Response output: {response_body['content'][0]['text']}")

Response output: For most cost-effective document storage, Amazon S3 is the better choice compared to EFS because:

1. S3 is significantly cheaper per GB of storage
2. S3 has multiple storage tiers (Standard, Infrequent Access, Glacier) to optimize costs
3. You only pay for what you use with no pre-provisioning needed
4. S3 is designed for object storage (like documents)

Use EFS when you need:
- Shared file system access
- Linux-based file system features
- Low-latency access from multiple EC2 instances simultaneously

For simple document storage, S3 is more cost-effective and the recommended solution.
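The request body used with `invoke_model` is the same regardless of which inference profile ARN is passed as `modelId`. The sketch below wraps its construction in a function so the payload is defined once; `build_anthropic_body` is a hypothetical helper name, and its defaults simply mirror the values used in the cell above.

```python
import json


def build_anthropic_body(user_text, system=None, max_tokens=1024,
                         temperature=0.1, top_p=0.9):
    """Serialize an Anthropic Messages request body for invoke_model."""
    payload = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_text}]}
        ],
    }
    if system:
        # The system prompt is a top-level field, not a message
        payload["system"] = system
    return json.dumps(payload)
```

The returned string can be passed as `body=` to `br.invoke_model` together with the application inference profile ARN.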


## Set up a multi-Region application inference profile

To create an application inference profile across Regions, specify the ARN of a cross-region inference profile. The rest of the parameters remain the same as for a single-Region application inference profile.

* inferenceProfileName - Name for the inference profile
* modelSource - For a multi-Region application profile, specify the ARN of the cross-region (system-defined) inference profile in the copyFrom attribute
* description - Description of the inference profile (optional)
* tags - Tags to attach to the inference profile. You can track costs using AWS cost allocation tags. A tag could be your project ID, department ID, or any other dimension by which you want to track cost

In [None]:
cr_inf_profile = [ip for ip in bedrock.list_inference_profiles(typeEquals='SYSTEM_DEFINED')['inferenceProfileSummaries'] if ip['inferenceProfileName'] == 'US Anthropic Claude 3.5 Sonnet v2'][0]
cr_inf_profile
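Selecting the cross-region profile by its display name with `[0]` raises an `IndexError` if no profile has that exact name. A minimal alternative sketch is to match on the foundation model that each profile routes to; `find_profiles_for_model` is a hypothetical helper name, and it assumes each summary lists its models under a `models` key with a `modelArn` per entry.

```python
def find_profiles_for_model(summaries, model_id):
    """Return the profile summaries whose routed models include model_id."""
    matches = []
    for summary in summaries:
        model_arns = [m.get("modelArn", "") for m in summary.get("models", [])]
        # A foundation model ARN ends with '/<model id>'
        if any(arn.endswith("/" + model_id) for arn in model_arns):
            matches.append(summary)
    return matches
```

An empty result then signals that no cross-region profile for that model is available in the current Region, which can be handled explicitly before calling `create_inference_profile`.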

In [65]:
# Use the ARN of the cross-region (system-defined) profile selected above, not a foundation model ARN
cr_inf_profile_arn = cr_inf_profile['inferenceProfileArn']
cr_app_inf_profile_response = bedrock.create_inference_profile(
    inferenceProfileName='ClaudeSonnetSupplyCrossRegion',
    description='Application profile for Claude Sonnet 3.5 v2 with cross regional routing',
    modelSource={
        'copyFrom': cr_inf_profile_arn
    },
    tags=[
        {
            'key': 'projectId',
            'value': 'supplyUSXRC28'
        },
    ]
)

In [None]:
cr_app_inf_profile_arn = cr_app_inf_profile_response['inferenceProfileArn']
cr_app_inf_profile_arn

In [None]:
bedrock.get_inference_profile(inferenceProfileIdentifier=cr_app_inf_profile_arn)

### Example usage with Converse API

In [70]:
from time import time
system_prompt = "You are an expert on AWS services and always provide correct and concise answers."
input_message = "Should I be storing documents in Amazon S3 or EFS for cost effective applications?"
start = time()
response = br.converse(
    modelId=cr_app_inf_profile_arn,
    system=[{"text": system_prompt}],
    messages=[{
        "role": "user",
        "content": [{"text": input_message}]
    }]
)
end = time()
print(f"Response time: {int(end-start)} second(s)")
print(f"Using Application Inf Profile::Response output: {response['output']['message']['content']}")

Response time: 4 second(s)
Using Application Inf Profile::Response output: [{'text': 'For most cost-effective document storage, Amazon S3 is the better choice over EFS because:\n\n1. S3 is significantly cheaper for storage (around $0.023 per GB/month) compared to EFS (around $0.30 per GB/month)\n\n2. S3 is designed for object storage and optimized for documents, files, and infrequently accessed data\n\n3. S3 offers flexible storage tiers (Standard, IA, Glacier) to further optimize costs based on access patterns\n\nUse EFS instead when you need:\n- Shared file system access\n- Low-latency concurrent access from multiple EC2 instances\n- POSIX file system capabilities\n\nFor typical document storage scenarios, S3 is the more cost-effective solution.'}]


### Example usage with Invoke Model

In [71]:
import json
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "temperature": 0.1,
    "top_p": 0.9,
    "system": system_prompt,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"{input_message}",
                }
            ]
        }
    ]
})
accept = 'application/json'
contentType = 'application/json'
start = time()
response = br.invoke_model(body=body, modelId=cr_app_inf_profile_arn, accept=accept, contentType=contentType)
end = time()
response_body = json.loads(response.get('body').read())
print(f"Response output: {response_body['content'][0]['text']}")

Response output: For most cost-effective document storage, Amazon S3 is the better choice compared to EFS because:

1. S3 is significantly cheaper per GB of storage
2. S3 has pay-as-you-go pricing with no minimum commitments
3. S3 offers different storage classes (like S3 Standard-IA, S3 One Zone-IA, S3 Glacier) to optimize costs based on access patterns
4. S3 is serverless with no infrastructure to manage

EFS is more suitable when you need:
- File system semantics
- Shared file access from multiple EC2 instances
- Low-latency access with high IOPS
- Linux-based file system operations

For simple document storage, S3 is almost always the more cost-effective solution.


## Delete Inference Profiles

In [None]:
app_inf_profiles = bedrock.list_inference_profiles(typeEquals='APPLICATION')['inferenceProfileSummaries']
app_inf_profiles

In [78]:
for app_ip in app_inf_profiles:
    response = bedrock.delete_inference_profile(inferenceProfileIdentifier=app_ip['inferenceProfileArn'])

In [79]:
app_inf_profiles = bedrock.list_inference_profiles(typeEquals='APPLICATION')['inferenceProfileSummaries']
app_inf_profiles

[]
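The loop above deletes every application inference profile returned for the account in this Region, not just the ones created in this notebook. A more conservative sketch is to delete only profiles whose names start with a known prefix; `delete_profiles_by_prefix` is a hypothetical helper name.

```python
def delete_profiles_by_prefix(bedrock_client, summaries, name_prefix):
    """Delete only the profiles whose name starts with name_prefix."""
    deleted = []
    for summary in summaries:
        if summary["inferenceProfileName"].startswith(name_prefix):
            bedrock_client.delete_inference_profile(
                inferenceProfileIdentifier=summary["inferenceProfileArn"]
            )
            deleted.append(summary["inferenceProfileName"])
    # Return the names so the caller can confirm what was removed
    return deleted
```

For the two profiles created here, `delete_profiles_by_prefix(bedrock, app_inf_profiles, 'ClaudeSonnet')` would leave unrelated profiles in place.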

## Monitor in CloudWatch Logs

You can monitor requests and usage metrics at the application inference profile level. To do this, enable model invocation logging in the Bedrock service settings and record to CloudWatch Logs. Then, from the CloudWatch console, you can track requests submitted with an application inference profile to view usage metrics.
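Once invocation log records have been exported or queried, usage can also be totaled per profile in code. The sketch below assumes each record has already been parsed into a dictionary, that the profile ARN appears in a `modelId` field, and that token counts appear under `input.inputTokenCount` and `output.outputTokenCount`. These field names are assumptions about the log schema, so check them against a record from your own log group; `tokens_by_profile` is a hypothetical helper name.

```python
from collections import defaultdict


def tokens_by_profile(records):
    """Total request and token counts per modelId across parsed log records."""
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for record in records:
        # Assumed field names; missing fields count as zero tokens
        entry = totals[record.get("modelId", "unknown")]
        entry["requests"] += 1
        entry["input_tokens"] += record.get("input", {}).get("inputTokenCount", 0)
        entry["output_tokens"] += record.get("output", {}).get("outputTokenCount", 0)
    return dict(totals)
```

Grouping by `modelId` keeps the totals for each application inference profile separate, which matches the per-profile view described above.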

<img src="application_inference_profile.png" width="650">
