Multi-tenant Generative AI gateway with cost and usage tracking on AWS

In this repository, we show you how to build a multi-tenant SaaS solution to access foundation models with Amazon Bedrock and Amazon SageMaker.

Enterprise IT teams may need to track the usage of foundation models (FMs) across teams, charge back costs, and provide visibility to the relevant cost center in each line of business (LOB). They may also need to regulate which models each team can access, for example when only specific FMs are approved for use.

An internal software as a service (SaaS) for foundation models can address governance requirements while providing a simple and consistent interface for end users. An API gateway is a common design pattern that enables consumption of services with standardization and governance. It provides loose coupling between model consumers and the model endpoint service, which makes it easier to adapt to changing model versions, architectures, and invocation methods.

Project Description
API Specifications
Reporting Costs Example
Deploy Stack
1. Full Deployment
2. API Key Deployment
SageMaker Endpoints

Project Description

Multiple tenants within an enterprise can simply correspond to multiple teams or projects that access LLMs through REST APIs, just like other SaaS services. IT teams can add further governance and controls on top of this SaaS layer. This CDK example focuses on multiple tenants with different cost centers accessing the service through API Gateway. An internal service tracks usage and cost per tenant and aggregates that cost for reporting. The CDK template provided here deploys all the required resources to the AWS account.

The CDK Stack provides the following deployments:

Full Deployment: It deploys the following resources:

Private networking environment with a VPC, private subnets, and VPC endpoints for Lambda, API Gateway, and Amazon Bedrock
API Gateway REST API
API Gateway usage plan
API Gateway API key
Lambda functions to list foundation models on Bedrock
Lambda functions to invoke models on Bedrock and SageMaker
Lambda functions to invoke models on Bedrock and SageMaker with a streaming response
DynamoDB table for saving streaming responses asynchronously
Lambda function to aggregate usage and track costs
EventBridge rule to trigger the cost aggregation at a regular interval
S3 buckets to store the cost tracking logs
CloudWatch Logs to collect logs from Lambda invocations

API Key Deployment: It deploys the following resources:

API Gateway Usage Plan
API Gateway Key

A sample notebook in the notebooks folder can be used to invoke Bedrock as any one of the teams (cost centers). API Gateway routes the request to the Lambda function that invokes Bedrock models or SageMaker-hosted models and logs the usage metrics to CloudWatch. EventBridge triggers the cost-tracking Lambda function at a regular interval; it aggregates metrics from the CloudWatch logs into usage and cost metrics at the chosen granularity. The metrics are stored in S3 and can be visualized with custom reports.
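The aggregation step can be pictured with a short Python sketch. It assumes each usage log line has already been parsed into a dict with the hypothetical keys timestamp, team_id, model_id, input_tokens and output_tokens; the real log format and the cost-tracking Lambda function in this repository may differ. The granularity names (hour, day, month) are also illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative reporting granularities mapped to strftime bucket formats (hypothetical names).
GRANULARITY_FORMATS = {"hour": "%Y-%m-%dT%H:00", "day": "%Y-%m-%d", "month": "%Y-%m"}


def aggregate_usage(records, granularity="day"):
    """Sum tokens and count invocations per (period, team_id, model_id).

    Each record is a dict with the hypothetical keys timestamp (ISO 8601 string),
    team_id, model_id, input_tokens and output_tokens.
    """
    fmt = GRANULARITY_FORMATS[granularity]
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "invocations": 0})
    for rec in records:
        # Truncate the timestamp to the reporting bucket, e.g. "2024-03-01" for "day".
        period = datetime.fromisoformat(rec["timestamp"]).strftime(fmt)
        row = totals[(period, rec["team_id"], rec["model_id"])]
        row["input_tokens"] += rec["input_tokens"]
        row["output_tokens"] += rec["output_tokens"]
        row["invocations"] += 1
    return dict(totals)
```

Each resulting row carries the same token and invocation counts as the Reporting Costs Example, plus a time bucket; costs can then be derived per row from a price table.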

API Specifications

The CDK stack creates a REST API that complies with the OpenAPI specification.

The solution currently supports both standard REST invocation and streaming invocation with long polling, for both Bedrock and SageMaker.
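Long polling means the client keeps asking for the result of an earlier streaming request until it is ready. The Python sketch below shows that loop under stated assumptions: fetch is a hypothetical callable that re-sends /invoke_model with the streaming header set and the requestId query parameter, and returns None while generation is still in progress. How the gateway actually signals a pending response is not described here, so adapt fetch to the notebook examples.

```python
import time


def poll_streaming_result(fetch, request_id, interval_s=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(request_id) until it returns a non-None result.

    fetch is a hypothetical callable: it should return None while the response
    is still being generated and the final payload once it is complete.
    """
    for _ in range(max_attempts):
        result = fetch(request_id)
        if result is not None:
            return result
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"No result for request {request_id} after {max_attempts} attempts")
```

Injecting sleep keeps the loop easy to test; in a real client the default time.sleep is used.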

OpenAPI 3

openapi: 3.0.1
info:
  title: "<REST_API_NAME>"
  version: '2023-12-13T12:12:15Z'
servers:
- url: https://<HOST>.execute-api.<REGION>.amazonaws.com/{basePath}
  variables:
    basePath:
      default: prod
paths:
  "/list_foundation_models":
    get:
      responses:
        '401':
          description: 401 response
          headers:
            Access-Control-Allow-Origin:
              schema:
                type: string
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/Error"
      security:
      - api_key: []
  "/invoke_model":
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/InvokeModelRequest'
      parameters:
      - name: model_id
        in: query
        required: true
        schema:
          type: string
        description: Id of the base model to invoke
      - name: model_arn
        in: query
        required: true
        schema:
          type: string
        description: ARN of the custom model in Amazon Bedrock
      - name: requestId
        in: query
        required: false
        schema:
          type: string
        description: Request ID for long-polling functionality. Requires streaming=true
      - name: team_id
        in: header
        required: true
        schema:
          type: string
      - name: messages_api
        in: header
        required: false
        schema:
          type: string
      - name: streaming
        in: header
        required: false
        schema:
          type: string
      - name: type
        in: header
        required: false
        schema:
          type: string
      responses:
        '401':
          description: 401 response
          headers:
            Access-Control-Allow-Origin:
              schema:
                type: string
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/Error"
      security:
      - api_key: []
components:
  schemas:
    InvokeModelRequest:
      type: object
      required:
        - inputs
        - parameters
      properties:
        inputs:
          $ref: '#/components/schemas/Prompt'
        parameters:
          $ref: '#/components/schemas/ModelParameters'
    Prompt:
      type: object
      example:
        - role: 'user'
          content: 'What is Amazon Bedrock?'
    ModelParameters:
      type: object
      properties:
        maxTokens:
          type: integer
        temperature:
          type: number
        topP:
          type: number
        stopSequences:
          type: array
          items:
            type: string
        system:
          type: string
    Error:
      title: Error Schema
      type: object
      properties:
        message:
          type: string
  securitySchemes:
    api_key:
      type: apiKey
      name: x-api-key
      in: header
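As an illustration of the specification above, the following Python sketch builds (without sending) a POST /invoke_model request using only the standard library. The header and query parameter names come from the spec; the example API URL, the "true"/"false" encoding of the streaming and messages_api headers, and sending only model_id (not model_arn) for a base model are assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_invoke_request(api_url, api_key, team_id, model_id, messages,
                         parameters=None, streaming=False, messages_api=True,
                         model_arn=None):
    """Build, but do not send, a POST /invoke_model request.

    api_url is the stage URL, e.g. https://abc123.execute-api.us-east-1.amazonaws.com/prod
    """
    query = {"model_id": model_id}
    if model_arn:  # only for custom models in Amazon Bedrock
        query["model_arn"] = model_arn
    url = f"{api_url.rstrip('/')}/invoke_model?{urllib.parse.urlencode(query)}"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,  # api_key security scheme
        "team_id": team_id,  # tenant identifier used for usage and cost tracking
        # Assumption: boolean headers are sent as "true"/"false" strings.
        "streaming": str(streaming).lower(),
        "messages_api": str(messages_api).lower(),
    }
    body = {"inputs": messages, "parameters": parameters or {}}
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

To send it, pass the result to urllib.request.urlopen, or reuse the same URL, headers, and body with any HTTP client.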

Reporting Costs Example

team_id	model_id	input_tokens	output_tokens	invocations	input_cost	output_cost
tenant1	amazon.titan-tg1-large	24000	2473	1000	0.0072	0.00099
tenant1	anthropic.claude-v2	2448	4800	24	0.02698	0.15686
tenant2	amazon.titan-tg1-large	35000	52500	350	0.0105	0.021
tenant2	ai21.j2-grande-instruct	4590	9000	45	0.05738	0.1125
tenant2	anthropic.claude-v2	1080	4400	20	0.0119	0.14379
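The cost columns follow a per-1,000-token pricing model: cost = tokens / 1000 × price per 1,000 tokens. The Python sketch below reproduces rows of the table using per-1,000-token prices back-computed from the table itself (for example, 2448 input tokens × 0.01102 / 1000 ≈ 0.02698 for anthropic.claude-v2). These prices are illustrative only and are not current Amazon Bedrock pricing.

```python
# Per-1,000-token prices back-computed from the example table (illustrative only;
# check current Amazon Bedrock pricing before using real numbers).
PRICE_PER_1K_TOKENS = {
    "amazon.titan-tg1-large": {"input": 0.0003, "output": 0.0004},
    "anthropic.claude-v2": {"input": 0.01102, "output": 0.03268},
    "ai21.j2-grande-instruct": {"input": 0.0125, "output": 0.0125},
}


def token_cost(model_id, input_tokens, output_tokens, prices=PRICE_PER_1K_TOKENS):
    """Return (input_cost, output_cost) as tokens / 1000 * price per 1,000 tokens."""
    p = prices[model_id]
    return input_tokens / 1000 * p["input"], output_tokens / 1000 * p["output"]
```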

Deploy Stack

Note

The following examples show the structure of the configuration file. Check setup/configs.json for the most up-to-date version of the file.

Full Deployment

Step 1

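Edit the global configs used in the CDK stack. For each organizational unit that requires a dedicated multi-tenant SaaS environment, create an entry in setup/configs.json. The # comments below are annotations only; JSON does not allow comments, so leave them out of the actual file.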

[
  {
    "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
    "BEDROCK_ENDPOINT": "https://bedrock-runtime.{}.amazonaws.com", # bedrock-runtime endpoint used for invoking Amazon Bedrock
    "BEDROCK_REQUIREMENTS": "boto3>=1.34.62 awscli>=1.32.62 botocore>=1.34.62", # Requirements for Amazon Bedrock
    "LANGCHAIN_REQUIREMENTS": "aws-lambda-powertools langchain==0.1.12 pydantic PyYaml", # Python modules installed for the LangChain layer
    "PANDAS_REQUIREMENTS": "pandas", # Python modules installed for the pandas layer
    "VPC_CIDR": "10.10.0.0/16", # CIDR used for the private VPC environment
    "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
    "API_BURST_RATE": 5000 # Burst limit assigned to the usage plan
  },
  {
    "STACK_PREFIX": "", # unit 2 with dedicated SaaS resources
    "BEDROCK_ENDPOINT": "https://bedrock-runtime.{}.amazonaws.com", # bedrock-runtime endpoint used for invoking Amazon Bedrock
    "BEDROCK_REQUIREMENTS": "boto3>=1.34.62 awscli>=1.32.62 botocore>=1.34.62", # Requirements for Amazon Bedrock
    "LANGCHAIN_REQUIREMENTS": "aws-lambda-powertools langchain==0.1.12 pydantic PyYaml", # Python modules installed for the LangChain layer
    "PANDAS_REQUIREMENTS": "pandas", # Python modules installed for the pandas layer
    "VPC_CIDR": "10.20.0.0/16", # CIDR used for the private VPC environment
    "API_THROTTLING_RATE": 10000,
    "API_BURST_RATE": 5000
  }
]
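Before running the deployment script, it can help to sanity-check each Full Deployment entry. This Python sketch assumes a comment-free setup/configs.json and a hypothetical set of required keys taken from the example above. It validates VPC_CIDR with the standard ipaddress module and fills the {} placeholder in BEDROCK_ENDPOINT with a Region, which is an assumption about how that placeholder is used.

```python
import ipaddress
import json

# Hypothetical minimum set of keys for a Full Deployment entry (from the example above).
REQUIRED_FULL_KEYS = {"STACK_PREFIX", "BEDROCK_ENDPOINT", "VPC_CIDR",
                      "API_THROTTLING_RATE", "API_BURST_RATE"}


def load_configs(path="setup/configs.json"):
    """Load the list of unit entries; the real file must not contain # comments."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_full_deployment_entry(entry, region):
    """Validate one entry and return its Bedrock endpoint for the given Region."""
    missing = REQUIRED_FULL_KEYS - set(entry)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    ipaddress.ip_network(entry["VPC_CIDR"])  # raises ValueError for an invalid CIDR
    for key in ("API_THROTTLING_RATE", "API_BURST_RATE"):
        if not (isinstance(entry[key], int) and entry[key] > 0):
            raise ValueError(f"{key} must be a positive integer")
    # Assumption: the {} placeholder in BEDROCK_ENDPOINT stands for the AWS Region.
    return entry["BEDROCK_ENDPOINT"].format(region)
```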

Step 2

Execute the following commands:

chmod +x deploy_stack.sh

./deploy_stack.sh

API Key Deployment

Step 1

Option 1

Edit the global configs used in the CDK stack. For each organizational unit that requires a dedicated API key associated with an already created API Gateway REST API, create an entry in setup/configs.json that specifies API_GATEWAY_ID and API_GATEWAY_RESOURCE_ID:

[
  {
    "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
    "API_GATEWAY_ID": "", # REST API ID
    "API_GATEWAY_RESOURCE_ID": "", # Resource ID of the REST API
    "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
    "API_BURST_RATE": 5000 # Burst limit assigned to the usage plan
  }
]

Option 2

Edit the global configs used in the CDK stack. For each organizational unit that requires a dedicated API key associated with an already created API Gateway REST API, create an entry in setup/configs.json that specifies PARENT_STACK_PREFIX:

[
  {
    "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
    "PARENT_STACK_PREFIX": "", # prefix of the parent unit whose configuration you want to import
    "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
    "API_BURST_RATE": 5000 # Burst limit assigned to the usage plan
  }
]
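The two options differ only in which keys an entry sets. The following Python sketch (a hypothetical helper, not part of deploy_stack.sh) reports which option an API Key Deployment entry uses; treating the two options as mutually exclusive is an assumption.

```python
def api_key_option(entry):
    """Return 1 or 2 depending on which API Key Deployment option an entry uses."""
    # Option 1: the existing REST API is referenced directly by its IDs.
    has_ids = bool(entry.get("API_GATEWAY_ID")) and bool(entry.get("API_GATEWAY_RESOURCE_ID"))
    # Option 2: the configuration is imported from a parent unit.
    has_parent = bool(entry.get("PARENT_STACK_PREFIX"))
    if has_ids == has_parent:
        raise ValueError(
            "set either API_GATEWAY_ID and API_GATEWAY_RESOURCE_ID (Option 1) "
            "or PARENT_STACK_PREFIX (Option 2), not both or neither"
        )
    return 1 if has_ids else 2
```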

Step 2

Execute the following commands:

chmod +x deploy_stack.sh

./deploy_stack.sh

SageMaker Endpoints

Add FMs through Amazon SageMaker:

You can expose foundation models hosted in Amazon SageMaker by setting SAGEMAKER_ENDPOINTS to a string that maps each model's display name to its SageMaker endpoint name, as in the example below:

[
  {
    "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
    "BEDROCK_ENDPOINT": "https://bedrock-runtime.{}.amazonaws.com", # bedrock-runtime endpoint used for invoking Amazon Bedrock
    "BEDROCK_REQUIREMENTS": "boto3>=1.34.62 awscli>=1.32.62 botocore>=1.34.62", # Requirements for Amazon Bedrock
    "LANGCHAIN_REQUIREMENTS": "aws-lambda-powertools langchain==0.1.12 pydantic PyYaml", # Python modules installed for the LangChain layer
    "PANDAS_REQUIREMENTS": "pandas", # Python modules installed for the pandas layer
    "VPC_CIDR": "10.10.0.0/16", # CIDR used for the private VPC environment
    "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
    "API_BURST_RATE": 5000, # Burst limit assigned to the usage plan
    "SAGEMAKER_ENDPOINTS": "{'Mixtral 8x7B': 'Mixtral-SM-Endpoint'}" # Map of model display names to SageMaker endpoint names
  }
]
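The SAGEMAKER_ENDPOINTS value in the example uses single quotes, so it is a Python dict literal rather than strict JSON, and json.loads would reject it. The Python sketch below shows one safe way to parse it with ast.literal_eval; whether the stack itself parses the value this way is an assumption.

```python
import ast


def parse_sagemaker_endpoints(value):
    """Turn the SAGEMAKER_ENDPOINTS string into a {display_name: endpoint_name} dict."""
    # literal_eval only evaluates Python literals, so it cannot run arbitrary code.
    endpoints = ast.literal_eval(value) if value else {}
    if not isinstance(endpoints, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in endpoints.items()
    ):
        raise ValueError("SAGEMAKER_ENDPOINTS must map model names to endpoint names")
    return endpoints
```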

InferenceComponentName with SageMaker Endpoint

You can also specify an InferenceComponentName for the model invocation, for SageMaker endpoints that host models as inference components. Refer to the notebook 01_bedrock_api.ipynb for an example.
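For reference, the boto3 SageMaker Runtime invoke_endpoint call accepts an optional InferenceComponentName argument. The Python sketch below shows how a Lambda function might pass it through. The helper name and the TGI-style {"inputs": ..., "parameters": ...} payload are assumptions, and the client argument is expected to be boto3.client("sagemaker-runtime").

```python
import json


def invoke_sagemaker_endpoint(client, endpoint_name, prompt, parameters=None,
                              inference_component_name=None):
    """Invoke a SageMaker endpoint, optionally targeting one inference component."""
    kwargs = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        # Assumption: a TGI / JumpStart style text-generation payload.
        "Body": json.dumps({"inputs": prompt, "parameters": parameters or {}}),
    }
    if inference_component_name:
        kwargs["InferenceComponentName"] = inference_component_name
    response = client.invoke_endpoint(**kwargs)
    # response["Body"] is a streaming body; read it fully and decode the JSON result.
    return json.loads(response["Body"].read())


# Example usage (hypothetical inference component name):
# import boto3
# client = boto3.client("sagemaker-runtime")
# invoke_sagemaker_endpoint(client, "Mixtral-SM-Endpoint", "What is Amazon Bedrock?",
#                           {"max_new_tokens": 256}, "my-inference-component")
```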

Important note

Amazon SageMaker Hosting provides flexibility in defining the inference container. This solution currently supports the general-purpose inference scripts provided by SageMaker JumpStart and the Hugging Face TGI container.

If you use custom inference scripts, you must adapt the Lambda functions invoke_model and invoke_model_streaming.

Reading resources

For additional reading, refer to:

aws-solutions-library-samples/guidance-for-a-multi-tenant-generative-ai-gateway-with-cost-and-usage-tracking-on-aws
