# Deploying a Generative AI Toolkit agent on AWS Lambda

Here's a minimal rundown of what it takes to deploy your Generative AI Toolkit agent on **AWS Lambda** and expose it as a **Function URL**.


## Step 1: Create a DynamoDB table for conversation history and traces

To persist conversation history and traces, the agent needs a DynamoDB table. Generative AI Toolkit expects a table with partition key `pk`, sort key `sk`, and a global secondary index (GSI) with partition key `conversation_id` and sort key `sk`.

Here's how to quickly create one with the CLI:


In [None]:
!aws dynamodb create-table \
  --table-name MyAgentTable \
  --attribute-definitions \
    AttributeName=pk,AttributeType=S \
    AttributeName=sk,AttributeType=S \
    AttributeName=conversation_id,AttributeType=S \
  --key-schema \
    AttributeName=pk,KeyType=HASH \
    AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --global-secondary-indexes '[{"IndexName":"by_conversation_id","KeySchema":[{"AttributeName":"conversation_id","KeyType":"HASH"},{"AttributeName":"sk","KeyType":"RANGE"}],"Projection":{"ProjectionType":"ALL"}}]'
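
If you prefer Python over the CLI, the same table definition can be written as keyword arguments for boto3's `create_table`. This is only a sketch: the helper name `build_create_table_params` is hypothetical, and the commented-out call assumes boto3 is installed and AWS credentials are configured.

```python
def build_create_table_params(table_name: str) -> dict:
    """Return create_table kwargs mirroring the CLI command (hypothetical helper)."""
    return {
        "TableName": table_name,
        # Only attributes used in a key schema need to be declared.
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
            {"AttributeName": "conversation_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "sk", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_conversation_id",
                "KeySchema": [
                    {"AttributeName": "conversation_id", "KeyType": "HASH"},
                    {"AttributeName": "sk", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


params = build_create_table_params("MyAgentTable")

# To actually create the table (assumes boto3 and AWS credentials):
# import boto3
# boto3.client("dynamodb").create_table(**params)
```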

Here's CDK code for the same, with two additions:

1. A stream is configured on the table. You would use that for calculating metrics on your deployed agents (see other docs, we won't do that in this notebook).
2. A `timeToLiveAttribute` is configured, so traces are automatically deleted after 30 days (configurable, doesn't apply to conversation history).

```typescript
// Create table:
const agentTable = new cdk.aws_dynamodb.Table(this, "MyAgentTable", {
    partitionKey: {
      name: "pk",
      type: cdk.aws_dynamodb.AttributeType.STRING,
    },
    sortKey: {
      name: "sk",
      type: cdk.aws_dynamodb.AttributeType.STRING,
    },
    removalPolicy: cdk.RemovalPolicy.DESTROY,
    stream: cdk.aws_dynamodb.StreamViewType.NEW_IMAGE,
    pointInTimeRecovery: true,
    encryption: cdk.aws_dynamodb.TableEncryption.CUSTOMER_MANAGED,
    billingMode: cdk.aws_dynamodb.BillingMode.PAY_PER_REQUEST,
    timeToLiveAttribute: "expire_at",
});
// GSI:
agentTable.addGlobalSecondaryIndex({
    indexName: "by_conversation_id",
    partitionKey: {
      name: "conversation_id",
      type: cdk.aws_dynamodb.AttributeType.STRING,
    },
    sortKey: {
      name: "sk",
      type: cdk.aws_dynamodb.AttributeType.STRING,
    },
    projectionType: cdk.aws_dynamodb.ProjectionType.ALL,
});
```
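
DynamoDB's TTL feature expects the configured attribute (`expire_at` here) to hold a Unix epoch timestamp in seconds, stored as a number. The sketch below shows how a 30-day expiry could be computed. The helper name `expiry_timestamp` is hypothetical, and how the toolkit itself sets this attribute is not shown in this notebook.

```python
from datetime import datetime, timedelta, timezone


def expiry_timestamp(created_at: datetime, days: int = 30) -> int:
    """Return the epoch second at which an item created at `created_at` expires."""
    return int((created_at + timedelta(days=days)).timestamp())


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
expire_at = expiry_timestamp(created)
print(expire_at)  # an integer number of seconds, suitable for a TTL attribute
```

Note that DynamoDB deletes expired items in the background, so an item can remain readable for some time after its `expire_at` has passed.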

## Step 2: Create Agent and Tool

A minimal Generative AI Toolkit agent subclasses `BedrockConverseAgent`, sets a system prompt, and registers a single tool:

In [None]:
from generative_ai_toolkit.agent import BedrockConverseAgent
from generative_ai_toolkit.conversation_history import DynamoDbConversationHistory
from generative_ai_toolkit.tracer.dynamodb import DynamoDbTracer


class MyAgent(BedrockConverseAgent):
    def __init__(self):
        super().__init__(
            model_id="anthropic.claude-3-haiku-20240307-v1:0",
            temperature=0.0,
            system_prompt="You are a helpful assistant. Use your tools to help the user as well as you can.",
            conversation_history=DynamoDbConversationHistory(
                table_name="MyAgentTable"
            ),
            tracer=DynamoDbTracer(table_name="MyAgentTable"),
        )

def get_current_weather_report(city_name: str):
    """
    Gets the current weather report for a city.

    Parameters
    ----------
    city_name : str
        The city name, e.g. "New York City", "Paris", "Amsterdam"
    """

    return f"It's currently very sunny in {city_name}."

my_agent = MyAgent()
my_agent.register_tool(get_current_weather_report)

## Step 3: Quick local test of your agent

The agent will invoke the LLM and the tool to help you:

In [None]:
my_agent.reset() # Ensure we're in a new conversation, should you execute this notebook cell multiple times ;)

for tokens in my_agent.converse_stream("Hi there! I'm in a train to Amsterdam. Tell me what the weather is there currently."):
    print(tokens, end="", flush=True)

See everything that happened under the hood by inspecting the traces:

In [None]:
for trace in my_agent.traces:
    print(trace.as_human_readable())

There's also a Web UI to inspect the traces:

In [None]:
from generative_ai_toolkit.ui import traces_ui
demo = traces_ui(my_agent.traces)
demo.launch()

## Step 4: Deploying to AWS Lambda

In this example, we'll package our Lambda function as a container image built from a Dockerfile (this is just one option).

### Dockerfile

Here's a Dockerfile that would work. Note that we're using the [AWS Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to run our agent as an HTTP server inside Lambda. This works well for response streaming when the function is exposed through a Function URL.

```Dockerfile
# Docker file:
FROM public.ecr.aws/docker/library/python:3.12-slim

# AWS Lambda Web Adapter
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.4 /lambda-adapter /opt/extensions/lambda-adapter

ENV UV_COMPILE_BYTECODE=1 \
    UV_SYSTEM_PYTHON=1

WORKDIR /var/task

COPY --from=ghcr.io/astral-sh/uv:0.4.0 /uv /bin/uv

RUN uv pip install "generative_ai_toolkit[run-agent]"

# This presumes you have saved the agent code above in agent.py,
# and that agent.py also exposes the `app` callable referenced in CMD below:
COPY agent.py ./

CMD ["gunicorn", "-b=:8080", "agent:app()"]
```
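
With `agent:app()`, gunicorn imports `agent.py`, calls `app()` once at startup, and serves the WSGI callable that it returns. The sketch below is a generic stand-in that only illustrates this factory pattern with chunked output. It is not the toolkit's own runner, and `fake_agent_stream` is a hypothetical placeholder for a call such as `my_agent.converse_stream(...)`.

```python
import json


def fake_agent_stream(user_input: str):
    """Hypothetical placeholder that yields text chunks like an agent would."""
    for word in f"You said: {user_input}".split(" "):
        yield word + " "


def app():
    """Application factory: gunicorn calls this when given 'agent:app()'."""

    def wsgi_app(environ, start_response):
        # Read and parse the JSON request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = json.loads(environ["wsgi.input"].read(length) or b"{}")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        # Returning a generator lets the server send each chunk as it is produced.
        return (
            chunk.encode("utf-8")
            for chunk in fake_agent_stream(body.get("user_input", ""))
        )

    return wsgi_app
```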

### AWS CDK

Here's the AWS CDK code to deploy the Lambda function and enable the Function URL:

```typescript
// Function:
const fn = new cdk.aws_lambda.DockerImageFunction(this, "Agent", {
  code: cdk.aws_lambda.DockerImageCode.fromImageAsset(
    // Let's presume this dir is where you saved both the agent.py and the Dockerfile:
    path.join(__dirname, "my-agent-dir")
  ),
  memorySize: 1024,
  timeout: cdk.Duration.minutes(5),
  environment: {
    AWS_LWA_INVOKE_MODE: "RESPONSE_STREAM",
    CONVERSATION_HISTORY_TABLE_NAME: agentTable.tableName,
    TRACES_TABLE_NAME: agentTable.tableName,
  },
});

// Read write permission on the DynamoDB table:
agentTable.grantReadWriteData(fn);

// Permission to invoke Bedrock LLMs
fn.addToRolePolicy(
  new cdk.aws_iam.PolicyStatement({
    actions: [
      "bedrock:InvokeModelWithResponseStream",
      "bedrock:InvokeModel",
    ],
    resources: [`arn:${cdk.Aws.Partition}:bedrock:${cdk.Aws.Region}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0`],
    effect: cdk.aws_iam.Effect.ALLOW,
  })
);

// Expose as Function URL
const lambdaUrl = fn.addFunctionUrl({
  authType: cdk.aws_lambda.FunctionUrlAuthType.AWS_IAM,
  invokeMode: cdk.aws_lambda.InvokeMode.RESPONSE_STREAM,
});
```
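
The CDK code passes the table name to the function through `CONVERSATION_HISTORY_TABLE_NAME` and `TRACES_TABLE_NAME`, while the agent in Step 2 hard-codes `"MyAgentTable"`. Because CDK generates the physical table name unless `tableName` is set explicitly, the hard-coded name will generally not match the deployed table. One way to keep the two in sync is to read the environment variables in `agent.py` and fall back to the local name. This is a sketch: the helper `table_name_from_env` is hypothetical, and whether the toolkit picks up these variables on its own is not covered here.

```python
import os


def table_name_from_env(var_name: str, default: str = "MyAgentTable", env=None) -> str:
    """Return the table name from the environment, or `default` if unset or empty."""
    env = os.environ if env is None else env
    return env.get(var_name) or default


history_table = table_name_from_env("CONVERSATION_HISTORY_TABLE_NAME")
traces_table = table_name_from_env("TRACES_TABLE_NAME")
```

The resulting values could then be passed as `DynamoDbConversationHistory(table_name=history_table)` and `DynamoDbTracer(table_name=traces_table)` in the agent's constructor.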

## Step 5: Invoke your agent

We can use curl, which supports AWS SigV4 request signing (needed because the Function URL uses `AWS_IAM` auth). Pass the value of `my_agent.conversation_id` as `CONVERSATION_ID` and the agent will continue the conversation, i.e. understand that you were talking about "Amsterdam" earlier:

In [None]:
!curl -v \
  https://your-lambda-function-url \
  --data '{"user_input": "And what are some touristic highlights of the city?"}' \
  --header "x-conversation-id: $CONVERSATION_ID" \
  --header "Content-Type: application/json" \
  --header "x-amz-security-token: $AWS_SESSION_TOKEN" \
  --no-buffer \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  --aws-sigv4 "aws:amz:$AWS_REGION:lambda"

The Generative AI Toolkit also includes helper code to invoke the Lambda function URL programmatically from Python:

In [None]:
from generative_ai_toolkit.utils.lambda_url import IamAuthInvoker

lambda_url_invoker = IamAuthInvoker("https://<your-lambda-function-url>")
response = lambda_url_invoker.converse_stream(
    user_input="And what are some famous museums there?",
    conversation_id=my_agent.conversation_id
)

print("Conversation ID:", response.conversation_id)
print()
for tokens in response:
    print(tokens, end="", flush=True)

That's it!

Happy coding!