8 changes: 8 additions & 0 deletions .env.template
@@ -9,6 +9,14 @@ COSMOS_DB_DATABASE=ai_memory
COSMOS_DB_CONTAINER=memories
COSMOS_DB_COUNTERS_CONTAINER=counter
COSMOS_DB_LEASE_CONTAINER=leases
# Throughput mode for all required Cosmos DB containers created by the toolkit
# (memories, counter, and lease).
# - serverless: default. The toolkit does not send container RU/s settings.
# Use this only with a Cosmos DB account configured for serverless.
# - autoscale: the toolkit provisions all required containers with autoscale
# throughput using COSMOS_DB_AUTOSCALE_MAX_RU as the max RU/s cap.
# Default max RU/s is 1000.
COSMOS_DB_THROUGHPUT_MODE=serverless
COSMOS_DB_AUTOSCALE_MAX_RU=1000

# ---- Change Feed Thresholds (set to 0 to disable) ----
6 changes: 3 additions & 3 deletions Docs/README.md
@@ -6,9 +6,9 @@ This folder contains the main project documentation for Agent Memory Toolkit.

| Document | Purpose |
|----------|---------|
| [concepts.md](concepts.md) | Explains the core memory model, including memory types (turn, summary, fact, user summary), threads, roles, the processing pipeline, and automatic change feed processing. |
| [local_testing.md](local_testing.md) | Covers local setup, environment configuration, RBAC, Cosmos provisioning, running the toolkit and Azure Functions locally, and testing change feed auto-processing. |
| [azure_testing.md](azure_testing.md) | Covers Azure deployment, cloud configuration, required services, change feed settings, and validation steps for running the toolkit in Azure. |
| [concepts.md](concepts.md) | Explains the core memory model, including memory types (turn, summary, fact, user summary), threads, roles, the processing pipeline, automatic change feed processing, and shared Cosmos throughput configuration. |
| [local_testing.md](local_testing.md) | Covers local setup, environment configuration, RBAC, Cosmos provisioning, running the toolkit and Azure Functions locally, and testing change feed auto-processing with serverless or autoscale container provisioning. |
| [azure_testing.md](azure_testing.md) | Covers Azure deployment, cloud configuration, required services, change feed settings, throughput mode configuration, and validation steps for running the toolkit in Azure. |
| [design_patterns.md](design_patterns.md) | Shows when and how to call CRUD operations, summarization, fact extraction, and memory retrieval in chat and multi-agent applications, including automatic processing via the change feed. |

## Recommended Reading Order
27 changes: 23 additions & 4 deletions Docs/azure_testing.md
@@ -1,6 +1,6 @@
# Deploying and Testing Agent Memory Toolkit in Azure

This guide covers the minimum Azure resources, deployment steps, and validation order for running the toolkit in Azure.
This guide covers the minimum Azure resources, deployment steps, throughput settings, and validation order for running the toolkit in Azure.

---

@@ -71,7 +71,7 @@ az cosmosdb create \
--resource-group <resource-group>
```

The toolkit can create the database and container later via `create_memory_store()`.
The toolkit can create the database and required containers later via `create_memory_store()`.

---

@@ -104,13 +104,18 @@ az functionapp config appsettings set \
COSMOS_DB_ENDPOINT="https://<cosmos-account-name>.documents.azure.com:443/" \
COSMOS_DB_DATABASE="ai_memory" \
COSMOS_DB_CONTAINER="memories" \
COSMOS_DB_COUNTERS_CONTAINER="counter" \
COSMOS_DB_LEASE_CONTAINER="leases" \
COSMOS_DB_THROUGHPUT_MODE="serverless" \
COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
AI_FOUNDRY_ENDPOINT="https://<openai-account-name>.openai.azure.com/" \
EMBEDDING_MODEL="text-embedding-3-large" \
EMBEDDING_DIMENSIONS="1536" \
LLM_MODEL="gpt-5-mini"
```

`COSMOS_DB_THROUGHPUT_MODE=serverless` is the default and creates the `memories`, `counter`, and `leases` containers without specifying RU/s. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all required containers.
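
As a rough illustration of how these two settings interact, the sketch below resolves them from a mapping of environment values. The helper name `resolve_throughput` and its validation rules (case-insensitive mode, positive integer cap) are assumptions made for this example and are not taken from the toolkit's source.

```python
import os

# Modes described in the docs above; anything else is rejected in this sketch.
VALID_MODES = ("serverless", "autoscale")


def resolve_throughput(env=None):
    """Return (mode, max_ru). max_ru is None when the mode is serverless.

    Hypothetical helper: the toolkit's real parsing may differ.
    """
    env = os.environ if env is None else env
    mode = env.get("COSMOS_DB_THROUGHPUT_MODE", "serverless").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported COSMOS_DB_THROUGHPUT_MODE: {mode!r}")
    if mode == "serverless":
        # Serverless: no RU/s value is sent, so the cap is ignored.
        return mode, None
    max_ru = int(env.get("COSMOS_DB_AUTOSCALE_MAX_RU", "1000"))
    if max_ru <= 0:
        raise ValueError("COSMOS_DB_AUTOSCALE_MAX_RU must be a positive integer")
    return mode, max_ru
```

Calling `resolve_throughput({})` falls back to serverless with no cap, while passing `{"COSMOS_DB_THROUGHPUT_MODE": "autoscale"}` yields the documented default cap of 1000 RU/s.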

### Change feed settings (optional)

To enable automatic processing via the change feed trigger, add these settings:
@@ -122,14 +127,17 @@ az functionapp config appsettings set \
--settings \
COSMOS_DB__accountEndpoint="https://<cosmos-account-name>.documents.azure.com:443/" \
COSMOS_DB_COUNTERS_CONTAINER="counter" \
COSMOS_DB_LEASE_CONTAINER="leases" \
COSMOS_DB_THROUGHPUT_MODE="serverless" \
COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
THREAD_SUMMARY_EVERY_N="5" \
FACT_EXTRACTION_EVERY_N="3" \
USER_SUMMARY_EVERY_N="10"
```

Set any threshold to `"0"` to disable that processing type.
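
One plausible reading of the `*_EVERY_N` thresholds is sketched below: a processing type fires when the per-thread message count reaches a multiple of its threshold, and a threshold of `0` switches it off. The function name `should_process` and the modulo rule are assumptions for illustration; the actual trigger logic lives in the Functions code and may differ.

```python
def should_process(message_count: int, every_n: int) -> bool:
    """Decide whether a processing step should run for this message count.

    Hypothetical rule: run on every Nth message; N <= 0 disables the step.
    """
    if every_n <= 0:
        # "0" in the app settings disables this processing type.
        return False
    return message_count > 0 and message_count % every_n == 0


# Example with the default thresholds from the settings above.
thresholds = {"thread_summary": 5, "fact_extraction": 3, "user_summary": 10}
due_at_15 = sorted(name for name, n in thresholds.items() if should_process(15, n))
```

With these defaults, message 15 would trigger thread summarization and fact extraction but not the user summary.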

The `leases` container is created automatically by the Azure Functions runtime.
The `leases` container is provisioned by `create_memory_store()` alongside the `memories` and `counter` containers, so the Function App should be configured to use that existing lease container.

If you use function-key auth for the HTTP trigger, keep the key for the client as `ADF_KEY`.

@@ -161,6 +169,9 @@ Update `.env` to point at Azure instead of localhost:
COSMOS_DB_ENDPOINT=https://<cosmos-account-name>.documents.azure.com:443/
COSMOS_DB_DATABASE=ai_memory
COSMOS_DB_CONTAINER=memories
COSMOS_DB_COUNTERS_CONTAINER=counter
COSMOS_DB_LEASE_CONTAINER=leases
COSMOS_DB_THROUGHPUT_MODE=serverless
COSMOS_DB_AUTOSCALE_MAX_RU=1000

AI_FOUNDRY_ENDPOINT=https://<openai-account-name>.openai.azure.com/
@@ -192,6 +203,10 @@ memory = AgentMemory(
cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
adf_endpoint=os.getenv("ADF_ENDPOINT"),
@@ -218,6 +233,10 @@ memory = AsyncAgentMemory(
cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
adf_endpoint=os.getenv("ADF_ENDPOINT"),
@@ -235,7 +254,7 @@ await memory.connect_cosmos(
await memory.create_memory_store()
```

This provisions the hierarchical partition key (`user_id`, `thread_id`), vector index, full-text index, and autoscale throughput.
This provisions the `memories`, `counter`, and `leases` containers. `serverless` is the default throughput mode; if you set `COSMOS_DB_THROUGHPUT_MODE=autoscale`, the shared `COSMOS_DB_AUTOSCALE_MAX_RU` value is applied to all three containers.

---

11 changes: 10 additions & 1 deletion Docs/concepts.md
@@ -154,7 +154,16 @@ Set any value to `0` to disable that processing type. For example, setting `THRE
|-----------|---------------|---------|
| `memories` | `/user_id`, `/thread_id` (hierarchical) | Existing memory store |
| `counter` | `/user_id`, `/thread_id` (hierarchical) | Message count tracking for automatic processing |
| `leases` | `/id` | Auto-created by the trigger for change feed checkpointing |
| `leases` | `/id` | Change feed checkpointing container created by `create_memory_store()` |

### Throughput configuration

The toolkit provisions all required Cosmos containers under one shared throughput mode:

- `serverless` is the default. The toolkit creates the `memories`, `counter`, and `leases` containers without specifying RU/s.
- `autoscale` applies the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all three containers.

This keeps the change feed dependencies aligned with the main memory store instead of letting the Functions trigger create the lease container independently.
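
To make the shared-mode idea concrete, the sketch below builds a provisioning plan for the three containers from a single mode and cap. `ContainerSpec` and `plan_containers` are hypothetical names for this example; the partition key paths come from the table above, and the plan is plain data rather than a call into the Cosmos SDK.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ContainerSpec:
    name: str
    partition_key_paths: Tuple[str, ...]
    # None means "send no RU/s setting" (serverless).
    autoscale_max_ru: Optional[int]


def plan_containers(mode: str, max_ru: int = 1000) -> List[ContainerSpec]:
    """Return specs for memories, counter, and leases under one throughput mode."""
    if mode not in ("serverless", "autoscale"):
        raise ValueError(f"Unsupported throughput mode: {mode!r}")
    cap = max_ru if mode == "autoscale" else None
    hierarchical = ("/user_id", "/thread_id")
    return [
        ContainerSpec("memories", hierarchical, cap),
        ContainerSpec("counter", hierarchical, cap),
        ContainerSpec("leases", ("/id",), cap),
    ]
```

Because the cap is computed once and reused, the lease container cannot drift to a different throughput setting than the memory store in this sketch.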

### Push vs. pull

18 changes: 17 additions & 1 deletion Docs/local_testing.md
@@ -72,6 +72,9 @@ Minimum `.env` values:
COSMOS_DB_ENDPOINT=https://<your-account>.documents.azure.com:443/
COSMOS_DB_DATABASE=ai_memory
COSMOS_DB_CONTAINER=memories
COSMOS_DB_COUNTERS_CONTAINER=counter
COSMOS_DB_LEASE_CONTAINER=leases
COSMOS_DB_THROUGHPUT_MODE=serverless
COSMOS_DB_AUTOSCALE_MAX_RU=1000

AI_FOUNDRY_ENDPOINT=https://<your-project>.services.ai.azure.com/
@@ -85,13 +88,18 @@ ADF_KEY=

The Functions runtime uses `azure_functions/local.settings.json`, not `.env`, so mirror the same values there.
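
Mirroring by hand is error-prone, so a small script can copy the values across. The sketch below is one way to do it, assuming a simple `KEY=VALUE` `.env` format; the helper names `parse_env` and `merge_into_settings` are made up for this example and are not part of the toolkit. It relies only on the standard `Values` object of `local.settings.json`.

```python
import json


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def merge_into_settings(settings: dict, env_values: dict) -> dict:
    """Return a copy of settings with env_values merged into "Values"."""
    merged = dict(settings)
    # Values from .env win over existing entries in this sketch.
    merged["Values"] = {**settings.get("Values", {}), **env_values}
    return merged


env_text = "# Cosmos\nCOSMOS_DB_THROUGHPUT_MODE=serverless\nCOSMOS_DB_AUTOSCALE_MAX_RU=1000\n"
settings = {"IsEncrypted": False, "Values": {"FUNCTIONS_WORKER_RUNTIME": "python"}}
print(json.dumps(merge_into_settings(settings, parse_env(env_text)), indent=2))
```

In practice you would read `.env` and `azure_functions/local.settings.json` from disk and write the merged result back; the in-memory strings here keep the example self-contained.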

`COSMOS_DB_THROUGHPUT_MODE=serverless` is the default and creates the required Cosmos containers without specifying RU/s. If you set `COSMOS_DB_THROUGHPUT_MODE=autoscale`, the toolkit provisions the memories, counter, and lease containers with the shared max RU/s value from `COSMOS_DB_AUTOSCALE_MAX_RU`.

### Change feed settings (optional)

In `azure_functions/local.settings.json`, add these to enable automatic processing:

```json
"COSMOS_DB__accountEndpoint": "https://<your-account>.documents.azure.com:443/",
"COSMOS_DB_COUNTERS_CONTAINER": "counter",
"COSMOS_DB_LEASE_CONTAINER": "leases",
"COSMOS_DB_THROUGHPUT_MODE": "serverless",
"COSMOS_DB_AUTOSCALE_MAX_RU": "1000",
"THREAD_SUMMARY_EVERY_N": "5",
"FACT_EXTRACTION_EVERY_N": "3",
"USER_SUMMARY_EVERY_N": "10"
@@ -153,6 +161,10 @@ memory = AgentMemory(
cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
@@ -192,6 +204,10 @@ memory = AsyncAgentMemory(
cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
@@ -217,7 +233,7 @@ for r in results:
await memory.close()
```

`create_memory_store()` creates the database/container and configures the hierarchical partition key (`user_id`, `thread_id`), vector index, full-text index, and autoscale throughput.
`create_memory_store()` creates the database and required containers, configures the hierarchical partition key (`user_id`, `thread_id`) for memories and counters, uses `/id` for the lease container, and applies either serverless or autoscale throughput based on `COSMOS_DB_THROUGHPUT_MODE`.

---

10 changes: 8 additions & 2 deletions README.md
@@ -134,14 +134,20 @@ memory = CosmosMemoryClient(
cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
adf_key=os.getenv("ADF_KEY", ""),
use_default_credential=True,
cosmos_credential=DefaultAzureCredential(),
)
# Constructor auto-creates the database and container if they don't exist.
# Constructor auto-creates the database and required containers if they don't exist.
# `serverless` is the default throughput mode. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale`
# to provision memories, counter, and lease containers with a shared autoscale RU cap.

# Add directly to Cosmos
thread_id = str(uuid.uuid4())
@@ -187,7 +193,7 @@ summary = memory.get_user_summary(user_id="user-001")
| **Azure OpenAI / AI Foundry** | Embedding model + chat model for summarization / fact extraction |
| **Azure Functions** | Durable Functions orchestrator and activity functions |

Automatic change feed processing stores lightweight counter documents in a dedicated `counter` container and also uses a `leases` container (auto-created). See [concepts.md](Docs/concepts.md#automatic-processing-change-feed) for details.
Automatic change feed processing stores lightweight counter documents in a dedicated `counter` container and also uses a `leases` container that is provisioned by `create_memory_store()`. Throughput defaults to `serverless`; set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to the memories, counter, and lease containers. See [concepts.md](Docs/concepts.md#automatic-processing-change-feed) for details.

All services use **Entra ID** auth via `DefaultAzureCredential`.
