AzureCosmosDB · aayush3011 · Apr 22, 2026 · Apr 20, 2026 · Apr 20, 2026 · Apr 20, 2026
diff --git a/.env.template b/.env.template
@@ -9,6 +9,14 @@ COSMOS_DB_DATABASE=ai_memory
 COSMOS_DB_CONTAINER=memories
 COSMOS_DB_COUNTERS_CONTAINER=counter
 COSMOS_DB_LEASE_CONTAINER=leases
+# Throughput mode for all required Cosmos DB containers created by the toolkit
+# (memories, counter, and lease).
+# - serverless: default. The toolkit does not send container RU/s settings.
+#   Use this only with a Cosmos DB account configured for serverless.
+# - autoscale: the toolkit provisions all required containers with autoscale
+#   throughput using COSMOS_DB_AUTOSCALE_MAX_RU as the max RU/s cap.
+#   Default max RU/s is 1000.
+COSMOS_DB_THROUGHPUT_MODE=serverless
 COSMOS_DB_AUTOSCALE_MAX_RU=1000
 
 # ---- Change Feed Thresholds (set to 0 to disable) ----

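Review note: the template comments above describe two throughput modes and a shared autoscale cap. A minimal sketch of how a caller might read and validate those two variables follows. `ThroughputConfig` and `load_throughput_config` are hypothetical names that do not appear in this diff, and the toolkit's own parsing may differ. The check that the cap is at least 1000 and a multiple of 1000 reflects the Cosmos DB autoscale rules as I understand them, not anything this PR states.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_MODES = ("serverless", "autoscale")


@dataclass(frozen=True)
class ThroughputConfig:
    """Hypothetical holder for the two settings described in .env.template."""
    mode: str
    autoscale_max_ru: Optional[int]  # None when the mode is serverless


def load_throughput_config(env: Mapping[str, str] = os.environ) -> ThroughputConfig:
    # The template documents serverless as the default mode.
    mode = env.get("COSMOS_DB_THROUGHPUT_MODE", "serverless").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"COSMOS_DB_THROUGHPUT_MODE must be one of {VALID_MODES}, got {mode!r}")

    if mode == "serverless":
        # No RU/s settings are sent in serverless mode, so the cap is unused here.
        return ThroughputConfig(mode, None)

    # The template documents 1000 as the default max RU/s.
    max_ru = int(env.get("COSMOS_DB_AUTOSCALE_MAX_RU", "1000"))
    # Assumption: autoscale max RU/s starts at 1000 and moves in steps of 1000.
    if max_ru < 1000 or max_ru % 1000 != 0:
        raise ValueError("COSMOS_DB_AUTOSCALE_MAX_RU must be a multiple of 1000 and at least 1000")
    return ThroughputConfig(mode, max_ru)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests, for example `load_throughput_config({"COSMOS_DB_THROUGHPUT_MODE": "autoscale"})`.
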
diff --git a/Docs/README.md b/Docs/README.md
@@ -6,9 +6,9 @@ This folder contains the main project documentation for Agent Memory Toolkit.
 
 | Document | Purpose |
 |----------|---------|
-| [concepts.md](concepts.md) | Explains the core memory model, including memory types (turn, summary, fact, user summary), threads, roles, the processing pipeline, and automatic change feed processing. |
-| [local_testing.md](local_testing.md) | Covers local setup, environment configuration, RBAC, Cosmos provisioning, running the toolkit and Azure Functions locally, and testing change feed auto-processing. |
-| [azure_testing.md](azure_testing.md) | Covers Azure deployment, cloud configuration, required services, change feed settings, and validation steps for running the toolkit in Azure. |
+| [concepts.md](concepts.md) | Explains the core memory model, including memory types (turn, summary, fact, user summary), threads, roles, the processing pipeline, automatic change feed processing, and shared Cosmos throughput configuration. |
+| [local_testing.md](local_testing.md) | Covers local setup, environment configuration, RBAC, Cosmos provisioning, running the toolkit and Azure Functions locally, and testing change feed auto-processing with serverless or autoscale container provisioning. |
+| [azure_testing.md](azure_testing.md) | Covers Azure deployment, cloud configuration, required services, change feed settings, throughput mode configuration, and validation steps for running the toolkit in Azure. |
 | [design_patterns.md](design_patterns.md) | Shows when and how to call CRUD operations, summarization, fact extraction, and memory retrieval in chat and multi-agent applications, including automatic processing via the change feed. |
 
 ## Recommended Reading Order

diff --git a/Docs/azure_testing.md b/Docs/azure_testing.md
@@ -1,6 +1,6 @@
 # Deploying and Testing Agent Memory Toolkit in Azure
 
-This guide covers the minimum Azure resources, deployment steps, and validation order for running the toolkit in Azure.
+This guide covers the minimum Azure resources, deployment steps, throughput settings, and validation order for running the toolkit in Azure.
 
 ---
 
@@ -71,7 +71,7 @@ az cosmosdb create \
   --resource-group <resource-group>
 ```
 
-The toolkit can create the database and container later via `create_memory_store()`.
+The toolkit can create the database and required containers later via `create_memory_store()`.
 
 ---
 
@@ -104,13 +104,18 @@ az functionapp config appsettings set \
     COSMOS_DB_ENDPOINT="https://<cosmos-account-name>.documents.azure.com:443/" \
     COSMOS_DB_DATABASE="ai_memory" \
     COSMOS_DB_CONTAINER="memories" \
+    COSMOS_DB_COUNTERS_CONTAINER="counter" \
+    COSMOS_DB_LEASE_CONTAINER="leases" \
+    COSMOS_DB_THROUGHPUT_MODE="serverless" \
     COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
     AI_FOUNDRY_ENDPOINT="https://<openai-account-name>.openai.azure.com/" \
     EMBEDDING_MODEL="text-embedding-3-large" \
     EMBEDDING_DIMENSIONS="1536" \
     LLM_MODEL="gpt-5-mini"
 ```
 
+With `COSMOS_DB_THROUGHPUT_MODE=serverless` (the default), the toolkit creates the `memories`, `counter`, and `leases` containers without specifying RU/s; use this mode only with a Cosmos DB account configured for serverless. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all required containers.
+
 ### Change feed settings (optional)
 
 To enable automatic processing via the change feed trigger, add these settings:
@@ -122,14 +127,17 @@ az functionapp config appsettings set \
   --settings \
     COSMOS_DB__accountEndpoint="https://<cosmos-account-name>.documents.azure.com:443/" \
     COSMOS_DB_COUNTERS_CONTAINER="counter" \
+    COSMOS_DB_LEASE_CONTAINER="leases" \
+    COSMOS_DB_THROUGHPUT_MODE="serverless" \
+    COSMOS_DB_AUTOSCALE_MAX_RU="1000" \
     THREAD_SUMMARY_EVERY_N="5" \
     FACT_EXTRACTION_EVERY_N="3" \
     USER_SUMMARY_EVERY_N="10"
 ```
 
 Set any threshold to `"0"` to disable that processing type.
 
-The `leases` container is created automatically by the Azure Functions runtime.
+`create_memory_store()` provisions the `leases` container alongside the `memories` and `counter` containers, so point the Function App at that existing container through `COSMOS_DB_LEASE_CONTAINER` instead of relying on the Functions runtime to create one.
 
 If you use function-key auth for the HTTP trigger, keep the key for the client as `ADF_KEY`.
 
@@ -161,6 +169,9 @@ Update `.env` to point at Azure instead of localhost:
 COSMOS_DB_ENDPOINT=https://<cosmos-account-name>.documents.azure.com:443/
 COSMOS_DB_DATABASE=ai_memory
 COSMOS_DB_CONTAINER=memories
+COSMOS_DB_COUNTERS_CONTAINER=counter
+COSMOS_DB_LEASE_CONTAINER=leases
+COSMOS_DB_THROUGHPUT_MODE=serverless
 COSMOS_DB_AUTOSCALE_MAX_RU=1000
 
 AI_FOUNDRY_ENDPOINT=https://<openai-account-name>.openai.azure.com/
@@ -192,6 +203,10 @@ memory = AgentMemory(
     cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
     cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
     cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
     ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
     embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
     adf_endpoint=os.getenv("ADF_ENDPOINT"),
@@ -218,6 +233,10 @@ memory = AsyncAgentMemory(
     cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
     cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
     cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
     ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
     embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
     adf_endpoint=os.getenv("ADF_ENDPOINT"),
@@ -235,7 +254,7 @@ await memory.connect_cosmos(
 await memory.create_memory_store()
 ```
 
-This provisions the hierarchical partition key (`user_id`, `thread_id`), vector index, full-text index, and autoscale throughput.
+This provisions the `memories`, `counter`, and `leases` containers. `serverless` is the default throughput mode; if you set `COSMOS_DB_THROUGHPUT_MODE=autoscale`, the shared `COSMOS_DB_AUTOSCALE_MAX_RU` value is applied to all three containers.
 
 ---
 

diff --git a/Docs/concepts.md b/Docs/concepts.md
@@ -154,7 +154,16 @@ Set any value to `0` to disable that processing type. For example, setting `THRE
 |-----------|---------------|---------|
 | `memories` | `/user_id`, `/thread_id` (hierarchical) | Existing memory store |
 | `counter` | `/user_id`, `/thread_id` (hierarchical) | Message count tracking for automatic processing |
-| `leases` | `/id` | Auto-created by the trigger for change feed checkpointing |
+| `leases` | `/id` | Change feed checkpointing container created by `create_memory_store()` |
+
+### Throughput configuration
+
+The toolkit provisions all required Cosmos containers under one shared throughput mode:
+
+- `serverless` is the default. The toolkit creates the `memories`, `counter`, and `leases` containers without specifying RU/s.
+- `autoscale` applies the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to all three containers.
+
+This keeps the change feed dependencies aligned with the main memory store instead of letting the Functions trigger create the lease container independently.
 
 ### Push vs. pull
 

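Review note: the "Throughput configuration" section added above says one mode governs all three containers. The sketch below restates that rule as data, using the container names and partition key paths from the table in the same hunk. `plan_containers` is a hypothetical helper written for illustration. It does not call the Cosmos SDK, and how the toolkit actually passes these values to the service is not shown in this diff.

```python
from typing import Any, Dict, List


def plan_containers(mode: str = "serverless", autoscale_max_ru: int = 1000) -> List[Dict[str, Any]]:
    """Return one spec per required container under a single shared throughput mode."""
    if mode not in ("serverless", "autoscale"):
        raise ValueError(f"unsupported throughput mode: {mode!r}")

    # Names and partition key paths are taken from the containers table in concepts.md.
    specs: List[Dict[str, Any]] = [
        {"id": "memories", "partition_key_paths": ["/user_id", "/thread_id"]},
        {"id": "counter", "partition_key_paths": ["/user_id", "/thread_id"]},
        {"id": "leases", "partition_key_paths": ["/id"]},
    ]

    for spec in specs:
        # serverless: no RU/s value is attached to any container.
        # autoscale: the same cap is attached to every container.
        spec["autoscale_max_ru"] = autoscale_max_ru if mode == "autoscale" else None
    return specs


if __name__ == "__main__":
    for spec in plan_containers("autoscale", 1000):
        print(spec)
```

Keeping the cap in one place means the lease container cannot drift to a different throughput setting than the memory store, which is the alignment the section describes.
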
diff --git a/Docs/local_testing.md b/Docs/local_testing.md
@@ -72,6 +72,9 @@ Minimum `.env` values:
 COSMOS_DB_ENDPOINT=https://<your-account>.documents.azure.com:443/
 COSMOS_DB_DATABASE=ai_memory
 COSMOS_DB_CONTAINER=memories
+COSMOS_DB_COUNTERS_CONTAINER=counter
+COSMOS_DB_LEASE_CONTAINER=leases
+COSMOS_DB_THROUGHPUT_MODE=serverless
 COSMOS_DB_AUTOSCALE_MAX_RU=1000
 
 AI_FOUNDRY_ENDPOINT=https://<your-project>.services.ai.azure.com/
@@ -85,13 +88,18 @@ ADF_KEY=
 
 The Functions runtime uses `azure_functions/local.settings.json`, not `.env`, so mirror the same values there.
 
+With `COSMOS_DB_THROUGHPUT_MODE=serverless` (the default), the toolkit creates the required Cosmos containers without specifying RU/s. With `COSMOS_DB_THROUGHPUT_MODE=autoscale`, it provisions the `memories`, `counter`, and `leases` containers with the shared max RU/s value from `COSMOS_DB_AUTOSCALE_MAX_RU`.
+
 ### Change feed settings (optional)
 
 In `azure_functions/local.settings.json`, add these to enable automatic processing:
 
 ```json
 "COSMOS_DB__accountEndpoint": "https://<your-account>.documents.azure.com:443/",
 "COSMOS_DB_COUNTERS_CONTAINER": "counter",
+"COSMOS_DB_LEASE_CONTAINER": "leases",
+"COSMOS_DB_THROUGHPUT_MODE": "serverless",
+"COSMOS_DB_AUTOSCALE_MAX_RU": "1000",
 "THREAD_SUMMARY_EVERY_N": "5",
 "FACT_EXTRACTION_EVERY_N": "3",
 "USER_SUMMARY_EVERY_N": "10"
@@ -153,6 +161,10 @@ memory = AgentMemory(
     cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
     cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
     cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
     ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
     embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
     adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
@@ -192,6 +204,10 @@ memory = AsyncAgentMemory(
     cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
     cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
     cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
     ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
     embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
     adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
@@ -217,7 +233,7 @@ for r in results:
 await memory.close()
 ```
 
-`create_memory_store()` creates the database/container and configures the hierarchical partition key (`user_id`, `thread_id`), vector index, full-text index, and autoscale throughput.
+`create_memory_store()` creates the database and the required containers. It configures the hierarchical partition key (`user_id`, `thread_id`) for the `memories` and `counter` containers, uses `/id` for the `leases` container, and applies serverless or autoscale throughput according to `COSMOS_DB_THROUGHPUT_MODE`.
 
 ---
 

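Review note: local_testing.md says the Functions runtime reads `azure_functions/local.settings.json` rather than `.env`, so the new settings have to be mirrored by hand. A small sketch of automating that copy is below. `parse_env` and `merge_env_into_settings` are hypothetical helpers, the `.env` parsing is deliberately naive (no quoting or `export` handling), and the sketch assumes the usual `local.settings.json` layout with a top-level `Values` object.

```python
import json
from typing import Any, Dict, Iterable

# The settings this PR adds to both .env and local.settings.json.
SHARED_KEYS = (
    "COSMOS_DB_COUNTERS_CONTAINER",
    "COSMOS_DB_LEASE_CONTAINER",
    "COSMOS_DB_THROUGHPUT_MODE",
    "COSMOS_DB_AUTOSCALE_MAX_RU",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env_into_settings(
    settings: Dict[str, Any], env_text: str, keys: Iterable[str] = SHARED_KEYS
) -> Dict[str, Any]:
    """Return a copy of settings with the selected .env keys copied into Values."""
    env = parse_env(env_text)
    merged = dict(settings)
    values = dict(settings.get("Values", {}))  # keep existing entries untouched
    values.update({key: env[key] for key in keys if key in env})
    merged["Values"] = values
    return merged


if __name__ == "__main__":
    env_text = "COSMOS_DB_THROUGHPUT_MODE=serverless\nCOSMOS_DB_AUTOSCALE_MAX_RU=1000\n"
    print(json.dumps(merge_env_into_settings({"IsEncrypted": False, "Values": {}}, env_text), indent=2))
```

The merge returns a new dictionary and leaves any existing `Values` entries in place, so settings that are specific to the Functions host are not overwritten.
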
diff --git a/README.md b/README.md
@@ -134,14 +134,20 @@ memory = CosmosMemoryClient(
     cosmos_endpoint=os.getenv("COSMOS_DB_ENDPOINT"),
     cosmos_database=os.getenv("COSMOS_DB_DATABASE"),
     cosmos_container=os.getenv("COSMOS_DB_CONTAINER"),
+    cosmos_counter_container=os.getenv("COSMOS_DB_COUNTERS_CONTAINER", "counter"),
+    cosmos_lease_container=os.getenv("COSMOS_DB_LEASE_CONTAINER", "leases"),
+    cosmos_throughput_mode=os.getenv("COSMOS_DB_THROUGHPUT_MODE", "serverless"),
+    cosmos_autoscale_max_ru=int(os.getenv("COSMOS_DB_AUTOSCALE_MAX_RU", "1000")),
     ai_foundry_endpoint=os.getenv("AI_FOUNDRY_ENDPOINT"),
     embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-large"),
     adf_endpoint=os.getenv("ADF_ENDPOINT", "http://localhost:7071/api"),
     adf_key=os.getenv("ADF_KEY", ""),
     use_default_credential=True,
     cosmos_credential=DefaultAzureCredential(),
 )
-# Constructor auto-creates the database and container if they don't exist.
+# Constructor auto-creates the database and required containers if they don't exist.
+# `serverless` is the default throughput mode. Set `COSMOS_DB_THROUGHPUT_MODE=autoscale`
+# to provision memories, counter, and lease containers with a shared autoscale RU cap.
 
 # Add directly to Cosmos
 thread_id = str(uuid.uuid4())
@@ -187,7 +193,7 @@ summary = memory.get_user_summary(user_id="user-001")
 | **Azure OpenAI / AI Foundry** | Embedding model + chat model for summarization / fact extraction |
 | **Azure Functions** | Durable Functions orchestrator and activity functions |
 
-Automatic change feed processing stores lightweight counter documents in a dedicated `counter` container and also uses a `leases` container (auto-created). See [concepts.md](Docs/concepts.md#automatic-processing-change-feed) for details.
+Automatic change feed processing stores lightweight counter documents in a dedicated `counter` container and also uses a `leases` container, which `create_memory_store()` provisions. Throughput defaults to `serverless`; set `COSMOS_DB_THROUGHPUT_MODE=autoscale` to apply the shared `COSMOS_DB_AUTOSCALE_MAX_RU` cap to the `memories`, `counter`, and `leases` containers. See [concepts.md](Docs/concepts.md#automatic-processing-change-feed) for details.
 
 All services use **Entra ID** auth via `DefaultAzureCredential`.