Memory Types, Pluggable Processors, and Function App Hand-off #7
…al/episodic retrieval, and ProcessingPipeline

- Replace ProcessingClient (Azure Durable Functions HTTP) with local ProcessingPipeline
- Add LLMClient alongside EmbeddingsClient; remove adf_endpoint/adf_key params
- Add tags, ttl, salience parameters to add_local() and add_cosmos()
- Add tag filters (AND/OR/NOT), include_superseded, min_salience to get_memories(), search_cosmos(), and get_thread()
- Add add_tags() and remove_tags() for tag management
- Add get_procedural_memories() and search_episodic_memories() retrieval
- Add build_procedural_context() and build_episodic_context() formatters
- Add extract_memories(), deduplicate_facts() pipeline methods
- Enable per-document TTL via default_ttl=-1 in create_memory_store()
- Export LLMClient from __init__.py
- Update tests for new default superseded_by filter and pipeline delegation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove adf_endpoint/adf_key params; add llm_model param with LLMClient
- Add tags, ttl, salience params to add_local() and add_cosmos()
- Add tags, any_tags, exclude_tags, include_superseded, min_salience to get_memories(), search_cosmos(), get_thread()
- Add new methods: add_tags(), remove_tags(), get_procedural_memories(), search_episodic_memories(), build_procedural_context(), build_episodic_context(), extract_memories(), deduplicate_facts()
- Replace ADF-based processing with ProcessingPipeline delegation (uses sync Cosmos container internally)
- Pass default_ttl=-1 in create_memory_store()
- Delete aio/processing.py (replaced by pipeline.py)
- Update aio/__init__.py exports
- Update async client tests to match new API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR expands the Agent Memory Toolkit SDK to support additional memory types (procedural/episodic), introduces a pluggable processing abstraction (in-process vs. Durable Function App hand-off), migrates prompts to Prompty templates, and adds a production-shaped sibling Azure Functions Durable deployment (plus infra and tests) that reuses the same SDK pipeline logic.
Changes:
- Added `MemoryProcessor`/`AsyncMemoryProcessor` protocol + in-process and durable “no-op” processor implementations; updated `flush`/`flush_and_wait` to delegate via the processor seam.
- Migrated prompt assets from Markdown to Prompty 2.0 templates and added new prompts for extraction, summarization, user summaries, and deduplication.
- Added a new `function_app/` (Durable orchestration + change-feed counters) and `infra/` (azd + Bicep) deployment stack, with significant test suite expansion and sample updates.
Reviewed changes
Copilot reviewed 111 out of 117 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/unit/test_utils.py | Adds unit coverage for new utils: content hashing, default TTLs, and enriched _make_memory fields. |
| tests/unit/test_query_builder.py | Extends query-builder tests for tag/supersedence-related query helpers. |
| tests/unit/test_processing.py | Removes legacy Durable Functions HTTP client tests (client removed). |
| tests/unit/test_models.py | Expands model tests for new memory types and new fields (tags, salience, TTL, lineage). |
| tests/unit/test_flush.py | Adds sync processor-delegation tests for flush / flush_and_wait. |
| tests/unit/test_exceptions.py | Adds tests for new exception types (DuplicateMemoryError, LLMError). |
| tests/unit/test_cosmos_memory_client.py | Updates client tests to reflect superseded filtering and pipeline-based processing behavior. |
| tests/unit/test_chat.py | Adds unit tests for new ChatClient behavior (kwargs shaping, retries, close). |
| tests/unit/processors/test_protocol_satisfaction.py | Runtime protocol conformance tests for MemoryProcessor implementations. |
| tests/unit/processors/test_inprocess.py | Tests ordering/behavior of pipeline delegation in InProcessProcessor. |
| tests/unit/processors/test_durable.py | Tests DurableFunctionProcessor no-op behavior. |
| tests/unit/processors/test_base.py | Tests protocol/result dataclass defaults and constructor validation. |
| tests/unit/function_app/test_retry.py | Tests Durable retry-options helper without depending on SDK constructor signature. |
| tests/unit/function_app/test_pipeline_factory.py | Tests function-app pipeline factory wiring and caching. |
| tests/unit/function_app/conftest.py | Adds sys.path bootstrap so tests can import function_app/ modules. |
| tests/unit/aio/test_processing.py | Removes legacy async Durable Functions HTTP client tests (client removed). |
| tests/unit/aio/test_flush.py | Adds async processor-delegation tests for flush / flush_and_wait. |
| tests/unit/aio/test_cosmos_memory_client.py | Updates async client tests for pipeline-based processing and cleanup changes. |
| tests/unit/aio/processors/test_protocol_satisfaction.py | Runtime protocol conformance tests for AsyncMemoryProcessor. |
| tests/unit/aio/processors/test_inprocess.py | Tests async in-process wrapper behavior and delegation. |
| tests/unit/aio/processors/test_durable.py | Tests async durable no-op behavior. |
| tests/unit/aio/processors/test_base.py | Tests async protocol/result defaults and constructor validation. |
| tests/integration/test_processor_integration.py | Adds “integration-style” tests validating processor wiring end-to-end with mocked Cosmos. |
| tests/integration/test_processor_integration_async.py | Async mirror of processor wiring integration tests. |
| tests/integration/test_changefeed_integration.py | Updates change-feed integration gating + Cosmos key fallback + counter-id conventions. |
| tests/conftest.py | Updates shared integration fixtures and env var naming for chat/embedding deployments + Cosmos key. |
| Samples/scenario_tagging_and_filtering.py | New sample demonstrating tag writing and retrieval filtering (AND/OR/NOT + semantic). |
| Samples/scenario_remote_processor.py | New sample showing durable/no-op processor configuration and polling behavior. |
| Samples/scenario_remote_processor_async.py | Async variant of the remote processor sample. |
| Samples/scenario_rag_with_memory.py | Updates sample configuration surface for new env vars / client params. |
| Samples/scenario_multi_agent.py | Updates sample configuration surface for new env vars / client params. |
| Samples/scenario_counter_tuning.py | New sample describing Durable counter tuning and polling behavior. |
| Samples/scenario_chat_memory.py | Updates sample configuration surface for new env vars / client params. |
| Samples/quickstart_cosmos.py | Updates quickstart env vars and clarifies “turn” write path does not require embeddings. |
| Samples/processing_fact_extraction.py | Refactors sample to in-process extraction of facts/procedurals/episodics + updated env vars. |
| Samples/advanced_search_patterns.py | Updates printing logic and client configuration surface. |
| pyproject.toml | Adds packaged prompt assets + new dependencies for prompty/jinja2/typing_extensions. |
| infra/README.md | New azd+Bicep infra documentation and operator guidance. |
| infra/main.bicep | Subscription-scoped infra entry point wiring Cosmos + AI Foundry + MI + RBAC + Function App + outputs. |
| infra/main.parameters.json | azd parameter bindings for the infra stack. |
| infra/abbreviations.json | Naming prefix catalog for deterministic resource names. |
| infra/modules/rbac.bicep | Adds RBAC wiring across Cosmos, AI Foundry, and Storage (queues/tables/blob) for MI and user. |
| infra/modules/identity.bicep | Adds user-assigned managed identity module. |
| infra/modules/cosmos.bicep | Adds Cosmos serverless + containers (memories/leases/counter) with vector/full-text config. |
| infra/modules/ai-foundry.bicep | Adds AIServices (AOAI-compatible) account + chat/embedding deployments. |
| azure.yaml | Adds azd project definition wiring infra + function_app service deployment. |
| function_app/shared/pipeline_factory.py | Lazy pipeline factory reusing SDK ProcessingPipeline with MI auth. |
| function_app/shared/counters.py | ETag-guarded counter increments with LSN replay protection + threshold helpers. |
| function_app/shared/cosmos_clients.py | Cached sync/async Cosmos clients for activities vs. change-feed trigger. |
| function_app/shared/config.py | Central function-app configuration parsing and env var conventions. |
| `function_app/shared/__init__.py` | Shared helper package marker for function app code. |
| function_app/requirements.txt | Function app dependencies, including vendored SDK wheel. |
| function_app/orchestrators/user_summary.py | Durable orchestrator + activities for user summary generation/persist observability. |
| function_app/orchestrators/thread_summary.py | Durable orchestrator + activities for thread summarization/persist observability. |
| function_app/orchestrators/extract_memories.py | Durable orchestrator + activities for extraction/dedup + optional salience filter sweep. |
| function_app/orchestrators/_retry.py | Centralized Durable retry options helper. |
| `function_app/orchestrators/__init__.py` | Orchestrator module documentation and intent. |
| function_app/local.settings.json.template | Local settings template for running the function app with MI-style env vars. |
| function_app/host.json | Durable hubName configuration. |
| function_app/function_app.py | DFApp entry point registering trigger + orchestrator blueprints. |
| Docs/local_testing.md | Updates docs to new env var names for chat/embedding deployment settings. |
| Docs/design_patterns.md | Updates docs to new embedding deployment naming. |
| Docs/azure_testing.md | Updates docs to new env var names for chat/embedding deployment settings. |
| azure_functions/requirements.txt | Removes legacy proof-of-concept function app requirements. |
| azure_functions/prompts/user_summary.md | Removes legacy markdown prompt. |
| azure_functions/prompts/summarize.md | Removes legacy markdown prompt. |
| azure_functions/prompts/facts.md | Removes legacy markdown prompt. |
| azure_functions/local.settings.json.template | Removes legacy local.settings template. |
| agent_memory_toolkit/prompts/user_summary.prompty | Adds Prompty template for user summary generation. |
| agent_memory_toolkit/prompts/user_summary_update.prompty | Adds Prompty template for incremental user summary updates. |
| agent_memory_toolkit/prompts/summarize.prompty | Adds Prompty template for thread summarization. |
| agent_memory_toolkit/prompts/summarize_update.prompty | Adds Prompty template for incremental summary updates. |
| agent_memory_toolkit/prompts/dedup.prompty | Adds Prompty template for dedup action selection. |
| agent_memory_toolkit/processors/inprocess.py | Implements in-process MemoryProcessor backed by ProcessingPipeline. |
| agent_memory_toolkit/processors/durable.py | Adds durable “marker” processor that intentionally no-ops locally. |
| agent_memory_toolkit/processors/base.py | Introduces processor protocol and result dataclasses. |
| `agent_memory_toolkit/processors/__init__.py` | Exports processor protocol + built-in implementations. |
| agent_memory_toolkit/processing.py | Removes legacy sync Durable Functions HTTP client. |
| agent_memory_toolkit/models.py | Adds new memory types + tags/salience/ttl/hash/lineage fields + validation and serialization. |
| agent_memory_toolkit/exceptions.py | Adds LLMError and DuplicateMemoryError. |
| agent_memory_toolkit/aio/processors/inprocess.py | Async wrapper around sync pipeline using asyncio.to_thread. |
| agent_memory_toolkit/aio/processors/durable.py | Async durable no-op processor. |
| agent_memory_toolkit/aio/processors/base.py | Adds AsyncMemoryProcessor protocol mirroring sync. |
| `agent_memory_toolkit/aio/processors/__init__.py` | Exports async processor protocol + built-ins. |
| agent_memory_toolkit/aio/processing.py | Removes legacy async Durable Functions HTTP client. |
| `agent_memory_toolkit/aio/__init__.py` | Updates async package exports to processor protocol and result types. |
| agent_memory_toolkit/_utils.py | Adds content hashing + TTL defaults + richer memory creation + container policy tweaks. |
| agent_memory_toolkit/_query_builder.py | Adds composable query clauses for tags + null/defined checks. |
| `agent_memory_toolkit/__init__.py` | Exports new public surface: ChatClient, processor abstractions, new exceptions. |
| .env.template | Updates env template to new var names (and adds Cosmos key fallback). |
Pull request overview
Copilot reviewed 111 out of 117 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 111 out of 117 changed files in this pull request and generated 9 comments.
Pull request overview
Copilot reviewed 111 out of 117 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
**Docs/local_testing.md:87**
`ADF_ENDPOINT`/`ADF_KEY` are still listed as required `.env` values, but this PR removes the Durable Functions HTTP client in favor of change-feed processing + `DurableFunctionProcessor` as a marker/no-op. These settings look obsolete now; consider removing them (or replacing them with the new Function App/change-feed configuration knobs) to avoid sending users down a dead configuration path.
AI_FOUNDRY_ENDPOINT=https://<your-project>.services.ai.azure.com/
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=gpt-5-mini
ADF_ENDPOINT=http://localhost:7071/api
ADF_KEY=
**Docs/azure_testing.md:184**
* This Azure testing guide still instructs users to configure and use `ADF_ENDPOINT` / `ADF_KEY` and to deploy `azure_functions/`, but this PR moves processing to the new `function_app/` change-feed + Durable orchestration flow and removes the Durable Functions HTTP client from the SDK. The environment variable section should be updated to reflect the new architecture (and avoid pointing users at ADF HTTP-trigger configuration that no longer exists).
AI_FOUNDRY_ENDPOINT=https://.openai.azure.com/
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=gpt-5-mini
ADF_ENDPOINT=https://.azurewebsites.net/api
ADF_KEY=
Pull request overview
Copilot reviewed 116 out of 122 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
**Docs/local_testing.md:87**
- The env var list still includes `ADF_ENDPOINT`/`ADF_KEY` and references the legacy `azure_functions` project, but the Durable Functions HTTP client has been removed and the sibling Functions project now lives under `function_app/`. Please update this section to reflect the new Function App hand-off model (`DurableFunctionProcessor` + change feed) and remove/rename the obsolete ADF guidance.
COSMOS_DB_AUTOSCALE_MAX_RU=1000
AI_FOUNDRY_ENDPOINT=https://<your-project>.services.ai.azure.com/
AI_FOUNDRY_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-large
AI_FOUNDRY_EMBEDDING_DIMENSIONS=1536
AI_FOUNDRY_CHAT_DEPLOYMENT_NAME=gpt-5-mini
ADF_ENDPOINT=http://localhost:7071/api
ADF_KEY=
</details>
---
PR: Memory Types, Pluggable Processors, and Function App Hand-off
Pluggable processor architecture
The client no longer owns processing directly. It now talks to a `MemoryProcessor` (`agent_memory_toolkit/processors/base.py`, mirrored under `aio/processors/base.py`), a `typing.Protocol` with four methods: `process_thread_writes`, `summarize_thread`, `extract_memories`, `generate_user_summary`. Returns are typed dataclasses (`ProcessThreadResult`, `ExtractionCounts`, `UserSummaryResult`).

Two implementations ship in this PR:
- `InProcessProcessor` (default) - runs the `ProcessingPipeline` (prompt rendering, LLM calls, dedup, supersede, Cosmos writes) inside the caller's process. This is what notebooks, unit tests, and single-process services use.
- `DurableFunctionProcessor` - a no-op marker. Writes still go to Cosmos via the client's existing path; the sibling Function App picks them up via the change feed and runs the same pipeline server-side. The client polls `get_thread_summary` / `get_user_summary` / `get_memories` to observe results.
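A minimal sketch of the seam, for orientation only. The four method names and the three result dataclass names come from the description above; every argument list, dataclass field, the `StubPipeline` stand-in, and the `flush` helper are hypothetical simplifications, not the SDK's actual signatures.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


# Hypothetical fields; the SDK's dataclasses live in processors/base.py.
@dataclass
class ExtractionCounts:
    facts: int = 0
    procedurals: int = 0
    episodics: int = 0


@dataclass
class ProcessThreadResult:
    summary: Optional[str] = None
    extraction: ExtractionCounts = field(default_factory=ExtractionCounts)


@dataclass
class UserSummaryResult:
    summary: Optional[str] = None


class MemoryProcessor(Protocol):
    # Method names come from the PR; the arguments are assumed.
    def process_thread_writes(self, user_id: str, thread_id: str) -> ProcessThreadResult: ...
    def summarize_thread(self, user_id: str, thread_id: str) -> Optional[str]: ...
    def extract_memories(self, user_id: str, thread_id: str) -> ExtractionCounts: ...
    def generate_user_summary(self, user_id: str) -> UserSummaryResult: ...


class StubPipeline:
    """Hypothetical stand-in for ProcessingPipeline: no LLM, no Cosmos."""

    def summarize(self, user_id: str, thread_id: str) -> str:
        return f"summary of {thread_id}"

    def extract(self, user_id: str, thread_id: str) -> ExtractionCounts:
        return ExtractionCounts(facts=2, procedurals=1, episodics=1)

    def user_summary(self, user_id: str) -> str:
        return f"profile of {user_id}"


class InProcessProcessor:
    """Runs the pipeline synchronously inside the caller's process."""

    def __init__(self, pipeline: StubPipeline) -> None:
        self._pipeline = pipeline

    def summarize_thread(self, user_id: str, thread_id: str) -> Optional[str]:
        return self._pipeline.summarize(user_id, thread_id)

    def extract_memories(self, user_id: str, thread_id: str) -> ExtractionCounts:
        return self._pipeline.extract(user_id, thread_id)

    def generate_user_summary(self, user_id: str) -> UserSummaryResult:
        return UserSummaryResult(self._pipeline.user_summary(user_id))

    def process_thread_writes(self, user_id: str, thread_id: str) -> ProcessThreadResult:
        return ProcessThreadResult(
            summary=self.summarize_thread(user_id, thread_id),
            extraction=self.extract_memories(user_id, thread_id),
        )


class DurableFunctionProcessor:
    """Marker: returns empty results locally; the Function App does the work."""

    def summarize_thread(self, user_id: str, thread_id: str) -> Optional[str]:
        return None

    def extract_memories(self, user_id: str, thread_id: str) -> ExtractionCounts:
        return ExtractionCounts()

    def generate_user_summary(self, user_id: str) -> UserSummaryResult:
        return UserSummaryResult()

    def process_thread_writes(self, user_id: str, thread_id: str) -> ProcessThreadResult:
        return ProcessThreadResult()


def flush(processor: MemoryProcessor, user_id: str, thread_id: str) -> ProcessThreadResult:
    # Hypothetical client hook: the call site never branches on processor type.
    return processor.process_thread_writes(user_id, thread_id)
```

Because `flush` only depends on the protocol, swapping `InProcessProcessor` for `DurableFunctionProcessor` changes where the work happens without touching the call site.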
Why a seam at all: the same client now serves "local dev with zero infra" and "production with durable, isolated, scalable processing" without two clients or a feature flag. Switching to the Function App (FA) processor immediately removes LLM latency from the request path with no caller-side code change. Tests mock one small protocol instead of Cosmos + AOAI + Prompty + embeddings. A future `BatchProcessor` or `RemoteHttpProcessor` is one new file, not a fork.

A `Protocol` (not an `abc.ABC`) was chosen so test doubles and downstream extensions don't have to import or inherit from us, and so the async/sync mirrors stay symmetric without a shared async-runtime assumption.
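The structural-typing argument can be illustrated with `typing.runtime_checkable`: a test double that never imports or subclasses the protocol still passes an `isinstance` check as long as it defines the methods. The simplified signatures and the `RecordingProcessor`/`Incomplete` classes are hypothetical; the PR's `test_protocol_satisfaction.py` may check conformance differently.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MemoryProcessor(Protocol):
    # Simplified stand-in for the real protocol; signatures are assumed.
    def process_thread_writes(self, user_id: str, thread_id: str) -> object: ...
    def summarize_thread(self, user_id: str, thread_id: str) -> object: ...
    def extract_memories(self, user_id: str, thread_id: str) -> object: ...
    def generate_user_summary(self, user_id: str) -> object: ...


class RecordingProcessor:
    """Test double: satisfies the protocol without importing or inheriting it."""

    def __init__(self) -> None:
        self.calls = []  # (method name, args) in call order

    def process_thread_writes(self, user_id, thread_id):
        self.calls.append(("process_thread_writes", (user_id, thread_id)))

    def summarize_thread(self, user_id, thread_id):
        self.calls.append(("summarize_thread", (user_id, thread_id)))

    def extract_memories(self, user_id, thread_id):
        self.calls.append(("extract_memories", (user_id, thread_id)))

    def generate_user_summary(self, user_id):
        self.calls.append(("generate_user_summary", (user_id,)))


class Incomplete:
    """Missing three of the four methods, so it does not conform."""

    def summarize_thread(self, user_id, thread_id):
        return None


fake = RecordingProcessor()
fake.process_thread_writes("u1", "t1")
```

Note that a `runtime_checkable` `isinstance` check only verifies that the methods exist, not their signatures, so it complements rather than replaces a static type checker.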
Sibling Function App (`function_app/`)

Replaces the prior `azure_functions/` proof-of-concept (which was monolithic: ~567-line `function_app.py`, ~940-line `activities.py`, inlined Cosmos/LLM construction in every activity, connection-string auth). The new layout is production-shaped:
Design at a glance:
per-(user, thread, counter_type) Cosmos doc with an ETag-guarded
read-modify-write. When a configurable threshold is crossed, the trigger
starts the right Durable orchestrator. Time-based triggers were rejected:
cron has no idea when a thread is active, and "every N turns" needs per-thread
state regardless.
`last_batch_lsn`, so an at-least-once redelivery from the change feed is a no-op for the
contents of an already-applied batch. This is what lets us raise on
counter-ETag exhaustion (instead of silently dropping triggers) without
fearing double-processing.
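The counter bullets above describe an ETag-guarded read-modify-write with LSN replay protection. The sketch below reproduces that pattern against a hypothetical in-memory container so it runs without Cosmos; the document ID format, field names, reset-to-zero-on-threshold behavior, and retry budget are all assumptions. Against a real container the conditional write would be a `replace_item` carrying the document's ETag with an if-not-modified match condition (or a `create_item` for the first write).

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


class PreconditionFailed(Exception):
    """Stands in for Cosmos's HTTP 412 on an ETag mismatch."""


class CounterExhausted(Exception):
    """Raised instead of silently dropping a trigger."""


@dataclass
class InMemoryContainer:
    """Hypothetical in-memory container with ETag-checked writes."""

    docs: Dict[str, dict] = field(default_factory=dict)

    def read(self, doc_id: str) -> Optional[dict]:
        doc = self.docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def write_if_match(self, doc: dict, etag: Optional[str]) -> dict:
        current = self.docs.get(doc["id"])
        current_etag = current["_etag"] if current is not None else None
        if current_etag != etag:  # someone else wrote since we read
            raise PreconditionFailed(doc["id"])
        stored = dict(doc, _etag=uuid.uuid4().hex)
        self.docs[doc["id"]] = stored
        return stored


def increment_counter(
    container: InMemoryContainer,
    user_id: str,
    thread_id: str,
    counter_type: str,
    batch_lsn: int,
    batch_size: int,
    threshold: int,
    max_attempts: int = 5,
) -> bool:
    """Apply one change-feed batch; return True if the threshold was crossed."""
    doc_id = f"{user_id}:{thread_id}:{counter_type}"
    for _ in range(max_attempts):
        doc = container.read(doc_id) or {
            "id": doc_id, "count": 0, "last_batch_lsn": -1, "_etag": None,
        }
        if batch_lsn <= doc["last_batch_lsn"]:
            return False  # redelivered batch was already applied: no-op
        etag = doc["_etag"]
        doc["count"] += batch_size
        doc["last_batch_lsn"] = batch_lsn
        crossed = doc["count"] >= threshold
        if crossed:
            doc["count"] = 0  # assumption: reset once an orchestrator is due
        try:
            container.write_if_match(doc, etag)
            return crossed
        except PreconditionFailed:
            continue  # lost the race: re-read and try again
    raise CounterExhausted(doc_id)
```

Raising `CounterExhausted` is safe precisely because of the LSN check: when the change feed redelivers the batch, already-applied increments are skipped.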
context → call a flaky LLM → write back." Durable's replay model gives us
exactly-once-effective execution per orchestrator instance; deterministic
per-(user, thread, type) instance IDs + a 409 check in `_safe_start` prevent overlapping runs; built-in activity-level retry policies (centralized in `_retry.py`) give us uniform retry semantics.
`ThreadSummary` and `ExtractMemories` run on different turn cadences and have different LLM cost profiles; `UserSummary` runs at a much lower frequency and reads across threads. Sharing an orchestrator would couple their thresholds and their failure modes.
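Deterministic instance IDs turn "is a run already in flight for this thread?" into an ID collision. The sketch below uses a hypothetical in-memory client in place of the Durable Functions client, and an `InstanceAlreadyRunning` exception stands in for the 409 that `_safe_start` checks for; hashing the key into the ID is an assumption, not necessarily how the PR builds its IDs.

```python
import hashlib
from typing import Dict, Optional


class InstanceAlreadyRunning(Exception):
    """Stands in for the 409 Conflict that _safe_start checks for."""


def instance_id(orchestrator: str, user_id: str, thread_id: str) -> str:
    # Same (orchestrator, user, thread) always yields the same ID.
    key = f"{orchestrator}|{user_id}|{thread_id}".encode("utf-8")
    return f"{orchestrator}-{hashlib.sha256(key).hexdigest()[:32]}"


class InMemoryOrchestrationClient:
    """Hypothetical stand-in for the Durable Functions client."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}

    def start_new(self, name: str, iid: str, payload: dict) -> str:
        if self.status.get(iid) in ("Pending", "Running"):
            raise InstanceAlreadyRunning(iid)
        self.status[iid] = "Running"
        return iid

    def complete(self, iid: str) -> None:
        self.status[iid] = "Completed"


def safe_start(client, orchestrator: str, user_id: str, thread_id: str, payload: dict) -> Optional[str]:
    """Start the orchestrator unless a run for the same key is in flight."""
    iid = instance_id(orchestrator, user_id, thread_id)
    try:
        return client.start_new(orchestrator, iid, payload)
    except InstanceAlreadyRunning:
        return None  # an overlapping run is already covering this thread
```

Because the orchestrator name is part of the key, `ThreadSummary` and `ExtractMemories` for the same thread never block each other.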
`ProcessingPipeline` from the SDK and runs it inside activities: same prompts, same dedup, same supersede logic.
Activities are intentionally thin: parse the trigger payload, call the
pipeline, return counts.
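A sketch of the thin-activity shape plus a lazily built, cached pipeline (the file table describes `pipeline_factory.py` as a lazy factory with caching). `FakePipeline`, `get_pipeline`, and `extract_memories_activity` are hypothetical names; the real factory constructs the SDK `ProcessingPipeline` with managed-identity auth, which is omitted here.

```python
import functools
import json


class FakePipeline:
    """Hypothetical stand-in for the SDK's ProcessingPipeline."""

    constructed = 0  # counts constructions, to show the cache works

    def __init__(self, endpoint: str) -> None:
        FakePipeline.constructed += 1
        self.endpoint = endpoint

    def extract_memories(self, user_id: str, thread_id: str) -> dict:
        return {"facts": 1, "procedurals": 0, "episodics": 1}


@functools.lru_cache(maxsize=1)
def get_pipeline() -> FakePipeline:
    # Built on first use, then reused by every activity in this worker process.
    # A real factory would read app settings and use managed-identity credentials.
    return FakePipeline(endpoint="https://example.invalid")


def extract_memories_activity(payload: str) -> dict:
    """Thin activity: parse the trigger payload, call the pipeline, return counts."""
    args = json.loads(payload)
    return get_pipeline().extract_memories(args["user_id"], args["thread_id"])
```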
AOAI all use `__credential = managedidentity`. The Bicep stack provisions a UAMI with Cosmos DB Data Contributor, Storage Blob Data Owner, Storage Queue Data Contributor, Storage Table Data Contributor, Cognitive Services OpenAI User, and Monitoring Metrics Publisher.
`allowSharedKeyAccess` is `false`. `azd up` provisions the full stack from `infra/`.

New features introduced in this PR
`procedural` and `episodic` alongside `fact` and `summary`/`user_summary`. Different retrieval semantics (procedurals are injected as system instructions, episodics retrieved by similarity), different
lifecycles (procedurals supersede, episodics accumulate, facts dedup), and
type-specific TTLs. The extractor produces all three from a single LLM call
per turn batch.
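One way to express the per-type lifecycles and TTLs, as a hedged sketch: the enum values match the memory type names above, but the TTL numbers and the `make_item` helper are invented for illustration. In Cosmos DB an item-level `ttl` (in seconds) only takes effect when the container's default TTL is set, which is why `create_memory_store()` passes `default_ttl=-1`: items never expire unless they carry their own `ttl`.

```python
from enum import Enum
from typing import Dict, Optional


class MemoryType(str, Enum):
    FACT = "fact"
    PROCEDURAL = "procedural"
    EPISODIC = "episodic"
    SUMMARY = "summary"
    USER_SUMMARY = "user_summary"


DAY = 86_400  # seconds

# Illustrative values only; None means "never expires" under default_ttl=-1.
DEFAULT_TTLS: Dict[MemoryType, Optional[int]] = {
    MemoryType.FACT: None,
    MemoryType.PROCEDURAL: None,
    MemoryType.EPISODIC: 90 * DAY,
    MemoryType.SUMMARY: None,
    MemoryType.USER_SUMMARY: None,
}

# How a new memory of each extracted type relates to existing ones.
LIFECYCLE = {
    MemoryType.FACT: "dedup",
    MemoryType.PROCEDURAL: "supersede",
    MemoryType.EPISODIC: "accumulate",
}


def make_item(memory_type: MemoryType, content: str, ttl: Optional[int] = None) -> dict:
    """Build a hypothetical Cosmos item, attaching a per-item ttl when one applies."""
    item = {"type": memory_type.value, "content": content}
    effective = ttl if ttl is not None else DEFAULT_TTLS[memory_type]
    if effective is not None:
        item["ttl"] = effective  # seconds; honored because the container default is -1
    return item
```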
`tags` (legacy any-of), `tags_any`, `tags_all`, `exclude_tags`, and `min_salience` on `get_memories`, `get_thread`, and `search_cosmos`. Implemented through a composable `_QueryBuilder` so filters combine into a single Cosmos query. The extractor auto-tags every memory (`sys:fact`, `sys:auto-extracted`, `topic:*`); customer tags layer on top.

`.md` to Prompty 2.0 templates with explicit input schemas, pinned models, low
temperature, `response_format: json_object`, and bounded `max_tokens`. The in-process pipeline and the FA load the same prompt files, so behavior is identical regardless of where processing runs.
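The tag filters described earlier can compose into one parameterized Cosmos SQL query along these lines. This is a minimal sketch, not the PR's `_QueryBuilder`: the function name, the `c.user_id`/`c.tags`/`c.salience` field names, and the rule that superseded items are those with a non-null `superseded_by` are assumptions. `ARRAY_CONTAINS`, `IS_DEFINED`, and `IS_NULL` are built-in Cosmos SQL functions.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def build_memory_query(
    user_id: str,
    tags_all: Sequence[str] = (),
    tags_any: Sequence[str] = (),
    exclude_tags: Sequence[str] = (),
    min_salience: Optional[float] = None,
    include_superseded: bool = False,
) -> Tuple[str, List[Dict[str, Any]]]:
    clauses = ["c.user_id = @user_id"]
    params: List[Dict[str, Any]] = [{"name": "@user_id", "value": user_id}]

    def bind(value: Any) -> str:
        # Every value becomes a named parameter, never string-interpolated SQL.
        name = f"@p{len(params)}"
        params.append({"name": name, "value": value})
        return name

    for tag in tags_all:  # AND: every tag must be present
        clauses.append(f"ARRAY_CONTAINS(c.tags, {bind(tag)})")
    if tags_any:  # OR: at least one tag must be present
        ors = " OR ".join(f"ARRAY_CONTAINS(c.tags, {bind(t)})" for t in tags_any)
        clauses.append(f"({ors})")
    for tag in exclude_tags:  # NOT: assumes every item has a tags array
        clauses.append(f"NOT ARRAY_CONTAINS(c.tags, {bind(tag)})")
    if min_salience is not None:
        clauses.append(f"c.salience >= {bind(min_salience)}")
    if not include_superseded:
        clauses.append("(NOT IS_DEFINED(c.superseded_by) OR IS_NULL(c.superseded_by))")
    return "SELECT * FROM c WHERE " + " AND ".join(clauses), params
```

The resulting `(query, parameters)` pair has the shape that `azure-cosmos`'s `ContainerProxy.query_items(query=..., parameters=...)` accepts.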
`fact_<sha256>` / `proc_<sha256>` / `ep_<sha256>` IDs derived from `(user_id, thread_id, content_hash)`, so a Durable retry that re-upserts the same content is a no-op. All `superseded_by` writes use `read_item` → mutate → `replace_item` with `IfNotModified`, eliminating the lost-update race.
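A sketch of deterministic, content-derived IDs: identical `(user_id, thread_id, content_hash)` inputs always produce the same ID, so a retried upsert overwrites rather than duplicates. The prefixes follow the text above; the whitespace/case normalization, the separator, and hashing the combined key a second time are assumptions.

```python
import hashlib

# Memory type -> ID prefix, following the fact_/proc_/ep_ convention.
ID_PREFIXES = {"fact": "fact", "procedural": "proc", "episodic": "ep"}


def content_hash(content: str) -> str:
    # Assumption: collapse whitespace and casefold so trivial variants match.
    normalized = " ".join(content.split()).casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def memory_id(memory_type: str, user_id: str, thread_id: str, content: str) -> str:
    # A control-character separator keeps ("a|b", "c") and ("a", "b|c") distinct.
    key = "\x1f".join((user_id, thread_id, content_hash(content)))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{ID_PREFIXES[memory_type]}_{digest}"


# Upserting by this ID is idempotent: a retry overwrites instead of duplicating.
store = {}
for _ in range(2):  # simulate a Durable activity retry
    doc_id = memory_id("fact", "u1", "t1", "Likes  green tea")
    store[doc_id] = {"id": doc_id, "content": "Likes green tea"}
```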
`ChatClient` - thin AOAI wrapper that auto-strips deployment-rejected parameters (`temperature` for `gpt-5.2-chat`, etc.) and retries cleanly.

monolithic `test_changefeed.py` split into focused per-module tests.

Demo notebooks split into in-process (`Demo[_async].ipynb`) and FA hand-off (`Demo_function_app[_async].ipynb`) pairs so each notebook has one mental model and one set of prerequisites.
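The parameter-stripping retry in `ChatClient` can be sketched as follows. `UnsupportedParameterError`, `call_with_param_stripping`, and `fake_create` are hypothetical; how the real client detects a rejected parameter from the AOAI error response is not described here, so the sketch assumes the error carries the parameter name.

```python
from typing import Any, Callable


class UnsupportedParameterError(Exception):
    """Hypothetical error carrying the name of a parameter the deployment rejected."""

    def __init__(self, param: str) -> None:
        super().__init__(f"unsupported parameter: {param}")
        self.param = param


def call_with_param_stripping(create: Callable[..., Any], max_strips: int = 3, **kwargs: Any) -> Any:
    """Call `create`; on a rejected parameter, drop it and retry."""
    for _ in range(max_strips + 1):
        try:
            return create(**kwargs)
        except UnsupportedParameterError as err:
            if err.param not in kwargs:
                raise  # removing a kwarg cannot fix this error
            kwargs = {k: v for k, v in kwargs.items() if k != err.param}
    raise RuntimeError("deployment rejected too many parameters")


def fake_create(**kwargs: Any) -> dict:
    # Stand-in for a chat call to a deployment that rejects `temperature`.
    if "temperature" in kwargs:
        raise UnsupportedParameterError("temperature")
    return kwargs
```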
How to validate locally
Reviewer guide
`agent_memory_toolkit/processors/base.py` for the protocol, then `pipeline.py` for the in-process implementation, then `function_app/triggers/change_feed.py` and `function_app/orchestrators/*` for the FA path.
`docs/design-decisions-memorytypes-pr.md`) for deeper rationale on the two architectural pieces above.
plus the deletion of the old `azure_functions/` directory and the monolithic-test-file replacement. Net new logic is meaningfully smaller than the line count suggests.