Memory: support multimodal content (files, images, audio, artifacts)

Proposed by a user building serverless agents.

The current memory/context providers are mostly string-based, which is a good starting point. For multimodal agents, it would be useful to also remember files, images, audio, PDFs, generated artifacts, and their derived text.

Real-world agents work with more than text. An agent might:
- Generate a chart and need to reference it later
- Receive a PDF and extract key facts from it
- Record audio notes and recall them in a future session
- Produce code artifacts that should persist as part of the agent's memory

Today, assets like these have to be stored and retrieved out of band, and only a text summary can be saved to memory.

It might make sense to store text and metadata in Durable Object SQLite, while larger binary assets live in R2. This would keep the current simple text-first API working unchanged, while allowing agents that need multimodal recall to opt in.
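
To make the split concrete, here is a minimal sketch of one possible shape, not a proposal for the final API. Every name in it (`MemoryRecord`, `BlobStore`, `MultimodalMemory`, `rememberAsset`, `loadAsset`) is hypothetical. The two in-memory maps are stand-ins: one for a SQLite table in the Durable Object, one for an R2 bucket. Real R2 reads and writes are asynchronous, so a real version would return promises for the asset methods.

```typescript
// Hypothetical sketch: none of these names exist in the current memory API.
type MemoryKind = "text" | "file" | "image" | "audio" | "artifact";

// Small row that would live in the text/metadata store (stand-in for SQLite).
interface MemoryRecord {
  id: string;
  kind: MemoryKind;
  text: string; // plain text, or text derived from the asset (caption, transcript, summary)
  mimeType?: string;
  blobKey?: string; // key of the binary payload in the blob store, if any
  byteLength?: number;
}

// Minimal blob interface (stand-in for an R2 bucket; synchronous here for brevity).
interface BlobStore {
  put(key: string, data: Uint8Array): void;
  get(key: string): Uint8Array | null;
}

class InMemoryBlobStore implements BlobStore {
  private blobs = new Map<string, Uint8Array>();

  put(key: string, data: Uint8Array): void {
    this.blobs.set(key, data);
  }

  get(key: string): Uint8Array | null {
    return this.blobs.get(key) ?? null;
  }
}

class MultimodalMemory {
  private records = new Map<string, MemoryRecord>(); // stand-in for a SQLite table
  private blobs: BlobStore;
  private nextId = 1;

  constructor(blobs: BlobStore) {
    this.blobs = blobs;
  }

  // Text-first path: unchanged in spirit, never touches the blob store.
  remember(text: string): MemoryRecord {
    const record: MemoryRecord = { id: String(this.nextId++), kind: "text", text };
    this.records.set(record.id, record);
    return record;
  }

  // Opt-in path: bytes go to the blob store, derived text and metadata go to the record.
  rememberAsset(
    kind: Exclude<MemoryKind, "text">,
    data: Uint8Array,
    mimeType: string,
    derivedText: string,
  ): MemoryRecord {
    const id = String(this.nextId++);
    const blobKey = `memory/${id}`;
    this.blobs.put(blobKey, data);
    const record: MemoryRecord = {
      id,
      kind,
      text: derivedText,
      mimeType,
      blobKey,
      byteLength: data.byteLength,
    };
    this.records.set(id, record);
    return record;
  }

  // Recall works on text only, so text and asset memories share one search path.
  search(query: string): MemoryRecord[] {
    const needle = query.toLowerCase();
    return [...this.records.values()].filter((r) => r.text.toLowerCase().includes(needle));
  }

  // Binary payloads are fetched lazily, only when the agent needs the bytes.
  loadAsset(id: string): Uint8Array | null {
    const record = this.records.get(id);
    if (!record || record.blobKey === undefined) return null;
    return this.blobs.get(record.blobKey);
  }
}
```

In this sketch an agent that only calls `remember` sees no change, while an agent that calls `rememberAsset` gets a searchable record whose bytes are loaded on demand through `loadAsset`.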

(Replaces #1388, #1390)

Memory: support multimodal content (files, images, audio, artifacts) #1392
