A cognitive memory system for AI agents, grounded in the taxonomy from Measuring Progress Toward AGI: A Cognitive Framework.
Most agent frameworks treat memory as a single vector store you dump context into. This project implements memory the way the cognitive science paper describes it: separate stores for different memory types, a unified retriever that queries across them, a forgetting service that prunes stale knowledge, and an event bus that lets future modules (working memory, learning, metacognition) subscribe without touching core code.
Built on Gemini Embedding 2 for natively multimodal embeddings — text, images, audio, and video share a single vector space.
Status: Phase 1 in progress. Semantic memory write path is functional. Read path and event bus next.
```mermaid
graph TB
    subgraph "Phase 1 — Memory System"
        CLI["demo/cli.py"]
        EMB["GeminiEmbedder"]
        SS["SemanticStore"]
        ES["EpisodicStore"]
        PS["ProceduralStore"]
        RET["UnifiedRetriever"]
        FGT["ForgettingService"]
        BUS["EventBus"]
        DB[(ChromaDB)]
        CLI --> SS
        CLI --> ES
        CLI --> PS
        SS --> EMB
        ES --> EMB
        PS --> EMB
        SS --> DB
        ES --> DB
        PS --> DB
        RET --> SS
        RET --> ES
        RET --> PS
        FGT --> SS
        FGT --> ES
        FGT --> PS
        SS -.->|emit| BUS
        ES -.->|emit| BUS
        PS -.->|emit| BUS
    end
    subgraph "Phase 2 — Working Memory + Learning"
        WM["WorkingMemory"]
        LRN["LearningModule"]
        WM -.->|subscribe| BUS
        LRN -.->|subscribe| BUS
    end
    subgraph "Phase 3 — Metacognition"
        META["MetacognitiveMonitor"]
        META -.->|subscribe| BUS
    end
    style SS fill:#2d6a4f,color:#fff
    style EMB fill:#2d6a4f,color:#fff
    style CLI fill:#2d6a4f,color:#fff
    style DB fill:#2d6a4f,color:#fff
    style ES fill:#555,color:#aaa
    style PS fill:#555,color:#aaa
    style RET fill:#555,color:#aaa
    style FGT fill:#555,color:#aaa
    style BUS fill:#555,color:#aaa
    style WM fill:#333,color:#666
    style LRN fill:#333,color:#666
    style META fill:#333,color:#666
```
Green = built. Grey = planned (Phase 1). Dark = future phases.
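The EventBus that lets later phases subscribe without touching store code can be as small as a synchronous publish/subscribe map. The sketch below is illustrative only; the class shape and the `memory.stored` event name are assumptions, not the project's actual API:

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal synchronous pub/sub bus (illustrative sketch)."""

    def __init__(self) -> None:
        # event type -> list of handler callables
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # Deliver the payload to every handler registered for this event type.
        for handler in self._subscribers[event_type]:
            handler(payload)


# A future WorkingMemory module could subscribe without any change to store code:
bus = EventBus()
seen: list[dict] = []
bus.subscribe("memory.stored", seen.append)
bus.emit("memory.stored", {"id": "abc123", "memory_type": "semantic"})
```

Stores only ever call `emit`; subscribers register themselves, so adding Phase 2 and 3 modules never requires editing Phase 1 code.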
The paper identifies distinct memory sub-types that behave differently — different decay rates, retrieval patterns, and update semantics. This project implements them as separate ChromaDB collections behind a shared interface.
```mermaid
classDiagram
    class MemoryRecord {
        +str content
        +str memory_type
        +str modality
        +str id
        +datetime created_at
        +float importance
        +list~float~ embedding
        +str media_ref
    }
    class SemanticMemory {
        +str category
        +float confidence
        +str supersedes
        +list~str~ related_ids
    }
    class EpisodicMemory {
        +str context
        +float emotional_valence
        +str session_id
    }
    class ProceduralMemory {
        +str trigger
        +list~str~ steps
        +int execution_count
    }
    MemoryRecord <|-- SemanticMemory
    MemoryRecord <|-- EpisodicMemory
    MemoryRecord <|-- ProceduralMemory
```
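The class diagram above maps naturally onto Python dataclasses. This is a sketch of the hierarchy; the field defaults are illustrative assumptions, not necessarily what `models/base.py` defines:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryRecord:
    """Fields shared by all memory types (defaults are assumed, not canonical)."""
    content: str
    memory_type: str
    modality: str = "text"
    id: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    importance: float = 0.5
    embedding: list[float] = field(default_factory=list)
    media_ref: str = ""


@dataclass
class SemanticMemory(MemoryRecord):
    """Factual knowledge; supersedes/related_ids link revisions of a fact."""
    category: str = ""
    confidence: float = 1.0
    supersedes: str = ""
    related_ids: list[str] = field(default_factory=list)


fact = SemanticMemory(
    content="Python was created by Guido van Rossum",
    memory_type="semantic",
    category="programming",
)
```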
How a memory goes from raw content to a persisted vector:
```mermaid
sequenceDiagram
    participant C as CLI
    participant S as SemanticStore
    participant E as GeminiEmbedder
    participant G as Gemini API
    participant D as ChromaDB
    C->>S: store(SemanticMemory)
    S->>E: embed_text(content)
    E->>G: embed_content(text, 768 dims)
    G-->>E: [0.023, -0.41, 0.87, ...]
    E-->>S: embedding vector
    S->>D: add(id, embedding, metadata)
    D-->>S: persisted
    S-->>C: record_id
```
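The write path above can be sketched in a few lines. This is not the project's implementation: `FakeEmbedder` stands in for the real Gemini call, and a plain dict stands in for the ChromaDB collection, so the control flow is visible without external dependencies:

```python
import uuid


class FakeEmbedder:
    """Stand-in for GeminiEmbedder: returns a fixed-size dummy vector."""

    def embed_text(self, text: str) -> list[float]:
        return [0.0] * 768  # real embedder calls the Gemini API here


class SemanticStore:
    """Illustrative write path; a dict stands in for the ChromaDB collection."""

    def __init__(self, embedder) -> None:
        self.embedder = embedder
        self.collection: dict[str, dict] = {}

    def store(self, content: str, metadata: dict) -> str:
        record_id = str(uuid.uuid4())
        embedding = self.embedder.embed_text(content)  # S->>E: embed_text(content)
        self.collection[record_id] = {                 # S->>D: add(id, embedding, metadata)
            "embedding": embedding,
            "metadata": metadata,
            "document": content,
        }
        return record_id                               # S-->>C: record_id


store = SemanticStore(FakeEmbedder())
rid = store.store("Water boils at 100°C at sea level", {"category": "physics"})
```

The store owns the embed-then-persist sequence, so callers never touch the embedder or the database directly.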
All modalities go through the same GeminiEmbedder and land in the same vector space. A text query can retrieve an image memory. An audio clip can be compared to text descriptions.
```mermaid
graph LR
    T["text"] -->|embed_text| EMB["GeminiEmbedder"]
    I["image bytes"] -->|embed_image| EMB
    A["audio bytes"] -->|embed_audio| EMB
    EMB --> V["768-dim vector space"]
    style V fill:#2d6a4f,color:#fff
```
The embedding model is Gemini Embedding 2 — natively multimodal, mapping text, images, audio, and video into a single embedding space. Matryoshka support allows truncation from 3072 down to 768 dimensions with minimal accuracy loss.
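Matryoshka-style truncation amounts to keeping the leading components and re-normalizing to unit length. A sketch on a toy vector (the real model emits 3072 dimensions, truncated here to a stand-in size):

```python
import math


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the leading `dims` components, then re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard against zero vectors
    return [x / norm for x in head]


full = [0.5, -0.5, 0.5, 0.5]         # toy 4-dim "full" embedding
small = truncate_embedding(full, 2)  # first 2 dims, re-normalized to unit length
```

Because Matryoshka-trained models pack the most information into the leading dimensions, cosine similarities computed on the truncated vectors stay close to those on the full vectors.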
We ran an experiment to test whether audio embeddings encode emotional tone or just acoustic structure. Four songs across three languages (English, Hindi, Arabic), embedded as raw bytes with no metadata passed to the model.
```mermaid
graph LR
    subgraph "Audio Pipeline"
        SONG["song.mp3"] -->|ffmpeg| CHUNKS["60s chunks"]
        CHUNKS -->|embed_audio| VECS["chunk vectors"]
        VECS -->|average| AV["song vector"]
    end
    subgraph "Probe Vectors"
        P1["grief / loss"]
        P2["melancholy"]
        P3["joy / happiness"]
        P4["peaceful / calm"]
        P5["tension / dread"]
        P6["neutral"]
    end
    AV -->|cosine similarity| SCORES["ranked scores"]
    P1 --> SCORES
    P2 --> SCORES
    P3 --> SCORES
    P4 --> SCORES
    P5 --> SCORES
    P6 --> SCORES
```
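The scoring step is plain cosine similarity between the averaged song vector and each probe vector, sorted descending. A toy-dimension sketch (real vectors are 768-dim; the numbers here are illustrative, not the experiment's):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_probes(song_vec: list[float],
                probes: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score the song vector against every probe and rank best-first."""
    scores = {name: cosine(song_vec, vec) for name, vec in probes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 3-dim vectors standing in for real embeddings.
song = [0.9, 0.1, 0.0]
probes = {
    "grief / loss": [1.0, 0.0, 0.0],
    "joy / happiness": [0.0, 1.0, 0.0],
    "neutral": [0.0, 0.0, 1.0],
}
ranking = rank_probes(song, probes)
```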
Results summary:
| Song | Genre | Language | Top Match | Runner-up |
|---|---|---|---|---|
| Schindler's List Theme | Orchestral | Instrumental | melancholy (0.70) | grief (0.68) |
| Phir Se | Bollywood | Hindi | grief (0.69) | melancholy (0.69) |
| Rasputin | Disco-pop | English | joy (0.69) | tension (0.67) |
| Didi | Rai | Arabic/French | joy (0.66) | melancholy (0.63) |
The model correctly separated sad from happy songs, made nuanced distinctions within each cluster (melancholy vs. grief, joy vs. tension), and worked cross-lingually on raw audio bytes with no transcription.
Grief ranked last on both happy songs. Neutral ranked last on both sad songs. The largest winner-to-loser gap was 0.087 (Phir Se: grief vs. neutral).
Full methodology and analysis: experiments/audio_emotion_probe_results.md
Architectural implication: emotional_valence on episodic memories can be derived directly from the embedding — no separate sentiment analysis pipeline needed.
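A minimal sketch of that idea, assuming valence is read off the joy-vs-grief similarity gap; the axis choice, clamping, and the similarity numbers below are all hypothetical, not values from the experiment:

```python
def emotional_valence(sim_to_joy: float, sim_to_grief: float) -> float:
    """Map the joy-vs-grief similarity gap onto a rough [-1, 1] valence score."""
    return max(-1.0, min(1.0, sim_to_joy - sim_to_grief))


# Hypothetical similarity scores for a happy and a sad recording:
v_happy = emotional_valence(0.69, 0.60)  # positive gap -> positive valence
v_sad = emotional_valence(0.60, 0.69)    # negative gap -> negative valence
```

The gap between opposing probes is more informative than either raw similarity, since all scores in the experiment clustered in a narrow 0.6 to 0.7 band.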
```
agentic-memory/
├── config.py                 # API keys, model config, ChromaDB path
├── models/
│   ├── base.py               # MemoryRecord dataclass
│   └── semantic.py           # SemanticMemory (factual knowledge)
├── utils/
│   └── embeddings.py         # GeminiEmbedder — text, image, audio
├── stores/
│   ├── base.py               # Abstract BaseStore interface
│   └── semantic_store.py     # ChromaDB-backed semantic store
├── retrieval/                # (planned) Unified cross-store retriever
├── events/                   # (planned) Event bus for store/retrieve signals
├── api/
│   └── app.py                # FastAPI boundary over stores/retriever/events
├── demo/
│   └── cli.py                # CLI for storing and querying memories
├── web/                      # Next.js playground UI for memory workflows
├── experiments/
│   ├── audio_emotion_probe.py            # Cross-modal emotion probe script
│   └── audio_emotion_probe_results.md    # Full results and analysis
├── media/                    # Audio/image files (not committed)
└── research-docs/
    └── measuring-progress-toward-agi-a-cognitive-framework.pdf
```
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file:

```
GEMINI_API_KEY=your_key_here
```
System dependency for audio chunking:
```bash
# arch
sudo pacman -S ffmpeg

# ubuntu/debian
sudo apt install ffmpeg
```

Run the Python API locally:

```bash
.venv/bin/python -m uvicorn api.app:app --reload
```

Key routes:

- POST /api/memories/semantic
- POST /api/memories/episodic/text
- POST /api/memories/episodic/file
- POST /api/retrieval/query
- GET /api/episodes/recent
- GET /api/episodes/session/{session_id}
- GET /api/episodes/time-range
- GET /api/events
- GET /api/overview
Interactive docs:
http://localhost:8000/docs
The repository now includes a Next.js playground in web/ that talks to the Python API.
```bash
cd web
npm install
NEXT_PUBLIC_MEMORY_API_BASE_URL=http://localhost:8000 npm run dev
```

For a Vercel deployment such as memory.agentclash.dev, set:

```
NEXT_PUBLIC_MEMORY_API_BASE_URL=<your deployed Python API base URL>
```

Store a fact:
```bash
python demo/cli.py store "Python was created by Guido van Rossum"
```

Run the audio emotion probe:

```bash
python experiments/audio_emotion_probe.py "song.mp3"
```

The repo also includes a deterministic offline evaluation harness for episodic memory. It runs fixed synthetic fixtures through mixed-store retrieval, temporal recall, session reconstruction, recent-event lookup, and cross-modal media-backed episodes.

Run it with:

```bash
pytest tests/test_offline_episodic_eval.py
```

or:

```bash
python tests/test_offline_episodic_eval.py
```

Benchmark mapping and rationale: docs/offline_episodic_eval.md
This project is built on the cognitive taxonomy from the DeepMind paper Measuring Progress Toward AGI. The paper distinguishes three faculties that most agent frameworks conflate:
- Memory — passive storage and retrieval (semantic facts, episodic events, procedural skills)
- Working Memory — active manipulation of information for a current goal (sits under Executive Functions, not Memory)
- Learning — acquisition and consolidation of new knowledge into long-term memory
The architecture implements these as separate systems. Phase 1 builds the memory stores. Phase 2 adds working memory (an active scratchpad) and a learning module (consolidation from working memory to long-term stores). Phase 3 adds metacognitive monitoring — the system's ability to assess confidence in its own retrieved context.
MIT