NanoLetta

Minimal cognitive agent loop. Letta's cognitive runtime stripped from 134,000 lines to 1,900.

NanoLetta is the cognitive kernel extracted from Letta. It keeps the one thing that actually matters — the reasoning loop — and removes everything else.

message → LLM → tool calls → memory edits → persist → repeat

One dependency. Zero infrastructure. Runs anywhere Python runs.

Architecture

flowchart LR
    U["User message"] --> A["Agent.step()"]
    A --> C["LLM client"]
    C --> D{"tool call?"}
    D -- "memory edit" --> M["MemoryToolExecutor"]
    M --> S["SQLite store"]
    S --> A
    D -- "custom tool" --> T["Your tool handler"]
    T --> A
    D -- "send_message" --> R["Assistant response"]
    G["Governor hooks (optional)"] -.-> A

NanoLetta keeps one loop: reason, call tools, persist memory, continue until the agent sends a response.

Size comparison

Component	Letta	NanoLetta	Reduction
Agent reasoning loop	1,969 LOC	311 LOC	84%
Memory / block system	~2,000 LOC	280 LOC	86%
LLM client	~800 LOC	176 LOC	78%
Persistence	~3,000 LOC	258 LOC	91%
Total	~134,000 LOC	~1,900 LOC	99%

What was cut: telemetry, multi-tenant routing, approval workflows, job scheduling, REST API surface, database migrations, Docker orchestration, span tracking, dry-run mode, step-progression enums, multi-agent routing, run ID tracking.

What remains: the cognition.

Install

pip install httpx  # only external dependency

Copy the nanoletta/ folder into your project. No pip package yet — the code is small enough to own directly.

nanoletta/
├── agent.py        # Reasoning loop
├── governor.py     # Extension hooks
├── interfaces.py   # Protocol definitions
├── llm.py          # OpenAI-compatible client
├── memory_tools.py # Built-in memory tools
├── store.py        # SQLite persistence
└── types.py        # Data model

Quick start

import asyncio
from nanoletta.agent import Agent
from nanoletta.store import SQLiteStore
from nanoletta.llm import OpenAICompatibleClient
from nanoletta.memory_tools import MemoryToolExecutor
from nanoletta.types import AgentState, Block, LLMConfig, Tool

async def main():
    # Persistence (SQLite, zero config)
    store = SQLiteStore("agent.db")

    # LLM client (OpenAI-compatible — works with Ollama, OpenRouter, etc.)
    llm = OpenAICompatibleClient(
        base_url="http://localhost:11434/v1",
        model="llama3.2",
    )

    # Tool executor (built-in memory tools + your custom tools)
    async def my_tools(tool_call, state):
        # Add your custom tools here
        return None  # falls through to memory tools

    tools = MemoryToolExecutor(session_store=store, custom_handler=my_tools)

    # Create an agent with persistent memory blocks
    state = AgentState(
        agent_id="my-agent",
        system_prompt="You are a helpful assistant with persistent memory.",
        blocks=[
            Block(label="persona", value="Friendly, concise, honest."),
            Block(label="user_notes", value=""),  # Agent writes here
        ],
        tools=[],
        llm_config=LLMConfig(model="llama3.2", context_window=8192),
    )
    await store.save(state)

    # Run the agent
    agent = Agent(
        agent_id="my-agent",
        llm_client=llm,
        tool_executor=tools,
        agent_repo=store,
        session_store=store,
        block_store=store,
    )

    response = await agent.step("Hello! What do you remember about me?")
    print(response.content)

asyncio.run(main())

How it works

The reasoning loop

Each call to agent.step() runs this cycle until the agent sends a message or hits max_steps:

1. Load agent state + memory blocks from SQLite
2. Build context: system prompt + memory blocks + conversation history
3. Call LLM with available tools
4. Parse tool calls from response
5. Execute tools (memory edits, custom tools)
6. Persist messages + updated blocks
7. If agent called send_message → return response
   If agent used a thinking tool → go to step 2

Self-editing memory

The agent can modify its own memory blocks mid-conversation using built-in tools:

Tool	What it does
`send_message`	Send a reply to the user
`memory_str_replace`	Edit a memory block (find & replace)
`memory_insert`	Insert text into a memory block
`memory_rethink`	Rewrite a memory block from scratch
`memory_delete`	Remove text from a memory block
`conversation_search`	Search conversation history

Governor hooks

Six hook points let you inject governance logic without modifying the core loop. Pass a GovernanceRuntime to activate them — without one, all hooks are no-ops:

from nanoletta.governor import Governor, GovernorConfig, GovernanceRuntime

class MyRuntime:
    def on_open_question(self, user_msg, ai_response, person="user"):
        # Track unanswered questions, update a knowledge gap log, etc.
        return {}

    def on_draft_response(self, draft, self_model, history, doctrine):
        # Consistency / safety check before response is committed
        # Return {"block": True} to signal the response should be held
        return {"ok": True}

    def on_correction(self, user_msg, ai_response, correction_type):
        # Called when the agent edits its own memory blocks
        return {}

    def get_context_block(self, max_items=3):
        # Return a string injected into the system prompt each turn
        return None  # or "Active reminders:\n- ..."

    def on_daemon_cycle(self, commitments, emotional_state,
                        homeostasis_state, last_user_ts, chat_id):
        # Proactive outreach / initiative decisions
        return {}

governor = Governor(
    config=GovernorConfig(person="Alice"),
    runtime=MyRuntime(),
)
agent = Agent(..., governor=governor)

Hook	When it fires
`pre_step`	Before any LLM call
`post_draft`	After LLM generates a response
`post_tool`	After a memory-editing tool executes
`post_step`	After the full step completes
`build_consciousness_block`	During context assembly (injects text into system prompt)
`daemon_cycle`	External heartbeat (proactive / initiative logic)

All hooks fail silently — a broken runtime cannot crash the agent loop. Subclass Governor instead of using a runtime if you prefer inheritance over composition.

Persistence

NanoLetta uses SQLite with three tables:

agents    -- AgentState (system prompt, block refs, tool defs, LLM config)
blocks    -- Memory blocks (label, value, editable by the agent)
messages  -- Conversation history (role, content, tool calls)

WAL mode is enabled by default. Works fine for single-user agents. For concurrent sessions, wrap with a connection pool or swap in the BlockStore / SessionStore protocols with your preferred backend.

Tests

# Unit tests (no LLM required)
python3 -m pytest tests/ -q --ignore=tests/test_e2e_ollama.py

# End-to-end test (requires Ollama)
OLLAMA_URL=http://localhost:11434/v1 OLLAMA_MODEL=llama3.2 \
  python3 -m pytest tests/test_e2e_ollama.py -v -s

135 unit tests. Covers types, agent loop, memory tools, SQLite store, LLM client, governor hooks.

What was removed from Letta

NanoLetta is not a fork of the Letta API. It is a transplant of one specific module — letta/agent/letta_agent.py — with the essential logic kept and everything deployment-specific removed:

Telemetry / span tracking (~200 LOC removed) — no OpenTelemetry dependency
Step progression enum + error recovery (~100 LOC) — simplified to a clean loop
Approval / denial workflows (~80 LOC) — not needed for single-agent use
Multi-agent routing (~30 LOC) — out of scope
Job cancellation + run ID tracking (~35 LOC) — removed
Dry run mode (~20 LOC) — removed
REST API, database migrations, Docker — entirely separate from the cognitive kernel

The Governor replaces Letta's step-manager callbacks with a cleaner hook interface.

Part of a suite

NanoLetta pairs naturally with:

Temporal — temporal knowledge graph (Graphiti → 2.8k LOC). Knows when facts were true.
Lucid — semantic memory runtime (Hindsight → 2k LOC). retain() / recall() / reflect().

Wire them into NanoLetta via custom tools and the Governor hooks.

Extending NanoLetta

NanoLetta is designed to be extended through interfaces, not forked:

from nanoletta.interfaces import ToolExecutor
from nanoletta.types import ToolCall, ToolResult, AgentState

class MyToolExecutor(ToolExecutor):
    async def execute(self, tool_call: ToolCall, state: AgentState) -> ToolResult:
        match tool_call.name:
            case "web_search":
                result = await do_web_search(tool_call.arguments["query"])
                return ToolResult(output=result, success=True)
            case _:
                return ToolResult(output="unknown tool", success=False)

Swap any of the five protocol interfaces (LLMClient, BlockStore, SessionStore, ToolExecutor, AgentRepo) with your own implementation.

Requirements

Python 3.11+
httpx (for the built-in LLM client — omit if you bring your own)
Any OpenAI-compatible LLM endpoint (Ollama, OpenRouter, OpenAI, Anthropic-compat, etc.)

License

Apache 2.0. See LICENSE.

Acknowledgments

The cognitive loop design and memory block concept come from Letta (Apache 2.0), originally MemGPT. NanoLetta is an independent extraction — not affiliated with Letta AI.

Extracted from Letta's core cognitive-loop design. Not tracking upstream; this is a stable standalone extraction, not a rolling fork.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
golden		golden
nanoletta		nanoletta
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NanoLetta

Architecture

Size comparison

Install

Quick start

How it works

The reasoning loop

Self-editing memory

Governor hooks

Persistence

Tests

What was removed from Letta

Part of a suite

Extending NanoLetta

Requirements

License

Acknowledgments

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NanoLetta

Architecture

Size comparison

Install

Quick start

How it works

The reasoning loop

Self-editing memory

Governor hooks

Persistence

Tests

What was removed from Letta

Part of a suite

Extending NanoLetta

Requirements

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages