Privacy-first personal AI agent with persistent memory, built in Rust.
⚠️ Experimental - This is a proof of concept / personal project exploring ideas around private, memory-augmented AI agents. It works, but expect rough edges.
Sage is an AI assistant that prioritizes privacy and data sovereignty. It's designed to be a trusted companion that remembers your conversations, learns about you over time, and can take actions on your behalf - all while keeping your data under your control.
Key Features:
- End-to-end encrypted messaging via Signal
- Image understanding - send photos and Sage can see and describe them
- Long-term memory that persists across conversations
- Confidential compute - LLM inference runs in a TEE (Trusted Execution Environment)
- Self-hosted - all data stays on your machine
- Multi-user support with isolated memory per conversation
Most AI assistants are stateless - they forget everything after each conversation. The few that have memory send your data to cloud servers you don't control. Sage takes a different approach:
- Your conversations stay on your PostgreSQL instance
- LLM inference happens in confidential compute (Maple/TEE) - the inference provider can't see your prompts
- Communication happens over Signal's E2E encryption
- The agent runs in your container on your infrastructure
This project explores several unconventional design choices:
Instead of relying on LLM providers' function calling APIs (which are buggy and provider-specific), Sage uses structured output parsing via DSRs (DSPy in Rust) with BAML. The LLM outputs natural text that gets parsed into typed Rust structs. This approach:
- Works identically across all LLM providers
- Is immune to vLLM/provider-specific tool calling bugs
- Is fully debuggable (just look at the text output)
Rather than maintaining an ever-growing message log, Sage regenerates the full context on each request:
- Single system prompt with injected memory blocks
- Recent conversation history (not the full log)
- No KV cache dependency - works with any provider
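As a rough sketch of what that per-request assembly can look like (the function and field names below are illustrative, not Sage's actual API):

```rust
// Illustrative sketch: the prompt context is rebuilt from scratch on every request,
// so no provider-side KV cache or ever-growing message log is required.
fn build_conversation_context(
    memory_blocks: &str,     // rendered <memory_blocks> (persona, human)
    recent_turns: &[String], // only the last N messages, oldest first
) -> String {
    let mut ctx = String::from(memory_blocks);
    ctx.push_str("\n\nRecent conversation:\n");
    for turn in recent_turns {
        ctx.push_str(turn);
        ctx.push('\n');
    }
    ctx // the compacted summary travels separately in `previous_context_summary`
}
```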
Custom implementation of a 4-tier memory system (inspired by Letta/MemGPT):
| Layer | Purpose | Storage |
|---|---|---|
| Core Memory | Always in context (persona, user info) | PostgreSQL |
| Recall Memory | Searchable conversation history | PostgreSQL + TEE embeddings |
| Archival Memory | Long-term semantic storage | pgvector + TEE embeddings |
| Summary Memory | Auto-compaction when context overflows | PostgreSQL |
All embeddings are generated via Maple's TEE-based embedding API (nomic-embed-text), meaning your memory content stays private even during vector encoding.
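For a sense of what that looks like on the wire, here is a minimal sketch against an OpenAI-compatible `/v1/embeddings` route (the exact Maple path, auth scheme, and response shape are assumptions here, not documented API):

```rust
use serde_json::{json, Value};

// Minimal sketch, assuming an OpenAI-compatible embeddings endpoint.
// MAPLE_API_URL and MAPLE_API_KEY mirror the env vars used elsewhere in this README.
async fn embed(text: &str) -> Result<Vec<f32>, reqwest::Error> {
    let base = std::env::var("MAPLE_API_URL").expect("MAPLE_API_URL not set");
    let key = std::env::var("MAPLE_API_KEY").expect("MAPLE_API_KEY not set");
    let body: Value = reqwest::Client::new()
        .post(format!("{base}/v1/embeddings"))
        .bearer_auth(key)
        .json(&json!({ "model": "nomic-embed-text", "input": text }))
        .send()
        .await?
        .json()
        .await?;
    // Pull the vector out of the usual {"data": [{"embedding": [...]}]} shape.
    Ok(body["data"][0]["embedding"]
        .as_array()
        .map(|xs| xs.iter().filter_map(Value::as_f64).map(|x| x as f32).collect())
        .unwrap_or_default())
}
```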
The codebase is structured around DSRs signatures, enabling GEPA (Genetic-Pareto) optimization of prompts. Sage includes a working GEPA system in which Claude analyzes test failures and proposes instruction improvements, which are then evaluated by re-running Kimi against the training set. See the GEPA section below.
Sage uses typed DSRs signatures to define the contract between inputs and outputs. This makes the agent's interface explicit, debuggable, and optimizable.
Main Agent Signature (AgentResponse):
#[derive(dspy_rs::Signature)]
pub struct AgentResponse {
// Inputs
#[input(desc = "The input to respond to - either a user message or tool execution result")]
pub input: String,
#[input(desc = "Compacted summary of very old messages (only present for long conversations)")]
pub previous_context_summary: String,
#[input(desc = "Recent conversation history including your messages and tool results")]
pub conversation_context: String,
#[input]
pub available_tools: String,
// Outputs
#[output(desc = "Your reasoning/thought process (think step by step)")]
pub reasoning: String,
#[output(desc = "Array of messages to send to the user (can be empty)")]
pub messages: Vec<String>,
#[output(desc = "Array of tool calls to execute (can be empty)")]
pub tool_calls: Vec<ToolCall>,
}

How it works: DSRs compiles this signature + instruction into a single prompt with field markers ([[ ## field ## ]]). The LLM outputs structured text that gets parsed back into typed Rust structs via BAML.
Example: Compiled Prompt → LLM Response
When Sage processes a message, DSRs compiles the signature into a structured prompt. Here's what gets sent to the LLM:
System Prompt (generated by DSRs):
Your input fields are:
1. `input` (string): The input to respond to - either a user message or tool execution result
2. `previous_context_summary` (string): Compacted summary of very old messages (only present for long conversations). Ignore if empty.
3. `conversation_context` (string): Recent conversation history including your messages and tool results
4. `available_tools` (string)
Your output fields are:
1. `reasoning` (string): Your reasoning/thought process (think step by step)
2. `messages` (string[]): Array of messages to send to the user (can be empty)
3. `tool_calls` (ToolCall[]): Array of tool calls to execute (can be empty, or [{"name": "done", "args": {}}] if nothing to do)
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## input ## ]]
input
[[ ## previous_context_summary ## ]]
previous_context_summary
[[ ## conversation_context ## ]]
conversation_context
[[ ## available_tools ## ]]
available_tools
[[ ## reasoning ## ]]
Output field `reasoning` should be of type: string
[[ ## messages ## ]]
Output field `messages` should be of type: string[]
[[ ## tool_calls ## ]]
Output field `tool_calls` should be of type: ToolCall[]
[
{
// A tool call requested by the agent
name: string,
args: map<string, string>,
}
]
[[ ## completed ## ]]
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`,
then `[[ ## messages ## ]]`, then `[[ ## tool_calls ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
In adhering to this structure, your objective is:
You are Sage, a helpful AI assistant communicating via Signal.
MEMORY SYSTEM:
You have full control over your memory. Use it proactively and autonomously:
- **Core Memory Blocks** (<persona>, <human>): Always in your context. Edit anytime.
- `memory_append`: Add new info to a block
- `memory_replace`: Update/correct existing info
- `memory_insert`: Insert at specific line
- **Archival Memory**: Long-term storage for important facts, preferences, details.
- `archival_insert`: Store information
- `archival_search`: Search past memories semantically
COMMUNICATION STYLE:
You communicate via Signal chat. Adapt your message format to the content:
CASUAL - Use multiple short messages:
messages: ["Hey! Good question.", "The answer is pretty simple.", "It's X because Y."]
DETAILED - Longer messages with paragraphs are fine:
messages: ["Here's how that works:\n\nFirst, the system does X...\n\nThen Y happens."]
...
User Message (the actual turn):
[[ ## input ## ]]
What's the weather like in Austin today?
[[ ## previous_context_summary ## ]]
[[ ## conversation_context ## ]]
Current time: 02/01/2026 10:30:00 (Sunday) (America/Chicago)
<memory_blocks>
<persona>
I am Sage, a helpful AI assistant communicating via Signal.
</persona>
<human>
Name: Alex
Location: Austin, TX
Preferences: Prefers concise responses
</human>
</memory_blocks>
Recent conversation:
[user @ 01/31/2026 18:45:00]: hey sage, can you help me with something tomorrow?
[assistant @ 01/31/2026 18:45:12]: Of course! Just let me know what you need.
...
[[ ## available_tools ## ]]
Available tools:
web_search:
Description: Search the web with AI summaries
Args: {"query": "search query", "location": "city for local results"}
...
LLM Response:
[[ ## reasoning ## ]]
Alex is asking about weather in Austin. I should use web_search with their location
to get current conditions. I'll keep my response concise per their preferences.
[[ ## messages ## ]]
["Let me check the current weather for you."]
[[ ## tool_calls ## ]]
[{"name": "web_search", "args": {"query": "weather Austin TX today", "location": "Austin, TX"}}]
[[ ## completed ## ]]
DSRs parses this back into a typed AgentResponse struct that Sage uses to execute tools and send messages.
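After parsing, acting on the struct is a straightforward dispatch over its output fields. A rough sketch (the `send_signal_message` and `execute_tool` helpers below are stand-ins, not Sage's real functions):

```rust
use std::collections::HashMap;

// Stand-in helpers; the real Signal and tool plumbing lives elsewhere in the codebase.
async fn send_signal_message(_msg: &str) {}
async fn execute_tool(_name: &str, _args: &HashMap<String, String>) -> String { String::new() }

// Hypothetical dispatch over the parsed AgentResponse output fields.
async fn handle(response: AgentResponse) {
    // Deliver any user-facing messages over Signal, in order.
    for msg in &response.messages {
        send_signal_message(msg).await;
    }
    // Execute requested tools; each result becomes the next `input` to the agent.
    for call in &response.tool_calls {
        let _result = execute_tool(&call.name, &call.args).await;
        // ...loop back into AgentResponse with the result as the new input
    }
}
```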
Other signatures in the codebase:
- `SummarizeConversation` - Compacts old messages when the context window fills
- `CorrectionResponse` - Fixes malformed LLM outputs (self-healing)
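Their definitions aren't reproduced here, but a `SummarizeConversation` signature might look roughly like this (field names are illustrative, following the same pattern as `AgentResponse`):

```rust
// Illustrative only - see the codebase for the real signature.
#[derive(dspy_rs::Signature)]
pub struct SummarizeConversation {
    #[input(desc = "Older conversation turns that no longer fit in the context window")]
    pub messages_to_summarize: String,
    #[input(desc = "Existing summary to fold the new material into (may be empty)")]
    pub previous_summary: String,
    #[output(desc = "Updated compact summary preserving key facts, commitments, and preferences")]
    pub summary: String,
}
```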
| Component | Choice | Why |
|---|---|---|
| Language | Rust | Performance, type safety, reliability |
| LLM | Kimi K2 (thinking variant) | Strong tool use, 128k context |
| Inference | Maple | TEE-based confidential compute |
| Embeddings | nomic-embed-text | Via Maple |
| Messaging | Signal (signal-cli) | E2E encrypted, works on mobile |
| Database | PostgreSQL + pgvector | Structured data + vector search |
| Framework | DSRs (dspy-rs) | Typed signatures, BAML parsing |
| Tool | Description |
|---|---|
| `web_search` | Brave Search with AI summaries |
| `shell` | Execute commands in workspace |
| `memory_replace/append/insert` | Edit core memory blocks |
| `archival_insert/search` | Long-term semantic memory |
| `conversation_search` | Search conversation history |
| `schedule_task` | Reminders (cron or one-off) |
| `set_preference` | User preferences (timezone, etc.) |
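Internally, each tool can be thought of as an implementation of a shared interface that exposes its name, a description (rendered into `available_tools`), and an execute step. A hedged sketch, not the actual trait in the codebase:

```rust
use std::collections::HashMap;

// Illustrative tool interface; the real definition may differ.
#[async_trait::async_trait]
pub trait Tool: Send + Sync {
    /// Name the LLM references in `tool_calls`, e.g. "web_search".
    fn name(&self) -> &'static str;
    /// Human-readable description rendered into the `available_tools` input field.
    fn description(&self) -> &'static str;
    /// Run with string arguments; the output is fed back to the agent as its next input.
    async fn execute(&self, args: HashMap<String, String>) -> Result<String, String>;
}
```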
- Podman or Docker
- signal-cli registered with a phone number
- Maple API access (or an OpenAI-compatible endpoint)
Pre-built images are available for linux/amd64 and linux/arm64:
# Pull the latest image
docker pull ghcr.io/anthonyronning/sage:latest
# Clone for docker-compose and configs
git clone https://github.com/AnthonyRonning/sage.git
cd sage
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Initialize signal-cli data volume (requires existing signal-cli registration)
just signal-init
# Start all services (postgres, signal-cli, sage)
docker compose up -d

Or use the image directly in your own compose setup:
services:
sage:
image: ghcr.io/anthonyronning/sage:latest
environment:
- DATABASE_URL=postgres://sage:sage@postgres:5432/sage
- MAPLE_API_URL=https://your-maple-endpoint
- MAPLE_API_KEY=your-api-key
- SIGNAL_CLI_HOST=signal-cli
- SIGNAL_CLI_PORT=7583
      - SIGNAL_PHONE_NUMBER=+1234567890

Requires Nix with flakes enabled:
git clone https://github.com/AnthonyRonning/sage.git
cd sage
nix develop
cp .env.example .env
# Edit .env with your settings
just signal-init # Copy signal-cli data to volume
just build # Build container
just start        # Start all services

Environment:

# Required
MAPLE_API_URL=https://your-maple-endpoint
MAPLE_API_KEY=your-api-key
MAPLE_MODEL=maple/kimi-k2-5
SIGNAL_PHONE_NUMBER=+1234567890
# Optional
BRAVE_API_KEY=your-brave-key # For web search
MAPLE_VISION_MODEL=maple/kimi-k2-5 # For image understanding (defaults to MAPLE_MODEL)
SIGNAL_ALLOWED_USERS=*             # Or comma-separated UUIDs

┌─────────────────┐     Signal     ┌─────────────────┐
│   Your Phone    │◄──────────────►│   signal-cli    │
└─────────────────┘   (encrypted)  └────────┬────────┘
                                            │ JSON-RPC
                                            ▼
┌─────────────────────────────────────────────────────┐
│                     Sage (Rust)                      │
│  ┌─────────────┐  ┌─────────────┐  ┌────────────┐   │
│  │   Agent     │  │   Memory    │  │   Tools    │   │
│  │   Manager   │  │   System    │  │            │   │
│  └─────────────┘  └─────────────┘  └────────────┘   │
└─────────────────────────┬───────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  PostgreSQL   │ │     Maple     │ │ Brave Search  │
│  + pgvector   │ │     (TEE)     │ │               │
└───────────────┘ └───────────────┘ └───────────────┘
| Layer | Protection |
|---|---|
| Transport | Signal E2E encryption |
| Inference | Maple TEE (confidential compute) |
| Embeddings | Maple TEE (memory vectors generated privately) |
| Storage | Local PostgreSQL (your machine) |
| Search | Brave (privacy-respecting, no tracking) |
Working:
- Multi-user conversations with memory isolation
- Image understanding (send photos via Signal)
- Web search, shell commands, scheduling
- Auto-reconnect on Signal connection drops
- Context compaction when approaching limits
- GEPA prompt optimization (see below)
Future:
- Gmail/Calendar integration
- Group chat support
- Voice messages
Sage includes a GEPA (Genetic-Pareto) optimization system for automatically improving the agent instruction based on test cases and feedback.
How it works:
- Define training examples in `examples/gepa/trainset.json` with expected behaviors
- Run evaluation to get a baseline score against the current instruction
- Run optimization - Claude (judge) analyzes failures and proposes instruction improvements
- Kimi (program) is re-evaluated with the improved instruction
- Repeat until convergence or perfect score
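Sketched as a simplified single-candidate loop (the types and helper functions below are placeholders, not the actual GEPA module's API):

```rust
// Simplified outline of the loop described above; everything here is a placeholder.
struct Example; // one entry from examples/gepa/trainset.json

fn evaluate_with_kimi(_instruction: &str, _trainset: &[Example]) -> f64 { 0.0 }
fn failing_cases(_instruction: &str, _trainset: &[Example]) -> Vec<String> { Vec::new() }
fn propose_with_claude(_instruction: &str, _failures: &[String]) -> String { String::new() }

fn optimize(mut instruction: String, trainset: &[Example], max_rounds: usize) -> String {
    let mut best = evaluate_with_kimi(&instruction, trainset); // baseline score
    for _ in 0..max_rounds {
        let failures = failing_cases(&instruction, trainset);
        if failures.is_empty() {
            break; // perfect score - nothing left to improve
        }
        // Claude (judge) analyzes the failures and proposes a revised instruction.
        let candidate = propose_with_claude(&instruction, &failures);
        // Kimi (program) is re-evaluated with the proposed instruction.
        let score = evaluate_with_kimi(&candidate, trainset);
        if score > best {
            best = score;
            instruction = candidate;
        } else {
            break; // converged: the proposal didn't help
        }
    }
    instruction
}
```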
Commands:
# Evaluate current instruction (baseline score)
just gepa-eval
# Run optimization loop (requires ANTHROPIC_API_KEY)
just gepa-optimize
# View optimized instruction
just gepa-show
# See training example categories
just gepa-examples

Environment:
# Required for GEPA optimization (Claude as judge)
ANTHROPIC_API_KEY=your-anthropic-key
# Program under test (Kimi via Maple)
MAPLE_API_URL=https://your-maple-endpoint
MAPLE_API_KEY=your-maple-key
MAPLE_MODEL=maple/kimi-k2-5

Training Examples:
Training data is in examples/gepa/trainset.json. Each example includes:
- Input scenario (user message or tool result)
- Context (persona, human block, conversation history)
- Expected behavior description
- Good/bad response examples
Current categories: first-time users, casual chat, web search, memory storage, tool result processing, corrections.
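As a rough idea of the shape, one entry could deserialize into something like the struct below (field names are guesses based on the list above; check `examples/gepa/trainset.json` for the real schema):

```rust
use serde::Deserialize;

// Hypothetical shape of one training example; the actual JSON keys may differ.
#[derive(Deserialize)]
struct TrainingExample {
    category: String,            // e.g. "casual chat", "web search"
    input: String,               // user message or tool execution result
    context: String,             // persona, human block, conversation history
    expected_behavior: String,   // description of what the agent should do
    good_examples: Vec<String>,  // responses that would pass
    bad_examples: Vec<String>,   // responses that should fail
}
```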
Acknowledgments:
- Letta - Memory management inspiration
- DSRs - DSPy in Rust
- signal-cli - Signal CLI interface
- Maple - Confidential compute LLM inference
License: MIT