Sage

Privacy-first personal AI agent with persistent memory, built in Rust.

⚠️ Experimental - This is a proof of concept / personal project exploring ideas around private, memory-augmented AI agents. It works, but expect rough edges.

What is Sage?

Sage is an AI assistant that prioritizes privacy and data sovereignty. It's designed to be a trusted companion that remembers your conversations, learns about you over time, and can take actions on your behalf - all while keeping your data under your control.

Key Features:

  • End-to-end encrypted messaging via Signal
  • Image understanding - send photos and Sage can see and describe them
  • Long-term memory that persists across conversations
  • Confidential compute - LLM inference runs in a TEE (Trusted Execution Environment)
  • Self-hosted - all data stays on your machine
  • Multi-user support with isolated memory per conversation

Why Build This?

Most AI assistants are stateless - they forget everything after each conversation. The few that have memory send your data to cloud servers you don't control. Sage takes a different approach:

  • Your conversations stay on your PostgreSQL instance
  • LLM inference happens in confidential compute (Maple/TEE) - the inference provider can't see your prompts
  • Communication happens over Signal's E2E encryption
  • The agent runs in your container on your infrastructure

Technical Highlights

This project explores several unconventional design choices:

No Native Tool Calling

Instead of relying on LLM providers' function calling APIs (which are provider-specific and a frequent source of bugs), Sage uses structured output parsing via DSRs (DSPy in Rust) with BAML. The LLM outputs natural text that gets parsed into typed Rust structs (see the sketch after this list). This approach:

  • Works identically across all LLM providers
  • Is immune to vLLM/provider-specific tool calling bugs
  • Is fully debuggable (just look at the text output)
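
A minimal sketch of the parsing side, with a hypothetical helper rather than the actual DSRs/BAML internals: slice out each [[ ## field ## ]] section of the raw text, then hand the JSON-valued fields to serde:

use serde::Deserialize;
use std::collections::HashMap;

// Hypothetical mirror of the ToolCall type referenced in the signature below.
#[derive(Debug, Deserialize)]
struct ToolCall {
    name: String,
    args: HashMap<String, String>,
}

// Slice out the text between `[[ ## field ## ]]` and the next marker.
fn extract_field<'a>(output: &'a str, field: &str) -> Option<&'a str> {
    let marker = format!("[[ ## {field} ## ]]");
    let start = output.find(&marker)? + marker.len();
    let rest = &output[start..];
    let end = rest.find("[[ ##").unwrap_or(rest.len());
    Some(rest[..end].trim())
}

fn main() -> Result<(), serde_json::Error> {
    let llm_output = r#"[[ ## reasoning ## ]]
Need the weather, so call web_search.

[[ ## messages ## ]]
["Let me check the current weather for you."]

[[ ## tool_calls ## ]]
[{"name": "web_search", "args": {"query": "weather Austin TX today"}}]

[[ ## completed ## ]]"#;

    // Plain text in, typed structs out -- no provider tool-calling API involved.
    let messages: Vec<String> =
        serde_json::from_str(extract_field(llm_output, "messages").unwrap())?;
    let tool_calls: Vec<ToolCall> =
        serde_json::from_str(extract_field(llm_output, "tool_calls").unwrap())?;
    println!("{messages:?} {tool_calls:?}");
    Ok(())
}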

Regenerated Context, Not Append-Only

Rather than maintaining an ever-growing message log, Sage regenerates the full context on each request (see the sketch after this list):

  • Single system prompt with injected memory blocks
  • Recent conversation history (not the full log)
  • No KV cache dependency - works with any provider
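
A sketch of what per-turn assembly could look like (the names and helper are illustrative, not the repo's API); it mirrors the conversation_context format shown in the example further below:

// Hypothetical shape of per-turn context assembly. Because nothing depends on
// a provider-side KV cache, the whole context is rebuilt fresh every turn.
struct Turn {
    role: &'static str, // "user" or "assistant"
    timestamp: String,
    text: String,
}

fn build_conversation_context(
    now: &str,           // e.g. "02/01/2026 10:30:00 (Sunday) (America/Chicago)"
    memory_blocks: &str, // core memory: the <persona> and <human> blocks
    recent: &[Turn],     // only the last N turns, loaded fresh from Postgres
) -> String {
    let mut ctx = format!(
        "Current time: {now}\n\n<memory_blocks>\n{memory_blocks}\n</memory_blocks>\n\nRecent conversation:\n"
    );
    for t in recent {
        ctx.push_str(&format!("[{} @ {}]: {}\n", t.role, t.timestamp, t.text));
    }
    // The compacted summary of very old messages travels separately, in the
    // previous_context_summary input field.
    ctx
}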

Letta-Inspired Memory Architecture

Custom implementation of a 4-tier memory system (inspired by Letta/MemGPT):

Layer           | Purpose                                | Storage
Core Memory     | Always in context (persona, user info) | PostgreSQL
Recall Memory   | Searchable conversation history        | PostgreSQL + TEE embeddings
Archival Memory | Long-term semantic storage             | pgvector + TEE embeddings
Summary Memory  | Auto-compaction when context overflows | PostgreSQL

All embeddings are generated via Maple's TEE-based embedding API (nomic-embed-text), meaning your memory content stays private even during vector encoding.
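
A sketch of the archival tier under stated assumptions: the table and column names and the embed_in_tee wrapper are hypothetical, and the pgvector crate's sqlx integration is assumed (<=> is pgvector's cosine-distance operator):

use pgvector::Vector; // pgvector crate; its sqlx integration is assumed here

// Hypothetical wrapper around Maple's TEE embedding endpoint: plaintext in,
// nomic-embed-text vector out, with neither visible outside the enclave.
async fn embed_in_tee(text: &str) -> anyhow::Result<Vector> {
    let _ = text;
    todo!("call the TEE embedding API")
}

async fn archival_insert(pool: &sqlx::PgPool, user_id: i64, content: &str) -> anyhow::Result<()> {
    let embedding = embed_in_tee(content).await?;
    sqlx::query("INSERT INTO archival_memory (user_id, content, embedding) VALUES ($1, $2, $3)")
        .bind(user_id)
        .bind(content)
        .bind(embedding)
        .execute(pool)
        .await?;
    Ok(())
}

async fn archival_search(pool: &sqlx::PgPool, user_id: i64, query: &str) -> anyhow::Result<Vec<String>> {
    let embedding = embed_in_tee(query).await?;
    // `<=>` is pgvector's cosine-distance operator; closest memories first.
    let rows: Vec<(String,)> = sqlx::query_as(
        "SELECT content FROM archival_memory
         WHERE user_id = $1 ORDER BY embedding <=> $2 LIMIT 5",
    )
    .bind(user_id)
    .bind(embedding)
    .fetch_all(pool)
    .await?;
    Ok(rows.into_iter().map(|(content,)| content).collect())
}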

Built for Prompt Optimization

The codebase is structured around DSRs signatures, enabling GEPA (Genetic-Pareto) optimization of prompts. Sage includes a working GEPA loop in which Claude analyzes test failures and proposes instruction improvements, which are then scored by re-running Kimi with the candidate instruction. See the GEPA section below.

DSRs Signature Architecture

Sage uses typed DSRs signatures to define the contract between inputs and outputs. This makes the agent's interface explicit, debuggable, and optimizable.

Main Agent Signature (AgentResponse):

#[derive(dspy_rs::Signature)]
pub struct AgentResponse {
    // Inputs
    #[input(desc = "The input to respond to - either a user message or tool execution result")]
    pub input: String,

    #[input(desc = "Compacted summary of very old messages (only present for long conversations)")]
    pub previous_context_summary: String,

    #[input(desc = "Recent conversation history including your messages and tool results")]
    pub conversation_context: String,

    #[input]
    pub available_tools: String,

    // Outputs
    #[output(desc = "Your reasoning/thought process (think step by step)")]
    pub reasoning: String,

    #[output(desc = "Array of messages to send to the user (can be empty)")]
    pub messages: Vec<String>,

    #[output(desc = "Array of tool calls to execute (can be empty)")]
    pub tool_calls: Vec<ToolCall>,
}

How it works: DSRs compiles this signature + instruction into a single prompt with field markers ([[ ## field ## ]]). The LLM outputs structured text that gets parsed back into typed Rust structs via BAML.

Example: Compiled Prompt → LLM Response

When Sage processes a message, DSRs compiles the signature into a structured prompt. Here's what gets sent to the LLM:

System Prompt (generated by DSRs):

Your input fields are:
1. `input` (string): The input to respond to - either a user message or tool execution result
2. `previous_context_summary` (string): Compacted summary of very old messages (only present for long conversations). Ignore if empty.
3. `conversation_context` (string): Recent conversation history including your messages and tool results
4. `available_tools` (string)

Your output fields are:
1. `reasoning` (string): Your reasoning/thought process (think step by step)
2. `messages` (string[]): Array of messages to send to the user (can be empty)
3. `tool_calls` (ToolCall[]): Array of tool calls to execute (can be empty, or [{"name": "done", "args": {}}] if nothing to do)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## input ## ]]
input

[[ ## previous_context_summary ## ]]
previous_context_summary

[[ ## conversation_context ## ]]
conversation_context

[[ ## available_tools ## ]]
available_tools

[[ ## reasoning ## ]]
Output field `reasoning` should be of type: string

[[ ## messages ## ]]
Output field `messages` should be of type: string[]

[[ ## tool_calls ## ]]
Output field `tool_calls` should be of type: ToolCall[]

[
  {
    // A tool call requested by the agent
    name: string,
    args: map<string, string>,
  }
]

[[ ## completed ## ]]

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, 
then `[[ ## messages ## ]]`, then `[[ ## tool_calls ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

In adhering to this structure, your objective is: 
    You are Sage, a helpful AI assistant communicating via Signal.
    
    MEMORY SYSTEM:
    You have full control over your memory. Use it proactively and autonomously:
    
    - **Core Memory Blocks** (<persona>, <human>): Always in your context. Edit anytime.
      - `memory_append`: Add new info to a block
      - `memory_replace`: Update/correct existing info
      - `memory_insert`: Insert at specific line
      
    - **Archival Memory**: Long-term storage for important facts, preferences, details.
      - `archival_insert`: Store information
      - `archival_search`: Search past memories semantically
    
    COMMUNICATION STYLE:
    You communicate via Signal chat. Adapt your message format to the content:
    
    CASUAL - Use multiple short messages:
    messages: ["Hey! Good question.", "The answer is pretty simple.", "It's X because Y."]
    
    DETAILED - Longer messages with paragraphs are fine:
    messages: ["Here's how that works:\n\nFirst, the system does X...\n\nThen Y happens."]
    
    ...

User Message (the actual turn):

[[ ## input ## ]]
What's the weather like in Austin today?

[[ ## previous_context_summary ## ]]


[[ ## conversation_context ## ]]
Current time: 02/01/2026 10:30:00 (Sunday) (America/Chicago)

<memory_blocks>
<persona>
I am Sage, a helpful AI assistant communicating via Signal.
</persona>
<human>
Name: Alex
Location: Austin, TX
Preferences: Prefers concise responses
</human>
</memory_blocks>

Recent conversation:
[user @ 01/31/2026 18:45:00]: hey sage, can you help me with something tomorrow?
[assistant @ 01/31/2026 18:45:12]: Of course! Just let me know what you need.
...

[[ ## available_tools ## ]]
Available tools:
web_search:
  Description: Search the web with AI summaries
  Args: {"query": "search query", "location": "city for local results"}
...

LLM Response:

[[ ## reasoning ## ]]
Alex is asking about weather in Austin. I should use web_search with their location
to get current conditions. I'll keep my response concise per their preferences.

[[ ## messages ## ]]
["Let me check the current weather for you."]

[[ ## tool_calls ## ]]
[{"name": "web_search", "args": {"query": "weather Austin TX today", "location": "Austin, TX"}}]

[[ ## completed ## ]]

DSRs parses this back into a typed AgentResponse struct that Sage uses to execute tools and send messages.

Other signatures in the codebase:

  • SummarizeConversation - Compacts old messages when the context window fills (a hypothetical shape is sketched below)
  • CorrectionResponse - Fixes malformed LLM outputs (self-healing)
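
For illustration, a hypothetical shape for the summarization signature, following the same derive pattern as AgentResponse above (the field names are guesses, not the repo's actual definition):

#[derive(dspy_rs::Signature)]
pub struct SummarizeConversation {
    #[input(desc = "Older messages that are about to fall out of the context window")]
    pub messages_to_compact: String,

    #[input(desc = "The existing summary to fold the new messages into (may be empty)")]
    pub previous_summary: String,

    #[output(desc = "Updated compact summary preserving facts, names, and open tasks")]
    pub summary: String,
}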

Stack

Component  | Choice                     | Why
Language   | Rust                       | Performance, type safety, reliability
LLM        | Kimi K2 (thinking variant) | Strong tool use, 128k context
Inference  | Maple                      | TEE-based confidential compute
Embeddings | nomic-embed-text           | Via Maple
Messaging  | Signal (signal-cli)        | E2E encrypted, works on mobile
Database   | PostgreSQL + pgvector      | Structured data + vector search
Framework  | DSRs (dspy-rs)             | Typed signatures, BAML parsing

Tools

Tool                         | Description
web_search                   | Brave Search with AI summaries
shell                        | Execute commands in workspace
memory_replace/append/insert | Edit core memory blocks
archival_insert/search       | Long-term semantic memory
conversation_search          | Search conversation history
schedule_task                | Reminders (cron or one-off)
set_preference               | User preferences (timezone, etc.)

Quick Start

Prerequisites

  • Podman or Docker
  • signal-cli registered with a phone number
  • Maple API access (or compatible OpenAI endpoint)

Option 1: Docker (Recommended)

Pre-built images are available for linux/amd64 and linux/arm64:

# Pull the latest image
docker pull ghcr.io/anthonyronning/sage:latest

# Clone for docker-compose and configs
git clone https://github.com/AnthonyRonning/sage.git
cd sage

# Configure environment
cp .env.example .env
# Edit .env with your settings

# Initialize signal-cli data volume (requires existing signal-cli registration)
just signal-init

# Start all services (postgres, signal-cli, sage)
docker compose up -d

Or use the image directly in your own compose setup:

services:
  sage:
    image: ghcr.io/anthonyronning/sage:latest
    environment:
      - DATABASE_URL=postgres://sage:sage@postgres:5432/sage
      - MAPLE_API_URL=https://your-maple-endpoint
      - MAPLE_API_KEY=your-api-key
      - SIGNAL_CLI_HOST=signal-cli
      - SIGNAL_CLI_PORT=7583
      - SIGNAL_PHONE_NUMBER=+1234567890

Option 2: Build from Source

Requires Nix with flakes enabled:

git clone https://github.com/AnthonyRonning/sage.git
cd sage
nix develop

cp .env.example .env
# Edit .env with your settings

just signal-init  # Copy signal-cli data to volume
just build        # Build container
just start        # Start all services

Configuration

# Required
MAPLE_API_URL=https://your-maple-endpoint
MAPLE_API_KEY=your-api-key
MAPLE_MODEL=maple/kimi-k2-5
SIGNAL_PHONE_NUMBER=+1234567890

# Optional
BRAVE_API_KEY=your-brave-key          # For web search
MAPLE_VISION_MODEL=maple/kimi-k2-5    # For image understanding (defaults to MAPLE_MODEL)
SIGNAL_ALLOWED_USERS=*                # Or comma-separated UUIDs

Architecture

┌─────────────────┐     Signal      ┌─────────────────┐
│   Your Phone    │◄───────────────►│   signal-cli    │
└─────────────────┘    (encrypted)  └────────┬────────┘
                                             │ JSON-RPC
                                             ▼
┌─────────────────────────────────────────────────────┐
│                    Sage (Rust)                      │
│  ┌─────────────┐  ┌─────────────┐  ┌────────────┐  │
│  │   Agent     │  │   Memory    │  │   Tools    │  │
│  │   Manager   │  │   System    │  │            │  │
│  └─────────────┘  └─────────────┘  └────────────┘  │
└─────────────────────────┬───────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  PostgreSQL   │ │    Maple      │ │ Brave Search  │
│  + pgvector   │ │    (TEE)      │ │               │
└───────────────┘ └───────────────┘ └───────────────┘

Privacy Model

Layer      | Protection
Transport  | Signal E2E encryption
Inference  | Maple TEE (confidential compute)
Embeddings | Maple TEE (memory vectors generated privately)
Storage    | Local PostgreSQL (your machine)
Search     | Brave (privacy-respecting, no tracking)

Project Status

Working:

  • Multi-user conversations with memory isolation
  • Image understanding (send photos via Signal)
  • Web search, shell commands, scheduling
  • Auto-reconnect on Signal connection drops
  • Context compaction when approaching limits
  • GEPA prompt optimization (see below)

Future:

  • Gmail/Calendar integration
  • Group chat support
  • Voice messages

GEPA Prompt Optimization

Sage includes a GEPA (Genetic-Pareto) optimization system for automatically improving the agent instruction based on test cases and feedback.

How it works:

  1. Define training examples in examples/gepa/trainset.json with expected behaviors
  2. Run evaluation to get baseline score against current instruction
  3. Run optimization - Claude (judge) analyzes failures and proposes instruction improvements
  4. Kimi (program) is re-evaluated with the improved instruction
  5. Repeat until convergence or a perfect score (see the sketch after this list)
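
A hypothetical skeleton of that loop (the function names are placeholders, not the repo's API, and real GEPA tracks a Pareto frontier of per-example scores rather than a single scalar):

// Placeholder functions standing in for the actual evaluation plumbing.
async fn evaluate_with_kimi(_instruction: &str) -> f64 { todo!() }
async fn collect_failures(_instruction: &str) -> Vec<String> { todo!() }
async fn propose_with_claude(_instruction: &str, _failures: &[String]) -> String { todo!() }

async fn gepa_loop(mut instruction: String, max_rounds: usize) -> String {
    let mut best = evaluate_with_kimi(&instruction).await; // step 2: baseline
    for _ in 0..max_rounds {
        if best >= 1.0 { break; } // stop early on a perfect score
        let failures = collect_failures(&instruction).await;
        // Step 3: Claude (judge) reads failures and rewrites the instruction.
        let candidate = propose_with_claude(&instruction, &failures).await;
        // Step 4: Kimi (program) is re-scored under the candidate instruction.
        let score = evaluate_with_kimi(&candidate).await;
        if score > best {
            best = score;
            instruction = candidate; // keep only improvements
        }
    }
    instruction
}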

Commands:

# Evaluate current instruction (baseline score)
just gepa-eval

# Run optimization loop (requires ANTHROPIC_API_KEY)
just gepa-optimize

# View optimized instruction
just gepa-show

# See training example categories
just gepa-examples

Environment:

# Required for GEPA optimization (Claude as judge)
ANTHROPIC_API_KEY=your-anthropic-key

# Program under test (Kimi via Maple)
MAPLE_API_URL=https://your-maple-endpoint
MAPLE_API_KEY=your-maple-key
MAPLE_MODEL=maple/kimi-k2-5

Training Examples: Each entry in examples/gepa/trainset.json includes:

  • Input scenario (user message or tool result)
  • Context (persona, human block, conversation history)
  • Expected behavior description
  • Good/bad response examples

Current categories: first-time users, casual chat, web search, memory storage, tool result processing, corrections.
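
For illustration, one plausible Rust-side shape for an entry, inferred from the field list above (the actual JSON schema may differ):

use serde::Deserialize;

// Hypothetical trainset entry; field names are guesses from the list above.
#[derive(Debug, Deserialize)]
struct TrainExample {
    category: String,          // e.g. "casual chat", "web search"
    input: String,             // user message or tool execution result
    context: String,           // persona, human block, conversation history
    expected_behavior: String, // description of what a good response does
    good_examples: Vec<String>,
    bad_examples: Vec<String>,
}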

Related Projects

  • Letta - Memory management inspiration
  • DSRs - DSPy in Rust
  • signal-cli - Signal CLI interface
  • Maple - Confidential compute LLM inference

License

MIT
