Autonomous AI player characters for Dungeons & Dragons 5th Edition. Locally hosted, fully offline, no cloud services required.
r.A.I.d lets a human Dungeon Master run a full adventuring party where some or all player characters are controlled by independent AI agents. Each character has its own personality, voice, memories, and decision-making. The entire system runs on a single gaming PC. Your campaign data never leaves your machine.
- What It Does
- Key Features
- Hardware Requirements
- Prerequisites
- Ollama Setup
- Installation
- Configuration
- Usage
- Random Party Generator
- Foundry VTT Setup
- Project Structure
- How It Works
- Contributing
- License
- Contact
Each AI-controlled character operates as an independent agent with its own:
- Personality and backstory that shapes how it speaks, acts, and makes decisions
- Character sheet awareness of abilities, spells, inventory, and hit points
- Private knowledge and secrets that other party members cannot access
- Long-term memory of past sessions, drawn from a searchable archive of everything the character has experienced
- Distinct speaking voice generated locally with per-character voice profiles
When a character's turn comes up, the system assembles that character's unique perspective (what they know, what they have seen, what they remember, what they care about) and generates a response. That response includes both in-character dialogue and a structured game action: attack a target, cast a spell, move to a position, investigate an object, and so on.
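Because each turn must come back as structured JSON, the orchestrator has to validate what the model returns before acting on it. As an illustration only (the real schema is defined with Pydantic in raid/models.py, and the action names below are assumptions based on the sample response later in this README), a minimal stdlib check might look like:

```python
import json

# Assumed action vocabulary for illustration; the project's actual
# schema and action set live in raid/models.py.
KNOWN_ACTIONS = {"attack", "cast_spell", "move", "investigate", "talk", "wait"}

def parse_turn(raw: str) -> dict:
    """Parse one character turn and sanity-check the fields we act on."""
    data = json.loads(raw)
    if not isinstance(data.get("dialogue"), str):
        raise ValueError("response is missing in-character dialogue")
    if data.get("action") not in KNOWN_ACTIONS:
        raise ValueError(f"unrecognised action: {data.get('action')!r}")
    return data

turn = parse_turn(
    '{"dialogue": "Behind me, Elara!", "action": "attack", '
    '"target": "goblin_warleader"}'
)
```

Validating before execution means a malformed model response fails loudly at parse time instead of sending garbage to the game bridge.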
The system integrates directly with Foundry VTT. Each AI character logs into Foundry as its own user account, tied to its assigned actor. From inside your Foundry session, the AI characters can read and respond to the current game state, send in-character chat messages, execute dice rolls, move tokens on the map, and track their own hit points, spell slots, and equipment.
From the DM's perspective, running a session looks much the same as it would with human players. You narrate the scene, set encounters, and manage the world. The AI party members respond in turn, staying in character, acting on what they know (and only what they know), and making choices consistent with who they are.
- Fully offline - no API keys, no subscriptions, no internet connection needed during play
- Information hiding - characters only know what they should know. When the rogue scouts ahead alone, only the rogue receives those observations. No metagaming.
- Persistent memory - characters remember past sessions and develop over time through a three-tier memory system (working, episodic, reflective)
- Foundry VTT integration - AI characters appear as real players in your Foundry session with full access to game state, dice, chat, and token movement
- Per-character voices - each character speaks with a distinct voice via local text-to-speech
- Push-to-talk voice input - speak to your party and the system transcribes your words locally
- Single GPU footprint - the full stack fits comfortably in 16 GB of VRAM with headroom to spare
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA GPU, 12 GB VRAM | RTX 4080 Super (16 GB) or RTX 4090 |
| CPU | Modern multi-core x86_64 | AMD Ryzen 9 9950X3D or equivalent |
| RAM | 16 GB | 32 GB |
| OS | Windows 10/11 or Linux | Windows 11 |
| CUDA | CUDA 12 + cuDNN 9 | CUDA 12 + cuDNN 9 |
The full stack (LLM, STT, TTS, vector search, orchestrator) uses roughly 8-9 GB of VRAM in the default configuration, leaving ~7 GB of headroom on a 16 GB card.
Install these before setting up r.A.I.d:
1. Python 3.13+
Download from python.org or use your system package manager.
2. Ollama
Install from ollama.com. See the Ollama Setup section below for which models to pull and how to verify everything is working.
3. Foundry VTT (licensed, v12 or v13)
Purchase and install from foundryvtt.com. Install the foundryvtt-rest-api module by ThreeHats from within Foundry's module browser and enable it in your world. See the Foundry VTT Setup section below for module configuration.
4. Piper TTS (for default voice output)
Install from rhasspy/piper. Download voice models for each character you want to voice.
5. Faster-Whisper (for voice input, optional)
Installed automatically as a Python dependency. On Windows, the easiest CUDA setup is via Purfview's whisper-standalone-win which bundles all NVIDIA libraries.
r.A.I.d uses Ollama to run two models locally: one for agent decision-making (the LLM) and one for memory search (the embedding model). Both run on your machine with no cloud calls.
Download and run the installer from ollama.com. On Windows this installs Ollama as a background service that starts automatically. On Linux:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```

After installation, verify Ollama is running:

```shell
curl http://localhost:11434/api/tags
```

You should see a JSON response (possibly with an empty models list if you haven't pulled anything yet). If you get a connection error, start the service:
```shell
# Linux
systemctl start ollama

# Windows -- Ollama runs as a tray application. Launch it from the Start menu.
```

r.A.I.d needs two models. Pull them both before your first session:
```shell
# 1. LLM -- generates dialogue, actions, and decisions for each character
ollama pull llama3.1:8b

# 2. Embedding model -- converts text into vectors for memory search
ollama pull nomic-embed-text
```

What these models do:
| Model | Tag in Ollama | Size on disk | Purpose | Runs on |
|---|---|---|---|---|
| LLaMA 3.1 8B | `llama3.1:8b` | ~4.7 GB | Agent inference. Each character's turn sends a prompt to this model and gets back structured JSON with dialogue and actions. | GPU |
| Nomic Embed Text v1.5 | `nomic-embed-text` | ~275 MB | Memory embeddings. Every observation is converted to a 768-dimensional vector for semantic search. Retrieval queries are also embedded so the system can find relevant past memories. | CPU |
The LLM is the only component that requires a GPU. The embedding model runs on CPU and adds negligible latency (~5-15 ms per embedding).
```shell
ollama list
```

You should see both models in the output:

```
NAME                      ID       SIZE      MODIFIED
llama3.1:8b               <hash>   4.7 GB    ...
nomic-embed-text:latest   <hash>   274 MB    ...
```
Send a quick test prompt to confirm inference works:
```shell
ollama run llama3.1:8b "Say hello in one sentence."
```

You should get a short text response within a few seconds. If this hangs or errors, check that your GPU drivers are up to date and that Ollama detected your GPU (`ollama ps` shows which device is in use).
```shell
curl http://localhost:11434/api/embed -d '{"model": "nomic-embed-text", "input": "test embedding"}'
```

You should get a JSON response containing an `embeddings` array with one entry of 768 floating-point numbers. If you see `{"error":"model not found"}`, re-run `ollama pull nomic-embed-text`.
The default llama3.1:8b is a good balance of quality and speed on 16 GB cards. If you want to experiment:
| Model | Tag | Size | Notes |
|---|---|---|---|
| LLaMA 3.1 8B (default) | `llama3.1:8b` | ~4.7 GB | Recommended. Good instruction following, fits easily in 16 GB VRAM. |
| Mistral 7B | `mistral:7b` | ~4.1 GB | Slightly smaller. Good at structured output. |
| LLaMA 3.1 8B Q4_K_M | `llama3.1:8b-instruct-q4_K_M` | ~4.9 GB | More aggressive quantization tag. Marginally more accurate than the default Q4. |
| Gemma 2 9B | `gemma2:9b` | ~5.4 GB | Larger but still fits 16 GB. Strong at roleplay. |
| LLaMA 3.3 70B Q4 | `llama3.3:70b-instruct-q4_K_M` | ~40 GB | Requires 48 GB+ VRAM (e.g. dual GPUs or an A6000). Much higher quality output. |
To use a different model, pull it with ollama pull <tag> and update the ollama.model field in config.yaml. Do not change the embedding model unless you also update the vector dimension in the database schema (768 is specific to nomic-embed-text-v1.5).
With the default configuration, VRAM usage breaks down as follows:
| Component | Allocation |
|---|---|
| LLM weights (llama3.1:8b) | ~4.7 GB |
| KV cache | ~2.0-2.5 GB |
| CUDA overhead | ~0.5-1.0 GB |
| Total | ~7-8 GB |
| Remaining (on 16 GB card) | ~8-9 GB |
The embedding model and all other components (STT, TTS, orchestrator, SQLite) run on CPU and use zero VRAM.
```shell
# Clone the repository
git clone https://github.com/Heretyc/RAID.git
cd RAID

# Create and activate virtual environment
python -m venv .venv

# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate

# Install runtime dependencies
python -m pip install -r raid/requirements.txt

# Install the sqlite-vec extension for memory search
python -m pip install sqlite-vec
```

For development:
```shell
# Install dev dependencies (testing, formatting, linting)
python -m pip install -r requirements-dev.txt
```

Then verify everything works:

```shell
# Confirm Ollama is reachable
python -c "import httpx; print(httpx.get('http://localhost:11434/api/tags').status_code)"

# Confirm sqlite-vec loads
python -c "import sqlite3, sqlite_vec; c = sqlite3.connect(':memory:'); c.enable_load_extension(True); sqlite_vec.load(c); print('sqlite-vec OK')"

# Run the test suite
pytest raid/tests/
```

r.A.I.d uses a YAML configuration file that lives alongside the code at raid/config.yaml. A sample config ships with the repository. Copy and edit it for your setup.
The ollama section controls which LLM model the agents use and how the orchestrator talks to Ollama:
```yaml
ollama:
  base_url: "http://localhost:11434"   # Ollama API address
  model: "llama3.1:8b"                 # LLM for agent inference
  timeout_seconds: 30                  # HTTP timeout per request
  max_retries: 2                       # Retry count on server errors
```

base_url -- The address where Ollama is listening. Default is http://localhost:11434. Change this only if you run Ollama on a different machine or port.
model -- The Ollama model tag used for all agent LLM calls (dialogue, actions, reflections). This must match a model you have already pulled with ollama pull. The default llama3.1:8b works well on 16 GB cards. See Choosing a different LLM above for alternatives.
The embedding model (nomic-embed-text-v1.5) is not configurable in config.yaml. It is hardcoded because changing it would require updating the vector dimension in the database schema.
```yaml
game:
  max_rounds: 100                      # Safety limit on rounds per session
  reflection_interval: 10              # Rounds between reflective memory generation
  history_window: 10                   # Number of recent events in working memory
  session_name: "goblin_ambush_test"   # Name used for log files
```

reflection_interval -- Every N rounds, each character generates a reflective summary of their recent experiences (e.g. "I'm starting to distrust the merchant"). Lower values produce more frequent character development at the cost of extra LLM calls. The default of 10 is a good balance.
Controls how much context each agent gets per LLM call. These are soft limits measured in estimated tokens (1 token ~ 4 characters):
```yaml
token_budgets:
  system_prompt: 1000          # Personality, backstory, character sheet
  private_knowledge: 500       # Secrets only this character knows
  shared_state: 1000           # Current scene, HP, initiative order
  retrieved_memories: 1000     # RAG results from episodic/reflective memory
  conversation_history: 4000   # Recent visible events
  response_space: 500          # Reserved for model output
```

The total per call is roughly 4,500-8,500 tokens, well within any 8K+ context model.
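The soft-limit idea can be sketched in a few lines using the same "1 token ~ 4 characters" estimate. This is an illustration only; the project's actual budgeting logic lives in raid/context_builder.py and may differ:

```python
# Per-layer budgets mirroring the config above (illustrative copy).
TOKEN_BUDGETS = {
    "system_prompt": 1000,
    "private_knowledge": 500,
    "shared_state": 1000,
    "retrieved_memories": 1000,
    "conversation_history": 4000,
    "response_space": 500,
}

def estimate_tokens(text: str) -> int:
    """Cheap token estimate: roughly one token per four characters."""
    return max(1, len(text) // 4)

def trim_to_budget(text: str, budget: int) -> str:
    """Keep the most recent text that fits within a layer's budget."""
    max_chars = budget * 4
    return text if len(text) <= max_chars else text[-max_chars:]

history = "event " * 4000  # 24,000 chars, ~6,000 estimated tokens
trimmed = trim_to_budget(history, TOKEN_BUDGETS["conversation_history"])
```

Trimming from the front keeps the most recent events, which matters most for conversation history.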
Each agent is defined with a personality, character sheet, and private knowledge:
```yaml
agents:
  - id: "warrior"
    name: "Kael Ironbrand"
    role: "pc"
    personality: |
      You are Kael Ironbrand, a gruff human fighter who speaks bluntly...
    character_sheet:
      class: "Fighter"
      level: 5
      hp: 52
      ac: 18
      abilities: {str: 18, dex: 12, con: 16, int: 8, wis: 13, cha: 10}
      proficiencies: ["athletics", "intimidation", "perception"]
      equipment: ["longsword", "shield", "chain_mail", "javelins"]
    private_knowledge: |
      You secretly carry a letter from your dead brother...
    human_controlled: false

  - id: "dm"
    name: "Dungeon Master"
    role: "dm"
    personality: |
      You are the Dungeon Master for a D&D 5e campaign...
    scenario_seed: |
      The party has arrived at the Crossroads Inn at dusk...
    human_controlled: false
```

Set `human_controlled: true` on any agent to take manual control of that character via the CLI (or use `--human warrior` at launch).
```yaml
scenario:
  starting_phase: "exploration"   # exploration, combat, or social
  starting_scene: |
    The Crossroads Inn. A two-story timber building at the junction of
    the North Road and the Eastern Trail. Rain patters against the windows.
```

| Key | Section | Description | Default |
|---|---|---|---|
| `ollama.base_url` | ollama | Ollama API URL | http://localhost:11434 |
| `ollama.model` | ollama | LLM model tag for agent inference | llama3.1:8b |
| `ollama.timeout_seconds` | ollama | HTTP timeout per Ollama request | 30 |
| `ollama.max_retries` | ollama | Retry count on server errors | 2 |
| `game.max_rounds` | game | Maximum rounds before session ends | 100 |
| `game.reflection_interval` | game | Rounds between reflective memory generation | 10 |
| `game.history_window` | game | Recent events kept in working memory | 10 |
| `game.session_name` | game | Name for log files | default_session |
| `token_budgets.*` | token_budgets | Per-layer context token limits | See above |
| `scenario.starting_phase` | scenario | Initial game phase | exploration |
| `scenario.starting_scene` | scenario | Opening scene description | (empty) |
Make sure Ollama is running and both models are pulled (see Ollama Setup).
```shell
# Standard launch (package mode)
python -m raid --config raid/config.yaml

# Alternative: run the entry-point module directly
python -m raid.run --config raid/config.yaml

# Alternative: run the script directly
python raid/run.py --config raid/config.yaml

# Dry-run mode (no Ollama needed, uses placeholder responses)
python -m raid --config raid/config.yaml --dry-run

# Override a character as human-controlled
python -m raid --config raid/config.yaml --human warrior

# Debug logging (writes to ~/raid-debug.log)
python -m raid --config raid/config.yaml --debug

# Set log level without writing to file
python -m raid --config raid/config.yaml --log-level DEBUG
```

All three invocation methods (`python -m raid`, `python -m raid.run`, `python raid/run.py`) are equivalent and accept the same CLI flags.
- r.A.I.d loads the config and connects to Ollama to verify the configured model is available.
- It connects to Foundry VTT via the REST API and reads the current world state (if a Foundry bridge is configured; otherwise stubs are used).
- Each configured AI character authenticates as its own Foundry user account.
- The orchestrator enters the game loop, processing one character at a time in turn order.
The DM sets the scene in Foundry and starts combat. On each AI character's turn:
- The orchestrator reads the current game state from Foundry (initiative order, positions, HP, conditions).
- It assembles the character's context: system prompt, private knowledge, shared game state, relevant memories from the vector database, and recent conversation history.
- The LLM generates a structured response:
```json
{
  "dialogue": "Behind me, Elara! I will hold the line!",
  "action": "attack",
  "target": "goblin_warleader",
  "weapon": "warhammer",
  "movement": "move 15 feet to flank the goblin near Elara"
}
```

- The orchestrator executes the action in Foundry: posts the dialogue as an in-character chat message, triggers the attack roll, and updates the token position.
- The character's observations from this turn are written to their episodic memory store.
```shell
pytest raid/tests/   # Run tests
mypy raid/           # Type check
```

r.A.I.d includes a built-in tool that generates a fully randomised adventuring party, places them in a tavern with an NPC, and optionally pushes everything into Foundry VTT -- all in one command.
- Player characters with random race (9 options), class (12 options), ability scores (4d6 drop lowest), equipment, proficiencies, spell slots, personality, backstory, and a private secret
- One tavern NPC (innkeeper, barmaid, or mysterious stranger) with a name, description, and hidden secret for the DM to use
- A tavern scene with atmosphere text and token positions
- A complete config.yaml ready to run with `run.py`
```shell
# Generate 3 PCs at level 5 (writes raid/generated_config.yaml)
python raid/populate.py --num-pcs 3 --level 5

# Generate 4 PCs at level 3 with a fixed seed for reproducibility
python raid/populate.py --num-pcs 4 --level 3 --seed 42

# Preview without writing any files
python raid/populate.py --num-pcs 3 --dry-run

# Use your existing config.yaml for Ollama/Foundry settings
python raid/populate.py --num-pcs 3 --config raid/config.yaml

# Write to a custom output path
python raid/populate.py --num-pcs 3 --output raid/my_session.yaml
```

After generating:
```shell
python raid/run.py --config raid/generated_config.yaml

# Dry-run to test without Ollama:
python raid/run.py --config raid/generated_config.yaml --dry-run

# Take manual control of one character:
python raid/run.py --config raid/generated_config.yaml --human monk
```

Example output:

```
============================================================
                      The Red Rooster
============================================================
[monk] Elric Blackwood
    Human Monk (Level 5)
    HP: 28  AC: 14  Background: Acolyte
    STR 18  DEX 19  CON 11  INT 13  WIS 10  CHA 11
[rogue] Lirael Galanodel
    Elf Rogue (Level 5)
    HP: 28  AC: 14  Background: Sage
    STR 10  DEX 17  CON 11  INT 14  WIS 13  CHA 11
[sorcerer] Eryn Siannodel
    Elf Sorcerer (Level 5)
    HP: 27  AC: 13  Background: Folk Hero
    STR 14  DEX 16  CON 12  INT 6   WIS 9   CHA 16
[NPC] Lurg Goresmasher (innkeeper)
    Lurg Goresmasher, a matronly half-orc who treats every patron
    like family

Scene: The Red Rooster. A cozy two-storey timber building on a
well-traveled road. A fire crackles in the hearth...
============================================================
```
If you have Foundry VTT running with the foundryvtt-rest-api module installed, the generator can create actors, a scene, and tokens directly in your world:
```shell
python raid/populate.py --num-pcs 3 --config raid/config.yaml --foundry
```

This requires an uncommented foundry section in your base config (see Foundry VTT Setup). The tool will:
- Connect to Foundry via the WebSocket relay
- Create an Actor document for each PC (type `character`) and the NPC (type `npc`) with full D&D 5e system data (abilities, HP, AC, movement, biography)
- Create a tavern Scene (20x15 grid, 2000x1500 px)
- Place tokens for every character on the scene at pre-set tavern positions (PCs around tables, NPC behind the bar)
- Activate the scene so it appears for all connected players
- Write a config.yaml with all Foundry UUID mappings filled in (`agent_actor_map`, `agent_token_map`, `scene_id`)
After population, launch the session and every AI character's chat messages, dice rolls, and actions will appear in Foundry:
```shell
python raid/run.py --config raid/generated_config.yaml
```

Full CLI reference for the generator:

```
python raid/populate.py [options]

Options:
  --config PATH    Base config for Ollama/Foundry settings (optional)
  --output PATH    Output config path (default: raid/generated_config.yaml)
  --num-pcs N      Number of PCs to generate (default: 3)
  --level N        Character level, 1-10 (default: 5)
  --seed N         Random seed for reproducibility
  --foundry        Push actors, scene, and tokens to Foundry VTT
  --dry-run        Print generated scenario as JSON, write nothing
  --debug          Enable debug logging
```
| Races | Classes |
|---|---|
| Human, Elf, Dwarf, Halfling, Half-Elf | Fighter, Rogue, Wizard, Cleric, Ranger |
| Tiefling, Half-Orc, Gnome, Dragonborn | Bard, Paladin, Barbarian, Warlock, Sorcerer, Druid, Monk |
Each race has correct ability bonuses, size, speed, darkvision, and languages. Each class has correct hit dice, primary ability, starting equipment, AC calculation, proficiencies, and spell slots (for levels 1-10).
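The "4d6 drop lowest" roll used for ability scores is easy to reproduce. This sketch is illustrative only (function names are assumptions; the actual generator lives in raid/generate.py):

```python
import random

ABILITIES = ("str", "dex", "con", "int", "wis", "cha")

def roll_ability(rng: random.Random) -> int:
    """Roll 4d6 and sum the highest three dice (drop the lowest)."""
    dice = sorted(rng.randint(1, 6) for _ in range(4))
    return sum(dice[1:])  # dice[0] is the lowest roll

def roll_ability_block(seed=None) -> dict:
    """Roll a full six-score block; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return {ability: roll_ability(rng) for ability in ABILITIES}

scores = roll_ability_block(seed=42)  # mirrors the --seed flag's intent
```

Seeding the generator is what makes `--seed 42` produce the same party every time.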
r.A.I.d integrates with Foundry VTT through the foundryvtt-rest-api module. RAID runs its own WebSocket server internally -- no external relay or middleware is needed.
```
  Foundry VTT                       RAID
┌─────────────┐    WebSocket   ┌──────────────┐
│ foundryvtt- │ ──────────────>│ FoundryRelay │
│  rest-api   │    connects    │ (ws server)  │
│   module    │<───────────────│              │
└─────────────┘    responses   └──────┬───────┘
                                      │
                               ┌──────┴───────┐
                               │ FoundryGame  │
                               │    Bridge    │
                               │ (chat, dice, │
                               │   actors,    │
                               │   tokens)    │
                               └──────────────┘
```
The Foundry module connects outbound to RAID's WebSocket server. RAID sends requests (create actors, roll dice, post chat) and receives responses through this single connection.
In Foundry VTT, go to Settings > Manage Modules > Install Module and search for foundryvtt-rest-api by ThreeHats. Install and enable it in your world.
In the module settings inside Foundry:
- Set the WebSocket URL to `ws://localhost:3015` (or whatever host:port RAID will listen on)
- Set an API key -- this can be any string, but it must match what you put in RAID's config
- Note the Client ID (auto-generated, you don't need to change it)
Uncomment and fill in the foundry section of your config.yaml:
```yaml
foundry:
  api_key: "your-api-key"        # Must match the module setting
  ws_host: "localhost"           # Host for RAID's WebSocket server
  ws_port: 3015                  # Port the Foundry module connects to
  connection_timeout: 30.0       # Seconds to wait for the module
  timeout_seconds: 10
  max_retries: 3
  scene_id: "Scene.your_scene_id"
  agent_actor_map:               # RAID agent ID -> Foundry actor UUID
    warrior: "Actor.actor_id_for_warrior"
    rogue: "Actor.actor_id_for_rogue"
    mage: "Actor.actor_id_for_mage"
    dm: ""                       # DM has no actor
  agent_token_map:               # RAID agent ID -> Foundry token UUID
    warrior: "Scene.scene_id.Token.token_id_for_warrior"
```

You can find actor and token UUIDs in Foundry by right-clicking an actor or token and selecting "Copy UUID".
Start RAID -- it will begin listening for the Foundry module to connect:
```shell
python raid/run.py --config raid/config.yaml
```

Then open your Foundry world in a browser. The module will auto-connect to RAID's WebSocket server. You should see log output confirming the connection.
The easiest way to set up a Foundry session is to let populate.py create everything for you:
```shell
# 1. Make sure your config.yaml has the foundry section uncommented
#    with at least api_key, ws_host, and ws_port filled in

# 2. Generate characters and push to Foundry
python raid/populate.py --num-pcs 3 --config raid/config.yaml --foundry

# 3. Run the session (config has all UUIDs pre-filled)
python raid/run.py --config raid/generated_config.yaml
```

This creates actors, a scene, and tokens automatically. No manual UUID copying required.
By default, run.py uses ResilientFoundryBridge which wraps the real bridge in a try/except. If Foundry is unreachable or the module disconnects mid-session, the game continues using stub fallbacks (local dice rolls, logged chat). Foundry being down never crashes the game loop.
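The fallback pattern can be sketched as a thin wrapper that delegates to the real bridge and swallows connection failures. This is an illustration of the idea only; class and method names below are assumptions, not the project's real interfaces:

```python
import logging
import random

class StubBridge:
    """Local fallback used when Foundry is unreachable."""
    def roll_dice(self, formula: str) -> int:
        return random.randint(1, 20)  # pretend every formula is a d20

class DeadBridge:
    """Stand-in for a real bridge whose connection has dropped."""
    def roll_dice(self, formula: str) -> int:
        raise ConnectionError("foundry module disconnected")

class ResilientBridge:
    """Try the real bridge; on any error, fall back and keep playing."""
    def __init__(self, real, fallback) -> None:
        self._real = real
        self._fallback = fallback

    def roll_dice(self, formula: str) -> int:
        try:
            return self._real.roll_dice(formula)
        except Exception:
            logging.warning("Foundry call failed; using local fallback")
            return self._fallback.roll_dice(formula)

bridge = ResilientBridge(DeadBridge(), StubBridge())
result = bridge.roll_dice("1d20")  # succeeds despite the dead connection
```

Every bridge method gets the same try/fallback treatment, which is why a Foundry outage degrades to local dice and logged chat instead of crashing the loop.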
```
raid/                          # Main application package
    run.py                     # CLI entry point (game sessions)
    populate.py                # CLI entry point (random party generation)
    generate.py                # D&D 5e character/scenario generation engine
    orchestrator.py            # Turn-based agent loop, sequential LLM calls
    context_builder.py         # Per-agent prompt assembly with info hiding
    models.py                  # Pydantic schemas (config, actions, game state)
    interfaces.py              # Protocol ABCs (MemoryStore, GameBridge, Speech)
    ollama_client.py           # Synchronous Ollama HTTP client with retries
    stubs.py                   # Stub implementations for standalone testing
    config.yaml                # Sample session configuration
    requirements.txt           # Runtime Python dependencies
    foundry/                   # Foundry VTT integration
        bridge.py              # GameBridge impl via foundryvtt-rest-api
        relay.py               # Internal WebSocket relay server
    memory/                    # RAG memory subsystem
        store.py               # SqliteMemoryStore (sqlite-vec + FTS5 hybrid)
    tests/                     # pytest-based tests
        conftest.py            # Shared fixtures
        test_models.py         # Model validation tests
        test_context.py        # Information hiding tests
        test_orchestrator.py   # Game loop and phase transition tests
        test_foundry_bridge.py # Foundry bridge and relay tests
logs/                          # Session JSONL logs (created at runtime)
```
One LLM instance (LLaMA 3.1 8B via Ollama) handles all characters sequentially. Each character gets a different prompt built from its own personality, knowledge, memories, and view of the game state. A lightweight Python orchestrator manages turn order and communicates with Foundry VTT. Memory is stored in SQLite with the sqlite-vec extension for semantic search and FTS5 for keyword search, merged via Reciprocal Rank Fusion. Voice I/O runs on CPU to keep all GPU memory available for the language model.
```
+--------------------------+        +--------------------------+
|          Ollama          |        |          Ollama          |
|    llama3.1:8b (GPU)     |        |  nomic-embed-text (CPU)  |
|     Agent inference      |        |    Memory embeddings     |
+----------+---------------+        +----------+---------------+
           |                                   |
           | Sequential calls,                 | Embed on store()
           | one per character turn            | and retrieve()
     +-----+-----+-----+                       |
     |     |     |     |               +-------+--------+
  Warrior Rogue Mage  DM               |                |
     |     |     |     |               v                v
     +-----+-----+-----+        +------+------+  +------+------+
           |                    | sqlite-vec  |  |    FTS5     |
+----------+---------------+    | KNN search  |  | BM25 search |
|   Python Orchestrator    |    +------+------+  +------+------+
|     Context assembly     |           |                |
|      Action parsing      |           +-------+--------+
|     Memory read/write    |                   |
+----------+---------------+           RRF merge (k=60)
           |                                   |
+----------+---------------+         +---------+---------+
|          SQLite          |         |     Retrieved     |
|       chunks table       | <-----> |  memories (top k) |
|  Episodic + reflective   |         +-------------------+
+--------------------------+
           |
+----------+---------------+        +---------------------+
|       Foundry VTT        |        |   Faster-Whisper    |
|  via foundry-rest-api    |        |     (STT, CPU)      |
|  Game state, dice, chat  |        +---------------------+
+--------------------------+        +---------------------+
                                    |   Piper / Kokoro    |
                                    |     (TTS, CPU)      |
                                    +---------------------+
```
Characters only know what they should know. This is enforced at the prompt construction level, not as a post-processing filter. Each character's context is assembled from shared pools (current scene, initiative order, publicly visible events) and private pools (character-specific secrets, whispered conversations, solo scouting observations).
When the rogue scouts ahead alone, the rogue's context includes those observations. The fighter's context does not. This happens because the orchestrator never writes those events to the fighter's memory store or includes them in the fighter's prompt. There is no "filter out what they shouldn't see" step because the information is never there to filter.
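The routing idea can be sketched with audience-scoped events: each event records who perceived it, and an agent's context is built only from events addressed to it. Names here are illustrative, not the project's real API:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    text: str
    visible_to: frozenset  # agent ids that perceived this event

@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def record(self, text: str, visible_to) -> None:
        self.events.append(Event(text, frozenset(visible_to)))

    def context_for(self, agent_id: str) -> list:
        # No "filter out secrets" step exists: the fighter's context
        # simply never contains the rogue's solo observations.
        return [e.text for e in self.events if agent_id in e.visible_to]

log = EventLog()
log.record("The party enters the inn.", {"warrior", "rogue", "mage"})
log.record("You spot a tripwire across the corridor.", {"rogue"})

rogue_ctx = log.context_for("rogue")      # both events
warrior_ctx = log.context_for("warrior")  # only the shared event
```

Making visibility part of the data model, rather than a post-hoc filter, is what makes metagaming structurally impossible.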
Working memory is the last 5-10 exchanges and the current scene, passed directly in the LLM prompt. Rebuilt every turn.
Episodic memory is every observation the character has made, stored as vector embeddings in sqlite-vec. Retrieved by semantic similarity to the current situation using nomic-embed-text-v1.5. Shared events go to all agents; private events go only to the relevant agent.
Reflective memory is generated every ~10 turns as a summary of recent experiences from the character's perspective. The orchestrator feeds the character's recent episodic memories back through the LLM and asks for a 2-4 sentence first-person reflection. These summaries capture character development ("I'm starting to distrust the merchant") and are stored as high-priority retrievable memories.
Memory retrieval uses hybrid search: both vector similarity (sqlite-vec) and keyword matching (FTS5) run independently, then results are merged using Reciprocal Rank Fusion (RRF). This means a relevant memory is found whether the match is semantic ("locked door with magic" finds "arcane runes on the gate") or lexical (exact keyword hit).
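The RRF merge itself is only a few lines. This sketch covers the merge step only (the real hybrid search lives in raid/memory/store.py); k=60 matches the value shown in the architecture diagram:

```python
def rrf_merge(vector_ranked, keyword_ranked, k: int = 60):
    """Merge two ranked id lists with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Memories found by both searches accumulate score from each list,
    # so agreement between semantic and keyword search floats to the top.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m7", "m2", "m9"]  # semantic matches, best first
keyword_hits = ["m2", "m4"]       # exact keyword matches, best first
merged = rrf_merge(vector_hits, keyword_hits)  # m2 wins: found by both
```

Because RRF works on ranks rather than raw scores, the vector distances and BM25 scores never need to be put on a common scale.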
r.A.I.d is in active early development. Contributions are welcome.
- Read AGENT.md. It defines the project's mandatory coding standards, documentation requirements, and architectural rules. It is authoritative.
- All Python code targets 3.13+.
- Every function, method, and class must have a PEP 257 compliant docstring in reStructuredText style. See AGENT.md for the exact format and when full vs. summary docstrings are required.
- All code must include PEP 484 type hints.
- Fork the repository.
- Create a feature branch from `main`.
- Write your code. Write tests. Write docstrings.
- Run the full check suite before submitting:

  ```shell
  mypy raid/
  pytest raid/tests/
  ```

- Open a pull request against `main` with a clear description of what your change does and why.
- Correctness over speed. Don't skip docs or tests to ship faster.
- Security. Never hardcode secrets. Validate all external input. See the Security Rules section in AGENT.md.
- Clarity. If your code needs a wall of comments to explain, it probably needs to be restructured.
- Consistency. Follow the patterns already in the codebase. If you think a pattern should change, open an issue first.
- Code without docstrings or type hints.
- Changes that weaken security without prior discussion.
- Additions of heavy multi-agent frameworks (AutoGen, CrewAI, LangGraph). The custom orchestrator is a deliberate architectural choice.
- Cloud service dependencies. r.A.I.d is offline by design.
This project is licensed under the Apache License 2.0. See LICENSE for the full text.
- Email: gitinquiry@ioc.dev
- Repository: github.com/Heretyc/RAID
- Issues: github.com/Heretyc/RAID/issues