Omnidex is a local-first agent runtime for evidence-led self-correcting development loops.
It turns model output into permissioned, evidence-checked work: plan, patch, verify, observe, and continue until the evidence says the task is done.
omni: deterministic local CLI for chat, command execution, research, install/update, and workspace-aware automation.
- Specialist roles handle bounded jobs such as prompt interpretation, planning, shell command selection, summarization, done checks, retrieval, analysis, and verification.
- Model routing is configurable per role, so fast utility models and deeper reasoning models can be swapped independently.
- Skills and tools extend what Omnidex can do while deterministic code owns policy, execution, evidence, and state transitions.
- Evidence ledgers record objectives, commands, rejected commands, observed output, pending work, and final responses.
- Run traces summarize model calls, command counts, rejections, loop exhaustion, and completion-check pressure from existing session events.
- Development loops convert discovered failures into regression targets, make scoped changes, run targeted verification, and continue from concrete observations instead of starting over.
- Pathfinder is the high-reward problem-solving layer for stalled runs: it diagnoses the real blocker, scores candidate strategies, and hands one validated next action back to the normal runtime.
agent-core: API + Postgres queue + worker pipeline for service-backed workflows.
agent-cli: queue/API CLI for enqueueing and inspecting core jobs; helper aliases expose it for advanced workflows.
License: MIT.
- Deterministic control plane: models propose structured outputs; code validates, gates, executes, and records evidence.
- Hot-swappable model roles: each specialist can use the model best suited to its job.
- Minimal context by default: specialists receive the narrow slice of memory, history, and artifacts they need.
- Evidence ledger by default: work is explainable after the fact, including rejected commands and remaining objectives.
- Declarative recipes: repeatable task patterns can define objectives, command classes, and evidence requirements without hardcoding task logic into the command loop.
- Relevance-first retrieval: tags + pgvector similarity (memory_chunks.embedding) before analysis/response.
- Queue-native processing: workers lease steps with FOR UPDATE SKIP LOCKED.
- Cognition routing in-core: fast models handle high-frequency utility steps; reasoning models are used for deeper synthesis.
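To make the retrieval order concrete, here is a minimal Go sketch of tag filtering followed by similarity ranking. It is illustrative only: the Chunk type and rank function are hypothetical, cosine similarity is computed in memory, and the real core performs the similarity step in Postgres with pgvector on memory_chunks.embedding rather than in Go.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a hypothetical in-memory stand-in for a memory_chunks row.
type Chunk struct {
	ID        string
	Tags      []string
	Embedding []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank keeps chunks sharing at least one requested tag, orders them by
// similarity to the query embedding, and returns at most limit IDs.
func rank(chunks []Chunk, tags []string, query []float64, limit int) []string {
	want := make(map[string]bool, len(tags))
	for _, t := range tags {
		want[t] = true
	}
	var kept []Chunk
	for _, c := range chunks {
		for _, t := range c.Tags {
			if want[t] {
				kept = append(kept, c)
				break
			}
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return cosine(kept[i].Embedding, query) > cosine(kept[j].Embedding, query)
	})
	ids := []string{}
	for i := 0; i < len(kept) && i < limit; i++ {
		ids = append(ids, kept[i].ID)
	}
	return ids
}

func main() {
	chunks := []Chunk{
		{ID: "a", Tags: []string{"docker"}, Embedding: []float64{1, 0}},
		{ID: "b", Tags: []string{"docker"}, Embedding: []float64{0, 1}},
		{ID: "c", Tags: []string{"npm"}, Embedding: []float64{1, 0}},
	}
	fmt.Println(rank(chunks, []string{"docker"}, []float64{0.9, 0.1}, 8))
}
```

Filtering by tag before ranking keeps the similarity step small; chunk c is never scored because it shares no tag with the request.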
mkdir demo-calculator && cd demo-calculator
omni chat
Prompt:
Create a small npm frontend project using Stimulus, RecyclrJS, Tailwind CSS, and webpack.
Initialize npm, install dependencies, create a minimal calculator page, wire a Stimulus controller,
use RecyclrJS, run a build or smoke test, and summarize the evidence.
Then export the evidence ledger:
omni ledger export --out evidence-ledger.json
omni run:trace latest
omni bench report
Useful deterministic surfaces:
omni fastpath project.probe
omni index build
omni patch apply --file change.diff --dry-run
omni fingerprint --text "npm error code E404"
omni ollama prewarm --json
See docs/DEVELOPMENT_LOOPS.md, docs/VALIDATED_PLAYBOOKS.md, docs/EVIDENCE_LEDGER.md, docs/RUN_TRACE.md, docs/FAST_PATHS.md, docs/WORKSPACE_INDEX.md, docs/COMMAND_CACHE.md, docs/PATCH_MODE.md, docs/FAILURE_FINGERPRINTS.md, docs/OLLAMA_PREWARM.md, docs/COMMAND_POLICY.md, docs/RECIPES.md, docs/BENCHMARKS.md, docs/ROADMAP.md, and SECURITY.md.
For embedding Omnidex into other apps as a local memory-backed chat/RP/support service, see docs/LOCAL_SERVICE_CHANNELS.md.
Default step pipelines by type:
- assistant: plan -> tooling -> workspace_scan -> tag -> retrieve -> web_search -> analyze -> assist -> verify
- chat: plan -> tooling -> workspace_scan -> tag -> retrieve -> web_search -> analyze -> roleplay -> verify
- story: plan -> tooling -> workspace_scan -> tag -> retrieve -> web_search -> analyze -> narrate -> verify
Worker runtime uses a stage-driven orchestrator with stable per-stage context contracts:
- tooling -> writes tooling
- workspace_scan -> writes workspace
- tag -> writes tags
- retrieve -> writes retrieval
- plan -> writes plan
- web_search -> writes web_search
- analyze -> writes analyzer
- assist|roleplay|narrate -> writes that action key
- verify -> writes verification
This keeps queue/API compatibility while making the execution flow linear and easier to reason about and extend.
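A minimal Go sketch of that stage-driven pattern, assuming each stage is a function that reads earlier outputs from a shared map and has its result stored under a fixed context key. The Stage type, runPipeline, and the sample outputs are hypothetical; internal/worker/RUNTIME.md describes the real contracts.

```go
package main

import "fmt"

// Stage is a hypothetical stage contract: Run reads earlier outputs from
// state, and the pipeline stores its result under Key.
type Stage struct {
	Name string
	Key  string
	Run  func(state map[string]string) (string, error)
}

// runPipeline executes stages strictly in order, writing each result under
// its context key, and stops at the first failing stage.
func runPipeline(stages []Stage, state map[string]string) error {
	for _, s := range stages {
		out, err := s.Run(state)
		if err != nil {
			return fmt.Errorf("stage %s: %w", s.Name, err)
		}
		state[s.Key] = out
	}
	return nil
}

// constant returns a Run func that always yields the given output.
func constant(out string) func(map[string]string) (string, error) {
	return func(map[string]string) (string, error) { return out, nil }
}

func main() {
	stages := []Stage{
		{Name: "plan", Key: "plan", Run: constant("inspect, then answer")},
		{Name: "tooling", Key: "tooling", Run: constant("go, node available")},
		{Name: "workspace_scan", Key: "workspace", Run: constant("12 files")},
		{Name: "assist", Key: "assist", Run: func(state map[string]string) (string, error) {
			// A later stage reads what an earlier stage wrote.
			return "answer based on " + state["workspace"], nil
		}},
		{Name: "verify", Key: "verification", Run: constant("grounded")},
	}
	state := map[string]string{}
	if err := runPipeline(stages, state); err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println(state["assist"])
}
```

Because every stage writes to one known key, adding a stage means adding one entry to the slice rather than threading new parameters through the worker.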
Pathfinder is Omnidex's meta-problem-solving specialist. It is invoked when the normal routine stalls, loops, hits repeated false completion, encounters invalid file assumptions, or needs a better strategy than asking the generic planner for another command.
Pathfinder receives a ProblemCase built from current evidence: the user goal, phase, pending objectives, objective ledger, worksite survey, prep context, codebase route, recent observations, failed/rejected commands, false-done count, loop state, available tools, constraints, and success condition.
It returns a BreakthroughPacket:
- diagnosis and real blocker
- assumptions and evidence used
- at least three candidate strategies with practical scores
- selected strategy and expected progress
- next action such as inspect, research, proof-contract update, patch, verification, external-agent delegation, objective/context adjustment, ask-user, or fail-with-evidence
- forbidden paths and proof needed after the action
Pathfinder cannot mark objectives complete. Its output must pass deterministic packet validation, then any action is dispatched through normal Omnidex policy. Completion still requires machine-checkable evidence from proof commands, artifact validation, scope checks, and objective predicates.
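The deterministic packet check can be sketched in Go as follows. The struct fields, action names, and validatePacket are hypothetical stand-ins for the rules stated above (at least three scored candidates, a selected strategy drawn from them, a recognised next action, proof needed after the action, and no completion claims); the real validator may check more.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy is a hypothetical candidate strategy with a practical score.
type Strategy struct {
	Name  string
	Score float64
}

// BreakthroughPacket mirrors a few of the fields listed above; names are illustrative.
type BreakthroughPacket struct {
	Diagnosis     string
	Candidates    []Strategy
	Selected      string
	NextAction    string
	ProofNeeded   []string
	MarksComplete bool // Pathfinder may never set this.
}

// allowedActions is a hypothetical encoding of the next-action kinds listed above.
var allowedActions = map[string]bool{
	"inspect": true, "research": true, "proof-contract-update": true,
	"patch": true, "verification": true, "external-agent-delegation": true,
	"objective-context-adjustment": true, "ask-user": true, "fail-with-evidence": true,
}

// validatePacket rejects malformed packets before any action is dispatched.
func validatePacket(p BreakthroughPacket) error {
	if p.Diagnosis == "" {
		return errors.New("missing diagnosis")
	}
	if len(p.Candidates) < 3 {
		return errors.New("need at least three candidate strategies")
	}
	found := false
	for _, c := range p.Candidates {
		if c.Name == p.Selected {
			found = true
		}
	}
	if !found {
		return errors.New("selected strategy is not one of the candidates")
	}
	if !allowedActions[p.NextAction] {
		return fmt.Errorf("unsupported next action %q", p.NextAction)
	}
	if len(p.ProofNeeded) == 0 {
		return errors.New("missing proof needed after the action")
	}
	if p.MarksComplete {
		return errors.New("pathfinder cannot mark objectives complete")
	}
	return nil
}

func main() {
	p := BreakthroughPacket{
		Diagnosis: "webpack entry path points at a file that does not exist",
		Candidates: []Strategy{
			{Name: "inspect-config", Score: 0.8},
			{Name: "regenerate-config", Score: 0.5},
			{Name: "ask-user", Score: 0.2},
		},
		Selected:    "inspect-config",
		NextAction:  "inspect",
		ProofNeeded: []string{"npm run build"},
	}
	fmt.Println(validatePacket(p)) // <nil> for a well-formed packet
	p.Candidates = p.Candidates[:2]
	fmt.Println(validatePacket(p))
}
```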
Live progress reports these pipeline phases explicitly:
- planning: plan
- execution: tooling, workspace_scan, tag, retrieve, web_search, analyze, assist|roleplay|narrate
- review: verify
web_search uses fixed provider routes (Yahoo, Google, Reddit-via-Google) and query normalization (spaces/commas -> +, strip non-alphanumeric symbols, collapse duplicate +).
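A minimal Go sketch of that normalization, assuming "non-alphanumeric" means anything that is not a Unicode letter or digit and that other whitespace is treated like a space. Trimming a leading or trailing + is also an assumption, and normalizeQuery is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeQuery turns spaces and commas into '+', drops other symbols,
// and never emits two '+' in a row. Leading/trailing '+' are trimmed
// (an assumption of this sketch).
func normalizeQuery(q string) string {
	var b strings.Builder
	lastPlus := false
	for _, r := range q {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			lastPlus = false
		case unicode.IsSpace(r) || r == ',':
			if !lastPlus {
				b.WriteByte('+')
				lastPlus = true
			}
		}
		// Any other rune (punctuation, symbols, a literal '+') is dropped.
	}
	return strings.Trim(b.String(), "+")
}

func main() {
	fmt.Println(normalizeQuery("npm error: E404!"))
}
```

Collapsing while writing, instead of in a second pass, means a symbol sitting between two spaces (as in "a - b") still yields a single separator.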
plan derives a deterministic JSON execution plan, tooling audits host capability and install hints, workspace_scan snapshots repository context, and verify enforces grounding checks (with optional persistent replan recovery).
Jobs can still pause/continue through feedback, be steered with interrupt, or be reset with replan.
See internal/worker/RUNTIME.md for stage contracts and extension points.
Memory is typed so retrieval can prioritize durable guidance:
- instruction (rules/policies)
- procedural (how-to workflows)
- reference (books/docs/transcripts/subtitles)
- preference (user/project tendencies)
- episodic (interaction history)
After each response, core can infer long-term memories (procedural / instruction / preference) and store or correct near-duplicate inferred entries.
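Typed memory makes priority ordering straightforward. The Go sketch below ranks by type first and similarity second; treating the listed order as a strict priority, and the Memory type itself, are assumptions for illustration rather than the core's actual scoring.

```go
package main

import (
	"fmt"
	"sort"
)

// Memory is a hypothetical retrieved memory with its type and similarity score.
type Memory struct {
	Kind  string
	Text  string
	Score float64
}

// kindRank follows the order the memory types are listed in; using it as a
// strict priority is an assumption of this sketch.
var kindRank = map[string]int{
	"instruction": 0,
	"procedural":  1,
	"reference":   2,
	"preference":  3,
	"episodic":    4,
}

// rankOf returns the priority for kind; unknown kinds sort after all known ones.
func rankOf(kind string) int {
	if r, ok := kindRank[kind]; ok {
		return r
	}
	return len(kindRank)
}

// prioritize orders memories by type priority, then by descending score.
func prioritize(ms []Memory) {
	sort.SliceStable(ms, func(i, j int) bool {
		ri, rj := rankOf(ms[i].Kind), rankOf(ms[j].Kind)
		if ri != rj {
			return ri < rj
		}
		return ms[i].Score > ms[j].Score
	})
}

func main() {
	ms := []Memory{
		{Kind: "episodic", Text: "yesterday's chat about webpack", Score: 0.95},
		{Kind: "instruction", Text: "never push to git", Score: 0.40},
		{Kind: "procedural", Text: "run npm test before done", Score: 0.70},
	}
	prioritize(ms)
	for _, m := range ms {
		fmt.Printf("%-11s %.2f %s\n", m.Kind, m.Score, m.Text)
	}
}
```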
Tables are in migrations/001_init.sql and created automatically on startup when MIGRATE_ON_STARTUP=true.
cd omnidex
cp default.env .env
docker compose up --build
Do not set PATH= in .env for compose; see Docker troubleshooting and the default.env comments.
Core API is exposed on http://localhost:8090.
Postgres stays on an internal Docker network (omnidex-internal) and is not published to the host by default.
If the core container cannot find agent-core on startup, the usual cause is that the host .env sets PATH= (often for Cursor/npm on the host bridge) and Docker Compose injects it into the core container. A host PATH like /home/you/.local/share/mise/shims:/usr/bin:/bin does not include /usr/local/bin, where the image installs agent-core.
Fix:
- Remove PATH= from .env (recommended). Put host-only PATH settings in ~/.config/omni/host-bridge.env instead:
# ~/.config/omni/host-bridge.env — host bridge only, not core .env
OMNI_CURSOR_NODE_BIN=/home/you/.local/share/mise/installs/node/VERSION/bin/node
OMNI_CURSOR_NPM_BIN=/home/you/.local/share/mise/installs/node/VERSION/bin/npm
PATH=/home/you/.local/share/mise/shims:/usr/bin:/bin
- Rebuild and restart after pulling a fix or editing .env:
docker compose up --build -d core
The image runs /usr/local/bin/agent-core directly and docker-compose.yml pins the container PATH; either safeguard alone should keep core starting despite a stray host PATH in .env.
Verify the binary exists:
docker compose run --rm --entrypoint sh core -c 'ls -la /usr/local/bin/agent-core'
If docker commands cannot reach the Docker daemon, the engine is not running or your shell user cannot access the socket.
- Start Docker (Arch Linux example):
sudo systemctl start docker
sudo systemctl enable docker
systemctl is-active docker
- Add your user to the docker group (one-time), then open a new login session (log out/in, or newgrp docker):
sudo usermod -aG docker "$USER"
newgrp docker # or restart the terminal / re-login
groups | grep docker
- Check the socket:
ls -l /var/run/docker.sock
The socket should be owned by root with group docker, and your user should be in the docker group.
- Temporary workaround (not ideal long-term):
sudo docker compose up --build -d core
- Rootless Docker: point the client at the user socket:
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
On macOS/Windows, start Docker Desktop before running docker compose; the Homebrew docker package is client-only.
docker compose up --build -d core
docker compose ps
docker compose logs -f core --tail 50
Or, from an install path, run ./update.sh or omni update (requires a working Docker daemon).
For safer agent work, you can run the core service in Docker and run the CLI from a separate disposable container on the same Docker network. In that setup, Omnidex sees only the files and tools mounted into the CLI container. Your host source tree, host package managers, and host system dependencies are not touched unless you explicitly mount them writable.
The recommended local topology is:
native host Ollama
<- reached through host.docker.internal:11434
Docker core container
<- reached through http://core:8090 on the compose network
disposable CLI/toolbox container
Keep Ollama running natively on the system when possible, especially for GPU access, driver stability, and model cache reuse. The core container should connect outward to that host Ollama instance with OLLAMA_BASE_URL=http://host.docker.internal:11434; Ollama does not need to run inside the Docker network.
Start the core:
docker compose up --build -d core
Then run a CLI/toolbox container on the compose network. The exact network name is usually <repo>_omnidex-internal; from this repo it is commonly omnidex_omnidex-internal:
docker network ls --filter name=omnidex-internal
Example service-backed CLI run:
mkdir -p "$HOME/omnidex-sandbox"
docker run --rm -it \
--network omnidex_omnidex-internal \
-e CORE_URL=http://core:8090 \
-v "$PWD":/src/omnidex:ro \
-v "$HOME/omnidex-sandbox":/workspace \
-w /src/omnidex \
golang:1.24.1-bookworm \
go run ./cmd/cli enqueue --pipeline assistant --workspace on --approval auto --verify auto \
"Inspect /workspace and suggest the next safe development task."
For local deterministic omni work inside the same kind of isolated container:
docker run --rm -it \
--network omnidex_omnidex-internal \
-e CORE_URL=http://core:8090 \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v "$PWD":/src/omnidex:ro \
-v "$HOME/omnidex-sandbox":/workspace \
-w /workspace \
golang:1.24.1-bookworm \
bash -lc 'cd /src/omnidex && go run ./cmd/omni chat'
Use a writable mount only for the project you want Omnidex to edit. Mount the Omnidex source read-only, mount throwaway dependency caches if you want repeatable installs, and avoid mounting /, $HOME, Docker sockets, SSH keys, cloud credentials, or production repositories into the toolbox container.
Core exposes non-agent channel routes for applications that need a local assistant, support bot, roleplay participant, narrator, or instruction-following model with memory:
- POST /v1/channels
- GET /v1/channels
- POST /v1/channels/{id}/messages
- GET /v1/channels/{id}/messages
Channels use configured model/persona/system/context/tags, retrieve channel-scoped memory, call the selected model, store recent messages, and persist conversation turns as memory. They do not run shell commands or agent jobs.
See docs/LOCAL_SERVICE_CHANNELS.md for install steps and JavaScript, Python, Go, support, and roleplay examples.
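A hedged Go client sketch for posting a message to a channel. The JSON body field (content), the channel ID, and the newMessageRequest helper are hypothetical placeholders; check docs/LOCAL_SERVICE_CHANNELS.md for the actual request schema before relying on it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// messageBody is a hypothetical request shape, not the documented schema.
type messageBody struct {
	Content string `json:"content"`
}

// newMessageRequest builds POST /v1/channels/{id}/messages against baseURL.
func newMessageRequest(baseURL, channelID, content string) (*http.Request, error) {
	payload, err := json.Marshal(messageBody{Content: content})
	if err != nil {
		return nil, err
	}
	endpoint := baseURL + "/v1/channels/" + url.PathEscape(channelID) + "/messages"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "support-demo" is a placeholder; use an ID returned by POST /v1/channels.
	req, err := newMessageRequest("http://localhost:8090", "support-demo", "How do I reset my password?")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := &http.Client{Timeout: 90 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("core not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```

Building the request in a separate function keeps the URL and headers testable without a running core.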
Recommended: run Ollama natively on the host and let the Dockerized core connect to it through Docker's host gateway. Keep:
OLLAMA_BASE_URL=http://host.docker.internal:11434
If Ollama runs in another container, set OLLAMA_BASE_URL to that service URL.
Linux note: if jobs fail with connect: connection refused to host.docker.internal:11434, Ollama is usually bound to 127.0.0.1 only. Expose it on all interfaces and restart:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
If you run Ollama as a systemd service, set OLLAMA_HOST=0.0.0.0:11434 in the service environment override, then restart the service.
If chat or research jobs fail with Ollama timeouts, /api/tags errors, or generation provider ollama is unreachable from core:
- Rebuild and restart core:
docker compose up --build -d core
- Check .env: if you still have OLLAMA_BASE_URL=http://172.20.0.1:11434 and that gateway is wrong, either remove it (use the default http://host.docker.internal:11434) or fix the IP. Core also auto-probes fallbacks at startup (host.docker.internal, the Docker bridge gateway, then localhost); check core logs for ollama endpoint resolved.
- Ensure Ollama listens on all interfaces on the host:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
- Verify from inside the core container:
docker compose exec core wget -qO- --timeout=5 http://host.docker.internal:11434/api/tags
If your compose project uses a different container name, substitute it (for example docker exec omni-nxt-core-1 ...).
- Arch Linux + UFW: if probes time out (rather than failing with connection refused) from inside the container while curl http://127.0.0.1:11434/api/tags works on the host, UFW is usually blocking Docker bridge traffic. Allow host ports from Docker networks:
scripts/ufw-docker-host.sh
# or manually:
sudo ufw allow from 172.16.0.0/12 to any port 11434,8091 proto tcp
docker compose up -d core
- Read the startup logs:
  - If all candidates failed, Ollama is not reachable from Docker at all; that is a host networking/binding issue, not the UI.
  - If logs show a successful resolved URL, chat should queue immediately even when Status briefly still shows Ollama as down.
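The startup fallback probe can be approximated in Go: try the configured URL, then each fallback, and keep the first one whose /api/tags answers 200. The candidate list, the 172.17.0.1 bridge address (Docker's usual default, which may differ on your host), the 2-second timeout, and the function names are assumptions for this sketch, not the core's code.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// reachable reports whether GET <base>/api/tags answers 200 OK within 2s.
func reachable(base string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(base + "/api/tags")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// resolveOllama tries the configured URL first, then each fallback in order,
// skipping blanks and duplicates, and returns the first one that passes check.
func resolveOllama(configured string, fallbacks []string, check func(string) bool) (string, bool) {
	seen := map[string]bool{}
	for _, c := range append([]string{configured}, fallbacks...) {
		if c == "" || seen[c] {
			continue
		}
		seen[c] = true
		if check(c) {
			return c, true
		}
	}
	return "", false
}

func main() {
	fallbacks := []string{
		"http://host.docker.internal:11434",
		"http://172.17.0.1:11434", // usual default bridge gateway; yours may differ
		"http://localhost:11434",
	}
	if base, ok := resolveOllama(os.Getenv("OLLAMA_BASE_URL"), fallbacks, reachable); ok {
		fmt.Println("resolved Ollama endpoint:", base)
	} else {
		fmt.Println("all candidates failed")
	}
}
```

Passing the check as a function keeps the ordering logic testable without a live Ollama server.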
The web UI Status panel includes a Research Health card backed by GET /v1/status/research. It checks whether the core can reach Ollama at OLLAMA_BASE_URL, reports configured/missing generation and embedding models, probes configured web search providers, and surfaces Ollama reachability warnings.
If a configured Ollama generation model is missing, Omnidex pulls it through Ollama's /api/pull endpoint and retries the request. First use can take as long as the model download. You can avoid that delay by pre-pulling:
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
Omnidex can drive many sequential specialist calls. A weak or half-configured Ollama setup may look fine for one chat request, then fail during long planning, research, verification, or memory-indexing runs. Before using Omnidex for serious agent loops, verify Ollama under sustained load.
Recommended checks:
ollama --version
ollama list
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
omni ollama prewarm --json
For memory retrieval, use an embedding model whose output dimension matches the database schema. The default local vector column is vector(768), so nomic-embed-text is a good Ollama default:
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
On Linux AMD systems, confirm the server is actually using the GPU instead of silently falling back to CPU. Useful tools include:
ollama ps
journalctl -u ollama -f
ls -l /dev/dri /dev/kfd
groups
During a long generation, watch GPU utilization with one of:
amdgpu_top
radeontop
watch -n 1 rocm-smi
The exact AMD path depends on GPU generation, kernel, Mesa, ROCm, and Ollama build. Official Ollama docs currently describe AMD ROCm support on Linux and note additional AMD coverage through Vulkan. On Arch Linux, the most practical paths are usually:
- ollama-rocm when the GPU is supported by ROCm and /dev/kfd access is working.
- a Vulkan-enabled Ollama build/package when ROCm does not fully support the device or does not use the whole GPU.
- CPU fallback only as a last resort for small models or debugging.
For Arch Linux AMD laptops/desktops, including Framework Laptop 16 GPU configurations, check:
pacman -Qs 'ollama|rocm|vulkan|mesa|amdgpu'
vulkaninfo --summary
rocminfo
If Ollama runs as a systemd service, put GPU and networking settings in an override:
sudo systemctl edit ollama
Example override:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=30m"
Environment="OLLAMA_EMBEDDING_MODEL=nomic-embed-text"
Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
journalctl -u ollama -f
If you use a Vulkan-enabled Ollama build, set the Vulkan flag required by that build/package in the same override. Some builds use:
[Service]
Environment="OLLAMA_VULKAN=1"
Do not assume Vulkan or ROCm is active because the model runs. Prove it with utilization during generation and by checking Ollama logs for the selected backend. For Omnidex stress testing, run several long prompts or research jobs back to back and watch for:
- context canceled
- HTTP 500/connection reset errors from Ollama
- model pulls during active jobs
- repeated CPU fallback
- thermal throttling
- VRAM exhaustion or partial offload warnings
Helpful references:
- Ollama GPU docs: https://docs.ollama.com/gpu
- Ollama Linux docs: https://docs.ollama.com/linux
- Ollama troubleshooting: https://docs.ollama.com/troubleshooting
- Arch package notes for Ollama/ROCm/Vulkan from your distribution packages
Omnidex supports these model sources for generation:
| Provider | LLM_PROVIDER values | Required credential | Default model |
|---|---|---|---|
| Ollama | ollama, local | none | llama3.2 in core, CLI defaults vary by role |
| OpenAI (OpenAI-compatible API) | openai, chatgpt, chat-gpt | OPENAI_API_KEY | gpt-4.1-mini |
| Microsoft Azure AI / Azure OpenAI | azure, azureai, azure-openai, microsoft, windows-ai | AZURE_AI_API_KEY or AZURE_OPENAI_API_KEY | deployment/model from AZURE_AI_MODEL or AZURE_OPENAI_DEPLOYMENT |
| xAI Grok | xai, x-ai, grok, grock | XAI_API_KEY or GROK_API_KEY | grok-4.3 |
| Google Gemini | google, gemini, googleai, google-ai | GOOGLE_API_KEY or GEMINI_API_KEY | gemini-2.0-flash |
| Anthropic Claude | anthropic, claude | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 |
| Hugging Face Inference Providers | huggingface, hugging-face, hf | HUGGINGFACE_API_KEY or HF_TOKEN | openai/gpt-oss-20b:fastest |
Generation and embeddings are routed separately. LLM_PROVIDER controls chat, planning, summarization, specialists, and response generation. EMBEDDING_PROVIDER controls memory vectors and retrieval embeddings. Anthropic and xAI/Grok are generation-only in Omnidex right now; use Ollama, OpenAI, Google, or Hugging Face for embeddings.
To run with OpenAI instead of Ollama:
- LLM_PROVIDER=openai
- OPENAI_API_KEY=...
- optional OPENAI_MODEL=gpt-4.1-mini
- optional OPENAI_EMBEDDING_MODEL=text-embedding-3-small
To run with Microsoft Azure AI / Azure OpenAI:
- LLM_PROVIDER=azure, LLM_PROVIDER=azure-openai, LLM_PROVIDER=microsoft, or LLM_PROVIDER=windows-ai
- AZURE_AI_API_KEY=... or AZURE_OPENAI_API_KEY=...
- for the current Azure OpenAI v1-compatible API, set AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com, AZURE_AI_API_STYLE=v1, and AZURE_OPENAI_DEPLOYMENT=<chat-deployment>
- for older Azure OpenAI deployment routes, set AZURE_AI_API_STYLE=azure_openai, AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com, and AZURE_OPENAI_DEPLOYMENT=<chat-deployment>
- optional AZURE_OPENAI_EMBEDDING_DEPLOYMENT=<embedding-deployment> when using Azure for memory vectors
- optional AZURE_AI_API_VERSION=2024-10-21 for deployment routes
- for Azure AI Foundry model inference, set AZURE_AI_BASE_URL=https://<resource>.services.ai.azure.com, AZURE_AI_API_STYLE=foundry, and AZURE_AI_MODEL=<model-or-deployment>
To run with xAI Grok:
- LLM_PROVIDER=xai, LLM_PROVIDER=grok, or LLM_PROVIDER=grock
- XAI_API_KEY=... or GROK_API_KEY=...
- optional XAI_BASE_URL=https://api.x.ai/v1
- optional XAI_MODEL=grok-4.3
- keep EMBEDDING_PROVIDER=ollama|openai|google|huggingface, because Grok generation uses xAI's OpenAI-compatible chat-completions API while Omnidex memory vectors need a configured embedding provider.
To run with Google Gemini:
- LLM_PROVIDER=google or LLM_PROVIDER=gemini
- GOOGLE_API_KEY=... or GEMINI_API_KEY=...
- optional GOOGLE_MODEL=gemini-2.0-flash
- optional GOOGLE_EMBEDDING_MODEL=text-embedding-004
To run with Anthropic Claude:
- LLM_PROVIDER=anthropic or LLM_PROVIDER=claude
- ANTHROPIC_API_KEY=...
- optional ANTHROPIC_MODEL=claude-sonnet-4-20250514
- keep EMBEDDING_PROVIDER=ollama|openai|google|huggingface, because Anthropic does not provide a native embeddings API.
To run with Hugging Face Inference Providers:
- LLM_PROVIDER=huggingface or LLM_PROVIDER=hf
- HUGGINGFACE_API_KEY=... or HF_TOKEN=...
- optional HUGGINGFACE_MODEL=openai/gpt-oss-20b:fastest
- optional HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
EMBEDDING_PROVIDER can be set independently from LLM_PROVIDER when you want one provider for generation and another provider for memory vectors. This is required for Anthropic and useful when you want stable vector(768) memory dimensions while testing different generation models.
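The provider-selection rules above can be sketched in Go. The alias map copies the LLM_PROVIDER values from the provider table; the fallback of Anthropic and xAI to ollama follows the documented ANTHROPIC_EMBEDDING_PROVIDER and XAI_EMBEDDING_PROVIDER defaults, while reusing the generation provider for every other case and matching aliases case-insensitively are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// providerAliases copies the LLM_PROVIDER values from the provider table.
var providerAliases = map[string]string{
	"ollama": "ollama", "local": "ollama",
	"openai": "openai", "chatgpt": "openai", "chat-gpt": "openai",
	"azure": "azure", "azureai": "azure", "azure-openai": "azure", "microsoft": "azure", "windows-ai": "azure",
	"xai": "xai", "x-ai": "xai", "grok": "xai", "grock": "xai",
	"google": "google", "gemini": "google", "googleai": "google", "google-ai": "google",
	"anthropic": "anthropic", "claude": "anthropic",
	"huggingface": "huggingface", "hugging-face": "huggingface", "hf": "huggingface",
}

// generationOnly lists providers Omnidex cannot use for embeddings.
var generationOnly = map[string]bool{"anthropic": true, "xai": true}

// canonical maps an alias to its provider (case-insensitive: an assumption).
func canonical(p string) (string, error) {
	c, ok := providerAliases[strings.ToLower(strings.TrimSpace(p))]
	if !ok {
		return "", fmt.Errorf("unknown provider %q", p)
	}
	return c, nil
}

// embeddingProvider: an explicit EMBEDDING_PROVIDER wins; generation-only
// providers fall back to ollama; others reuse the generation provider.
func embeddingProvider(llm, embedding string) (string, error) {
	gen, err := canonical(llm)
	if err != nil {
		return "", err
	}
	if embedding != "" {
		e, err := canonical(embedding)
		if err != nil {
			return "", err
		}
		if generationOnly[e] {
			return "", fmt.Errorf("%s cannot provide embeddings", e)
		}
		return e, nil
	}
	if generationOnly[gen] {
		return "ollama", nil
	}
	return gen, nil
}

func main() {
	for _, pair := range [][2]string{{"claude", ""}, {"gemini", ""}, {"openai", "ollama"}} {
		p, err := embeddingProvider(pair[0], pair[1])
		fmt.Println(pair[0], "->", p, err)
	}
}
```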
Common setups:
# Fully local generation and embeddings.
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=qwen2.5-coder:7b
EMBEDDING_PROVIDER=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# OpenAI for generation, local Ollama embeddings for stable memory dimensions.
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4.1-mini
OPENAI_MODEL_REASONING=gpt-4.1
OPENAI_MODEL_PLANNER=gpt-4.1
EMBEDDING_PROVIDER=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Grok for generation, local Ollama embeddings.
LLM_PROVIDER=xai
XAI_API_KEY=xai-...
XAI_MODEL=grok-4.3
XAI_MODEL_FAST=grok-4.3
EMBEDDING_PROVIDER=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Claude for generation, OpenAI embeddings.
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
ANTHROPIC_MODEL_FAST=claude-3-5-haiku-latest
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Gemini for generation and embeddings.
LLM_PROVIDER=gemini
GEMINI_API_KEY=...
GEMINI_MODEL=gemini-2.0-flash
GEMINI_MODEL_REASONING=gemini-2.5-pro
EMBEDDING_PROVIDER=google
GEMINI_EMBEDDING_MODEL=text-embedding-004

# Hugging Face Inference Providers.
LLM_PROVIDER=hf
HF_TOKEN=hf_...
HF_MODEL=openai/gpt-oss-20b:fastest
HF_MODEL_FAST=meta-llama/Llama-3.1-8B-Instruct:fireworks-ai
EMBEDDING_PROVIDER=huggingface
HF_EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
Omnidex can delegate implementation work to Cursor or Codex as bounded external coding agents. These agents are not the planner, validator, or completion authority. Omnidex still owns workspace survey, memory lookup, context compaction, objective evidence, proof commands, artifact validation, scope validation, and final completion decisions.
The external-agent flow is:
Omnidex prepares a mission packet
-> Cursor/Codex receives exact task, edit surface, read-only context, forbidden actions, and proof contract
-> Cursor/Codex edits files as an implementation worker
-> Omnidex streams agent events into the timeline
-> Omnidex runs local proof commands and artifact checks
-> Omnidex accepts, rejects, or repairs from deterministic evidence
External agent output is recorded as implementation evidence only. A streamed completed event means the worker claims its implementation is ready for validation; it does not mean the Omnidex objective is done.
External agent delegation is opt-in. If no agent is explicitly enabled and selected, Omnidex uses its local coding system and deterministic handlers.
Cursor delegation requires both selection and enablement:
OMNI_ARCHITECT_AGENT=cursor
OMNI_ENABLE_CURSOR_ARCHITECT=true
CURSOR_API_KEY=...
OMNI_CURSOR_MODEL=composer-2
OMNI_CURSOR_TIMEOUT=90m
OMNI_CURSOR_INSTALL_TIMEOUT=10m
OMNI_CURSOR_SDK_RUNNER_DIR=
OMNI_CURSOR_NODE_BIN=node
OMNI_CURSOR_NPM_BIN=npm
OMNI_DISABLE_CURSOR_ARCHITECT=false
Codex delegation also requires both selection and enablement:
OMNI_ARCHITECT_AGENT=codex
OMNI_ENABLE_CODEX_ARCHITECT=true
CODEX_API_KEY=...
# OPENAI_API_KEY is also accepted when CODEX_API_KEY is unset.
OMNI_CODEX_MODEL=gpt-5.3-codex
OMNI_CODEX_TIMEOUT=90m
OMNI_CODEX_INSTALL_TIMEOUT=10m
OMNI_CODEX_SDK_RUNNER_DIR=
OMNI_CODEX_NODE_BIN=node
OMNI_CODEX_NPM_BIN=npm
OMNI_CODEX_BIN=codex
OMNI_DISABLE_CODEX_ARCHITECT=false
Set OMNI_ARCHITECT_AGENT=none or leave it unset to force local Omnidex execution even when SDK credentials are present.
Mission packets are compact by design. They include:
- task, mode, workspace, and target_root
- detected worksite state, package manager, and frameworks
- exact edit surface and read-only context files
- requested objectives
- proof commands, artifact checks, and evidence predicates
- forbidden actions such as sibling project creation, unrequested dependencies, backend/routing additions, test weakening, and completion claims
- prepared context from route planning, documentation briefs, and relevant memories
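A hypothetical Go shape for a mission packet, plus the kind of deterministic scope check Omnidex applies afterwards. Only task, mode, workspace, and target_root come from the list above; every other field name, the sample values, and outOfScope are illustrative, and the sketch treats the edit surface as an exact file list rather than globs or directories.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// MissionPacket is an illustrative shape, not the real wire format.
type MissionPacket struct {
	Task             string   `json:"task"`
	Mode             string   `json:"mode"`
	Workspace        string   `json:"workspace"`
	TargetRoot       string   `json:"target_root"`
	PackageManager   string   `json:"package_manager,omitempty"`
	Frameworks       []string `json:"frameworks,omitempty"`
	EditSurface      []string `json:"edit_surface"`
	ReadOnlyContext  []string `json:"read_only_context,omitempty"`
	Objectives       []string `json:"objectives"`
	ProofCommands    []string `json:"proof_commands"`
	ForbiddenActions []string `json:"forbidden_actions"`
}

// outOfScope returns changed paths that are not in the packet's edit surface.
func outOfScope(p MissionPacket, changed []string) []string {
	allowed := make(map[string]bool, len(p.EditSurface))
	for _, f := range p.EditSurface {
		allowed[path.Clean(f)] = true
	}
	var extra []string
	for _, f := range changed {
		if !allowed[path.Clean(f)] {
			extra = append(extra, f)
		}
	}
	return extra
}

func main() {
	p := MissionPacket{
		Task:             "Wire a Stimulus calculator controller",
		Mode:             "implement",
		Workspace:        "/workspace",
		TargetRoot:       "/workspace/demo-calculator",
		PackageManager:   "npm",
		EditSurface:      []string{"src/index.js", "src/controllers/calculator_controller.js"},
		Objectives:       []string{"calculator page builds"},
		ProofCommands:    []string{"npm run build"},
		ForbiddenActions: []string{"create sibling projects", "install unrequested dependencies", "weaken tests", "claim completion"},
	}
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(out))
	fmt.Println("out of scope:", outOfScope(p, []string{"src/index.js", "server.js"}))
}
```

Checking the worker's changed files against the packet, instead of trusting its completion event, is what keeps the external agent an implementation worker rather than a completion authority.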
Human correction is treated as higher-authority current-run context. The safe baseline behavior is cancel and restart: Omnidex cancels the active external session, runs cleanup, revises the mission packet with the correction, refreshes the allowed/forbidden scope, and starts a new external session. Same-session interrupt/resume can be added per adapter when the underlying SDK supports it reliably.
Operational rules:
- Do not expose external-agent execution in untrusted/public environments.
- External agents must not push git, create sibling projects, or install unrequested dependencies.
- Shell/process cleanup matters after cancel, failure, or completion.
- Local proof gates remain mandatory: build/test/smoke commands, artifact validation, scope/dependency checks, and objective evidence predicates.
By default compose mounts your parent directory read-only into /workspace and the core scans from there.
Set HOST_WORKSPACE_PATH to control what gets mounted.
Environment variables:
- WEB_SEARCH_ENABLED=true|false
- WEB_SEARCH_PROVIDERS=duckduckgo,google,reddit
- WEB_SEARCH_TIMEOUT=15s
- WEB_SEARCH_PER_SOURCE_BUDGET=3000
- WEB_SEARCH_TOTAL_BUDGET=6000
- WORKSPACE_SCAN_ENABLED=true|false
- WORKSPACE_ROOT=/workspace
- WORKSPACE_MAX_FILES=5000
- WORKSPACE_CONTEXT_BUDGET=6000
Role model variables are prefixed with the selected generation provider. For example, when LLM_PROVIDER=openai, MODEL_PLANNER is read from OPENAI_MODEL_PLANNER; when LLM_PROVIDER=google, it is read from GOOGLE_MODEL_PLANNER or GEMINI_MODEL_PLANNER; when LLM_PROVIDER=ollama, it is read from OLLAMA_MODEL_PLANNER or OMNI_PLANNER_MODEL.
Routing fallback order:
- `*_MODEL` is the default generation model.
- `*_MODEL_FAST` defaults to `*_MODEL`.
- `*_MODEL_REASONING` defaults to `*_MODEL`.
- `*_MODEL_TAGGER` defaults to fast.
- `*_MODEL_SEARCH` defaults to fast.
- `*_MODEL_MEMORY` defaults to fast.
- `*_MODEL_ANALYZER`, `*_MODEL_PLANNER`, and `*_MODEL_RESPONDER` default to reasoning.
- specialist models default to the closest role model unless explicitly configured.
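The fallback order reads naturally as a chain walked until a configured value is found. In this Go sketch, resolveModel checks each provider prefix in order (for example GOOGLE before GEMINI; that precedence is an assumption), and it leaves out specialist roles and aliases such as OMNI_PLANNER_MODEL. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parent maps each role suffix to the role it falls back to ("" is *_MODEL).
var parent = map[string]string{
	"FAST": "", "REASONING": "",
	"TAGGER": "FAST", "SEARCH": "FAST", "MEMORY": "FAST",
	"ANALYZER": "REASONING", "PLANNER": "REASONING", "RESPONDER": "REASONING",
}

// lookup returns the first non-empty PREFIX_MODEL[_ROLE] value across prefixes.
func lookup(env map[string]string, prefixes []string, role string) string {
	for _, p := range prefixes {
		key := p + "_MODEL"
		if role != "" {
			key += "_" + role
		}
		if v := strings.TrimSpace(env[key]); v != "" {
			return v
		}
	}
	return ""
}

// resolveModel walks the fallback chain for role until a configured value is found.
func resolveModel(env map[string]string, prefixes []string, role string) string {
	for {
		if v := lookup(env, prefixes, role); v != "" {
			return v
		}
		if role == "" {
			return "" // nothing configured at all
		}
		role = parent[role] // unknown roles map to "" and fall back to *_MODEL
	}
}

func main() {
	env := map[string]string{
		"OLLAMA_MODEL":           "qwen2.5-coder:7b",
		"OLLAMA_MODEL_FAST":      "qwen2.5:7b",
		"OLLAMA_MODEL_REASONING": "qwen2.5:14b",
	}
	for _, role := range []string{"TAGGER", "PLANNER", "RESPONDER"} {
		fmt.Println(role, "=>", resolveModel(env, []string{"OLLAMA"}, role))
	}
}
```

Usage: with only OLLAMA_MODEL, OLLAMA_MODEL_FAST, and OLLAMA_MODEL_REASONING set, the tagger inherits the fast model and the planner and responder inherit the reasoning model.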
Role-specific model variables let you tune cost, speed, and quality without changing code:
# Example: cheap fast model, stronger planner, strong shell/code specialist.
OLLAMA_MODEL=qwen2.5-coder:7b
OLLAMA_MODEL_FAST=qwen2.5:7b
OLLAMA_MODEL_REASONING=qwen2.5:14b
OLLAMA_MODEL_PLANNER=qwen2.5-coder:14b
OLLAMA_MODEL_EVALUATOR=qwen2.5:7b
OLLAMA_MODEL_SPECIALIST_SHELL_EXECUTION=qwen2.5-coder:14b
OLLAMA_MODEL_SPECIALIST_WEB_RESEARCH=qwen2.5:7b
OLLAMA_MODEL_SPECIALIST_MEMORY_RETRIEVAL=qwen2.5:7b
The same suffixes work for hosted providers:
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4.1-mini
OPENAI_MODEL_FAST=gpt-4.1-mini
OPENAI_MODEL_REASONING=gpt-4.1
OPENAI_MODEL_PLANNER=gpt-4.1
OPENAI_MODEL_SPECIALIST_SHELL_EXECUTION=gpt-4.1
OPENAI_MODEL_SPECIALIST_WEB_RESEARCH=gpt-4.1-mini
Supported environment variables:
- `LLM_PROVIDER=ollama|openai|azure|xai|google|anthropic|huggingface`
- `EMBEDDING_PROVIDER=ollama|openai|azure|google|huggingface`
- `OPENAI_API_KEY` (required when `LLM_PROVIDER=openai`)
- `OPENAI_BASE_URL` (default `https://api.openai.com/v1`)
- `OPENAI_MODEL` (default fallback when the provider is OpenAI)
- `OPENAI_MODEL_FAST`
- `OPENAI_MODEL_REASONING`
- `OPENAI_MODEL_TAGGER`
- `OPENAI_MODEL_PLANNER`
- `OPENAI_MODEL_ANALYZER`
- `OPENAI_MODEL_RESPONDER`
- `OPENAI_MODEL_SEARCH`
- `OPENAI_MODEL_MEMORY`
- `OPENAI_EMBEDDING_MODEL`
- `AZURE_AI_API_KEY` / `AZURE_OPENAI_API_KEY` (required when `LLM_PROVIDER=azure`)
- `AZURE_AI_BASE_URL` / `AZURE_OPENAI_ENDPOINT` (required when `LLM_PROVIDER=azure`)
- `AZURE_AI_API_VERSION` / `AZURE_OPENAI_API_VERSION` (defaults to `2024-10-21` for Azure OpenAI deployment routes and `2024-05-01-preview` for Foundry)
- `AZURE_AI_API_STYLE` / `AZURE_OPENAI_API_STYLE` (`v1`, `azure_openai`, or `foundry`; defaults to `foundry` for `*.services.ai.azure.com`, to `v1` when the base URL contains `/openai/v1`, otherwise to Azure OpenAI deployment routes)
- `AZURE_AI_MODEL` / `AZURE_OPENAI_DEPLOYMENT`
- `AZURE_AI_MODEL_FAST`, `AZURE_AI_MODEL_REASONING`, `AZURE_AI_MODEL_TAGGER`, `AZURE_AI_MODEL_PLANNER`, `AZURE_AI_MODEL_ANALYZER`, `AZURE_AI_MODEL_RESPONDER`, `AZURE_AI_MODEL_SEARCH`, `AZURE_AI_MODEL_MEMORY`
- `AZURE_AI_EMBEDDING_MODEL` / `AZURE_OPENAI_EMBEDDING_DEPLOYMENT`
- `XAI_API_KEY` / `GROK_API_KEY` (required when `LLM_PROVIDER=xai|grok|grock`)
- `XAI_BASE_URL` / `GROK_BASE_URL` (default `https://api.x.ai/v1`)
- `XAI_MODEL` / `GROK_MODEL`
- `XAI_MODEL_FAST`, `XAI_MODEL_REASONING`, `XAI_MODEL_TAGGER`, `XAI_MODEL_PLANNER`, `XAI_MODEL_ANALYZER`, `XAI_MODEL_RESPONDER`, `XAI_MODEL_SEARCH`, `XAI_MODEL_MEMORY`
- `XAI_EMBEDDING_PROVIDER` / `GROK_EMBEDDING_PROVIDER` (default `ollama` when `LLM_PROVIDER=xai|grok|grock`)
- `GOOGLE_API_KEY` / `GEMINI_API_KEY` (required when `LLM_PROVIDER=google`)
- `GOOGLE_BASE_URL` (default `https://generativelanguage.googleapis.com/v1beta`)
- `GOOGLE_MODEL` / `GEMINI_MODEL`
- `GOOGLE_MODEL_FAST`, `GOOGLE_MODEL_REASONING`, `GOOGLE_MODEL_TAGGER`, `GOOGLE_MODEL_PLANNER`, `GOOGLE_MODEL_ANALYZER`, `GOOGLE_MODEL_RESPONDER`, `GOOGLE_MODEL_SEARCH`, `GOOGLE_MODEL_MEMORY`
- `GOOGLE_EMBEDDING_MODEL` / `GEMINI_EMBEDDING_MODEL`
- `ANTHROPIC_API_KEY` (required when `LLM_PROVIDER=anthropic`)
- `ANTHROPIC_BASE_URL` (default `https://api.anthropic.com/v1`)
- `ANTHROPIC_VERSION` (default `2023-06-01`)
- `ANTHROPIC_MAX_TOKENS` (default `4096`)
- `ANTHROPIC_MODEL` / `CLAUDE_MODEL`
- `ANTHROPIC_MODEL_FAST`, `ANTHROPIC_MODEL_REASONING`, `ANTHROPIC_MODEL_TAGGER`, `ANTHROPIC_MODEL_PLANNER`, `ANTHROPIC_MODEL_ANALYZER`, `ANTHROPIC_MODEL_RESPONDER`, `ANTHROPIC_MODEL_SEARCH`, `ANTHROPIC_MODEL_MEMORY`
- `ANTHROPIC_EMBEDDING_PROVIDER` (default `ollama` when `LLM_PROVIDER=anthropic`)
- `HUGGINGFACE_API_KEY` / `HF_TOKEN` (required when `LLM_PROVIDER=huggingface`)
- `HUGGINGFACE_BASE_URL` (default `https://router.huggingface.co`)
- `HUGGINGFACE_MODEL` / `HF_MODEL`
- `HUGGINGFACE_MODEL_FAST`, `HUGGINGFACE_MODEL_REASONING`, `HUGGINGFACE_MODEL_TAGGER`, `HUGGINGFACE_MODEL_PLANNER`, `HUGGINGFACE_MODEL_ANALYZER`, `HUGGINGFACE_MODEL_RESPONDER`, `HUGGINGFACE_MODEL_SEARCH`, `HUGGINGFACE_MODEL_MEMORY`
- `HUGGINGFACE_EMBEDDING_MODEL` / `HF_EMBEDDING_MODEL`
- `OLLAMA_MODEL` / `OMNI_MODEL` / `OMNI_CONVERSATION_MODEL` (default conversation fallback; CLI default `qwen2.5-coder:7b`)
- `OLLAMA_MODEL_FAST`
- `OLLAMA_MODEL_REASONING`
- `OLLAMA_MODEL_TAGGER`
- `OLLAMA_MODEL_ANALYZER`
- `OLLAMA_MODEL_RESPONDER`
- `OLLAMA_MODEL_SEARCH`
- `OLLAMA_MODEL_MEMORY`
- `OLLAMA_MODEL_PLANNER` / `OMNI_PLANNER_MODEL` (structured command planner; CLI default `qwen2.5-coder:14b`)
- `OLLAMA_MODEL_EVALUATOR` / `OMNI_EVALUATOR_MODEL` (structured response self-evaluator; CLI default `qwen2.5:7b`)
- `OLLAMA_MODEL_SPECIALIST_SHELL_EXECUTION` / `OMNI_SHELL_SPECIALIST_MODEL` (shell command specialist; CLI default `qwen2.5-coder:7b`)
- `OLLAMA_MODEL_SPECIALIST_PLANNER`
- `OLLAMA_MODEL_SPECIALIST_TOOLING`
- `OLLAMA_MODEL_SPECIALIST_FILESYSTEM_RESEARCH`
- `OLLAMA_MODEL_SPECIALIST_INTENT_TAGGING`
- `OLLAMA_MODEL_SPECIALIST_MEMORY_RETRIEVAL`
- `OLLAMA_MODEL_SPECIALIST_WEB_RESEARCH`
- `OLLAMA_MODEL_SPECIALIST_ANALYSIS`
- `OLLAMA_MODEL_SPECIALIST_RESPONSE`
- `OLLAMA_MODEL_SPECIALIST_REVIEW_VERIFICATION`
- `OLLAMA_MODEL_SPECIALIST_MEDIA_CONTROL`
- `OLLAMA_MODEL_SPECIALIST_BROWSER_INSPECTION`
- `OLLAMA_MODEL_SPECIALIST_SCREEN_VISION`
- `OLLAMA_MODEL_SPECIALIST_AUDIO_NOTES`
- `OLLAMA_MODEL_VISION` (used by `screen-read --vision`; default `llava:latest`)
- `OMNI_EVALUATOR_THRESHOLD` (integer 0..100; default `70`)
- `OMNI_PLANNER_NUM_CTX` (default `4096`)
- `OMNI_EVALUATOR_NUM_CTX` (default `2048`)
- `OMNI_DISABLE_EVALUATOR=true` disables the self-evaluator.
- `STOP_ON_SUFFICIENT_CONTEXT=true|false` (skip web search in auto mode when memory context is already sufficient)
- `SUFFICIENT_CONTEXT_CHARS=1400`
- `MEMORY_INFERENCE_ENABLED=true|false`
- `MEMORY_INFERENCE_MAX_ITEMS=3`
- `TOURNAMENT_ENABLED=true|false` (default `true`; hierarchical long-context reduction)
- `TOURNAMENT_CHUNK_CHARS=2200` (leaf chunk size)
- `TOURNAMENT_SUMMARY_CHARS=750` (target output size per tournament summary)
- `TOURNAMENT_MAX_ROUNDS=4` (recursive summarization cap)
- `TOURNAMENT_VERIFY_RELEVANCE=true|false` (second-pass support check on original chunks)
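The Azure style and version defaults above are easy to misread in prose. A minimal Go sketch of the documented precedence follows, assuming a recognised explicit style value always wins and that unrecognised values fall back to URL inference; `resolveAzureAPIStyle` and `defaultAzureAPIVersion` are hypothetical names, not Omnidex's internal API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveAzureAPIStyle applies the documented defaulting order for
// AZURE_AI_API_STYLE / AZURE_OPENAI_API_STYLE. Assumption: a recognised
// explicit value always wins, and anything else falls back to inference.
func resolveAzureAPIStyle(explicit, baseURL string) string {
	switch s := strings.ToLower(strings.TrimSpace(explicit)); s {
	case "v1", "azure_openai", "foundry":
		return s
	}
	host := ""
	if u, err := url.Parse(baseURL); err == nil {
		host = strings.ToLower(u.Hostname())
	}
	switch {
	case strings.HasSuffix(host, ".services.ai.azure.com"):
		return "foundry" // Azure AI Foundry endpoints
	case strings.Contains(baseURL, "/openai/v1"):
		return "v1" // OpenAI-compatible v1 surface
	default:
		return "azure_openai" // classic deployment routes
	}
}

// defaultAzureAPIVersion returns the documented default API version per style.
// No default is documented for "v1", so this sketch returns an empty string.
func defaultAzureAPIVersion(style string) string {
	switch style {
	case "foundry":
		return "2024-05-01-preview"
	case "azure_openai":
		return "2024-10-21"
	default:
		return ""
	}
}

func main() {
	for _, base := range []string{
		"https://my-project.services.ai.azure.com",
		"https://my-resource.openai.azure.com/openai/v1",
		"https://my-resource.openai.azure.com",
	} {
		style := resolveAzureAPIStyle("", base)
		fmt.Printf("%s -> %s (api version %q)\n", base, style, defaultAzureAPIVersion(style))
	}
}
```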
Environment variables:
- `WRAPPER_ONLY=true|false` (default `false`; when `true`, disables DB/worker/queue routes and exposes only stateless wrapper endpoints)
- `WORKER_COUNT=3`
- `WORKER_POLL_INTERVAL=2s`
- `REQUEST_TIMEOUT=90s`
- `RETRIEVAL_LIMIT=8`
- `CONTEXT_CHAR_BUDGET=4000`
- `HALLUCINATION_RETRY_LIMIT=2` (number of verification retries flagged as hallucination before Omnidex attempts an Ollama restart; applies only when the provider is Ollama)
- `OLLAMA_RESTART_COMMAND=` (optional command or `||`-separated fallback chain, e.g. `docker compose restart ollama || systemctl restart ollama`)
- `OLLAMA_RESTART_TIMEOUT=20s` (timeout per restart command)
- `MIGRATE_ON_STARTUP=true|false`
Install host-side dependencies for core + local automations:
cd omnidex
./scripts/setup-host-deps.sh --profile all -y
`--profile local` now includes networking diagnostics tools used by chat automation (for example `ip`/`ifconfig`, `ss`/`netstat`/`lsof`, `dig`/`nslookup`/`host`, `traceroute`, `whois`, `nmap`, and `nmcli` where available).
Include local whisper transcription support (whisper CLI via pip):
./scripts/setup-host-deps.sh --profile all --with-whisper -y
Preview only (no changes):
./scripts/setup-host-deps.sh --dry-run --profile all --with-whisper
macOS uses the same shell script through Homebrew:
brew install git go make curl jq ripgrep node docker docker-compose
./scripts/setup-host-deps.sh --profile core -y
Docker on macOS still requires Docker Desktop or another running Docker engine; the Homebrew `docker` package only installs the client tools. Start Docker Desktop before running compose-backed core workflows.
Windows has a native PowerShell dependency bootstrap for Git, Go, Node, Docker Desktop, jq, ripgrep, ffmpeg, VLC, Tesseract, Python, and optional Whisper:
Set-ExecutionPolicy -Scope Process Bypass
.\scripts\setup-host-deps.ps1 -Profile core -Yes
.\scripts\setup-host-deps.ps1 -Profile all -WithWhisper -Yes
The Windows script prefers winget, then Scoop, then Chocolatey. Local automation support on Windows is partial because Linux desktop tools such as `pactl`, `playerctl`, `iproute`, `nmcli`, and screenshot utilities have no direct Windows equivalents.
Build release archives for macOS and Windows from any host with Go installed:
./scripts/build-release.sh --version v0.2.0 --codename Ivysaur --target darwin/arm64 --target windows/amd64
Default release targets are Linux, macOS, and Windows for amd64 and arm64; outputs are written to `dist/` with `SHA256SUMS`. The current release line is v0.2.0 Ivysaur; the first alpha release was v0.1.0-alpha Bulbasaur. Omnidex uses pride release codenames based on National Dex order; the mature "it got really good" release codename is reserved as Venusaur. See docs/RELEASE_VERSIONING.md.
Install Omnidex into a user-local directory (default: ~/.omnidex), build binaries, install dependencies, and auto-load aliases on shell startup:
./install.sh
The installer places `omni` in `~/.omnidex/bin` and prepends that directory to `PATH` through the managed shell-init block. When you run `omni` from any directory, that directory becomes the active working directory for deterministic file and command work.
Non-interactive install with explicit flags:
./install.sh --prefix ~/.omnidex --deps-profile all --yes
Update an existing Omnidex repo/install to latest and rebuild the core Docker image:
cd ~/.omnidex
./update.sh
From any directory after install, the same managed updater is available through omni:
omni update
To update only the installed source and host binaries (`omni`, `agent-cli`, `agent-core`) without requiring Docker Compose:
omni update --host-only
Optional update flags:
./update.sh --branch main --service core --no-cache
You can run the same workflow via CLI command wrappers:
omni update --branch main --service core --no-cache
acli build --race -v
acli uninstall --yes
acli migrate:fresh --yes
Notes:
- The installer adds a managed shell-init block to existing `~/.bashrc`, `~/.bash_profile`, `~/.profile`, and `~/.zshrc` files (or creates one fallback file if none exist).
- The shell-init block exports `OMNIDEX_DIR`, prepends `~/.omnidex/bin` to `PATH`, and sources `agent_aliases.sh`; this exposes the global `omni` binary plus the `agent-cli` helper aliases.
- `aupdate` runs `~/.omnidex/update.sh` through your loaded aliases.
- `update.sh` expects `.git` in the install path; the installer copies `.git` when installing from a git checkout. It pulls latest refs, refreshes installed script permissions, rebuilds host binaries, and restarts the host bridge user service (`omni-host-bridge`) when installed.
- Skip dependency install with `--skip-deps`.
- Include whisper CLI bootstrap with `--with-whisper`.
Uninstall (remove shell-init integration + install directory):
./uninstall.sh
Optional uninstall flags:
./uninstall.sh --prefix ~/.omnidex --purge-config --yes
Build from source:
cd omnidex
go mod tidy
./scripts/build-core.sh
go build -o bin/omni ./cmd/omni
go build -o bin/agent-cli ./cmd/cli
Run core locally:
# use a host-reachable Postgres instance for local core runs
DATABASE_URL='postgres://agent:agent@localhost:5432/agent?sslmode=disable' \
OLLAMA_BASE_URL='http://localhost:11434' \
./core
If you specifically need to use the compose-managed Postgres from the host, add a local `docker-compose.override.yml` that publishes `5433:5432`, then point `DATABASE_URL` at `localhost:5433`.
Load helper aliases:
source ./agent_aliases.sh
Alias note: `omni` preserves your shell working directory for deterministic local work. `acli` and the `a*` helper aliases also preserve your working directory while targeting the queue/API CLI.
Install dependencies via alias:
asetupdeps --profile all -y
Set core URL (optional; defaults to http://localhost:8090):
asetcore http://localhost:8090
Start deterministic local chat:
omni
# or explicitly:
omni chat
The deterministic CLI stores workspace sessions under `~/.omni/sessions` and run logs under `~/.omni/runs`, and uses the directory where you launched `omni` as the active working directory.
Legacy queue/API chat remains available through acli:
acli chat --session daily-chat
# architect profile (recommended for vague implementation requests):
# acli chat --profile architect --session build-thread
# live stage/event progress is shown by default (disable with --progress=false)
# progress output is rendered as an activity timeline (Inspect/Explore/Run) during each turn
# action confirmation is on by default: chat asks "So you want me to..." before execution (disable with --confirm-actions=false)
# local capability routing is semantic (examples below are illustrative, not exact trigger phrases)
# slash commands inside chat:
# /help, /session, /session <id>, /new, /last, /exit
# while waiting_input: /interrupt <...>, /replan <...>, /cancel [reason]
# local media automation (enabled by default in chat):
# "play the next episode of star trek"
# "what just happened in the show?"
# "what did they just say about warp core?"
# local browser automation (enabled by default in chat):
# "show my open browser tabs"
# "read the javascript console for 5 seconds"
# local screen automation (enabled by default in chat):
# "what's on my screen?"
# "read my screen text"
# local shell automation (enabled by default in chat):
# "create a file named test"
# "rename test to test-2"
# "run `pwd`"
# "run go test ./..."
# "run docker compose up --build -d"
# local shell edit actions now include git diff summaries/snippets when in a git repo
# "walk me through current changes in this repo"
# "where did I leave off in this project?"
# "show changed files in chronological order"
# repo walkthrough can discover/select a nearby repo when you're not inside one
# "what is my ip?"
# "what ports are open?"
# "what ports are open with process names?" # requires sudo permission + sudo auth
# "determine my location based on my connection"
# "am I on VPN right now?"
# "show network tools catalog"
# "install network tools" # runs setup-host-deps local profile if script exists
# "what were we just talking about?" # uses recent same-session conversation context
# host environment discovery (automatic):
# OS, arch, distro, discovered package managers, available tools, and selected installed packages
# capability snapshot is auto-synced to memory (procedural) for reuse in later planning/tooling steps
# quick service status checks:
# omni status
# omni core:status
# omni queue:status
# omni ollama:status
# omni web:status
# service lifecycle controls (compose):
# omni --service core up
# omni --service core build
# omni --service core restart
# omni --service core down
# omni service:core logs --follow
# omni service --service all down
# omni --service core migrate:fresh --yes
# edit runtime config (.env) in vim:
# omni config
# omni config --editor "vim"
# omni config --print
Run a typical end-to-end flow:
- Enqueue a job:
  `aqd "Design a migration plan for auth service split"`
- Grab the latest job id:
  `alast`
- Watch live progress with detailed step/context output:
  `awlatestv` (or `awv <job-id>`)
- If the job asks for clarification/input:
  `afb <job-id> "Use PostgreSQL 16 and keep API surface unchanged."`
- If you want to steer a running step with extra context:
  `aint <job-id> "Prefer minimal diffs and avoid new dependencies."`
- If you need a full replan from the `plan` step:
  `areplan <job-id> "Replan for a phased rollout with rollback checkpoints."`
- If you need to stop execution immediately:
  `acancel <job-id> "Cancel this run"`
- Inspect the final state/result:
  `ashow <job-id>`
- Continue the thread with a follow-up instruction:
  `acont <job-id> "Now draft the implementation tasks for sprint planning."`

| Alias | Expands to |
|---|---|
| `omni ...` | deterministic local Omnidex CLI (`bin/omni` or `go run ./cmd/omni`) |
| `omnidex ...` | same as `omni ...` |
| `acli ...` | queue/API CLI (`agent-cli` or `go run ./cmd/cli`) |
| `asetcore <url>` | `export CORE_URL=<url>` |
| `asetupdeps ...` | `./scripts/setup-host-deps.sh ...` |
| `aq "..."` | `enqueue --pipeline assistant --web auto --workspace auto` |
| `aqf "..."` | `enqueue` assistant + `--reasoning fast` |
| `aqd "..."` | `enqueue` assistant + `--reasoning deep` |
| `aqarch "..."` | `enqueue --profile architect --pipeline assistant ...` |
| `achat "..."` | `enqueue --pipeline chat --web auto --workspace auto` |
| `achatarch ...` | `chat --profile architect ...` |
| `achatrepl ...` | `chat ...` |
| `astro "..."` | `enqueue --pipeline story --web auto --workspace auto` |
| `alist` | `list` |
| `arun` | `list --status running` |
| `awaiting` | `list --status waiting_input` |
| `ashow <id>` | `show <id>` |
| `awatch <id>` | `watch <id>` |
| `awv <id>` | `watch --interval 2s --verbose --max-chars 1600 <id>` |
| `afb <id> "..."` | `feedback <id> "..."` |
| `aint <id> "..."` | `interrupt <id> "..."` |
| `areplan <id> "..."` | `replan <id> "..."` |
| `acont <id> "..."` | `continue <id> "..."` |
| `acancel <id> ["reason"]` | `cancel <id> ["reason"]` |
| `aremember ...` | `remember ...` |
| `aingest ...` | `ingest ...` |
| `amediaindex ...` | `media-index ...` |
| `amediasearch ...` | `media-search ...` |
| `abrowserscan ...` | `browser-scan ...` |
| `ascreenread ...` | `screen-read ...` |
| `aresearch ...` | `research ...` |
| `aperms ...` | `permissions ...` |
| `anotes ...` | `audio-notes ...` |
| `alast` | print latest job id |
| `aslatest` | `show <latest-id>` |
| `awlatest` | `watch <latest-id>` |
| `awlatestv` | verbose `watch <latest-id>` |
Set core URL (optional; default is http://localhost:8090):
export CORE_URL=http://localhost:8090
Queue instructions:
go run ./cmd/cli enqueue --pipeline assistant --web auto --workspace auto --approval auto --verify auto --verify-iterations 2 --session auth-thread "Refactor auth flow and suggest migration plan"
# architect profile for end-to-end implementation pressure:
go run ./cmd/cli enqueue --profile architect --session auth-thread "Implement the requested feature fully, run tests, and summarize verification evidence"
Interactive chat mode:
go run ./cmd/cli chat --session daily-chat --reasoning fast
# architect profile (deep reasoning + workspace on + verify on + approval on + verbose):
# go run ./cmd/cli chat --profile architect --session build-thread
# disable local media automation if needed:
# go run ./cmd/cli chat --local-media=false
# disable local browser automation if needed:
# go run ./cmd/cli chat --local-browser=false
# disable local screen automation if needed:
# go run ./cmd/cli chat --local-screen=false
# disable local shell automation if needed:
# go run ./cmd/cli chat --local-shell=false
# disable local audio-notes automation if needed:
# go run ./cmd/cli chat --local-audio=false
Host discovery metadata is attached automatically to chat and enqueue jobs:
- `host_env_os`, `host_env_arch`, `host_env_kernel`, `host_env_distro`
- `host_env_shell`, `host_env_user`, `host_env_identity`, `host_env_cwd`, `host_env_package_manager`, `host_env_package_managers`
- `host_clock_local`, `host_clock_utc`, `host_clock_tz`, `host_clock_weekday`, `host_clock_epoch`
- `host_tools_available`
- `host_packages_installed` (lightweight curated package probe)
Chat sessions also include short-term recent conversation context (same session_id) in plan/analyze/response prompts so follow-up questions can reference what was just discussed.
Final model responses now include a Sources: section by default, summarizing which context blocks were used (instruction, recent conversation, retrieval, workspace, web search, tooling, and executed tests when applicable).
Time-sensitive instructions (latest, today, as of, current, etc.) are treated as freshness-sensitive:
- web-search auto mode prefers fresh search for those requests
- local clock/date-only questions (e.g., "what time is it") use host clock context without forcing web search
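The freshness rules above amount to a small classification step ahead of web-search auto mode. A rough Go sketch follows, assuming simple keyword matching on the cue words the README lists. `webSearchPlan` and its return labels are hypothetical, and the real router may be semantic rather than regex-based.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// freshnessCue matches the cue phrases the README lists. Real detection may
// use a longer list or a model-based classifier; this is an assumption.
var freshnessCue = regexp.MustCompile(`\b(latest|today|as of|current)\b`)

// clockOnly matches plain date/time questions that host_clock_* can answer.
var clockOnly = regexp.MustCompile(
	`^(what time is it( now)?|what('s| is) the (time|date|day)( today| now)?|what day is (it|today))\??$`)

// webSearchPlan is a hypothetical classification for web-search auto mode.
func webSearchPlan(instruction string) string {
	q := strings.ToLower(strings.TrimSpace(instruction))
	switch {
	case clockOnly.MatchString(q):
		return "host_clock" // answer from local clock context, no web search
	case freshnessCue.MatchString(q):
		return "fresh_search" // prefer a fresh web search
	default:
		return "auto" // normal auto-mode decision
	}
}

func main() {
	for _, q := range []string{
		"What time is it?",
		"What is the latest pgvector release?",
		"Rewrite this paragraph",
	} {
		fmt.Printf("%-40q -> %s\n", q, webSearchPlan(q))
	}
}
```

Checking the clock-only pattern first keeps "what's the date today" on the local clock path even though it contains the cue word "today".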
Chat-mode controls (entered at the prompt):
- `/help`, `/session`, `/session <id>`, `/new`, `/last`, `/exit`
- During waiting input: `/interrupt <...>`, `/replan <...>`, `/cancel [reason]`, or plain feedback text
Local invasive-tool permissions are stored in one registry file (default: ~/.config/omni/permissions.json, with fallback to .omni/permissions.json if needed):
go run ./cmd/cli permissions list
go run ./cmd/cli permissions grant local.shell.exec
go run ./cmd/cli permissions grant local.shell.sudo
go run ./cmd/cli permissions grant local.screen.capture
go run ./cmd/cli permissions deny local.browser.console
go run ./cmd/cli permissions unset local.screen.capture
Force web-search on a job (or turn it off):
go run ./cmd/cli enqueue --pipeline assistant --web on "Find current PostgreSQL 16 pgvector indexing guidance"
go run ./cmd/cli enqueue --pipeline assistant --web off "Rewrite this paragraph"
Control reasoning depth per job:
go run ./cmd/cli enqueue --pipeline assistant --reasoning deep "Design migration strategy with tradeoffs"
go run ./cmd/cli enqueue --pipeline assistant --reasoning fast "Summarize this note in 3 bullets"
Override step models per job when needed:
go run ./cmd/cli enqueue --pipeline assistant --reasoning deep --model-plan llama3.2 --model-analyze llama3.2 --model-response llama3.1:8b "Compare tradeoffs and draft final recommendation"
Queue-level behavior controls via metadata:
- `workspace_scan`: `auto|on|off`
- `allow_missing_tools`: `true|false`
- `approval_mode`: `auto|force|off`
- `verification_mode`: `auto|force|off`
- `verification_iterations`: `1..4`
- `hallucination_retry_limit`: `1..6` (overrides `HALLUCINATION_RETRY_LIMIT` per job)
- `ollama_restart_command`: optional command or `||`-separated fallback chain
- `session_id`: string

Equivalent CLI flags:
- `--workspace auto|on|off`
- `--allow-missing-tools`
- `--approval auto|on|off`
- `--verify auto|on|off`
- `--verify-iterations 1-4`
- `--session <id>`
When --workspace on is used and workspace settings are missing, the job pauses and asks for corrected workspace config or confirmation to continue without scan.
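A client that builds metadata itself (instead of using the CLI flags) may want to validate it before enqueueing. The sketch below checks the documented value sets and ranges, assuming empty or zero values mean "unset"; `JobMetadata` and its Go field names are hypothetical. The metadata keys list `force` where the `--approval` and `--verify` flags list `on`, so the sketch validates the metadata spelling only.

```go
package main

import "fmt"

// JobMetadata holds the queue-level controls documented above. Field names are
// hypothetical Go names; the wire keys are workspace_scan, approval_mode, etc.
// Empty strings and zero ints mean "unset, use the server default" (assumption).
type JobMetadata struct {
	WorkspaceScan           string // workspace_scan: auto|on|off
	ApprovalMode            string // approval_mode: auto|force|off
	VerificationMode        string // verification_mode: auto|force|off
	VerificationIterations  int    // verification_iterations: 1..4
	HallucinationRetryLimit int    // hallucination_retry_limit: 1..6
}

// oneOf accepts an empty (unset) value or one of the allowed strings.
func oneOf(field, value string, allowed ...string) error {
	if value == "" {
		return nil
	}
	for _, a := range allowed {
		if value == a {
			return nil
		}
	}
	return fmt.Errorf("%s: %q is not one of %v", field, value, allowed)
}

// inRange accepts zero (unset) or a value within lo..hi inclusive.
func inRange(field string, value, lo, hi int) error {
	if value == 0 || (value >= lo && value <= hi) {
		return nil
	}
	return fmt.Errorf("%s: %d is outside %d..%d", field, value, lo, hi)
}

// Validate returns the first invalid field, or nil.
func (m JobMetadata) Validate() error {
	for _, err := range []error{
		oneOf("workspace_scan", m.WorkspaceScan, "auto", "on", "off"),
		oneOf("approval_mode", m.ApprovalMode, "auto", "force", "off"),
		oneOf("verification_mode", m.VerificationMode, "auto", "force", "off"),
		inRange("verification_iterations", m.VerificationIterations, 1, 4),
		inRange("hallucination_retry_limit", m.HallucinationRetryLimit, 1, 6),
	} {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	m := JobMetadata{WorkspaceScan: "auto", ApprovalMode: "on", VerificationIterations: 2}
	fmt.Println(m.Validate())
}
```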
List jobs:
go run ./cmd/cli list --status running
Inspect one job:
go run ./cmd/cli show 12
Watch job progress:
go run ./cmd/cli watch --interval 2s 12
# live stage/event progress is on by default (disable with --progress=false)
Watch with detailed step outputs and context updates:
go run ./cmd/cli watch --interval 2s --verbose --max-chars 1600 12
If a job pauses for clarification/tooling input, continue it:
go run ./cmd/cli feedback 12 "Use the /srv/app workspace and proceed without playwright."
Interrupt a running job with extra context:
go run ./cmd/cli interrupt 12 "Prefer TypeScript, and keep changes backward compatible."
If a step is currently running, interrupt preempts it and re-queues that step with the injected context.
Force a full replan from the plan step:
go run ./cmd/cli replan 12 "Replan this for a phased rollout with rollback checkpoints."
Kill switch for an in-flight job:
go run ./cmd/cli cancel 12 "No longer needed"
Continue an existing thread/session with a follow-up instruction:
go run ./cmd/cli continue 12 "Now write implementation tasks for sprint planning."
Approval workflow for risky actions:
go run ./cmd/cli enqueue --pipeline assistant --approval on "Reset production DB and recreate schema"
# when prompted:
go run ./cmd/cli feedback 12 "APPROVE: execute only after backup verification"
Seed memory with tags and kind:
go run ./cmd/cli remember --kind instruction --tags auth,oauth "Always rotate refresh tokens before access token expiry."
Ingest files directly into reference memory (supports .pdf, .docx, .srt, .vtt, and text-like files):
go run ./cmd/cli ingest --kind reference --tags lore,book ./docs/worldbook.pdf
go run ./cmd/cli ingest --kind reference --tags subtitles ./media/episode01.srt
Index an entire media library into memory using subtitle files (episode metadata + timestamped subtitle chunks):
go run ./cmd/cli media-index --root ~/Media/StarTrek --source media --tags tv,subtitles
# preview only:
go run ./cmd/cli media-index --root ~/Media/StarTrek --dry-run
Search subtitle lines directly with surrounding context:
go run ./cmd/cli media-search --root ~/Media/StarTrek --context 2 --limit 20 "engage"
Scan local browser processes and read debuggable tabs:
go run ./cmd/cli browser-scan
go run ./cmd/cli browser-scan --json
Capture live JavaScript console events from debuggable tabs:
go run ./cmd/cli browser-scan --console --seconds 5 --limit 120
Note: tab URL and console capture require a browser that exposes a local DevTools endpoint (for example, Chromium started with `--remote-debugging-port=9222`).
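Chromium's `--remote-debugging-port` exposes an HTTP endpoint, `/json/list`, that returns the open targets as JSON. The minimal Go sketch below lists page targets from `http://localhost:9222`. It illustrates the DevTools endpoint itself, not how `browser-scan` is implemented; console capture would additionally need a WebSocket connection to each target's `webSocketDebuggerUrl`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// devtoolsTarget holds a subset of the fields Chromium returns from /json/list.
type devtoolsTarget struct {
	ID                   string `json:"id"`
	Type                 string `json:"type"`
	Title                string `json:"title"`
	URL                  string `json:"url"`
	WebSocketDebuggerURL string `json:"webSocketDebuggerUrl"`
}

// parseTargets decodes a /json/list response and keeps only page targets (tabs).
func parseTargets(data []byte) ([]devtoolsTarget, error) {
	var all []devtoolsTarget
	if err := json.Unmarshal(data, &all); err != nil {
		return nil, err
	}
	var tabs []devtoolsTarget
	for _, t := range all {
		if t.Type == "page" {
			tabs = append(tabs, t)
		}
	}
	return tabs, nil
}

// listTabs queries a local DevTools HTTP endpoint such as http://localhost:9222.
func listTabs(base string) ([]devtoolsTarget, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(base + "/json/list")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("devtools endpoint returned %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseTargets(data)
}

func main() {
	tabs, err := listTabs("http://localhost:9222")
	if err != nil {
		fmt.Println("no debuggable browser found:", err)
		return
	}
	for _, t := range tabs {
		fmt.Printf("%s\t%s\n", t.Title, t.URL)
	}
}
```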
Read the current screen (OCR text and optional vision summary):
go run ./cmd/cli screen-read --ocr
go run ./cmd/cli screen-read --vision --model llava:latest
go run ./cmd/cli screen-read --ocr --vision --prompt "focus on error messages and active window"
Note: screen capture needs a local screenshot utility (`grim`, `gnome-screenshot`, `maim`, `scrot`, or ImageMagick `import`). OCR needs `tesseract`.
Screen/browser/media invasive actions prompt once for permission and persist decisions in the permissions registry.
Long-running call notes from mic/speaker audio (capture now, stop later, then transcript + memory):
go run ./cmd/cli audio-notes doctor
go run ./cmd/cli audio-notes start --mic --speaker
# ... after your call:
go run ./cmd/cli audio-notes stop --store-memory --tags meeting,notes
go run ./cmd/cli audio-notes search "action items"
This stores timestamped quotes tagged with their source (mic / speaker) under `.omni/audio-notes/<session>/`.
In interactive chat, you can also use natural commands such as "take notes during this call", "stop taking notes", and "notes status" when `--local-audio` is enabled (the default).
Build a long-lived knowledge base for a topic (auto web research + memory ingest + freshness tracking):
go run ./cmd/cli research --tags games,rpg --refresh-days 14 "Cyberpunk 2077"
# force a refresh even if still fresh:
go run ./cmd/cli research --force "Cyberpunk 2077"
This stores chunked research memories with topic tags and writes freshness metadata to `.omni/research-index.json`.
- `GET /healthz`
- `POST /v1/instruct` (stateless prompt wrapper)
- `POST /v1/roleplay` (stateless in-character wrapper)
- `POST /v1/narrate` (stateless narration wrapper)
- `POST /v1/reasoning` (3-stage stateless reasoning chain: parse -> deliberate -> final)
- `POST /v1/jobs`
- `GET /v1/jobs?status=&limit=&offset=`
- `GET /v1/jobs/{id}`
- `POST /v1/jobs/{id}/feedback`
- `POST /v1/jobs/{id}/interrupt`
- `POST /v1/jobs/{id}/replan`
- `POST /v1/jobs/{id}/cancel`
- `POST /v1/memory`
When `WRAPPER_ONLY=true`, only `/healthz`, `/v1/instruct`, `/v1/roleplay`, `/v1/narrate`, and `/v1/reasoning` are registered.
Example stateless wrapper request body:
{
"model": "llama3.2",
"system": "You are the narrator for a grounded fantasy scene.",
"prompt": "Narrate what happens when the ranger opens the vault door.",
"context": {
"setting": "Ancient underground vault",
"characters": ["Ranger", "Scholar"],
"event_history": ["They bypassed the rune lock", "A low hum started in the chamber"]
},
"history": [
{"role": "user", "content": "The ranger checks for traps."},
{"role": "assistant", "content": "She finds a hidden wire and cuts it safely."}
]
}
`/v1/instruct` can optionally bridge into the async job queue when the request includes an `integration` block.
When present, Omnidex will queue a job and return an integration payload (instead of running the stateless LLM wrapper path).
{
"prompt": "Create a migration plan for splitting monolith services",
"integration": {
"action": "enqueue_job",
"pipeline": "assistant",
"metadata": {
"source": "instruct-route",
"web_search": "auto",
"reasoning_level": "deep"
}
}
}
Supported integration actions:
- `enqueue_job` (aliases: `queue_job`, `enqueue_task`, `job`, `task`)
Notes:
- `integration.instruction` can override `prompt`; otherwise `prompt` is used as the queued instruction.
- `integration.pipeline` defaults to `assistant`.
- `integration.metadata` must be a JSON object.
- Queue integration requires DB/worker mode (`WRAPPER_ONLY=false`).
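The integration rules in these notes can be expressed as a single resolution step that turns a `/v1/instruct` body into a queued job. The following is a Go sketch under those rules, assuming an unknown or empty `action` is rejected. The type and function names are hypothetical and do not come from Omnidex's handler code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type integration struct {
	Action      string          `json:"action"`
	Instruction string          `json:"instruction"`
	Pipeline    string          `json:"pipeline"`
	Metadata    json.RawMessage `json:"metadata"`
}

type instructRequest struct {
	Prompt      string       `json:"prompt"`
	Integration *integration `json:"integration"`
}

type queuedJob struct {
	Instruction string
	Pipeline    string
	Metadata    map[string]any
}

// The documented action name plus its aliases.
var enqueueAliases = map[string]bool{
	"enqueue_job": true, "queue_job": true, "enqueue_task": true, "job": true, "task": true,
}

// resolveIntegration returns (nil, nil) when there is no integration block,
// meaning the request stays on the stateless wrapper path.
func resolveIntegration(body []byte) (*queuedJob, error) {
	var req instructRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if req.Integration == nil {
		return nil, nil
	}
	in := req.Integration
	if !enqueueAliases[strings.ToLower(strings.TrimSpace(in.Action))] {
		return nil, fmt.Errorf("unsupported integration action %q", in.Action)
	}
	job := &queuedJob{Instruction: req.Prompt, Pipeline: "assistant", Metadata: map[string]any{}}
	if s := strings.TrimSpace(in.Instruction); s != "" {
		job.Instruction = s // integration.instruction overrides prompt
	}
	if s := strings.TrimSpace(in.Pipeline); s != "" {
		job.Pipeline = s
	}
	if len(in.Metadata) > 0 && string(in.Metadata) != "null" {
		if err := json.Unmarshal(in.Metadata, &job.Metadata); err != nil {
			return nil, fmt.Errorf("integration.metadata must be a JSON object: %w", err)
		}
	}
	return job, nil
}

func main() {
	body := []byte(`{
  "prompt": "Create a migration plan for splitting monolith services",
  "integration": {"action": "enqueue_job", "pipeline": "assistant",
    "metadata": {"source": "instruct-route", "reasoning_level": "deep"}}
}`)
	job, err := resolveIntegration(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *job)
}
```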
{
"instruction": "Create a migration plan for splitting monolith services",
"pipeline": "assistant",
"metadata": {
"source": "cli",
"web_search": "auto",
"search_query": "postgresql 16 pgvector indexing best practices",
"reasoning_level": "deep",
"workspace_scan": "auto",
"allow_missing_tools": false,
"approval_mode": "auto",
"verification_mode": "auto",
"verification_iterations": 2,
"hallucination_retry_limit": 2,
"session_id": "auth-thread",
"model_plan": "llama3.2:latest",
"model_analyze": "llama3.2:latest",
"model_response": "llama3.1:8b"
}
}