
Token-efficient terminal powerhouse: SoT Method + Multi-Agent + Unrestricted Tools.
**A pragmatic, limitless, multi-provider terminal assistant built for developers who hate bloated frameworks.**
sot-cli is a limitlessly local Python CLI designed to unleash the true reasoning power of modern LLMs on your projects. By combining a novel architectural pattern called the Source of Truth (SoT) Method with aggressive multi-tool batching, it drastically reduces API costs and model iterations while keeping output quality pristine. It acts as a powerful orchestration engine, empowering your AI with local tools and asynchronous sub-agents to solve complex problems seamlessly.
The name sot-cli is a direct nod to the architectural pattern it is built around โ the Source of Truth (SoT) Method โ and is intentionally unique so it does not get lost in the sea of generic AI tooling names.
- ๐ SoT Method: Fresh files from disk every turn. No token bloat, always up-to-date.
- ๐ค Async Multi-Agent: Delegate trial-and-error to cheap sub-agents (empty ctx).
- โก Batch Orchestration: Multi-tools + bash/Python scripts in ONE turn.
- ๐ง Full Tools: 21+ incl. unrestricted shell, regex code search, precise edits, MCP extensible.
- ๐ Multi-Provider: Switch Ollama/LMStudio/OpenRouter/NVIDIA live.
- ๐ฐ Native Prompt Caching: Payload architecture designed for prefix-matching, saving up to 50% API costs on long histories by caching static dialogue and keeping dynamic files at the bottom.
git clone https://github.com/SoftwareLogico/sot-cli.git
cd sot-cli#uv
uv venv <env_name> --python 3.10
source <env_name>/bin/activate
uv pip install -e .
uv run sot-cli
#conda
conda create -n <env_name> python=3.10
conda activate <env_name>
pip install -e .
sot-cli
#venv
python3 -m venv <env_name>
source <env_name>/bin/activate
pip install -e .
sot-clipip install -e .sot-cliFollow the steps the first time, have Fun!!
If you would rather wire things up by hand instead of going through the first-run wizard, you can prepare the config files yourself. After cloning and installing dependencies (see How to Run), follow the steps below.
# Copy the public configuration file
cp sot.example.toml sot.toml
# Copy the private keys file (this file is in .gitignore so your secrets never leak)
cp sot.keys.example.toml sot.keys.tomlEdit sot.keys.toml and fill in the providers you intend to use. Local providers (lmstudio, ollama) usually leave the key empty.
[providers.openrouter]
api_key = "sk-or-v1-your-key-here"
[providers.lmstudio]
# Usually doesn't need an API key for local models
api_key = ""
[providers.ollama]
# Usually doesn't need an API key for local models
api_key = ""
[providers.nvidia]
api_key = "nvapi-your-key-here"Edit sot.toml to set base URLs, models, and per-provider runtime options.
[providers.openrouter]
base_url = "https://openrouter.ai/api/v1"
model = "x-ai/grok-4.1-fast"
temperature = 0.7
max_output_tokens = 8192
[providers.lmstudio]
base_url = "http://localhost:1234/v1"
model = "model_name" # or "" to let the adapter auto-resolve
temperature = 0.5
max_output_tokens = 8192
[providers.ollama]
base_url = "http://localhost:11434/v1"
model = "model_name" # or "" to let the adapter auto-resolve
temperature = 0.5
max_output_tokens = 8192
[providers.nvidia]
base_url = "https://integrate.api.nvidia.com/v1"
model = "qwen/qwen3-coder-480b-a35b-instruct"
temperature = 1
max_output_tokens = 8192#RECOMMENDED Use the default provider set in sot.toml (or pick one from the interactive selector)
sot-cli
# Or override the provider explicitly
sot-cli --provider [ollama/lmstudio/openrouter/nvidia]
# e.g. sot-cli --provider ollama
sot-cli --provider [ollama/lmstudio/openrouter/nvidia] --model modelName
# e.g. sot-cli --provider openrouter --model deepseek/deepseek-v4-pro
# Resume a previous session
sot-cli <session_id>- โ macOS: Fully tested and compatible.
- โ Windows: Fully tested and compatible.
- โ Linux: Fully tested and compatible.
Most AI coding agents fail because they append every file read and every code change directly into the chat history. This leads to massive token bloat and "Lost in the Middle" hallucinations where the AI reads an outdated version of a file from 10 turns ago.
sot-cli fixes this by separating Permanent History from Ephemeral State.
- Permanent History (
chat_history): Only contains dialogue and lightweight tool metadata (e.g.,"read file X -> added to SoT"). - Ephemeral Source of Truth (SoT): This method tracks the latest state of your context files so the model always reads the most up-to-date version, and not 10 different versions of the same file from the chat history. When the model uses a tool to read or edit a file, the SoT updates that file's content. The model can then refer to the SoT for the latest state of any file, without bloating the chat history.
Smart Token Economy (Permanent vs. Ephemeral): You can attach core files (like database schemas or project guidelines) permanently to a session so the AI always knows them. Meanwhile, files the AI reads to fix a specific bug are treated as "ephemeral"โthey stay in the SoT while needed, and can be detached immediately after the bug is fixed to keep your token usage incredibly low.
Result: The model always sees the absolute latest state of your project. Context grows linearly, not exponentially. ๐ Read the full SoT Method explanation here.
Optional benchmark suite for post-launch validation.
- โ agent_test.md: Safe end-to-end benchmark. It validates background worker orchestration, file download and verification, local file create/edit flow, native OS command execution, fallback/retry behavior, and final cleanup/reporting.
โ ๏ธ seppuku_test.md: Intentionally destructive lab benchmark used to demonstrate raw model power without babysitting or guardrails.
We hate "Tool Ping-Pong" (when an AI calls list_dir, waits, calls read_file, waits, calls grep, waits). It burns hundreds of thousands of context tokens.
sot-cli is designed to batch operations. The system prompts drive the model to use run_command for bash one-liners or Python mini-scripts, list_dir for powerful filtered discovery (by name, extension, size, content), and search_code for regex pattern matching with line numbers across source files โ all in a single turn.
Why use 5 sequential tool calls when the model can batch list_dir + search_code + read_many_files in one response?
If you are coming from other trendy AI coding tools, you might be looking for features that we intentionally excluded. Here is why:
It's a gimmick. You don't need a hardcoded framework feature to make an AI read rules. If you have a project guidelines file, just tell the agent: "Read guidelines.md and follow it." The agent will add it to the SoT and obey it. We don't hardcode magic filenames.
A 'Skill' is just a glorified preprompt. We don't bloat the codebase with fake "skills" (e.g., a React Skill, a Docker Skill). Modern LLMs already know React and Docker. If they need to do something specific, they can write a bash or python script via run_command on the fly.
Because it causes lobotomies. Summarizing past turns makes the model forget crucial details. By using the SoT Method, our chat_history only contains metadata and dialogue. It grows so slowly that you will likely finish your task long before hitting the 200k token limit.
This is an autonomous agent, not a basic chatbot. If the model needs a file, it uses a tool to read it. You shouldn't be manually typing commands to manage its context.
sot-cli supports a Boss-Worker delegation model using Just-In-Time (JIT) sub-agents.
If your main SoT is heavily loaded (expensive context), the main agent can use delegate_task to spawn a sub-agent in the background with a clean, empty context.
The sub-agent does the dirty work (trial-and-error shell scripts, complex multi-step execution, compiling), logs everything silently to agent.log, and returns a clean report to the Boss via invisible IPC. For file discovery and code search, the Boss can use list_dir and search_code directly โ cheaper than spawning a sub-agent.
The Boss orchestrates. The Workers execute. Your terminal stays clean.
For full agent/sub-agent command reference (including CLI flags and orchestration tool parameters), see ARCHITECTURE.md.
For the complete and up-to-date tool and parameter reference, see ARCHITECTURE.md.
All runtime settings live in sot.toml under [tools]. In a nutshell:
- Output & detection:
output_limit,binary_check_size,default_command_timeout_seconds. - Streaming visibility:
show_thinking(model reasoning),show_full(tool call arguments in real time). - Loop limits:
max_rounds(boss),delegated_max_rounds(sub-agent),repeat_limit/delegated_repeat_limit(abort on identical consecutive rounds). - Reasoning budget:
reasoning_char_budget(boss),delegated_reasoning_char_budget(sub-agent) โ hard cap on streamed reasoning characters per turn; cuts the stream when exceeded so a model stuck in endless thinking can't hang the loop. Set to0to disable.
For the full reference table with defaults and descriptions, see ARCHITECTURE.md.
You can easily extend sot-cli with external tools using the Model Context Protocol (MCP). Just add them to your sot.toml:
[mcp.servers.test]
command = "python"
args = ["mcps/test.py"]The runtime will automatically start the server and expose its tools to the AI.
This tool is limitless. It is not built for end-users; it is built for power users. It really can do anything you ask as long as is within the capabilities of your system. It does not have a babysitter checking its actions. It will execute what you tell it to execute without hesitation. Use it responsibly.
โญ Star if it saves your API bill! Star Here
- ๐ PRs/issues welcome (see ROADMAP).
- ๐ข Share: "sot-cli: AI agent without token waste #AICoding"
Created by Ramses Mendoza (SoftwareLogico)
I built sot-cli and formalized the Source of Truth (SoT) Method for terminal agents out of frustration with existing tools. Most AI coding assistants on the market are bloated, burn through tokens, and collapse under the weight of their own context windows.
While the concept of maintaining a "state" is common in software engineering, the specific architectural pattern of decoupling a permanent metadata-only history from an ephemeral, fully-rebuilt file blockโand injecting it right before the user promptโis the core innovation of sot-cli.
LinkedIn: https://www.linkedin.com/in/ramsesisaid
This tool was designed for absolute power, raw speed, and extreme token efficiency, since it follows no agenda other than being truly useful. It doesn't babysit you, it doesn't enforce corporate safety rails on your local machine, and it doesn't waste your API credits on unnecessary framework overhead.