
sot-cli 🚀 Limitless Local AI Agent


sot-cli Demo
Token-efficient terminal powerhouse: SoT Method + Multi-Agent + Unrestricted Tools.

**A pragmatic, limitless, multi-provider terminal assistant built for developers who hate bloated frameworks.**

sot-cli is a limitless, local Python CLI designed to unleash the true reasoning power of modern LLMs on your projects. By combining a novel architectural pattern called the Source of Truth (SoT) Method with aggressive multi-tool batching, it drastically reduces API costs and model iterations while keeping output quality pristine. It acts as a powerful orchestration engine, empowering your AI with local tools and asynchronous sub-agents to solve complex problems seamlessly.

The name sot-cli is a direct nod to the architectural pattern it is built around, the Source of Truth (SoT) Method, and is intentionally unique so it does not get lost in the sea of generic AI tooling names.

✨ Key Features

  • 📊 SoT Method: Fresh files from disk every turn. No token bloat, always up to date.
  • 🤖 Async Multi-Agent: Delegate trial-and-error to cheap sub-agents with empty contexts.
  • ⚡ Batch Orchestration: Multiple tools + bash/Python scripts in ONE turn.
  • 🔧 Full Tools: 21+ tools, including an unrestricted shell, regex code search, precise edits, and MCP extensibility.
  • 🌐 Multi-Provider: Switch between Ollama/LMStudio/OpenRouter/NVIDIA live.
  • 💰 Native Prompt Caching: Payload architecture designed for prefix matching, saving up to 50% in API costs on long histories by caching static dialogue and keeping dynamic files at the bottom.
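The prompt-caching idea above can be sketched in a few lines. This is illustrative only: the function and argument names are hypothetical, not sot-cli's actual internals. Prefix-matching caches reuse tokens only up to the first byte that changes, so the static dialogue goes first and the volatile SoT file block goes last.

```python
def build_payload(system_prompt, chat_history, sot_files, user_msg):
    """Assemble messages so the static prefix stays identical across turns."""
    messages = [{"role": "system", "content": system_prompt}]
    messages += chat_history  # static: dialogue and lightweight tool metadata
    sot_block = "\n\n".join(
        f"=== {path} ===\n{content}" for path, content in sot_files.items()
    )
    # Dynamic content last, so every earlier token can still hit the cache.
    messages.append({"role": "user", "content": f"{sot_block}\n\n{user_msg}"})
    return messages

payload = build_payload(
    "You are a coding agent.",
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    {"schema.sql": "CREATE TABLE users (id INT);"},
    "Add an email column.",
)
```

Across turns, only the final message changes; everything before it is byte-identical and therefore cacheable by providers that support prefix matching.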

👉 SoT | Tools | Roadmap

🚀 How to Run

Clone the repo

git clone https://github.com/SoftwareLogico/sot-cli.git
cd sot-cli

(Optional but recommended) Create and activate a virtual environment

# uv
uv venv <env_name> --python 3.10
source <env_name>/bin/activate
uv pip install -e .
uv run sot-cli

# conda
conda create -n <env_name> python=3.10
conda activate <env_name>
pip install -e .
sot-cli

# venv
python3 -m venv <env_name>
source <env_name>/bin/activate
pip install -e .
sot-cli

Install dependencies (if you skipped the virtual environment step above)

pip install -e .

Run sot-cli

sot-cli

Follow the first-run setup steps, and have fun!

🛠 Manual Installation

If you would rather wire things up by hand instead of going through the first-run wizard, you can prepare the config files yourself. After cloning and installing dependencies (see How to Run), follow the steps below.

Copy the TOML templates

# Copy the public configuration file
cp sot.example.toml sot.toml

# Copy the private keys file (this file is in .gitignore so your secrets never leak)
cp sot.keys.example.toml sot.keys.toml

Add API keys

Edit sot.keys.toml and fill in the providers you intend to use. For local providers (lmstudio, ollama) you can usually leave the key empty.

[providers.openrouter]
api_key = "sk-or-v1-your-key-here"

[providers.lmstudio]
# Usually doesn't need an API key for local models
api_key = ""

[providers.ollama]
# Usually doesn't need an API key for local models
api_key = ""

[providers.nvidia]
api_key = "nvapi-your-key-here"

Configure providers

Edit sot.toml to set base URLs, models, and per-provider runtime options.

[providers.openrouter]
base_url = "https://openrouter.ai/api/v1"
model = "x-ai/grok-4.1-fast"
temperature = 0.7
max_output_tokens = 8192

[providers.lmstudio]
base_url = "http://localhost:1234/v1"
model = "model_name" # or "" to let the adapter auto-resolve
temperature = 0.5
max_output_tokens = 8192

[providers.ollama]
base_url = "http://localhost:11434/v1"
model = "model_name" # or "" to let the adapter auto-resolve
temperature = 0.5
max_output_tokens = 8192

[providers.nvidia]
base_url = "https://integrate.api.nvidia.com/v1"
model = "qwen/qwen3-coder-480b-a35b-instruct"
temperature = 1
max_output_tokens = 8192

Run the CLI with the most common parameters

# RECOMMENDED: use the default provider set in sot.toml (or pick one from the interactive selector)
sot-cli

# Or override the provider explicitly
sot-cli --provider [ollama/lmstudio/openrouter/nvidia]
# e.g. sot-cli --provider ollama

sot-cli --provider [ollama/lmstudio/openrouter/nvidia] --model modelName
# e.g. sot-cli --provider openrouter --model deepseek/deepseek-v4-pro

# Resume a previous session
sot-cli <session_id>

Platform Compatibility

  • ✅ macOS: Fully tested and compatible.
  • ✅ Windows: Fully tested and compatible.
  • ✅ Linux: Fully tested and compatible.

🧠 The Core Concept: The SoT Method

Most AI coding agents fail because they append every file read and every code change directly into the chat history. This leads to massive token bloat and "Lost in the Middle" hallucinations where the AI reads an outdated version of a file from 10 turns ago.

sot-cli fixes this by separating Permanent History from Ephemeral State.

  1. Permanent History (chat_history): Only contains dialogue and lightweight tool metadata (e.g., "read file X -> added to SoT").
  2. Ephemeral Source of Truth (SoT): The SoT tracks the latest state of your context files so the model always reads the most up-to-date version, not 10 different versions of the same file scattered through the chat history. When the model uses a tool to read or edit a file, the SoT updates that file's content. The model can then refer to the SoT for the latest state of any file without bloating the chat history.

Smart Token Economy (Permanent vs. Ephemeral): You can attach core files (like database schemas or project guidelines) permanently to a session so the AI always knows them. Meanwhile, files the AI reads to fix a specific bug are treated as "ephemeral": they stay in the SoT while needed, and can be detached immediately after the bug is fixed to keep your token usage incredibly low.

Result: The model always sees the absolute latest state of your project. Context grows linearly, not exponentially. 👉 Read the full SoT Method explanation here.
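The split described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not sot-cli's real code; the class and method names are hypothetical. The history records only lightweight metadata, while attached file contents are re-read from disk every turn, so the model never sees a stale copy.

```python
from pathlib import Path

class SourceOfTruth:
    """Permanent metadata-only history + ephemeral file block rebuilt from disk."""

    def __init__(self):
        self.attached = set()   # paths currently in context
        self.history = []       # permanent: dialogue and tool metadata only

    def attach(self, path):
        self.attached.add(path)
        # Only a lightweight note enters the permanent history, never file content.
        self.history.append(f"read file {path} -> added to SoT")

    def detach(self, path):
        # Drop ephemeral files once the task is done to keep token usage low.
        self.attached.discard(path)

    def render(self):
        # Rebuilt fresh every turn: always the latest on-disk state.
        return "\n\n".join(
            f"=== {p} ===\n{Path(p).read_text()}" for p in sorted(self.attached)
        )
```

Because render() reads from disk each time, an edit made two turns ago is reflected immediately, and detaching a file shrinks the context on the very next turn.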

🧪 Benchmarks

Optional benchmark suite for post-launch validation.

  • ✅ agent_test.md: Safe end-to-end benchmark. It validates background worker orchestration, file download and verification, local file create/edit flow, native OS command execution, fallback/retry behavior, and final cleanup/reporting.
  • ⚠️ seppuku_test.md: Intentionally destructive lab benchmark used to demonstrate raw model power without babysitting or guardrails.

โš ๏ธWARNING: seppuku_test.md is for isolated lab VM use only .โš ๏ธ

💸 Token Economy: Scripts > Tool Ping-Pong

We hate "Tool Ping-Pong" (when an AI calls list_dir, waits, calls read_file, waits, calls grep, waits). It burns hundreds of thousands of context tokens.

sot-cli is designed to batch operations. The system prompts drive the model to use run_command for bash one-liners or Python mini-scripts, list_dir for powerful filtered discovery (by name, extension, size, content), and search_code for regex pattern matching with line numbers across source files, all in a single turn.

Why use 5 sequential tool calls when the model can batch list_dir + search_code + read_many_files in one response?
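What a batched turn looks like on the wire can be sketched as a single assistant message carrying several tool calls. The shape below follows the common OpenAI-style "tool_calls" convention; the exact wire format sot-cli uses may differ, and the argument names here are assumptions based on the tool descriptions above.

```python
import json

# One hypothetical assistant turn batching three tool calls at once.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function", "function": {
            "name": "list_dir",
            "arguments": json.dumps({"path": "src", "extension": ".py"})}},
        {"id": "call_2", "type": "function", "function": {
            "name": "search_code",
            "arguments": json.dumps({"pattern": r"def \w+_handler", "path": "src"})}},
        {"id": "call_3", "type": "function", "function": {
            "name": "read_many_files",
            "arguments": json.dumps({"paths": ["src/app.py", "src/db.py"]})}},
    ],
}

# One model iteration, three tool results -- instead of three full round trips,
# each of which would replay the entire context through the API.
```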

🛑 The Anti-Hype FAQ

If you are coming from other trendy AI coding tools, you might be looking for features that we intentionally excluded. Here is why:

"Where is my CLAUDE.md / rules file?"

It's a gimmick. You don't need a hardcoded framework feature to make an AI read rules. If you have a project guidelines file, just tell the agent: "Read guidelines.md and follow it." The agent will add it to the SoT and obey it. We don't hardcode magic filenames.

"Where are my 'Skills'?"

A 'Skill' is just a glorified preprompt. We don't bloat the codebase with fake "skills" (e.g., a React Skill, a Docker Skill). Modern LLMs already know React and Docker. If they need to do something specific, they can write a bash or Python script via run_command on the fly.

"Why is there no Context Compaction / Summarization?"

Because it causes lobotomies. Summarizing past turns makes the model forget crucial details. By using the SoT Method, our chat_history only contains metadata and dialogue. It grows so slowly that you will likely finish your task long before hitting the 200k token limit.

"Where are the Slash Commands (/clear, /file)?"

This is an autonomous agent, not a basic chatbot. If the model needs a file, it uses a tool to read it. You shouldn't be manually typing commands to manage its context.


🤖 Asynchronous Multi-Agent Orchestration

sot-cli supports a Boss-Worker delegation model using Just-In-Time (JIT) sub-agents.

If your main SoT is heavily loaded (expensive context), the main agent can use delegate_task to spawn a sub-agent in the background with a clean, empty context. The sub-agent does the dirty work (trial-and-error shell scripts, complex multi-step execution, compiling), logs everything silently to agent.log, and returns a clean report to the Boss via invisible IPC. For file discovery and code search, the Boss can use list_dir and search_code directly, which is cheaper than spawning a sub-agent.

The Boss orchestrates. The Workers execute. Your terminal stays clean.

For full agent/sub-agent command reference (including CLI flags and orchestration tool parameters), see ARCHITECTURE.md.
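The Boss/Worker flow can be sketched with asyncio. This is an illustrative toy, not the real delegate_task implementation (see ARCHITECTURE.md for that): workers run concurrently with empty contexts, append their noisy steps to a log (standing in for agent.log), and hand the Boss only a clean one-line report.

```python
import asyncio

async def sub_agent(task: str, log: list) -> str:
    """Worker: empty context, logs noisy trial-and-error, returns a summary."""
    for step in ("attempt 1: failed", "attempt 2: ok"):
        log.append(f"[{task}] {step}")   # would go silently to agent.log
        await asyncio.sleep(0)           # yield so workers interleave
    return f"{task}: done in 2 attempts" # clean report back to the Boss

async def boss():
    log = []
    # Spawn workers in the background; the Boss's own context stays clean.
    reports = await asyncio.gather(
        sub_agent("compile", log),
        sub_agent("run tests", log),
    )
    return reports, log

reports, log = asyncio.run(boss())
```

Only the short reports would re-enter the Boss's expensive context; the four noisy log lines stay out of it entirely.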


🧰 Available Tools

For the complete and up-to-date tool and parameter reference, see ARCHITECTURE.md.

โš™๏ธ Runtime Configuration

All runtime settings live in sot.toml under [tools]. In a nutshell:

  • Output & detection: output_limit, binary_check_size, default_command_timeout_seconds.
  • Streaming visibility: show_thinking (model reasoning), show_full (tool call arguments in real time).
  • Loop limits: max_rounds (boss), delegated_max_rounds (sub-agent), repeat_limit / delegated_repeat_limit (abort on identical consecutive rounds).
  • Reasoning budget: reasoning_char_budget (boss), delegated_reasoning_char_budget (sub-agent): a hard cap on streamed reasoning characters per turn; the stream is cut when exceeded so a model stuck in endless thinking can't hang the loop. Set to 0 to disable.

For the full reference table with defaults and descriptions, see ARCHITECTURE.md.
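Put together, a [tools] section might look like the fragment below. The option names match the ones listed above, but every value here is an illustrative guess, not a documented default; check ARCHITECTURE.md for the real defaults.

```toml
[tools]
output_limit = 30000                     # max characters of tool output kept per call
binary_check_size = 1024                 # bytes sniffed to detect binary files
default_command_timeout_seconds = 60
show_thinking = true                     # stream model reasoning
show_full = false                        # stream tool call arguments in real time
max_rounds = 30                          # boss loop limit
delegated_max_rounds = 20                # sub-agent loop limit
repeat_limit = 3                         # abort on identical consecutive rounds
delegated_repeat_limit = 3
reasoning_char_budget = 8000             # 0 disables the cap
delegated_reasoning_char_budget = 4000
```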

🔌 MCP Servers

You can easily extend sot-cli with external tools using the Model Context Protocol (MCP). Just add them to your sot.toml:

[mcp.servers.test]
command = "python"
args = ["mcps/test.py"]

The runtime will automatically start the server and expose its tools to the AI.


โš ๏ธ WARNING: No Guardrails. No Policies.

This tool is limitless. It is not built for end-users; it is built for power users. It really can do anything you ask, as long as it is within the capabilities of your system. It does not have a babysitter checking its actions. It will execute what you tell it to execute without hesitation. Use it responsibly.

🌟 Star & Contribute

โญ Star if it saves your API bill! Star Here

  • ๐Ÿ› PRs/issues welcome (see ROADMAP).
  • ๐Ÿ“ข Share: "sot-cli: AI agent without token waste #AICoding"

👤 Author & Credits

Created by Ramses Mendoza (SoftwareLogico)

I built sot-cli and formalized the Source of Truth (SoT) Method for terminal agents out of frustration with existing tools. Most AI coding assistants on the market are bloated, burn through tokens, and collapse under the weight of their own context windows.

While the concept of maintaining a "state" is common in software engineering, the specific architectural pattern of decoupling a permanent metadata-only history from an ephemeral, fully-rebuilt file block, and injecting that block right before the user prompt, is the core innovation of sot-cli.

LinkedIn: https://www.linkedin.com/in/ramsesisaid

This tool was designed for absolute power, raw speed, and extreme token efficiency, since it follows no agenda other than being truly useful. It doesn't babysit you, it doesn't enforce corporate safety rails on your local machine, and it doesn't waste your API credits on unnecessary framework overhead.
