Skip to content

V2.0.1 Release

Choose a tag to compare

@th3sanjai th3sanjai released this 02 Jun 16:12
· 8 commits to main since this release
90cec8f

RAI v2.0.1 Release Notes

What's New

Token Cost Reduction (60–80% on long sessions)

RAI now runs a 3-layer compression pipeline before every model call, keeping costs flat as sessions grow longer:

  1. History trim — clips conversation to a token budget before the model sees it
  2. Tool result compression — old bash/grep/file outputs are truncated; recent results stay verbatim
  3. Summarization — only fires when the first two layers aren't enough

Combined with a configurable cheap model for summarization, a typical VAPT or SAST session now costs significantly less than before.

Configure a cheaper summarization model:

rai agents config-set --compact-model "litellm:openai/bedrock-claude-haiku-4.5"

# With explicit credentials if different from main
rai agents config-set \
  --compact-model "litellm:openai/bedrock-claude-haiku-4.5 \
  --compact-api-key "sk-..." \
  --compact-base-url "https://llmproxy.example.com"

# Clear it (inherit main model)
rai agents config-set --compact-model ""

Also configurable via RAI_COMPACT_MODEL env var.


Loop Detection

The agent no longer gets stuck re-executing the same command. When identical tool calls are detected, RAI returns the cached result with a warning instead of executing again:

⚠ DUPLICATE CALL BLOCKED: 'bash' was already executed 5 times and returned:
0 matches for shell_exec|proc_open|...
This result is final. Accept it and proceed to the next step.

Configurable: RAI_LOOP_WINDOW=10 (default 10 recent calls tracked).


Plan Mode Improvements

  • Plan completion no longer repeats the full step history in the tool result — agent reads its own notes directly
  • list_plan_steps returns compact summaries for completed steps, full detail for pending ones
  • Memory phase at plan exit now explicitly guides the agent to write target-specific methodology to scope='target'

TUI Improvements

  • Thread resume now shows the most recent messages instead of the oldest
  • Internal control messages no longer appear as user messages in history

Model Call Diagnostics

Enable detailed per-call logging to verify token consumption:

RAI_DEBUG_LOG_CALLS=1 rai chat
# Logs to ~/.rai/debug/model-calls.jsonl

Each entry shows: message count, total chars, estimated tokens, truncated count, message type breakdown.


Bug Fixes

  • pip install revolt-rai now works — previous wheel was empty due to packaging misconfiguration
  • Session approval (approve_for_session) now persists correctly across tool calls
  • Server shuts down cleanly when TUI exits or crashes
  • Thread resume loads with the correct agent graph when multiple agents are registered
  • Audit log no longer blocks the event loop on busy servers
  • content_file in memory_write now restricted to safe paths only