Troubleshooting
Common issues and solutions for TITAN. If your problem isn't covered here, open an issue.
Error:
Port 48420 is already in use. Is TITAN already running?
Cause: Another process (or a previous TITAN instance) is already listening on the gateway port.
Fix:
# Find what's using the port
lsof -i :48420
# Kill the process
kill <PID>
# Or start TITAN on a different port
titan gateway --port 48421

Symptom: All API requests return 401 Unauthorized.
Cause: auth.mode is set to "token" but no token is configured. TITAN logs:
Auth mode is "token" but no token configured — denying request.
Set gateway.auth.token or switch to mode "password".
Fix: Either set a token or switch auth mode:
// titan.json
{
"gateway": {
"auth": {
"mode": "none"
}
}
}

| Auth Mode | When to Use |
|---|---|
| `"none"` | Local development, trusted networks |
| `"token"` | API access with a static bearer token |
| `"password"` | Browser-based access with login page |
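The token-mode failure described above can be caught before the gateway starts. The following is a minimal Python sketch, not TITAN's own validation: the function name `check_gateway_auth` is hypothetical, and it assumes the `gateway.auth.mode`, `gateway.auth.token`, and `gateway.auth.password` keys named in this section. The default mode is not documented here, so a missing mode is only reported.

```python
VALID_MODES = {"none", "token", "password"}


def check_gateway_auth(config: dict) -> list[str]:
    """Return human-readable problems found under gateway.auth (hypothetical helper)."""
    auth = config.get("gateway", {}).get("auth", {})
    mode = auth.get("mode")
    problems = []
    if mode is None:
        # The default mode is not stated in this guide, so just flag it.
        problems.append("gateway.auth.mode is not set")
    elif mode not in VALID_MODES:
        problems.append(f"unknown auth mode: {mode!r}")
    elif mode == "token" and not auth.get("token"):
        problems.append('mode is "token" but gateway.auth.token is empty')
    elif mode == "password" and not auth.get("password"):
        problems.append('mode is "password" but gateway.auth.password is empty')
    return problems
```

Pass it the parsed contents of titan.json; an empty list means none of these checks found a problem.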
| Limit | Value | Error |
|---|---|---|
| Max body size | 1 MB | 413 Payload too large (max 1MB) |
| Max concurrent requests | 5 | 503 Server busy — too many concurrent requests |
| Rate limit exceeded | varies | 429 Too many requests (check Retry-After header) |
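A client can turn the limits in this table into a simple retry decision. The sketch below is an assumption about reasonable client behaviour, not something TITAN prescribes: `retry_delay_seconds` is a hypothetical name, the one-second default is arbitrary, and the header is looked up under the exact key `Retry-After`.

```python
def retry_delay_seconds(status: int, headers: dict, default: float = 1.0):
    """Return seconds to wait before retrying, or None if a retry will not help."""
    if status == 429:
        value = headers.get("Retry-After", "")
        # Retry-After may also be an HTTP date; this sketch handles whole seconds only.
        return float(value) if value.strip().isdigit() else default
    if status == 503:
        # Server busy: back off briefly, then try again.
        return default
    # e.g. 413: the payload has to shrink, waiting does not help.
    return None
```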
Error:
fetch failed: connect ECONNREFUSED 127.0.0.1:11434
Cause: Ollama is not running or is on a different address.
Fix:
# Start Ollama
ollama serve
# Verify it's running
curl http://localhost:11434/api/tags
# If Ollama is on a different host, configure the URL:
# titan.json
{
"providers": {
"ollama": {
"baseUrl": "http://192.168.1.11:11434"
}
}
}

The URL resolution order is:
- `config.providers.ollama.baseUrl` in titan.json
- `OLLAMA_BASE_URL` environment variable
- `http://localhost:11434` (default)
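To see which URL would win on a given machine, the resolution order above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: `resolve_ollama_url` is a hypothetical helper, and treating an empty string as "unset" is a guess about TITAN's behaviour.

```python
import os

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def resolve_ollama_url(config: dict, env=None) -> str:
    """Apply the documented order: titan.json, then OLLAMA_BASE_URL, then the default."""
    env = os.environ if env is None else env
    from_config = config.get("providers", {}).get("ollama", {}).get("baseUrl")
    # Assumption: empty values fall through to the next source.
    return from_config or env.get("OLLAMA_BASE_URL") or DEFAULT_OLLAMA_URL
```

Passing `env` explicitly makes the function easy to check without touching the real environment.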
Error (from model switch):
Model 'llama3.2:3b' not found in Ollama. Pull it first: ollama pull llama3.2:3b
Fix:
ollama pull llama3.2:3b

Error:
Cannot verify model 'qwen3.5:9b' — Ollama is unreachable at http://localhost:11434. Check Ollama is running: ollama serve
The health probe times out after 3 seconds. If Ollama is slow to respond (e.g., loading a large model), wait and retry.
TITAN does not set an explicit request timeout on Ollama chat calls — it relies on OS/network defaults. If inference is slow (especially on CPU), the request may appear to hang.
Tip: Check what Ollama is doing:
# See loaded models and memory usage
ollama ps
# Watch Ollama logs
journalctl -u ollama -f # systemd
# or
ollama serve # foreground with logs

| Mode | `num_ctx` | `num_predict` |
|---|---|---|
| Local models | 16,384 | 16,384 |
| Cloud models (`:cloud` suffix) | 131,072 | 32,768 |
Models stay loaded in Ollama VRAM for 30 minutes after last use (keep_alive: '30m').
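For reproducing TITAN's behaviour when calling Ollama directly, the values above map onto the `options` and `keep_alive` fields of an Ollama `/api/chat` request body. The sketch below only builds that body. The function name is hypothetical, and whether TITAN assembles its requests this way is an assumption; the numbers are the ones listed in this section.

```python
def ollama_chat_payload(model: str, messages: list) -> dict:
    """Build an Ollama /api/chat body using the limits documented above."""
    # This guide mentions both ":cloud" and "-cloud" suffixes for cloud models.
    is_cloud = model.endswith((":cloud", "-cloud"))
    options = (
        {"num_ctx": 131072, "num_predict": 32768}
        if is_cloud
        else {"num_ctx": 16384, "num_predict": 16384}
    )
    return {
        "model": model,
        "messages": messages,
        "options": options,
        "keep_alive": "30m",  # keep the model loaded for 30 minutes after last use
    }
```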
TITAN probes GPUs in this order: Apple Silicon → AMD ROCm → NVIDIA.
Check if TITAN sees your GPU:
# NVIDIA
nvidia-smi
# AMD
rocm-smi
# Apple Silicon — always detected on macOS
system_profiler SPDisplaysDataType

If nvidia-smi or rocm-smi fails or times out (5-second limit), TITAN falls back to CPU mode:
No GPU detected — stall timeout increased to 120s for CPU inference
CPU-only mode: maxConcurrentTasks auto-tuned to 2
Important: Integrated GPUs (AMD APUs, Intel UHD) are generally NOT used by Ollama. `ollama ps` will show `100% CPU` even if `/dev/kfd` exists.
TITAN emits a warning when free VRAM drops below 500 MB. The orchestrator reserves 1024 MB by default to prevent OOM.
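The 500 MB warning threshold can be checked by hand against `nvidia-smi` output. This Python sketch parses the CSV printed by `nvidia-smi --query-gpu=memory.used,memory.free --format=csv`; it assumes the usual layout (a header line, then one `<used> MiB, <free> MiB` row per GPU) and treats MiB and MB as interchangeable. `low_vram_gpus` is a hypothetical helper, not part of TITAN.

```python
LOW_VRAM_WARN_MB = 500  # warning threshold stated above


def low_vram_gpus(csv_text: str, threshold_mb: int = LOW_VRAM_WARN_MB) -> list[int]:
    """Return indices of GPUs whose free memory is below threshold_mb."""
    # Skip the header line and ignore blank lines.
    rows = [line for line in csv_text.strip().splitlines()[1:] if line.strip()]
    low = []
    for index, row in enumerate(rows):
        _used, free = (cell.strip() for cell in row.split(","))
        free_mb = int(free.split()[0])  # "412 MiB" -> 412
        if free_mb < threshold_mb:
            low.append(index)
    return low
```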
Symptoms:
- Slow inference (model partially offloaded to CPU)
- Model swap failures
Fix:
# Check current VRAM usage
nvidia-smi # or rocm-smi
# See what Ollama has loaded
ollama ps
# Unload unused models
ollama stop <model-name>

These errors come from the VRAM orchestrator API (POST /api/vram/acquire):

| Error | Meaning |
|---|---|
| `GPU state unavailable (no supported GPU detected — requires NVIDIA, AMD ROCm, or Apple Silicon)` | No GPU found at all |
| `Not enough VRAM: need XMB, available YMB (auto-swap disabled)` | Insufficient VRAM and `vram.autoSwapModel` is false |
| `Not enough VRAM: need XMB, available YMB (no models to evict)` | Nothing to evict: reserved VRAM is consumed by non-TITAN processes |
| `Evicted <models> but still not enough VRAM: need XMB, have YMB` | Eviction freed some VRAM but still not enough |
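The three "not enough VRAM" errors correspond to different points in an acquire-then-evict decision. The Python sketch below is one plausible reading of that flow, not TITAN's implementation: the function name, the eviction order, and the assumption that evicting a model frees exactly its listed size are all hypothetical, and `reserveMB` is not modelled.

```python
def acquire_vram(need_mb: int, available_mb: int, auto_swap: bool, loaded: list) -> str:
    """Return "ok" or the matching error text.

    loaded: (model_name, size_mb) pairs that could be evicted, in eviction order.
    """
    if available_mb >= need_mb:
        return "ok"
    if not auto_swap:
        return f"Not enough VRAM: need {need_mb}MB, available {available_mb}MB (auto-swap disabled)"
    if not loaded:
        return f"Not enough VRAM: need {need_mb}MB, available {available_mb}MB (no models to evict)"
    evicted = []
    for name, size_mb in loaded:
        evicted.append(name)
        available_mb += size_mb  # assumption: eviction frees the full model size
        if available_mb >= need_mb:
            return "ok"
    return (
        f"Evicted {', '.join(evicted)} but still not enough VRAM: "
        f"need {need_mb}MB, have {available_mb}MB"
    )
```

Reading an error against this flow shows which knob to turn: enable `autoSwapModel`, free VRAM held by other processes, or request a smaller model.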
// titan.json
{
"vram": {
"enabled": true,
"pollIntervalMs": 10000,
"reserveMB": 1024,
"autoSwapModel": true,
"fallbackModel": "qwen3:7b",
"ollamaUrl": "http://localhost:11434"
}
}

Set `autoSwapModel: true` to let TITAN automatically evict models when VRAM is needed.

| Model | Size | Notes |
|---|---|---|
| `qwen3.5:4b` | 4B | Native tool calling, 256K context |
| `qwen3.5:9b` | 9B | Recommended default |
| `qwen3.5:35b` | 35B | Most reliable tool calling |
| `qwen3-coder:32b` | 32B | Best for code tasks |
| `llama3.2:3b` | 3B | Fast but hallucinates extra tool calls |
| `devstral-small-2` | ~22B | Good for dev tasks |
| Model | Problem |
|---|---|
| DeepSeek-R1 (all sizes) | Malformed JSON schemas, ignores tool definitions |
| LLaMA 3.1 | Poor tool calling reliability |
| Mistral/Mixtral (local) | Inconsistent across quantizations |
| Phi-3/Phi-4 | No native tool calling in Ollama |
| Gemma 2 | Narrates instead of calling tools |
| dolphin3 | Returns "does not support tools" error |
| arcee-agent | No tool calling despite marketing claims |
Error from Ollama:
Model <name> does not support native tool calling — running in chat-only mode
TITAN automatically retries the request without tools. If the model truly doesn't support tools, it will run in chat-only mode (no skills, no autonomous actions).
If a model fails to generate tool calls for 3 consecutive rounds despite tools being available, the stall detector triggers tool_call_failure:
- First attempt: Switches to a fallback model that supports tool calling
- Second attempt: If still failing, returns an honest status to the user
| Hardware | Recommended Model | Speed |
|---|---|---|
| 8–12 GB RAM, CPU-only | `llama3.2:3b` | ~16 tok/s |
| 8 GB VRAM (laptop) | `qwen3.5:4b` | ~150 tok/s |
| 16 GB RAM | `qwen3.5:9b` | ~80–120 tok/s |
| 24 GB VRAM (RTX 4090) | `qwen3-coder:32b` | ~20–40 tok/s |
| 32 GB VRAM (RTX 5090) | `qwen3.5:35b` | ~20–40 tok/s |
When a provider fails with a retryable error, TITAN tries the fallback chain (configured in agent.fallbackChain), then falls back to a provider-level failover scan.
Retryable errors (triggers fallback):
- HTTP 429, 500, 502, 503
- `rate limit` / `rate_limit`
- `timeout` / `timed out` / `ETIMEDOUT`
- `ECONNREFUSED` / `ECONNRESET`
- Messages containing `overloaded`
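To predict whether a given failure would trigger fallback, the list above can be expressed as a small classifier. This is a Python sketch only: `is_retryable` is a hypothetical name, and case-insensitive substring matching is an assumption about how TITAN compares error messages.

```python
RETRYABLE_STATUS = {429, 500, 502, 503}
RETRYABLE_MARKERS = (
    "rate limit", "rate_limit",
    "timeout", "timed out", "etimedout",
    "econnrefused", "econnreset",
    "overloaded",
)


def is_retryable(status=None, message: str = "") -> bool:
    """Return True if the error matches the retryable list above."""
    if status in RETRYABLE_STATUS:
        return True
    text = message.lower()  # assumption: matching ignores case
    return any(marker in text for marker in RETRYABLE_MARKERS)
```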
// titan.json
{
"agent": {
"fallbackChain": ["ollama/qwen3.5:9b", "anthropic/claude-sonnet-4-20250514"],
"fallbackMaxRetries": 3
}
}

- Max retries: `fallbackMaxRetries` (default: 3)
- Fallback state expires after 5 minutes; TITAN retries the primary model after that
If the fallback chain is exhausted, TITAN scans providers in this order: anthropic → openai → google → ollama. It picks the first healthy provider with a model name matching the original prefix (e.g., claude-*).
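The provider scan can be pictured as a first-match search over the fixed order. The Python sketch below is an interpretation, not TITAN's code: `pick_failover` is hypothetical, and "prefix" is assumed to mean the text before the first hyphen of the model name (so `claude-sonnet-4-20250514` gives `claude`).

```python
SCAN_ORDER = ["anthropic", "openai", "google", "ollama"]


def pick_failover(original_model_id: str, healthy: set, models_by_provider: dict):
    """Return "provider/model" for the first healthy match, or None.

    healthy: names of providers currently considered healthy.
    models_by_provider: provider name -> list of model names it offers.
    """
    model_name = original_model_id.split("/", 1)[-1]
    prefix = model_name.split("-", 1)[0]  # assumption: prefix ends at the first hyphen
    for provider in SCAN_ORDER:
        if provider not in healthy:
            continue
        for candidate in models_by_provider.get(provider, []):
            if candidate.startswith(prefix):
                return f"{provider}/{candidate}"
    return None
```

Under this reading, a `None` result means that no healthy provider offers a model with the same prefix, so failover cannot help.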
Ollama cloud models (with -cloud or :cloud suffix) may be silently rerouted to OpenRouter when a mapping exists. Look for:
[CloudBypass] qwen3.5:9b-cloud → openrouter/<model> (parallel-capable)
Error:
Unknown provider: <name>. Available: anthropic, openai, ollama, google, ...
Check your model ID format: it must be provider/model-name (e.g., anthropic/claude-sonnet-4-20250514).
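A quick way to validate an ID before putting it in titan.json is to split it the way the format implies. This Python sketch uses a hypothetical helper, `parse_model_id`; splitting at the first slash only is an assumption, chosen so that model names which themselves contain a slash stay intact.

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    """Split "provider/model-name" into (provider, model); raise ValueError if malformed."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider/model-name, got {model_id!r}")
    return provider, model
```

For example, a bare `qwen3.5:9b` raises `ValueError`, while `ollama/qwen3.5:9b` yields `("ollama", "qwen3.5:9b")`.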
TITAN monitors agent sessions for stalls and automatically intervenes.
| Type | Trigger | Default Threshold |
|---|---|---|
| `silence` | No activity | 30s (120s on CPU-only / autonomous mode) |
| `tool_loop` | Same tool + same args repeated | 3 times in a row |
| `empty_response` | LLM returns < 3 characters | Immediate |
| `max_rounds` | Tool round budget exhausted | Depends on mode (up to 25 in autonomous) |
| `tool_call_failure` | Model ignores tools for N rounds | 3 consecutive rounds |
- TITAN sends a nudge message to the agent (up to 2 attempts, or 5 in autonomous mode)
- If nudges don't help, the agent gives up with: "I've been unable to make progress on this task."
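The `tool_loop` trigger from the table above amounts to comparing the most recent calls. A minimal Python sketch follows, assuming that "same args" means equal after JSON serialization with sorted keys; `is_tool_loop` and the tool names in the usage line are hypothetical.

```python
import json


def is_tool_loop(calls: list, threshold: int = 3) -> bool:
    """calls: (tool_name, args_dict) pairs in call order.

    True when the last `threshold` calls are identical.
    """
    if len(calls) < threshold:
        return False
    # Serialize args with sorted keys so key order does not affect equality.
    recent = [(name, json.dumps(args, sort_keys=True)) for name, args in calls[-threshold:]]
    return all(call == recent[0] for call in recent)
```

For instance, three consecutive `("read_file", {"path": "a.txt"})` calls return `True`, while a different call in between resets the streak.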
Stall thresholds auto-adjust based on hardware:
- GPU detected: 30-second silence timeout
- No GPU (CPU-only): 120-second silence timeout
- Autonomous mode: 120-second silence timeout, 5 nudge attempts
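The auto-adjustment rules above reduce to two inputs. This Python sketch encodes them for reference; the names `StallSettings` and `stall_settings` are hypothetical, and the values are the ones listed in this section (30s or 120s of silence, 2 or 5 nudges).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StallSettings:
    silence_timeout_s: int
    max_nudges: int


def stall_settings(gpu_detected: bool, autonomous: bool) -> StallSettings:
    """Map hardware and mode to the thresholds described above."""
    # Only a GPU-backed, non-autonomous session keeps the short 30s timeout.
    timeout = 30 if gpu_detected and not autonomous else 120
    nudges = 5 if autonomous else 2
    return StallSettings(timeout, nudges)
```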
| Code | Error | Cause | Fix |
|---|---|---|---|
| 400 | `Invalid JSON` | Malformed request body | Check your JSON syntax |
| 401 | `Unauthorized` | Missing or invalid auth token | Add `Authorization: Bearer <token>` header |
| 401 | `Invalid password` | Wrong login password | Check `gateway.auth.password` in config |
| 404 | `Model '<name>' not found in Ollama` | Model not pulled | Run `ollama pull <name>` |
| 413 | `Payload too large (max 1MB)` | Request body > 1 MB | Reduce payload size |
| 429 | `Too many requests` | Rate limit exceeded | Wait for `Retry-After` seconds |
| 503 | `Server busy — too many concurrent requests` | > 5 concurrent requests | Wait and retry |
| 503 | `Cannot verify model — Ollama is unreachable` | Ollama down or unreachable | Start Ollama: `ollama serve` |
| Close Code | Message | Cause |
|---|---|---|
| 1008 | `Unauthorized` | Invalid or missing auth on WS connection |
| 1008 | `Mesh auth failed` | Mesh peer HMAC authentication rejected |
Run through this list when something isn't working:
- Is Ollama running? → `curl http://localhost:11434/api/tags`
- Is the model pulled? → `ollama list`
- Is the gateway port free? → `lsof -i :48420`
- Is auth configured correctly? → Check `gateway.auth.mode` in titan.json
- Is the GPU detected? → `nvidia-smi` or `rocm-smi`
- Is there enough VRAM? → `nvidia-smi` or `ollama ps`
- Does your model support tool calling? → See Section 4 above
- Is the model ID formatted correctly? → Must be `provider/model-name`
- Check TITAN logs → Look for `[warn]` and `[error]` lines
- Check Ollama logs → `journalctl -u ollama -f` or run `ollama serve` in foreground
# TITAN health check
curl http://localhost:48420/api/health
# TITAN system stats
curl http://localhost:48420/api/stats
# VRAM status
curl http://localhost:48420/api/vram
# Currently loaded Ollama models
ollama ps
# GPU memory
nvidia-smi --query-gpu=memory.used,memory.free --format=csv
# Run TITAN's built-in doctor
titan doctor --json

- GitHub Issues: https://github.com/Djtony707/TITAN/issues
- Wiki: https://github.com/Djtony707/TITAN/wiki
- Discord: Check the `#titan-support` channel