Skip to content

Troubleshooting

Michael Elliott edited this page Apr 5, 2026 · 1 revision

Troubleshooting

Common issues and solutions for TITAN. If your problem isn't covered here, open an issue.


1. Gateway Startup Issues

Port Already in Use

Error:

Port 48420 is already in use. Is TITAN already running?

Cause: Another process (or a previous TITAN instance) is already listening on the gateway port.

Fix:

# Find what's using the port
lsof -i :48420

# Kill the process
kill <PID>

# Or start TITAN on a different port
titan gateway --port 48421

Auth Configuration Problems

Symptom: All API requests return 401 Unauthorized.

Cause: auth.mode is set to "token" but no token is configured. TITAN logs:

Auth mode is "token" but no token configured — denying request.
Set gateway.auth.token or switch to mode "password".

Fix: Either set a token or switch auth mode:

// titan.json
{
  "gateway": {
    "auth": {
      "mode": "none"
    }
  }
}
Auth Mode When to Use
"none" Local development, trusted networks
"token" API access with a static bearer token
"password" Browser-based access with login page

Request Limits

Limit Value Error
Max body size 1 MB 413 Payload too large (max 1MB)
Max concurrent requests 5 503 Server busy — too many concurrent requests
Rate limit exceeded varies 429 Too many requests (check Retry-After header)

2. Ollama Connection Problems

ECONNREFUSED

Error:

fetch failed: connect ECONNREFUSED 127.0.0.1:11434

Cause: Ollama is not running or is on a different address.

Fix:

# Start Ollama
ollama serve

# Verify it's running
curl http://localhost:11434/api/tags

# If Ollama is on a different host, configure the URL:
# titan.json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://192.168.1.11:11434"
    }
  }
}

The URL resolution order is:

  1. config.providers.ollama.baseUrl in titan.json
  2. OLLAMA_BASE_URL environment variable
  3. http://localhost:11434 (default)

Model Not Found

Error (from model switch):

Model 'llama3.2:3b' not found in Ollama. Pull it first: ollama pull llama3.2:3b

Fix:

ollama pull llama3.2:3b

Ollama Unreachable During Model Switch

Error:

Cannot verify model 'qwen3.5:9b' — Ollama is unreachable at http://localhost:11434. Check Ollama is running: ollama serve

The health probe times out after 3 seconds. If Ollama is slow to respond (e.g., loading a large model), wait and retry.

Ollama Timeouts

TITAN does not set an explicit request timeout on Ollama chat calls — it relies on OS/network defaults. If inference is slow (especially on CPU), the request may appear to hang.

Tip: Check what Ollama is doing:

# See loaded models and memory usage
ollama ps

# Watch Ollama logs
journalctl -u ollama -f   # systemd
# or
ollama serve               # foreground with logs

Context Window Defaults

Mode num_ctx num_predict
Local models 16,384 16,384
Cloud models (:cloud suffix) 131,072 32,768

Models stay loaded in Ollama VRAM for 30 minutes after last use (keep_alive: '30m').


3. GPU / VRAM Troubleshooting

GPU Not Detected

TITAN probes GPUs in this order: Apple Silicon → AMD ROCm → NVIDIA.

Check if TITAN sees your GPU:

# NVIDIA
nvidia-smi

# AMD
rocm-smi

# Apple Silicon — always detected on macOS
system_profiler SPDisplaysDataType

If nvidia-smi or rocm-smi fails or times out (5-second limit), TITAN falls back to CPU mode:

No GPU detected — stall timeout increased to 120s for CPU inference
CPU-only mode: maxConcurrentTasks auto-tuned to 2

Important: Integrated GPUs (AMD APUs, Intel UHD) are generally NOT used by Ollama. ollama ps will show 100% CPU even if /dev/kfd exists.

Low VRAM

TITAN emits a warning when free VRAM drops below 500 MB. The orchestrator reserves 1024 MB by default to prevent OOM.

Symptoms:

  • Slow inference (model partially offloaded to CPU)
  • Model swap failures

Fix:

# Check current VRAM usage
nvidia-smi  # or rocm-smi

# See what Ollama has loaded
ollama ps

# Unload unused models
ollama stop <model-name>

VRAM Acquire Failures

These errors come from the VRAM orchestrator API (POST /api/vram/acquire):

Error Meaning
GPU state unavailable (no supported GPU detected — requires NVIDIA, AMD ROCm, or Apple Silicon) No GPU found at all
Not enough VRAM: need XMB, available YMB (auto-swap disabled) Insufficient VRAM and vram.autoSwapModel is false
Not enough VRAM: need XMB, available YMB (no models to evict) Nothing to evict — reserved VRAM is consumed by non-TITAN processes
Evicted <models> but still not enough VRAM: need XMB, have YMB Eviction freed some VRAM but still not enough

VRAM Configuration

// titan.json
{
  "vram": {
    "enabled": true,
    "pollIntervalMs": 10000,
    "reserveMB": 1024,
    "autoSwapModel": true,
    "fallbackModel": "qwen3:7b",
    "ollamaUrl": "http://localhost:11434"
  }
}

Set autoSwapModel: true to let TITAN automatically evict models when VRAM is needed.


4. Model Tool-Calling Issues

Models That Work Well

Model Size Notes
qwen3.5:4b 4B Native tool calling, 256K context
qwen3.5:9b 9B Recommended default
qwen3.5:35b 35B Most reliable tool calling
qwen3-coder:32b 32B Best for code tasks
llama3.2:3b 3B Fast but hallucinates extra tool calls
devstral-small-2 ~22B Good for dev tasks

Models to Avoid

Model Problem
DeepSeek-R1 (all sizes) Malformed JSON schemas, ignores tool definitions
LLaMA 3.1 Poor tool calling reliability
Mistral/Mixtral (local) Inconsistent across quantizations
Phi-3/Phi-4 No native tool calling in Ollama
Gemma 2 Narrates instead of calling tools
dolphin3 Returns "does not support tools" error
arcee-agent No tool calling despite marketing claims

"Does Not Support Tools" Error

Error from Ollama:

Model <name> does not support native tool calling — running in chat-only mode

TITAN automatically retries the request without tools. If the model truly doesn't support tools, it will run in chat-only mode (no skills, no autonomous actions).

Tool Call Failure Self-Heal

If a model fails to generate tool calls for 3 consecutive rounds despite tools being available, the stall detector triggers tool_call_failure:

  1. First attempt: Switches to a fallback model that supports tool calling
  2. Second attempt: If still failing, returns an honest status to the user

Hardware Recommendations

Hardware Recommended Model Speed
8–12 GB RAM, CPU-only llama3.2:3b ~16 tok/s
8 GB VRAM (laptop) qwen3.5:4b ~150 tok/s
16 GB RAM qwen3.5:9b ~80–120 tok/s
24 GB VRAM (RTX 4090) qwen3-coder:32b ~20–40 tok/s
32 GB VRAM (RTX 5090) qwen3.5:35b ~20–40 tok/s

5. Provider Routing & Fallback

How Fallback Works

When a provider fails with a retryable error, TITAN tries the fallback chain (configured in agent.fallbackChain), then falls back to a provider-level failover scan.

Retryable errors (triggers fallback):

  • HTTP 429, 500, 502, 503
  • rate limit / rate_limit
  • timeout / timed out / ETIMEDOUT
  • ECONNREFUSED / ECONNRESET
  • Messages containing overloaded

Fallback Chain

// titan.json
{
  "agent": {
    "fallbackChain": ["ollama/qwen3.5:9b", "anthropic/claude-sonnet-4-20250514"],
    "fallbackMaxRetries": 3
  }
}
  • Max retries: fallbackMaxRetries (default: 3)
  • Fallback state expires after 5 minutes — TITAN retries the primary model after that

Provider-Level Failover

If the fallback chain is exhausted, TITAN scans providers in this order: anthropicopenaigoogleollama. It picks the first healthy provider with a model name matching the original prefix (e.g., claude-*).

Cloud Model Bypass

Ollama cloud models (with -cloud or :cloud suffix) may be silently rerouted to OpenRouter when a mapping exists. Look for:

[CloudBypass] qwen3.5:9b-cloud → openrouter/<model> (parallel-capable)

Unknown Provider

Error:

Unknown provider: <name>. Available: anthropic, openai, ollama, google, ...

Check your model ID format: it must be provider/model-name (e.g., anthropic/claude-sonnet-4-20250514).


6. Agent Stall Detection

TITAN monitors agent sessions for stalls and automatically intervenes.

Stall Types

Type Trigger Default Threshold
silence No activity 30s (120s on CPU-only / autonomous mode)
tool_loop Same tool + same args repeated 3 times in a row
empty_response LLM returns < 3 characters Immediate
max_rounds Tool round budget exhausted Depends on mode (up to 25 in autonomous)
tool_call_failure Model ignores tools for N rounds 3 consecutive rounds

What Happens on Stall

  1. TITAN sends a nudge message to the agent (up to 2 attempts, or 5 in autonomous mode)
  2. If nudges don't help, the agent gives up with: "I've been unable to make progress on this task."

Adjusting Thresholds

Stall thresholds auto-adjust based on hardware:

  • GPU detected: 30-second silence timeout
  • No GPU (CPU-only): 120-second silence timeout
  • Autonomous mode: 120-second silence timeout, 5 nudge attempts

7. Common HTTP Error Codes

Code Error Cause Fix
400 Invalid JSON Malformed request body Check your JSON syntax
401 Unauthorized Missing or invalid auth token Add Authorization: Bearer <token> header
401 Invalid password Wrong login password Check gateway.auth.password in config
404 Model '<name>' not found in Ollama Model not pulled Run ollama pull <name>
413 Payload too large (max 1MB) Request body > 1 MB Reduce payload size
429 Too many requests Rate limit exceeded Wait for Retry-After seconds
503 Server busy — too many concurrent requests > 5 concurrent requests Wait and retry
503 Cannot verify model — Ollama is unreachable Ollama down or unreachable Start Ollama: ollama serve

WebSocket Errors

Close Code Message Cause
1008 Unauthorized Invalid or missing auth on WS connection
1008 Mesh auth failed Mesh peer HMAC authentication rejected

8. Quick Troubleshooting Checklist

Run through this list when something isn't working:

  • Is Ollama running?curl http://localhost:11434/api/tags
  • Is the model pulled?ollama list
  • Is the gateway port free?lsof -i :48420
  • Is auth configured correctly? → Check gateway.auth.mode in titan.json
  • Is the GPU detected?nvidia-smi or rocm-smi
  • Is there enough VRAM?nvidia-smi or ollama ps
  • Does your model support tool calling? → See Section 4 above
  • Is the model ID formatted correctly? → Must be provider/model-name
  • Check TITAN logs → Look for [warn] and [error] lines
  • Check Ollama logsjournalctl -u ollama -f or run ollama serve in foreground

Useful Diagnostic Commands

# TITAN health check
curl http://localhost:48420/api/health

# TITAN system stats
curl http://localhost:48420/api/stats

# VRAM status
curl http://localhost:48420/api/vram

# Currently loaded Ollama models
ollama ps

# GPU memory
nvidia-smi --query-gpu=memory.used,memory.free --format=csv

# Run TITAN's built-in doctor
titan doctor --json

Getting Help

Clone this wiki locally