Tool Error Recovery
LLMs produce malformed tool calls. Agents.KT lets you fix them -- with code, with another agent, or with both.
Large language models are probabilistic text generators. When they produce tool calls, things go wrong in predictable ways:
- Trailing commas in JSON: {"a": 1, "b": 2,}
- Markdown fencing around arguments: ```json\n{"a": 1}\n```
- Wrong types: a number sent as a string, "42" instead of 42
- Missing required fields: the model forgets a parameter
- Runtime failures: the tool itself throws because of bad input or transient errors
Without error recovery, any of these failures kills the agentic loop. The agent stops, the user gets nothing.
Most frameworks handle malformed tool calls with special parser classes, retry middleware, or string-cleaning utilities buried in utility packages.
Agents.KT takes a different approach: the fixer is an agent. The Agent<IN, OUT> interface you use to build your application is the same interface you use to repair broken tool calls. No new abstraction. No special machinery.
This means repair logic gets the full power of the framework: it can be deterministic (a pure function), LLM-driven (an agent with its own model), or a composition of both.
Tool errors form a sealed hierarchy with four variants:
sealed interface ToolError {
    data class InvalidArgs(
        val rawArgs: String,
        val parseError: String,
        val expectedSchema: JsonSchema
    ) : ToolError

    data class DeserializationError(
        val rawValue: String,
        val targetType: KType,
        val cause: Throwable
    ) : ToolError

    data class ExecutionError(
        val args: ToolArgs,
        val cause: Throwable
    ) : ToolError

    data class EscalationError(
        val source: AgentRef,
        val reason: String,
        val severity: Severity,
        val originalError: ToolError,
        val attempts: Int
    ) : ToolError
}

| Error Type | When It Fires | Typical Cause |
|---|---|---|
| InvalidArgs | JSON parsing fails | Trailing commas, markdown fencing, truncated output |
| DeserializationError | JSON parses but cannot map to expected types | "42" instead of 42, missing keys |
| ExecutionError | Tool executor throws | Bad input values, transient I/O failures, business logic errors |
| EscalationError | Repair itself fails and escalates up | Exhausted retries, unrecoverable state |
The sealed hierarchy means when expressions are exhaustive -- the compiler tells you if you miss a case.
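As a rough illustration of that exhaustiveness, the sketch below mirrors the hierarchy with simplified stand-in fields (plain strings instead of JsonSchema, KType, ToolArgs, and AgentRef, which are not defined here) and maps each variant to a log line. The names SimpleToolError and describe are hypothetical and not part of Agents.KT.

```kotlin
// Simplified stand-in for ToolError: field types are reduced to String/Int/Throwable
// so the example compiles on its own. This is not the framework's actual definition.
sealed interface SimpleToolError {
    data class InvalidArgs(val rawArgs: String, val parseError: String) : SimpleToolError
    data class DeserializationError(val rawValue: String, val targetType: String) : SimpleToolError
    data class ExecutionError(val cause: Throwable) : SimpleToolError
    data class EscalationError(
        val reason: String,
        val original: SimpleToolError,
        val attempts: Int
    ) : SimpleToolError
}

// No `else` branch: deleting any case below becomes a compile error, because
// `when` used as an expression over a sealed type must cover every variant.
fun describe(error: SimpleToolError): String = when (error) {
    is SimpleToolError.InvalidArgs ->
        "unparseable arguments: ${error.parseError}"
    is SimpleToolError.DeserializationError ->
        "cannot map '${error.rawValue}' to ${error.targetType}"
    is SimpleToolError.ExecutionError ->
        "tool threw: ${error.cause.message}"
    is SimpleToolError.EscalationError ->
        "escalated after ${error.attempts} attempts: ${describe(error.original)}"
}
```

If a fifth variant were added to the sealed interface, describe would stop compiling until it handled the new case, which is the safety net the paragraph above refers to.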
Each tool can declare error handlers using the onError {} block. Inside, three verbs match the three non-escalation error types:
tool("write_file", "Write content to a file") { args ->
    val path = args["path"] as String
    val content = args["content"] as String
    fileSystem.write(path, content)
}
onError {
    invalidArgs { args, error ->
        fix { args.trimMarkdownFencing() }
    }
    deserializationError { raw, error ->
        sanitize { raw.normalizePathSeparators() }
    }
    executionError { e ->
        retry(maxAttempts = 3, backoff = exponential())
    }
}

| Verb | Error Type | Purpose |
|---|---|---|
| invalidArgs { } | InvalidArgs | Fix unparseable JSON |
| deserializationError { } | DeserializationError | Fix type mismatches |
| executionError { } | ExecutionError | Handle runtime failures |
The simplest recovery strategy is a pure function. No LLM, no network call -- just string manipulation.
onError {
    invalidArgs { args, error ->
        fix {
            args
                .trimMarkdownFencing()            // strip ```json ... ```
                .replace(Regex(",\\s*}"), "}")    // remove trailing commas
                .replace(Regex(",\\s*]"), "]")    // remove trailing commas in arrays
        }
    }
}

The lambda receives the raw argument string and returns a cleaned version. The framework re-parses the cleaned string and retries the tool call.
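The page does not show how trimMarkdownFencing() is implemented, so the following is only one plausible sketch of such a helper. It assumes the payload is a JSON object or array wrapped in at most one pair of fences, and the FENCE constant is a hypothetical name introduced here.

```kotlin
// Hypothetical implementation of a trimMarkdownFencing()-style helper.
// Assumes the payload is a JSON object or array wrapped in at most one fence pair.
val FENCE = "`".repeat(3) // three backticks, built at runtime to keep this snippet fence-safe

fun String.trimMarkdownFencing(): String {
    val trimmed = trim()
    val fenced = trimmed.length >= 2 * FENCE.length &&
        trimmed.startsWith(FENCE) && trimmed.endsWith(FENCE)
    if (!fenced) return trimmed // nothing to strip, return the trimmed input
    val inner = trimmed.removePrefix(FENCE).removeSuffix(FENCE)
    // Drop an optional language tag such as "json" right after the opening fence.
    return inner.replaceFirst(Regex("^[A-Za-z]+(?=\\s|[{\\[])"), "").trim()
}
```

Unfenced input passes through unchanged apart from trimming, so the helper is safe to run unconditionally as the first step of a fix { } chain.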
onError {
    deserializationError { raw, error ->
        sanitize {
            raw.normalizePathSeparators() // backslash to forward slash
        }
    }
}

Same idea: transform the raw value so it deserializes correctly.
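For reference, a helper like normalizePathSeparators() could be as small as the one-liner below. This is a hypothetical implementation, since the page only names the function, and it assumes the value it receives is a bare path rather than a full JSON document.

```kotlin
// Hypothetical implementation: Windows-style backslashes become forward slashes.
// Caution: on raw JSON text this would also rewrite escape sequences such as \n
// or \", so this sketch assumes it is only ever applied to a bare path value.
fun String.normalizePathSeparators(): String = replace('\\', '/')
```

For example, "C:\Users\me\notes.txt" becomes "C:/Users/me/notes.txt".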
onError {
    executionError { e ->
        retry(maxAttempts = 3, backoff = exponential())
    }
}

This re-runs the tool executor with the same arguments. The backoff parameter controls the delay between attempts. Use this for transient failures like network timeouts or rate limits.
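To make the retry semantics concrete, here is a framework-independent sketch of retrying with exponential backoff. The function names, the 100 ms base delay, and the doubling factor are all assumptions chosen for illustration; the page does not state what exponential() uses internally.

```kotlin
// Delay before each retry: baseMs, baseMs * factor, baseMs * factor^2, ...
// There is no delay before the first attempt, so N attempts need N - 1 delays.
fun backoffDelays(maxAttempts: Int, baseMs: Long = 100, factor: Double = 2.0): List<Long> =
    List((maxAttempts - 1).coerceAtLeast(0)) { i ->
        (baseMs * Math.pow(factor, i.toDouble())).toLong()
    }

// Runs `action` up to maxAttempts times, waiting between failed attempts.
// `sleep` is injectable so callers (and tests) can avoid real waiting.
fun <T> retryWithBackoff(
    maxAttempts: Int,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    action: (attempt: Int) -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    val delays = backoffDelays(maxAttempts)
    var last: Exception? = null
    for (attempt in 1..maxAttempts) {
        try {
            return action(attempt) // same work each time, as with retry(...)
        } catch (e: Exception) {
            last = e
            if (attempt < maxAttempts) sleep(delays[attempt - 1])
        }
    }
    throw last!! // every attempt failed: surface the final error
}
```

With these assumed defaults, maxAttempts = 3 waits 100 ms and then 200 ms before the second and third attempts. Retrying only helps when the failure is transient; a deterministic failure will simply fail three times.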
When deterministic cleanup is not enough -- the JSON is too mangled, the error is too novel -- you can delegate repair to an agent.
A repair agent is a regular Agent<String, String>. It takes the broken input as a string and returns a fixed string:
val jsonFixer = agent<String, String>("json-fixer") {
    prompt = """
        You are a JSON repair tool. You receive malformed JSON and return
        valid JSON. Do not add or remove fields. Only fix syntax errors.
        Return ONLY the fixed JSON, no explanation.
    """.trimIndent()
    model {
        ollama("qwen2.5:7b")
        temperature = 0.0 // deterministic output
    }
    budget { maxTurns = 1 } // single-shot, no tool loop
    skills {
        skill<String, String>("fix-json", "Repairs broken JSON") {
            implementedBy { input -> input } // LLM does the work via prompt
        }
    }
}

tool("create_task", "Create a new task") { args ->
    val title = args["title"] as String
    taskService.create(title)
}
onError {
    invalidArgs { args, error ->
        fix(agent = jsonFixer, retries = 3)
    }
}

The framework sends the broken arguments to jsonFixer, takes the output, re-parses it, and retries the tool call. If the fix fails, it retries up to 3 times before giving up.
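The send, re-parse, retry cycle described above can be approximated in plain Kotlin. In this sketch, fixer stands in for the repair agent and parses stands in for the framework's JSON parser; both signatures are hypothetical. It also assumes each retry is fed the previous attempt's output, which the page does not specify.

```kotlin
// Stand-ins: `fixer` plays the role of the repair agent, `parses` the role of
// the framework's parser. Both are hypothetical signatures for illustration.
fun repairWithAgent(
    brokenArgs: String,
    retries: Int,
    fixer: (String) -> String,
    parses: (String) -> Boolean
): String? {
    var candidate = brokenArgs
    repeat(retries) {
        candidate = fixer(candidate)             // ask the agent for a repaired version
        if (parses(candidate)) return candidate  // success: the tool call can be retried
    }
    return null // all attempts exhausted: the caller should escalate or throw
}
```

Returning null on exhaustion keeps the decision about what happens next (escalate or throw) with the caller rather than inside the repair loop.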
The most robust approach: try deterministic repair first, and fall back to the LLM only if the deterministic fix returns null:
onError {
    invalidArgs { args, error ->
        fix {
            // Attempt 1: simple cleanup
            tryJsonCleanup(args) // returns null if cleanup is insufficient
        } ?: fix(agent = jsonFixer, retries = 3)
        // Attempt 2: LLM-driven repair if deterministic fix returned null
    }
}

This gives you the speed of string manipulation for common cases (trailing commas, fencing) and the intelligence of an LLM for edge cases.
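The null-means-fall-through convention generalizes to any number of fixers. The helper below is a hypothetical sketch (firstSuccessfulFix is not an Agents.KT function) that tries fixers in order and stops at the first one that returns a non-null result, so later and more expensive fixers are never invoked when a cheap one succeeds.

```kotlin
// Each fixer returns a repaired string, or null to signal "could not fix this".
// The first non-null result wins, so cheap fixers should be listed first.
fun firstSuccessfulFix(raw: String, vararg fixers: (String) -> String?): String? =
    fixers.firstNotNullOfOrNull { fixer -> fixer(raw) }
```

This is the same short-circuit behaviour the ?: operator gives for two alternatives, extended to a list.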
fun tryJsonCleanup(raw: String): String? {
    val cleaned = raw
        .trim()
        .removePrefix("```json").removePrefix("```")
        .removeSuffix("```")
        .trim()
        .replace(Regex(",\\s*}"), "}")
        .replace(Regex(",\\s*]"), "]")
    return try {
        // Verify it parses
        JsonParser.parse(cleaned)
        cleaned
    } catch (e: Exception) {
        null // signal: deterministic fix was not enough
    }
}

A repair agent does not have to use an LLM. You can build a fully deterministic agent using implementedBy:
val regexFixer = agent<String, String>("regex-fixer") {
    skills {
        skill<String, String>("fix", "Fix JSON with regex") {
            implementedBy { input ->
                input
                    .replace(Regex("(?s)```json\\s*(.+?)\\s*```"), "$1")
                    .replace(Regex(",\\s*([}\\]])"), "$1")
                    .replace(Regex("'"), "\"")
            }
        }
    }
}

Zero LLM calls, zero latency, zero cost -- but it conforms to the Agent<String, String> interface, so it plugs into fix(agent = ...) seamlessly. The framework does not care how the agent produces its output.
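To see what those three substitutions actually do, the sketch below applies the same regexes as a standalone function. The names regexFix and tick3 are hypothetical; the patterns are copied from the skill above, with the fence built at runtime only so the snippet contains no literal triple backticks.

```kotlin
val tick3 = "`".repeat(3) // three backticks, built at runtime

// Same three substitutions as the regex-fixer skill, as a plain function.
fun regexFix(input: String): String = input
    .replace(Regex("(?s)${tick3}json\\s*(.+?)\\s*$tick3"), "$1") // unwrap fenced JSON
    .replace(Regex(",\\s*([}\\]])"), "$1")                        // drop trailing commas
    .replace(Regex("'"), "\"")                                    // single to double quotes
```

A fenced {'a': 1,} comes out as {"a": 1}. The quote replacement is blunt, though: it also rewrites apostrophes inside string values, so {"name": "O'Brien"} becomes the invalid {"name": "O"Brien"}. Regex repair of this kind is best limited to inputs where that cannot happen, or placed ahead of an LLM fixer that handles the remaining cases.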
When many tools share the same error handling, define it once in a defaults {} block at the same level as the tools:
skills {
    skill<String, String>("data-ops", "Data operations") {
        tools("read_file", "write_file", "delete_file")

        defaults {
            onError {
                invalidArgs { args, error ->
                    fix { tryJsonCleanup(args) } ?: fix(agent = jsonFixer, retries = 2)
                }
                executionError { e ->
                    retry(maxAttempts = 3, backoff = exponential())
                }
            }
        }

        tool("read_file", "Read a file") { args ->
            fileSystem.read(args["path"] as String)
        }
        tool("write_file", "Write a file") { args ->
            fileSystem.write(args["path"] as String, args["content"] as String)
        }
        tool("delete_file", "Delete a file") { args ->
            fileSystem.delete(args["path"] as String)
        }

        // Per-tool override: delete_file has stricter handling
        onError("delete_file") {
            executionError { e ->
                // No retry for destructive operations
                escalate()
            }
        }
    }
}

The rule: per-tool onError overrides defaults for that specific tool. All other tools inherit the defaults.
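One way to picture the override rule is as a map merge. The sketch below models handlers as labels keyed by error kind and assumes that overrides apply per error kind, so a tool that overrides only executionError still inherits the default invalidArgs handler. That assumption matches the complete example on this page, where tools with their own onError block are still described as sharing the invalidArgs default. The resolveHandlers function is hypothetical.

```kotlin
// Handlers are modelled as error kind -> strategy label, purely for illustration.
// Assumption: a per-tool override replaces only the error kinds it declares.
fun resolveHandlers(
    tool: String,
    defaults: Map<String, String>,
    overrides: Map<String, Map<String, String>>
): Map<String, String> = defaults + (overrides[tool] ?: emptyMap())
```

Map addition keeps every default entry and lets the right-hand side win on key collisions, which is exactly the "override wins, everything else is inherited" behaviour.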
When repair fails, a handler has two options: escalate() or throwException().
executionError { e ->
    escalate()
}

escalate() does not throw. It wraps the error in an EscalationError and walks up the structure {} delegation tree. If the agent is part of a parent agent's structure, the parent can catch the escalation and decide what to do -- retry with a different skill, use a fallback, or escalate further.
executionError { e ->
    throwException()
}

throwException() throws immediately. The agentic loop stops. Use this for genuinely unrecoverable errors -- file system corruption, invalid credentials, logic bugs you want to surface during development.
Tool fails
    |
    v
onError handler runs
    |
    +--> fix/retry succeeds --> tool result returned, loop continues
    |
    +--> fix/retry fails
            |
            +--> escalate() --> EscalationError created
            |                        |
            |                        v
            |        Parent agent's structure handler (if exists)
            |                        |
            |                        +--> Parent handles it (fallback, retry, different skill)
            |                        |
            |                        +--> Parent escalates further (walks up the tree)
            |
            +--> throwException() --> Exception thrown, loop stops
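The escalation branch of this flow amounts to walking a parent chain until some ancestor handles the error. The sketch below models that with a hypothetical AgentNode class; it is not the framework's structure {} API. It also assumes that an escalation nobody handles yields null, since the page does not say what happens at the root.

```kotlin
// Hypothetical model of the delegation tree: each agent may have a parent and
// may or may not be able to handle an escalated error.
class AgentNode(
    val name: String,
    val parent: AgentNode? = null,
    // Returns a result if this agent handles the error, or null to escalate further.
    val handle: ((reason: String) -> String?)? = null
)

// Walks from the failing agent's parent upward until some ancestor handles the error.
// Returns "<handler name>: <result>", or null if the root is reached unhandled.
fun escalate(from: AgentNode, reason: String): String? {
    var current = from.parent
    while (current != null) {
        val result = current.handle?.invoke(reason)
        if (result != null) return "${current.name}: $result"
        current = current.parent
    }
    return null
}
```

An ancestor without a handler, or one whose handler returns null, is skipped, which corresponds to the "Parent escalates further" branch in the diagram.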
An agent with multiple tools, each with tailored error recovery:
val jsonFixer = agent<String, String>("json-fixer") {
    prompt = "Fix the malformed JSON. Return only valid JSON."
    model { ollama("qwen2.5:7b"); temperature = 0.0 }
    budget { maxTurns = 1 }
    skills {
        skill<String, String>("fix", "Fix JSON") {
            implementedBy { it }
        }
    }
}

val fileAgent = agent<String, String>("file-manager") {
    prompt = "You manage files. Use tools to read, write, and list files."
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 10 }
    skills {
        skill<String, String>("manage-files", "File management operations") {
            tools("read_file", "write_file", "list_dir")

            // Shared defaults
            defaults {
                onError {
                    invalidArgs { args, error ->
                        fix { tryJsonCleanup(args) } ?: fix(agent = jsonFixer, retries = 2)
                    }
                }
            }

            tool("read_file", "Read file contents by path") { args ->
                val path = args["path"] as String
                File(path).readText()
            }
            onError("read_file") {
                executionError { e ->
                    when (e.cause) {
                        is FileNotFoundException -> escalate()
                        is IOException -> retry(maxAttempts = 3, backoff = exponential())
                        else -> throwException()
                    }
                }
            }

            tool("write_file", "Write content to a file") { args ->
                val path = args["path"] as String
                val content = args["content"] as String
                File(path).writeText(content)
                "Written ${content.length} bytes to $path"
            }
            onError("write_file") {
                deserializationError { raw, error ->
                    sanitize { raw.normalizePathSeparators() }
                }
                executionError { e ->
                    retry(maxAttempts = 2, backoff = exponential())
                }
            }

            tool("list_dir", "List files in a directory") { args ->
                val path = args["path"] as String
                File(path).listFiles()?.map { it.name } ?: emptyList<String>()
            }
            // list_dir inherits defaults -- no per-tool override needed
        }
    }
    onToolUse { name, args, result ->
        println("[$name] args=$args result=$result")
    }
}

// Usage
val result = fileAgent("Read the contents of /tmp/config.json and summarize it")

In this example:
- All three tools share the invalidArgs default (deterministic cleanup, then LLM fixer).
- read_file escalates on missing files, retries on I/O errors, and throws on unexpected failures.
- write_file sanitizes path separators and retries on execution errors.
- list_dir relies entirely on the shared defaults.
- Model & Tool Calling -- understand the agentic loop that these errors occur in
- Skill Selection & Routing -- how agents pick which skill to run
- Budget Controls -- prevent runaway loops during error recovery
- Observability Hooks -- monitor recovery attempts