Sort tool exceptions into a closed ErrorKind enum with LLM-friendly hint text. Zero deps.
from tool_error_classify import classify, ErrorKind
try:
result = run_tool(...)
except Exception as exc:
info = classify(exc)
match info.kind:
case ErrorKind.USER_INPUT:
# bad args from the LLM — surface so it can correct itself
return f"tool error: {info.hint}\n{exc}"
case ErrorKind.RATE_LIMITED:
back_off(seconds=info.retry_after_s or 5)
case ErrorKind.RETRYABLE:
retry()
case ErrorKind.AUTH | ErrorKind.INTERNAL:
raise # not the LLM's problemThe closed enum:
| Kind | Meaning | What to do |
|---|---|---|
USER_INPUT |
Bad args from LLM | Surface to model |
AUTH |
Auth failure | Stop, alert operator |
NOT_FOUND |
Missing resource | Tell LLM to try another |
RATE_LIMITED |
Throttled | Back off, retry |
TIMEOUT |
Request timed out | Smaller scope or wait |
RETRYABLE |
Transient | Retry with backoff |
EXTERNAL_PERMANENT |
External broken forever | Stop |
INTERNAL |
Our bug | Bubble up |
UNKNOWN |
Couldn't classify | Try once then give up |
The retry libs already exist. The fallback libs exist. The circuit breakers exist. What was missing was the decision layer: when a tool errors, which of those mechanisms should you invoke?
tool-error-classify is the one closed enum + a hint string per kind, so your agent loop can pattern-match on the result and route correctly.
In order:
- A user-supplied
custom_classifier(exc) -> ErrorKind | None(returnNoneto defer) exc.status_codeorexc.response.status_code(covers httpx, anthropic, openai, requests, etc.)- Native
TimeoutError/ConnectionError/OSError - The exception class name against a built-in keyword list (RateLimit, Throttling, Timeout, Unauthorized, Authentication, PermissionDenied, Forbidden, NotFound, DoesNotExist, Overloaded, ServiceUnavailable, APIConnection, InternalServer, ServiceQuotaExceeded, Validation, InvalidArgument, BadRequest)
- Built-in Python
ValueError/TypeError/KeyError/IndexError/AssertionError/AttributeError/LookupError→USER_INPUT - Fallback:
UNKNOWN
Retry-After header values (numeric seconds) on the response are extracted into ClassifiedError.retry_after_s when present.
pip install tool-error-classifyfrom tool_error_classify import (
classify, # main entrypoint
ErrorKind, # closed Enum (str-based, JSON-friendly)
ClassifiedError, # frozen dataclass result
DEFAULT_HINTS, # per-kind default hint strings
)
ClassifiedError(
kind: ErrorKind,
hint: str,
status_code: int | None,
retry_after_s: float | None,
)
classify(
exc,
custom_classifier=None, # Callable[[BaseException], ErrorKind | None]
hints=None, # dict[ErrorKind, str] override
) -> ClassifiedErrorllm-retry— once you know the error isRETRYABLEorRATE_LIMITED, this handles the backoff.llm-circuit-breaker— wireErrorKind.RETRYABLEcount to the breaker's failure threshold.llm-fallback-router—EXTERNAL_PERMANENTfor provider A means try provider B.
MIT