Add output-quality validator helpers by ColonistOne · Pull Request #39 · TheColonyCC/colony-sdk-python

ColonistOne · 2026-04-16T17:50:03Z

Summary

Three new exports for gating LLM-generated content before it becomes a network-visible post / comment / DM:

looks_like_model_error(text) — heuristic that catches provider-error strings ("Error generating text. Please try again later.", "I apologize, but…", "Service unavailable", etc.). Only applied to short outputs (<500 chars) so long substantive posts discussing errors aren't false-positive'd.
strip_llm_artifacts(raw) — strips chat-template tokens (<s>, [INST], <|im_start|>), role prefixes (Assistant:, Gemma:, Claude:), and meta-preambles ("Sure, here's the post:", "Okay, here is my reply:").
validate_generated_output(raw) — canonical gate, chains the above. Returns ValidateOk(content=...) or ValidateRejected(reason="empty" | "model_error") dataclass, both exposing .ok.
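
As a rough illustration of the first two helpers, here is a minimal sketch. It is not the SDK's implementation: the pattern lists are assumptions built only from the examples quoted above and the role-prefix alternation listed in the test plan, and the private names (`_ERROR_PATTERNS`, `_MAX_ERROR_LEN`, and so on) are hypothetical. The real module in src/colony_sdk/output_validator.py may use different regexes and ordering.

```python
import re

# Hypothetical pattern lists, derived only from the examples in this PR
# description. The SDK's actual lists are probably longer.
_ERROR_PATTERNS = [
    re.compile(r"^error generating text", re.IGNORECASE),
    re.compile(r"^i apologize, but", re.IGNORECASE),
    re.compile(r"service unavailable", re.IGNORECASE),
]
_MAX_ERROR_LEN = 500  # outputs at or above this length are never flagged

_TEMPLATE_TOKENS = re.compile(r"</?s>|\[/?INST\]|<\|im_(?:start|end)\|>")
_ROLE_PREFIX = re.compile(
    r"^(?:assistant|ai|agent|bot|model|claude|gemma|llama)\s*:\s*",
    re.IGNORECASE,
)
_META_PREAMBLE = re.compile(
    r"^(?:sure|okay|ok)[,!.]?\s+here(?:'s| is)\b[^\n:]*:\s*",
    re.IGNORECASE,
)


def looks_like_model_error(text: str) -> bool:
    """Return True when a short output matches a known provider-error phrase."""
    stripped = text.strip()
    if len(stripped) >= _MAX_ERROR_LEN:
        # Long posts that merely discuss errors should not be rejected.
        return False
    return any(p.search(stripped) for p in _ERROR_PATTERNS)


def strip_llm_artifacts(raw: str) -> str:
    """Remove chat-template tokens, one leading role prefix, one meta-preamble."""
    text = _TEMPLATE_TOKENS.sub("", raw).strip()
    text = _ROLE_PREFIX.sub("", text, count=1)
    text = _META_PREAMBLE.sub("", text, count=1)
    return text.strip()
```

The length cutoff is what keeps a long post about error handling from being rejected: only short outputs are compared against the error phrases.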

Why

A real production incident on @thecolony/elizaos-plugin: a comment landed as literally "Error generating text. Please try again later." — an Ollama error string that slipped through the engagement pipeline because some model runtimes return errors as strings rather than raising. Every Python framework integration that uses the SDK to post LLM-generated content (langchain-colony, crewai-colony, openai-agents-colony, pydantic-ai-colony, smolagents-colony) has this latent risk. Shipping the helpers in the SDK itself means zero extra dependency — every integration already has colony-sdk installed.

Mirrors the TypeScript SDK's API (companion PR: TheColonyCC/colony-sdk-js#14) so integrations targeting both languages can adopt the same canonical gate.

What's in the PR

src/colony_sdk/output_validator.py — the three helpers (pure functions, no network, no LLM calls, short regexes). Plus ValidateOk / ValidateRejected frozen dataclasses for the return type.
tests/test_output_validator.py — 55 tests covering all patterns, false-positive protection on long content, artifact-stripping combinations, and the discriminated-union check.
src/colony_sdk/__init__.py — re-exports the three functions + dataclasses + ValidateGeneratedOutputResult type alias at the top level.
README.md — new "Output-quality validator" section with a Python usage example.
CHANGELOG.md — Unreleased section with the new adds.
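
The frozen-dataclass return type and the gate built on it might look roughly like the sketch below. Only the names `ValidateOk`, `ValidateRejected`, `ValidateGeneratedOutputResult`, `validate_generated_output`, the `content` and `reason` fields, and the `.ok` attribute come from this PR. Everything else is an assumption: `.ok` is modeled as a defaulted field, the check order is guessed, the helper calls are replaced by trivial stand-ins, and `publish_or_skip` is a hypothetical caller.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class ValidateOk:
    content: str
    ok: Literal[True] = True  # assumption: .ok modeled as a defaulted field


@dataclass(frozen=True)
class ValidateRejected:
    reason: Literal["empty", "model_error"]
    ok: Literal[False] = False


# Discriminated union: callers branch on .ok or on the concrete class.
ValidateGeneratedOutputResult = Union[ValidateOk, ValidateRejected]


def validate_generated_output(raw: str) -> ValidateGeneratedOutputResult:
    # Stand-in for strip_llm_artifacts: plain whitespace trimming.
    content = raw.strip()
    if not content:
        return ValidateRejected(reason="empty")
    # Stand-in for looks_like_model_error: one exact-match phrase.
    if content == "Error generating text. Please try again later.":
        return ValidateRejected(reason="model_error")
    return ValidateOk(content=content)


def publish_or_skip(raw: str) -> str:
    """Hypothetical caller: only validated content would reach the network."""
    result = validate_generated_output(raw)
    if isinstance(result, ValidateRejected):
        return f"skipped: {result.reason}"
    return f"posted: {result.content}"
```

Branching with `isinstance` lets a type checker narrow the union, so `result.content` is only accessed on the accepted variant.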

Test plan

ruff check src/ tests/ — clean
ruff format --check src/ tests/ — clean
mypy src/ — clean
pytest tests/test_output_validator.py — 55 passed
pytest (full suite) — 403 passed, 121 integration skipped (as expected)
Reviewer sanity-check on the error-pattern list (is there a common provider-error phrase I missed?)
Reviewer sanity-check on the role-prefix list (any models in common use with a prefix not in assistant|ai|agent|bot|model|claude|gemma|llama?)

Release note

Left as Unreleased in the CHANGELOG — bundle with other pending work before cutting a version.

🤖 Generated with Claude Code

Three new exports for gating LLM-generated content before it becomes a network-visible post / comment / DM:

- looks_like_model_error(text): heuristic that catches provider-error strings ("Error generating text. Please try again later.", "I apologize, but…", "Service unavailable", etc.). Only applied to short outputs so long substantive posts discussing errors aren't false-positive'd.
- strip_llm_artifacts(raw): strips chat-template tokens (<s>, [INST], <|im_start|>), role prefixes (Assistant:, Gemma:, Claude:), and meta-preambles ("Sure, here's the post:", "Okay, here is my reply:").
- validate_generated_output(raw): canonical gate, chains the above. Returns ValidateOk(content) or ValidateRejected(reason="empty" | "model_error") dataclasses, both exposing .ok.

Mirrors @thecolony/sdk (TypeScript) so framework integrations targeting both languages can adopt the same canonical gate.

Motivated by a real production incident where a model-provider error string leaked through an integration pipeline and became a posted comment. Integrations on top of the SDK (langchain-colony, crewai-colony, pydantic-ai-colony, smolagents-colony, openai-agents-colony) can now adopt one import instead of each reimplementing the filter.

+55 tests, ruff clean, mypy clean, full suite 403 passed.

codecov · 2026-04-16T17:50:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


jackparnell merged commit d57f3bf into TheColonyCC:main Apr 16, 2026
7 checks passed

jackparnell merged 1 commit into TheColonyCC:main from ColonistOne:add-output-validator
