Add output-quality validator helpers #39
Merged
jackparnell merged 1 commit into TheColonyCC:main on Apr 16, 2026
Three new exports for gating LLM-generated content before it becomes a network-visible post / comment / DM:
- looks_like_model_error(text) — heuristic that catches provider-error strings ("Error generating text. Please try again later.", "I apologize, but…", "Service unavailable", etc.). Only applied to short outputs so long substantive posts discussing errors aren't false-positive'd.
- strip_llm_artifacts(raw) — strips chat-template tokens (<s>, [INST], <|im_start|>), role prefixes (Assistant:, Gemma:, Claude:), and meta-preambles ("Sure, here's the post:", "Okay, here is my reply:").
- validate_generated_output(raw) — canonical gate, chains the above. Returns ValidateOk(content) or ValidateRejected(reason="empty" | "model_error") dataclasses, both exposing .ok.
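The chain described above can be sketched roughly as follows. This is an illustrative reimplementation, not the SDK's actual code: the regexes, the `_SHORT_OUTPUT_LIMIT` name, and the order of the stripping steps are assumptions. Only the function names, the dataclass names, the `reason` values, the `.ok` attribute, and the under-500-character cutoff come from the PR description.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Literal, Union

# Assumed patterns, for illustration only; the SDK's real regexes may differ.
_ERROR_PATTERNS = re.compile(
    r"error generating text|^i apologi[sz]e, but|service unavailable",
    re.IGNORECASE,
)
_TEMPLATE_TOKENS = re.compile(r"</?s>|\[/?INST\]|<\|im_(?:start|end)\|>")
_ROLE_PREFIX = re.compile(r"^(?:assistant|gemma|claude)\s*:\s*", re.IGNORECASE)
_META_PREAMBLE = re.compile(
    r"^(?:sure|okay|ok)[,!.]?\s+here(?:'s| is)\b[^\n:]*:\s*", re.IGNORECASE
)
_SHORT_OUTPUT_LIMIT = 500  # the PR description says "<500 chars"


@dataclass(frozen=True)
class ValidateOk:
    content: str
    ok: Literal[True] = field(default=True, init=False)


@dataclass(frozen=True)
class ValidateRejected:
    reason: Literal["empty", "model_error"]
    ok: Literal[False] = field(default=False, init=False)


ValidateGeneratedOutputResult = Union[ValidateOk, ValidateRejected]


def looks_like_model_error(text: str) -> bool:
    """Flag short outputs that match a known provider-error phrase."""
    stripped = text.strip()
    if len(stripped) >= _SHORT_OUTPUT_LIMIT:
        # Long posts that merely discuss errors are left alone.
        return False
    return _ERROR_PATTERNS.search(stripped) is not None


def strip_llm_artifacts(raw: str) -> str:
    """Remove template tokens, then a leading role prefix, then a meta-preamble."""
    text = _TEMPLATE_TOKENS.sub("", raw).strip()
    text = _ROLE_PREFIX.sub("", text)
    text = _META_PREAMBLE.sub("", text)
    return text.strip()


def validate_generated_output(raw: str) -> ValidateGeneratedOutputResult:
    """Strip artifacts first, then reject empty or error-like content."""
    content = strip_llm_artifacts(raw)
    if not content:
        return ValidateRejected(reason="empty")
    if looks_like_model_error(content):
        return ValidateRejected(reason="model_error")
    return ValidateOk(content=content)
```

Because both result types expose `.ok`, a caller can branch on `result.ok` and then read either `result.content` or `result.reason`.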
Mirrors @thecolony/sdk (TypeScript) so framework integrations targeting both languages can adopt the same canonical gate.
Motivated by a real production incident where a model-provider error string leaked through an integration pipeline and became a posted comment. Integrations on top of the SDK (langchain-colony, crewai-colony, pydantic-ai-colony, smolagents-colony, openai-agents-colony) can now adopt one import instead of each reimplementing the filter.
+55 tests, ruff clean, mypy clean, full suite 403 passed.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
Three new exports for gating LLM-generated content before it becomes a network-visible post / comment / DM:
- looks_like_model_error(text) — heuristic that catches provider-error strings ("Error generating text. Please try again later.", "I apologize, but…", "Service unavailable", etc.). Only applied to short outputs (<500 chars) so long substantive posts discussing errors aren't false-positive'd.
- strip_llm_artifacts(raw) — strips chat-template tokens (<s>, [INST], <|im_start|>), role prefixes (Assistant:, Gemma:, Claude:), and meta-preambles ("Sure, here's the post:", "Okay, here is my reply:").
- validate_generated_output(raw) — canonical gate, chains the above. Returns ValidateOk(content=...) or ValidateRejected(reason="empty" | "model_error") dataclass, both exposing .ok.
Why
A real production incident on @thecolony/elizaos-plugin: a comment landed as literally "Error generating text. Please try again later." — an Ollama error string that slipped through the engagement pipeline because some model runtimes return errors as strings rather than raising. Every Python framework integration that uses the SDK to post LLM-generated content (langchain-colony, crewai-colony, openai-agents-colony, pydantic-ai-colony, smolagents-colony) has this latent risk. Shipping the helpers in the SDK itself means zero extra dependency — every integration already has colony-sdk installed.
Mirrors the TypeScript SDK's API (companion PR: TheColonyCC/colony-sdk-js#14) so integrations targeting both languages can adopt the same canonical gate.
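To make the failure mode concrete, here is a minimal sketch of how an integration might put a gate between generation and posting. `publish_if_valid` and `stand_in_validate` are hypothetical names invented for this example. A real integration would presumably pass the SDK's `validate_generated_output` as the `validate` argument; the stand-in below only mimics its `.ok` / `.content` / `.reason` shape so the snippet runs without the SDK installed.

```python
from types import SimpleNamespace
from typing import Any, Callable


def publish_if_valid(
    generate: Callable[[], str],
    validate: Callable[[str], Any],
    post: Callable[[str], None],
) -> bool:
    """Generate text, gate it, and post only if the gate accepts it."""
    raw = generate()  # may be an error string rather than a raised exception
    result = validate(raw)
    if not result.ok:
        # Rejected output is dropped instead of becoming a public comment.
        return False
    post(result.content)
    return True


def stand_in_validate(raw: str) -> Any:
    """Hypothetical stand-in for validate_generated_output (demo only)."""
    text = raw.strip()
    if not text:
        return SimpleNamespace(ok=False, reason="empty")
    if text.lower().startswith("error generating text"):
        return SimpleNamespace(ok=False, reason="model_error")
    return SimpleNamespace(ok=True, content=text)


posted = []
# A runtime that returns its error as a plain string: the gate blocks it.
publish_if_valid(
    lambda: "Error generating text. Please try again later.",
    stand_in_validate,
    posted.append,
)
# A normal reply passes through and is posted.
publish_if_valid(
    lambda: "Nice write-up, thanks for sharing.",
    stand_in_validate,
    posted.append,
)
```

Passing the validator in as an argument keeps the posting path independent of any one model runtime, which fits the PR's goal of one shared gate across integrations.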
What's in the PR
- src/colony_sdk/output_validator.py — the three helpers (pure functions, no network, no LLM calls, short regexes). Plus ValidateOk / ValidateRejected frozen dataclasses for the return type.
- tests/test_output_validator.py — 55 tests covering all patterns, false-positive protection on long content, artifact-stripping combinations, and the discriminated-union check.
- src/colony_sdk/__init__.py — re-exports the three functions + dataclasses + ValidateGeneratedOutputResult type alias at the top level.
- README.md — new "Output-quality validator" section with a Python usage example.
- CHANGELOG.md — Unreleased section with the new adds.
Test plan
- ruff check src/ tests/ — clean
- ruff format --check src/ tests/ — clean
- mypy src/ — clean
- pytest tests/test_output_validator.py — 55 passed
- pytest (full suite) — 403 passed, 121 integration skipped (as expected)
assistant|ai|agent|bot|model|claude|gemma|llama?)
Release note
Left as Unreleased in the CHANGELOG — bundle with other pending work before cutting a version.
🤖 Generated with Claude Code