refactor(evaluators)!: reorganize into builtin + extra tiers #5
abhinav-galileo merged 21 commits into main
a8c1800 to 109081a
Posting review notes from my local pass:
- Restructure evaluators into peer directories (regex, list, json, sql, galileo_luna2)
- Split each evaluator into config.py and evaluator.py
- Move Evaluator, EvaluatorMetadata, registry from models to evaluators package
- Rename luna2 to galileo_luna2 following provider_evaluatorname convention
- Move discovery and factory from engine to evaluators package
- Update engine to delegate to evaluators package
- Organize tests to mirror source structure (tests/json/, tests/sql/)
- Fix SDK __all__: remove duplicate "control", remove non-existent tool exports
- Update documentation with correct import paths
- Remove stale TODO comments and add docstrings to empty __init__.py

BREAKING CHANGE: Evaluator, EvaluatorMetadata, register_evaluator now imported from agent_control_evaluators instead of agent_control_models
EvaluatorConfig now extends agent_control_models.base.BaseModel instead of pydantic.BaseModel directly, inheriting standard model behavior:
- populate_by_name, use_enum_values, validate_assignment
- to_dict(), to_json(), from_dict(), from_json() helpers
- extra="ignore" for forward compatibility
Update MockConfig to extend EvaluatorConfig instead of pydantic's BaseModel for consistency with the new evaluator config pattern.
- Fix EvaluatorConfig → EvaluatorSpec in examples and models README
- Fix luna2 → galileo-luna2 with proper config in customer_support example
- Fix luna2/ → galileo_luna2/ path in galileo example
- Rename luna2 entry point to galileo-luna2 in pyproject.toml
- Update CONTRIBUTING.md with flat directory structure and correct imports
- Update AGENTS.md with correct register_evaluator import source
- Add ensure_evaluators_discovered() call in EvaluatorSpec validation
- Remove stale TODOs in sdks/python/tests/conftest.py
- Add docstrings to evaluators/tests/{json,sql}/__init__.py
…cy API

BREAKING CHANGE: `parse_evaluator_ref()` removed, use `parse_evaluator_ref_full()` or `is_agent_scoped()` instead.

Evaluator naming conventions:
- Built-in: "regex", "list", "json", "sql" (no namespace)
- External: "galileo/luna2" (slash separator)
- Agent-scoped: "my-agent:custom" (colon separator)

Changes:
- Rename galileo-luna2 → galileo/luna2 throughout codebase
- Add ParsedEvaluatorRef dataclass with type detection
- Remove deprecated parse_evaluator_ref() tuple API
- Migrate endpoints to use parse_evaluator_ref_full() and is_agent_scoped()
- Standardize XEvaluator + XEvaluatorConfig naming in docs
- Fix pre-existing lint issues (import sorting, Union syntax, unused imports)
Change config_model = AcmeToxicityConfig to AcmeToxicityEvaluatorConfig to match the class definition used in the example.
- Bump evaluators and engine versions from 0.1.0 to 2.1.0
- Align requires-python to >=3.12 (was >=3.10 in evaluators)
- Standardize license to Apache-2.0 (was MIT in evaluators)
- Align authors format to "Agent Control Team"
- Bump pydantic minimum to >=2.12.4 in evaluators
Split evaluators into two packages:
- builtin (`agent-control-evaluators`): Core infrastructure + regex, list, json, sql
- extra/galileo (`agent-control-evaluator-galileo`): Luna2 evaluator (calls external API)

BREAKING CHANGES:
- Luna2 import path changed from `agent_control_evaluators.galileo_luna2` to `agent_control_evaluator_galileo.luna2`
- External evaluator names use dot notation instead of slash (e.g., `galileo.luna2` instead of `galileo/luna2`)
- SDK and server now depend on `agent-control-evaluators` as a runtime dependency (not vendored) to avoid duplicate module conflicts

Key changes:
- Move builtin evaluators to `evaluators/builtin/`
- Create `evaluators/extra/galileo/` as separate package
- Add entry points for plugin discovery (`agent_control.evaluators`)
- Update workspace to include only builtin (extras excluded for perf)
- Add CI workflow for testing extra packages
- Add template scaffold for creating new evaluator packages
- Server build script no longer vendors evaluators
- Add ruff and mypy to galileo package dev dependencies
- Update CI workflow to use `uv sync --extra dev` instead of `uv pip install`
- Use `uv run --extra dev` to ensure dev tools are available
- Update template with same dev dependencies
Update all documentation and code references from `galileo/luna2` to `galileo.luna2` to match the actual implementation. The dot separator is used for external evaluators to distinguish from agent-scoped evaluators (which use colon).

Files updated:
- docs/OVERVIEW.md, docs/REFERENCE.md - evaluator examples
- CONTRIBUTING.md - naming convention docs
- README.md, examples/ - usage examples
- UI evaluator definition and test fixtures
- Server evaluator_utils.py docstrings
- Evaluator _base.py docstring
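The naming rules after this change (no namespace for built-ins, dot for external, colon for agent-scoped) can be illustrated with a small classifier. This is a sketch under stated assumptions: `EvaluatorRef` and `classify_ref` are hypothetical names, not the project's `ParsedEvaluatorRef` or `parse_evaluator_ref_full()`, whose real signatures are not shown in this PR page. Checking the colon before the dot is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorRef:
    """Hypothetical parsed reference; not the project's ParsedEvaluatorRef."""

    kind: str              # "builtin", "external", or "agent"
    namespace: str | None  # provider or agent name; None for built-ins
    name: str


def classify_ref(ref: str) -> EvaluatorRef:
    """Classify an evaluator reference by its separator."""
    if not ref:
        raise ValueError("evaluator reference must be non-empty")
    # Assumption: the colon is checked first, so "agent:provider.name"
    # would be treated as agent-scoped.
    if ":" in ref:
        agent, _, name = ref.partition(":")
        return EvaluatorRef("agent", agent, name)
    if "." in ref:
        provider, _, name = ref.partition(".")
        return EvaluatorRef("external", provider, name)
    return EvaluatorRef("builtin", None, ref)
```

For example, `classify_ref("galileo.luna2")` yields an external reference with namespace `galileo`, while `classify_ref("my-agent:custom")` yields an agent-scoped one.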
…e structure

Bug fixes:
- fix(sql): use args.get() instead of find() for LIMIT/OFFSET to prevent subquery clauses from being attributed to outer queries
- fix(engine): bump dependency floor to >=3.0.0 for models and evaluators

Documentation updates:
- Update evaluators directory structure to reflect builtin/extra tiers
- Update external evaluator example to use separate package pattern
- Show both direct install and extras syntax for Luna-2
- Fix all outdated path references to use evaluators/builtin/

Package config:
- Add TODO comments for commented-out extras (tracking PyPI publish)
7084632 to 3464063
Version is now single-sourced from pyproject.toml only.
Use importlib.metadata.version("package-name") for runtime access.
…ds allow-list

When allow_extra_fields=false, fields referenced only in field_constraints or field_patterns were incorrectly flagged as "extra fields". Now all four config options (required_fields, field_types, field_constraints, field_patterns) contribute to the allow-list.

Also fixes README to use direct package install (agent-control-evaluator-galileo) instead of non-existent [galileo] extra, with TODO for enabling it post-publish.
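The allow-list rule can be sketched as a union over the four config options. This is an illustration, not the JSON evaluator's code: it assumes `required_fields` is a list of names, that the other three options are mappings keyed by field name, and that only top-level fields are checked. The function names are hypothetical.

```python
def allowed_fields(config: dict) -> set[str]:
    """Union of every field name mentioned by the four config options."""
    allowed: set[str] = set(config.get("required_fields") or [])
    for key in ("field_types", "field_constraints", "field_patterns"):
        # Assumption: each of these options maps field name -> rule.
        allowed.update((config.get(key) or {}).keys())
    return allowed


def extra_fields(payload: dict, config: dict) -> set[str]:
    """Top-level payload keys that no config option refers to."""
    if config.get("allow_extra_fields", True):
        return set()
    return set(payload) - allowed_fields(config)
```

Under the behavior described as buggy, a field such as `email` that appears only in `field_patterns` would have been reported as extra. With the union it is accepted.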
Wire galileo tests into the build system without affecting the default `make test` target. Developers can now run:
- `make test-extras` for extra evaluators only
- `make test-all` for core + extras
- `make galileo-test/lint/typecheck` for galileo specifically

Also adds pytest-cov to galileo dev dependencies for coverage support.
To support
Right now the extras are commented out, so
Reads version from package metadata at runtime instead of hardcoding. This keeps pyproject.toml as the single source of truth for versions.
- Add build_evaluators() and build_evaluator_galileo() to build script
- Add PyPI publish steps in dependency order: models -> evaluators -> sdk -> evaluator-galileo
- Enable [galileo] convenience extra on evaluators, SDK, and server
- Rename [luna2] extra to [galileo] for consistency with package naming
- Add uv source overrides for local galileo development
- Update documentation and examples to use [galileo] extra
## 1.0.0 (2026-03-04)

### ⚠ BREAKING CHANGES

* **server:** Feature/56688 fix image bug (#48)
* **sdk:** a bug in docker file (#46)
* **server:** Feature/56688 fix docker and create bash (#45)
* **evaluators:** Evaluator reorganization with new package structure

  Package Structure:
  - agent-control-evaluators (v3.0.0): core + regex, list, json, sql
  - agent-control-evaluator-galileo (v3.0.0): Luna2 evaluator

  Key Changes:
  - Entry points for evaluator discovery (agent_control.evaluators)
  - Dot notation for external evaluators (galileo.luna2 not galileo/luna2)
  - Dynamic __version__ via importlib.metadata
  - Server uses evaluators as runtime dep (no longer vendored)
  - Release workflow publishes both packages to PyPI

  Bug Fixes:
  - JSON evaluator: field_constraints/field_patterns in extra-fields allow-list
  - SQL evaluator: LIMIT/OFFSET bypass fix

  Migration:
  - Import: agent_control_evaluator_galileo.luna2 (not agent_control_evaluators.galileo_luna2)
  - DB: UPDATE controls SET evaluator.name replace('/', '.')
* **server:** add time-series stats and split API endpoints (#6)
* **evaluators:** rename plugin to evaluator throughout (#81)
* **models:** simplify step model and schema (#70)

### Features

* Add plugin auto-discovery via Python entry points (#49) (1521182)
* **docs:** add GitHub badges and CI coverage reporting (#90) (be1fa14)
* **evaluators:** add required_column_values for multi-tenant SQL validation (#30) (532386c)
* **sdk-ts:** automate semantic-release for npm publishing (#52) (2b43958)
* **sdk:** Add PyPI packaging with semantic release (#52) (7c24f7f)
* **sdk:** Auto-populate init() steps from @control() decorators (#23) (dc0f2a4)
* **sdk:** export ControlScope, ControlMatch, and EvaluatorResult models (#18) (0d49cad)
* **sdk:** Get Agent Controls from SDK Init (#15) (a485f93)
* **sdk:** Refresh controls in a background loop (#43) (03f826d)
* **sdk:** ship TypeScript SDK with deterministic method naming (#32) (a76e9b0)
* **server:** add evaluator config store (#78) (cc14aa6)
* **server:** add initAgent conflict_mode overwrite mode with SDK defaults (#40) (f3ed2b8)
* **server:** Add observability system for control execution tracking (#44) (fd0bddc)
* **server:** add prometheus metrics for endpoints (#68) (775612c)
* **server:** add time-series stats and split API endpoints (#6) (a0fa597)
* **server:** hard-cut migrate to remove agent UUID (#44) (ee322c9)
* **server:** Optional Policy and many to many relationships (#41) (1a62746)
* **ui:** add sql, luna2, json control forms and restructure the code (#54) (c4c1d4a)
* **ui:** allow to delete control (#39) (7dc4ca3)
* **ui:** Control Store Flow Updated (#4) (dda9f70)
* **ui:** stats dashboard (#80) (4cbb7fe)
* **ui:** Steps dropdown rendered based on api return values (#36) (a2aca43)
* **ui:** tests added and some minor ui changes, added error boundaries (#61) (009852b)
* **ui:** update agent control icon and favicon (#42) (19af8fa)

### Bug Fixes

* **ci:** Add ui scope to PR title validation (#59) (e0fdb52)
* **ci:** correct galileo contrib path in release build script (#51) (2de6013)
* **ci:** Enable pr title on prs (#56) (3d8b5fe)
* **ci:** Fix release (#11) (9dd3dd7)
* **ci:** Use galileo-automation bot for releases (#57) (bc8eea0)
* **docs:** Add Example for Evaluator Extension (#3) (c2a70b3)
* **docs:** add setup script (#49) (7a212c3)
* **docs:** Clean up Protect (#76) (99c16fd)
* **docs:** Fix Examples for LangGraph (#64) (23b30ae)
* **docs:** Improve documentation for open source release (#47) (9018fb3)
* **docs:** Remove old/unused examples (#66) (f417781)
* **docs:** Update Contributing Guide (#8) (10b34c8)
* **docs:** Update readme (#37) (7531d83)
* **docs:** Update README (#2) (379bb15)
* **examples:** Control sets cleanup with signed (#65) (af7b5fb)
* **examples:** Update crew ai example to use evaluator (#93) (1c65084)
* **infra:** Add plugins directory to Dockerfile (#58) (171d459)
* **infra:** install engine/evaluators in server image (#14) (d5ae157)
* **models:** use StrEnum for error enums (#12) (3f41c9f)
* **sdk-ts:** add conventional commits preset dependency (#55) (540fe9d)
* **sdk-ts:** export npm token for semantic-release npm auth (#54) (1b6b993)
* **sdk:** 54253 add steer action and example (#38) (bf2380a)
* **sdk:** a bug in docker file (#46) (12d1794)
* **sdk:** Add step_name as parameter to control (#25) (19ade9d)
* **sdk:** emit observability events for SDK-evaluated controls and fix non_matches propagation (#24) (6a9da69)
* **sdk:** enforce UUID agent IDs (#9) (5ccdbd0)
* **sdk:** Fix logging (#77) (b1f078c)
* **sdk:** plugin to evaluator.. agent_protect to agent_control (#88) (fc9b088)
* **server:** enforce public-safe API error responses (#20) (e50d817)
* **server:** Feature/56688 fix docker and create bash (#45) (7277e27)
* **server:** Feature/56688 fix image bug (#48) (71e6b44)
* **server:** fix alembic migrations (#47) (c19c17c)
* **server:** reject initAgent UUID/name mismatch (#13) (19d61ff)
* tighten evaluation error handling and preserve control data (52a1ef8)
* **ui:** Fix UI and clients for simplified step schema (#75) (be2aaf0)
* **ui:** json validation (#10) (a0cd5af)
* **ui:** selector subpaths issue (#34) (79cb776)
* **ui:** UI feedback fixes (#27) (6004761)

### Code Refactoring

* **evaluators:** rename plugin to evaluator throughout (#81) (0134682)
* **evaluators:** split into builtin + extra packages for PyPI (#5) (0e0a78a)
* **models:** simplify step model and schema (#70) (4c1d637)
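The DB migration noted in the changelog (`replace('/', '.')` on the evaluator name) is given only as shorthand. As a rough sketch of the same transformation applied in application code, assuming each control's data is a dict with the name stored at `evaluator.name`, something like the following could be used. The function names and the data shape are assumptions, and the real table layout is not shown in this PR.

```python
def migrate_evaluator_name(name: str) -> str:
    """Rewrite slash-separated external names to dot notation."""
    # Mirrors the changelog's replace('/', '.'); names without a slash are unchanged.
    return name.replace("/", ".")


def migrate_control_data(data: dict) -> dict:
    """Return a copy of `data` with evaluator.name migrated; the input is not mutated."""
    evaluator = data.get("evaluator")
    if not isinstance(evaluator, dict) or not isinstance(evaluator.get("name"), str):
        return data  # nothing to migrate
    migrated = {**evaluator, "name": migrate_evaluator_name(evaluator["name"])}
    return {**data, "evaluator": migrated}
```

Built-in names such as `regex` and agent-scoped names such as `my-agent:custom` contain no slash, so they pass through unchanged.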
Summary
Split evaluators into a two-tier package architecture:
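- builtin (`agent-control-evaluators`): Core infrastructure + regex, list, json, sql evaluators
- extra/galileo (`agent-control-evaluator-galileo`): Luna2 evaluator (calls external Galileo API)

Key Changes
Package Structure
- Move builtin evaluators to `evaluators/builtin/`
- Create `evaluators/extra/galileo/` as a separate package
- Add entry points for plugin discovery (`agent_control.evaluators`)
- Add CI workflow for testing extra packages (`.github/workflows/test-extras.yml`)
- Add template scaffold for creating new evaluator packages (`evaluators/extra/template/`)

Dependency Strategy
- SDK and server depend on `agent-control-evaluators` as a runtime dependency (not vendored)

Naming Convention
- External evaluators use dot notation: `galileo.luna2` instead of `galileo/luna2`

Breaking Changes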
1. Python Imports
2. Evaluator Names
3. Installation
Migration Guide
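- Install the Galileo package: `pip install agent-control-evaluator-galileo`
- Update imports from `agent_control_evaluators.galileo_luna2` to `agent_control_evaluator_galileo.luna2`
- Update database rows where `data.evaluator.name` contains `/` to use `.`:

Test Plan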