refactor(evaluators)!: reorganize into builtin + extra tiers by abhinav-galileo · Pull Request #5 · agentcontrol/agent-control

abhinav-galileo · 2026-01-31T08:06:20Z

Summary

Split evaluators into a two-tier package architecture:

builtin (agent-control-evaluators): Core infrastructure + regex, list, json, sql evaluators
extra/galileo (agent-control-evaluator-galileo): Luna2 evaluator (calls external Galileo API)

Key Changes

Package Structure

Move builtin evaluators to evaluators/builtin/
Create evaluators/extra/galileo/ as a separate package
Add entry points for plugin discovery (agent_control.evaluators)
Update workspace to include only builtin (extras excluded for performance)
Add CI workflow for testing extra packages (.github/workflows/test-extras.yml)
Add template scaffold for creating new evaluator packages (evaluators/extra/template/)
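As a rough illustration of the entry-point discovery mentioned above, the sketch below loads everything registered under the `agent_control.evaluators` group using the standard library. The group name comes from this PR; the function name `load_evaluators`, the skip-on-failure policy, and the synthetic demo entry point are assumptions for illustration, not the project's actual discovery code.

```python
from importlib.metadata import EntryPoint, entry_points
from typing import Dict, Iterable, Optional

GROUP = "agent_control.evaluators"  # entry-point group named in this PR


def load_evaluators(eps: Optional[Iterable[EntryPoint]] = None) -> Dict[str, object]:
    """Load every entry point in GROUP, keyed by entry-point name.

    Hypothetical helper: plugins that fail to import are skipped so that one
    broken extra does not abort discovery of the others.
    """
    if eps is None:
        eps = entry_points(group=GROUP)  # installed packages advertising the group
    loaded: Dict[str, object] = {}
    for ep in eps:
        try:
            loaded[ep.name] = ep.load()  # imports the module, resolves the attribute
        except Exception:
            continue
    return loaded


# Demo with a synthetic entry point that targets a stdlib class, so the
# sketch runs without any evaluator package installed.
demo = [EntryPoint(name="demo.json", value="json:JSONDecoder", group=GROUP)]
registry = load_evaluators(demo)
```

Calling `load_evaluators()` with no arguments would pick up whatever installed distributions declare under that group in their packaging metadata.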

Dependency Strategy

SDK and server depend on agent-control-evaluators as a runtime dependency (not vendored)
This avoids duplicate module conflicts when galileo extras are installed
Server build script no longer vendors evaluators (only models + engine)

Naming Convention

External evaluator names now use dot notation instead of slash
Example: galileo.luna2 instead of galileo/luna2

Breaking Changes

1. Python Imports

# OLD (will break):
from agent_control_evaluators.galileo_luna2 import Luna2Evaluator

# NEW:
from agent_control_evaluator_galileo.luna2 import Luna2Evaluator

# OR via SDK (still works):
from agent_control.evaluators import Luna2Evaluator

2. Evaluator Names

# OLD:
evaluator_name = "galileo/luna2"

# NEW:
evaluator_name = "galileo.luna2"

3. Installation

# Builtin evaluators (automatic with SDK/server):
pip install agent-control-evaluators

# Galileo Luna2 (optional):
pip install agent-control-evaluator-galileo

# Or via convenience extra:
pip install "agent-control-evaluators[galileo]"

Migration Guide

Install the new package: pip install agent-control-evaluator-galileo
Update Python imports from agent_control_evaluators.galileo_luna2 to agent_control_evaluator_galileo.luna2

Database migration required: Update controls where data.evaluator.name contains / to use .:

UPDATE controls
SET data = jsonb_set(data, '{evaluator,name}',
    to_jsonb(replace(data->'evaluator'->>'name', '/', '.')))
WHERE data->'evaluator'->>'name' LIKE '%/%'
  AND data->'evaluator'->>'name' NOT LIKE '%:%';
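For checking the migration in application code or in a dry run, the rule expressed by the WHERE clause above (rewrite names that contain a slash, leave names that contain a colon alone) can be mirrored in Python. This is a minimal sketch; `migrate_evaluator_name` is a hypothetical helper name, not something this PR adds.

```python
def migrate_evaluator_name(name: str) -> str:
    """Mirror the SQL rule: '/' becomes '.' unless the name contains ':'.

    Names with a colon are agent-scoped and are left untouched, matching
    the NOT LIKE '%:%' condition in the UPDATE statement.
    """
    if "/" in name and ":" not in name:
        return name.replace("/", ".")
    return name


examples = ["galileo/luna2", "regex", "my-agent:custom"]
migrated = [migrate_evaluator_name(n) for n in examples]
```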

Test Plan

All existing tests pass (771 total)
Lint passes for all packages
Type check passes for all packages
Luna2 smoke test correctly skips when the package is not installed
Entry points register builtin evaluators automatically
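One way a smoke test can skip itself when an optional package is absent is sketched below with the standard library only. The import name `agent_control_evaluator_galileo` comes from this PR; the helper `is_installed` and the test class are illustrative assumptions, and the project's real test may be structured differently.

```python
import importlib.util
import unittest

GALILEO_MODULE = "agent_control_evaluator_galileo"  # import name from this PR


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


@unittest.skipUnless(is_installed(GALILEO_MODULE), "galileo extra not installed")
class Luna2SmokeTest(unittest.TestCase):
    def test_import(self) -> None:
        # Imported lazily so collecting the test never fails on a missing extra.
        from agent_control_evaluator_galileo.luna2 import Luna2Evaluator

        self.assertTrue(callable(Luna2Evaluator))
```

With pytest, `pytest.importorskip("agent_control_evaluator_galileo")` gives the same effect in one line.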

lan17 · 2026-02-03T21:10:06Z

Posting review notes from my local pass:

JSON evaluator extra-field check can false-positive when allow_extra_fields=false because the allow-list only includes field_types and required_fields; fields referenced via field_constraints or field_patterns are treated as “extra”. File: evaluators/builtin/src/agent_control_evaluators/json/evaluator.py (extra_paths logic).
Docs/packaging mismatch: README suggests pip install agent-control-evaluators[galileo], but the [galileo] extra is commented out in evaluators/builtin/pyproject.toml, so that command currently fails. Suggest enabling the extra when published or pointing docs to agent-control-evaluator-galileo.
Extras tests not wired: make test only runs builtin evaluators; evaluators/extra/galileo/tests aren’t exercised by default (and I don’t see a test-extras workflow). Consider adding a target/workflow or documenting the omission.

- Restructure evaluators into peer directories (regex, list, json, sql, galileo_luna2)
- Split each evaluator into config.py and evaluator.py
- Move Evaluator, EvaluatorMetadata, registry from models to evaluators package
- Rename luna2 to galileo_luna2 following provider_evaluatorname convention
- Move discovery and factory from engine to evaluators package
- Update engine to delegate to evaluators package
- Organize tests to mirror source structure (tests/json/, tests/sql/)
- Fix SDK __all__: remove duplicate "control", remove non-existent tool exports
- Update documentation with correct import paths
- Remove stale TODO comments and add docstrings to empty __init__.py

BREAKING CHANGE: Evaluator, EvaluatorMetadata, register_evaluator now imported from agent_control_evaluators instead of agent_control_models

EvaluatorConfig now extends agent_control_models.base.BaseModel instead of pydantic.BaseModel directly, inheriting standard model behavior:

- populate_by_name, use_enum_values, validate_assignment
- to_dict(), to_json(), from_dict(), from_json() helpers
- extra="ignore" for forward compatibility

- Fix EvaluatorConfig → EvaluatorSpec in examples and models README
- Fix luna2 → galileo-luna2 with proper config in customer_support example
- Fix luna2/ → galileo_luna2/ path in galileo example
- Rename luna2 entry point to galileo-luna2 in pyproject.toml
- Update CONTRIBUTING.md with flat directory structure and correct imports
- Update AGENTS.md with correct register_evaluator import source
- Add ensure_evaluators_discovered() call in EvaluatorSpec validation
- Remove stale TODOs in sdks/python/tests/conftest.py
- Add docstrings to evaluators/tests/{json,sql}/__init__.py

…cy API

BREAKING CHANGE: `parse_evaluator_ref()` removed, use `parse_evaluator_ref_full()` or `is_agent_scoped()` instead.

Evaluator naming conventions:
- Built-in: "regex", "list", "json", "sql" (no namespace)
- External: "galileo/luna2" (slash separator)
- Agent-scoped: "my-agent:custom" (colon separator)

Changes:
- Rename galileo-luna2 → galileo/luna2 throughout codebase
- Add ParsedEvaluatorRef dataclass with type detection
- Remove deprecated parse_evaluator_ref() tuple API
- Migrate endpoints to use parse_evaluator_ref_full() and is_agent_scoped()
- Standardize XEvaluator + XEvaluatorConfig naming in docs
- Fix pre-existing lint issues (import sorting, Union syntax, unused imports)

- Bump evaluators and engine versions from 0.1.0 to 2.1.0
- Align requires-python to >=3.12 (was >=3.10 in evaluators)
- Standardize license to Apache-2.0 (was MIT in evaluators)
- Align authors format to "Agent Control Team"
- Bump pydantic minimum to >=2.12.4 in evaluators

Split evaluators into two packages:
- builtin (`agent-control-evaluators`): Core infrastructure + regex, list, json, sql
- extra/galileo (`agent-control-evaluator-galileo`): Luna2 evaluator (calls external API)

BREAKING CHANGES:
- Luna2 import path changed from `agent_control_evaluators.galileo_luna2` to `agent_control_evaluator_galileo.luna2`
- External evaluator names use dot notation instead of slash (e.g., `galileo.luna2` instead of `galileo/luna2`)
- SDK and server now depend on `agent-control-evaluators` as a runtime dependency (not vendored) to avoid duplicate module conflicts

Key changes:
- Move builtin evaluators to `evaluators/builtin/`
- Create `evaluators/extra/galileo/` as separate package
- Add entry points for plugin discovery (`agent_control.evaluators`)
- Update workspace to include only builtin (extras excluded for perf)
- Add CI workflow for testing extra packages
- Add template scaffold for creating new evaluator packages
- Server build script no longer vendors evaluators

- Add ruff and mypy to galileo package dev dependencies
- Update CI workflow to use `uv sync --extra dev` instead of `uv pip install`
- Use `uv run --extra dev` to ensure dev tools are available
- Update template with same dev dependencies

Update all documentation and code references from `galileo/luna2` to `galileo.luna2` to match the actual implementation. The dot separator is used for external evaluators to distinguish from agent-scoped evaluators (which use colon).

Files updated:
- docs/OVERVIEW.md, docs/REFERENCE.md - evaluator examples
- CONTRIBUTING.md - naming convention docs
- README.md, examples/ - usage examples
- UI evaluator definition and test fixtures
- Server evaluator_utils.py docstrings
- Evaluator _base.py docstring
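The separator convention described here (no separator for builtin, dot for external, colon for agent-scoped) can be sketched as a small classifier. This is an illustration under assumptions: `RefKind`, `EvaluatorRef`, and `classify` are hypothetical names and do not reproduce the project's `ParsedEvaluatorRef` or `parse_evaluator_ref_full()`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RefKind(Enum):
    BUILTIN = "builtin"            # e.g. "regex"
    EXTERNAL = "external"          # e.g. "galileo.luna2"
    AGENT_SCOPED = "agent_scoped"  # e.g. "my-agent:custom"


@dataclass(frozen=True)
class EvaluatorRef:
    kind: RefKind
    namespace: Optional[str]  # provider or agent name; None for builtin
    name: str


def classify(ref: str) -> EvaluatorRef:
    """Classify an evaluator name by its separator (hypothetical helper)."""
    # The colon is checked first so an agent-scoped name containing a dot
    # is still treated as agent-scoped.
    if ":" in ref:
        agent, _, name = ref.partition(":")
        return EvaluatorRef(RefKind.AGENT_SCOPED, agent, name)
    if "." in ref:
        provider, _, name = ref.partition(".")
        return EvaluatorRef(RefKind.EXTERNAL, provider, name)
    return EvaluatorRef(RefKind.BUILTIN, None, ref)
```

Checking the colon before the dot is a design choice of this sketch; the source does not say how the real parser orders its checks.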

…e structure

Bug fixes:
- fix(sql): use args.get() instead of find() for LIMIT/OFFSET to prevent subquery clauses from being attributed to outer queries
- fix(engine): bump dependency floor to >=3.0.0 for models and evaluators

Documentation updates:
- Update evaluators directory structure to reflect builtin/extra tiers
- Update external evaluator example to use separate package pattern
- Show both direct install and extras syntax for Luna-2
- Fix all outdated path references to use evaluators/builtin/

Package config:
- Add TODO comments for commented-out extras (tracking PyPI publish)

…ds allow-list

When allow_extra_fields=false, fields referenced only in field_constraints or field_patterns were incorrectly flagged as "extra fields". Now all four config options (required_fields, field_types, field_constraints, field_patterns) contribute to the allow-list.

Also fixes README to use direct package install (agent-control-evaluator-galileo) instead of non-existent [galileo] extra, with TODO for enabling it post-publish.
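The allow-list fix can be illustrated with a simplified model in which the config is a plain dict and only top-level field names are considered. The four option names and `allow_extra_fields` come from the commit message; the dict shapes, the default of `True`, and the helper names are assumptions, and the real evaluator's `extra_paths` logic may handle nested paths differently.

```python
ALLOW_LIST_KEYS = ("required_fields", "field_types", "field_constraints", "field_patterns")


def allowed_fields(config: dict) -> set:
    """Union of field names mentioned by any of the four config options."""
    allowed = set()
    for key in ALLOW_LIST_KEYS:
        # Lists contribute their items, dicts contribute their keys.
        allowed.update(config.get(key) or ())
    return allowed


def extra_fields(payload: dict, config: dict) -> set:
    """Top-level payload fields not covered by the allow-list."""
    if config.get("allow_extra_fields", True):
        return set()
    return set(payload) - allowed_fields(config)


config = {
    "allow_extra_fields": False,
    "required_fields": ["id"],
    "field_types": {"id": "string"},
    "field_constraints": {"age": {"min": 0}},
    "field_patterns": {"email": r".+@.+"},
}
payload = {"id": "1", "age": 3, "email": "a@b.example", "debug": True}
extras = extra_fields(payload, config)
```

Before the fix, an allow-list built only from `required_fields` and `field_types` would also have reported `age` and `email` here.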

Wire galileo tests into the build system without affecting the default `make test` target. Developers can now run:
- `make test-extras` for extra evaluators only
- `make test-all` for core + extras
- `make galileo-test/lint/typecheck` for galileo specifically

Also adds pytest-cov to galileo dev dependencies for coverage support.

lan17 · 2026-02-04T20:07:04Z

To support agent-control-evaluators[xyz] installs:

1. Enable extras on the builtin package
   - In evaluators/builtin/pyproject.toml, add entries under [project.optional-dependencies], e.g.
     - galileo = ["agent-control-evaluator-galileo>=3.0.0"]
     - nvidia = ["agent-control-evaluator-nvidia>=3.0.0"] (when published)
2. Publish provider packages
   - Make sure the release workflow publishes agent-control-evaluator-<provider> packages before or alongside agent-control-evaluators, so the extras resolve on PyPI.
3. Optional (if you want singular syntax)
   - If you prefer agent-control-evaluator[galileo], add a small meta package named agent-control-evaluator that depends on agent-control-evaluators and defines the same extras.

Right now the extras are commented out, so agent-control-evaluators[galileo] will not resolve until steps 1 and 2 are done.

- Add build_evaluators() and build_evaluator_galileo() to build script
- Add PyPI publish steps in dependency order: models -> evaluators -> sdk -> evaluator-galileo
- Enable [galileo] convenience extra on evaluators, SDK, and server
- Rename [luna2] extra to [galileo] for consistency with package naming
- Add uv source overrides for local galileo development
- Update documentation and examples to use [galileo] extra

## 1.0.0 (2026-03-04)

### ⚠ BREAKING CHANGES

* **server:** Feature/56688 fix image bug (#48)
* **sdk:** a bug in docker file (#46)
* **server:** Feature/56688 fix docker and create bash (#45)
* **evaluators:** Evaluator reorganization with new package structure

  Package Structure:
  - agent-control-evaluators (v3.0.0): core + regex, list, json, sql
  - agent-control-evaluator-galileo (v3.0.0): Luna2 evaluator

  Key Changes:
  - Entry points for evaluator discovery (agent_control.evaluators)
  - Dot notation for external evaluators (galileo.luna2 not galileo/luna2)
  - Dynamic __version__ via importlib.metadata
  - Server uses evaluators as runtime dep (no longer vendored)
  - Release workflow publishes both packages to PyPI

  Bug Fixes:
  - JSON evaluator: field_constraints/field_patterns in extra-fields allow-list
  - SQL evaluator: LIMIT/OFFSET bypass fix

  Migration:
  - Import: agent_control_evaluator_galileo.luna2 (not agent_control_evaluators.galileo_luna2)
  - DB: UPDATE controls SET evaluator.name replace('/', '.')
* **server:** add time-series stats and split API endpoints (#6)
* **evaluators:** rename plugin to evaluator throughout (#81)
* **models:** simplify step model and schema (#70)

### Features

* Add plugin auto-discovery via Python entry points ([#49](#49)) ([1521182](1521182))
* **docs:** add GitHub badges and CI coverage reporting ([#90](#90)) ([be1fa14](be1fa14))
* **evaluators:** add required_column_values for multi-tenant SQL validation ([#30](#30)) ([532386c](532386c))
* **sdk-ts:** automate semantic-release for npm publishing ([#52](#52)) ([2b43958](2b43958))
* **sdk:** Add PyPI packaging with semantic release ([#52](#52)) ([7c24f7f](7c24f7f))
* **sdk:** Auto-populate init() steps from @control() decorators ([#23](#23)) ([dc0f2a4](dc0f2a4))
* **sdk:** export ControlScope, ControlMatch, and EvaluatorResult models ([#18](#18)) ([0d49cad](0d49cad))
* **sdk:** Get Agent Controls from SDK Init ([#15](#15)) ([a485f93](a485f93))
* **sdk:** Refresh controls in a background loop ([#43](#43)) ([03f826d](03f826d))
* **sdk:** ship TypeScript SDK with deterministic method naming ([#32](#32)) ([a76e9b0](a76e9b0))
* **server:** add evaluator config store ([#78](#78)) ([cc14aa6](cc14aa6))
* **server:** add initAgent conflict_mode overwrite mode with SDK defaults ([#40](#40)) ([f3ed2b8](f3ed2b8))
* **server:** Add observability system for control execution tracking ([#44](#44)) ([fd0bddc](fd0bddc))
* **server:** add prometheus metrics for endpoints ([#68](#68)) ([775612c](775612c))
* **server:** add time-series stats and split API endpoints ([#6](#6)) ([a0fa597](a0fa597))
* **server:** hard-cut migrate to remove agent UUID ([#44](#44)) ([ee322c9](ee322c9))
* **server:** Optional Policy and many to many relationships ([#41](#41)) ([1a62746](1a62746))
* **ui:** add sql, luna2, json control forms and restructure the code ([#54](#54)) ([c4c1d4a](c4c1d4a))
* **ui:** allow to delete control ([#39](#39)) ([7dc4ca3](7dc4ca3))
* **ui:** Control Store Flow Updated ([#4](#4)) ([dda9f70](dda9f70))
* **ui:** stats dashboard ([#80](#80)) ([4cbb7fe](4cbb7fe))
* **ui:** Steps dropdown rendered based on api return values ([#36](#36)) ([a2aca43](a2aca43))
* **ui:** tests added and some minor ui changes, added error boundaries ([#61](#61)) ([009852b](009852b))
* **ui:** update agent control icon and favicon ([#42](#42)) ([19af8fa](19af8fa))

### Bug Fixes

* **ci:** Add ui scope to PR title validation ([#59](#59)) ([e0fdb52](e0fdb52))
* **ci:** correct galileo contrib path in release build script ([#51](#51)) ([2de6013](2de6013))
* **ci:** Enable pr title on prs ([#56](#56)) ([3d8b5fe](3d8b5fe))
* **ci:** Fix release ([#11](#11)) ([9dd3dd7](9dd3dd7))
* **ci:** Use galileo-automation bot for releases ([#57](#57)) ([bc8eea0](bc8eea0))
* **docs:** Add Example for Evaluator Extension ([#3](#3)) ([c2a70b3](c2a70b3))
* **docs:** add setup script ([#49](#49)) ([7a212c3](7a212c3))
* **docs:** Clean up Protect ([#76](#76)) ([99c16fd](99c16fd))
* **docs:** Fix Examples for LangGraph ([#64](#64)) ([23b30ae](23b30ae))
* **docs:** Improve documentation for open source release ([#47](#47)) ([9018fb3](9018fb3))
* **docs:** Remove old/unused examples ([#66](#66)) ([f417781](f417781))
* **docs:** Update Contributing Guide ([#8](#8)) ([10b34c8](10b34c8))
* **docs:** Update readme ([#37](#37)) ([7531d83](7531d83))
* **docs:** Update README ([#2](#2)) ([379bb15](379bb15))
* **examples:** Control sets cleanup with signed ([#65](#65)) ([af7b5fb](af7b5fb))
* **examples:** Update crew ai example to use evaluator ([#93](#93)) ([1c65084](1c65084))
* **infra:** Add plugins directory to Dockerfile ([#58](#58)) ([171d459](171d459))
* **infra:** install engine/evaluators in server image ([#14](#14)) ([d5ae157](d5ae157))
* **models:** use StrEnum for error enums ([#12](#12)) ([3f41c9f](3f41c9f))
* **sdk-ts:** add conventional commits preset dependency ([#55](#55)) ([540fe9d](540fe9d))
* **sdk-ts:** export npm token for semantic-release npm auth ([#54](#54)) ([1b6b993](1b6b993))
* **sdk:** 54253 add steer action and example ([#38](#38)) ([bf2380a](bf2380a))
* **sdk:** a bug in docker file ([#46](#46)) ([12d1794](12d1794))
* **sdk:** Add step_name as parameter to control ([#25](#25)) ([19ade9d](19ade9d))
* **sdk:** emit observability events for SDK-evaluated controls and fix non_matches propagation ([#24](#24)) ([6a9da69](6a9da69))
* **sdk:** enforce UUID agent IDs ([#9](#9)) ([5ccdbd0](5ccdbd0))
* **sdk:** Fix logging ([#77](#77)) ([b1f078c](b1f078c))
* **sdk:** plugin to evaluator.. agent_protect to agent_control ([#88](#88)) ([fc9b088](fc9b088))
* **server:** enforce public-safe API error responses ([#20](#20)) ([e50d817](e50d817))
* **server:** Feature/56688 fix docker and create bash ([#45](#45)) ([7277e27](7277e27))
* **server:** Feature/56688 fix image bug ([#48](#48)) ([71e6b44](71e6b44))
* **server:** fix alembic migrations ([#47](#47)) ([c19c17c](c19c17c))
* **server:** reject initAgent UUID/name mismatch ([#13](#13)) ([19d61ff](19d61ff))
* tighten evaluation error handling and preserve control data ([52a1ef8](52a1ef8))
* **ui:** Fix UI and clients for simplified step schema ([#75](#75)) ([be2aaf0](be2aaf0))
* **ui:** json validation ([#10](#10)) ([a0cd5af](a0cd5af))
* **ui:** selector subpaths issue ([#34](#34)) ([79cb776](79cb776))
* **ui:** UI feedback fixes ([#27](#27)) ([6004761](6004761))

### Code Refactoring

* **evaluators:** rename plugin to evaluator throughout ([#81](#81)) ([0134682](0134682))
* **evaluators:** split into builtin + extra packages for PyPI ([#5](#5)) ([0e0a78a](0e0a78a))
* **models:** simplify step model and schema ([#70](#70)) ([4c1d637](4c1d637))

abhinav-galileo requested review from lan17, nachiket-galileo and namrataghadi-galileo and removed request for lan17, nachiket-galileo and namrataghadi-galileo January 31, 2026 08:06

abhinav-galileo marked this pull request as draft February 2, 2026 18:45

abhinav-galileo force-pushed the abhi/reorg_evaluators branch from a8c1800 to 109081a Compare February 3, 2026 13:07

abhinav-galileo changed the title ~~refactor(evaluators)!: reorganize evaluators into flat folder structure~~ refactor(evaluators)!: reorganize into builtin + extra tiers Feb 3, 2026

abhinav-galileo marked this pull request as ready for review February 3, 2026 14:26

abhinav-galileo requested review from lan17, nachiket-galileo and namrataghadi-galileo February 3, 2026 14:26

lan17 reviewed Feb 3, 2026

evaluators/builtin/README.md (outdated)

lan17 reviewed Feb 3, 2026

Makefile

abhinav-galileo added 13 commits February 4, 2026 15:17

test(evaluators): use EvaluatorConfig in test_base.py MockConfig

2b3713e

Update MockConfig to extend EvaluatorConfig instead of pydantic's BaseModel for consistency with the new evaluator config pattern.

docs: fix typo in CONTRIBUTING.md evaluator example

ee4208d

Change config_model = AcmeToxicityConfig to AcmeToxicityEvaluatorConfig to match the class definition used in the example.

style(galileo): fix import sorting order

dc9fa1c

chore(hooks): add galileo extras to pre-push checks

3b1a1ed

chore(ci): remove test-extras workflow (covered by pre-push hook)

d8e7025

fix(galileo): fix api_key type annotation for mypy

e6dd371

abhinav-galileo added 3 commits February 4, 2026 15:17

fix(engine): sync __version__ with pyproject.toml (2.1.0)

3464063

abhinav-galileo force-pushed the abhi/reorg_evaluators branch from 7084632 to 3464063 Compare February 4, 2026 09:48

abhinav-galileo added 3 commits February 4, 2026 15:35

refactor: remove __version__ from all packages

3fbd870

Version is now single-sourced from pyproject.toml only. Use importlib.metadata.version("package-name") for runtime access.

abhinav-galileo requested a review from lan17 February 4, 2026 15:17

lan17 approved these changes Feb 4, 2026

abhinav-galileo added 2 commits February 5, 2026 13:10

feat: add dynamic __version__ to all packages using importlib.metadata

0b68789

Reads version from package metadata at runtime instead of hardcoding. This keeps pyproject.toml as the single source of truth for versions.

abhinav-galileo merged commit 0e0a78a into main Feb 5, 2026
4 of 5 checks passed

abhinav-galileo deleted the abhi/reorg_evaluators branch February 5, 2026 09:26

abhinav-galileo merged 21 commits into main from abhi/reorg_evaluators