
refactor(evaluators)!: reorganize into builtin + extra tiers#5

Merged
abhinav-galileo merged 21 commits into main from
abhi/reorg_evaluators
Feb 5, 2026

Conversation

@abhinav-galileo
Collaborator

@abhinav-galileo abhinav-galileo commented Jan 31, 2026

Summary

Split evaluators into a two-tier package architecture:

  • builtin (agent-control-evaluators): Core infrastructure + regex, list, json, sql evaluators
  • extra/galileo (agent-control-evaluator-galileo): Luna2 evaluator (calls external Galileo API)

Key Changes

Package Structure

  • Move builtin evaluators to evaluators/builtin/
  • Create evaluators/extra/galileo/ as a separate package
  • Add entry points for plugin discovery (agent_control.evaluators)
  • Update workspace to include only builtin (extras excluded for performance)
  • Add CI workflow for testing extra packages (.github/workflows/test-extras.yml)
  • Add template scaffold for creating new evaluator packages (evaluators/extra/template/)

Dependency Strategy

  • SDK and server depend on agent-control-evaluators as a runtime dependency (not vendored)
  • This avoids duplicate module conflicts when galileo extras are installed
  • Server build script no longer vendors evaluators (only models + engine)

Naming Convention

  • External evaluator names now use dot notation instead of slash
  • Example: galileo.luna2 instead of galileo/luna2
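
The naming rules can be sketched as a small classifier: no separator means builtin, a dot means external, and a colon means agent-scoped (the colon form is described in the commit messages further down this PR). EvaluatorRef and classify are hypothetical names for illustration only and are not the project's parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvaluatorRef:
    """Hypothetical parsed form of an evaluator name."""
    kind: str                 # "builtin", "external", or "agent-scoped"
    namespace: Optional[str]  # provider or agent name, if any
    name: str


def classify(ref: str) -> EvaluatorRef:
    # Colon is checked first so agent-scoped names win over dotted ones.
    if ":" in ref:
        agent, _, name = ref.partition(":")
        return EvaluatorRef("agent-scoped", agent, name)
    if "." in ref:
        provider, _, name = ref.partition(".")
        return EvaluatorRef("external", provider, name)
    return EvaluatorRef("builtin", None, ref)


print(classify("regex"))
print(classify("galileo.luna2"))
print(classify("my-agent:custom"))
```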

Breaking Changes

1. Python Imports

# OLD (will break):
from agent_control_evaluators.galileo_luna2 import Luna2Evaluator

# NEW:
from agent_control_evaluator_galileo.luna2 import Luna2Evaluator

# OR via SDK (still works):
from agent_control.evaluators import Luna2Evaluator
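
Because Luna2 now lives in an optional package, calling code may want to tolerate its absence. A minimal sketch of a guarded import follows; LUNA2_AVAILABLE and require_luna2 are hypothetical helpers, and only the import path comes from this PR.

```python
try:
    # New location, provided by the optional agent-control-evaluator-galileo package.
    from agent_control_evaluator_galileo.luna2 import Luna2Evaluator
except ImportError:
    # Package not installed: keep a sentinel so callers can skip Luna2 paths.
    Luna2Evaluator = None

LUNA2_AVAILABLE = Luna2Evaluator is not None


def require_luna2():
    """Return the Luna2 evaluator class or raise a clear installation hint."""
    if Luna2Evaluator is None:
        raise RuntimeError(
            "Luna2 is not installed; run: pip install agent-control-evaluator-galileo"
        )
    return Luna2Evaluator
```

A test suite could check LUNA2_AVAILABLE to skip Luna2 cases, which is in the spirit of the smoke-test behaviour listed in the Test Plan.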

2. Evaluator Names

# OLD:
evaluator_name = "galileo/luna2"

# NEW:
evaluator_name = "galileo.luna2"

3. Installation

# Builtin evaluators (automatic with SDK/server):
pip install agent-control-evaluators

# Galileo Luna2 (optional):
pip install agent-control-evaluator-galileo

# Or via convenience extra:
pip install "agent-control-evaluators[galileo]"

Migration Guide

  1. Install the new package: pip install agent-control-evaluator-galileo
  2. Update Python imports from agent_control_evaluators.galileo_luna2 to agent_control_evaluator_galileo.luna2
  3. Run the database migration: for controls whose data.evaluator.name contains "/", replace it with ".". Names containing ":" (agent-scoped evaluators) are left untouched:
    UPDATE controls
    SET data = jsonb_set(data, '{evaluator,name}',
        to_jsonb(replace(data->'evaluator'->>'name', '/', '.')))
    WHERE data->'evaluator'->>'name' LIKE '%/%'
      AND data->'evaluator'->>'name' NOT LIKE '%:%';
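
For control definitions that live in code, fixtures, or config files rather than in the database, the same rename rule can be sketched in Python. The helper names are hypothetical; the rule mirrors the SQL above (replace "/" with "." unless the name contains ":").

```python
import copy


def migrate_evaluator_name(name: str) -> str:
    """Apply the same rule as the SQL: '/' -> '.', except agent-scoped names."""
    if "/" in name and ":" not in name:
        return name.replace("/", ".")
    return name


def migrate_control(data: dict) -> dict:
    """Return a copy of a control's data with evaluator.name migrated."""
    migrated = copy.deepcopy(data)
    evaluator = migrated.get("evaluator")
    if isinstance(evaluator, dict) and isinstance(evaluator.get("name"), str):
        evaluator["name"] = migrate_evaluator_name(evaluator["name"])
    return migrated


old = {"evaluator": {"name": "galileo/luna2", "config": {}}}
print(migrate_control(old)["evaluator"]["name"])  # galileo.luna2
```

The input dict is copied rather than mutated, so the helper can be used for a dry run that compares old and new names before the SQL is applied.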

Test Plan

  • All existing tests pass (771 total)
  • Lint passes for all packages
  • Type check passes for all packages
  • Luna2 smoke test correctly skips when package not installed
  • Entry points register builtin evaluators automatically

@codecov

codecov bot commented Feb 3, 2026

The author of this PR, abhinav-galileo, is not an activated member of this organization on Codecov.
Please activate this user on Codecov to display this PR comment.
Coverage data is still being uploaded to Codecov.io for purposes of overall coverage calculations.
Please don't hesitate to email us at support@codecov.io with any questions.

@abhinav-galileo abhinav-galileo changed the title from "refactor(evaluators)!: reorganize evaluators into flat folder structure" to "refactor(evaluators)!: reorganize into builtin + extra tiers" Feb 3, 2026
@abhinav-galileo abhinav-galileo marked this pull request as ready for review February 3, 2026 14:26
@lan17
Contributor

lan17 commented Feb 3, 2026

Posting review notes from my local pass:

  1. JSON evaluator extra-field check can false-positive when allow_extra_fields=false because the allow-list only includes field_types and required_fields; fields referenced via field_constraints or field_patterns are treated as “extra”. File: evaluators/builtin/src/agent_control_evaluators/json/evaluator.py (extra_paths logic).

  2. Docs/packaging mismatch: README suggests pip install agent-control-evaluators[galileo], but the [galileo] extra is commented out in evaluators/builtin/pyproject.toml, so that command currently fails. Suggest enabling the extra when published or pointing docs to agent-control-evaluator-galileo.

  3. Extras tests not wired: make test only runs builtin evaluators; evaluators/extra/galileo/tests aren’t exercised by default (and I don’t see a test-extras workflow). Consider adding a target/workflow or documenting the omission.

- Restructure evaluators into peer directories (regex, list, json, sql, galileo_luna2)
- Split each evaluator into config.py and evaluator.py
- Move Evaluator, EvaluatorMetadata, registry from models to evaluators package
- Rename luna2 to galileo_luna2 following provider_evaluatorname convention
- Move discovery and factory from engine to evaluators package
- Update engine to delegate to evaluators package
- Organize tests to mirror source structure (tests/json/, tests/sql/)
- Fix SDK __all__: remove duplicate "control", remove non-existent tool exports
- Update documentation with correct import paths
- Remove stale TODO comments and add docstrings to empty __init__.py

BREAKING CHANGE: Evaluator, EvaluatorMetadata, register_evaluator now
imported from agent_control_evaluators instead of agent_control_models
EvaluatorConfig now extends agent_control_models.base.BaseModel instead
of pydantic.BaseModel directly, inheriting standard model behavior:
- populate_by_name, use_enum_values, validate_assignment
- to_dict(), to_json(), from_dict(), from_json() helpers
- extra="ignore" for forward compatibility
Update MockConfig to extend EvaluatorConfig instead of pydantic's
BaseModel for consistency with the new evaluator config pattern.
- Fix EvaluatorConfig → EvaluatorSpec in examples and models README
- Fix luna2 → galileo-luna2 with proper config in customer_support example
- Fix luna2/ → galileo_luna2/ path in galileo example
- Rename luna2 entry point to galileo-luna2 in pyproject.toml
- Update CONTRIBUTING.md with flat directory structure and correct imports
- Update AGENTS.md with correct register_evaluator import source
- Add ensure_evaluators_discovered() call in EvaluatorSpec validation
- Remove stale TODOs in sdks/python/tests/conftest.py
- Add docstrings to evaluators/tests/{json,sql}/__init__.py
…cy API

BREAKING CHANGE: `parse_evaluator_ref()` removed, use `parse_evaluator_ref_full()` or `is_agent_scoped()` instead.

Evaluator naming conventions:
- Built-in: "regex", "list", "json", "sql" (no namespace)
- External: "galileo/luna2" (slash separator)
- Agent-scoped: "my-agent:custom" (colon separator)

Changes:
- Rename galileo-luna2 → galileo/luna2 throughout codebase
- Add ParsedEvaluatorRef dataclass with type detection
- Remove deprecated parse_evaluator_ref() tuple API
- Migrate endpoints to use parse_evaluator_ref_full() and is_agent_scoped()
- Standardize XEvaluator + XEvaluatorConfig naming in docs
- Fix pre-existing lint issues (import sorting, Union syntax, unused imports)
Change config_model = AcmeToxicityConfig to AcmeToxicityEvaluatorConfig
to match the class definition used in the example.
- Bump evaluators and engine versions from 0.1.0 to 2.1.0
- Align requires-python to >=3.12 (was >=3.10 in evaluators)
- Standardize license to Apache-2.0 (was MIT in evaluators)
- Align authors format to "Agent Control Team"
- Bump pydantic minimum to >=2.12.4 in evaluators
Split evaluators into two packages:
- builtin (`agent-control-evaluators`): Core infrastructure + regex, list, json, sql
- extra/galileo (`agent-control-evaluator-galileo`): Luna2 evaluator (calls external API)

BREAKING CHANGES:
- Luna2 import path changed from `agent_control_evaluators.galileo_luna2`
  to `agent_control_evaluator_galileo.luna2`
- External evaluator names use dot notation instead of slash
  (e.g., `galileo.luna2` instead of `galileo/luna2`)
- SDK and server now depend on `agent-control-evaluators` as a runtime
  dependency (not vendored) to avoid duplicate module conflicts

Key changes:
- Move builtin evaluators to `evaluators/builtin/`
- Create `evaluators/extra/galileo/` as separate package
- Add entry points for plugin discovery (`agent_control.evaluators`)
- Update workspace to include only builtin (extras excluded for perf)
- Add CI workflow for testing extra packages
- Add template scaffold for creating new evaluator packages
- Server build script no longer vendors evaluators
- Add ruff and mypy to galileo package dev dependencies
- Update CI workflow to use `uv sync --extra dev` instead of `uv pip install`
- Use `uv run --extra dev` to ensure dev tools are available
- Update template with same dev dependencies
Update all documentation and code references from `galileo/luna2`
to `galileo.luna2` to match the actual implementation. The dot
separator is used for external evaluators to distinguish from
agent-scoped evaluators (which use colon).

Files updated:
- docs/OVERVIEW.md, docs/REFERENCE.md - evaluator examples
- CONTRIBUTING.md - naming convention docs
- README.md, examples/ - usage examples
- UI evaluator definition and test fixtures
- Server evaluator_utils.py docstrings
- Evaluator _base.py docstring
…e structure

Bug fixes:
- fix(sql): use args.get() instead of find() for LIMIT/OFFSET to prevent
  subquery clauses from being attributed to outer queries
- fix(engine): bump dependency floor to >=3.0.0 for models and evaluators
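
The LIMIT/OFFSET fix hinges on the difference between reading a node's own slot and searching its whole subtree. The toy AST below is only a stand-in for whatever SQL parser the evaluator uses (the Node class is invented for this sketch); it shows how a recursive find can pick up a subquery's LIMIT when the outer query has none.

```python
class Node:
    """Toy AST node: a kind plus named child slots, loosely like a SQL AST."""

    def __init__(self, kind, **args):
        self.kind = kind
        self.args = args

    def find(self, kind):
        """Depth-first search over the whole subtree."""
        for child in self.args.values():
            if isinstance(child, Node):
                if child.kind == kind:
                    return child
                hit = child.find(kind)
                if hit is not None:
                    return hit
        return None


# SELECT * FROM (SELECT * FROM t LIMIT 5) AS sub  -- the outer query has no LIMIT
inner = Node("select", limit=Node("limit", value=5))
outer = Node("select", source=Node("subquery", this=inner), limit=None)

# Recursive search wrongly attributes the inner LIMIT to the outer query.
print(outer.find("limit") is inner.args["limit"])  # True

# Reading the outer node's own slot reports that it has no LIMIT.
print(outer.args.get("limit"))  # None
```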

Documentation updates:
- Update evaluators directory structure to reflect builtin/extra tiers
- Update external evaluator example to use separate package pattern
- Show both direct install and extras syntax for Luna-2
- Fix all outdated path references to use evaluators/builtin/

Package config:
- Add TODO comments for commented-out extras (tracking PyPI publish)
Version is now single-sourced from pyproject.toml only.
Use importlib.metadata.version("package-name") for runtime access.
…ds allow-list

When allow_extra_fields=false, fields referenced only in field_constraints
or field_patterns were incorrectly flagged as "extra fields". Now all four
config options (required_fields, field_types, field_constraints, field_patterns)
contribute to the allow-list.
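
The corrected allow-list logic can be sketched as follows. This assumes required_fields is a list of field paths and the other three options are mappings keyed by field path, and it only checks top-level keys; the real evaluator's config shape and path handling may differ.

```python
def allowed_paths(config: dict) -> set[str]:
    """Union of every field path named anywhere in the config."""
    allowed = set(config.get("required_fields") or [])
    for option in ("field_types", "field_constraints", "field_patterns"):
        allowed.update((config.get(option) or {}).keys())
    return allowed


def extra_fields(payload: dict, config: dict) -> set[str]:
    """Top-level payload keys that no config option mentions."""
    if config.get("allow_extra_fields", True):
        return set()
    return set(payload) - allowed_paths(config)


config = {
    "allow_extra_fields": False,
    "required_fields": ["id"],
    "field_types": {"id": "string"},
    "field_constraints": {"age": {"min": 0}},
    "field_patterns": {"email": ".+@.+"},
}
payload = {"id": "1", "age": 3, "email": "a@b.c", "debug": True}
print(extra_fields(payload, config))  # {'debug'}
```

Before the fix, age and email in this example would also have been reported as extra, because only required_fields and field_types fed the allow-list.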

Also fixes README to use direct package install (agent-control-evaluator-galileo)
instead of non-existent [galileo] extra, with TODO for enabling it post-publish.
Wire galileo tests into the build system without affecting the default
`make test` target. Developers can now run:
- `make test-extras` for extra evaluators only
- `make test-all` for core + extras
- `make galileo-test/lint/typecheck` for galileo specifically

Also adds pytest-cov to galileo dev dependencies for coverage support.
@abhinav-galileo abhinav-galileo requested a review from lan17 February 4, 2026 15:17
@lan17
Contributor

lan17 commented Feb 4, 2026

To support agent-control-evaluators[xyz] installs:

  1. Enable extras on the builtin package

    • In evaluators/builtin/pyproject.toml, add entries under [project.optional-dependencies], e.g.
      • galileo = ["agent-control-evaluator-galileo>=3.0.0"]
      • nvidia = ["agent-control-evaluator-nvidia>=3.0.0"] (when published)
  2. Publish provider packages

    • Make sure the release workflow publishes agent-control-evaluator-<provider> packages before or alongside agent-control-evaluators, so the extras resolve on PyPI.
  3. Optional (if you want singular syntax)

    • If you prefer agent-control-evaluator[galileo], add a small meta package named agent-control-evaluator that depends on agent-control-evaluators and defines the same extras.

Right now the extras are commented out, so agent-control-evaluators[galileo] will not resolve until #1 + #2 are done.

Reads version from package metadata at runtime instead of hardcoding.
This keeps pyproject.toml as the single source of truth for versions.
- Add build_evaluators() and build_evaluator_galileo() to build script
- Add PyPI publish steps in dependency order: models -> evaluators -> sdk -> evaluator-galileo
- Enable [galileo] convenience extra on evaluators, SDK, and server
- Rename [luna2] extra to [galileo] for consistency with package naming
- Add uv source overrides for local galileo development
- Update documentation and examples to use [galileo] extra
@abhinav-galileo abhinav-galileo merged commit 0e0a78a into main Feb 5, 2026
4 of 5 checks passed
@abhinav-galileo abhinav-galileo deleted the abhi/reorg_evaluators branch February 5, 2026 09:26
galileo-automation pushed a commit that referenced this pull request Mar 4, 2026
## 1.0.0 (2026-03-04)

### ⚠ BREAKING CHANGES

* **server:** Feature/56688 fix image bug (#48)
* **sdk:** a bug in docker file (#46)
* **server:** Feature/56688 fix docker and create bash (#45)
* **evaluators:** Evaluator reorganization with new package structure

Package Structure:
- agent-control-evaluators (v3.0.0): core + regex, list, json, sql
- agent-control-evaluator-galileo (v3.0.0): Luna2 evaluator

Key Changes:
- Entry points for evaluator discovery (agent_control.evaluators)
- Dot notation for external evaluators (galileo.luna2 not galileo/luna2)
- Dynamic __version__ via importlib.metadata
- Server uses evaluators as runtime dep (no longer vendored)
- Release workflow publishes both packages to PyPI

Bug Fixes:
- JSON evaluator: field_constraints/field_patterns in extra-fields allow-list
- SQL evaluator: LIMIT/OFFSET bypass fix

Migration:
- Import: agent_control_evaluator_galileo.luna2 (not agent_control_evaluators.galileo_luna2)
- DB: UPDATE controls SET evaluator.name replace('/', '.')
* **server:** add time-series stats and split API endpoints (#6)
* **evaluators:** rename plugin to evaluator throughout  (#81)
* **models:** simplify step model and schema (#70)

### Features

* Add plugin auto-discovery via Python entry points ([#49](#49)) ([1521182](1521182))
* **docs:** add GitHub badges and CI coverage reporting ([#90](#90)) ([be1fa14](be1fa14))
* **evaluators:** add required_column_values for multi-tenant SQL validation ([#30](#30)) ([532386c](532386c))
* **sdk-ts:** automate semantic-release for npm publishing ([#52](#52)) ([2b43958](2b43958))
* **sdk:** Add PyPI packaging with semantic release ([#52](#52)) ([7c24f7f](7c24f7f))
* **sdk:** Auto-populate init() steps from @control() decorators ([#23](#23)) ([dc0f2a4](dc0f2a4))
* **sdk:** export ControlScope, ControlMatch, and EvaluatorResult models ([#18](#18)) ([0d49cad](0d49cad))
* **sdk:** Get Agent Controls from SDK Init ([#15](#15)) ([a485f93](a485f93))
* **sdk:** Refresh controls in a background loop ([#43](#43)) ([03f826d](03f826d))
* **sdk:** ship TypeScript SDK with deterministic method naming ([#32](#32)) ([a76e9b0](a76e9b0))
* **server:** add evaluator config store ([#78](#78)) ([cc14aa6](cc14aa6))
* **server:** add initAgent conflict_mode overwrite mode with SDK defaults ([#40](#40)) ([f3ed2b8](f3ed2b8))
* **server:** Add observability system for control execution tracking ([#44](#44)) ([fd0bddc](fd0bddc))
* **server:** add prometheus metrics for endpoints ([#68](#68)) ([775612c](775612c))
* **server:** add time-series stats and split API endpoints ([#6](#6)) ([a0fa597](a0fa597))
* **server:** hard-cut migrate to remove agent UUID ([#44](#44)) ([ee322c9](ee322c9))
* **server:** Optional Policy and many to many relationships ([#41](#41)) ([1a62746](1a62746))
* **ui:** add sql, luna2, json control forms and restructure the code ([#54](#54)) ([c4c1d4a](c4c1d4a))
* **ui:** allow to delete control ([#39](#39)) ([7dc4ca3](7dc4ca3))
* **ui:** Control Store Flow Updated ([#4](#4)) ([dda9f70](dda9f70))
* **ui:** stats dashboard ([#80](#80)) ([4cbb7fe](4cbb7fe))
* **ui:** Steps dropdown rendered based on api return values ([#36](#36)) ([a2aca43](a2aca43))
* **ui:** tests added and some minor ui changes, added error boundaries ([#61](#61)) ([009852b](009852b))
* **ui:** update agent control icon and favicon ([#42](#42)) ([19af8fa](19af8fa))

### Bug Fixes

* **ci:** Add ui scope to PR title validation ([#59](#59)) ([e0fdb52](e0fdb52))
* **ci:** correct galileo contrib path in release build script ([#51](#51)) ([2de6013](2de6013))
* **ci:** Enable pr title on prs ([#56](#56)) ([3d8b5fe](3d8b5fe))
* **ci:** Fix release ([#11](#11)) ([9dd3dd7](9dd3dd7))
* **ci:** Use galileo-automation bot for releases ([#57](#57)) ([bc8eea0](bc8eea0))
* **docs:** Add Example for Evaluator Extension ([#3](#3)) ([c2a70b3](c2a70b3))
* **docs:** add setup script ([#49](#49)) ([7a212c3](7a212c3))
* **docs:** Clean up Protect  ([#76](#76)) ([99c16fd](99c16fd))
* **docs:** Fix Examples for LangGraph ([#64](#64)) ([23b30ae](23b30ae))
* **docs:** Improve documentation for open source release ([#47](#47)) ([9018fb3](9018fb3))
* **docs:** Remove old/unused examples ([#66](#66)) ([f417781](f417781))
* **docs:** Update Contributing Guide ([#8](#8)) ([10b34c8](10b34c8))
* **docs:** Update readme  ([#37](#37)) ([7531d83](7531d83))
* **docs:** Update README ([#2](#2)) ([379bb15](379bb15))
* **examples:** Control sets cleanup with signed ([#65](#65)) ([af7b5fb](af7b5fb))
* **examples:** Update crew ai example to use evaluator ([#93](#93)) ([1c65084](1c65084))
* **infra:** Add plugins directory to Dockerfile ([#58](#58)) ([171d459](171d459))
* **infra:** install engine/evaluators in server image ([#14](#14)) ([d5ae157](d5ae157))
* **models:** use StrEnum for error enums ([#12](#12)) ([3f41c9f](3f41c9f))
* **sdk-ts:** add conventional commits preset dependency ([#55](#55)) ([540fe9d](540fe9d))
* **sdk-ts:** export npm token for semantic-release npm auth ([#54](#54)) ([1b6b993](1b6b993))
* **sdk:** 54253 add steer action and example ([#38](#38)) ([bf2380a](bf2380a))
* **sdk:** a bug in docker file ([#46](#46)) ([12d1794](12d1794))
* **sdk:** Add step_name as parameter to control ([#25](#25)) ([19ade9d](19ade9d))
* **sdk:** emit observability events for SDK-evaluated controls and fix non_matches propagation ([#24](#24)) ([6a9da69](6a9da69))
* **sdk:** enforce UUID agent IDs ([#9](#9)) ([5ccdbd0](5ccdbd0))
* **sdk:** Fix logging  ([#77](#77)) ([b1f078c](b1f078c))
* **sdk:** plugin to evaluator.. agent_protect to agent_control ([#88](#88)) ([fc9b088](fc9b088))
* **server:** enforce public-safe API error responses ([#20](#20)) ([e50d817](e50d817))
* **server:** Feature/56688 fix docker and create bash ([#45](#45)) ([7277e27](7277e27))
* **server:** Feature/56688 fix image bug ([#48](#48)) ([71e6b44](71e6b44))
* **server:** fix alembic migrations ([#47](#47)) ([c19c17c](c19c17c))
* **server:** reject initAgent UUID/name mismatch ([#13](#13)) ([19d61ff](19d61ff))
* tighten evaluation error handling and preserve control data ([52a1ef8](52a1ef8))
* **ui:** Fix UI and clients for simplified step schema ([#75](#75)) ([be2aaf0](be2aaf0))
* **ui:** json validation ([#10](#10)) ([a0cd5af](a0cd5af))
* **ui:** selector subpaths issue ([#34](#34)) ([79cb776](79cb776))
* **ui:** UI feedback fixes ([#27](#27)) ([6004761](6004761))

### Code Refactoring

* **evaluators:** rename plugin to evaluator throughout  ([#81](#81)) ([0134682](0134682))
* **evaluators:** split into builtin + extra packages for PyPI ([#5](#5)) ([0e0a78a](0e0a78a))
* **models:** simplify step model and schema ([#70](#70)) ([4c1d637](4c1d637))
2 participants