
feat(crew): add consensual process with pluggable consensus engine #5691

Closed
gkotsia wants to merge 1 commit into crewAIInc:main from gkotsia:feat/consensual-process


@gkotsia gkotsia commented May 3, 2026

Title: feat(crew): add consensual process with pluggable consensus engine

Labels: llm-generated (required — this PR was authored with AI assistance)

Why merge this

Process.consensual has been a TODO in process.py since the original three-process design. This PR ships it, with three properties that matter to maintainers and users:

  1. Removes a single point of failure. The manager-LLM is the only path in CrewAI today for selecting a handler dynamically; if it picks badly, the whole crew degrades. A vote across the agents themselves is auditable (every ranking is logged), resists single-agent error (a majority outvotes one bad ranking), and the aggregation step is deterministic for a given set of inputs — useful for debugging and replay even though the inputs themselves come from stochastic LLM calls.
  2. Opens a plugin ecosystem with zero new CrewAI dependencies. A ConsensusEngine Protocol + entry-point discovery lets third-party libraries plug in via pip install. The reference engine — Snowveil, a probabilistic Borda-CHB protocol — is published on PyPI today; the same pattern accommodates future plugins (Ranked Pairs, weighted voting, capability-based scoring, etc.). CrewAI itself ships only MajorityVoteConsensus — no new runtime imports, no version constraints to maintain.
  3. Removes the manager-LLM dependency. Process.hierarchical requires configuring a separate manager_llm (typically a stronger, costlier model) to dispatch each unowned task. Process.consensual polls the existing agents in parallel instead — no extra model to configure, audit, or pay separately for. Trade-off: more total LLM calls per task (N agent rankings vs 1 manager call), but on the agents' existing model configs and in parallel — net spend depends on which side has the more expensive model.

Backward-compatible. Existing crews are untouched. The new process is opt-in (process=Process.consensual), the new field is opt-in (consensus=... defaults to None), and unmodified Protocol clients continue to work.

What the user sees

from crewai import Crew, Process

# Default — works out of the box
crew = Crew(agents=..., tasks=..., process=Process.consensual)

# Or plug in a richer engine via string shorthand (after `pip install snowveil`)
crew = Crew(agents=..., tasks=..., process=Process.consensual, consensus="snowveil")

# Or pass an instance for custom config
from snowveil.integrations.crewai import SnowveilConsensus
crew = Crew(..., consensus=SnowveilConsensus(config=...))

Snowveil is the reference third-party engine — a probabilistic ranked-preference protocol from arxiv:2512.18444 (Kotsialou). Already published on PyPI (pip install snowveil); works against this PR today.
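
For readers wondering what an object passed as consensus=... might have to look like, here is a minimal sketch of a runtime-checkable protocol and a plurality engine with a candidate-order tie-break. The method name decide, its signature, and the class name FirstChoicePlurality are assumptions made for illustration; the PR description does not show the real ConsensusEngine interface.

```python
from collections import Counter
from typing import Protocol, runtime_checkable


@runtime_checkable
class ConsensusEngine(Protocol):
    """Assumed protocol shape; the PR's actual method name may differ."""

    def decide(self, rankings: dict[str, list[str]], candidates: list[str]) -> str:
        ...


class FirstChoicePlurality:
    """Illustrative engine: most first-place votes wins."""

    def decide(self, rankings: dict[str, list[str]], candidates: list[str]) -> str:
        if not rankings:
            raise ValueError("no rankings supplied")
        for voter, ballot in rankings.items():
            if not ballot:
                raise ValueError(f"empty ballot from {voter!r}")
            unknown = set(ballot) - set(candidates)
            if unknown:
                raise ValueError(f"unknown candidates from {voter!r}: {sorted(unknown)}")
        first_choices = Counter(ballot[0] for ballot in rankings.values())
        # max() returns the first maximal element, so iterating `candidates`
        # in order yields a deterministic candidate-order tie-break.
        return max(candidates, key=lambda role: first_choices[role])


# Each voter ranks the other agents and pins itself last.
rankings = {
    "researcher": ["writer", "reviewer", "researcher"],
    "writer": ["reviewer", "researcher", "writer"],
    "reviewer": ["writer", "researcher", "reviewer"],
}
candidates = ["researcher", "writer", "reviewer"]

engine = FirstChoicePlurality()
winner = engine.decide(rankings, candidates)  # "writer" has two first-place votes
```

Because the protocol is runtime-checkable, isinstance(engine, ConsensusEngine) succeeds for any object exposing a decide method, which is the kind of structural check the PR describes for the consensus field. Note that isinstance only verifies that the method exists, not its signature.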

Summary of changes

  • Process.consensual — implements the third process mode. Tasks without an explicit agent are dispatched by polling every other agent for a ranked preference and aggregating ballots.
  • ConsensusEngine Protocol + MajorityVoteConsensus default (@runtime_checkable, typed, pluggable). CrewAI ships only the trivial baseline; richer engines live in third-party packages.
  • Plugin discovery (discover_engines()) — supports both Python entry points (crewai.consensus_engines group) and a small built-in fallback registry. Crew(consensus="snowveil") resolves automatically when Snowveil is installed; broken plugins log a warning and skip rather than crash an unrelated crew.
  • Prompt-injection hardening — task descriptions are wrapped in <task> tags, length-capped at 2000 chars, and explicitly marked as untrusted input. Centralised in build_handler_ranking_prompt() so all consensus engines share the same hardening.
  • Self-promotion bias removal — voters rank only other agents and pin themselves last to keep ballots complete without skewing the aggregator.

What this PR is not about

Cross-host or cross-organisational crews. CrewAI today is a single-process framework — every Crew runs in one Python address space. Process.consensual operates within one crew's agents and uses Snowveil's in-process mode (InMemoryTransport); it does not touch the network. A future integration could use Snowveil's distributed WebSocketTransport to enable federated decision-making across organisations or hosts, but that's a separate design conversation (likely a FederatedCrew primitive, not a Process.consensual configuration) and is intentionally out of scope here.

Files

| File | Status | Lines | Notes |
|---|---|---|---|
| lib/crewai/src/crewai/consensus.py | new | 289 | Protocol, default engine, plugin discovery, parser, prompt builder. Module docstring includes a Snowveil wiring snippet so help(crewai.consensus) surfaces the integration path. |
| lib/crewai/src/crewai/crew.py | modified | +166 / −1 | New consensus field + validator (instance or string name), _run_consensual_process, _collect_handler_rankings (parallel via ThreadPoolExecutor), _agent_by_role, _require_unique_agent_roles. |
| lib/crewai/src/crewai/process.py | modified | +1 / −1 | Enables Process.consensual enum value. |
| lib/crewai/tests/test_consensus.py | new | 728 | 45 tests covering the consensus module, plugin discovery, and Process.consensual end-to-end. |

Total: ~1,184 insertions, 2 deletions across 4 files (uv.lock excluded).

Design notes

  • consensus: Any field type. Pydantic can't generate a schema for a Protocol, so the field is annotated Any and validated structurally at runtime via isinstance against the @runtime_checkable Protocol. Strings are resolved by name first.
  • Two-pass plugin discovery. discover_engines() iterates importlib.metadata.entry_points(group="crewai.consensus_engines") first (the future path for any plugin), then merges in _KNOWN_ENGINE_IMPORT_PATHS (a small dict — currently just snowveil, which was published before adopting the entry-point convention). Entry points always win. Failed loads log a WARNING and are skipped — a broken third-party engine never crashes an unrelated crew. Cached via functools.cache.
  • Quorum. _MIN_RANKING_RATIO = 0.5. If fewer than half of agents return a parseable ballot, _collect_handler_rankings raises rather than pick a handler from a tiny minority.
  • Parallel ranking. Agents are polled concurrently via ThreadPoolExecutor since agent.execute_task is synchronous.
  • Deliberate duplication. parse_role_ranking is algorithmically equivalent to a parser in Snowveil; CrewAI cannot depend on Snowveil, so the duplication is intentional.
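
The quorum and parallel-ranking notes above combine into something like the sketch below. The function collect_rankings and the ask callback are hypothetical stand-ins for _collect_handler_rankings and agent.execute_task; only the 0.5 ratio and the use of ThreadPoolExecutor come from the PR description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

MIN_RANKING_RATIO = 0.5  # value quoted in the design notes


def collect_rankings(
    voters: list[str],
    ask: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Poll every voter concurrently, drop failed ballots, enforce a quorum."""
    if not voters:
        raise ValueError("at least one voter is required")

    def safe_ask(voter: str) -> Optional[list[str]]:
        try:
            return ask(voter)
        except Exception:
            return None  # treat any failure as "no parseable ballot"

    # The polling call is synchronous, so threads provide the parallelism.
    with ThreadPoolExecutor(max_workers=len(voters)) as pool:
        ballots = list(pool.map(safe_ask, voters))

    rankings = {voter: ballot for voter, ballot in zip(voters, ballots) if ballot}
    if len(rankings) < MIN_RANKING_RATIO * len(voters):
        raise RuntimeError(
            f"only {len(rankings)} of {len(voters)} voters returned a usable ballot"
        )
    return rankings


roles = ["researcher", "writer", "reviewer"]


def fake_ask(voter: str) -> list[str]:
    """Stand-in for an LLM call: one voter fails, the others pin themselves last."""
    if voter == "reviewer":
        raise ValueError("unparseable reply")
    return [role for role in roles if role != voter] + [voter]


rankings = collect_rankings(roles, fake_ask)  # two of three ballots survive
```

With three voters the threshold is 1.5 ballots, so two usable ballots pass and one would raise. pool.map preserves input order, which keeps the voter-to-ballot pairing stable regardless of which thread finishes first.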

Test plan

45 tests in lib/crewai/tests/test_consensus.py, all passing locally (~1s, no network):

  • MajorityVoteConsensus — single voter, majority winner, candidate-order tie-break (and reversed), empty rankings, empty ballot, unknown candidate, runtime_checkable Protocol matching.
  • _validate_ballots — accepts complete ballot, rejects empty rankings / per-voter ballot / unknown candidate.
  • parse_role_ranking — strict JSON, JSON in surrounding text, first-appearance fallback, partial JSON falls through, unparseable raises, partial text match raises.
  • build_handler_ranking_prompt — task and roles included, marked UNTRUSTED, length-capped, empty description handled.
  • Consensus field validator — default is None; accepts engine instance; accepts string name; rejects non-engine, unknown name (with installed-engines list), and empty string (dedicated error); instance path does not call discover_engines.
  • discover_engines() happy paths — built-in majority always present; entry points discovered; fallback registry resolves when module importable; cache returns same dict until cleared; two named plugins coexist; entry points override fallback for the same name.
  • discover_engines() defensive paths — fallback skipped silently when not installed; fallback raising non-ImportError logs warning; fallback missing attribute logs warning; entry-point load raising logs warning; entry point returning a non-class is rejected with a warning; duplicate entry-point names log a collision warning.
  • Process.consensual — unanimous winner assigned, explicit task.agent not overridden, duplicate roles raise, low quorum raises, custom ConsensusEngine honoured over default.
  • uv run ruff check lib/ clean.
  • uv run ruff format --check lib/ clean.
  • uv run mypy lib/crewai/ — not verified locally; uv sync fails on macOS x86_64 because lancedb (pinned >=0.29.2,<0.30.1) ships no x86_64 macOS wheel. Direct mypy on consensus.py reports no issues; relying on CI mypy across 3.10–3.13 for the rest.
  • pytest lib/crewai/tests/test_consensus.py — 45 passed locally.
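
To make the parse_role_ranking cases listed above concrete, here is one way a two-pass parser could behave: strict or embedded JSON first, then ordering by first appearance, raising when any role is missing. This is a sketch under assumptions, not the PR's implementation; the name parse_ranking_sketch and the regular expression are illustrative.

```python
import json
import re


def parse_ranking_sketch(text: str, roles: list[str]) -> list[str]:
    """Extract a complete ranking of `roles` (assumed unique) from an LLM reply."""
    expected = set(roles)

    # Pass 1: the first JSON array found anywhere in the reply.
    match = re.search(r"\[.*?\]", text, flags=re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            parsed = None
        if (
            isinstance(parsed, list)
            and len(parsed) == len(roles)
            and all(isinstance(item, str) for item in parsed)
            and set(parsed) == expected
        ):
            return parsed
        # A partial or malformed array falls through to pass 2.

    # Pass 2: order roles by where they first appear in the text.
    positions = {role: text.find(role) for role in roles}
    missing = [role for role, pos in positions.items() if pos < 0]
    if missing:
        raise ValueError(f"roles not found in reply: {missing}")
    return sorted(roles, key=positions.__getitem__)


roles = ["reviewer", "writer"]
strict = parse_ranking_sketch('["writer", "reviewer"]', roles)
embedded = parse_ranking_sketch('My ranking: ["writer", "reviewer"], as requested.', roles)
fallback = parse_ranking_sketch("I prefer reviewer, then writer.", roles)
partial_json = parse_ranking_sketch('["writer"] and after that reviewer', roles)
```

The first-appearance fallback is fragile when one role name is a substring of another (for example "writer" and "senior writer"), so a production parser would likely need word-boundary matching or longest-match-first handling.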

Open questions / follow-ups

  • Should consensus become a typed field once Pydantic gains better Protocol support, or stay Any with the runtime validator?
  • Should the default be MajorityVoteConsensus, or should Process.consensual require an explicit engine to avoid surprising users with naive plurality?
  • Mintlify docs under docs/en/concepts/processes.mdx (and translations in ar, ko, pt-BR) are not in this PR — tracked separately.
  • A runnable end-to-end sample using Snowveil will follow as a separate submission to crewAIInc/crewAI-examples, matching upstream's separation of core and samples.

@greysonlalonde
Contributor

Hey, two things:

  • this is too large a PR for us to consider
  • please create an issue discussing or requesting this feature prior to creating a PR

@gkotsia
Author

gkotsia commented May 4, 2026

@greysonlalonde Ok, I've opened the related issue here: #5708
