Fix Phase.on_error signature #23574

Merged
luisorofino merged 4 commits into loa/openmetrics-ai-gen from loa/phase-on-error
May 4, 2026

@luisorofino (Contributor) commented May 4, 2026

What does this PR do?

Fixes three issues in the AI phase framework:

  1. Phase.on_error signature — the override had (self, message: PhaseTrigger, error: Exception) but the base class BaseProcessor.on_error now only takes (self, error: MessageProcessingError | ProcessorHookError) (see Route event bus hook errors through on_error with fail_fast policy #23489 and Tighten on_error handler type to OrchestratorHookError #23575). The orchestrator's _task_wrapper always calls processor.on_error(wrapped_error) with a single argument, so the old signature meant Phase.on_error was never actually invoked — it silently failed with a TypeError that was swallowed by the fail_fast=False policy.

  2. Pipeline abort propagation: PhaseOrchestrator.on_finalize was raising RuntimeError, but the base EventBusOrchestrator.finalize only lets FatalProcessingError and CancelledError pass through. Any other exception gets wrapped in OrchestratorHookError and swallowed by the fail_fast=False policy, so the abort error was silently dropped and run() appeared to succeed after a phase failure. Changed to raise FatalProcessingError so it propagates correctly. Updated the two affected tests accordingly.

  3. PhaseTrigger.id simplification — the id was constructed as f"{phase_id}_finished_{message.id}", causing the string to accumulate the full causal chain of every preceding trigger (e.g. phase_B_finished_phase_A_finished_start). The id only needs to be unique within the queue, and since each phase emits at most one success trigger (guarded by _executed), f"{phase_id}_finished" is sufficient.
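
To illustrate item 3, here is a minimal sketch of how the old id format grows with the causal chain while the new one stays constant. The `Trigger` dataclass and the two helper functions are hypothetical stand-ins, not the ddev classes; only the two f-string formats are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    """Hypothetical stand-in for PhaseTrigger; only the id matters here."""

    id: str


def old_trigger_id(phase_id: str, message: Trigger) -> str:
    # Old format: embeds the id of the message that triggered the phase,
    # so every hop appends the whole upstream chain.
    return f"{phase_id}_finished_{message.id}"


def new_trigger_id(phase_id: str) -> str:
    # New format: relies on each phase emitting at most one success trigger.
    return f"{phase_id}_finished"


start = Trigger(id="start")
after_a_old = Trigger(id=old_trigger_id("phase_A", start))
after_b_old = Trigger(id=old_trigger_id("phase_B", after_a_old))
after_b_new = Trigger(id=new_trigger_id("phase_B"))

print(after_b_old.id)  # phase_B_finished_phase_A_finished_start
print(after_b_new.id)  # phase_B_finished
```

The new format stays unique only as long as a phase emits a single success trigger, which is the guarantee the description attributes to the _executed guard.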

Motivation

The on_error mismatch was a latent bug introduced when the orchestrator was refactored to wrap errors before calling on_error. As a result, a failing phase would never write its failure checkpoint or emit PhaseFailedMessage, so the pipeline would silently stall instead of aborting. The other two changes were discovered while fixing this.
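
The failure mode can be reproduced in isolation. The sketch below is an assumption-laden model, not the ddev code: `HookError`, `OldPhase`, `FixedPhase` and `call_error_hook` are hypothetical names, and the swallowing logic only approximates what a fail_fast=False policy might do. It shows that a stale two-argument override raises TypeError at call time, so the handler body never runs.

```python
from __future__ import annotations

import asyncio


class HookError(Exception):
    """Hypothetical wrapper standing in for the orchestrator's wrapped error types."""


class BaseProcessor:
    async def on_error(self, error: Exception) -> None:
        """Base hook: receives only the wrapped error."""


class OldPhase(BaseProcessor):
    def __init__(self) -> None:
        self.handled = False

    # Stale override: expects (message, error) but callers pass one argument.
    async def on_error(self, message: object, error: Exception) -> None:  # type: ignore[override]
        self.handled = True


class FixedPhase(BaseProcessor):
    def __init__(self) -> None:
        self.handled = False

    # Matches the base signature, so the hook is actually invoked.
    async def on_error(self, error: Exception) -> None:
        self.handled = True


async def call_error_hook(
    processor: BaseProcessor, error: Exception, fail_fast: bool = False
) -> list[Exception]:
    """Hypothetical task wrapper: wraps the error and calls on_error with one argument."""
    swallowed: list[Exception] = []
    try:
        await processor.on_error(HookError(str(error)))
    except Exception as hook_exc:
        if fail_fast:
            raise
        # With fail_fast disabled, the hook failure is recorded and dropped.
        swallowed.append(hook_exc)
    return swallowed


old, fixed = OldPhase(), FixedPhase()
old_swallowed = asyncio.run(call_error_hook(old, ValueError("boom")))
fixed_swallowed = asyncio.run(call_error_hook(fixed, ValueError("boom")))

print(old.handled, type(old_swallowed[0]).__name__)  # False TypeError
print(fixed.handled, fixed_swallowed)  # True []
```

In this model, running the old override with fail_fast=True would surface the TypeError immediately, which is one way such a mismatch could be caught in tests.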

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@dd-octo-sts dd-octo-sts Bot added the ddev label May 4, 2026
@luisorofino luisorofino added the qa/skip-qa Automatically skip this PR for the next QA label May 4, 2026
@luisorofino luisorofino marked this pull request as ready for review May 4, 2026 10:28
@luisorofino luisorofino requested a review from a team as a code owner May 4, 2026 10:28

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (loa/openmetrics-ai-gen@9f40f50). Learn more about missing BASE report.

@AAraKKe (Collaborator) left a comment

Thanks! I am approving since the comment is not a blocker, but I would like to understand a bit better what the intention is with the final raise.

My Feedback Legend

Here's a quick guide to the prefixes I use in my comments:

praise: no action needed, just celebrate!
note: just a comment/information, no need to take any action.
question: I need clarification or I'm seeking to understand your approach.
nit: A minor, non-blocking issue (e.g., style, typo). Feel free to ignore.
suggestion: I'm proposing an improvement. This is optional but recommended.
request: A change I believe is necessary before this can be merged.

The only blocking comments are request; any other type of comment can be applied at the discretion of the developer.

Comment thread ddev/src/ddev/ai/phases/orchestrator.py (Outdated)
     async def on_finalize(self, exception: Exception | None) -> None:
         if self._failed_phase is not None:
-            raise RuntimeError(
+            raise FatalProcessingError(
@AAraKKe (Collaborator)

question: why are we raising the error again? This is the finalization step of the orchestrator. We can define the on_error hook in the orchestrator, see where we have failed, and do whatever we decide there. Raising a FatalProcessingError again here does not make a lot of sense, I believe, since it is going to blow up at the end and nothing else can catch it.

Not sure if I am missing something, but what is the intention of raising here? If the intention is to allow us to catch an error from the command when we launch the run method, maybe we can check whether on_finalize has been called with an exception, right? If on_finalize does not receive an exception, it means we are OK with whatever happened. Then we can just print out an error message and raise again so we can catch it later if we need to.

@luisorofino (Contributor, Author) commented May 4, 2026

Thank you for your comment!! The raise was redundant: when a phase fails, on_message_received raises FatalProcessingError, which propagates out of process_messages() and is re-raised in _entry_point's except block before finalize() is even called. The raise in on_finalize was just replacing one FatalProcessingError with another — the caller of run() sees an exception either way.

Fixed in 07a42b0: on_finalize now uses exception as the primary guard (if it's None, the run was clean) and logs the pipeline failure message instead of raising.
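
For readers without the diff at hand, the shape of that change might look roughly like the sketch below. It is not the code from 07a42b0: `SketchOrchestrator`, the logger name, the log message and the `ListHandler` helper are all hypothetical. Only the behavior (return early when exception is None, otherwise log instead of raising) follows the reply above.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger("phase_orchestrator_sketch")
logger.propagate = False  # keep the demo output confined to our own handler


class ListHandler(logging.Handler):
    """Collects formatted log messages so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


class SketchOrchestrator:
    """Hypothetical stand-in; not the ddev PhaseOrchestrator."""

    def __init__(self) -> None:
        self._failed_phase: str | None = None

    async def on_finalize(self, exception: Exception | None) -> None:
        # A clean run passes exception=None, so there is nothing to report.
        if exception is None:
            return
        # The run already failed and the caller of run() sees that exception,
        # so log the pipeline failure instead of raising a second error.
        logger.error("Pipeline aborted: phase %r failed (%s)", self._failed_phase, exception)


handler = ListHandler()
logger.addHandler(handler)

orchestrator = SketchOrchestrator()
asyncio.run(orchestrator.on_finalize(None))
clean_run_messages = list(handler.messages)

orchestrator._failed_phase = "phase_A"
asyncio.run(orchestrator.on_finalize(RuntimeError("boom")))

print(clean_run_messages)  # []
print(handler.messages)  # ["Pipeline aborted: phase 'phase_A' failed (boom)"]
```

Keeping on_finalize free of raises means the exception the caller observes is the original one from message processing rather than a replacement created during finalization.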

dd-octo-sts Bot (Contributor) commented May 4, 2026

Validation Report

All 20 validations passed.

Validation | Description | Status
agent-reqs | Verify check versions match the Agent requirements file | passed
ci | Validate CI configuration and Codecov settings | passed
codeowners | Validate every integration has a CODEOWNERS entry | passed
config | Validate default configuration files against spec.yaml | passed
dep | Verify dependency pins are consistent and Agent-compatible | passed
http | Validate integrations use the HTTP wrapper correctly | passed
imports | Validate check imports do not use deprecated modules | passed
integration-style | Validate check code style conventions | passed
jmx-metrics | Validate JMX metrics definition files and config | passed
labeler | Validate PR labeler config matches integration directories | passed
legacy-signature | Validate no integration uses the legacy Agent check signature | passed
license-headers | Validate Python files have proper license headers | passed
licenses | Validate third-party license attribution list | passed
metadata | Validate metadata.csv metric definitions | passed
models | Validate configuration data models match spec.yaml | passed
openmetrics | Validate OpenMetrics integrations disable the metric limit | passed
package | Validate Python package metadata and naming | passed
readmes | Validate README files have required sections | passed
saved-views | Validate saved view JSON file structure and fields | passed
version | Validate version consistency between package and changelog | passed

@luisorofino luisorofino merged commit e2aefac into loa/openmetrics-ai-gen May 4, 2026
319 of 330 checks passed
@luisorofino luisorofino deleted the loa/phase-on-error branch May 4, 2026 13:48
luisorofino added a commit that referenced this pull request May 26, 2026
* Fix Phase.on_error signature and test_orchestrator

* Tighten on_error's error type

* Log instead of raise in on_finalize
luisorofino added a commit that referenced this pull request Jun 1, 2026
* Fix Phase.on_error signature and test_orchestrator

* Tighten on_error's error type

* Log instead of raise in on_finalize
Labels

ddev, qa/skip-qa (Automatically skip this PR for the next QA), team/agent-integrations

2 participants