Fix Phase.on_error signature by luisorofino · Pull Request #23574 · DataDog/integrations-core

luisorofino · 2026-05-04T10:20:29Z

What does this PR do?

Fixes three issues in the AI phase framework:

Phase.on_error signature — the override had (self, message: PhaseTrigger, error: Exception) but the base class BaseProcessor.on_error now only takes (self, error: MessageProcessingError | ProcessorHookError) (see Route event bus hook errors through on_error with fail_fast policy #23489 and Tighten on_error handler type to OrchestratorHookError #23575). The orchestrator's _task_wrapper always calls processor.on_error(wrapped_error) with a single argument, so the old signature meant Phase.on_error was never actually invoked — it silently failed with a TypeError that was swallowed by the fail_fast=False policy.
Pipeline abort propagation — PhaseOrchestrator.on_finalize was raising RuntimeError, but the base EventBusOrchestrator.finalize only lets FatalProcessingError and CancelledError pass through. Any other exception gets wrapped in OrchestratorHookError and swallowed by the fail_fast=False policy, so the abort error was silently dropped and run() appeared to succeed after a phase failure. Changed to raise FatalProcessingError so it propagates correctly. Updated the two affected tests accordingly.
PhaseTrigger.id simplification — the id was constructed as f"{phase_id}_finished_{message.id}", causing the string to accumulate the full causal chain of every preceding trigger (e.g. phase_B_finished_phase_A_finished_start). The id only needs to be unique within the queue, and since each phase emits at most one success trigger (guarded by _executed), f"{phase_id}_finished" is sufficient.
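
To make the id growth concrete, here is a minimal sketch of the two naming schemes applied to a linear three-phase pipeline. The helper names `chained_trigger_id` and `flat_trigger_id` are hypothetical and are not part of the framework; only the two f-string formats come from the description above.

```python
def chained_trigger_id(phase_id: str, parent_trigger_id: str) -> str:
    """Old scheme: embed the id of the trigger that started this phase."""
    return f"{phase_id}_finished_{parent_trigger_id}"


def flat_trigger_id(phase_id: str) -> str:
    """New scheme: one success trigger per phase, so the phase id is enough."""
    return f"{phase_id}_finished"


phases = ["phase_A", "phase_B", "phase_C"]

# Walk a linear pipeline, feeding each trigger id into the next phase.
old_ids = []
parent = "start"
for phase in phases:
    parent = chained_trigger_id(phase, parent)
    old_ids.append(parent)

new_ids = [flat_trigger_id(phase) for phase in phases]

for old, new in zip(old_ids, new_ids):
    print(f"{old!r:60} -> {new!r}")
```

With the old scheme the id length grows with the depth of the pipeline, while the new ids stay fixed in size and remain unique as long as each phase emits at most one success trigger.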

Motivation

The on_error mismatch was a latent bug introduced when the orchestrator was refactored to wrap errors before calling on_error. As a result, a failing phase would never write its failure checkpoint or emit PhaseFailedMessage, so the pipeline would silently stall instead of aborting. The other two changes were discovered while fixing this.
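
The failure mode can be reproduced in isolation. The sketch below is not the framework code: `WrappedError`, `OldPhase`, `NewPhase` and `task_wrapper` are hypothetical stand-ins, and the hooks are written as plain synchronous methods for brevity (the real hooks may well be coroutines). It only shows how a two-argument override called with one argument raises a TypeError that a non-fail-fast policy then hides.

```python
class WrappedError(Exception):
    """Hypothetical stand-in for the error the orchestrator wraps and forwards."""


class OldPhase:
    def __init__(self) -> None:
        self.failure_recorded = False

    # Old override: expects the triggering message as well as the error.
    def on_error(self, message, error) -> None:
        self.failure_recorded = True


class NewPhase:
    def __init__(self) -> None:
        self.failure_recorded = False

    # Fixed override: matches the single-argument call made by the wrapper.
    def on_error(self, error) -> None:
        self.failure_recorded = True


def task_wrapper(processor, fail_fast: bool = False):
    """Call the hook with one argument; swallow hook failures unless fail_fast."""
    wrapped = WrappedError("phase failed")
    try:
        processor.on_error(wrapped)
    except Exception as hook_error:
        if fail_fast:
            raise
        return hook_error  # swallowed: returned here only so it can be inspected
    return None


old_phase, new_phase = OldPhase(), NewPhase()
old_result = task_wrapper(old_phase)
new_result = task_wrapper(new_phase)

print(type(old_result).__name__, old_phase.failure_recorded)
print(new_result, new_phase.failure_recorded)
```

In the old case the hook body never runs, so nothing that lives inside it (the failure checkpoint, the PhaseFailedMessage) can happen, which matches the stall described above.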

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov · 2026-05-04T11:03:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (loa/openmetrics-ai-gen@9f40f50). Learn more about missing BASE report.

AAraKKe

Thanks! I am approving since the comment is not a blocker, but I would like to understand the intention behind the final raise.

My Feedback Legend

Here's a quick guide to the prefixes I use in my comments:

praise: no action needed, just celebrate!
note: just a comment/information, no need to take any action.
question: I need clarification or I'm seeking to understand your approach.
nit: A minor, non-blocking issue (e.g., style, typo). Feel free to ignore.
suggestion: I'm proposing an improvement. This is optional but recommended.
request: A change I believe is necessary before this can be merged.

The only blocking comments are those marked request; any other type of comment can be applied at the developer's discretion.

AAraKKe · 2026-05-04T13:28:02Z

    async def on_finalize(self, exception: Exception | None) -> None:
        if self._failed_phase is not None:
-            raise RuntimeError(
+            raise FatalProcessingError(


question: why are we raising the error again? This is the finalization step of the orchestrator. We can define the on_error hook in the orchestrator, see where we failed, and do whatever we decide there. I don't think raising a FatalProcessingError again here makes much sense, since it is going to blow up at the end and nothing else can catch it.

Not sure if I am missing something, but what is the intention of raising here? If the intention is to let us catch an error from the command when we launch the run method, maybe we can check whether on_finalize has been called with an exception, right? If on_finalize does not receive an exception, it means we are fine with whatever happened. And then we can just print out an error message and raise again so we can catch it later if we need to.

Thank you for your comment!! The raise was redundant: when a phase fails, on_message_received raises FatalProcessingError, which propagates out of process_messages() and is re-raised in _entry_point's except block before finalize() is even called. The raise in on_finalize was just replacing one FatalProcessingError with another — the caller of run() sees an exception either way.

Fixed in 07a42b0: on_finalize now uses exception as the primary guard (if it's None, the run was clean) and logs the pipeline failure message instead of raising.
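
As a rough illustration of that shape (not the code from 07a42b0), the sketch below uses a hypothetical `SketchOrchestrator` with a local `FatalProcessingError` stand-in. The `run` method only imitates the flow described above: the fatal error is re-raised first, and `on_finalize` then receives it and logs instead of raising.

```python
import asyncio
import logging

logger = logging.getLogger("phase_orchestrator_sketch")


class FatalProcessingError(Exception):
    """Local stand-in for the framework's fatal error type."""


class SketchOrchestrator:
    def __init__(self) -> None:
        self._failed_phase = None
        self.finalize_log = []  # kept only so the behaviour is easy to inspect

    async def on_finalize(self, exception) -> None:
        # Primary guard: no exception means the run was clean.
        if exception is None:
            return
        message = f"Pipeline aborted after phase {self._failed_phase!r} failed: {exception}"
        logger.error(message)
        self.finalize_log.append(message)

    async def run(self, fail_phase=None) -> None:
        caught = None
        try:
            if fail_phase is not None:
                self._failed_phase = fail_phase
                raise FatalProcessingError(f"{fail_phase} failed")
        except FatalProcessingError as exc:
            caught = exc
            raise  # the caller of run() still sees the original error
        finally:
            await self.on_finalize(caught)


def run_and_report(orchestrator, fail_phase=None) -> bool:
    """Return True if run() raised FatalProcessingError."""
    try:
        asyncio.run(orchestrator.run(fail_phase))
    except FatalProcessingError:
        return True
    return False


clean, failing = SketchOrchestrator(), SketchOrchestrator()
clean_raised = run_and_report(clean)
failing_raised = run_and_report(failing, fail_phase="phase_A")
print(clean_raised, clean.finalize_log)
print(failing_raised, failing.finalize_log)
```

The caller still gets the exception on failure, so raising a second one from the finalizer adds nothing; logging there keeps the original error as the one that surfaces.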

dd-octo-sts · 2026-05-04T13:47:32Z

Validation Report

All 20 validations passed.

Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and Codecov settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅

* Fix Phase.on_error signature and test_orchestrator
* Tighten on_error's error type
* Log instead of raise in on_finalize

Fix Phase.on_error signature and test_orchestrator

1bec3b9

dd-octo-sts Bot added the ddev label May 4, 2026

luisorofino added the qa/skip-qa Automatically skip this PR for the next QA label May 4, 2026

luisorofino marked this pull request as ready for review May 4, 2026 10:28

luisorofino requested a review from a team as a code owner May 4, 2026 10:28

dd-octo-sts Bot added the team/agent-integrations label May 4, 2026

luisorofino added 2 commits May 4, 2026 15:03

Merge branch 'loa/openmetrics-ai-gen' into loa/phase-on-error

735c15d

Tighten on_error's error type

1e44a16

AAraKKe approved these changes May 4, 2026

Log instead of raise in on_finalize

07a42b0

luisorofino merged commit e2aefac into loa/openmetrics-ai-gen May 4, 2026
319 of 330 checks passed

luisorofino deleted the loa/phase-on-error branch May 4, 2026 13:48

luisorofino added a commit that referenced this pull request May 26, 2026

Fix Phase.on_error signature (#23574)

5bcbdeb

* Fix Phase.on_error signature and test_orchestrator
* Tighten on_error's error type
* Log instead of raise in on_finalize

luisorofino added a commit that referenced this pull request Jun 1, 2026

Fix Phase.on_error signature (#23574)

a656650

* Fix Phase.on_error signature and test_orchestrator
* Tighten on_error's error type
* Log instead of raise in on_finalize

luisorofino merged 4 commits into loa/openmetrics-ai-gen from loa/phase-on-error
