
Tag retried-test non-final attempts as skip in JUnit report#11443

Draft
mhdatie wants to merge 1 commit into master from mo.atie/apmlp-1297-skip-on-retried

Conversation


@mhdatie mhdatie commented May 21, 2026

What Does This Do

Add JUnitReport.tagRetriedTests() and wire it into the ResultCollector post-processing pipeline between tagSyntheticFailures() and tagFinalStatuses().

The new method scans every <testcase> element and, when a later sibling shares the same (classname, name), marks the earlier one with dd_tags[test.final_status]=skip via the existing addFinalStatusProperty(..., APPEND_TO_TESTCASE) helper.

  • Net effect: only the final attempt of a retried test keeps its real status; earlier attempts are tagged skip.

Ordering matters — tagRetriedTests must run before tagFinalStatuses, otherwise the per-testcase fallback tagger would overwrite the skip with fail.
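The scan described above can be sketched as follows. This is a minimal, self-contained illustration built on the JDK DOM API, not the code in this PR: the class `RetryTaggingSketch`, the helpers `parse`, `addSkipProperty`, and `finalStatus`, and the `<properties>`/`<property>` layout are assumptions made for the example. It also swaps the technique: instead of comparing each testcase with its later siblings, it walks the testcases backwards with a set, and it matches duplicates across the whole document rather than only among siblings.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RetryTaggingSketch {
  static final String FINAL_STATUS = "dd_tags[test.final_status]";

  /** Parses a JUnit XML string; wraps checked exceptions to keep the example short. */
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("invalid XML", e);
    }
  }

  /** Tags every testcase that has a later duplicate; returns how many were tagged. */
  static int tagRetriedTests(Document doc) {
    NodeList all = doc.getElementsByTagName("testcase");
    Set<List<String>> seen = new HashSet<>();
    int tagged = 0;
    // Walking backwards, the first time a (classname, name) key appears is its final attempt.
    for (int i = all.getLength() - 1; i >= 0; i--) {
      Element testcase = (Element) all.item(i);
      List<String> key =
          List.of(testcase.getAttribute("classname"), testcase.getAttribute("name"));
      if (!seen.add(key)) {
        addSkipProperty(doc, testcase); // an earlier attempt of an already-seen test
        tagged++;
      }
    }
    return tagged;
  }

  /** Hypothetical stand-in for addFinalStatusProperty(..., APPEND_TO_TESTCASE). */
  static void addSkipProperty(Document doc, Element testcase) {
    Element properties = doc.createElement("properties");
    Element property = doc.createElement("property");
    property.setAttribute("name", FINAL_STATUS);
    property.setAttribute("value", "skip");
    properties.appendChild(property);
    testcase.appendChild(properties);
  }

  /** Returns the final_status value on a testcase, or null when none is set. */
  static String finalStatus(Element testcase) {
    NodeList props = testcase.getElementsByTagName("property");
    for (int i = 0; i < props.getLength(); i++) {
      Element p = (Element) props.item(i);
      if (FINAL_STATUS.equals(p.getAttribute("name"))) {
        return p.getAttribute("value");
      }
    }
    return null;
  }
}
```

For a report holding a failed attempt followed by a passing attempt of the same test, only the first testcase receives the skip property; the last attempt is left untouched for the later fallback tagger.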

Motivation

When Develocity's testRetry plugin retries a failed test, the JUnit XML contains one <testcase> per attempt, all sharing the same (classname, name). CI treats the test as green if the final attempt passes, but Test Optimization was tagging the earlier failed attempts as final_status=fail and surfacing them as real failures.

Example trace where a retried-then-passed test shows up as a failure

Additional Notes

Alternatives considered

Path B: generalize and delete the initializationError branch from tagSyntheticFailures. The existing initializationError handling is a strict subset of the new logic: same group-and-skip-all-but-last pattern, just restricted to one literal name. Folding it into tagRetriedTests would remove a few lines, but was rejected because:

  1. The overlap is cost-free: addFinalStatusProperty short-circuits when a final_status property already exists, so the second tag against the same testcase is a no-op.
  2. The initializationError branch carries source-pinned links to JUnit 4 and Gradle that justify why that specific literal is treated as a framework synthetic. Deleting it would discard context that was deliberately added in #11427 (Stop skipping "test exception" test cases, they are not synthetic).
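The short-circuit in point 1 is also why the taggers are order-sensitive. A minimal model of that "first tag wins" behaviour, assuming testcase properties can be represented as a plain map (the class `FirstTagWinsSketch` and both methods are hypothetical and only illustrate the idea):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FirstTagWinsSketch {
  static final String FINAL_STATUS = "dd_tags[test.final_status]";

  /** Sets the status only when none exists; returns true if the map changed. */
  static boolean addFinalStatusIfAbsent(Map<String, String> properties, String status) {
    return properties.putIfAbsent(FINAL_STATUS, status) == null;
  }

  /** Applies the given statuses in order to one testcase and returns what sticks. */
  static String resultingStatus(String... statusesInOrder) {
    Map<String, String> properties = new LinkedHashMap<>();
    for (String status : statusesInOrder) {
      addFinalStatusIfAbsent(properties, status);
    }
    return properties.get(FINAL_STATUS);
  }
}
```

With this model, `resultingStatus("skip", "fail")` keeps `skip`, while `resultingStatus("fail", "skip")` keeps `fail`. Under the same assumption, a retry tagger that runs after the fallback tagger would have no effect on testcases the fallback has already tagged.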

Scope

Touches only .gitlab/collect-result/ (the build-time JUnit XML post-processor that uploads results to Test Optimization). No agent / tracer code paths are affected — CI tooling, not user-visible.

Jira ticket: APMLP-1297

The Develocity testRetry plugin emits one <testcase> per attempt sharing
the same (classname, name); CI treats only the final attempt as
authoritative but Test Optimization was surfacing earlier failed
attempts as real failures. Mark non-final attempts with
final_status=skip in the JUnit XML post-processor so reports match what
CI considers the outcome.

Jira: APMLP-1297

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mhdatie mhdatie added the labels tag: ai generated (Largely based on code generated by an AI or LLM), tag: no release notes (Changes to exclude from release notes), type: bug (Bug report and fix), and comp: tooling (Build & Tooling) on May 21, 2026
@mhdatie mhdatie requested a review from bric3 May 21, 2026 20:04

mhdatie commented May 21, 2026

@bric3 Tagged you for early review to compare between this change and the alternative provided in the description before I open it up for review 🙇 .

@bric3 bric3 requested a review from cbeauchesne May 22, 2026 15:58

@bric3 bric3 left a comment

I'm getting a tad concerned by the lack of visibility on retries.

Also as a matter of followup on the initiative here's the previous work from @cbeauchesne

var reportChangedBeforeFinalStatus = report.addFileAttribute(sourceFile);
reportChangedBeforeFinalStatus |= report.normalizeStableTestNames();
report.tagSyntheticFailures();
report.tagRetriedTests();
Contributor

issue: This happens after normalizeStableTestNames(), so two different tests could end up with the same name, and one of them can get silently skipped.

E.g.

  • localhost:12345 and final_status=fail => after normalization: localhost:PORT, and this one would incorrectly switch to final_status=skip
  • localhost:23456 and final_status=pass => after normalization: localhost:PORT
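The collision can be reproduced in isolation. The sketch below assumes a normalizer that rewrites `localhost:<digits>` to `localhost:PORT`; the actual rules in normalizeStableTestNames() are not shown in this thread, so the regex, the class name, and the sample test names are illustrative only.

```java
import java.util.regex.Pattern;

final class NormalizationCollisionSketch {
  // Hypothetical rule: any localhost port becomes the literal PORT.
  private static final Pattern LOCALHOST_PORT = Pattern.compile("localhost:\\d+");

  static String normalize(String name) {
    return LOCALHOST_PORT.matcher(name).replaceAll("localhost:PORT");
  }

  /** True when two distinct tests become indistinguishable only after normalization. */
  static boolean collidesOnlyAfterNormalization(String a, String b) {
    return !a.equals(b) && normalize(a).equals(normalize(b));
  }
}
```

Here `collidesOnlyAfterNormalization("connects to localhost:12345", "connects to localhost:23456")` is true, so a duplicate check keyed on the normalized name would treat the first test as an earlier attempt of the second.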


@mhdatie mhdatie May 22, 2026

Great catch.

  • Do we normalize localhost failures and then just look at the count in Test Optimization?
  • What do you think of excluding the normalized cases from this operation?
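One way the exclusion could look, as a sketch only: it assumes the post-processor can still see the pre-normalization name of each testcase, which this thread does not confirm. The `Testcase` holder and `nonFinalAttempts` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExcludeNormalizedSketch {
  /** Hypothetical view of a testcase that remembers its name before normalization. */
  static final class Testcase {
    final String classname;
    final String rawName;
    final String normalizedName;

    Testcase(String classname, String rawName, String normalizedName) {
      this.classname = classname;
      this.rawName = rawName;
      this.normalizedName = normalizedName;
    }
  }

  /** Returns indexes of non-final attempts, ignoring testcases whose name was rewritten. */
  static List<Integer> nonFinalAttempts(List<Testcase> testcases) {
    Set<List<String>> seen = new HashSet<>();
    List<Integer> nonFinal = new ArrayList<>();
    for (int i = testcases.size() - 1; i >= 0; i--) {
      Testcase t = testcases.get(i);
      if (!t.rawName.equals(t.normalizedName)) {
        continue; // name was normalized: never treat it as a retry
      }
      if (!seen.add(List.of(t.classname, t.rawName))) {
        nonFinal.add(0, i); // keep the result in document order
      }
    }
    return nonFinal;
  }
}
```

The trade-off is that a genuinely retried test whose name was normalized would keep its earlier failed attempts, so this narrows the fix rather than completing it.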

> I'm getting a tad concerned by the lack of visibility on retries.

Can you elaborate? It would be great to autodetect retries. Are these manually detected by us today? (cc @cbeauchesne) - I know that Flaky tests are labelled in Datadog and from there we can detect if they were retried (grouped by name)

Contributor Author

On catching retries proactively, I'm not sure whether the @DataDog/ci-app-libraries team has plans to auto-detect these retries.

I did some planning with Claude, and the suggestion was to:

  • Spend time understanding how detection can be implemented with Develocity's testRetry plugin (Gradle TestListener, JUnit 5 TestExecutionListener, etc)
  • Use CiVisibilityMetricCollector and CiVisibilityCountMetric since they have the metadata/tags to detect retries
  • Document ownership
Claude Details

Data model — trivial

Add a RetryReason.develocity enum value in internal-api/.../telemetry/tag/RetryReason.java. The existing TEST_EVENT_FINISHED count metric and Tags.TEST_RETRY_REASON span tag pick it up automatically.

Detection — the real work

Identifying that a given test execution is a Develocity retry requires one of:

  • A new instrumentation module hooking the plugin's retry executor class.
  • Extending the existing Gradle build-system instrumentation in agent-ci-visibility to detect the testRetry extension on the Test task and propagate a sysprop into forked test JVMs.
  • A same-session re-execution heuristic — fallback only, if neither of the above is viable.

Wiring

The resulting signal lands in TestEventsHandlerImpl.java:263, and must not override existing atr / efd / attemptToFix reasons.

Open decisions

  • Whether to also set IsRetry — probably yes.
  • Whether to set HasFailedAllRetries — probably no, since Develocity controls the retry budget out-of-process.

Validation

A smoke test under dd-smoke-tests/ with the plugin applied to a flaky test, plus a telemetry assertion that event_finished carries retry_reason:develocity_test_retry.

Ownership

RetryReason.java, TestEventsHandlerImpl.java, and anything under agent-ci-visibility/ are owned by @DataDog/ci-app-libraries, so the change will require their review even if apm-lang-platform-java drives it.

Recommendation

Do a short Phase-0 spike on the Develocity plugin's internals before committing to a detection approach.

Contributor

Hey @mhdatie! Unfortunately, what Claude suggested here seems related to the testing instrumentation inside the tracer itself. In our case, it wouldn't apply because we don't instrument the tracer's tests with the tracer itself. The test reporting is entirely based on the JUnit XML report generated by the testing framework. I've discussed with the team, and I'm not sure there would be an easy way of detecting the retries on our side during ingestion (there are several limitations, such as the ingestion order not necessarily following the execution order, which could then incorrectly tag retries) unless we were able to include some additional information in the report to inform this.
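To illustrate what such additional information could buy, here is a sketch under a strong assumption: each attempt in the report carries an explicit attempt number. Nothing in this thread says the report has such a field today; `Attempt`, its `attempt` field, and `finalStatuses` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AttemptIndexSketch {
  /** Hypothetical record of one test attempt with an explicit attempt number. */
  static final class Attempt {
    final String test;
    final int attempt;
    final String status;

    Attempt(String test, int attempt, String status) {
      this.test = test;
      this.attempt = attempt;
      this.status = status;
    }
  }

  /** Picks each test's status from its highest-numbered attempt, whatever the input order. */
  static Map<String, String> finalStatuses(List<Attempt> attemptsInAnyOrder) {
    Map<String, Attempt> last = new HashMap<>();
    for (Attempt a : attemptsInAnyOrder) {
      last.merge(a.test, a, (x, y) -> x.attempt >= y.attempt ? x : y);
    }
    Map<String, String> result = new HashMap<>();
    last.forEach((test, a) -> result.put(test, a.status));
    return result;
  }
}
```

Because the final attempt is chosen by its number rather than by position, the result is the same whether attempts are ingested in execution order or not.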


Copilot AI left a comment

Pull request overview

This PR updates the .gitlab/collect-result JUnit XML post-processor so that when the same test (classname, name) appears multiple times (as produced by Develocity’s testRetry), only the final attempt retains its computed status while earlier attempts are tagged with dd_tags[test.final_status]=skip to avoid surfacing them as real failures in Test Optimization.

Changes:

  • Adds JUnitReport.tagRetriedTests() to tag non-final retry attempts as skip.
  • Wires tagRetriedTests() into ResultCollector between tagSyntheticFailures() and tagFinalStatuses() to prevent later fallback tagging from overriding the skip status.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| .gitlab/collect-result/ResultCollector.java | Calls tagRetriedTests() during report post-processing, before final status tagging. |
| .gitlab/collect-result/JUnitReport.java | Implements tagRetriedTests() to identify duplicate (classname, name) testcases and tag earlier ones as skip. |

Comment on lines +128 to +132
for (var i = 0; i < all.size(); i++) {
  var current = all.get(i);
  var classname = current.getAttribute("classname");
  var name = current.getAttribute("name");
  for (var j = i + 1; j < all.size(); j++) {
Contributor Author

Will apply based on outcome of above convo #11443 (review)
