
CMP-4115: Fix ARF report PVC checker timeout in e2e test#1093

Merged
rhmdnd merged 2 commits into master from fix-arf-report-test-timeout
Feb 28, 2026


@Vincent056 Vincent056 commented Feb 24, 2026

Summary

  • Increase AssertARFReportExistsInPVC poll timeout from 10 seconds to 2 minutes (with RetryInterval), aligning it with other framework polling operations. The 10s timeout was insufficient for the checker pod to schedule, pull its image, and mount the PVC — especially when the result-server pod is still terminating after scale-down.
  • Add diagnostic logging to AssertARFReportExistsInPVC (pod phase on each poll, timeout details) and TestSingleScanWithStorageSucceeds (step-by-step t.Logf messages) to make future failures easier to triage.
  • Enhance the checker container command to output a verbose ls -la before the grep, so pod logs show the actual PVC contents on failure.
  • Update e2e helper pod images from ubi8/ubi-minimal to ubi9/ubi-minimal.

@openshift-ci openshift-ci bot requested review from rhmdnd and yuumasato February 24, 2026 16:59
@Vincent056 Vincent056 force-pushed the fix-arf-report-test-timeout branch from 9a9a4b2 to 00c12f9 February 24, 2026 17:01
The AssertARFReportExistsInPVC helper used a hardcoded 10-second timeout
which is insufficient for the checker pod to schedule, pull its image,
mount the PVC (which may still be releasing from the result server
scale-down), and run. Increase the timeout to 2 minutes with the
standard RetryInterval to match other polling operations in the
framework.

Also add diagnostic logging throughout TestSingleScanWithStorageSucceeds
and AssertARFReportExistsInPVC to make future timeout failures easier
to diagnose, including the checker pod's phase at each poll iteration
and a verbose ls command in the checker container.
@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1093-9a9a4b27151c026ffc8b301190de40d800974809

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1093-00c12f9ae5dabfe2d909c8c7831b191e8667ab45

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1093-44d0657ff55ba3ed748ad34acc2cc2f7dc76a2d3

@Vincent056
Author

/retest

@Vincent056 Vincent056 changed the title from "Fix ARF report PVC checker timeout in e2e test" to "CMP-4115: Fix ARF report PVC checker timeout in e2e test" Feb 26, 2026
@openshift-ci-robot
Collaborator

@Vincent056: This pull request references CMP-4115 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


In response to this:

Test plan

  • Run TestSingleScanWithStorageSucceeds e2e test and verify it passes
  • Confirm new log output appears in test output (scan name, phase transitions, checker pod status)
  • Verify no regression in other storage-related e2e tests


t.Parallel()
f := framework.Global
scanName := framework.GetObjNameFromTest(t)
t.Logf("Creating ComplianceScan %s with storage size 2Gi", scanName)
Collaborator

Nice - this should get logged with TestSingleScanWithStorageSucceeds when running the test in verbose mode.

Go's testing framework already prefixes log output with the test
name in verbose mode, so the hardcoded function-name prefix is
redundant.

Made-with: Cursor
@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1093-e9239ef212a6ba6718ef89e6d81a5f7d18ad188f

@openshift-ci

openshift-ci bot commented Feb 26, 2026

@Vincent056: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name               Commit   Required  Rerun command
ci/prow/e2e-rosa        e9239ef  true      /test e2e-rosa
ci/prow/e2e-aws-serial  e9239ef  true      /test e2e-aws-serial


@rhmdnd rhmdnd added this to the 1.9.0 milestone Feb 27, 2026
Collaborator

@rhmdnd rhmdnd left a comment

The serial failure looks unrelated and this fixes the issues in the parallel test.

- Image: "registry.access.redhat.com/ubi8/ubi-minimal",
- Command: []string{"/bin/bash", "-c", "ls /scan-results/0 2>/dev/null | grep -q '.xml.bzip2' && exit 0 || exit 1"},
+ Image: "registry.access.redhat.com/ubi9/ubi-minimal",
+ Command: []string{"/bin/bash", "-c", "ls -la /scan-results/0 2>&1 || echo 'directory /scan-results/0 not found'; ls /scan-results/0 2>/dev/null | grep -q '.xml.bzip2' && exit 0 || exit 1"},
Collaborator

nit: could trade this long line for a script that's easier to read, but we can do that in a follow-up since getting this in fixes issues with CI.

@openshift-ci

openshift-ci bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhmdnd, Vincent056


@rhmdnd
Collaborator

rhmdnd commented Feb 27, 2026

/lgtm

@rhmdnd
Collaborator

rhmdnd commented Feb 27, 2026

/docs-approved since this is an e2e testing stability fix.

@rhmdnd
Collaborator

rhmdnd commented Feb 27, 2026


The serial test failure appears to be caused by a race condition in the profile parsing while testing the update path for switching profile bundle images. I think we should investigate this in a separate PR, and it would be good to understand what's happening there so we can make the tests more stable.

@rhmdnd rhmdnd merged commit 2bae8b1 into master Feb 28, 2026
19 of 23 checks passed
@rhmdnd rhmdnd deleted the fix-arf-report-test-timeout branch February 28, 2026 00:06

Projects: None yet


3 participants