TRT-1576: Fail if operator has Available=False unless in upgrade window #28735
base: master
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: DennisPeriquet
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/payload-job periodic-ci-openshift-release-master-ci-4.16-e2e-vsphere-ovn-upgrade
This will see if my new exception allows the upgrade job to pass despite the single storage operator replica. |
@DennisPeriquet: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/272b5a20-0187-11ef-95a0-20b3d6d376a7-0 |
/payload-job periodic-ci-openshift-release-master-ci-4.16-e2e-vsphere-ovn-upgrade
Retry because the last one didn't really run. |
@DennisPeriquet: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/61bc6960-0194-11ef-8313-791cce82a878-0 |
Job Failure Risk Analysis for sha: 63d0936
|
63d0936
to
3014822
Compare
Job Failure Risk Analysis for sha: 3014822
|
Job Failure Risk Analysis for sha: d950634
|
Job Failure Risk Analysis for sha: 2e4493a
|
/test unit |
/payload-job periodic-ci-openshift-release-master-ci-4.16-e2e-vsphere-ovn-upgrade |
@DennisPeriquet: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8a3d2950-0627-11ef-99cb-168bfde7d9b7-0 |
Job Failure Risk Analysis for sha: 80a02e7
|
/test unit |
/payload-job periodic-ci-openshift-release-master-ci-4.16-e2e-vsphere-ovn-upgrade |
@DennisPeriquet: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6ff37c20-0690-11ef-86e4-c1c128b91d20-0 |
@DennisPeriquet: This pull request references TRT-1576 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
696a8b8
to
efde445
Compare
Job Failure Risk Analysis for sha: efde445
|
657ec8b
to
89a3143
Compare
/test e2e-agnostic-ovn-cmd |
/test verify |
/test e2e-aws-ovn-cgroupsv2 |
Job Failure Risk Analysis for sha: b8aec3c
|
/payload-job periodic-ci-openshift-release-master-ci-4.16-e2e-vsphere-ovn-upgrade |
@DennisPeriquet: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4d404fc0-0b34-11ef-921f-8306786e2a9d-0 |
@DennisPeriquet: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
re: the last /payload with vsphere:
Those two events happened within the upgrade window (but the logs indicate no replicas, which I'm betting is why the test failed):
|
I'm not clear on why that run has an
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/openshift-origin-28735-ci-4.16-e2e-vsphere-ovn-upgrade/1787257947932332032/artifacts/e2e-vsphere-ovn-upgrade/gather-extra/artifacts/nodes.json | jq -r '.items[].metadata.name'
ci-op-6wykcgk2-d2645-7nlts-master-0
ci-op-6wykcgk2-d2645-7nlts-master-1
ci-op-6wykcgk2-d2645-7nlts-master-2
ci-op-6wykcgk2-d2645-7nlts-worker-0-6bdsp
ci-op-6wykcgk2-d2645-7nlts-worker-0-8c5wm
ci-op-6wykcgk2-d2645-7nlts-worker-0-kxfhn
And the cluster was configured for highly-available infrastructure (which includes the registry):
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/openshift-origin-28735-ci-4.16-e2e-vsphere-ovn-upgrade/1787257947932332032/artifacts/e2e-vsphere-ovn-upgrade/gather-must-gather/artifacts/must-gather.tar | tar -xOz registry-apps-build02-vmc-ci-openshift-org-ci-op-6wykcgk2-stable-sha256-e7b33149e705570ebcdcebe24c57af8336229175099fb5d53100330fd61015f1/cluster-scoped-resources/config.openshift.io/infrastructures/cluster.yaml | yaml2json | jq -r .status.infrastructureTopology
HighlyAvailable
And yet:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/openshift-origin-28735-ci-4.16-e2e-vsphere-ovn-upgrade/1787257947932332032/artifacts/e2e-vsphere-ovn-upgrade/gather-extra/artifacts/deployments.json | jq -c '.items[] | select(.metadata.name == "image-registry").spec | {replicas, strategy}'
{"replicas":1,"strategy":{"type":"Recreate"}}
I don't think the registry operator should be trying to wake the admin from sleep with an [edit: Ah, looks like the 1-replicas may be expected, and the |
For this test:
[bz-%v] clusteroperator/%v should not change condition/Available
Once the PR where the storage operator stops reporting Available status merges, we can remove the exception for it.
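To make the rule in the PR title concrete, here is a minimal Python sketch of the decision, not the PR's actual code. Every name in it (`Window`, `EXCEPTED_OPERATORS`, `should_fail`) is hypothetical. It also makes two assumptions that the thread does not confirm: that an Available=False interval counts as "in the upgrade window" only when it lies entirely inside that window, and that an excepted operator (storage, per the comment above) is excused regardless of timing. The timestamps are arbitrary illustration values, not taken from any job run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical exception list; the thread only mentions a temporary
# exception for the storage operator.
EXCEPTED_OPERATORS = {"storage"}


@dataclass(frozen=True)
class Window:
    """A closed time interval [start, end]."""
    start: datetime
    end: datetime

    def contains(self, other: "Window") -> bool:
        # Assumption: full containment, not mere overlap.
        return self.start <= other.start and other.end <= self.end


def should_fail(operator: str, unavailable: Window,
                upgrade_window: Optional[Window]) -> bool:
    """Return True if an Available=False interval should fail the test."""
    if operator in EXCEPTED_OPERATORS:
        return False  # temporarily excused (assumed unconditional)
    if upgrade_window is not None and upgrade_window.contains(unavailable):
        return False  # tolerated while the upgrade is in progress
    return True


# Illustrative timestamps only.
t0 = datetime(2024, 1, 1, 12, 0)
upgrade = Window(t0, t0 + timedelta(hours=1))
inside = Window(t0 + timedelta(minutes=10), t0 + timedelta(minutes=12))
after = Window(t0 + timedelta(hours=2), t0 + timedelta(hours=2, minutes=1))

print(should_fail("image-registry", inside, upgrade))  # False
print(should_fail("image-registry", after, upgrade))   # True
print(should_fail("storage", after, upgrade))          # False
```

Once the storage exception is no longer needed, removing it in this sketch amounts to emptying `EXCEPTED_OPERATORS`.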