
qa/cephadm: start upgrade tests from quincy #52881

Merged
merged 3 commits into ceph:main on Sep 11, 2023

Conversation

adk3798
Contributor

@adk3798 adk3798 commented Aug 8, 2023

Now that reef is released, on main we should only need to start our upgrade tests from quincy. This PR changes the starting version of the upgrade tests that run as part of the orch/cephadm suite (although one is symlinked from the fs suite).
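
For context, the starting point of these upgrade suites is pinned in a small yaml fragment that bootstraps a cluster on the old release and then upgrades it to the build under test. A rough sketch of the kind of fragment involved, with field names modeled on existing suite files (illustrative assumptions only, not this PR's actual diff):

tasks:
- cephadm:
    # bootstrap the initial cluster from a quincy (v17.2.x) image rather than a
    # pacific (v16.2.x) one; this starting version is what the PR bumps
    image: quay.io/ceph/ceph:v17.2.0
    cephadm_branch: v17.2.0
    cephadm_git_url: https://github.com/ceph/ceph
- cephadm.shell:
    env: [sha1]
    mon.a:
      # then upgrade to the build under test
      - ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1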


@adk3798 adk3798 added cephfs Ceph File System tests cephadm labels Aug 8, 2023
@adk3798 adk3798 requested a review from a team as a code owner August 8, 2023 16:37
@adk3798
Contributor Author

adk3798 commented Aug 8, 2023

@dparmar18 @batrick I made a best effort attempt at updating the start version for the mds upgrade sequence tests. Would appreciate feedback on the commit updating those tests.

@adk3798 adk3798 requested a review from dparmar18 August 8, 2023 16:39
@batrick batrick requested a review from a team August 8, 2023 19:21
Member

@batrick batrick left a comment

Please also add qa/suites/fs/upgrade/mds_upgrade_sequence/tasks/0-from/reef.yaml too.

We should have had quincy already but it was forgotten. Let's not forget reef :)

@adk3798
Contributor Author

adk3798 commented Aug 9, 2023

Please also add qa/suites/fs/upgrade/mds_upgrade_sequence/tasks/0-from/reef.yaml too.

We should have had quincy already but it was forgotten. Let's not forget reef :)

Added in a reef yaml.
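
For reference, a 0-from fragment like that pins the install branch and the container image the cluster is bootstrapped from before the upgrade runs. A rough sketch of what the new reef yaml could look like, modeled on the shape of the existing 0-from files and the diff discussed further down (values are assumptions, not the file's exact contents):

tasks:
- install:
    branch: reef                            # install reef packages on the nodes first
- print: "**** done installing reef"
- cephadm:
    image: quay.ceph.io/ceph-ci/ceph:reef   # bootstrap the cluster from a reef container
    roleless: true
    compiled_cephadm_branch: reef           # pull in the compiled reef cephadm package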

@adk3798
Contributor Author

adk3798 commented Aug 15, 2023

The regular upgrade and mgr-nfs-upgrade tests are working here, but mds_upgrade_sequence is not. The failure from reef is something on the cephadm side: we need to implement a way to pull in the new compiled cephadm package, as the current curl from git doesn't work for reef onward. For the quincy start point it seems the umount it does at the end (e.g. sudo umount /home/ubuntu/cephtest/mnt.0) is hanging the same way it did when the start point of the test was pacific. That will have to be looked at and fixed as a separate issue from this PR, I think.

Some of the failed quincy-start-point mds_upgrade_sequence runs:
https://pulpito.ceph.com/adking-2023-08-15_13:44:09-orch:cephadm-wip-adk-testing-2023-08-14-1902-distro-default-smithi/7368074/
https://pulpito.ceph.com/adking-2023-08-15_13:44:09-orch:cephadm-wip-adk-testing-2023-08-14-1902-distro-default-smithi/7368099

@dparmar18
Contributor

For the quincy start point it seems the umount it does at the end (e.g. sudo umount /home/ubuntu/cephtest/mnt.0) is hanging the same way it did when the start point of the test was pacific.

Do we have any tracker for this or any logs to look at?

@adk3798
Contributor Author

adk3798 commented Aug 16, 2023

For the quincy start point it seems the umount it does at the end (e.g. sudo umount /home/ubuntu/cephtest/mnt.0) is hanging the same way it did when the start point of the test was pacific.

Do we have any tracker for this or any logs to look at?

Haven't made a tracker yet. The only logs would be what you can get from the runs I linked.

- cephadm:
    image: quay.ceph.io/ceph-ci/ceph:reef
    roleless: true
    compiled_cephadm_branch: reef
Contributor Author

The good news is that this is actually working in terms of pulling in the reef binary. The bad news is that the upgrade still fails with:

Upgrade: Paused due to UPGRADE_BAD_TARGET_VERSION: Upgrade: cannot upgrade/downgrade to 18.0.0-5596-gdb1309a8

I think that's because it considers an upgrade from 18.2.0 -> 18.0.0 an unsupported downgrade. Pretty sure we didn't add quincy as the new base point for upgrades in the reef cycle until main was already reporting v18, so we didn't hit this issue then. Will need to see if there's some workaround we can use for this case.

Contributor Author

The mds_upgrade_sequence from quincy passed as well: https://pulpito.ceph.com/adking-2023-08-19_17:52:39-orch:cephadm-wip-adk-testing-2023-08-19-1107-distro-default-smithi/7373907. We could also consider dropping the reef start point temporarily to get the rest of this in, and then come back when we either have a good workaround or main starts reporting v19.

Contributor Author

Decided to move the reef start point work into #53105 so that we can get this through and at least have the upgrades start from quincy instead of pacific.

We're now past the reef release, so main is now what
will become squid and we should only be testing upgrades
to squid from quincy onward

Signed-off-by: Adam King <adking@redhat.com>

Now that we're post reef release, the upgrade tests
on main should be starting their upgrades from quincy
rather than pacific

Signed-off-by: Adam King <adking@redhat.com>

Now that reef has been released, on main we
only need to test upgrades starting from quincy
and upgrades from pacific are no longer valid

Signed-off-by: Adam King <adking@redhat.com>
@adk3798 adk3798 removed the DNM label Aug 23, 2023
@adk3798
Contributor Author

adk3798 commented Sep 11, 2023

https://pulpito.ceph.com/adking-2023-09-07_12:40:41-orch:cephadm-wip-adk-testing-2023-09-06-1611-distro-default-smithi/

3 failures

  • 1 failure in the test_nfs task. This test had been blocked from running properly for a while due to https://tracker.ceph.com/issues/55986, which was recently resolved. It seems it's just generally a bit broken at the moment and will need some more work, but that shouldn't block merging the set of PRs in the run.
  • 1 failure deploying jaeger-tracing. Known issue https://tracker.ceph.com/issues/59704
  • 1 strange failure in the mgr-nfs-upgrade sequence. It was failing while redeploying the first mgr as part of the upgrade. Interactive reruns allowed me to find that the issue was:
2023-09-08 19:03:19,673 7f017b1f1b80 DEBUG Determined image: 'quay.ceph.io/ceph-ci/ceph@sha256:29eb1b22bdc86e11facd8e3b821e546994d614ae2a0aec9d47234c7aede558d5'
2023-09-08 19:03:19,693 7f017b1f1b80 INFO Redeploy daemon mgr.smithi012.wqsagl ...
2023-09-08 19:06:22,875 7f017b1f1b80 INFO Non-zero exit code 1 from systemctl daemon-reload
2023-09-08 19:06:22,875 7f017b1f1b80 INFO systemctl: stderr Failed to reload daemon: Connection timed out

which is particularly odd because systemctl daemon-reload isn't even a command specific to the mgr's systemd unit. If it had been failing while starting the systemd unit for the mgr, it could maybe be traced back to something with the mgr in the current build, but for whatever reason it was timing out during the daemon-reload. I would have considered it a weird one-off if it weren't for the fact that it reproduced 3 times in a row. Not really sure what to make of it, but either way I don't think we should hold up other PRs merging for it. It will just need some more investigation in the future.

Overall, I think we can merge the PRs from the run.

@adk3798 adk3798 merged commit 2b83983 into ceph:main Sep 11, 2023
10 of 12 checks passed
@ljflores
Contributor

@adk3798 can you check if this PR is causing this bug? https://tracker.ceph.com/issues/63778
