
reef: mgr/cephadm: also don't write client files/tuned profiles to maintenance hosts #53111

Merged
adk3798 merged 1 commit into ceph:reef on Aug 31, 2023


adk3798 (Contributor) commented Aug 23, 2023

backport tracker: https://tracker.ceph.com/issues/61549


backport of #51347
parent tracker: https://tracker.ceph.com/issues/59650

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh


Since maintenance hosts may have been taken offline for maintenance,
they should be treated the same as offline hosts in this
case, and we should avoid trying to write files to them.

Fixes: https://tracker.ceph.com/issues/59650

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 4244d83)
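In effect, the change widens the set of host states that are skipped when writing client files and tuned profiles. The following is a minimal Python sketch of that idea, assuming a hypothetical `Host` record carrying a `HostStatus` value. The names `HostStatus`, `Host`, `SKIP_FILE_WRITES`, and `hosts_eligible_for_file_writes` are illustrative only and are not cephadm's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class HostStatus(Enum):
    # Hypothetical status values for illustration; cephadm tracks
    # offline and maintenance hosts differently in its real code.
    ONLINE = "online"
    OFFLINE = "offline"
    MAINTENANCE = "maintenance"


@dataclass
class Host:
    name: str
    status: HostStatus


# Host states that should never receive client files or tuned profiles.
# Before the fix only OFFLINE would be skipped; the fix adds MAINTENANCE.
SKIP_FILE_WRITES = {HostStatus.OFFLINE, HostStatus.MAINTENANCE}


def hosts_eligible_for_file_writes(hosts):
    """Return the names of hosts that files may safely be written to.

    Maintenance hosts are treated like offline hosts, because they may
    have been taken down for maintenance.
    """
    return [h.name for h in hosts if h.status not in SKIP_FILE_WRITES]


if __name__ == "__main__":
    hosts = [
        Host("host1", HostStatus.ONLINE),
        Host("host2", HostStatus.OFFLINE),
        Host("host3", HostStatus.MAINTENANCE),
    ]
    print(hosts_eligible_for_file_writes(hosts))  # ['host1']
```

Keeping the skipped states in a single set means any future "unreachable" state only needs to be added in one place rather than at every file-writing call site.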
adk3798 requested a review from a team as a code owner on August 23, 2023 20:04
adk3798 added this to the reef milestone on Aug 23, 2023
phlogistonjohn (Contributor) left a comment:


Looks OK for backport.

adk3798 (Contributor, Author) commented Aug 31, 2023

https://pulpito.ceph.com/adking-2023-08-28_16:10:32-orch:cephadm-wip-adk4-testing-2023-08-23-1835-reef-distro-default-smithi/

Reruns of failed/dead jobs: https://pulpito.ceph.com/adking-2023-08-29_11:42:05-orch:cephadm-wip-adk4-testing-2023-08-23-1835-reef-distro-default-smithi/

After reruns, 10 jobs failed:

  • 4 failures because "ceph orch upgrade ls" doesn't currently work on reef. This is a known issue, but I haven't opened a tracker for it.
  • 1 failure deploying jaeger-tracing; known issue: https://tracker.ceph.com/issues/59704
  • 4 failures in nfs-rgw-ingress tests. This seems to be an issue with ganesha loading the RGW FSAL:
    Aug 30 14:59:25 smithi022 conmon[160477]: 30/08/2023 14:59:25 : epoch 64ef5941 : smithi022 : ganesha.nfsd-7[main] load_fsal :NFS STARTUP :DEBUG :Loading FSAL RGW with /usr/lib64/ganesha/libfsalrgw.so
    Aug 30 14:59:25 smithi022 conmon[160477]: free(): invalid pointer

    The issue also appeared in a small run of these tests directly on the reef branch with no extra patches, so it's unrelated to any PRs in this run.

  • 1 failure due to "smithi146 kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:" in syslog. It doesn't look related to any PRs in the run.

adk3798 merged commit 82fea27 into ceph:reef on Aug 31, 2023
10 of 11 checks passed
2 participants