osd: add option to dump pg log to pg command #46571

Merged
merged 1 commit into ceph:main on Jun 27, 2022

Conversation

Contributor

@NitzanMordhai NitzanMordhai commented Jun 8, 2022

Currently we need to stop the cluster and use ceph_objectstore_tool to dump the pg log.
With this commit, we will be able to dump pg logs with a PG command.

Fixes: https://tracker.ceph.com/issues/56153
Signed-off-by: Nitzan Mordechai nmordech@redhat.com
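
For context, a sketch of the two workflows. The ceph-objectstore-tool invocation below is the usual offline method; the name of the new online pg command is not quoted in this thread, so dump_pg_log and the example pgid and paths are placeholders.

# Old approach: the OSD hosting the pg has to be taken offline before its log can be read.
systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.0 --op log

# With this change: query the running OSD through the pg (tell) command interface.
# "dump_pg_log" is a placeholder; see the merged commit for the exact command name.
ceph tell 1.0 dump_pg_log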

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Currently we need to stop the cluster and use ceph_objectstore_tool to dump the pg log.
With this commit, we will be able to dump pg logs with a PG command.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@NitzanMordhai NitzanMordhai requested a review from a team as a code owner June 8, 2022 13:44
@github-actions github-actions bot added the core label Jun 8, 2022

f->open_object_section("op_log");
f->open_object_section("pg_log_t");
recovery_state.get_pg_log().get_log().dump(f.get());
Contributor

You probably need to be holding the pg lock here.

Contributor

Oh, this is PrimaryLogPG::do_command. Check whether the caller is already holding the lock; I think it is.

Contributor Author

Yes, osd.cc takes the pg lock before calling do_command.
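
To make the locking point concrete, a minimal sketch (not the actual diff) of how the pieces fit together, assuming the handler sits in PrimaryLogPG::do_command and OSD.cc takes the PG lock first, as described above. The command name dump_pg_log and the do_command argument list are placeholders.

// Caller side (OSD.cc, simplified): the PG lock is held around do_command,
// so the handler can read the pg log without taking the lock itself.
pg->lock();
pg->do_command(prefix, cmdmap, data, on_finish);
pg->unlock();

// Handler side (PrimaryLogPG::do_command, sketch): inside the branch handling
// the new command, dump the in-memory pg log through the formatter, mirroring
// the excerpt quoted above, then close both sections.
f->open_object_section("op_log");
f->open_object_section("pg_log_t");
recovery_state.get_pg_log().get_log().dump(f.get());
f->close_section();
f->close_section();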

@NitzanMordhai
Contributor Author

jenkins test make check

@NitzanMordhai
Contributor Author

jenkins test api

@athanatos athanatos self-requested a review June 17, 2022 16:27
Member

@neha-ojha neha-ojha left a comment

I think we should backport this. Let's create a tracker for it.

@neha-ojha
Member

jenkins test make check

@NitzanMordhai
Contributor Author

> I think we should backport this. Let's create a tracker for it.

Done: https://tracker.ceph.com/issues/56153

@kamoltat
Member

0 related failures

yuriw-2022-06-23_14:17:25-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi

jobid: [6894622, 6894626, 6894629, 6894631]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{rhel_8}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi189 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=fad4b1c200ee6a758bd948f031903dd98c630b4c TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback: check_response erasure-code didn't find erasure-code in output
tracker: NA
created_tracker: https://tracker.ceph.com/issues/56384

jobid: [6894623, 6894627, 6894630, 6894632]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6894628, 6894633, 6894631]
description: rados/upgrade/parallel/{0-random-distro$/{centos_8.stream_container_tools_crun} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi162 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback: 2022-06-23T14:59:37.981 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_suggest_complete
2022-06-23T14:59:37.981 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_list
2022-06-23T14:59:37.981 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_list_delimited
tracker: https://tracker.ceph.com/issues/55789, https://tracker.ceph.com/issues/55853
created_tracker:

DEAD job

jobid: 6894621
description: NA
failure_reason: '082ae7ef4302fa54665ed0a2535e8e254118dcfd' not found in repo: git://git.ceph.com/git/teuthology.git!
traceback:
tracker:
created_tracker:

yuriw-2022-06-23_03:08:39-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi

jobid: [6893579, 6893654, 6893734, 6893811]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{rhel_8}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi064 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=fad4b1c200ee6a758bd948f031903dd98c630b4c TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback:
tracker: https://tracker.ceph.com/issues/56384
created_tracker:

jobid: [6893583, 6893664, 6893743, 6893822, 6893822]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6893667, 6893824]
description: rados/upgrade/parallel/{0-random-distro$/{centos_8.stream_container_tools_crun} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi088 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6893596]
description: rados/mgr/{clusters/{2-node-mgr} debug/mgr mgr_ttl_cache/enable mon_election/classic random-objectstore$/{bluestore-comp-zlib} supported-random-distro$/{centos_8} tasks/prometheus}
failure_reason: "2022-06-23T03:54:30.960636+0000 mds.a (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi165:z (4774), after 300.189 seconds" in cluster log
traceback:
tracker: https://tracker.ceph.com/issues/52876
created_tracker:

jobid: [6893630]
description: rados/cephadm/workunits/{agent/on mon_election/connectivity task/test_nfs}
failure_reason: Test failure: test_create_delete_cluster_idempotency (tasks.cephfs.test_nfs.TestNFS)
traceback: cephadm exited with an error code: 1, stderr: ERROR: Daemon not found: mds.a.smithi059.tgjbmj. See cephadm ls
tracker: https://tracker.ceph.com/issues/56000
created_tracker:

@yuriw yuriw merged commit 1b2fb99 into ceph:main Jun 27, 2022
6 participants