osd: add option to dump pg log to pg command #46571

Merged
merged 1 commit into ceph:main on Jun 27, 2022

Conversation

Contributor

@NitzanMordhai NitzanMordhai commented Jun 8, 2022

Currently we need to stop the cluster and use ceph_objectstore_tool to dump the pg log.
With this commit, we will be able to dump pg logs with a PG command.

Fixes: https://tracker.ceph.com/issues/56153
Signed-off-by: Nitzan Mordechai nmordech@redhat.com
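
For context, a sketch of the two workflows. The ceph-objectstore-tool invocation below is the usual offline method; the name of the new online pg command is not quoted in this thread, so dump_pg_log and the example pgid and paths are placeholders.

# Old approach: the OSD hosting the pg has to be taken offline before its log can be read.
systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.0 --op log

# With this change: query the running OSD through the pg (tell) command interface.
# "dump_pg_log" is a placeholder; see the merged commit for the exact command name.
ceph tell 1.0 dump_pg_log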

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Currently we need to stop the cluster and use ceph_objectstore_tool to dump the pg log.
With this commit, we will be able to dump pg logs with a PG command.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@NitzanMordhai NitzanMordhai requested a review from a team as a code owner June 8, 2022 13:44
@github-actions github-actions bot added the core label Jun 8, 2022

f->open_object_section("op_log");
f->open_object_section("pg_log_t");
recovery_state.get_pg_log().get_log().dump(f.get());
Contributor

You probably need to be holding the pg lock here.

Contributor

Oh, this is PrimaryLogPG::do_command. Check whether the caller is already holding the lock; I think it is.

Contributor Author

Yes, osd.cc takes the pg lock before calling do_command.
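
To make the locking point concrete, a minimal sketch (not the actual diff) of how the pieces fit together, assuming the handler sits in PrimaryLogPG::do_command and OSD.cc takes the PG lock first, as described above. The command name dump_pg_log and the do_command argument list are placeholders.

// Caller side (OSD.cc, simplified): the PG lock is held around do_command,
// so the handler can read the pg log without taking the lock itself.
pg->lock();
pg->do_command(prefix, cmdmap, data, on_finish);
pg->unlock();

// Handler side (PrimaryLogPG::do_command, sketch): inside the branch handling
// the new command, dump the in-memory pg log through the formatter, mirroring
// the excerpt quoted above, then close both sections.
f->open_object_section("op_log");
f->open_object_section("pg_log_t");
recovery_state.get_pg_log().get_log().dump(f.get());
f->close_section();
f->close_section();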

@NitzanMordhai
Contributor Author

jenkins test make check

@NitzanMordhai
Contributor Author

jenkins test api

@athanatos athanatos self-requested a review June 17, 2022 16:27
Member

@neha-ojha neha-ojha left a comment

I think we should backport this. Let's create a tracker for it.

@neha-ojha
Member

jenkins test make check

@NitzanMordhai
Contributor Author

> I think we should backport this. Let's create a tracker for it.

Done: https://tracker.ceph.com/issues/56153

@kamoltat
Member

0 related failures

yuriw-2022-06-23_14:17:25-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi

jobid: [6894622, 6894626, 6894629, 6894631]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{rhel_8}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi189 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=fad4b1c200ee6a758bd948f031903dd98c630b4c TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback: check_response erasure-code didn't find erasure-code in output
tracker: NA
created_tracker: https://tracker.ceph.com/issues/56384

jobid: [6894623, 6894627, 6894630, 6894632]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6894628, 6894633, 6894631]
description: rados/upgrade/parallel/{0-random-distro$/{centos_8.stream_container_tools_crun} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi162 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback: 2022-06-23T14:59:37.981 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_suggest_complete
2022-06-23T14:59:37.981 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_list
2022-06-23T14:59:37.981 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_list_delimited
tracker: https://tracker.ceph.com/issues/55789, https://tracker.ceph.com/issues/55853
created_tracker:

DEAD job

jobid: 6894621
description: NA
failure_reason: '082ae7ef4302fa54665ed0a2535e8e254118dcfd' not found in repo: git://git.ceph.com/git/teuthology.git!
traceback:
tracker:
created_tracker:

yuriw-2022-06-23_03:08:39-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi

jobid: [6893579, 6893654, 6893734, 6893811]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{rhel_8}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi064 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=fad4b1c200ee6a758bd948f031903dd98c630b4c TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback:
tracker: https://tracker.ceph.com/issues/56384
created_tracker:

jobid: [6893583, 6893664, 6893743, 6893822, 6893822]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6893667, 6893824]
description: rados/upgrade/parallel/{0-random-distro$/{centos_8.stream_container_tools_crun} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi088 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6893596]
description: rados/mgr/{clusters/{2-node-mgr} debug/mgr mgr_ttl_cache/enable mon_election/classic random-objectstore$/{bluestore-comp-zlib} supported-random-distro$/{centos_8} tasks/prometheus}
failure_reason: "2022-06-23T03:54:30.960636+0000 mds.a (mds.0) 1 : cluster [WRN] evicting unresponsive client smithi165:z (4774), after 300.189 seconds" in cluster log
traceback:
tracker: https://tracker.ceph.com/issues/52876
created_tracker:

jobid: [6893630]
description: rados/cephadm/workunits/{agent/on mon_election/connectivity task/test_nfs}
failure_reason: Test failure: test_create_delete_cluster_idempotency (tasks.cephfs.test_nfs.TestNFS)
traceback: cephadm exited with an error code: 1, stderr: ERROR: Daemon not found: mds.a.smithi059.tgjbmj. See cephadm ls
tracker: https://tracker.ceph.com/issues/56000
created_tracker:

@yuriw yuriw merged commit 1b2fb99 into ceph:main Jun 27, 2022
6 participants