
osd/scrub: late-arriving reservation grants are not an error #46860

Merged
merged 1 commit into ceph:main on Jul 1, 2022

Conversation

ronen-fr
Contributor

@ronen-fr ronen-fr commented Jun 27, 2022

... as, barring a bug, these are simply the successful grants
received after one replica had failed to secure the required
resources.

Fixes: https://tracker.ceph.com/issues/56400

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
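
To illustrate the behaviour being changed, here is a minimal, self-contained sketch (the class and member names below are hypothetical, not the actual Ceph scrub-reservation code): once one replica denies its reservation the attempt is abandoned, so a grant that arrives afterwards is expected and should be logged at debug level rather than as an error.

```cpp
// Hypothetical sketch only -- names do not match the real Ceph implementation.
#include <iostream>
#include <string>

enum class ReservationState { Pending, Granted, Aborted };

class ReservationTracker {
  ReservationState state_ = ReservationState::Pending;
  int granted_ = 0;
  int required_ = 0;

 public:
  explicit ReservationTracker(int required) : required_(required) {}

  // One replica could not secure the resources: give up on this scrub attempt.
  void handle_denial(const std::string& replica) {
    std::cout << "dbg: reservation denied by " << replica << ", aborting\n";
    state_ = ReservationState::Aborted;
  }

  // A replica granted its reservation.
  void handle_grant(const std::string& replica) {
    if (state_ == ReservationState::Aborted) {
      // Before the fix this path was treated as an error. Barring a bug, it is
      // simply a grant that was already in flight when another replica's
      // denial made us abandon the attempt.
      std::cout << "dbg: ignoring late reservation grant from " << replica << "\n";
      return;
    }
    if (++granted_ == required_) {
      state_ = ReservationState::Granted;
      std::cout << "dbg: all replicas reserved, scrub may proceed\n";
    }
  }
};

int main() {
  ReservationTracker tracker{2};
  tracker.handle_denial("osd.3");  // one replica fails to secure resources
  tracker.handle_grant("osd.5");   // late grant: expected, logged at debug level
}
```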

@github-actions github-actions bot added the core label Jun 27, 2022
@ronen-fr ronen-fr added bug-fix needs-quincy-backport backport required for quincy needs-pacific-backport PR needs a pacific backport and removed core labels Jun 27, 2022
@ronen-fr
Contributor Author

Will update with the tracker number once the site is available.

@ronen-fr ronen-fr marked this pull request as ready for review June 27, 2022 16:15
@ronen-fr ronen-fr requested a review from a team as a code owner June 27, 2022 16:16
@kamoltat
Member

PR is good to go.

0 related failures, 0 related dead jobs.

yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi

Fail

jobid: [6907395]
description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/bluestore-hybrid rados tasks/rados_api_tests validater/valgrind}
failure_reason: Command failed (workunit test rados/test.sh) on smithi097 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 ALLOW_TIMEOUTS=1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
traceback: 2022-06-30T15:47:37.407 INFO:tasks.ceph.osd.2.smithi097.stderr:2022-06-30T15:47:37.403+0000 11cdc700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=tcmalloc --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.2.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 2 (PID: 46502) UID: 0
tracker: https://tracker.ceph.com/issues/55001
created_tracker:

jobid: [6907396]
description: rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/few objectstore/bluestore-low-osd-mem-target rados recovery-overrides/{more-async-recovery} supported-random-distro$/{rhel_8} thrashers/fastread thrashosds-health workloads/ec-rados-plugin=jerasure-k=4-m=2}
failure_reason: Command failed on smithi080 with status 123: "sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --"
traceback: 2022-06-30T15:27:49.882 ERROR:teuthology.run_tasks:Manager failed: ceph
tracker: https://tracker.ceph.com/issues/50868
created_tracker:

jobid: [6907397, 6907402, 6907405, 6907411]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{ubuntu_latest}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi162 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback: 2022-06-30T15:27:43.795 INFO:tasks.workunit.client.0.smithi162.stderr:Error EINVAL: RADOS pool 'fs_data' is already used by filesystem 'cephfs' as a 'data' pool for application 'cephfs'
tracker: https://tracker.ceph.com/issues/56384
created_tracker:

jobid: [6907398, 6907403, 6907406, 6907412]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6907404, 6907413]
description: rados/upgrade/parallel/{0-random-distro$/{rhel_8.6_container_tools_rhel8} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi043 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback:
tracker: https://tracker.ceph.com/issues/55853
created_tracker:

Dead

jobid: [6907408, 6907414, 6907415]
description: rados/cephadm/osds/{0-distro/rhel_8.6_container_tools_rhel8 0-nvme-loop 1-start 2-ops/rm-zap-add}
failure_reason: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi008.front.sepia.ceph.com,smithi111.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-06-30T15:07:00.184 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6907408/ansible_failures.yaml
traceback: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit
tracker: https://tracker.ceph.com/issues/56391
created_tracker:

yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi

Fail

jobid: [6905499]
description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/bluestore-hybrid rados tasks/rados_api_tests validater/valgrind}
failure_reason: Command failed (workunit test rados/test.sh) on smithi097 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 ALLOW_TIMEOUTS=1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
traceback: 2022-06-30T15:47:37.407 INFO:tasks.ceph.osd.2.smithi097.stderr:2022-06-30T15:47:37.403+0000 11cdc700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=tcmalloc --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.2.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 2 (PID: 46502) UID: 0
tracker: https://tracker.ceph.com/issues/55001
created_tracker:

jobid: [6905507, 6905602, 6905682, 6905785]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{ubuntu_latest}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi162 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback: 2022-06-30T15:27:43.795 INFO:tasks.workunit.client.0.smithi162.stderr:Error EINVAL: RADOS pool 'fs_data' is already used by filesystem 'cephfs' as a 'data' pool for application 'cephfs'
tracker: https://tracker.ceph.com/issues/56384
created_tracker:

jobid: [6905513, 6905691, 6905796]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6905523] (bug has been re-opened after a long time)
description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{centos_8} tasks/scrub_test}
failure_reason: NA
traceback:
2022-06-29T15:47:45.037 INFO:teuthology.orchestra.run.smithi183.stderr:instructing pg 2.7 on osd.3 to repair
2022-06-29T15:47:46.202 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.198+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 soid 2:e01b7490:::benchmark_data_smithi183_51804_object653:head : data_digest 0x51047cec != data_digest 0xd0c168d4 from shard 3
2022-06-29T15:47:46.202 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.198+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 shard 3 soid 2:e01b7490:::benchmark_data_smithi183_51804_object653:head : data_digest 0xd0c168d4 != data_digest 0x51047cec from auth oi 2:e01b7490:::benchmark_data_smithi183_51804_object653:head(23'367 osd.3.0:1864 dirty|omap|data_digest|omap_digest s 4096 uv 366 dd 51047cec od 3d709c88 alloc_hint [4096 4096 53])
2022-06-29T15:47:46.335 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.330+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 repair 0 missing, 1 inconsistent objects
2022-06-29T15:47:46.335 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.330+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 repair 0 missing, 1 inconsistent objects
2022-06-29T15:47:46.336 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.330+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 repair 2 errors, 1 fixed
tracker: https://tracker.ceph.com/issues/50242
created_tracker:

jobid: [6905537]
description: rados/standalone/{supported-random-distro$/{rhel_8} workloads/osd}
failure_reason: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
traceback: 2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:547: delete_pool: ceph osd pool delete test test --yes-i-really-really-mean-it
2022-06-13T17:08:10.829 INFO:tasks.workunit.client.0.smithi121.stderr:pool 'test' does not exist
tracker: https://tracker.ceph.com/issues/56034
created_tracker:

jobid: [6905612, 6905798]
description: rados/upgrade/parallel/{0-random-distro$/{rhel_8.6_container_tools_rhel8} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi043 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback:
tracker: https://tracker.ceph.com/issues/55853
created_tracker:

Dead

jobid: [6905506, 6905713, 6905714, 6905809, 6905845, 6905856, 6905860]
description: rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/few objectstore/bluestore-low-osd-mem-target rados recovery-overrides/{more-async-recovery} supported-random-distro$/{rhel_8} thrashers/fastread thrashosds-health workloads/ec-rados-plugin=jerasure-k=4-m=2}
failure_reason: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi008.front.sepia.ceph.com,smithi111.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-06-30T15:07:00.184 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6907408/ansible_failures.yaml
traceback: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit
tracker: https://tracker.ceph.com/issues/56391
created_tracker:

jobid: [6905710]
description: rados/perf/{ceph mon_election/classic objectstore/bluestore-low-osd-mem-target openstack scheduler/dmclock_1Shard_16Threads settings/optimized ubuntu_latest workloads/radosbench_4M_write}
failure_reason: {'smithi097.front.sepia.ceph.com': {'_ansible_no_log': False, 'cache_update_time': 1656534220, 'cache_updated': False, 'changed': False, 'invocation': {'module_args': {'allow_unauthenticated': False, 'autoclean': False, 'autoremove': False, 'cache_valid_time': 0, 'deb': None, 'default_release': None, 'dpkg_options': 'force-confdef,force-confold', 'force': True, 'force_apt_get': False, 'install_recommends': None, 'name': ['apt', 'apache2'], 'only_upgrade': False, 'package': ['apt', 'apache2'], 'policy_rc_d': None, 'purge': False, 'state': 'latest', 'update_cache': None, 'update_cache_retries': 5, 'update_cache_retry_max_delay': 12, 'upgrade': None}}, 'msg': ''/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" --force-yes install 'apache2'' failed: W: --force-yes is deprecated, use one of the options starting with --allow instead.\nE: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 4727 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n', 'rc': 100, 'stderr': 'W: --force-yes is deprecated, use one of the options starting with --allow instead.\nE: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 4727 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n', 'stderr_lines': ['W: --force-yes is deprecated, use one of the options starting with --allow instead.', 'E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 4727 (apt-get)', 'E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?'], 'stdout': '', 'stdout_lines': []}}
traceback:
tracker: https://tracker.ceph.com/issues/56395
created_tracker:

@ronen-fr
Contributor Author

ronen-fr commented Jul 1, 2022

I will be merging based on the QA tests above. There is no need to wait for the upgrade tests for this change.

@ronen-fr ronen-fr merged commit ab3e72b into ceph:main Jul 1, 2022