
osd/scrub: late-arriving reservation grants are not an error #46860

Merged
merged 1 commit into ceph:main on Jul 1, 2022

Conversation

ronen-fr
Contributor

@ronen-fr ronen-fr commented Jun 27, 2022

... as, barring a bug, these are simply the successful grants
received after one replica had failed to secure the required
resources.

Fixes: https://tracker.ceph.com/issues/56400

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
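
To illustrate the behaviour being changed, here is a minimal, self-contained sketch (the class and member names below are hypothetical, not the actual Ceph scrub-reservation code): once one replica denies its reservation the attempt is abandoned, so a grant that arrives afterwards is expected and should be logged at debug level rather than as an error.

```cpp
// Hypothetical sketch only -- names do not match the real Ceph implementation.
#include <iostream>
#include <string>

enum class ReservationState { Pending, Granted, Aborted };

class ReservationTracker {
  ReservationState state_ = ReservationState::Pending;
  int granted_ = 0;
  int required_ = 0;

 public:
  explicit ReservationTracker(int required) : required_(required) {}

  // One replica could not secure the resources: give up on this scrub attempt.
  void handle_denial(const std::string& replica) {
    std::cout << "dbg: reservation denied by " << replica << ", aborting\n";
    state_ = ReservationState::Aborted;
  }

  // A replica granted its reservation.
  void handle_grant(const std::string& replica) {
    if (state_ == ReservationState::Aborted) {
      // Before the fix this path was treated as an error. Barring a bug, it is
      // simply a grant that was already in flight when another replica's
      // denial made us abandon the attempt.
      std::cout << "dbg: ignoring late reservation grant from " << replica << "\n";
      return;
    }
    if (++granted_ == required_) {
      state_ = ReservationState::Granted;
      std::cout << "dbg: all replicas reserved, scrub may proceed\n";
    }
  }
};

int main() {
  ReservationTracker tracker{2};
  tracker.handle_denial("osd.3");  // one replica fails to secure resources
  tracker.handle_grant("osd.5");   // late grant: expected, logged at debug level
}
```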

@github-actions github-actions bot added the core label Jun 27, 2022
@ronen-fr ronen-fr added bug-fix needs-quincy-backport backport required for quincy needs-pacific-backport PR needs a pacific backport and removed core labels Jun 27, 2022
@ronen-fr
Contributor Author

Will update with the tracker number once the site is available.

@ronen-fr ronen-fr marked this pull request as ready for review June 27, 2022 16:15
@ronen-fr ronen-fr requested a review from a team as a code owner June 27, 2022 16:16
@kamoltat
Member

PR is good to go.

0 related failures, 0 related dead jobs.

yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi

Fail

jobid: [6907395]
description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/bluestore-hybrid rados tasks/rados_api_tests validater/valgrind}
failure_reason: Command failed (workunit test rados/test.sh) on smithi097 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 ALLOW_TIMEOUTS=1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
traceback: 2022-06-30T15:47:37.407 INFO:tasks.ceph.osd.2.smithi097.stderr:2022-06-30T15:47:37.403+0000 11cdc700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=tcmalloc --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.2.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 2 (PID: 46502) UID: 0
tracker: https://tracker.ceph.com/issues/55001
created_tracker:

jobid: [6907396]
description: rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/few objectstore/bluestore-low-osd-mem-target rados recovery-overrides/{more-async-recovery} supported-random-distro$/{rhel_8} thrashers/fastread thrashosds-health workloads/ec-rados-plugin=jerasure-k=4-m=2}
failure_reason: Command failed on smithi080 with status 123: "sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --"
traceback: 2022-06-30T15:27:49.882 ERROR:teuthology.run_tasks:Manager failed: ceph
tracker: https://tracker.ceph.com/issues/50868
created_tracker:

jobid: [6907397, 6907402, 6907405, 6907411]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{ubuntu_latest}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi162 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback: 2022-06-30T15:27:43.795 INFO:tasks.workunit.client.0.smithi162.stderr:Error EINVAL: RADOS pool 'fs_data' is already used by filesystem 'cephfs' as a 'data' pool for application 'cephfs'
tracker: https://tracker.ceph.com/issues/56384
created_tracker:

jobid: [6907398, 6907403, 6907406, 6907412]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6907404, 6907413]
description: rados/upgrade/parallel/{0-random-distro$/{rhel_8.6_container_tools_rhel8} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi043 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback:
tracker: https://tracker.ceph.com/issues/55853
created_tracker:

Dead

jobid: [6907408, 6907414, 6907415]
description: rados/cephadm/osds/{0-distro/rhel_8.6_container_tools_rhel8 0-nvme-loop 1-start 2-ops/rm-zap-add}
failure_reason: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi008.front.sepia.ceph.com,smithi111.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-06-30T15:07:00.184 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6907408/ansible_failures.yaml
traceback: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit
tracker: https://tracker.ceph.com/issues/56391
created_tracker:

yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi

Fail

jobid: [6905499]
description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/bluestore-hybrid rados tasks/rados_api_tests validater/valgrind}
failure_reason: Command failed (workunit test rados/test.sh) on smithi097 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 ALLOW_TIMEOUTS=1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
traceback: 2022-06-30T15:47:37.407 INFO:tasks.ceph.osd.2.smithi097.stderr:2022-06-30T15:47:37.403+0000 11cdc700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=tcmalloc --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.2.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 2 (PID: 46502) UID: 0
tracker: https://tracker.ceph.com/issues/55001
created_tracker:

jobid: [6905507, 6905602, 6905682, 6905785]
description: rados/singleton-bluestore/{all/cephtool mon_election/classic msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{ubuntu_latest}}
failure_reason: Command failed (workunit test cephtool/test.sh) on smithi162 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=5db9f98e99cd80cc5f53c9288413fef1214b4b35 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh'
traceback: 2022-06-30T15:27:43.795 INFO:tasks.workunit.client.0.smithi162.stderr:Error EINVAL: RADOS pool 'fs_data' is already used by filesystem 'cephfs' as a 'data' pool for application 'cephfs'
tracker: https://tracker.ceph.com/issues/56384
created_tracker:

jobid: [6905513, 6905691, 6905796]
description: rados/rook/smoke/{0-distro/ubuntu_20.04 0-kubeadm 0-nvme-loop 1-rook 2-workload/radosbench cluster/1-node k8s/1.21 net/host rook/master}
failure_reason: 'wait for operator' reached maximum tries (90) after waiting for 900 seconds
traceback:
tracker: https://tracker.ceph.com/issues/52321
created_tracker:

jobid: [6905523] (bug has been re-opened after a long time)
description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{centos_8} tasks/scrub_test}
failure_reason: NA
traceback:
2022-06-29T15:47:45.037 INFO:teuthology.orchestra.run.smithi183.stderr:instructing pg 2.7 on osd.3 to repair
2022-06-29T15:47:46.202 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.198+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 soid 2:e01b7490:::benchmark_data_smithi183_51804_object653:head : data_digest 0x51047cec != data_digest 0xd0c168d4 from shard 3
2022-06-29T15:47:46.202 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.198+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 shard 3 soid 2:e01b7490:::benchmark_data_smithi183_51804_object653:head : data_digest 0xd0c168d4 != data_digest 0x51047cec from auth oi 2:e01b7490:::benchmark_data_smithi183_51804_object653:head(23'367 osd.3.0:1864 dirty|omap|data_digest|omap_digest s 4096 uv 366 dd 51047cec od 3d709c88 alloc_hint [4096 4096 53])
2022-06-29T15:47:46.335 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.330+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 repair 0 missing, 1 inconsistent objects
2022-06-29T15:47:46.335 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.330+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 repair 0 missing, 1 inconsistent objects
2022-06-29T15:47:46.336 INFO:tasks.ceph.osd.3.smithi183.stderr:2022-06-29T15:47:46.330+0000 7f6901ace700 -1 log_channel(cluster) log [ERR] : 2.7 repair 2 errors, 1 fixed
tracker: https://tracker.ceph.com/issues/50242
created_tracker:

jobid: [6905537]
description: rados/standalone/{supported-random-distro$/{rhel_8} workloads/osd}
failure_reason: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
traceback: 2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:547: delete_pool: ceph osd pool delete test test --yes-i-really-really-mean-it
2022-06-13T17:08:10.829 INFO:tasks.workunit.client.0.smithi121.stderr:pool 'test' does not exist
tracker: https://tracker.ceph.com/issues/56034
created_tracker:

jobid: [6905612, 6905798]
description: rados/upgrade/parallel/{0-random-distro$/{rhel_8.6_container_tools_rhel8} 0-start 1-tasks mon_election/classic upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
failure_reason: Command failed (workunit test cls/test_cls_rgw.sh) on smithi043 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=pacific TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_rgw.sh'
traceback:
tracker: https://tracker.ceph.com/issues/55853
created_tracker:

Dead

jobid: [6905506, 6905713, 6905714, 6905809, 6905845, 6905856, 6905860]
description: rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/few objectstore/bluestore-low-osd-mem-target rados recovery-overrides/{more-async-recovery} supported-random-distro$/{rhel_8} thrashers/fastread thrashosds-health workloads/ec-rados-plugin=jerasure-k=4-m=2}
failure_reason: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi008.front.sepia.ceph.com,smithi111.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-06-30T15:07:00.184 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6907408/ansible_failures.yaml
traceback: 2022-06-30T15:02:57.869 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit
tracker: https://tracker.ceph.com/issues/56391
created_tracker:

jobid: [6905710]
description: rados/perf/{ceph mon_election/classic objectstore/bluestore-low-osd-mem-target openstack scheduler/dmclock_1Shard_16Threads settings/optimized ubuntu_latest workloads/radosbench_4M_write}
failure_reason: {'smithi097.front.sepia.ceph.com': {'_ansible_no_log': False, 'cache_update_time': 1656534220, 'cache_updated': False, 'changed': False, 'invocation': {'module_args': {'allow_unauthenticated': False, 'autoclean': False, 'autoremove': False, 'cache_valid_time': 0, 'deb': None, 'default_release': None, 'dpkg_options': 'force-confdef,force-confold', 'force': True, 'force_apt_get': False, 'install_recommends': None, 'name': ['apt', 'apache2'], 'only_upgrade': False, 'package': ['apt', 'apache2'], 'policy_rc_d': None, 'purge': False, 'state': 'latest', 'update_cache': None, 'update_cache_retries': 5, 'update_cache_retry_max_delay': 12, 'upgrade': None}}, 'msg': ''/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" --force-yes install 'apache2'' failed: W: --force-yes is deprecated, use one of the options starting with --allow instead.\nE: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 4727 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n', 'rc': 100, 'stderr': 'W: --force-yes is deprecated, use one of the options starting with --allow instead.\nE: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 4727 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n', 'stderr_lines': ['W: --force-yes is deprecated, use one of the options starting with --allow instead.', 'E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 4727 (apt-get)', 'E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?'], 'stdout': '', 'stdout_lines': []}}
traceback:
tracker: https://tracker.ceph.com/issues/56395
created_tracker:

@ronen-fr
Contributor Author

ronen-fr commented Jul 1, 2022

I will be merging based on the QA tests above. There is no need to wait for the upgrade tests for this change.

@ronen-fr ronen-fr merged commit ab3e72b into ceph:main Jul 1, 2022