jewel: filestore: can get stuck in an unbounded loop during scrub #12001

Merged
merged 1 commit into ceph:jewel on Dec 1, 2016

os/filestore/HashIndex: fix list_by_hash_* termination on reaching end

If we set *next to max, then the caller (a few lines up) doesn't terminate
the loop and will keep trying to list objects in every following hash
dir until it reaches the end of the collection.  In fact, if we have an
end bound we will never do an efficient listing unless we hit the max
first.

For one user, this was causing OSD suicides when scrub ran because it
wasn't able to list all objects before the timeout.  In general, this would
cause scrub to stall a PG for a long time and slow down requests.

Broken by refactor in 921c458.

Fixes: http://tracker.ceph.com/issues/17859
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit c518026)
3cc29c6
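
The termination problem described in the commit message can be illustrated with a small model. This is only a sketch under stated assumptions, not the Ceph code: `list_dir`, `list_collection`, the integer "objects", and the `buggy` switch are all hypothetical names invented for illustration, and `MAX` stands in for the "max" sentinel that the message refers to. The assumed contract is that the caller stops walking hash directories only when the callee hands back a `next` cursor that is not the sentinel.

```python
MAX = float("inf")  # hypothetical stand-in for the "max" sentinel object


def list_dir(objects, end, out, buggy):
    """List sorted objects below `end` into `out`; return the `next` cursor."""
    for obj in objects:
        if obj >= end:
            # Reached the end bound. Returning MAX here (the modeled bug)
            # hides that fact from the caller; returning the object itself
            # (the modeled fix) lets the caller stop.
            return MAX if buggy else obj
        out.append(obj)
    return MAX  # directory exhausted, caller should try the next one


def list_collection(dirs, end, buggy):
    """Walk hash dirs in order; return (listed objects, dirs visited)."""
    out = []
    visited = 0
    for objects in dirs:
        visited += 1
        nxt = list_dir(objects, end, out, buggy)
        if nxt != MAX:
            break  # callee reported a real cursor: listing is finished
    return out, visited


dirs = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(list_collection(dirs, end=4, buggy=False))
print(list_collection(dirs, end=4, buggy=True))
```

In this model both variants list the same objects, but the buggy one keeps visiting every remaining directory after the end bound has already been reached, which is the kind of wasted work the commit message blames for scrub timeouts.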
@dachary dachary self-assigned this Nov 15, 2016
@dachary dachary added this to the jewel milestone Nov 15, 2016
@dachary dachary changed the base branch to ceph:jewel-next from ceph:jewel Nov 15, 2016
@dachary
Member
dachary commented Nov 16, 2016

jenkins test this please (jenkins was stuck)

@theanalyst
Member

@dachary can you retarget this to jewel, as we need this also for 10.2.4

@theanalyst theanalyst changed the base branch to ceph:jewel from ceph:jewel-next Nov 28, 2016
@dachary dachary changed the base branch to ceph:jewel-next from ceph:jewel Nov 28, 2016
@liewegas liewegas was assigned by theanalyst Nov 28, 2016
@theanalyst
Member

@liewegas ok to merge?

@dachary dachary changed the base branch to ceph:jewel from ceph:jewel-next Nov 28, 2016
@liewegas
Member

yep!

@athanatos
Member

retest this

@athanatos
Member

jenkins retest this

@athanatos
Member

retest this please

@dachary
Member
dachary commented Dec 1, 2016

@athanatos the error is expected and is fixed in jewel-next. It is environmental.

# TOTAL: 141
# PASS:  140
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

FAIL: ceph-disk/run-tox.sh
==========================
...
ERROR:   flake8: commands failed
  py27: commands succeeded
@athanatos athanatos merged commit be5c828 into ceph:jewel Dec 1, 2016

2 of 3 checks passed

default: Build finished.
Signed-off-by: all commits in this PR are signed
Unmodifed Submodules: submodules for project are unmodified