osd: fix erasure hung op bug (9835) #2760

Merged
merged 3 commits into from
Nov 13, 2014

Conversation

liewegas
Member

No description provided.

@athanatos
Contributor

wip-sam-firefly-testing

@athanatos
Contributor

calc_pg_role doesn't take primary affinity into account, which caused out-of-order (OOO) ops. Fixing.

@athanatos
Contributor

DNM: need to squash first. Added an updated version to wip-sam-firefly-testing; I think we want something more like this one.

@liewegas
Member Author

liewegas commented Nov 7, 2014

Reviewed-by:

This is only failing on EC pools, right? For replicated primary is always == acting[0]?

@liewegas liewegas added this to the firefly milestone Nov 11, 2014
@liewegas liewegas removed the firefly label Nov 11, 2014
liewegas and others added 3 commits November 12, 2014 17:13
Helper to check whether an osd is a given op target for a pg.  This
assumes that for EC we always send ops to the primary, while for
replicated we may target any replica.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 89c0263)
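
The rule described in this commit message can be sketched as follows. This is a minimal Python model, not the Ceph C++ code: the function name `is_op_target`, its parameters, and the `NONE` marker are hypothetical names chosen for illustration.

```python
from typing import List

NONE = -1  # hypothetical marker for an unmapped slot in the acting set


def is_op_target(osd: int, acting: List[int], primary: int, is_erasure: bool) -> bool:
    """Return True if `osd` may legitimately receive a client op for this PG."""
    if osd == NONE:
        return False
    if is_erasure:
        # EC pools: ops are always sent to the primary.
        return osd == primary
    # Replicated pools: any member of the acting set may be targeted.
    return osd in acting
```

Under these assumptions, `is_op_target(2, [1, 2, 3], 1, True)` is false for an erasure pool, while the same call with `is_erasure=False` is true.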
Erasure pools do not support read from replica, so we should drop
any rank > 0 requests.

This fixes a bug where an erasure pool maps to [1,2,3], temporarily maps
to [-1,2,3], sends a request to osd.2, and then remaps back to [1,2,3].
Because the 0 shard never appears on osd.2, the request sits in the
waiting_for_pg map indefinitely and causes slow request warnings.
This problem does not come up on replicated pools because all instances of
the PG are created equal.

Fix by only considering role == 0 for erasure pools as a correct mapping.

Fixes: #9835
Signed-off-by: Sage Weil <sage@redhat.com>
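
To illustrate the scenario in this commit message, here is a small Python model of the check on osd.2 once the pool has remapped back to [1,2,3]. It is a sketch only: the function names are hypothetical, the role is assumed to be the OSD's position in the acting set, and how the role is computed for the intermediate [-1,2,3] map is left out.

```python
from typing import List


def role_of(osd: int, acting: List[int]) -> int:
    """Position of `osd` in the acting set, or -1 if it is not a member."""
    return acting.index(osd) if osd in acting else -1


def is_correct_mapping_before(osd: int, acting: List[int]) -> bool:
    # Assumed old behaviour: any role >= 0 counts as a correct mapping.
    return role_of(osd, acting) >= 0


def is_correct_mapping_after(osd: int, acting: List[int], is_erasure: bool) -> bool:
    # Behaviour described in the commit: erasure pools accept only role == 0.
    role = role_of(osd, acting)
    if is_erasure:
        return role == 0
    return role >= 0


acting_after_remap = [1, 2, 3]
# Old check: osd.2 keeps the request and waits for a shard that never arrives.
print(is_correct_mapping_before(2, acting_after_remap))  # True
# New check: osd.2 has role 1 in an erasure pool, so the request is dropped.
print(is_correct_mapping_after(2, acting_after_remap, is_erasure=True))  # False
```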
calc_pg_role doesn't actually take into account primary affinity.

Fixes: #9835
Signed-off-by: Samuel Just <sam.just@inktank.com>
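
A rough sketch of why a position-based role can disagree with the real primary. It assumes a map in which the primary is recorded separately from the order of the acting set, so that primary affinity can pick an OSD that is not in slot 0. All names below are hypothetical and the values are made up for illustration.

```python
from typing import List


def role_by_position(osd: int, acting: List[int]) -> int:
    """Role derived purely from the position in the acting set (-1 if absent)."""
    return acting.index(osd) if osd in acting else -1


def is_primary_by_position(osd: int, acting: List[int]) -> bool:
    # Treats whoever sits in slot 0 as the primary.
    return role_by_position(osd, acting) == 0


def is_primary_explicit(osd: int, primary: int) -> bool:
    # Uses the primary recorded in the map, whatever its position.
    return osd == primary


acting = [1, 2, 3]
primary = 2  # assumed to have been selected by primary affinity

print(is_primary_by_position(1, acting))  # True, although osd.1 is not the primary
print(is_primary_explicit(2, primary))    # True
```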
@liewegas
Member Author

repushed, with cherry-pick -x from the version for master/next

@liewegas liewegas added the core label Nov 13, 2014
athanatos pushed a commit that referenced this pull request Nov 13, 2014
osd: fix erasure hung op bug (9835)

Reviewed-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos merged commit c069bce into firefly Nov 13, 2014
@athanatos athanatos deleted the wip-9835-firefly branch November 13, 2014 18:36
2 participants