osd: Calculate degraded and misplaced more accurately #13031

Merged: 1 commit merged into ceph:master on Jan 25, 2017

dzafman (Contributor) commented Jan 20, 2017:

Calculate num_object_copies based on the larger of pool size,
up set size and acting set size.
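
A minimal sketch of the copy-count rule, assuming a PG reduces to an object count plus three sizes. The helper name calc_num_object_copies and its parameters are hypothetical and are not Ceph's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustrative, not Ceph's implementation): the number
// of object copies a PG should hold. The per-object target is the largest of
// the pool size, the up set size and the acting set size.
uint64_t calc_num_object_copies(uint64_t num_objects, unsigned pool_size,
                                unsigned up_size, unsigned acting_size) {
  const unsigned target = std::max({pool_size, up_size, acting_size});
  assert(target > 0 && "a PG always targets at least one copy");
  return num_objects * target;
}
```

For example, with 100 objects, a pool size of 3 and up and acting sets that have each shrunk to 2 OSDs, the target stays at 3 and the sketch returns 300, so the missing third copy can still show up as degraded.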

Calculate num_objects_degraded as the difference between num_object_copies
and all copies found on acting set and backfilling up set OSDs.

Calculate num_objects_misplaced as all copies on acting set OSDs not in up set
less copies that have been backfilled to up set OSDs.
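
The two rules above can be sketched as follows, assuming per-OSD copy counts are already known. The PGCopies struct, the helper names and the floor at zero are assumptions made for illustration, not Ceph's data structures or exact arithmetic:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical per-PG inputs (illustrative names, not Ceph's data structures).
struct PGCopies {
  uint64_t num_object_copies;        // objects * max(pool size, up size, acting size)
  std::set<int> up;                  // OSD ids in the up set
  std::map<int, uint64_t> acting;    // acting OSD id -> object copies it holds
  std::map<int, uint64_t> backfill;  // backfilling up-set OSD id -> copies backfilled so far
};

static uint64_t sum_copies(const std::map<int, uint64_t>& by_osd) {
  uint64_t total = 0;
  for (const auto& kv : by_osd) total += kv.second;
  return total;
}

// Degraded: expected copies minus every copy found on the acting set and on
// backfilling up-set OSDs. Assumption: floored at zero when extra copies exist.
uint64_t calc_degraded(const PGCopies& pg) {
  const uint64_t found = sum_copies(pg.acting) + sum_copies(pg.backfill);
  return found >= pg.num_object_copies ? 0 : pg.num_object_copies - found;
}

// Misplaced: copies on acting OSDs outside the up set, less the copies already
// backfilled to up-set OSDs. Assumption: floored at zero.
uint64_t calc_misplaced(const PGCopies& pg) {
  uint64_t not_in_up = 0;
  for (const auto& kv : pg.acting)
    if (pg.up.count(kv.first) == 0) not_in_up += kv.second;
  uint64_t backfilled = 0;
  for (const auto& kv : pg.backfill) {
    assert(pg.up.count(kv.first) == 1 && "backfill targets belong to the up set");
    backfilled += kv.second;
  }
  return backfilled >= not_in_up ? 0 : not_in_up - backfilled;
}
```

Take a remapped PG with 300 expected copies, up set {1, 2, 3}, acting set {1, 2, 4} holding 100 copies each, and 40 copies already backfilled to OSD 3. This sketch reports 0 degraded and 60 misplaced. If OSD 4 drops out of the acting set instead, it reports 60 degraded and 0 misplaced.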

Fixes: http://tracker.ceph.com/issues/18619

Signed-off-by: David Zafman <dzafman@redhat.com>
liewegas (Member) commented:

YES!

I'm confused about one thing, though: when is acting.size() or up.size() larger than the pool size?

dzafman (Contributor, Author) commented Jan 24, 2017:

@liewegas I was trying to handle the case where the pool size is larger than both up.size() and acting.size(). Maybe more to your question: if you reduce the pool size, acting.size() can be larger than the pool size. Or maybe it won't, because it won't need to remap? Does that matter for the calculations?

dzafman (Contributor, Author) commented Jan 24, 2017:

Rados run: http://pulpito.ceph.com/dzafman-2017-01-20_12:26:00-rados-wip-calc-stats-distro-basic-smithi/

2 DEAD: Infrastructure

1 FAIL:
-81> 2017-01-20 22:51:50.166044 7fa5f4702700 10 log_client log_queue is 8 last_log 378 sent 369 num 8 unsent 9 sending 9
0> 2017-01-20 22:51:50.166949 7fa5f4702700 -1 /build/ceph-11.1.0-6744-g8423bc4/src/common/LogClient.cc: In function 'Message* LogClient::_get_mon_log_message()' thread 7fa5f4702700 time 2017-01-20 22:51:50.166046
/build/ceph-11.1.0-6744-g8423bc4/src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())

liewegas (Member) commented:

I think that if the pool size decreases, acting will also shrink. We certainly have to handle the case where it's smaller than the pool size, though. It's just a little confusing as written, but fine!

liewegas merged commit a35a8ec into ceph:master Jan 25, 2017
dzafman deleted the wip-calc-stats branch January 25, 2017 18:51