jewel: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?) #9100

Abhishekvrshny · 2016-05-12T15:58:44Z

http://tracker.ceph.com/issues/15856

in a large cluster, there are better chances that the OSD fails to trim the cached osdmap in a timely manner. and sometimes, it is just unable to keep up with the incoming osdmap if skip_maps, so the osdmap cache can keep building up to over 250GB in size.

in this change

* publish_superblock() before trimming the osdmaps, so other osdmap consumers of OSDService.superblock won't access the osdmaps being removed.
* trim all stale osdmaps in batches of conf->osd_target_transaction_size if skip_maps is true. in my test, it happens when the osd only receives the osdmap from monitor occasionally because the osd happens to be chosen when monitor wants to share a new osdmap with a random osd.
* always use dedicated transaction(s) for trimming osdmaps. so even in the normal case where we are able to trim all stale osdmaps in a single batch, a separate transaction is used. we can piggy back the commits for removing maps, but we keep it this way for simplicity.
* use std::min() instead of MIN() for type safety

Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 369db99)
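The batching described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the Ceph implementation: `Transaction`, `trim_in_batches`, and the `batch_size` parameter are hypothetical stand-ins (the last one plays the role of `conf->osd_target_transaction_size`), and the "transaction" only records which epochs it would remove.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for an object-store transaction: it only records
// the map epochs whose removal it would carry.
struct Transaction {
  std::vector<epoch_t> removed;
};

// Split the removal of the stale epochs [oldest, trim_to) into dedicated
// transactions, each holding at most batch_size removals.
std::vector<Transaction> trim_in_batches(epoch_t oldest, epoch_t trim_to,
                                         epoch_t batch_size) {
  assert(batch_size > 0);
  std::vector<Transaction> txns;
  while (oldest < trim_to) {
    // Both arguments must have the same type for std::min to compile,
    // unlike a MIN() macro, which would silently mix types.
    const epoch_t n = std::min<epoch_t>(trim_to - oldest, batch_size);
    Transaction t;
    for (epoch_t i = 0; i < n; ++i) {
      t.removed.push_back(oldest + i);
    }
    txns.push_back(std::move(t));
    oldest += n;
  }
  return txns;
}
```

Under these assumptions, a backlog of any size is removed in bounded chunks, so no single transaction grows with the number of stale maps.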

smithfarm · 2016-05-12T18:29:46Z

@tchaikov please review - http://tracker.ceph.com/issue/13990 mentions three PRs: #8828 #8990 #9108 but the backports only cherry-pick #8990

tchaikov · 2016-05-13T02:29:37Z

thanks @smithfarm and @Abhishekvrshny , we'd better include #9108 also, once it gets reviewed and merged. but since #8828 does not work under some circumstances, i need to revisit it later.

if we want to get this PR into the approaching v10.2.1, we should probably just have #8990 in this backport PR, as #9108 is still pending review and is marked "needs-qa". what do you think?

tchaikov · 2016-05-13T04:01:23Z

i piggy backed the revised #8828 in #9108.

smithfarm · 2016-05-13T06:51:42Z

@tchaikov Our backport/stable releases tooling is designed on the (loose) assumption that each backport has a corresponding "master bug". Is it possible to give #9108 its own tracker issue separate from http://tracker.ceph.com/issues/13990 ?

smithfarm · 2016-05-13T06:52:49Z

Put more simply, it makes our lives easier if there is a 1:1 correspondence between issues and PRs.

tchaikov · 2016-05-13T07:09:41Z

@smithfarm then i'd prefer we cherry pick all of them in a single backport PR. what do you think?

i just copied http://tracker.ceph.com/issues/15879, but i think it just makes the already-confusing http://tracker.ceph.com/issues/13990 more complicated..

smithfarm · 2016-05-13T08:12:15Z

> i'd prefer we cherry pick all of them in a single backport PR. what do you think?

@tchaikov That would be my preference as well.

tchaikov · 2016-05-20T02:19:45Z

@Abhishekvrshny hi Abhishek, in addition to this commit, could you also cherry-pick the following commits?

82b0af7
87850e9

i removed the non-critical commits in #9108 from the list. also, the hammer counterpart of this backport is #9090. thanks!

we should release the osdmap reference once we are done with it, otherwise we might need to wait very long to update that reference with a newer osdmap ref. this appears to be an OSDMap leak: it is held by a quiet OSD::Session forever.

the osdmap is not reset in OSD::session_notify_pg_create(), because its only caller is wake_pg_waiters(), which will call dispatch_session_waiting() later. and dispatch_session_waiting() will check the session->osdmap, and will also reset the osdmap if session->waiting_for_pg.empty().

Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 82b0af7)

this helps to free the resources referenced by the connection: among other things, in the case of MOSDOp, the OSD::Session and OSDMap. freeing them earlier lets the osdmaps be trimmed in time.

Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 87850e9)
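Both of these commits come down to dropping a shared reference as soon as it is no longer needed. The sketch below shows the effect with `std::shared_ptr`, under loud assumptions: `OSDMap`, `Session`, and `maybe_release_map` here are simplified, hypothetical stand-ins, not the real Ceph classes, and `waiting_for_pg` is reduced to a plain vector so that the `.empty()` check from the commit message can be mirrored.

```cpp
#include <memory>
#include <vector>

// Hypothetical, heavily simplified map type.
struct OSDMap {
  unsigned epoch;
};
using OSDMapRef = std::shared_ptr<const OSDMap>;

// Hypothetical session that caches a reference to the map it last used.
struct Session {
  OSDMapRef osdmap;
  std::vector<int> waiting_for_pg;  // pending work that still needs the map

  // Drop the cached reference once nothing is waiting on it, so an idle
  // session does not keep an old map alive indefinitely.
  void maybe_release_map() {
    if (waiting_for_pg.empty()) {
      osdmap.reset();
    }
  }
};
```

In this model, an old map can only be freed after every holder has released its reference, so a single quiet session that never calls the release step is enough to pin the map.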

Abhishekvrshny · 2016-05-27T05:19:29Z

@tchaikov : Done. Please review.

tchaikov · 2016-05-31T05:41:57Z

tested at http://pulpito.ceph.com/kchai-2016-05-30_08:11:32-rados-wip-13990-jewel---basic-smithi/

failed ones are addressed by ceph/ceph-qa-suite#1027 .

Abhishekvrshny self-assigned this May 12, 2016

Abhishekvrshny added bug-fix core labels May 12, 2016

Abhishekvrshny added this to the jewel milestone May 12, 2016

tchaikov changed the title ~~jewel: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?)~~ [DNM] jewel: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?) May 13, 2016

tchaikov added 2 commits May 26, 2016 21:58

tchaikov changed the title ~~[DNM] jewel: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?)~~ jewel: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?) May 27, 2016

tchaikov merged commit a046d2a into ceph:jewel May 31, 2016
