jewel: Hammer (0.94.3) OSD does not delete old OSD Maps in a timely fashion (maybe at all?) #9100
In a large cluster, there is a better chance that the OSD fails to trim the cached osdmaps in a timely manner. Sometimes it is simply unable to keep up with the incoming osdmaps when skip_maps is true, so the osdmap cache can keep growing to over 250GB in size. In this change:

* publish_superblock() before trimming the osdmaps, so other osdmap consumers of OSDService.superblock won't access the osdmaps being removed.
* trim all stale osdmaps in batches of conf->osd_target_transaction_size if skip_maps is true. In my test, this happens when the OSD only receives an osdmap from the monitor occasionally, because the OSD happens to be chosen when the monitor wants to share a new osdmap with a random OSD.
* always use dedicated transaction(s) for trimming osdmaps, so even in the normal case, where we are able to trim all stale osdmaps in a single batch, a separate transaction is used. We could piggyback the commits for removing maps, but we keep it this way for simplicity.
* use std::min() instead of MIN() for type safety.

Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 369db99)
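To make the batching and ordering described in the commit message concrete, here is a minimal sketch under stated assumptions. It is not the code from the commit: `MapStore`, `trim_stale_maps()` and `EpochRange` are hypothetical names, a `std::map` stands in for the on-disk osdmaps, each batch stands in for one dedicated transaction, `batch_size` plays the role of `osd_target_transaction_size`, and `published_oldest` stands in for the superblock copy that `publish_superblock()` makes visible to other readers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumption: a 32-bit epoch type, named after Ceph's epoch_t.
using epoch_t = std::uint32_t;

// Hypothetical stand-in for the OSD's stored osdmaps plus the superblock
// value that other OSDService users read.
struct MapStore {
  std::map<epoch_t, std::string> maps;  // epoch -> encoded map (placeholder)
  epoch_t published_oldest = 0;         // published superblock oldest_map
};

using EpochRange = std::pair<epoch_t, epoch_t>;  // half-open [first, last)

// Trim stale epochs [oldest, trim_to), at most batch_size per batch; each
// batch stands for one dedicated transaction. With skip_maps == false only
// one batch is trimmed per call and the rest is left for later calls.
std::vector<EpochRange> trim_stale_maps(MapStore& store, epoch_t oldest,
                                        epoch_t trim_to, epoch_t batch_size,
                                        bool skip_maps) {
  std::vector<EpochRange> batches;
  if (batch_size == 0) return batches;
  epoch_t first = oldest;
  while (first < trim_to) {
    // std::min with one explicit type instead of a MIN() macro, so both
    // arguments are compared as epoch_t. Cannot overflow past trim_to.
    epoch_t last = first + std::min<epoch_t>(batch_size, trim_to - first);
    // Publish the new lower bound before removing anything, so a reader of
    // the published value never asks for an epoch that is being removed.
    store.published_oldest = last;
    store.maps.erase(store.maps.lower_bound(first),
                     store.maps.lower_bound(last));
    batches.emplace_back(first, last);
    first = last;
    if (!skip_maps) break;  // leave the remainder for a later call
  }
  return batches;
}
```

With `skip_maps` set, trimming epochs 10 to 34 with a batch size of 10 yields three batches, [10, 20), [20, 30) and [30, 35), so a large backlog is drained in one call without any single transaction growing unbounded.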
Thanks @smithfarm and @Abhishekvrshny. We'd better include #9108 as well, once it gets reviewed and merged. But since #8828 does not work under some circumstances, I need to revisit it later. If we want this PR to make it into the upcoming v10.2.1, we should probably have just #8990 in this backport PR, as #9108 is still pending review and "needs-qa". What do you think?
@tchaikov Our backport/stable-release tooling is designed on the (loose) assumption that each backport has a corresponding "master bug". Is it possible to give #9108 its own tracker issue, separate from http://tracker.ceph.com/issues/13990 ?
Put more simply, it makes our lives easier if there is a 1:1 correspondence between issues and PRs.
@smithfarm Then I'd prefer we cherry-pick all of them in a single backport PR. What do you think? I just copied http://tracker.ceph.com/issues/15879, but I think it only makes the already confusing http://tracker.ceph.com/issues/13990 more complicated.
@tchaikov That would be my preference as well.
@Abhishekvrshny Hi Abhishek, in addition to this commit, could you also cherry-pick the following commits?
I removed the non-critical commits in #9108 from the list. Also, the hammer counterpart of this backport is #9090. Thanks!
We should release the osdmap reference once we are done with it; otherwise we might need to wait a very long time to update that reference with a newer osdmap ref. This appears to be an OSDMap leak: it is held by a quiet OSD::Session forever.

The osdmap is not reset in OSD::session_notify_pg_create(), because its only caller is wake_pg_waiters(), which will call dispatch_session_waiting() later, and dispatch_session_waiting() will check session->osdmap and will also reset the osdmap if session->waiting_for_pg.empty().

Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 82b0af7)
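The leak described above comes from reference counting: as long as a session keeps its map reference, that OSDMap cannot be freed, however old it is. The sketch below assumes, as in Ceph, that an osdmap reference is a shared pointer to a const `OSDMap`; everything else (`Session` with only two members, the integer op ids, and the helper names `release_map_if_idle()` and `dispatch_waiting_ops()`) is a hypothetical simplification, not Ceph's actual code. It shows the rule from the commit message: after dispatching, the session drops its map reference unless ops are still waiting for a PG.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Placeholder map type; the reference is a shared pointer to a const map,
// so any reference left in a session pins that map in memory.
struct OSDMap { unsigned epoch; };
using OSDMapRef = std::shared_ptr<const OSDMap>;

// Hypothetical, heavily simplified session with only the two members the
// commit message mentions.
struct Session {
  OSDMapRef osdmap;
  std::map<std::string, std::vector<int>> waiting_for_pg;  // pgid -> op ids

  // Drop the map reference unless ops are still parked waiting for a PG.
  void release_map_if_idle() {
    if (waiting_for_pg.empty()) osdmap.reset();
  }
};

// Hypothetical tail of dispatching a session's waiting ops against `map`.
void dispatch_waiting_ops(Session& session, OSDMapRef map) {
  session.osdmap = map;
  // ... queued ops would be dispatched against session.osdmap here ...
  session.release_map_if_idle();
}
```

A quiet session with nothing waiting ends up holding no map at all, so once every other holder lets go, the map is freed and its epoch becomes trimmable; a session with ops parked on a PG keeps its reference until those ops are handled.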
This helps to free the resources referenced by the connection, which, among other things, include the OSD::Session and the OSDMap in the case of an MOSDOp. Freeing these resources earlier lets the osdmaps be trimmed in time.

Fixes: http://tracker.ceph.com/issues/13990
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 87850e9)
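The commit's subject line is not shown here, so exactly what "this" releases is not stated; the sketch below only illustrates the ownership chain the message describes, under the assumption that a connection holds the session and the session holds an osdmap reference. `Connection`, `Session` and `release_resources()` are hypothetical stand-ins, not Ceph's messenger API. Dropping the connection's single reference cascades down the chain and frees the map.

```cpp
#include <cassert>
#include <memory>

// Hypothetical ownership chain: Connection -> Session -> OSDMap.
struct OSDMap { unsigned epoch; };

struct Session {
  std::shared_ptr<const OSDMap> osdmap;  // map the session last used
};

struct Connection {
  std::shared_ptr<Session> priv;  // per-connection private data (assumed)

  // Hypothetical: release what the connection references. If this was the
  // last reference, the session is destroyed, which in turn releases its
  // osdmap reference.
  void release_resources() { priv.reset(); }
};
```

The point of the design is that nothing has to chase the OSDMap directly: releasing the outermost owner early is enough for the map to become unreferenced and therefore trimmable.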
@tchaikov: Done. Please review.
Tested at http://pulpito.ceph.com/kchai-2016-05-30_08:11:32-rados-wip-13990-jewel---basic-smithi/ ; the failed jobs are addressed by ceph/ceph-qa-suite#1027.
http://tracker.ceph.com/issues/15856