os/bluestore: fix buffers pinned by indefinitely deferred writes #15398

Merged
merged 2 commits into from Jun 2, 2017

@liewegas
Member

liewegas commented May 31, 2017

Two problems:

  • We could queue a deferred write and then never submit it if enough
    other deferred writes didn't come along. This would pin TransContexts
    in the OpSequencer queue indefinitely.

  • TransContexts' IOContext have a running_aios list with aio_t's, each
    with a bufferlist pinning the buffers under IO. Even after the IO
    was completed, this list wasn't cleared, which meant buffers were
    pinned as long as the TransContext stuck around.

The combination of these two things could make memory usage balloon
under certain workloads!
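
To make the first problem concrete, here is a toy model of the rule the first commit in this PR describes: force submission when the oldest transaction in the sequencer queue is a queued deferred write and the queue has grown past a threshold. This is only a sketch, not BlueStore code. `ToyState`, `ToyTxc`, `should_force_submit` and the `threshold` parameter are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <deque>

// Toy stand-ins for illustration; these are not the real BlueStore types.
enum class ToyState { DeferredQueued, IoDone };

struct ToyTxc {
  ToyState state;
};

// Returns true when the oldest transaction is a deferred write that has been
// queued but not yet submitted, and more than `threshold` transactions are
// waiting in the sequencer queue behind it.
bool should_force_submit(const std::deque<ToyTxc>& osr_q,
                         std::size_t threshold) {
  if (osr_q.empty()) {
    return false;
  }
  return osr_q.front().state == ToyState::DeferredQueued &&
         osr_q.size() > threshold;
}
```

With such a check, a lone deferred write at the head of the queue no longer depends on further deferred writes arriving before it is submitted.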

@liewegas liewegas requested a review from ifed01 May 31, 2017

@markhpc

markhpc May 31, 2017

Member

So far this is working beautifully. After 30 minutes of 4k random writes with 4k min alloc size the OSD is using approx 1.5GB RSS while previously it would be using 6GB+ RSS.

@@ -7815,6 +7816,10 @@ void BlueStore::_txc_finish(TransContext *txc)
// for _osr_drain_preceding()
notify = true;
}
if (txc->state == TransContext::STATE_DEFERRED_QUEUED &&

@ifed01

ifed01 Jun 1, 2017

Contributor

What about an alternative approach: submitting deferred txcs if the deferred queue hasn't been updated for some time?

src/os/bluestore/BlueStore.cc (outdated)

liewegas added some commits May 31, 2017

os/bluestore: submit deferred if txc cleanup is blocked
If we have a single deferred write, and then a uniform workload with *no*
deferred writes, we will never actually submit it.  Meanwhile, the txc is
stuck on the osr q and nothing ever gets retired.

Simple fix is to submit any deferred ops if the osr queue is blocked by
a queued deferred write and the osr queue length is above some
threshold.  This prevents memory from being pinned indefinitely.

Signed-off-by: Sage Weil <sage@redhat.com>

os/bluestore: release aios and pinned buffers on io complete
Once we're done with our IO, clear the aio list so that the pinned buffers
are unpinned.  This ensures we release memory quickly, even if the
TransContext sticks around for a while (e.g., in the osr q).

Signed-off-by: Sage Weil <sage@redhat.com>
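
The effect of the second commit can be sketched with reference-counted buffers. This is a minimal model, not the actual Ceph code: `ToyBuffer`, `ToyAio`, `ToyIoContext`, `queue_write` and `release_aios` are hypothetical names, and `std::shared_ptr` stands in for a bufferlist's reference-counted storage. Only `running_aios` is taken from the PR description.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted buffer.
using ToyBuffer = std::shared_ptr<std::vector<char>>;

struct ToyAio {
  ToyBuffer data;  // holds a reference, so the buffer stays alive
};

struct ToyIoContext {
  std::list<ToyAio> running_aios;

  // Queue a write; the aio entry pins the buffer while IO is in flight.
  void queue_write(ToyBuffer buf) {
    running_aios.push_back(ToyAio{std::move(buf)});
  }

  // Once all IO has completed, drop the list so that the references
  // (and therefore the pinned buffers) are released immediately.
  void release_aios() { running_aios.clear(); }
};
```

In this model the buffer's lifetime no longer depends on how long the owning context stays alive, which mirrors the intent stated in the commit message.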
@ifed01

ifed01 approved these changes Jun 1, 2017

@liewegas liewegas merged commit ecef6fd into ceph:master Jun 2, 2017

@liewegas liewegas deleted the liewegas:wip-bluestore-leak branch Jun 2, 2017
