rbd-mirror: reduce memory footprint during journal replay #10341

dillaman · 2016-07-19T04:51:47Z

No description provided.

rjfd · 2016-07-20T15:42:41Z

src/test/librbd/fsx.cc

@@ -323,7 +324,7 @@ int register_journal(rados_ioctx_t ioctx, const char *image_name) {
                return r;
        }

-        journal::Journaler journaler(io_ctx, image_id, JOURNAL_CLIENT_ID, 0);
+        journal::Journaler journaler(io_ctx, image_id, JOURNAL_CLIENT_ID, {});


The default commit_interval is 5, but the test was using value 0. If the default value does not matter for the test please ignore this comment!
The same for the remaining similar changes.

dillaman · 2016-07-20

I ratcheted it down in the cases where the test case truly depended on looking at the commit position. In general, though, this parameter only has an effect after a crash: it bounds the number of seconds of events that would need to be replayed in the worst case.

trociny · 2016-07-21T13:36:36Z

src/journal/JournalPlayer.cc

+
+  // trim empty player to prefetch the next available object
+  for (auto &player_pair : m_object_players) {
+    ObjectPlayerPtr object_player(player_pair.second);


@dillaman What is the purpose of this line?

dillaman · 2016-07-21

Apparently nothing -- will clean up

Journal playback needs to read at least one full entry, whose size was previously limited only by the maximum object size. In memory-constrained environments, this new optional limit sets a fixed upper bound on memory usage. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

Operation request op finish events should not be fire and forget. Instead, ensure the event is committed to the journal before completing the op. This will avoid several possible split-brain events during mirroring. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

Ensure that, by default, IO journal events are broken up into manageable sizes when factoring in that an rbd-mirror daemon might be replaying events from thousands of images. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

trociny · 2016-07-30T16:27:38Z

lgtm

dillaman added bug-fix rbd labels Jul 19, 2016

dillaman force-pushed the wip-16223 branch 7 times, most recently from 4f54add to 182ef6c Compare July 20, 2016 14:15

rjfd reviewed Jul 20, 2016

dillaman force-pushed the wip-16223 branch 3 times, most recently from 316474b to 52ad1ba Compare July 21, 2016 11:30

trociny reviewed Jul 21, 2016

Jason Dillaman added 16 commits July 21, 2016 12:52

journal: helper class for organizing optional settings

dad8328

Additional runtime configuration settings will be needed. The new class will avoid the need to expand the constructor. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
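As a rough illustration of why a settings helper keeps the constructor stable, here is a minimal sketch. `ExampleSettings`, `ExampleJournaler`, and the `max_fetch_bytes` field are hypothetical names invented for this example and are not taken from the Ceph sources; only the default `commit_interval` of 5 comes from the review discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical settings bundle: new options become new fields with defaults,
// so the constructor signature below never has to grow.
struct ExampleSettings {
  double commit_interval = 5.0;        // seconds; default cited in the review
  std::uint64_t max_fetch_bytes = 0;   // assumed convention: 0 means "no limit"
};

// Hypothetical consumer of the settings bundle.
class ExampleJournaler {
public:
  ExampleJournaler(std::string image_id, std::string client_id,
                   const ExampleSettings &settings)
    : m_image_id(std::move(image_id)),
      m_client_id(std::move(client_id)),
      m_settings(settings) {
    assert(m_settings.commit_interval >= 0.0);
  }

  const ExampleSettings &settings() const { return m_settings; }

private:
  std::string m_image_id;
  std::string m_client_id;
  ExampleSettings m_settings;
};
```

With this shape, passing `{}` as the last argument (as in the `fsx.cc` hunk above) selects every default, while a caller that depends on a specific value sets just that field on an `ExampleSettings` instance before constructing the object.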

journal: optionally fetch entries in small chunks during replay

f7362e9

Support fetching the full object or incremental chunks (with a minimum of at least a single decoded entry if available). Signed-off-by: Jason Dillaman <dillaman@redhat.com>
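The chunked fetch described in this commit message can be sketched roughly as follows. This is a toy model, not the ObjectPlayer implementation: the one-byte length prefix, the `frame` helper, and the `ChunkedReader` class are all assumptions made for the example. The point it shows is the stated guarantee: each fetch keeps reading chunks until at least one complete entry has been decoded, if one is available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical framing: one length byte followed by the payload.
std::string frame(const std::string &payload) {
  assert(payload.size() < 256);
  return std::string(1, static_cast<char>(payload.size())) + payload;
}

class ChunkedReader {
public:
  ChunkedReader(std::string object, std::size_t chunk_size)
    : m_object(std::move(object)), m_chunk_size(chunk_size) {
    assert(m_chunk_size > 0);
  }

  // Reads chunk by chunk until at least one full entry decodes or the
  // object is exhausted. An empty result means nothing more is available.
  std::vector<std::string> fetch() {
    std::vector<std::string> entries;
    while (entries.empty() && m_read_off < m_object.size()) {
      std::size_t len = std::min(m_chunk_size, m_object.size() - m_read_off);
      m_pending.append(m_object, m_read_off, len);
      m_read_off += len;
      decode(&entries);
    }
    return entries;
  }

private:
  // Decodes every complete entry in the pending buffer and keeps the
  // undecoded tail for the next chunk.
  void decode(std::vector<std::string> *entries) {
    std::size_t pos = 0;
    while (pos < m_pending.size()) {
      std::size_t entry_len = static_cast<unsigned char>(m_pending[pos]);
      if (m_pending.size() - pos - 1 < entry_len) {
        break;  // partial entry: wait for more data
      }
      entries->push_back(m_pending.substr(pos + 1, entry_len));
      pos += 1 + entry_len;
    }
    m_pending.erase(0, pos);
  }

  std::string m_object;
  std::size_t m_chunk_size;
  std::size_t m_read_off = 0;
  std::string m_pending;
};
```

A chunk size equal to the object size degenerates to fetching the full object in one call, while a small chunk size bounds the memory held per fetch to roughly one entry plus one chunk.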

journal: replay should only read from a single object set

2666d36

Previously it was prefetching up to 2 object sets worth of journal data objects which consumed too much memory. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

journal: support streaming entry playback

28d5ca1

Now that it's possible for the ObjectPlayer to only read a partial subset of available entries, the JournalPlayer needs to detect that more entries might be available. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

journal: improve debug log messages

11475f4

rbd-mirror debugging involved potentially thousands of journals concurrently running. The instance address will correlate log messages between journals. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

journal: possible deadlock during flush of journal entries

2c65471

If a future flush is requested at the exact same moment that an overflow is detected, the two threads will deadlock since locks are not taken in a consistent order. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
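One common remedy for this class of deadlock is to make every code path take the two locks in the same order. The sketch below illustrates that idea only; it is not the patch itself, and `ExampleRecorder`, its lock names, and `run_both` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

class ExampleRecorder {
public:
  // Both entry points funnel through locked(), so the acquisition order
  // (outer, then inner) is identical no matter which thread gets there first.
  void flush() { locked([this] { ++m_flushes; }); }
  void handle_overflow() { locked([this] { ++m_overflows; }); }

  int flushes() const { return m_flushes; }
  int overflows() const { return m_overflows; }

private:
  template <typename F>
  void locked(F &&f) {
    std::lock_guard<std::mutex> outer(m_outer_lock);
    std::lock_guard<std::mutex> inner(m_inner_lock);
    f();
  }

  std::mutex m_outer_lock;
  std::mutex m_inner_lock;
  int m_flushes = 0;
  int m_overflows = 0;
};

// Drives both paths concurrently; with a consistent lock order this
// always terminates.
void run_both(ExampleRecorder &recorder, int iterations) {
  assert(iterations >= 0);
  std::thread flusher([&recorder, iterations] {
    for (int i = 0; i < iterations; ++i) recorder.flush();
  });
  std::thread overflower([&recorder, iterations] {
    for (int i = 0; i < iterations; ++i) recorder.handle_overflow();
  });
  flusher.join();
  overflower.join();
}
```

If `flush()` took the outer lock first while `handle_overflow()` took the inner lock first, each thread could end up holding one lock while waiting for the other, which is the situation the commit message describes.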

journal: optimize speed of live replay journal pruning

08a8ee9

When streaming playback, avoid the unnecessary watch delay when one or more entries have been pruned. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

rbd-mirror: configuration options to control replay throttling

24883e0

Fixes: http://tracker.ceph.com/issues/16223 Signed-off-by: Jason Dillaman <dillaman@redhat.com>

rbd-mirror: shut down image replayers in parallel

73cdd08

When multiple pools are being replicated, start the shut down process concurrently across all pool replayers. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
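The "start everything, then wait for everything" pattern behind this change might look like the following sketch. `ExamplePoolReplayer` and `shut_down_all` are hypothetical stand-ins, and `std::async` is used here purely for illustration; the actual daemon may use a different asynchronous mechanism.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical replayer whose shut down may take a while.
struct ExamplePoolReplayer {
  std::string pool_name;
  bool stopped = false;

  void shut_down() { stopped = true; }
};

// Kicks off every shut down first, then waits for all of them, so the
// total time approaches the slowest replayer rather than the sum.
void shut_down_all(std::vector<ExamplePoolReplayer> &replayers) {
  std::vector<std::future<void>> pending;
  pending.reserve(replayers.size());
  for (auto &replayer : replayers) {
    pending.push_back(std::async(std::launch::async,
                                 [&replayer] { replayer.shut_down(); }));
  }
  assert(pending.size() == replayers.size());
  for (auto &result : pending) {
    result.get();  // blocks until that replayer has finished
  }
}
```

A sequential loop that calls `shut_down()` and waits on each replayer in turn would instead pay the full shut down cost once per pool.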

rbd-mirror: fix issues detected when attempting clean shut down

0275c7c

Fixed lockdep issue from status update callback and fixed the potential for a stuck status state. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

rbd-mirror: potential memory leak when attempting to cancel image sync

e6cdf95

The cancel request could race with the actual scheduling of the image sync operation. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

rbd-mirror: do not cancel maintenance ops with missing finish events

862e581

librbd will replay these ops when opening an image, so rbd-mirror should also ensure these ops are replayed. Signed-off-by: Jason Dillaman <dillaman@redhat.com>

qa/workunits/rbd: override rbd-mirror integration test poll frequency

574be74

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

dillaman force-pushed the wip-16223 branch from 52ad1ba to 574be74 Compare July 21, 2016 16:53

dillaman changed the title from "[DNM] rbd-mirror: reduce memory footprint during journal replay" to "rbd-mirror: reduce memory footprint during journal replay" Jul 21, 2016

trociny self-assigned this Jul 21, 2016

trociny added the wip-mgolub-testing label Jul 21, 2016

trociny pushed a commit that referenced this pull request Jul 25, 2016

Merge branch 'wip-16223' into wip-mgolub-testing

08fe431

trociny pushed a commit that referenced this pull request Jul 27, 2016

Merge branch 'wip-16223' into wip-mgolub-testing

4fc7327

trociny pushed a commit that referenced this pull request Jul 28, 2016

Merge branch 'wip-16223' into wip-mgolub-testing

581c009

trociny merged commit df2aa58 into ceph:master Jul 30, 2016

dillaman deleted the wip-16223 branch July 30, 2016 16:30
