os/bluestore: allow multiple DeferredBatches in flight at once #16769

Merged
merged 3 commits into from Aug 4, 2017

4 participants
liewegas commented Aug 3, 2017

Fixes: http://tracker.ceph.com/issues/20295

liewegas changed the title from "mon: show class in 'osd crush tree'; sort result" to "os/bluestore: allow multiple DeferredBatches in flight at once" Aug 3, 2017

liewegas requested a review from xiexingguo Aug 3, 2017

liewegas added this to the luminous milestone Aug 3, 2017


xiexingguo Aug 4, 2017

Member

http://qa-proxy.ceph.com/teuthology/sage-2017-08-03_22:08:13-rbd:singleton-bluestore-wip-20295-b-distro-basic-mira/1480065

2017-08-03T22:22:59.793 INFO:tasks.workunit.client.0.mira058.stderr:+ ceph osd pool set ecpool allow_ec_overwrites true
2017-08-03T22:23:01.802 INFO:tasks.workunit.client.0.mira058.stderr:set pool 2 allow_ec_overwrites to true
2017-08-03T22:23:01.823 INFO:tasks.workunit.client.0.mira058.stderr:+ rbd --data-pool ecpool create --size 1024G test1
2017-08-03T22:23:03.888 INFO:tasks.workunit.client.0.mira058.stderr:+ rbd bench --io-type write --io-size 4096 --io-pattern=rand --io-total 100M test1
2017-08-03T22:23:03.957 INFO:tasks.workunit.client.0.mira058.stdout:bench  type write io_size 4096 io_threads 16 bytes 104857600 pattern random
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:  SEC       OPS   OPS/SEC   BYTES/SEC
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:    2      5281   1997.72  8182640.85
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:    3      5320   1567.09  6418808.08
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:    4      5381   1342.48  5498798.77
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:    5      5431   1080.87  4427227.47
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:    6      5537    912.57  3737866.87
2017-08-03T22:24:23.923 INFO:tasks.workunit.client.0.mira058.stdout:    7      5612     75.53  309373.91
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:    8      5670     75.99  311268.38
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:    9      5751     74.03  303226.42
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:   10      5802     74.42  304824.96
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:   11      5900     72.71  297819.10
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:   12      6018     81.73  334782.21
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:   13      6127     91.25  373756.93
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:   14      6167     81.74  334816.56
2017-08-03T22:24:23.924 INFO:tasks.workunit.client.0.mira058.stdout:   15      6296     95.00  389118.11
2017-08-03T22:24:23.925 INFO:tasks.workunit.client.0.mira058.stdout:   16      6372     92.58  379220.42
2017-08-03T22:24:23.925 INFO:tasks.workunit.client.0.mira058.stderr:2017-08-03 22:24:23.924379 7fc2fa366700  1 heartbeat_map is_healthy 'librbd::thread_pool thread 0x7fc2eaffd700' had timed out after 60
2017-08-03T22:24:27.122 INFO:tasks.workunit.client.0.mira058.stderr:2017-08-03 22:24:27.124338 7fc2eaffd700  1 heartbeat_map reset_timeout 'librbd::thread_pool thread 0x7fc2eaffd700' had timed out after 60

The latest run suggests that the problem persists. Or is this a new problem?

liewegas commented Aug 4, 2017

smithfarm and others added some commits Aug 1, 2017

tests: rbd: reproducer for rbd-on-EC issue
This introduces a new "rbd/singleton-bluestore" suite because creating an rbd
on an EC-backed datapool will fail on filestore.

References: http://tracker.ceph.com/issues/20295
Signed-off-by: Nathan Cutler <ncutler@suse.com>

os/bluestore: allow multiple DeferredBatches in flight at once
The current code only allows two DeferredBatches per osr: one that is
accumulating new writes and one that is currently in flight to disk.  If
the previous batch is in flight to disk, then the currently accumulating
one can't also be queued for disk.

This can cause problems, notably that described in
http://tracker.ceph.com/issues/20295, where one transaction is trying to
grab deferred_throttle but cannot due to other in-progress txcs that
include deferred IO.  The short version is that it cannot queue all of the
IO needed, and that later when the IO does complete it is awkward to
determine whether other IO queued behind it also needs to be queued
immediately.  And since it's not, this leads to a deadlock/stall.

Simply allowing multiple batches of IO to be in flight at once is a simple
fix.  Specifically, in queue_transactions(), if throttle_deferred_bytes
get_or_fail() fails, we can now deferred_submit_all() and be sure that
other IO will be submitted and thus complete and release the throttle that
we need to continue.

It is possible that the deferred_aggressive behavior could be simplified
now that the old restriction is dropped, but that needs a closer review
of the code.

Fixes: http://tracker.ceph.com/issues/20295
Signed-off-by: Sage Weil <sage@redhat.com>

os/bluestore: set deferred_aggressive if initial throttle get fails
This ensures that in-progress transactions with deferred writes queue their
IO immediately.  Otherwise, we may end up waiting indefinitely.

This is a biggish hammer.

Signed-off-by: Sage Weil <sage@redhat.com>

liewegas merged commit 9c7a653 into ceph:master Aug 4, 2017

4 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
make check (arm64): make check succeeded

liewegas deleted the liewegas:wip-20295-b branch Aug 4, 2017
