rbd: parallelize "rbd ls -l" #15579
Conversation
Force-pushed from 93b0ab2 to 59b2766 (Compare)
retest this please
@dillaman ping?
You might want to take a look at how the "rbd mirror pool" commands handle parallelized image actions (i.e. open the image asynchronously, do something, close the image) [1][2]. You could copy the basic structure from that implementation, ignoring the complications added to generically handle multiple commands with a single state machine.
[1] https://github.com/ceph/ceph/blob/master/src/tools/rbd/action/MirrorPool.cc#L454
[2] https://github.com/ceph/ceph/blob/master/src/tools/rbd/action/MirrorPool.cc#L131
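The MirrorPool.cc code referenced above drives librbd's aio_open()/aio_close() through AioCompletion callbacks. As a rough, self-contained illustration of the same "open asynchronously, do something, close" pipeline shape (with std::async standing in for the AIO machinery, and open_image/inspect as hypothetical stand-ins for the real image operations):

```cpp
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the asynchronous image operations; in the real
// code these would be librbd aio_open()/aio_close() driven by AioCompletions.
std::string open_image(const std::string& name) { return "handle:" + name; }
std::string inspect(const std::string& handle) { return handle + ":inspected"; }

// Open each image asynchronously, do something with it, then "close" it,
// keeping at most `max_in_flight` operations outstanding at once.
std::vector<std::string> process_images(const std::vector<std::string>& names,
                                        size_t max_in_flight) {
  std::vector<std::string> results(names.size());
  std::vector<std::pair<size_t, std::future<std::string>>> in_flight;
  size_t next = 0;
  while (next < names.size() || !in_flight.empty()) {
    // Launch new operations until the concurrency cap is reached.
    while (next < names.size() && in_flight.size() < max_in_flight) {
      in_flight.emplace_back(next, std::async(std::launch::async, [&names, next] {
        return inspect(open_image(names[next]));
      }));
      ++next;
    }
    // Reap the oldest outstanding operation; storing by original index keeps
    // the results in list order even though completions may race.
    results[in_flight.front().first] = in_flight.front().second.get();
    in_flight.erase(in_flight.begin());
  }
  return results;
}
```

This is only a sketch of the control structure; the real implementation avoids one thread per operation by letting the AIO completions drive progress.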
src/tools/rbd/action/List.cc
Outdated
  return r < 0 ? r : 0;
}

void get_arguments(po::options_description *positional,
                   po::options_description *options) {
  options->add_options()
    ("long,l", po::bool_switch(), "long listing format");
  options->add_options()
I would prefer to re-use the existing rbd_concurrent_management_ops instead of introducing a new (equivalent) setting for "rbd ls -l", since it's already used by numerous other CLI actions.
I'll look into it. Thanks!
src/tools/rbd/action/List.cc
Outdated
int init_worker(std::string pool_name, worker_entry &worker, librados::Rados& rados)
{
  worker.io_ctx = new librados::IoCtx();
librados::IoCtx and librbd::RBD can safely be re-used between threads of execution, so there's no need to allocate multiple instances on the heap.
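Since the shared objects are thread-safe, each worker can simply hold a reference to a single instance. A minimal sketch of that structure, with SharedService as a hypothetical stand-in for the shared librados::IoCtx/librbd::RBD:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical thread-safe service object standing in for librados::IoCtx /
// librbd::RBD, which the review notes are safe to share between threads.
struct SharedService {
  std::atomic<int> calls{0};
  void do_work() { calls.fetch_add(1); }
};

// Each worker uses the single shared instance by reference instead of
// heap-allocating its own copy.
void run_workers(SharedService& svc, int n_workers, int calls_each) {
  std::vector<std::thread> workers;
  for (int i = 0; i < n_workers; ++i) {
    workers.emplace_back([&svc, calls_each] {
      for (int c = 0; c < calls_each; ++c) svc.do_work();
    });
  }
  for (auto& t : workers) t.join();
}
```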
src/tools/rbd/action/List.cc
Outdated
{
  worker.io_ctx = new librados::IoCtx();
  worker.rbd = new librbd::RBD();
  worker.img = new librbd::Image();
librbd::Image uses the pimpl pattern (pointer to implementation), so allocating a librbd::Image on the heap really just allocates space for a pointer to the real implementation.
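For illustration, a minimal version of the pimpl idiom the comment describes (this Image class is a from-scratch sketch, not librbd's actual layout): the public object holds only a pointer, so heap-allocating it buys almost nothing.

```cpp
#include <memory>

// Minimal pimpl sketch: the public class exposes only a pointer to a hidden
// implementation struct, so sizeof(Image) is just one pointer.
class Image {
public:
  Image();
  ~Image();
  int size() const;
private:
  struct Impl;               // defined out of line, hidden from users
  std::unique_ptr<Impl> impl_;
};

struct Image::Impl {
  int size = 4096;           // stand-in for the real image state
};

Image::Image() : impl_(new Impl()) {}
Image::~Image() = default;
int Image::size() const { return impl_->size; }
```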
src/tools/rbd/action/List.cc
Outdated
      comp.opened = LIST_IDLE;
      continue;
    }
    r = list_process_image(&rados, comp, lflag, f, tbl);
Will the images now be out-of-order when displayed since nothing is ensuring that the things remain in-sync with the original image list order?
Yes, for sure. But I don't think it's a real problem; most users grep or sort the output anyway, especially with the machine-readable formats.
Force-pushed from 069de0c to 543ec64 (Compare)
@dillaman addressed all your comments, except for the templatized, generator-based state machine, as I feel it's somewhat overkill for this purpose.
@branch-predictor Yeah -- agree w/ not re-creating a templated generator, I was just showing an example.
In general it looks ok, but it would be great to switch to the use of OrderedThrottle so that (1) it remains consistently ordered between "rbd ls" and "rbd ls -l" (it's the little things) and (2) it would allow you to convert WorkerThread to a C_ImageList state machine that directly invokes the next state method, eliminating the need for a state enum.
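The "directly invokes the next state method" idea can be sketched without any librbd dependencies. Here C_ImageList, defer, and the state names are illustrative only, with a plain callback queue standing in for the AIO completion mechanism:

```cpp
#include <functional>
#include <string>
#include <vector>

// Sketch of a callback-driven state machine: instead of a state enum plus a
// switch, each completion handler schedules the next state method directly.
// `defer` stands in for registering an AIO completion callback.
struct C_ImageList {
  std::vector<std::string>& log;                 // records the states visited
  std::vector<std::function<void()>>& queue;     // pending "completions"

  void defer(void (C_ImageList::*next)()) {
    queue.push_back([this, next] { (this->*next)(); });
  }

  void send_open()       { log.push_back("open");     defer(&C_ImageList::handle_open); }
  void handle_open()     { log.push_back("get_info"); defer(&C_ImageList::handle_get_info); }
  void handle_get_info() { log.push_back("close");    defer(&C_ImageList::handle_close); }
  void handle_close()    { log.push_back("done"); }
};

// Drive the queued completions until none remain.
void run(std::vector<std::function<void()>>& queue) {
  while (!queue.empty()) {
    auto fn = queue.front();
    queue.erase(queue.begin());
    fn();
  }
}
```

Because each handler names its successor, there is no `opened`/state field left to track; the program counter of the callback chain is the state.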
@@ -203,15 +326,7 @@ int execute(const po::variables_map &vm) {
    return r;
  }

  librados::Rados rados;
You should be able to leave all of this logic for initializing the rados connection in place instead of creating your own solution.
  if (threads < 1) {
    threads = 1;
  }
  if (threads > 32) {
Nit: if someone has a large enough cluster, why limit them to 32? I would just remove this guard.
I tried running it with 100 parallel jobs and it ate almost 140MB of memory, which is quite high, yet that much parallelism didn't help in any way. Anything above 150 caused it to crash in lockdep on a vstart cluster, and I expect absurd amounts of parallel jobs (think of a "make -j 322" kind of typo) to cause a lot of other problems. So IMHO it's better to cap it at some reasonable value now than to collect crash reports later.
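The clamping being debated is just a two-sided bound on the user-supplied value; a minimal sketch (the bounds 1 and 32 are this PR's choice, not a library constant):

```cpp
#include <algorithm>

// Clamp the requested concurrency into [1, 32]: a non-positive request still
// gets one worker, and a "make -j 322"-style typo can't exhaust memory.
int clamp_threads(int requested) {
  return std::max(1, std::min(requested, 32));
}
```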
src/tools/rbd/action/List.cc
Outdated
    string name;
  };

  enum {
Nit: use a named enum a la State and prefix values with STATE_ instead of LIST_.
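The suggested shape would look something like the following (the specific value names here are guesses extrapolated from the LIST_IDLE and STATE_DONE names seen elsewhere in the thread):

```cpp
// Named enum with STATE_-prefixed values, so the worker's "int opened" field
// can become a typed "State state" field.
enum State {
  STATE_IDLE,
  STATE_OPENING,
  STATE_OPENED,
  STATE_DONE,
};

struct WorkerEntry {
  State state = STATE_IDLE;
};
```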
src/tools/rbd/action/List.cc
Outdated
    librbd::RBD* rbd;
    librbd::Image img;
    librbd::RBD::AioCompletion* completion;
    int opened;
Nit: use the named enum here instead of a generic int, and rename it to state.
src/tools/rbd/action/List.cc
Outdated
namespace action {
namespace list {

namespace at = argument_types;
namespace po = boost::program_options;

int do_list(librbd::RBD &rbd, librados::IoCtx& io_ctx, bool lflag,
            Formatter *f) {
  struct worker_entry {
Nit: perhaps rename to WorkerEntry
src/tools/rbd/action/List.cc
Outdated
return r < 0 ? r : 0; | ||
} | ||
|
||
void init_worker(worker_entry* worker, librbd::RBD* rbd, librados::IoCtx* ioctx) |
Nit: just use a constructor in the struct instead of an external helper function to "construct" it.
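A minimal sketch of what "constructor instead of an external helper" means here (the fields are illustrative, not the PR's exact layout):

```cpp
#include <string>

// Instead of a free init_worker() function that "constructs" the struct after
// the fact, let the struct initialize itself.
struct WorkerEntry {
  std::string name;   // image currently assigned to this worker, if any
  bool busy;

  WorkerEntry() : busy(false) {}
};
```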
src/tools/rbd/action/List.cc
Outdated
return r; | ||
} | ||
|
||
worker_entry* main = new worker_entry(); |
Nit: this worker entry doesn't serve much value -- just directly use the librbd::RBD object, etc. when retrieving the image list.
src/tools/rbd/action/List.cc
Outdated
int do_list(librbd::RBD &rbd, librados::IoCtx& io_ctx, bool lflag,
            Formatter *f) {
  struct worker_entry {
    librados::IoCtx* io_ctx;
Nit: io_ctx and rbd are unnecessary since they are shared.
src/tools/rbd/action/List.cc
Outdated
  if (!lockers.empty()) {
    lockstr = (exclusive) ? "excl" : "shr";
  }
  for (int left = 1; left < std::min(threads, (int)names.size()); left++) {
Nit: once you eliminate the unnecessary list worker, this loop could be zero-based and you could emplace_back-construct the workers instead of heap-allocating them.
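A sketch of the zero-based, emplace_back-constructed variant being suggested (WorkerEntry and make_workers are hypothetical names; the real loop builds workers for the image list rather than strings):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical worker type with a real constructor, so workers can be
// value-constructed inside a vector rather than new'd individually.
struct WorkerEntry {
  std::string name;
  explicit WorkerEntry(std::string n) : name(std::move(n)) {}
};

std::vector<WorkerEntry> make_workers(int threads, int n_images) {
  std::vector<WorkerEntry> workers;
  int count = std::min(threads, n_images);  // never more workers than images
  for (int i = 0; i < count; ++i) {         // zero-based, no heap allocation
    workers.emplace_back("worker-" + std::to_string(i));
  }
  return workers;
}
```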
Force-pushed from 543ec64 to 239f3f7 (Compare)
@dillaman Addressed most of your comments. As for the rest, I don't think this is going to be extended in any reasonable manner; anything regarding data collection is done in list_process_image, so I don't believe it deserves an OOP-style state machine. That would make perfect sense if each worker were an actual, physical thread of execution, but that calls for larger changes and more potential for bugs.
lgtm
src/tools/rbd/action/List.cc
Outdated
  auto i = names.begin();
  bool wait_needed = false; // true if no aio finished during previous iteration, so we may
                            // wait for first aio to finish as well instead of reiterating
  while (true) {
@branch-predictor Sorry for the delay -- I was playing with this today and it cannot run under valgrind since this loop turns into an effective 100% busy-loop which eats up all of valgrind's single CPU. If you used a SimpleThrottle (or equivalent), you could at least block until you have work to do.
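What a SimpleThrottle buys over the busy-loop is a blocking wait: start_op() sleeps on a condition variable until a slot frees up instead of spinning. The names below mirror Ceph's SimpleThrottle, but this class is a from-scratch illustration, not Ceph's implementation:

```cpp
#include <condition_variable>
#include <mutex>

// Minimal throttle sketch: start_op() blocks while `max` operations are in
// flight, so a waiting caller sleeps instead of burning a CPU in a busy-loop.
class SimpleThrottle {
public:
  explicit SimpleThrottle(int max) : max_(max) {}

  void start_op() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return current_ < max_; });  // sleeps, no spinning
    ++current_;
  }

  void end_op() {                    // called from the completion callback
    std::lock_guard<std::mutex> l(lock_);
    --current_;
    cond_.notify_one();
  }

  int pending() {
    std::lock_guard<std::mutex> l(lock_);
    return current_;
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  int max_;
  int current_ = 0;
};
```

With this shape, the image-listing loop would call start_op() before each aio_open() and end_op() from each close completion, and valgrind's single CPU stays available for the actual work.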
@dillaman on 239f3f7#diff-d060c6e7d7d39b5bff9393edac1f2531R237 there's a wait_for_complete which does just that. More precisely, it starts all workers in one iteration, then goes over all of them during the second iteration, and if it finds that no worker has finished, on the third iteration it blocks in wait_for_complete() until the first busy worker completes.
The STATE_DONE state is missing that logic to wait, so if I run w/ 1 concurrent operation, I can see it busy-loop spin while it's waiting for the image to close.
TBH, I didn't expect aio_close to be that slow. Added a new commit which changes the logic - now it should be better.
@branch-predictor Thanks -- I'll run this through the test suite today
@branch-predictor Looks like the non-deterministic ordering is causing a test failure:
Force-pushed from 97b890b to f1d6860 (Compare)
When a cluster contains a large number of images, "rbd ls -l" takes a long time to finish. In my particular case, it took about 58s to process 3000 images. "rbd ls -l" opens each image, and that takes the majority of the time, so improve this by using aio_open() and aio_close() to do it asynchronously. This reduced total processing time to around 15 seconds when using the default of 10 concurrently opened images. Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Force-pushed from f1d6860 to 8f76fc8 (Compare)
@dillaman try now, ripped out the previous logic and now processing images in order.
@dillaman any news?
@branch-predictor lgtm -- just need to run it through the integration tests again (hopefully this afternoon after the build completes).
When a cluster contains a large number of images, "rbd ls -l" takes a long time to finish. In my particular case, it took about 58s to process 3000 images.
"rbd ls -l" opens each image, and that takes the majority of the time, so improve this by using aio_open() and aio_close() to do it asynchronously. This reduced total processing time to around 15 seconds when using the default of 8 concurrently opened images.
New option "-t" / "--threads" lets the user pick a concurrency level that better suits their environment. As async I/O is used, there's a certain point at which increasing concurrency stops helping (or even hurts); the best thread count depends mostly on the single-thread processing speed of the CPU on which "rbd ls -l" is run.
Signed-off-by: Piotr Dałek piotr.dalek@corp.ovh.com