add nbd stream to live migration #44640
base: main
Conversation
In general it looks like a good idea. At least I don't see reasons why not.
As I mentioned in the comment below, my main concern is yet another library dependency.
I left some comments, though I have not looked carefully and most likely missed some minor things.
And we will need to update the documentation and add some tests (if we decide to have this).
And just as a note: we have PR #44447 where we are discussing adding a "sparsify" feature to the raw format, though it will still read zeroes from the source and would be less optimal than nbd_stream.
Thanks for the insightful comments @trociny. I'll proceed with proper testing, updating docs, and handling your comments, and update the PR when ready.
I have updated the PR as per your suggestions. Can you please take a look @trociny?
src/librbd/migration/NBDStream.cc
Outdated

```cpp
  while (nbd_aio_in_flight(nbd) > 0) {
    if (nbd_poll(nbd, -1) == -1) {
```
It still does not make it async, as you are blocking here for the request to complete. What I meant is to register a callback in nbd_aio_pread (instead of NBD_NULL_COMPLETION), which would check the return value and call on_finish->complete(r) (on_finish could be stored in user_data in nbd_completion_callback).
The libnbd library does not spawn its own threads, so the code spawns a new thread in the read() function. The read function then immediately returns without waiting for the actual processing of the read in ReadRequest. This is similar to how the File stream handles aio. In addition, I switched to using the nbd_completion_callback. Is that reasonable?
I am not quite sure I understand what you mean by "the code spawns a new thread in the read() function".
Anyway, my idea is that if you set a read callback you don't need to call nbd_poll to wait until the read is complete and may just return immediately. When the callback is called it will complete the on_finish context. At least this is how I suppose the nbd_aio API could be used here (I have not tried it myself).
Sorry if I wasn't clear. The call to boost::asio::post(m_strand, [ctx] { ctx->send_read(); }); in NBDStream::read() immediately returns control, so we are not blocking. But regardless, I still believe we need the nbd_poll. Libnbd uses an event loop (see https://manpages.ubuntu.com/manpages/focal/man3/libnbd.3.html) and we either call nbd_poll or implement our own version of checking for events. BTW, I updated the code in question and pushed an update. If you take a look you'll see I switched to using the callback, but I still needed to keep the nbd_poll call. Without it, we never get into the callback.
That sounds weird to me. I don't have experience with libnbd, but your current aio version does not make much sense to me. I don't see why to use aio read if you are still blocking on poll. Then your initial "sync" version would be easier.
I am going to try your code this weekend and see what is going on there.
src/librbd/migration/NBDStream.cc
Outdated

```cpp
auto rc = nbd_aio_pread(nbd, boost::asio::buffer_cast<void *>(buffer),
                        byte_length, byte_offset,
                        (nbd_completion_callback) { .callback=task_completed,
                                                    .user_data=on_finish }, 0);
```
I am still in the process of trying it, but here is one thing I just noticed that is definitely wrong. When I suggested to just provide on_finish as user_data and complete it in the callback, I missed that we are doing it in the for loop for every extent. And we may complete it only once, when all callbacks are completed. So it should be a bit more complicated. E.g. C_Gather may be used (you may find many examples in the librbd source), or you could implement your own "completion" class.
I see what you mean. Good catch. I'll see what I can do to handle it better. BTW, I hope you and your loved ones are safe.
Hi @trociny, I hope all is well. I recognize that you might have more pressing issues to deal with, so take your time. I brought back the original handling of the on_finish, which calls it once the requests are fully served. The current version uses a boost::asio strand to avoid blocking the main caller. I am not certain we really gain much benefit from using the async version of the API other than possibly batching requests. And if I look at the file stream handling, I see that a strand is also used there, but the data is read serially. We can switch back to that model if you prefer it.
@effi-ofer I read the libnbd(3) man page more carefully and see now what you meant. Ok, so using the nbd asynchronous API is not as simple as I initially assumed. We still could eventually use it, but not in the way you implemented, because the way you are using it does not differ from using the sync API, just more complicated. It looks like to use it effectively we would need a separate thread for polling. If you don't have plans to work on this (at least not in this initial PR) then I suggest returning to your sync version. And it might make sense to learn @idryomov's opinion about this feature first, if we want to add the libnbd dependency, so you would not waste your time working on a feature that will not be merged.
Hey @trociny, it's great hearing from you. I hope all is well. The PR used a boost thread similar to how the file source is done. Is this what you mean by a separate thread for polling? Or did you have something else in mind? I spoke with @idryomov last Monday or Tuesday and made sure he's aware of the libnbd dependency.
No. My idea was that NBDStream::read would just issue the aio read and return, with the polling handled elsewhere. The way you implemented it (polling in the same thread) does not really differ from using the sync API.
So basically add a thread that will wait for nbd requests, right? I don't mind making this change, but note again that this differs from how the file stream is handled. Also, we can't switch to the sync calls without a separate thread because then the caller will hang. Of course, we could just do the sync calls within a separate thread. Or maybe that's what you meant.
If you decide to use the libnbd sync API (or the async API but in the "sync" way you do currently, i.e. polling in the same thread, which I don't like) then yes, you should do it the same way as we have for the file stream, which uses "sync" calls. And currently I don't insist on using the async API for this implementation. I suggested it initially because I thought it would be rather easy to use, which it is not. And the way with a thread is the only way I see, but people might have other suggestions. And actually I have no idea how much it will improve things in comparison with the simple "sync" implementation, unless we try both.
Understood. Thanks for the clarification. I'll switch to the sync API with the boost thread.
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
Moving out of stale. Waiting for review.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
Force-pushed from cacce0a to a3782fa.
@idryomov @trociny I would appreciate your input. This PR is more of a WIP to get your feedback than a fully baked and tested PR. FYI @orozery
RBD images can be live-migrated from external data sources. The sources may be native RBD, a raw file, or qcow2, and may be streamed from a file, HTTP, or S3. However, Ceph's qcow2 implementation is incomplete and lacks support for encryption, compression, and backing files. As a workaround, one can live-import qcow2 images by first mounting the source qcow2 image using qemu-nbd, which exposes the image as a raw file. Unfortunately, by importing as a raw file we lose information about the sparseness of the image.
I would therefore like to suggest adding an NBD source to live migration, which will allow us to mount a qcow2 image using qemu-nbd and then live-import the image using the native NBD protocol (via a socket rather than a raw file). This has multiple advantages: it enables live import to preserve sparseness, performs better (it doesn't go through the kernel), and makes it possible to execute the live import from one machine while the image is accessed from other compute nodes, without having to qemu-nbd/kernel-mount the image on multiple machines. It is also a step in the direction of having full qcow2 support with multiple differential backing images.
Note that to create an NBD stream I had to 'overload' list_snap() so it gets executed by the stream. If you find this ugly, an alternative is to introduce an NBD format rather than a stream. Let me know what you think, and thanks in advance.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox