
rbd-mirror A/A: proxy InstanceReplayer APIs via InstanceWatcher RPC #13978

Merged
merged 6 commits into ceph:master from trociny:wip-18787 on Apr 21, 2017

Conversation

@trociny
Contributor

trociny commented Mar 15, 2017

Fixes: http://tracker.ceph.com/issues/18787
Signed-off-by: Mykola Golub mgolub@mirantis.com

@trociny trociny requested a review from dillaman Mar 15, 2017

@trociny


Contributor

trociny commented Mar 15, 2017

@dillaman It includes #13803. It might be easier to review only this PR.

@dillaman


Contributor

dillaman commented Mar 15, 2017

@trociny OK -- thanks

@dillaman

Perhaps InstanceWatcher's acquire/release methods should just take a Context * to hide all the retry logic. Then after de-duping requests upon receipt of an acquire/release notification, it can directly invoke the acquire/release methods on InstanceReplayer.
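A rough sketch of the interface shape being suggested, with simplified stand-in types (the real classes live in the rbd-mirror tree; the declarations here are only illustrative):

  #include <string>

  // stand-in for the tree's Context callback type
  struct Context {
    virtual ~Context() {}
    virtual void complete(int r) = 0;
  };

  class InstanceWatcher {
  public:
    // The caller only supplies a completion context; retries, de-duplication
    // and timeout handling stay hidden inside InstanceWatcher, which can then
    // invoke InstanceReplayer's acquire/release directly on notification.
    void notify_image_acquire(const std::string &instance_id,
                              const std::string &global_image_id,
                              Context *on_finish);
    void notify_image_release(const std::string &instance_id,
                              const std::string &global_image_id,
                              Context *on_finish);
  };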

@@ -1,12 +1,15 @@
add_library(rbd_mirror_types STATIC
leader_watcher/Types.cc)
leader_watcher/Types.cc
instance_watcher/Types.cc)


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: "instance_watcher" before "leader_watcher"

std::vector<std::string> *images_to_detach,
std::vector<std::string> *images_to_attach);
void attach(const std::string &instance_id,


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: should be attached if this will eventually be invoked once the peer responds that attach was successful. Similar comment for detach below.

#include <map>
#include <string>
#include <vector>


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: not used in header

@@ -69,6 +69,50 @@ class ImageReplayer {
STATE_STOPPED,
};
struct RemoteImage {


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: I would move this to the types header file since this is a templated class and it's now being used outside this class

template <typename ImageCtxT = librbd::ImageCtx>
class InstanceReplayer {
public:
struct ImagePeer {


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: same as ImageReplayer::RemoteImage struct -- definitely should move to Types.h

struct ImageAcquirePayload {
static const NotifyOp NOTIFY_OP = NOTIFY_OP_IMAGE_ACQUIRE;


@dillaman

dillaman Mar 16, 2017

Contributor

I think it might be valuable to put a monotonically increasing request id into each request / response data structure to protect against replays.


@dillaman

dillaman Mar 16, 2017

Contributor

... additionally, if you add a unique request id, you could have a generic "async op complete" callback instead of requiring individual "image acquired" / "image released" messages (similar to ImageWatcher)
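For illustration, a minimal standalone sketch of the idea (all names hypothetical): each payload carries a request id taken from a per-sender monotonically increasing counter, which the receiver can use both to drop replays and to route a generic "async op complete" ack back to the original request.

  #include <cstdint>
  #include <string>

  struct ImageAcquirePayload {
    uint64_t request_id = 0;        // assigned from the sender's counter
    std::string global_image_id;
  };

  struct AsyncOpCompletePayload {
    uint64_t request_id = 0;        // identifies which request this ack is for
    int result = 0;
  };

  class RequestIdGenerator {
  public:
    uint64_t next() { return ++m_seq; }   // caller serializes access
  private:
    uint64_t m_seq = 0;
  };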

} else {
bufferlist bl;
::encode(NotifyMessage{ImageAcquirePayload{global_image_id, peers}}, bl);
auto req = new NotifyInstanceRequest(


@dillaman

dillaman Mar 16, 2017

Contributor

Will need to handle re-sending -ETIMEDOUT notifications (until shut down)

}
void finish(int r) override {
op_tracker.finish_op();


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: debug log for the notification result

}
void send() {
notifier.notify(bl, nullptr, this);


@dillaman

dillaman Mar 16, 2017

Contributor

Might be nice to update the API to Notifier::notify to replace the out_bl with a pointer to a response struct (map from <gid, cookie> to response payload and list of timed out <gid, cookie> peers). If the response struct has zero peers in the response, you know no instances were registered on the watch.
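A hypothetical shape for such a response structure (simplified standalone types, not the actual librbd API):

  #include <cstdint>
  #include <map>
  #include <utility>
  #include <vector>

  using WatcherId = std::pair<uint64_t, uint64_t>;   // <gid, cookie>

  struct NotifyResponse {
    // decoded ack payload per responding watcher
    std::map<WatcherId, std::vector<uint8_t>> acks;
    // watchers that did not respond before the notify timeout
    std::vector<WatcherId> timed_out;
  };
  // If both collections are empty, no instance was registered on the watch.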

Mutex::Locker locker(m_lock);
assert(peers.size() == 1U);


@dillaman

dillaman Mar 16, 2017

Contributor

Nit: I assume this is temporary -- why not directly pass the full collection through?

@trociny


Contributor

trociny commented Mar 20, 2017

@dillaman I have rebased it against recent master -- now that the pool update notifications have been merged, I don't need ImageMapper to track acquired images, so it has been removed from this PR.

I addressed your minor comments; it is still WIP though: I need to add the notification request id, update the Notifier::notify API, and address this comment, which I am not sure I understood:

Perhaps InstanceWatcher's acquire/release methods should just take a Context * to hide all the retry logic. Then after de-duping requests upon receipt of an acquire/release notification, it can directly invoke the acquire/release methods on InstanceReplayer.

Are you talking about methods like notify_image_acquire or handle_image_acquire? I suppose the latter?

Do you mean that instead of passing a listener to the constructor, I can pass ImageReplayer (or let InstanceWatcher create it)? The InstanceWatcher will also need a way to learn who the leader currently is in order to send back image_acquired/released notifications. Right now the Replayer uses the LeaderWatcher for this.

@dillaman


Contributor

dillaman commented Mar 20, 2017

@trociny I was suggesting that the notify_image_acquire should take a Context * so that InstanceWatcher can hide all the details about retrying failed attempts, ignoring duplicates, etc (sort of like the interaction between librbd::Operations and librbd::ImageWatcher). On leader loss, Replayer could invoke a cancel method on InstanceWatcher to remove all tracking for the callbacks so they won't get retried on failure.

The InstanceWatcher constructor could take an InstanceReplayer pointer at the time of construction (or at the time of init) so that when it receives an acquire/release notification, it could directly invoke the associated image acquire/release methods within the InstanceReplayer w/ another Context * that handles sending the ack -- bypassing the need for a listener.

Of course, now that I think about this more, since the ACKs will be sent to the leader's instance watcher, it makes it difficult to wrap all of this logic unless you only ever instantiate an InstanceWatcher for yourself, and the notify_XYZ methods take an instance id to send the messages to the specific instance.

@trociny


Contributor

trociny commented Mar 20, 2017

@dillaman

I was suggesting that the notify_image_acquire should take a Context * so that InstanceWatcher can hide all the details about retrying failed attempts, ignoring duplicates, etc (sort of like the interaction between librbd::Operations and librbd::ImageWatcher). On leader loss, Replayer could invoke a cancel method on InstanceWatcher to remove all tracking for the callbacks so they won't get retried on failure.

Then do we really need image_acquired and image_released notifications? Wouldn't just a successful ack on image_acquire and image_release (on Context * completion) be enough?

The InstanceWatcher constructor could take an InstanceReplayer pointer at the time of construction (or at the time of init) so that when it receives an acquire/release notification, it could directly invoke the associated image acquire/release methods within the InstanceReplayer w/ another Context * that handles sending the ack -- bypassing the need for a listener.

I could remove "proxy" logic from InstanceWatcher and provide something similar to Operation::invoke_async_request in Replayer (or in a separate class/file). The InstanceWatcher notify methods then would just send requests to a specific instance.

Of course, now that I think about this more, since the ACKs will be sent to the leader's instance watcher, it makes it difficult to wrap all of this logic unless you only ever instantiate an InstanceWatcher for yourself, and the notify_XYZ methods take an instance id to send the messages to the specific instance.

So, instance_id can be provided as a parameter to the InstanceWatcher notify methods (as it is currently), or Policy/ImageMapper could be used by the InstanceWatcher internally to make a decision about the instance. Which do you prefer?

@dillaman


Contributor

dillaman commented Mar 20, 2017

@trociny

Then do we really need image_acquired and image_released notifications? Wouldn't just a successful ack on image_acquire and image_release (on Context * completion) be enough?

Assuming we retry on -ETIMEDOUT, consolidate dups, and prevent stale replays (which we should be doing anyway) -- yes, I think we can eliminate the separate ACK messages.

So, instance_id can be provided as a parameter to the InstanceWatcher notify methods (as it is currently), or Policy/ImageMapper could be used by the InstanceWatcher internally to make a decision about the instance. Which do you prefer?

I'm fine w/ either -- although since InstanceWatcher will need additional intelligence, it might make sense for it to eventually interact w/ the policy.

@trociny


Contributor

trociny commented Mar 23, 2017

@dillaman

Assuming we retry on -ETIMEDOUT, consolidate dups, and prevent stale replays (which we should be doing anyway) -- yes, I think we can eliminate the separate ACK messages.

Trying it this way, I am not sure how to "consolidate dups" properly. I assume the source of dups is a situation like the one below:

  • A request is received and is being executed by something like handle_request(..., on_finish), where on_finish will eventually complete the watcher's on_notify_ack.
  • The handler executes for too long (e.g. stopping a replayer), so the sender times out and resends the request.
  • The dup is received; handle_request(..., on_finish) is not executed this time, and instead the on_finish from the first request is set to use this new on_notify_ack on completion.

But I don't know what to do with the first on_notify_ack? Just complete it with an error code?

There is another situation, where handle_request completes in a short window after the request has timed out on the sender side, but just before the sender has resent the request. So when the dup is received, the request is already completed. I think I can handle this properly by keeping the completed request results for some time (2 * TIMEOUT) and just returning them in this case.
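As a standalone sketch of how both duplicate cases above could be tracked (all names and the retention policy are illustrative, not the PR's actual prepare_request code):

  #include <chrono>
  #include <cstdint>
  #include <map>
  #include <mutex>
  #include <optional>
  #include <set>

  class RequestTracker {
  public:
    // Returns the cached result if this request id has already completed,
    // std::nullopt if it is new or still in progress (duplicates of an
    // in-progress request collapse into the same entry).
    std::optional<int> check(uint64_t request_id) {
      std::lock_guard<std::mutex> lock(m_lock);
      auto it = m_completed.find(request_id);
      if (it != m_completed.end()) {
        return it->second.result;
      }
      m_in_progress.insert(request_id);
      return std::nullopt;
    }

    void complete(uint64_t request_id, int result) {
      std::lock_guard<std::mutex> lock(m_lock);
      m_in_progress.erase(request_id);
      m_completed[request_id] = {result, std::chrono::steady_clock::now()};
      // a periodic sweep (not shown) would drop completed entries older than
      // roughly 2 * the notify timeout, matching the retention idea above
    }

  private:
    struct Completed {
      int result;
      std::chrono::steady_clock::time_point when;
    };

    std::mutex m_lock;
    std::set<uint64_t> m_in_progress;
    std::map<uint64_t, Completed> m_completed;
  };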

@trociny


Contributor

trociny commented Mar 27, 2017

@dillaman I have pushed the implementation without the image_acquired/image_released ACK messages, for a brief review. I am going to add tests, but the part that worries me right now is the handling of the "duplicate for in-progress request" case in InstanceWatcher::prepare_request(). I am not sure it is correct.

@dillaman


Contributor

dillaman commented Mar 28, 2017

But I don't know what to do with the first on_notify_ack? Just complete it with an error code?

You would (under lock) delete it and update your request id -> on_notify_ack mapping to point to the new context. In the case where the leader encountered a -ETIMEDOUT error, acking the notification is already too late to accomplish anything. In the case where there was a connection hiccup and the OSD re-sent a duplicate notification to the slave instance, the two different on_notify_ack contexts would essentially be tracking the same request, so you can't complete both.

There is another situation, where handle_request completes in a short window after the request has timed out on the sender side, but just before the sender has resent the request. So when the dup is received, the request is already completed. I think I can handle this properly by keeping the completed request results for some time (2 * TIMEOUT) and just returning them in this case.

In this case, the acquire / release action would have already taken place, so you can proceed as normal w/o any additional tracking. Just don't fail if the InstanceReplayer receives a duplicate acquire or release message -- just complete the provided context with a success error code.
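A minimal standalone sketch of the request id -> on_notify_ack bookkeeping described above (simplified stand-in types; illustrative only): on a duplicate, the stale ack context is deleted under lock and remapped to the new one, so only the most recent notification is ever acknowledged.

  #include <cstdint>
  #include <map>
  #include <mutex>

  struct Context {
    virtual ~Context() {}
    virtual void complete(int r) = 0;
  };

  class AckTracker {
  public:
    // called when an acquire/release notification arrives
    void register_ack(uint64_t request_id, Context *on_notify_ack) {
      std::lock_guard<std::mutex> lock(m_lock);
      auto it = m_acks.find(request_id);
      if (it != m_acks.end()) {
        delete it->second;            // stale ack from an earlier duplicate
        it->second = on_notify_ack;   // the new ack now tracks the request
        return;
      }
      m_acks[request_id] = on_notify_ack;
    }

    // called when the in-progress handler finishes
    void complete(uint64_t request_id, int r) {
      Context *ack = nullptr;
      {
        std::lock_guard<std::mutex> lock(m_lock);
        auto it = m_acks.find(request_id);
        if (it == m_acks.end()) {
          return;
        }
        ack = it->second;
        m_acks.erase(it);
      }
      ack->complete(r);               // ack outside the lock
    }

  private:
    std::mutex m_lock;
    std::map<uint64_t, Context *> m_acks;
  };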

@trociny


Contributor

trociny commented Mar 28, 2017

@dillaman

You would (under lock) delete it and update your request id -> on_notify_ack mapping to point to the new context. In the case where the leader encountered a -ETIMEDOUT error, acking the notification is already too late to accomplish anything. In the case where there was a connection hiccup and the OSD re-sent a duplicate notification to the slave instance, the two different on_notify_ack contexts would essentially be tracking the same request, so you can't complete both.

But this looks like we are leaking memory in this case?

In this case, the acquire / release action would have already taken place, so you can proceed as normal w/o any additional tracking. Just don't fail if the InstanceReplayer receives a duplicate acquire or release message -- just complete the provided context with a success error code.

Could you please look at my InstanceWatcher<I>::prepare_request() and comment on what you see wrong with how it handles dups?

@dillaman


Contributor

dillaman commented Mar 28, 2017

@trociny

But this looks like we are leaking memory in this case?

Hmm, did valgrind complain about a leak? Assuming you delete the stale C_NotifyAck context, the low-level librados call just sends a message to the OSD and doesn't deallocate any previously allocated memory slots.

Could you please look at my InstanceWatcher::prepare_request() and comment on what you see wrong with how it handles dups?

I don't see anything immediately wrong - I think it's overly complicated, however. I especially worry about edge conditions w/ long-delayed requests and completed events registered in the m_max_expired_request_ids collection. At a certain point, you have to trust that the state machine on the other end is correct (i.e. it shouldn't be sending remove events before the add has completed). If that's the case, the repeated notifications will never get out-of-order w/ respect to one another.

@dillaman


Contributor

dillaman commented Mar 28, 2017

@trociny Trying to think about this long-term, the Replayer should initialize the InstanceReplayer before InstanceWatcher and LeaderWatcher. My rationale is that whether or not this instance is the leader, it will be assigned images (by a remote leader or by "itself" if it's the leader).

The InstanceWatcher could be the proxy for its own notifications if it's provided w/ a reference to its associated InstanceReplayer. That will allow the core changes and separation of Replayer and InstanceReplayer to proceed forward w/o supporting RPC to additional peers. Let's get that foundation in place as part of #13803.

That leaves the question of who is responsible for keeping track of the image assignment state machine given the interaction between Replayer::handle_update, the future image distribution policy class, and InstanceWatcher::notify_xyz methods. Short term, I think it would be the policy class's responsibility to track the state since handling callbacks requires more knowledge than ImageWatcher currently has available and I'd like to keep Replayer small(ish). Until we support additional policies (if ever), I think that is a good approach for reducing the complexity in the short-term.

@trociny


Contributor

trociny commented Mar 28, 2017

@dillaman Thanks. I think I now have a better understanding and will come back with an improved and cleaned-up version.

@trociny


Contributor

trociny commented Mar 29, 2017

@dillaman

Trying to think about this long-term, the Replayer should initialize the InstanceReplayer before InstanceWatcher and LeaderWatcher. My rationale is that whether or not this instance is the leader, it will be assigned images (by a remote leader or by "itself" if it's the leader).

Agreed, but right now (before the policies are implemented), if I allow the InstanceReplayer to live while the Replayer is not the leader, I need a method to tell it to stop all image replayers when leadership is released. Just shutting it down does the trick. So I can either make the InstanceReplayer long-living and provide a public stop_image_replayers method (which is temporary, as we are not likely to need it later), or shut it down (as I do right now) and change this later. Both solutions look temporary.

The InstanceWatcher could be the proxy for its own notifications if it's provided w/ a reference to its associated InstanceReplayer.

In my current implementation I am providing InstanceWatcher with the Replayer's callbacks instead of the InstanceReplayer, mostly because on an image acquire I also need to know the remote io_ctx. I.e. right now the Replayer's callback does the following:

  for (auto peer : peers) {
    remote_images.insert({peer.mirror_uuid, peer.image_id, m_remote_io_ctx});
  }
  m_instance_replayer->acquire_image(global_image_id, remote_images, on_finish);

So if we want the InstanceReplayer to be used directly by the InstanceWatcher, we need to decide what to do with the remote context (or rather, many contexts in the future) -- who should be responsible for their initialization, and how InstanceWatcher (or just InstanceReplayer) should learn about them.

@dillaman


Contributor

dillaman commented Mar 29, 2017

@trociny I think it would be useful for InstanceReplayer to have a stop_all command that could be re-used when it determines it has been blacklisted and needs to shut down. That way we can get the current bootstrap ordering for InstanceReplayer.

With regard to the peers, I think you can just add a new set_peers (or add_peer/remove_peer in the future when we support >1) method to InstanceReplayer. We can clean up the handling of retrieving the remote mirror uuid as well so that it doesn't do a lookup with each Replayer::handle_update callback.

@trociny


Contributor

trociny commented Mar 29, 2017

@dillaman OK, I will add InstanceReplayer::stop_all, though I am not sure we will use it when being blacklisted, because we stop the Replayer in this case, and InstanceReplayer will be shut down.

As for the InstanceReplayer::set_peers method, am I correct in understanding it this way: in the Replayer, after initialization of the InstanceReplayer and the remote io_ctx, we call InstanceReplayer::set_peers (providing remote_mirror_uuid, remote_io_ctx pair(s)); when InstanceReplayer::acquire_image is later called, its peers param will contain remote_mirror_uuid, remote_image_id pair(s), and the InstanceReplayer will be able to match the remote io_ctx by mirror_uuid?

Don't we expect that the mirror_uuid can change for the pool?
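For illustration, the flow described above might look roughly like this (a simplified standalone sketch; librados::IoCtx and the Context callback are replaced by stand-ins):

  #include <map>
  #include <string>

  struct RemoteIoCtx {};   // stand-in for librados::IoCtx

  class InstanceReplayer {
  public:
    // called by the Replayer once the remote io_ctx(s) are initialized
    void set_peers(const std::map<std::string, RemoteIoCtx> &peers) {
      m_peers = peers;                       // mirror_uuid -> remote io context
    }

    void acquire_image(const std::string &global_image_id,
                       const std::string &peer_mirror_uuid,
                       const std::string &peer_image_id) {
      auto it = m_peers.find(peer_mirror_uuid);
      if (it == m_peers.end()) {
        return;                              // unknown peer, nothing to attach
      }
      // ... start or update the ImageReplayer using it->second (the matched
      // remote io context) and peer_image_id
    }

  private:
    std::map<std::string, RemoteIoCtx> m_peers;
  };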

@dillaman


Contributor

dillaman commented Mar 29, 2017

@trociny

though I am not sure we will use it when being blacklisted, because we stop the Replayer in this case, and InstanceReplayer will be shut down

True -- I am just trying to avoid having to refactor the state machine in the future. I would hope that having InstanceReplayer iterate through all registered images and invoke the release method would be a smaller change.

The mirror uuid can only change if (pool) mirroring is disabled and re-enabled on the remote side. I can submit a PR to have PoolWatcher handle the mirror_uuid retrieval and refresh it if needed.

@trociny


Contributor

trociny commented Mar 29, 2017

@dillaman

I can submit a PR to have PoolWatcher handle the mirror_uuid retrieval and refresh it if needed.

Sure, I wouldn't object :)

@trociny


Contributor

trociny commented Apr 3, 2017

@dillaman Updated, both this and #13803 (just in case you would like to concentrate on InstanceReplayer only first). Both PRs are currently marked DNM because they depend on #14240.

Also, this PR contains a DNM hack I am using for testing. It will be removed before merging.

@trociny


Contributor

trociny commented Apr 4, 2017

@dillaman While testing it, I found that the handling of the "duplicate for in-progress request" case in InstanceWatcher::prepare_request was completely wrong. I have updated the code.

@trociny


Contributor

trociny commented Apr 4, 2017

@dillaman It still crashes. I will fix this tomorrow.

@trociny


Contributor

trociny commented Apr 5, 2017

I have fixed the crash. Hope it is much better now.

@dillaman

I still think all that expiring request logic in InstanceWatcher can be removed -- we just need to ensure that InstanceReplayer is idempotent when it receives image acquire / release notifications.
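Roughly, the idempotency being asked for (a standalone sketch with illustrative names, not the tree's actual InstanceReplayer):

  #include <functional>
  #include <set>
  #include <string>

  class InstanceReplayerSketch {
  public:
    void acquire_image(const std::string &global_image_id,
                       std::function<void(int)> on_finish) {
      if (!m_global_image_ids.insert(global_image_id).second) {
        on_finish(0);   // duplicate acquire: already replaying, report success
        return;
      }
      // ... create and start the ImageReplayer, completing on_finish when done
    }

  private:
    std::set<std::string> m_global_image_ids;
  };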

NOTIFY_OP_IMAGE_RELEASE = 1,
};
struct ImageAcquirePayload {


@dillaman

dillaman Apr 12, 2017

Contributor

Nit: I would have these two payload classes inherit from a common ImagePayloadBase class so you don't need to re-implement all the encode/decode/dump methods for the same three parameters.
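Sketched with simplified stand-ins (plain std::ostream output in place of the tree's encode/decode/dump machinery), the suggested refactoring would look roughly like this:

  #include <cstdint>
  #include <ostream>
  #include <string>

  // the shared fields live in the base payload
  struct ImagePayloadBase {
    uint64_t request_id = 0;
    std::string global_image_id;

    void dump(std::ostream &os) const {
      os << "request_id=" << request_id
         << " global_image_id=" << global_image_id;
    }
  };

  struct ImageAcquirePayload : public ImagePayloadBase {
    // nothing extra: acquire only needs the shared fields
  };

  struct ImageReleasePayload : public ImagePayloadBase {
    bool schedule_delete = false;

    void dump(std::ostream &os) const {
      ImagePayloadBase::dump(os);
      os << " schedule_delete=" << schedule_delete;
    }
  };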

@@ -17,6 +17,7 @@
#include "include/rados/librados.hpp"
#include "ClusterWatcher.h"
#include "InstanceWatcher.h"


@dillaman

dillaman Apr 12, 2017

Contributor

Nit: why can't you forward declare anymore?

@@ -46,6 +46,8 @@ class LeaderWatcher : protected librbd::Watcher {
void shut_down(Context *on_finish);
bool is_leader();
bool is_releasing_leader();


@dillaman

dillaman Apr 12, 2017

Contributor

Nit: can these methods be const?

const boost::optional<std::string> &id = boost::none);
InstanceWatcher(Threads<ImageCtxT> *threads, librados::IoCtx &io_ctx,
InstanceReplayer<ImageCtxT> *instance_replayer,
const boost::optional<std::string> &id);


@dillaman

dillaman Apr 12, 2017

Contributor

Nit: can we remove the optional id?


@trociny

trociny Apr 13, 2017

Contributor

Do you mean to remove the id argument, or to make it non-optional?

The id is mostly needed only for InstanceWatcher::remove_instance() (and maybe for tests). So it is not for public use (and the create method does not have it).

I think I can make it non-optional in the constructor and set it properly in create().


@dillaman

dillaman Apr 13, 2017

Contributor

OK, sounds good

uint64_t request_id;
std::string global_image_id;
PeerImageIds peers;


@dillaman

dillaman Apr 12, 2017

Contributor

I wonder if we can eliminate this from the acquire/release notifications. Each rbd-mirror daemon instance (currently) tracks its peers (albeit only a single peer right now). The leader instance doesn't know which peer (if any) is primary, so regardless it's up to the assigned instance's ImageReplayer to figure that out. Therefore, on acquire the peer instance should have enough information to start. On the release side, just because an image disappears doesn't necessarily imply the primary was deleted. Again, right now we have ImageDeleter checking to see if we should really remove the image. I am thinking the ImageDeleter hooks that currently exist within InstanceReplayer should be moved to ImageReplayer. I'll try to work on this idea tomorrow in the current codebase.


@trociny

trociny Apr 13, 2017

Contributor

The source of information about the peer is the PoolWatcher, which detects mirrored image additions/removals in the pool. I thought pool watchers would be run only by the leader instance?

Now I think it is more correct to have only one peer in the acquire/release notifications (not a list), because a message is going to be generated on a pool state change. Then we might not actually need the PeerImageId type (and PeerImage). I added those mainly to be able to pass a list of peers (remote images). When it is always one peer, we can just pass mirror_uuid and image_id as arguments. And the newly introduced ImageReplayer::set_remote_images() should be removed, with only add_remote_image()/remove_remote_image() used.


@dillaman

dillaman Apr 13, 2017

Contributor

The peers actually come from "Mirror" when it scans the rbd_mirroring object and creates one Replayer per peer (eventually this needs to be one Replayer per pool w/ multiple peers). Since an image will eventually be available on all peers, it doesn't really matter which specific peer has the image (or had the image removed) so long as we get notifications to re-verify a given image so that we can determine if we should remove it locally or if we should register ourselves in the journal of the peer.


@trociny

trociny Apr 13, 2017

Contributor

Sorry, by peer (in the PoolWatcher context) I meant ImagePeer.

I am still not sure I see how we can nicely eliminate ImagePeer from the acquire/release notifications.

Say we have two peers (remote mirroring pools). An image (primary) is created in one of the pools. The PoolWatcher sends an update to the Replayer, and the Replayer sends an image_acquire notification. At this moment the Replayer possesses information about the image peer (remote image) and can provide it within the image_acquire notification. If it is not sent to the InstanceReplayer, it will need to figure this out itself before calling ImageReplayer::add_remote_image, i.e. scan all available peers for an image with this global id. Do you want me to do it this way?


@dillaman

dillaman Apr 13, 2017

Contributor

When we eventually support multiple peers, ImageReplayer will need to scan all peers regardless. Take the example where the PoolWatcher gets an image added update from peer C -- but peer C's image was actually just a non-primary replica of peer B's primary image. Eventually it should receive an image added update from peer B, but any number of factors can lead to a race.


@trociny

trociny Apr 13, 2017

Contributor

If it is the ImageReplayer that should scan all peers, then I suppose the ImageReplayer::add_remote_image/remove_remote_image methods should be retired too? You added them recently, so I thought you did not plan for this to be the ImageReplayer's job. Then we need to add ImageReplayer::add_peer/remove_peer instead?

As for the race you described, it does not look harmful and will resolve eventually. In any case, it looks like the ImageReplayer will need to recover from such situations to choose a proper peer image. Still, the images themselves could be provided externally.

I am OK with doing this either way, I just want to make it clear.


@dillaman

dillaman Apr 13, 2017

Contributor

Agreed -- definitely won't hurt. I'm just trying to say that passing around a collection of peers on each image acquire/release probably doesn't matter. If we pass only the single peer that was associated with the original event, perhaps that could provide some optimization in the future so perhaps it doesn't hurt to replace the collection w/ a single peer. If we don't end up using it, at least it can be useful for debugging events.

assert(r == 0);
on_finish->complete(r);
}
};
struct NotifyInstanceRequest : public Context {


@dillaman

dillaman Apr 12, 2017

Contributor

Nit: usually prefix these w/ C_

acknowledge_notify(notify_id, handle, out);
Mutex::Locker locker(m_lock);
if (m_on_finish != nullptr) {


@dillaman

dillaman Apr 12, 2017

Contributor

Leaks on_notify_ack

Mutex::Locker locker(m_lock);
if (m_on_finish != nullptr) {


@dillaman

dillaman Apr 12, 2017

Contributor

Nit: same comment here

@trociny trociny changed the title from [DNM] rbd-mirror A/A: proxy InstanceReplayer APIs via InstanceWatcher RPC to rbd-mirror A/A: proxy InstanceReplayer APIs via InstanceWatcher RPC Apr 17, 2017

@trociny


Contributor

trociny commented Apr 17, 2017

@dillaman updated

uint64_t request_id = ++m_request_seq;
bufferlist bl;
::encode(NotifyMessage{ImageAcquirePayload{request_id, global_image_id,
peer_mirror_uuid, peer_image_id}}, bl);


@dillaman

dillaman Apr 19, 2017

Contributor

Nit: spacing


@trociny

trociny Apr 19, 2017

Contributor

@dillaman I am not sure what indent you would like in this case. Should peer_mirror_uuid start at the same position as request_id above?


@dillaman

dillaman Apr 19, 2017

Contributor

@trociny yeah, preferably it should be indented to the same level as its associated parameters (request_id in this case), or all the parameters should move down to a new (indented) line.
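For example, one layout that would satisfy this is moving the wrapped arguments onto their own indented line (the alternative being to align peer_mirror_uuid directly under request_id):

  bufferlist bl;
  ::encode(
    NotifyMessage{ImageAcquirePayload{
      request_id, global_image_id, peer_mirror_uuid, peer_image_id}},
    bl);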

uint64_t request_id = ++m_request_seq;
bufferlist bl;
::encode(NotifyMessage{ImageReleasePayload{request_id, global_image_id,
peer_mirror_uuid, peer_image_id, schedule_delete}}, bl);

This comment has been minimized.

@dillaman

dillaman Apr 19, 2017

Contributor

Nit: spacing

req->send();
}
template <typename I>
InstanceWatcher<I> *InstanceWatcher<I>::create(
librados::IoCtx &io_ctx, ContextWQ *work_queue,


@dillaman

dillaman Apr 19, 2017

Contributor

Nit: 4 spaces indent is cleaner for a hanging indent

uint64_t notifier_id, bufferlist &bl) {
dout(20) << dendl;
void InstanceWatcher<I>::notify_image_acquire(
const std::string &instance_id, const std::string &global_image_id,


@dillaman

dillaman Apr 19, 2017

Contributor

Nit: 4 spaces indent is cleaner for a hanging indent

if (r == -ETIMEDOUT) {
derr << "C_NotifyInstanceRequest: " << this << " " << __func__
<< ": resending after timeout" << dendl;
send();


@dillaman

dillaman Apr 19, 2017

Contributor

We are going to need some way to cancel in-flight notifications (e.g. the leader died, so we will never get a response from it). All in-flight notifications would be canceled on shut down, and notifications to a specific peer would be canceled when the leader blacklists a peer due to missed heartbeats (we can add the hook for canceling in-flight notifications in a later PR if this one grows too large -- but we should track in-flights for this purpose).
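For the tracking part, a standalone sketch of what that could look like (illustrative names only; the real C_NotifyInstanceRequest would also carry its completion context):

  #include <cstdint>
  #include <map>
  #include <mutex>
  #include <string>
  #include <utility>

  struct PendingRequest {
    bool canceled = false;
    // the real request object would also hold the completion context
  };

  class NotifyRequestTracker {
  public:
    void track(const std::string &instance_id, uint64_t request_id) {
      std::lock_guard<std::mutex> lock(m_lock);
      m_requests[{instance_id, request_id}] = PendingRequest{};
    }

    // cancel everything addressed to one instance (e.g. it was blacklisted)
    void cancel_instance(const std::string &instance_id) {
      std::lock_guard<std::mutex> lock(m_lock);
      for (auto &entry : m_requests) {
        if (entry.first.first == instance_id) {
          entry.second.canceled = true;   // the retry loop checks this flag
        }
      }
    }

    // cancel all in-flight notifications (shut down)
    void cancel_all() {
      std::lock_guard<std::mutex> lock(m_lock);
      for (auto &entry : m_requests) {
        entry.second.canceled = true;
      }
    }

  private:
    std::mutex m_lock;
    std::map<std::pair<std::string, uint64_t>, PendingRequest> m_requests;
  };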

Mykola Golub added some commits Mar 15, 2017

Mykola Golub
qa/workunits/rbd: don't use deprecated syntax
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
librbd: race between notify context completion and Notifier destroy
The assert "m_pending_aio_notifies == 0" could fail in the destructor.

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
rbd-mirror: possible lockdep false positive
(when InstanceWatcher::remove_instance is called)

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
rbd-mirror: remove recently added public peer image types
They were supposed to be used to pass a list of peer images.
It has turned out unnecessary.

Signed-off-by: Mykola Golub <mgolub@mirantis.com>
Mykola Golub
test/librados_test_stub: mock wrapper for aio_notify
Signed-off-by: Mykola Golub <mgolub@mirantis.com>
@trociny


Contributor

trociny commented Apr 20, 2017

@dillaman updated:

  • indent issues fixed
  • cancel for C_NotifyInstanceRequest added
  • TestMockInstanceWatcher.ImageAcquireRelease improved (which helped to find a bug in C_NotifyInstanceRequest)
  • TestMockInstanceWatcher.ImageAcquireReleaseCancel added

I have not added a hook for canceling in-flight notifications to an instance that is being removed. It does not look difficult to do by passing the InstanceWatcher to Instances, but I would rather work on it later as a separate PR. The same problem will exist for cancelling SYNC notifications, and we might need something more generic.

@dillaman

lgtm

@dillaman dillaman merged commit dee4f31 into ceph:master Apr 21, 2017

3 checks passed

  • Signed-off-by: all commits in this PR are signed
  • Unmodified Submodules: submodules for project are unmodified
  • default: Build finished.

@trociny trociny deleted the trociny:wip-18787 branch Apr 21, 2017
