rgw multisite: use a rados lock to coordinate data log trimming #10546

Merged
merged 5 commits into from Nov 28, 2016

cbodley
Contributor

@cbodley cbodley commented Aug 2, 2016

This PR is based on top of #10372. I'll rebase once that merges.

You can review the delta here: cbodley/ceph@wip-rgw-data-log-trim...cbodley:wip-rgw-log-trim-lease

@yehudasa
Member

yehudasa commented Oct 4, 2016

@cbodley see my comments, also need to rebase this

@cbodley
Contributor Author

cbodley commented Oct 4, 2016

@yehudasa ok, will rebase - but i'm not seeing any comments?

RGWRados *store;
RGWHTTPManager *http;
const int num_shards;
const utime_t interval; //< polling interval

Member

@cbodley should use ceph::real_time?

Member

ah, just noticed that this was just code that was moved around

RGWHTTPManager *http;
const int num_shards;
const utime_t interval; //< polling interval
const std::string& zone; //< my zone id

Member

s/zone/zone_id/ to be explicit


// prevent others from trimming for our entire wait interval
set_status("acquiring trim lock");
yield call(new RGWSimpleRadosLockCR(store->get_async_rados(), store,

Member

@cbodley, maybe use RGWContinuousLeaseCR instead? and need to shut it down when done

Contributor Author

this was by design - if there are multiple gateways in a zone, we only want one of them to attempt trimming each rgw_sync_log_trim_interval. this is accomplished by leaving the lock held for the duration

(we exchanged email about it with the subject line "lease for multisite lock trimming")

Member

@cbodley right.. I see that discussion. I'm not sure I like the idea of just firing a lease without making the effort of releasing it. Is it a problem releasing it at the end here? Or is it just so that we don't trigger the trim code more frequently than originally planned?

Contributor Author

> Is it a problem releasing it at the end here? Or is it just so that we don't trigger the trim code more frequently than originally planned?

right. if we release the lock then all of the gateways would try to trim, when we only want it to happen once per trim interval

Contributor Author

@yehudasa do you still object to this approach? any ideas for a better solution?

Member

@cbodley I'm not happy with this one, but let's just comment it clearly
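
For readers following the trade-off in this thread, here is a minimal, deterministic sketch of the two policies: releasing the lock after trimming versus leaving it held until it expires. It is not the code from this PR. `TimedLock` and `count_trims` are hypothetical stand-ins, time is a plain integer number of seconds, and the lock is assumed to behave like a timed exclusive lock that becomes available again once its duration has elapsed.

```cpp
#include <cassert>
#include <string>

// Hypothetical in-memory stand-in for a timed exclusive lock.
// This is NOT the RADOS lock used by the PR, only a model of its expiry.
struct TimedLock {
  std::string owner;
  long expires_at = 0;

  // Succeeds if the lock is free or its previous duration has run out.
  bool try_lock(const std::string& who, long now, long duration) {
    assert(duration > 0);
    if (!owner.empty() && now < expires_at) {
      return false;
    }
    owner = who;
    expires_at = now + duration;
    return true;
  }

  // Explicit release; a no-op unless called by the current owner.
  void unlock(const std::string& who) {
    if (owner == who) {
      owner.clear();
    }
  }
};

// Simulate num_gateways gateways that each wake once per interval, slightly
// staggered, and count how many trims run in total.
// Assumes num_gateways < interval so the wake-ups stay inside one interval.
int count_trims(int num_gateways, int intervals, long interval,
                bool release_after_trim) {
  TimedLock lock;
  int trims = 0;
  for (int i = 0; i < intervals; ++i) {
    for (int g = 0; g < num_gateways; ++g) {
      const long now = i * interval + g;  // gateway g wakes g seconds in
      const std::string id = "gateway-" + std::to_string(g);
      if (!lock.try_lock(id, now, interval)) {
        continue;  // another gateway owns this interval: back to sleep
      }
      ++trims;  // the trim work itself would run here
      if (release_after_trim) {
        lock.unlock(id);
      }
    }
  }
  return trims;
}
```

With three gateways over four intervals, the hold-until-expiry policy runs one trim per interval, while the release-after-trim policy lets every gateway trim in every interval. That is the duplication the unreleased lock is meant to avoid.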

@yehudasa
Member

yehudasa commented Oct 4, 2016

@cbodley maybe now?

@cbodley
Contributor Author

cbodley commented Oct 4, 2016

rebased and renamed to zone_id

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley
Contributor Author

cbodley commented Oct 17, 2016

updated comments to clarify use of rados lock:

      // request a 'data_trim' lock that covers the entire wait interval to
      // prevent other gateways from attempting to trim for the duration
...
        // if the lock is already held, go back to sleep and try again later
...
      // note that the lock is not released. this is intentional, as it avoids
      // duplicating this work in other gateways
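
The control flow that these three comments describe can be sketched as a single wake-up of the polling loop. This is an illustration under stated assumptions, not the coroutine from the PR: `poll_once`, `PollResult`, and the injected `try_lock` and `trim` callbacks are hypothetical names, chosen so the sketch does not depend on any real RGW or RADOS interface.

```cpp
#include <cassert>
#include <functional>

enum class PollResult { Trimmed, LockHeldElsewhere };

// One wake-up of a hypothetical trim polling loop.
// try_lock(duration) is assumed to request a timed exclusive lock and
// return true on success; trim() performs the actual log trimming.
PollResult poll_once(long interval_secs,
                     const std::function<bool(long)>& try_lock,
                     const std::function<void()>& trim) {
  assert(interval_secs > 0);

  // Request a lock whose duration covers the entire wait interval.
  if (!try_lock(interval_secs)) {
    // Already held by another gateway: sleep and try again later.
    return PollResult::LockHeldElsewhere;
  }

  trim();

  // Intentionally no unlock: the lock is left to expire on its own, so
  // other gateways skip trimming for the rest of the interval.
  return PollResult::Trimmed;
}
```

A caller would invoke `poll_once` once per interval and sleep in between, regardless of which result comes back.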

Member

@yehudasa yehudasa left a comment

lgtm

@cbodley
Contributor Author

cbodley commented Nov 14, 2016

jenkins test this please

@yehudasa yehudasa merged commit 77b7e30 into ceph:master Nov 28, 2016
@cbodley cbodley deleted the wip-rgw-log-trim-lease branch November 29, 2016 14:50