rgw multisite: automated mdlog trimming #13111

Merged
yehudasa merged 24 commits into ceph:master from cbodley:wip-rgw-mdlog-trim on May 12, 2017

@cbodley
Contributor

cbodley commented Jan 25, 2017

implements the coordinated trimming of mdlog entries once they're no longer needed. the algorithm is split between the metadata master zone, which is authoritative for determining which entries are safe to trim, and the non-master peer zones, which only trim up to the master's trim position

the metadata master zone decides which entries are safe to trim by querying the metadata sync status of all peer zones. an mdlog entry is only safe to trim if all peers indicate a sync status marker greater than or equal to that entry. similarly, entire mdlog periods can be safely removed once all peer zones report a realm_epoch higher than that mdlog period. this logic guarantees that no mdlog entries will be trimmed while other zones may try to consume them
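
a minimal sketch of the "minimum sync status" calculation described above, not the code from this branch. `PeerSyncStatus`, `TrimBound` and the helper functions are hypothetical names, markers are modeled as plain strings that compare lexicographically, and the sketch assumes every peer reports the same number of shards and that markers from different peers are comparable (i.e. they refer to the same period)

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one peer zone's metadata sync status.
struct PeerSyncStatus {
  uint64_t realm_epoch;              // period the peer is currently syncing
  std::vector<std::string> markers;  // one sync marker per mdlog shard
};

// Lowest position reached by *all* peers; nothing above it may be trimmed.
struct TrimBound {
  uint64_t realm_epoch;
  std::vector<std::string> markers;
};

// Take the minimum realm_epoch and the per-shard minimum marker over all peers.
TrimBound min_sync_status(const std::vector<PeerSyncStatus>& peers) {
  assert(!peers.empty());
  TrimBound bound{peers.front().realm_epoch, peers.front().markers};
  for (const auto& p : peers) {
    assert(p.markers.size() == bound.markers.size());
    bound.realm_epoch = std::min(bound.realm_epoch, p.realm_epoch);
    for (std::size_t i = 0; i < p.markers.size(); ++i) {
      bound.markers[i] = std::min(bound.markers[i], p.markers[i]);
    }
  }
  return bound;
}

// A whole mdlog period can go once every peer is on a higher realm_epoch.
bool period_is_purgeable(uint64_t mdlog_realm_epoch, const TrimBound& bound) {
  return mdlog_realm_epoch < bound.realm_epoch;
}

// An entry can go once every peer's marker is >= the entry's marker.
bool entry_is_trimmable(const std::string& entry_marker,
                        const std::string& bound_marker) {
  return entry_marker <= bound_marker;
}
```

the slowest peer always determines the bound, so a single lagging or unreachable zone holds back trimming for everyone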

the non-master peer zones trim mdlog entries based on the trim info they read from the master zone, which consists of an array of timestamps for each shard of the mdlog. timestamps are used here instead of markers because peers can only trim -up to- the master's oldest entry, so the master reports its oldest timestamp minus a small delta (1ms). old mdlog periods can be removed based on the oldest_realm_epoch reported in the master's mdlog info. this logic guarantees that peers will not trim mdlog entries that are still present on the master, in case that zone is later promoted to master and expected to serve those entries to other zones
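
to illustrate the timestamp-minus-delta idea, here is a small sketch under stated assumptions: a shard is modeled as a sorted vector of entry timestamps, and `master_trim_bound` and `peer_trim_shard` are hypothetical helpers rather than functions from this branch. the peer removes everything at or before the reported bound, which by construction stops just short of the master's oldest entry

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

using TimePoint = std::chrono::system_clock::time_point;

// Master side: report a bound just below the oldest entry it still holds.
TimePoint master_trim_bound(TimePoint oldest_entry) {
  return oldest_entry - std::chrono::milliseconds(1);
}

// Peer side: drop every entry with timestamp <= bound.
// Assumes the shard's timestamps are sorted in ascending order.
// Returns the number of entries removed.
std::size_t peer_trim_shard(std::vector<TimePoint>& shard, TimePoint bound) {
  auto first_kept = std::upper_bound(shard.begin(), shard.end(), bound);
  auto removed = static_cast<std::size_t>(first_kept - shard.begin());
  shard.erase(shard.begin(), first_kept);
  return removed;
}
```

with a bound derived from the master's oldest entry at time T, a peer entry stamped exactly T survives the trim, so the peer never ends up with less history than the master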

as each old mdlog period is removed, it updates the RGWMetadataLogHistory in the meta.history object (as reported by RGWOp_MDLog_Info in the /admin/log api). this update is made atomic with the use of RGWObjVersionTracker, and required adding objv_tracker arguments to RGWSimpleRadosRead/WriteCR and put_system_obj_data() in RGWRados and RGWCache
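
the version-tracked update can be pictured as an optimistic compare-and-write. the sketch below is an in-memory model only: `VersionedHistory`, `VersionTracker`, `read_history` and `write_history` are hypothetical stand-ins (not RGWObjVersionTracker's real interface), and returning -ECANCELED on a version mismatch is an assumption made for the example

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical in-memory stand-in for the meta.history object.
struct VersionedHistory {
  uint64_t version = 0;             // bumped on every successful write
  uint64_t oldest_realm_epoch = 0;  // oldest mdlog period still present
};

// Remembers the object version seen by the last read.
struct VersionTracker {
  uint64_t read_version = 0;
};

void read_history(const VersionedHistory& obj, VersionTracker& tracker,
                  uint64_t& oldest_realm_epoch) {
  tracker.read_version = obj.version;
  oldest_realm_epoch = obj.oldest_realm_epoch;
}

// Write succeeds only if nobody else wrote since our read.
int write_history(VersionedHistory& obj, VersionTracker& tracker,
                  uint64_t new_oldest_realm_epoch) {
  if (obj.version != tracker.read_version) {
    return -ECANCELED;  // lost the race; caller should re-read and retry
  }
  obj.oldest_realm_epoch = new_oldest_realm_epoch;
  tracker.read_version = ++obj.version;
  return 0;
}
```

if two writers read the same version, only the first write lands; the second sees the mismatch and has to re-read before deciding whether its update is still needed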

the mdlog trim process runs every rgw_sync_log_trim_interval (default: 20min) in the same RGWSyncLogTrimThread used for datalog trimming. to coordinate between multiple gateways in the same zone, a rados lock is held on the meta.history object during trim to avoid duplicating the work. this polling and locking logic is encapsulated in class MetaTrimPollCR
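
a rough model of the poll-and-lock pattern, assuming a time-limited lease: `Lease`, `try_acquire` and `poll_once` are hypothetical names, time is passed in explicitly to keep the example deterministic, and holding the lease for the full interval is an assumption of this sketch rather than a statement about MetaTrimPollCR

```cpp
#include <chrono>
#include <string>

using Seconds = std::chrono::seconds;

// Hypothetical stand-in for the lock on the meta.history object.
struct Lease {
  std::string holder;  // empty when nobody holds the lease
  Seconds expires{0};  // time at which the lease lapses
};

// Take or renew the lease unless another holder's lease is still live.
bool try_acquire(Lease& lease, const std::string& who, Seconds now,
                 Seconds duration) {
  const bool held_by_other = !lease.holder.empty() && lease.holder != who;
  if (held_by_other && now < lease.expires) {
    return false;
  }
  lease.holder = who;
  lease.expires = now + duration;
  return true;
}

// One poll tick for one gateway: run trim only if we won the lease.
// Returns true if this gateway ran the trim.
template <typename TrimFn>
bool poll_once(Lease& lease, const std::string& who, Seconds now,
               Seconds interval, TrimFn&& trim) {
  if (!try_acquire(lease, who, now, interval)) {
    return false;  // another gateway is handling this interval
  }
  trim();
  return true;
}
```

each gateway calls something like `poll_once` on its own timer; whichever gets the lease first does the work, and the others skip that round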

(this branch depends on #13067 and #13070, which are included here as the first 5 commits)

TODO:

  • add radosgw-admin command to run this sync logic (without requiring lease)
  • add test_multi.py test to verify output of radosgw-admin mdlog list after trim
@cbodley

Contributor

cbodley commented Feb 1, 2017

squashed small fixes, added radosgw-admin mdlog autotrim command, and added trim checks to test_multi_period_incremental_sync() in test_multi.py

@cbodley

Contributor

cbodley commented Feb 6, 2017

passed teuthology at http://pulpito.ceph.com/cbodley-2017-02-03_15:59:33-rgw-wip-cbodley-testing---basic-mira/

(2 valgrind issues were in the osd, and the one s3test failure was due to requests taking slightly > 30s)

@cbodley cbodley changed the title from [DNM] rgw multisite: automated mdlog trimming to rgw multisite: automated mdlog trimming Feb 6, 2017

@cbodley cbodley requested a review from yehudasa Feb 6, 2017

req(NULL) {
const T& _data, RGWObjVersionTracker *objv_tracker = nullptr)
: RGWSimpleCoroutine(_store->ctx()), async_rados(_async_rados),
store(_store), pool(_pool), oid(_oid), objv_tracker(objv_tracker) {

@yehudasa

yehudasa Feb 6, 2017

Member

@cbodley lost req initialization

RGWObjVersionTracker *objv;
RGWMetadataLogHistory state;
public:
WriteHistoryCR(RGWRados *store, Cursor cursor, RGWObjVersionTracker *objv)

@yehudasa

yehudasa Feb 6, 2017

Member

@cbodley why not pass cursor as a const ref?

@@ -2226,3 +2226,124 @@ int RGWCloneMetaLogCoroutine::state_store_mdlog_entries_complete()
}
#undef dout_prefix

@yehudasa

yehudasa Feb 16, 2017

Member

@cbodley maybe it's better to put this code in a separate .cc file?

using StatusCR = RGWReadRESTResourceCR<rgw_meta_sync_status>;
auto conn = c.second.get();
spawn(new StatusCR(cct, conn, env.http, "/admin/log/", params, &*p),
false);

@yehudasa

yehudasa Feb 16, 2017

Member

@cbodley in some other places we use a window of operations, so that we don't spawn more than X concurrent crs at once. Maybe this is a good candidate for doing the same?

@yehudasa

@cbodley it seems ok, need to make sure we test all cases

cbodley added some commits Jan 20, 2017

rgw: skip sync thread if current period is empty
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: move timelog trim wrapper to header
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: move coroutines out of anonymous namespace
anonymous namespaces do terrible things to name mangling, and this shows
up in our coroutine logging

Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add objv_tracker arg to RGWRados::put_system_obj_data
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add RGWRadosRemoveCR
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add objv_tracker arg to RGWSimpleRadosRead/WriteCR
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: use objv_tracker for mdlog history
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add CRs for async mdlog history operations
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add PurgePeriodLogsCR to purge entire mdlog periods
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add MetaMasterTrimShardCollectCR
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add MetaMasterTrimCR to query sync status from peers
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: master calculates minimum sync status of peers
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: master purges period mdlogs once all peers are done
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: master trims mdlogs as peers make progress on current period
Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley added some commits Jan 23, 2017

rgw: add MetaPeerTrimCR to query master mdlog info
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: peer purges mdlog periods before master's oldest
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add MetaPeerTrimShardCR to trim mdlog shards
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: peer trims mdlog shards up to master's oldest entry
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add MetaTrimPollCR to coordinate polling and leases
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: RGWSyncLogTrimThread runs mdlog trim
Signed-off-by: Casey Bodley <cbodley@redhat.com>
radosgw-admin: add 'mdlog autotrim' command
Signed-off-by: Casey Bodley <cbodley@redhat.com>
test/rgw: test for mdlog trimming
added to existing test_multi_period_incremental_sync() because we want
to test trimming old mdlog periods

Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: use RGWShardCollectCR in MetaMasterTrimCR
limit the number of concurrent sync status requests to peers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: add TODOs to split trim logic into separate source files
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley

Contributor

cbodley commented Apr 27, 2017

pushed an update:

  • rebased, resolving conflicts with rgw_raw_obj and test_multi.py
  • removed prerequisite commits that merged in #13067 and #13070
  • addressed review comments (except for splitting trim into separate source files - i added // TODO: move into rgw_sync_trim.cc for now)

consistently passing trim test in test_multi_period_incremental_sync(). will get another run through teuthology

@cbodley

Contributor

cbodley commented May 1, 2017

@yehudasa should be ready to merge. anything else you'd like to see here?

@cbodley

Contributor

cbodley commented May 12, 2017

ping @yehudasa

@yehudasa yehudasa merged commit 404cee7 into ceph:master May 12, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.

@cbodley cbodley deleted the cbodley:wip-rgw-mdlog-trim branch May 12, 2017
