
rgw: multisite log tracing #16492

Merged: 20 commits, Sep 11, 2017

8 participants
@yehudasa (Member) commented Jul 21, 2017

A new framework that tracks the current rgw sync process in memory. The new system allows following a specific sync entity (vs. the current flat log dump). Each entity has an id that is a concatenation of its path within the execution tree. An entity roughly reflects a sync role (meta, meta shard, meta entry, data, data shard, bucket shard, object), and it is possible to look at the history from that entity's point of view. New admin socket commands were added:

    "sync trace active": "show active multisite sync entities information"
    "sync trace active_short": "show active multisite sync entities entries"
    "sync trace history": "sync trace history [filter_str]: show history of multisite tracing information"
    "sync trace show": "sync trace show [filter_str]: show current multisite tracing information"

All commands can take an extra parameter: a regex used to search for a specific entity (matched against that entity's history). E.g.,

$ ceph --admin-daemon=/home/yehudasa/ceph/build/run/c2/out/radosgw.8001.asok sync trace show meta.*shard.13
{
    "running": [
        {
            "status": "meta:shard[13]: took lease"
        }
    ],
    "complete": []
}

We keep info about all currently running nodes, and also keep some history of complete nodes. We can see a view of the current nodes that are marked as active (where we identified and are handling actual meta/data sync):

$ ceph --admin-daemon=/home/yehudasa/ceph/build/run/c1/out/radosgw.8000.asok sync trace active
{
    "running": [
        {
            "status": "data:sync:shard[115]:entry[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:bucket[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:inc_sync[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]: listing bilog for incremental sync"
        },
        {
            "status": "data:sync:shard[115]:entry[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:bucket[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:inc_sync[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:entry[fff3]: bucket sync: sync obj: 8ef1cd9f-d0f2-485d-ba1e-c02f1f085b1c/buck[a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5])/fff3[0]"
        }
    ],
    "complete": []
}

There's an active_short option that can be used to get just the plain names of the entities that are currently syncing (e.g., a list of /). This list is sent to the service map periodically and can be retrieved via the ceph service status command.

@mattbenjamin (Contributor) commented Jul 24, 2017

@yehudasa I spent some time trying to find something worrisome in this; I haven't found it so far. I don't -think- sending updates to the mgr at 10s intervals should be costly at either end? I did wonder whether we have a modern idiom replacing RWLock?

@mattbenjamin (Contributor) commented Jul 24, 2017

@yehudasa ok, one practical question; this change creates `class RGWSyncTraceServiceMapThread : public RGWRadosThread`, which seems to mainly manage periodic mgr updates; would it make sense to consolidate this with other periodic mgr updates?

@liewegas liewegas added the rgw label Aug 8, 2017

@yehudasa (Member, Author) commented Aug 14, 2017

@mattbenjamin while we update the status every 10 seconds, it is librados that decides when to send these updates, following its own internal config

@yehudasa (Member, Author) commented Aug 14, 2017

> would it make sense to consolidate this with other periodic mgr updates?

@mattbenjamin I don't recall any such periodic mgr thread that can be used. I do think that it'd be nice to have something we could hook into, but maybe that's out of scope.

@yehudasa yehudasa force-pushed the yehudasa:wip-rgw-multi-log branch from 17ff646 to 2e66311 Aug 14, 2017

@yehudasa (Member, Author) commented Aug 14, 2017

@mattbenjamin rebased

};

/*
 * a container to RGWSyncTraceNodeRef, responsible to keep track
 * of live nodes, and when last ref is dropped, calls ->finish()
 * so that node moves to the retired list in the manager
 */
class RGWSyncTraceNodeContainer {

@amitkumar50 (Contributor) commented Aug 14, 2017

Let's start the sentence with a capital letter.

@cbodley (Contributor) commented Aug 15, 2017

this wrapper adds an extra allocation on create and an indirection on access. consider using a custom deleter with shared_ptr instead? this works for me:

-using RGWSTNCRef = std::shared_ptr<RGWSyncTraceNodeContainer>;
 
-RGWSTNCRef RGWSyncTraceManager::add_node(RGWSyncTraceNode *node)
+RGWSyncTraceNodeRef RGWSyncTraceManager::add_node(RGWSyncTraceNode *node)
 {
   RWLock::WLocker wl(lock);
   RGWSyncTraceNodeRef& ref = nodes[node->handle];
   ref.reset(node);
-  return RGWSTNCRef(new RGWSyncTraceNodeContainer(ref));
+  // return a separate shared_ptr that calls finish() on the node instead of
+  // deleting it. the lambda capture holds a reference to the original 'ref'
+  auto deleter = [ref] (RGWSyncTraceNode *node) { node->finish(); };
+  return {node, deleter};
 }

@cbodley (Contributor) commented Aug 15, 2017

the full commit that replaces uses of RGWSTNCRef with RGWSyncTraceNodeRef is available here: cbodley@af246d3

ldout(sync_env->cct, 20) << __func__ << "(): sync status for bucket "
                         << bucket_shard_str{bs} << ": " << sync_status.state << dendl;
tn->log(20, SSTR("sync status for bucket "
                 << bucket_shard_str{bs} << ": " << sync_status.state));

@cbodley (Contributor) commented Aug 15, 2017

i think we should leave the bucket_shard_str part out of these messages, now that it's already included in the log message. for example, this one looks like:

data:sync:shard[75]:entry[eslbge-29:d2560962-42d3-4295-ac0c-efbbce6b79d8.4134.9]:bucket[eslbge-29:d2560962-42d3-4295-ac0c-efbbce6b79d8.4134.9]: sync status for bucket eslbge-29:d2560962-42d3-4295-ac0c-efbbce6b79d8.4134.9: 0

@@ -1537,6 +1537,9 @@ OPTION(rgw_sync_log_trim_interval, OPT_INT) // time in seconds between attempts

OPTION(rgw_sync_data_inject_err_probability, OPT_DOUBLE) // range [0, 1]
OPTION(rgw_sync_meta_inject_err_probability, OPT_DOUBLE) // range [0, 1]
OPTION(rgw_sync_trace_history_size, OPT_INT) // max number of complete sync trace entries to keep

@amitkumar50 (Contributor) commented Aug 16, 2017

@yehudasa how good would it be to start the comment with a capital letter?
// Max number of complete sync trace entries to keep

@yehudasa (Author, Member) commented Aug 16, 2017

@amitkumar50 on a scale of 0 to 100? 3

@yehudasa (Member, Author) commented Aug 21, 2017
commented Aug 21, 2017

@cbodley incorporated your changes and comments. @mattbenjamin replaced RWLock with ceph::shunique_lock.

mutable boost::shared_mutex lock;
using shunique_lock = ceph::shunique_lock<decltype(lock)>;

std::atomic<uint64_t> count;

@theanalyst (Member) commented Aug 21, 2017

can we init count to 0 here?

struct bucket_str_noinstance {
  const rgw_bucket& b;
  bucket_str_noinstance(const rgw_bucket& b) : b(b) {}
};

@theanalyst (Member) commented Aug 21, 2017

nit: newline after struct definition

@@ -420,18 +426,13 @@ class RGWMetaSyncSingleEntryCR : public RGWCoroutine {

bool error_injection;

RGWSyncTraceNodeRef tn;

public:
RGWMetaSyncSingleEntryCR(RGWMetaSyncEnv *_sync_env,
                         const string& _raw_key, const string& _entry_marker,
                         const RGWMDLogStatus& _op_status,

@theanalyst (Member) commented Aug 21, 2017

while you're at it, can you fix the alignment here (not a part of this changeset)

} catch (boost::bad_expression const& e) {
  ldout(cct, 5) << "NOTICE: sync trace: bad expression: bad regex search term" << dendl;
} catch (...) {
  ldout(cct, 5) << "NOTICE: sync trace: regex_search() threw exception" << dendl;


@yehudasa yehudasa added the needs-qa label Aug 29, 2017

yehudasa added some commits Jun 15, 2017

rgw: initial sync trace data structures
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync trace initial output
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: add a container to track sync trace nodes
So that we could then move nodes to the retired node list in
the manager.
Also, more tracing implementation.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync tracing: use both rgw, rgw_sync log subsystems
Dump trace node messages when either rgw or rgw_sync log level is
high enough (will only show one message if both are set).

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync tracing: implement meta incremental logging
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync trace: admin socket integration
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync trace: add 'active' search
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync trace: can use regex
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: initial data sync tracing
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: more log tracing (data, bucket)
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: trace log: fix sync_trace destruction
avoid use after free

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: multi trace: set/reset active flags in data/bucket sync
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync trace: add an option for short resource name
So that we can just see which resources are getting synced right now

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: fixes following rebase
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

yehudasa and others added some commits Jul 21, 2017

rgw: sync trace: update service map with current active entities
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: replace RGWSyncTraceNodeContainer wrapper
Signed-off-by: Casey Bodley <cbodley@redhat.com>
rgw: remove some redundant log info
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: sync_trace, replace RWLock with ceph::shunique_lock
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
rgw: minor whitespace fix
following code review

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

@yehudasa yehudasa force-pushed the yehudasa:wip-rgw-multi-log branch from 29f56b8 to 88c2adf Aug 31, 2017

@yehudasa

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2017

@cbodley @theanalyst rebased and took care of the issues mentioned

rgw: move RGWSyncTraceManager lock
To prevent the lock from getting freed before other members that use it
for cleanup.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

@yuriw yuriw merged commit cfb13b3 into ceph:master Sep 11, 2017

5 checks passed:

    Docs: build check OK - docs built
    Signed-off-by: all commits in this PR are signed
    Unmodified Submodules: submodules for project are unmodified
    make check: make check succeeded
    make check (arm64): make check succeeded

smithfarm added a commit to smithfarm/ceph that referenced this pull request Jan 5, 2018

rgw: dont log EBUSY errors in 'sync error list'
these temporary errors get retried automatically, so no admin
intervention is required. logging them only serves to waste space in
omap and obscure the more serious sync errors

Fixes: http://tracker.ceph.com/issues/22473

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit ca4510b)

Conflicts:
    src/rgw/rgw_data_sync.cc ("multisite log tracing" feature - see
        ceph#16492 - is not being backported to
        luminous)

robbat2 added a commit to dreamhost/ceph that referenced this pull request Jan 26, 2018
(same cherry-pick of ca4510b, with the same conflict note)

jdurgin pushed a commit to jdurgin/ceph that referenced this pull request Mar 5, 2018
(same cherry-pick of ca4510b; also Resolves: rhbz#1530665)

robbat2 added a commit to dreamhost/ceph that referenced this pull request Mar 6, 2018
(same cherry-pick of ca4510b)

ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Mar 27, 2018
(same cherry-pick of ca4510b; Resolves: rhbz#1527132; also Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>; conflict note says the feature is not being backported to luminous/jewel)

guihecheng pushed a commit to guihecheng/ceph that referenced this pull request Sep 12, 2018
(same cherry-pick of ca4510b; Resolves: rhbz#1527132)