
rgw: multisite log tracing #16492

Merged
merged 20 commits into ceph:master on Sep 11, 2017

Conversation

yehudasa
Member

A new framework that tracks, in memory, the current rgw sync process. The new system allows following a specific sync entity (vs. the current flat log dump). Each entity has an id that is a concatenation of the path within the execution tree. An entity roughly reflects a sync role (meta, meta shard, meta entry, data, data shard, bucket shard, object), and it is possible to look at the history from that entity's point of view. New admin socket commands were added:

    "sync trace active": "show active multisite sync entities information"
    "sync trace active_short": "show active multisite sync entities entries"
    "sync trace history": "sync trace history [filter_str]: show history of multisite tracing information"
    "sync trace show": "sync trace show [filter_str]: show current multisite tracing information"

All commands accept an extra parameter, a regex that can be used to search for a specific entity (matching against the history of that entity). E.g.,

$ ceph --admin-daemon=/home/yehudasa/ceph/build/run/c2/out/radosgw.8001.asok sync trace show meta.*shard.13
{
    "running": [
        {
            "status": "meta:shard[13]: took lease"
        }
    ],
    "complete": []
}
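
For illustration only, here is a minimal sketch of how path-concatenated entity ids like the "meta:shard[13]" above can be built, each node prefixing its name with its parent's id. The TraceNode type here is hypothetical, not the PR's RGWSyncTraceNode:

```cpp
#include <memory>
#include <string>

// Hypothetical illustration of building "parent:child" trace ids.
struct TraceNode {
  std::string prefix;   // full path within the execution tree, e.g. "meta:shard[13]"
  std::string status;   // latest status line for this entity

  TraceNode(const std::shared_ptr<TraceNode>& parent, const std::string& name)
    : prefix(parent ? parent->prefix + ":" + name : name) {}

  void log(const std::string& msg) {
    status = prefix + ": " + msg;   // what a "sync trace show" line would contain
  }
};

int main() {
  auto meta  = std::make_shared<TraceNode>(nullptr, "meta");
  auto shard = std::make_shared<TraceNode>(meta, "shard[13]");
  shard->log("took lease");        // -> "meta:shard[13]: took lease"
}
```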

We keep info about all currently running nodes, and also keep some history of completed nodes. We can see a view of the current nodes that are marked as active (where we have identified and are dealing with actual meta/data sync):

$ ceph --admin-daemon=/home/yehudasa/ceph/build/run/c1/out/radosgw.8000.asok sync trace active
{
    "running": [
        {
            "status": "data:sync:shard[115]:entry[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:bucket[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:inc_sync[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]: listing bilog for incremental sync"
        },
        {
            "status": "data:sync:shard[115]:entry[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:bucket[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:inc_sync[buck:a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5]:entry[fff3]: bucket sync: sync obj: 8ef1cd9f-d0f2-485d-ba1e-c02f1f085b1c/buck[a6fabc5f-23bc-473c-9b05-ae8d1948dc08.4185.5])/fff3[0]"
        }
    ],
    "complete": []
}

There's an active_short option that can be used to just get the plain names of the entities that are currently syncing (e.g., list of /). This list is sent to the service map periodically and can be retrieved via the ceph service status command.
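
For context on the service map reporting mentioned above, here is a minimal standalone sketch using librados' service daemon calls (available since Luminous). It assumes a reachable cluster and default config; the service name, daemon name, and status keys are made up for illustration, and rgw performs this registration internally rather than through a separate client like this:

```cpp
#include <rados/librados.hpp>
#include <map>
#include <string>
#include <utility>

int main() {
  librados::Rados cluster;
  if (cluster.init2("client.admin", "ceph", 0) < 0) return 1;
  cluster.conf_read_file(nullptr);          // default ceph.conf search path
  if (cluster.connect() < 0) return 1;

  // register under a hypothetical service name; rgw does this internally
  cluster.service_daemon_register("rgw-sync-demo", "gateway1", {});

  // periodically push a small key/value status map; the mgr merges it into
  // the service map, where "ceph service status" can display it
  std::map<std::string, std::string> status = {
    {"current_sync", "buck1/obj1;buck2/obj7"},   // illustrative value only
  };
  cluster.service_daemon_update_status(std::move(status));

  cluster.shutdown();
}
```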

@mattbenjamin
Contributor

@yehudasa I spent some time trying to find something worrisome in this; I haven't found it so far. I don't -think- sending updates to the mgr at 10s intervals should be costly at either end? I did wonder whether we have a modern idiom replacing RWLock?

@mattbenjamin
Contributor

@yehudasa ok, one practical question; this change creates
+class RGWSyncTraceServiceMapThread : public RGWRadosThread {
which seems to mainly manage periodic mgr updates; would it make sense to consolidate this with other periodic mgr updates?

@liewegas added the rgw label Aug 8, 2017
@yehudasa
Member Author

@mattbenjamin while we update the status every 10 seconds, it is librados that decides when to send these updates, following its own internal config

@yehudasa
Member Author

would it make sense to consolidate this with other periodic mgr updates?

@mattbenjamin I don't recall any such periodic mgr thread that can be used. I do think that it'd be nice to have something we could hook into, but maybe that's out of scope.

@yehudasa
Member Author

@mattbenjamin rebased

};

/*
* a container to RGWSyncTraceNodeRef, responsible to keep track

Let's start the sentence with a capital letter.

* of live nodes, and when last ref is dropped, calls ->finish()
* so that node moves to the retired list in the manager
*/
class RGWSyncTraceNodeContainer {
Contributor

this wrapper adds an extra allocation to create, and an indirection to access. consider using a custom deleter with shared_ptr instead? this works for me:

-using RGWSTNCRef = std::shared_ptr<RGWSyncTraceNodeContainer>;
 
-RGWSTNCRef RGWSyncTraceManager::add_node(RGWSyncTraceNode *node)
+RGWSyncTraceNodeRef RGWSyncTraceManager::add_node(RGWSyncTraceNode *node)
 {
   RWLock::WLocker wl(lock);
   RGWSyncTraceNodeRef& ref = nodes[node->handle];
   ref.reset(node);
-  return RGWSTNCRef(new RGWSyncTraceNodeContainer(ref));
+  // return a separate shared_ptr that calls finish() on the node instead of
+  // deleting it. the lambda capture holds a reference to the original 'ref'
+  auto deleter = [ref] (RGWSyncTraceNode *node) { node->finish(); };
+  return {node, deleter};
 }
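
The suggestion above is the standard shared_ptr custom-deleter technique: the returned handle does not own the node; destroying the last handle calls finish() while the registry keeps the node alive. A generic, self-contained sketch of the same pattern, with illustrative names rather than the rgw types:

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct Node {
  std::string handle;
  void finish() { std::cout << handle << " retired\n"; }
};

std::map<std::string, std::shared_ptr<Node>> nodes;   // owning registry

// Return a handle whose destruction calls finish() instead of delete.
// The lambda captures the owning shared_ptr by value, so the Node stays
// alive in 'nodes' even after all handles are gone.
std::shared_ptr<Node> add_node(Node* node) {
  auto& ref = nodes[node->handle];
  ref.reset(node);
  auto deleter = [ref](Node* n) { n->finish(); };
  return {node, deleter};
}

int main() {
  auto h = add_node(new Node{"meta:shard[13]"});
  h.reset();   // prints "meta:shard[13] retired"; the Node itself is still
               // owned by 'nodes' and is freed when the registry is destroyed
}
```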

Contributor

the full commit that replaces uses of RGWSTNCRef with RGWSyncTraceNodeRef is available here: cbodley@af246d3

ldout(sync_env->cct, 20) << __func__ << "(): sync status for bucket "
<< bucket_shard_str{bs} << ": " << sync_status.state << dendl;
tn->log(20, SSTR("sync status for bucket "
<< bucket_shard_str{bs} << ": " << sync_status.state));
Contributor

I think we should leave the bucket_shard_str part out of these messages, now that it's already included in the log message. For example, this one looks like:

data:sync:shard[75]:entry[eslbge-29:d2560962-42d3-4295-ac0c-efbbce6b79d8.4134.9]:bucket[eslbge-29:d2560962-42d3-4295-ac0c-efbbce6b79d8.4134.9]: sync status for bucket eslbge-29:d2560962-42d3-4295-ac0c-efbbce6b79d8.4134.9: 0

@@ -1537,6 +1537,9 @@ OPTION(rgw_sync_log_trim_interval, OPT_INT) // time in seconds between attempts

OPTION(rgw_sync_data_inject_err_probability, OPT_DOUBLE) // range [0, 1]
OPTION(rgw_sync_meta_inject_err_probability, OPT_DOUBLE) // range [0, 1]
OPTION(rgw_sync_trace_history_size, OPT_INT) // max number of complete sync trace entries to keep


@yehudasa How about starting the comment with a capital letter?
// Max number of complete sync trace entries to keep

Member Author

@amitkumar50 on a scale of 0 to 100? 3
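
As an aside on the rgw_sync_trace_history_size option quoted above, a bounded history of completed entries can be kept with any container that evicts the oldest entry once a maximum size is exceeded. A hypothetical sketch (illustrative only; not the rgw implementation):

```cpp
#include <deque>
#include <iostream>
#include <string>

// Hypothetical bounded history, analogous in spirit to keeping at most
// rgw_sync_trace_history_size completed trace entries.
class BoundedHistory {
  std::deque<std::string> entries;
  size_t max_size;
public:
  explicit BoundedHistory(size_t max_size) : max_size(max_size) {}

  void add(std::string entry) {
    entries.push_back(std::move(entry));
    while (entries.size() > max_size) {
      entries.pop_front();          // drop the oldest completed entry
    }
  }

  void dump() const {
    for (const auto& e : entries) std::cout << e << '\n';
  }
};

int main() {
  BoundedHistory history(2);
  history.add("meta:shard[13]: took lease");
  history.add("meta:shard[13]: done");
  history.add("data:sync:shard[115]: finished");  // evicts the oldest entry
  history.dump();
}
```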

@yehudasa
Copy link
Member Author

@cbodley incorporated your changes and comments. @mattbenjamin replaced RWLock with ceph::shunique_lock.

mutable boost::shared_mutex lock;
using shunique_lock = ceph::shunique_lock<decltype(lock)>;

std::atomic<uint64_t> count;
Member

can we init count to 0 here
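
For readers unfamiliar with the shared/unique locking idiom that replaced RWLock here, the standard-library analogue is std::shared_mutex guarded by std::shared_lock for readers and std::unique_lock for writers; ceph::shunique_lock wraps both modes in one type. A hypothetical sketch that also shows the counter initialized to 0, as the review comment asks:

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Standard-library analogue of the shared/exclusive locking used above.
class Registry {
  mutable std::shared_mutex lock;
  std::map<uint64_t, std::string> nodes;
  std::atomic<uint64_t> count{0};   // init to 0, per the review comment
public:
  uint64_t add(const std::string& status) {
    std::unique_lock<std::shared_mutex> wl(lock);   // exclusive: writers block everyone
    uint64_t handle = ++count;
    nodes[handle] = status;
    return handle;
  }
  std::string get(uint64_t handle) const {
    std::shared_lock<std::shared_mutex> rl(lock);   // shared: readers run concurrently
    auto it = nodes.find(handle);
    return it == nodes.end() ? std::string() : it->second;
  }
};

int main() {
  Registry r;
  auto h = r.add("meta:shard[13]: took lease");
  return r.get(h).empty() ? 1 : 0;
}
```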

struct bucket_str_noinstance {
const rgw_bucket& b;
bucket_str_noinstance(const rgw_bucket& b) : b(b) {}
};
Member

nit: newline after struct definition

@@ -420,18 +426,13 @@ class RGWMetaSyncSingleEntryCR : public RGWCoroutine {

bool error_injection;

RGWSyncTraceNodeRef tn;

public:
RGWMetaSyncSingleEntryCR(RGWMetaSyncEnv *_sync_env,
const string& _raw_key, const string& _entry_marker,
const RGWMDLogStatus& _op_status,
Member

while you're at it, can you fix the alignment here? (not a part of this changeset)

} catch (boost::bad_expression const& e) {
ldout(cct, 5) << "NOTICE: sync trace: bad expression: bad regex search term" << dendl;
} catch (...) {
ldout(cct, 5) << "NOTICE: sync trace: regex_search() through exception" << dendl;
Member

threw
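
The filter argument described in the PR summary (e.g. meta.*shard.13) is applied by a regex search over each entity's history, with the exception handling quoted above guarding against bad patterns. A minimal sketch of equivalent filtering using std::regex (the rgw code uses boost.regex, hence boost::bad_expression):

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Return the entries whose text matches the user-supplied filter; an invalid
// pattern is reported rather than propagated, mirroring the catch blocks above.
std::vector<std::string> filter_entries(const std::vector<std::string>& entries,
                                        const std::string& filter) {
  std::vector<std::string> out;
  try {
    std::regex re(filter);
    for (const auto& e : entries) {
      if (std::regex_search(e, re)) {
        out.push_back(e);
      }
    }
  } catch (const std::regex_error& err) {
    std::cerr << "NOTICE: sync trace: bad regex search term: " << err.what() << '\n';
  }
  return out;
}

int main() {
  std::vector<std::string> history = {
    "meta:shard[13]: took lease",
    "data:sync:shard[115]: listing bilog for incremental sync",
  };
  for (const auto& e : filter_entries(history, "meta.*shard.13")) {
    std::cout << e << '\n';
  }
}
```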

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
So that we could then move nodes to the retired node list in
the manager.
Also, more tracing implementation.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Dump trace node messages when either rgw or rgw_sync log level is
high enough (will only show one message if both are set).

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
avoid use after free

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
So that we can just see which resources are getting synced right now

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
yehudasa and others added 5 commits August 31, 2017 03:09
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
following code review

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@yehudasa
Member Author

@cbodley @theanalyst rebased and took care of the issues mentioned

To prevent the lock from getting freed before other members that use it for cleanup.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@yuriw merged commit cfb13b3 into ceph:master Sep 11, 2017
smithfarm pushed a commit to smithfarm/ceph that referenced this pull request Jan 5, 2018
these temporary errors get retried automatically, so no admin
intervention is required. logging them only serves to waste space in
omap and obscure the more serious sync errors

Fixes: http://tracker.ceph.com/issues/22473

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit ca4510b)

Conflicts:
    src/rgw/rgw_data_sync.cc ("multisite log tracing" feature - see
        ceph#16492 - is not being backported to
        luminous)
robbat2 pushed a commit to dreamhost/ceph that referenced this pull request Jan 26, 2018
jdurgin pushed a commit to jdurgin/ceph that referenced this pull request Mar 5, 2018
robbat2 pushed a commit to dreamhost/ceph that referenced this pull request Mar 6, 2018
ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Mar 27, 2018
guihecheng pushed a commit to guihecheng/ceph that referenced this pull request Sep 12, 2018