New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: add perf counters for file system operations #14938

Merged
merged 1 commit into from May 16, 2017

Conversation

Projects
None yet
3 participants
@michaelsevilla
Contributor

michaelsevilla commented May 3, 2017

Shows the types of operations that clients are doing by adding counters to the
spot where operations are demuxed. Does not support per-client activity. This
is the first step towards comprehensive performance tracking in the file
system.

Signed-off-by: Michael Sevilla mikesevilla3@gmail.com

@michaelsevilla

This comment has been minimized.

Contributor

michaelsevilla commented May 3, 2017

@batrick

I was just going to work on this :). Only a few minor changes.

Since you're head is now in this, can you create a new tracker ticket for any other performance counters you think would be appropriate. At the very least, the next step would be counters for mds-to-mds messages.

case CEPH_MDS_OP_RMDIR:
logger->inc(l_mdss_req_rmdir);

This comment has been minimized.

@batrick

batrick May 3, 2017

Member

l_mdss_req_rmdir gets incremented for unlink and rmdir (see fallthrough for unlink).

@@ -1533,100 +1587,127 @@ void Server::dispatch_client_request(MDRequestRef& mdr)
switch (req->get_op()) {
case CEPH_MDS_OP_LOOKUPHASH:
case CEPH_MDS_OP_LOOKUPINO:
logger->inc(l_mdss_req_lookupino);

This comment has been minimized.

@batrick

batrick May 3, 2017

Member

Let's break these into two counters?

@michaelsevilla

This comment has been minimized.

Contributor

michaelsevilla commented May 3, 2017

I can't think of any other perf counters I'd want -- I had this code lying around from a research project on load balancing. I have LTTnG code that gives you the latency of each file system request (from the MDS's view), similar to the stuff @noahdesu did in his blog post on OSD request tracing. Would that be useful?

@@ -1586,6 +1588,9 @@ void Server::dispatch_client_request(MDRequestRef& mdr)
switch (req->get_op()) {
case CEPH_MDS_OP_LOOKUPHASH:
logger->inc(l_mdss_req_lookuphash);
handle_client_lookup_ino(mdr, false, false);

This comment has been minimized.

@michaelsevilla

michaelsevilla May 3, 2017

Contributor

Not sure if you wanted this repeated like this.

This comment has been minimized.

@batrick

batrick May 4, 2017

Member

I don't have a problem with it.

@batrick

This comment has been minimized.

Member

batrick commented May 4, 2017

@michaelsevilla that sounds very useful! Please make a tracker ticket to collect feedback for it.

@batrick

batrick approved these changes May 4, 2017

@batrick

This comment has been minimized.

Member

batrick commented May 4, 2017

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

@batrick

This comment has been minimized.

@jcsp

This comment has been minimized.

Contributor

jcsp commented May 11, 2017

The downside to this implementation is that it will count operations multiple times when they pass through dispatch_client_request more than once. That happens when the operation waits for locks or waits for metadata to load.

Perhaps incrementing the counters in respond_to_request instead? It would require writing out the big case statement by op type again, but the resulting counters would be more meaningful.

@michaelsevilla

This comment has been minimized.

Contributor

michaelsevilla commented May 11, 2017

Ah - thanks. I was seeing double counting for my experiments and I couldn't figure out why. Hopefully incrementing in respond_to_request fixes that problem.

@jcsp

jcsp approved these changes May 15, 2017

@@ -1046,6 +1102,93 @@ void Server::respond_to_request(MDRequestRef& mdr, int r)
mdr->internal_op_finish->complete(r);
mdcache->request_finish(mdr);
}
// add here to avoid counting ops multiple times (e.g., locks, loading)
switch(mdr->client_request->get_op()) {

This comment has been minimized.

@jcsp

jcsp May 15, 2017

Contributor

Oops, missed this in previous review -- mdr->client_request may be null (see conditional above), so this breaks when internal ops are executed.

This comment has been minimized.

@michaelsevilla

michaelsevilla May 15, 2017

Contributor

That massive switch statement has internal ops (like snapshot and file layout stuff), right? So we want a conditional like this?:

if (mdr->client_request || mdr->internal_op > -1) {

I suppose I could split the perf counters up -- but I'm not sure which ones are internal ops.

This comment has been minimized.

@jcsp

jcsp May 15, 2017

Contributor

I think all the ops you've got stats for are client ones (that includes snapshots/layout stuff). The internal ones are listed in include/ceph_fs.h line 363 onwards.

This comment has been minimized.

@jcsp

jcsp May 15, 2017

Contributor

...and I think it's fine to not have the counters for internal ops here

@michaelsevilla

This comment has been minimized.

Contributor

michaelsevilla commented May 15, 2017

Whoops -- bad rebase. Let me fix it.

mds: add perf counters for file system operations
Shows the types of operations that clients are doing by adding counters to the
spot where the MDS responds to clients. This avoids counting operations that
were restarted while waiting for locks or for metadata to load.  Does not
support per-client activity. This is the first step towards comprehensive
performance tracking in the file system.

Signed-off-by: Michael Sevilla <mikesevilla3@gmail.com>
@jcsp

jcsp approved these changes May 16, 2017

@jcsp

jcsp approved these changes May 16, 2017

@jcsp jcsp merged commit 71e3a85 into ceph:master May 16, 2017

2 of 3 checks passed

default Build finished.
Details
Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment