Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common: add "avglat" in perf result to calculate average latency. #12199

Closed
wants to merge 1 commit into from
Closed

common: add "avglat" in perf result to calculate average latency. #12199

wants to merge 1 commit into from

Conversation

liupan1111
Copy link
Contributor

Signed-off-by: Pan Liu pan.liu@istuary.com

Signed-off-by: Pan Liu <pan.liu@istuary.com>
@liupan1111
Copy link
Contributor Author

When we want to tune the performance, we may need perf dump command. For the variables added by "add_time_avg", only "sum" and "avgcount" are dumped out, so I have to compute average latency by hand: sum/avgcount. Indeed, for this kind of performance tuning, average latency of different part is even more important when compare two different perf results.

In my modification, the unit of avglat is ms, not second, so that easier to read.

@liewegas @tchaikov , please help take a look.

@tchaikov tchaikov self-assigned this Nov 28, 2016
@liewegas
Copy link
Member

I'm not sure this helps you. The sum and avgcount are totals over the lifetime of the counter, so dividing them directly doesn't tell you much.. you need to take the delta with the previous measurement (say, 1s ago), and then device that to get a useful value.

Either way, this seems like something that consuming tools (e.g., 'ceph daemonperf ...') should be doing...

@liupan1111
Copy link
Contributor Author

@liewegas, there is also one command very usefull: ceph daemon... perf reset all. In am working tuning ceph on P3700 ssds. Using perf reset all and perf dump really help me a lot. :) After adding avg lat, it is much easier for me to compare The latency before and After perf reset all.

@liupan1111
Copy link
Contributor Author

liupan1111 commented Dec 14, 2016

@liewegas @tchaikov , I tried ceph daemonperf:

ceph daemonperf /var/run/ceph/ceph-osd.0.asok
---objecter--- -----------osd-----------
writ read actv|recop rd wr lat ops |
0 0 0 | 0 0 50M 332 48
0 0 0 | 0 0 40M 369 39
0 0 0 | 0 0 50M 341 49
0 0 0 | 0 0 59M 280 57
0 0 0 | 0 0 67M 263 64
0 0 0 | 0 0 33M 467 32
0 0 0 | 0 0 40M 385 39

I didn't find any perf information for a variable such as "l_bluestore_compress_lat". I also found there was no option to output this kind info:
ceph daemonperf {daemon_name|socket_path} [{interval} [{count}]]

I agree this tool should support this. But seems many enhancement should be done to support. In this way, I think my modification is practical, can help users to do analysis.

What is your opinion?

Thanks.

@liewegas
Copy link
Member

liewegas commented Dec 14, 2016 via email

@liupan1111
Copy link
Contributor Author

@dmick , what is your opinion about this PR? I believe avg lat is useful for debug.

@dmick
Copy link
Member

dmick commented Feb 27, 2017

I guess my opinion is that, as Sage said, the averages aren't very useful. I get that along with manually resetting the counters they can be made more useful, but that's a pretty brute-force solution, and affects more than just the counter you're interested in.

That said, it does seem useful to be able to select a set of stats with arguments to ceph daemon, so that experiments like yours and others, while they may be less-generally useful, can still be accomplished without hacking the code. I'd vote for moving in that direction rather than adding more questionably-useful hardcoded stats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants