New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track time operation spends in work queue #4218
Comments
|
Comment from firstyear (@Firstyear) at 2020-06-19 01:30:33 I think this would help, but I think we can think bigger than this. It would be good to have finegrained timings without just systemtap probes or whatever. In the op struct, we could have a nested tree of timings that we build up to geta profile of an operation. IE a complete time, the queue time, how much time in plugins etc. That would help us much much more to identify performance gaps that goes beyond just the work queue. And I think it wouldn't be much work work on top to achieve it. What do you think? |
|
Comment from firstyear (@Firstyear) at 2020-06-19 01:30:33 Metadata Update from @Firstyear:
|
|
Comment from mreynolds (@mreynolds389) at 2020-06-19 02:30:07
I like it, but I'm not sure I am envisioning your nested tree structure of timings. There are only so many things we can get timings for so I would think we could just use a static array of timespecs and enum the indexes, but I suspect you have something better in mind :-) The other issue is how much do we write to the access log, 15 keywords for various timings of the operation? This would potentially double the output for each logged RESULT. Maybe it's a non-issue and I'm just being paranoid. But to be clear I want this in the access log by default. I don't want it to be something we turn on as this would make analysing customer problems way easier if it's already there in SOS reports. |
|
Comment from firstyear (@Firstyear) at 2020-06-19 03:22:48
Perhaps my idea is better saved for when we have the logging subsystem rewrite, because else we do risk performance issues I guess. I'd say I want a seperate log for this kind of measurement I think. So maybe my ideas can wait :) but I like the wtime idea. |
|
Comment from tbordaz (@tbordaz) at 2020-06-19 12:11:41 @Firstyear @mreynolds389 I think the extended idea you are discussing is the purpose of the PR #3730 I sent few months ago. Basically attaching a an operation extended structure to monitor various aspect of the operation itself. In the PR it was related to index keys list, but it can be easily extended to others places. I wrote a design of it http://www.port389.org/docs/389ds/design/log-operation-stats.html I wanted to be back on this PR for a long time as I think it has some advantages compare to a set of stap scripts. Unfortunately I did not get the time to be back on it. I think it is a good opportunity to complete the PR review and possibly use the new approach to fix this ticket. |
|
Comment from mreynolds (@mreynolds389) at 2020-06-19 22:30:46 After talking with @tbordaz I will implement the "wtime" and "optime", and then we can look into real logging changes via a different approach/ticket. So my plan is to have 3 times in the result message:
If you add "optime" and "wtime" together it should be equal to the "etime" |
|
Comment from mreynolds (@mreynolds389) at 2020-06-19 22:30:56 Metadata Update from @mreynolds389:
|
|
Comment from mreynolds (@mreynolds389) at 2020-06-19 22:31:24 Metadata Update from @mreynolds389:
|
|
Comment from firstyear (@Firstyear) at 2020-06-22 04:11:31 Sounds good to me! I look forward to reviewing it :) |
|
Comment from mreynolds (@mreynolds389) at 2020-06-23 22:45:29 @Firstyear enjoy: #4236 |
|
Comment from mreynolds (@mreynolds389) at 2020-06-23 23:16:15 Metadata Update from @mreynolds389:
|
|
Comment from firstyear (@Firstyear) at 2020-06-24 03:15:18 I will enjoy it thoroughly :) Thanks mate, |
|
Comment from tbordaz (@tbordaz) at 2020-06-24 07:52:35 Metadata Update from @tbordaz:
|
|
Comment from mreynolds (@mreynolds389) at 2020-06-24 18:56:52 Commit 3246fe79 relates to this ticket |
|
Comment from mreynolds (@mreynolds389) at 2020-06-24 18:59:37
|
|
Comment from mreynolds (@mreynolds389) at 2020-06-24 18:59:37 Metadata Update from @mreynolds389:
|
|
Comment from mreynolds (@mreynolds389) at 2020-06-24 19:00:14 Design Doc: http://www.port389.org/docs/389ds/design/access-log-new-time-stats-design.html |
|
Comment from mreynolds (@mreynolds389) at 2020-06-28 21:50:04 Add more enhancements to logconv |
|
Comment from mreynolds (@mreynolds389) at 2020-06-29 04:06:25 Commit f771ff7e relates to this ticket |
|
Comment from mreynolds (@mreynolds389) at 2020-06-29 04:11:23
|
|
Comment from mreynolds (@mreynolds389) at 2020-08-17 15:22:45 Commit dda3ab0d fixes this issue |
|
Comment from mreynolds (@mreynolds389) at 2020-08-17 15:24:26 Fix for extended ops:
|
|
Comment from mreynolds (@mreynolds389) at 2020-08-17 15:24:27 Metadata Update from @mreynolds389:
|
Description: Add a test case to dirsrvtests/tests/suites/ds_logs/ds_logs_test.py: test_optime_and_wtime_keywords. It tests that the new optime and wtime keywords are present in the access log and have correct values Also, adapt test_etime_order_of_magnitude adapted to the new RESULT string format in the access log Relates: #4218 Reviewed by: @droideck, @Firstyear (Thanks!)
Bug description: with 389ds#4218 (wtime, optime in access log), hrtime is set in the operation. But it is done before checking if the operation is set. covscan fails Fix description: move the setting after verification that operation != NULL relates: 389ds#4759 Reviewed by: Simon Pichugin Platforms tested: F34
Bug description: with #4218 (wtime, optime in access log), hrtime is set in the operation. But it is done before checking if the operation is set. covscan fails Fix description: move the setting after verification that operation != NULL relates: #4759 Reviewed by: Simon Pichugin Platforms tested: F34
Bug description: with #4218 (wtime, optime in access log), hrtime is set in the operation. But it is done before checking if the operation is set. covscan fails Fix description: move the setting after verification that operation != NULL relates: #4759 Reviewed by: Simon Pichugin Platforms tested: F34
Bug description: with #4218 (wtime, optime in access log), hrtime is set in the operation. But it is done before checking if the operation is set. covscan fails Fix description: move the setting after verification that operation != NULL relates: #4759 Reviewed by: Simon Pichugin Platforms tested: F34
Bug description: with #4218 (wtime, optime in access log), hrtime is set in the operation. But it is done before checking if the operation is set. covscan fails Fix description: move the setting after verification that operation != NULL relates: #4759 Reviewed by: Simon Pichugin Platforms tested: F34
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/51165
Issue Description
There are customer performance cases where it would be very useful to know how long an operation waited in the work queue before being assigned to a worker thread. This should be easy to do, and we can add a new keyword in the access log, maybe something like "wtime", like "etime", but for how long it "waited" in the queue.
The text was updated successfully, but these errors were encountered: