-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new montored value class #196
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR @graeme-a-stewart. In general it looks good. I have only a few minor comments below.
By the way, locally testing your changes I bumped into a failed unit test:
12/14 Test #12: testUnits ........................***Failed 3.82 sec
with the output:
12/14 Testing: testUnits
12/14 Test: testUnits
Command: "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev4/Mon/x86_64-centos7-gcc11-dbg/bin/python3" "test_cpu.py" "--units" "--time" "3" "--slack" "0" "--interval" "1"
Directory: /afs/cern.ch/user/a/amete/workspace/prmon-workspace/build/package/tests
"testUnits" start time: Aug 02 16:06 CEST
Output:
----------------------------------------------------------
Will run for 3s using 1 process(es) and 1 thread(s)
[2021-08-02 16:06:17.109] [prmon] [warning] Wallclock time of monitored process was less than the monitoring interval, so average statistics will be unreliable
F
======================================================================
FAIL: test_run_test_with_params (__main__.setup_configurable_test.<locals>.ConfigurableProcessMonitor)
Actual test runner
----------------------------------------------------------------------
Traceback (most recent call last):
File "/afs/cern.ch/work/a/amete/prmon-workspace/build/package/tests/test_cpu.py", line 82, in test_run_test_with_params
self.assertLessEqual(
AssertionError: 4 not less than or equal to 3.0 : Too high value for wall time (expected maximum of 3.0, got 4)
----------------------------------------------------------------------
Ran 1 test in 3.724s
FAILED (failures=1)
<end of output>
Test time = 3.82 sec
----------------------------------------------------------
Test Failed.
"testUnits" end time: Aug 02 16:06 CEST
"testUnits" time elapsed: 00:00:03
----------------------------------------------------------
The test succeeded on the second try so we might need to take a closer look at the tests to make them more robust but that's for another issue/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @graeme-a-stewart.
I am curious if there may arise a situation in the future where we have to add a metric to a monitor so that we have both monotonic and non-monotonic stats for the same monitor?
Latest changes should address all of the review comments - thank you both for them Still TODO:
|
It's not excluded, although it's not used any any of the current monitors (which simplifies their initalisation). |
Updated classes to use consistent 'using' statements and implemented check for monotonic value updates (prevent decreases) Migrated cpumon to use the new class, as first trial though more work and optimisation required
The parameter class is very useful to use to setup monitors so keep it and use it internally in monitored_value. Add a constructor for monitored_value that directly takes a parameter class and use that in cpumon.cpp.
Moving these using statements into the namespace makes the fact that they are prmon specific clearer when reading the source code and prevents any accidental namespace pollution Small update to monitored_value class to add a max_value, which is needed for parameters which are not montonic, but can go up and down
These are only valid for non-monotonic counters An internal iterations counter keeps track of how many iteration cycles have been called
Use the new features for non-monotonic value monitoring
Had used implicit constructor from parameter type, better to be explicit Correct missing monotonic flag fror cpumon parameters
Allows for the removal of the iomon specific protection against decrease of vales that has occasionally been observed IO_TEST compile flag allows checking of the monotonic protection which is working (albit not yet using the logger)
Allow the setting of an offset at construction time, which sets a "base" level for the parameter's raw values, against which the "true" value of the parameter is measured
Uses the new offset feature for the base levels of network statistics, which are also monotonic
Not yet tested, though by now changes should be pretty routine!
This is because the wallmon monitor does not know the offest at the time it constructs the class; instead it is only known when then mother PID is known, which happens on the first monitoring cycle.
New monitored value class used now to hold the number of seconds run for. Retrieved mother starttime gets a more complex pair return value to pass error code as well as measured value.
Follows sensible coding guidelines used by ATLAS
Unnecessary as emplace can take 2 arguments
Make sure that the offset/current value is included in the error message, which might be useful
Describe how to use new classes and types when implementing a new monitor
Change to consistent use of m_NAME convention for private member variables. Remove unused constructor for monitored_value from strings - the parameter class is always used.
6429e6d
to
6093d45
Compare
Test basic behaviour of parameter class Test behaviour of the monitored_value class, for each combination of monotonic and offset values Update test README to describe unit tests
ae97178
to
4c2bbbd
Compare
OK, all PR comments have been addressed and last issues fixed. Unit tests are running correctly for the new classes as well, thanks to the Google Test setup in #197. Will merge this so that @quantum-shift can work on rebasing #195. |
This is a big PR, as it introduces a specific class that handles monitored variables, thus refactoring of all monitors to use this class has been added. Also added are namespace
using
statements that define properly the types used by monitors.The big advantages of this are:
This enabled specific monitor protections for decreased values to be removed from a few monitors (iomon, netmon) as well as adding protection for CPU and walltime monitors.
Note that the class does not yet use loggers, but that this should be added (e.g., as a pointer to the logger instance, with
nullptr
meaning fallback tostd::cerr
).Closes #185