Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new montored value class #196

Merged
merged 27 commits into from
Aug 18, 2021
Merged

Added new montored value class #196

merged 27 commits into from
Aug 18, 2021

Conversation

graeme-a-stewart
Copy link
Member

This is a big PR, as it introduces a specific class that handles monitored variables, thus refactoring of all monitors to use this class has been added. Also added are namespace using statements that define properly the types used by monitors.

The big advantages of this are:

  • The intelligent handling of monitored values becomes part of the class, thus less specific code to catch errors needs to be added in each monitor itself
  • Specifically values can be flagged as monotonic, which protects them against ever decreasing
  • Data members and return types now use properly namespaced definitions

This enabled specific monitor protections for decreased values to be removed from a few monitors (iomon, netmon) as well as adding protection for CPU and walltime monitors.

Note that the class does not yet use loggers, but that this should be added (e.g., as a pointer to the logger instance, with nullptr meaning fallback to std::cerr).

Closes #185

@graeme-a-stewart graeme-a-stewart added the enhancement New feature or request label Aug 2, 2021
Copy link
Collaborator

@amete amete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR @graeme-a-stewart. In general it looks good. I have only a few minor comments below.

By the way, locally testing your changes I bumped into a failed unit test:

12/14 Test #12: testUnits ........................***Failed    3.82 sec

with the output:

12/14 Testing: testUnits
12/14 Test: testUnits
Command: "/cvmfs/sft-nightlies.cern.ch/lcg/views/dev4/Mon/x86_64-centos7-gcc11-dbg/bin/python3" "test_cpu.py" "--units" "--time" "3" "--slack" "0" "--interval" "1" 
Directory: /afs/cern.ch/user/a/amete/workspace/prmon-workspace/build/package/tests
"testUnits" start time: Aug 02 16:06 CEST
Output:
----------------------------------------------------------
Will run for 3s using 1 process(es) and 1 thread(s)
[2021-08-02 16:06:17.109] [prmon] [warning] Wallclock time of monitored process was less than the monitoring interval, so average statistics will be unreliable
F
======================================================================
FAIL: test_run_test_with_params (__main__.setup_configurable_test.<locals>.ConfigurableProcessMonitor)
Actual test runner
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/afs/cern.ch/work/a/amete/prmon-workspace/build/package/tests/test_cpu.py", line 82, in test_run_test_with_params
    self.assertLessEqual(
AssertionError: 4 not less than or equal to 3.0 : Too high value for wall time (expected maximum of 3.0, got 4)

----------------------------------------------------------------------
Ran 1 test in 3.724s

FAILED (failures=1)
<end of output>
Test time =   3.82 sec 
----------------------------------------------------------
Test Failed.
"testUnits" end time: Aug 02 16:06 CEST
"testUnits" time elapsed: 00:00:03
----------------------------------------------------------

The test succeeded on the second try so we might need to take a closer look at the tests to make them more robust but that's for another issue/PR.

package/src/netmon.cpp Outdated Show resolved Hide resolved
package/src/netmon.cpp Outdated Show resolved Hide resolved
package/src/parameter.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@quantum-shift quantum-shift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @graeme-a-stewart.
I am curious if there may arise a situation in the future where we have to add a metric to a monitor so that we have both monotonic and non-monotonic stats for the same monitor?

package/src/countmon.cpp Outdated Show resolved Hide resolved
package/src/netmon.cpp Show resolved Hide resolved
package/src/iomon.cpp Outdated Show resolved Hide resolved
package/src/netmon.cpp Outdated Show resolved Hide resolved
@graeme-a-stewart
Copy link
Member Author

Latest changes should address all of the review comments - thank you both for them

Still TODO:

  • Documentation update
  • Unit tests

@graeme-a-stewart
Copy link
Member Author

Thank you, @graeme-a-stewart.
I am curious if there may arise a situation in the future where we have to add a metric to a monitor so that we have both monotonic and non-monotonic stats for the same monitor?

It's not excluded, although it's not used any any of the current monitors (which simplifies their initalisation).

graeme-a-stewart and others added 22 commits August 18, 2021 14:26
Updated classes to use consistent 'using' statements
and implemented check for monotonic value updates
(prevent decreases)

Migrated cpumon to use the new class, as first trial
though more work and optimisation required
The parameter class is very useful to use to setup monitors
so keep it and use it internally in monitored_value.

Add a constructor for monitored_value that directly takes
a parameter class and use that in cpumon.cpp.
Moving these using statements into the namespace makes the
fact that they are prmon specific clearer when reading the source code and
prevents any accidental namespace pollution

Small update to monitored_value class to add a max_value, which is needed for
parameters which are not montonic, but can go up and down
These are only valid for non-monotonic counters
An internal iterations counter keeps track of how many
iteration cycles have been called
Use the new features for non-monotonic value monitoring
Had used implicit constructor from parameter type, better to be explicit
Correct missing monotonic flag fror cpumon parameters
Allows for the removal of the iomon specific protection against decrease
of vales that has occasionally been observed

IO_TEST compile flag allows checking of the monotonic protection
which is working (albit not yet using the logger)
Allow the setting of an offset at construction time, which sets a "base"
level for the parameter's raw values, against which the "true" value
of the parameter is measured
Uses the new offset feature for the base levels of network statistics,
which are also monotonic
Not yet tested, though by now changes should be pretty routine!
This is because the wallmon monitor does not know the offest at the time
it constructs the class; instead it is only known when then mother PID
is known, which happens on the first monitoring cycle.
New monitored value class used now to hold the number of seconds run
for.

Retrieved mother starttime gets a more complex pair return value to pass
error code as well as measured value.
Follows sensible coding guidelines used by ATLAS
Unnecessary as emplace can take 2 arguments
Make sure that the offset/current value is included in the
error message, which might be useful
Describe how to use new classes and types when implementing a new
monitor
Change to consistent use of m_NAME convention
for private member variables.

Remove unused constructor for monitored_value
from strings - the parameter class is always used.
Test basic behaviour of parameter class

Test behaviour of the monitored_value class, for
each combination of monotonic and offset values

Update test README to describe unit tests
@graeme-a-stewart
Copy link
Member Author

OK, all PR comments have been addressed and last issues fixed. Unit tests are running correctly for the new classes as well, thanks to the Google Test setup in #197.

Will merge this so that @quantum-shift can work on rebasing #195.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Protect monotonic increasing stats against race conditions
3 participants