Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add warning interval option for resource monitor plugin #9826

Merged
merged 12 commits into from
Dec 28, 2020

Conversation

bogniq
Copy link
Collaborator

@bogniq bogniq commented Dec 23, 2020

Change Description

Option resource-monitor-warning-interval is added to the resource monitor plugin to fix issue #9649. This option sets the number of monitor intervals after which the warning message is output if the resource threshold is hit (currently only used for filesystem space).

Before this option is added, the warning is output after every monitoring (e.g. by default every 2 seconds). Now this option can be set from 1 to 450 and has a default value of 30 (e.g. if the monitor interval is 2 seconds, and this warning interval is (by default) 30, then the warning will be output every 60 seconds if the threshold is hit).

Note: the naming of this option is discussed in the issue and needs to be documented to avoid confusion.

Change Type

Select ONE:

  • Documentation
  • Stability bug fix
  • Other
  • Other - special case

Testing Changes

Select ANY that apply:

  • New Tests
  • Existing Tests
  • Test Framework
  • CI System
  • Other

Consensus Changes

  • Consensus Changes

API Changes

  • API Changes

Documentation Additions

  • Documentation Additions

boost::asio::io_context ctx;
static constexpr uint32_t def_monitor_warning_interval = 30; // After this number of monitor intervals, warning is output if the threshold is hit
static constexpr uint32_t warning_interval_low = 1;
static constexpr uint32_t warning_interval_high = 450; // e.g. if the monitor interval is 2 sec, the warning interval is at most 15 minutes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using min and max in names

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed all _low and high in the limits names to _min and _max.

matches = re.compile("\s+([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3})\s").search(msg)
return datetime.strptime(matches.group(0).strip(), '%Y-%m-%dT%H:%M:%S.%f')

intervalTolerance = 0.1 # 10%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 0.1 just random number? if yes could you run this test in a loop and dump to log curInterval - timedelta(seconds=expectedInterval)
to see how volatile it could be?
This is to avoid another flakey test where we need restart tests 2-3 times to get it passed.

Copy link
Collaborator Author

@bogniq bogniq Dec 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.1 (i.e. 10%) is based on observed warning interval deviations on buildkite and my local dockers. On buildkite, the max deviation is observed on macOS 10.15, where for the first test case (10-seconds interval, from 2 seconds monitor interval and warning interval 5) the deviation is about 0.6 sec, i.e. 6%. The second test case (30 seconds) on buildkite and all my local dockers have smaller deviation percentage.
To be safer, I changed this tolerance to be 0.15 (15%) in the new commit.

Copy link
Contributor

@dimas1185 dimas1185 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good to me but please double check with @iamveritas if we need any documentation changes related to this as we are adding new command line option.

@bogniq
Copy link
Collaborator Author

bogniq commented Dec 28, 2020

Thank you! Let me contact @iamveritas for the documents.

@iamveritas
Copy link
Contributor

Thank you! Let me contact @iamveritas for the documents.

I have created a ticket to track this issue from documentation point of view.
thank you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants