Skip to content
This repository was archived by the owner on Nov 24, 2025. It is now read-only.

Fix Monitor Thresholds#3646

Merged
mitchell852 merged 3 commits intoapache:masterfrom
rob05c:fix-tm-thresholds
Jun 18, 2019
Merged

Fix Monitor Thresholds#3646
mitchell852 merged 3 commits intoapache:masterfrom
rob05c:fix-tm-thresholds

Conversation

@rob05c
Copy link
Copy Markdown
Member

@rob05c rob05c commented May 29, 2019

This fixes the Monitor racing and the health poll marking Available
caches which were marked Unavailable by the Stat poll.

There was already logic to prevent that, but it wasn't accounting
for the Calculated stats, so the Health poll didn't check Calculated
stats for thresholds, but it does "have" those stats, so it thought
it could mark them available again.

This fixes the Health poll to also check thresholds for Calculated
stats, which not only fixes the race, but makes thresholds be
marked up and down faster with the quicker Health poll.

Includes unit tests.
Bug fix, no interface change, so no documentation change.
Includes changelog.

  • This PR is not related to any other Issue

Which Traffic Control components are affected by this PR?

  • Traffic Monitor

What is the best way to verify this PR?

Set low availableKbps thresholds, run the monitor, verify in the Event Log and Cache States that caches are marked unavailable from kbps threshold, verify health poll does not mark caches that remain above the threshold as available, and that caches remain Unavailable (unless they drop below the threshold).

If this is a bug fix, what versions of Traffic Ops are affected?

  • 2.2
  • 3.0.0
  • 3.0.1

The following criteria are ALL met by this PR

  • This PR includes tests OR I have explained why tests are unnecessary
  • This PR includes documentation OR I have explained why documentation is unnecessary
  • This PR includes an update to CHANGELOG.md OR such an update is not necessary
  • This PR includes any and all required license headers
  • This PR ensures that database migration sequence is correct OR this PR does not include a database migration
  • This PR DOES NOT FIX A SERIOUS SECURITY VULNERABILITY (see the Apache Software Foundation's security guidelines for details)

@rob05c rob05c added bug something isn't working as intended Traffic Monitor related to Traffic Monitor high impact impacts the basic function, deployment, or operation of a CDN labels May 29, 2019
@asfgit
Copy link
Copy Markdown
Contributor

asfgit commented May 30, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/trafficcontrol-PR/3760/
Test FAILed.

@asfgit
Copy link
Copy Markdown
Contributor

asfgit commented May 30, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/trafficcontrol-PR/3762/
Test PASSed.

Copy link
Copy Markdown
Member

@ezelkow1 ezelkow1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, will need to merge changelog.md

rob05c added 3 commits June 17, 2019 14:11
This fixes the Monitor racing and the health poll marking Available
caches which were marked Unavailable by the Stat poll.

There was already logic to prevent that, but it wasn't accounting
for the Calculated stats, so the Health poll didn't check Calculated
stats for thresholds, but it does "have" those stats, so it thought
it could mark them available again.

This fixes the Health poll to also check thresholds for Calculated
stats, which not only fixes the race, but makes thresholds be
marked up and down faster with the quicker Health poll.
@rob05c rob05c force-pushed the fix-tm-thresholds branch from f440a3d to 10ddeab Compare June 17, 2019 20:11
@rob05c
Copy link
Copy Markdown
Member Author

rob05c commented Jun 17, 2019

@ezelkow1 Done.

@asfgit
Copy link
Copy Markdown
Contributor

asfgit commented Jun 17, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/trafficcontrol-PR/3864/
Test PASSed.

Copy link
Copy Markdown

@dsouza93 dsouza93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I was able to reproduce the bug in the nightly environment. I set the threshold to a very high parameter and noticed that the state was flapping.
When running this same test with an updated traffic monitor, the caches are placed in a down state and remain there.

@mitchell852 mitchell852 merged commit 139d796 into apache:master Jun 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

bug something isn't working as intended high impact impacts the basic function, deployment, or operation of a CDN Traffic Monitor related to Traffic Monitor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants