Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TO update_status to exclude ADMIN_DOWN #3689

Closed
wants to merge 1 commit into from

Conversation

rob05c
Copy link
Member

@rob05c rob05c commented Jun 20, 2019

Specifically, changed to only include ONLINE and REPORTED, and thus
also exclude PREPROD and any other similar future states.

Fixes #3687

Still WIP, haven't tested.

What does this PR (Pull Request) do?

  • This PR fixes #REPLACE_ME OR is not related to any Issue

Which Traffic Control components are affected by this PR?

  • CDN in a Box
  • Documentation
  • Grove
  • Traffic Control Client
  • Traffic Monitor
  • Traffic Ops
  • Traffic Ops ORT
  • Traffic Portal
  • Traffic Router
  • Traffic Stats
  • Traffic Vault

What is the best way to verify this PR?

If this is a bug fix, what versions of Traffic Ops are affected?

The following criteria are ALL met by this PR

  • This PR includes tests OR I have explained why tests are unnecessary
  • This PR includes documentation OR I have explained why documentation is unnecessary
  • This PR includes an update to CHANGELOG.md OR such an update is not necessary
  • This PR includes any and all required license headers
  • This PR ensures that database migration sequence is correct OR this PR does not include a database migration
  • This PR DOES NOT FIX A SERIOUS SECURITY VULNERABILITY (see the Apache Software Foundation's security guidelines for details)

@rob05c rob05c added bug something isn't working as intended Traffic Ops related to Traffic Ops WIP "Work-in-Progress" - do not merge! (use 'draft' pull requests from now on) labels Jun 20, 2019
@rob05c rob05c force-pushed the to-fix-updatestatus-admindown branch from ffa15a3 to 690732b Compare June 20, 2019 16:27
Fixes apache#3687

Specifically, changed to only include ONLINE and REPORTED, and thus
also exclude PREPROD and any other similar future states.
@rob05c rob05c force-pushed the to-fix-updatestatus-admindown branch from 690732b to f2bbe61 Compare June 20, 2019 16:27
@mitchell852 mitchell852 added the high impact impacts the basic function, deployment, or operation of a CDN label Jun 20, 2019
@asfgit
Copy link
Contributor

asfgit commented Jun 20, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/trafficcontrol-PR/3883/
Test PASSed.

@@ -49,7 +49,7 @@ func getServerUpdateStatus(tx *sql.Tx, cfg *config.Config, hostName string) ([]t
baseSelectStatement :=
`WITH parentservers AS (SELECT ps.id, ps.cachegroup, ps.cdn_id, ps.upd_pending, ps.reval_pending FROM server ps
LEFT JOIN status AS pstatus ON pstatus.id = ps.status
WHERE pstatus.name != 'OFFLINE' ),
WHERE (pstatus.name = '` + string(tc.CacheStatusOnline) + `' OR pstatus.name = '` + string(tc.CacheStatusReported) + `') ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it could cause other problems. i.e. does an admin_down mid still need to be marked as updated from a upd_pending / reval_pending perspective? if it is excluded and an admin_down mid is flipped to reported suddenly, it might not have the required regex_revalidate.config file that it needs to invalidate content correctly.

my point is, since the perl endpoint only excluded OFFLINE, maybe that was done for a very valid reason.

https://github.com/apache/trafficcontrol/blob/master/traffic_ops/app/lib/UI/Server.pm#L917

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the ideal solution would handle that. But the alternative is Edges never being able to finish a Reval, if any Mid is ADMIN_DOWN and not running ORT. Until we can get the ideal solution in place, this seems like the less bad option.

Copy link
Member

@mitchell852 mitchell852 Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rob05c - do you know off the top of your head what happens when a cache is moved from admin_down/offline/pre_prod to online/reported? i'm guessing nothing happens (except queue updates is triggered on all "child" caches").

my point is - if a mid is moved from admin_down/offline/pre_prod to online/reported, it could end up with an incorrect regex_revalidate.config file (which could screw up content invalidation) but maybe it's standard operating procedure to "badass" a cache when moving it to online/reported status?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding my thoughts, belive it may help some way.

I assume, the ADMIN_DOWN for the cache server will be set by the admin person through trafficops UI( ie., OFFLINE the server)
Consider the below setup and scenario:

SETUP:
mid_cache_group1: mid1 and mid2
edge_cache_group1: edge1 (Its parent is mid_cache_group1)

SCENARIO:
If operator ADMIN_DOWN mid1 for some reason(say mid1 is dead).
Then upd_pending should set to true for mid1,mid2 and edge1. Hope this will done by the operator through trafficops Queue update.
But immediate ORT run on edge1 should wait only for mid2 (for to complete upd_pending) NOT for mid1 (since mid1 is ADMIN_DOWN).
Though mid1 is ADMIN_DOWN, on its cron schedule ORT job, it will update its required configuration since upd_pending is set to True.

SUGGESTION:
Like OFFLINE in that query, can we handle something like <<<...WHERE pstatus.name != 'OFFLINE' AND pstatus.name != 'ADMIN_DOWN')..>>

@mitchell852
Copy link
Member

mitchell852 commented Oct 7, 2020

@rob05c - do you want to close this as it looks like there are other solutions to this? i.e. #5117

@rob05c rob05c closed this Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something isn't working as intended high impact impacts the basic function, deployment, or operation of a CDN Traffic Ops related to Traffic Ops WIP "Work-in-Progress" - do not merge! (use 'draft' pull requests from now on)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignoring ADMIN_DOWN/PRE_PROD Cache Servers while setting parent_pending flag
4 participants