Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TO update_status to exclude ADMIN_DOWN #3689

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ func getServerUpdateStatus(tx *sql.Tx, cfg *config.Config, hostName string) ([]t
baseSelectStatement :=
`WITH parentservers AS (SELECT ps.id, ps.cachegroup, ps.cdn_id, ps.upd_pending, ps.reval_pending FROM server ps
LEFT JOIN status AS pstatus ON pstatus.id = ps.status
WHERE pstatus.name != 'OFFLINE' ),
WHERE (pstatus.name = '` + string(tc.CacheStatusOnline) + `' OR pstatus.name = '` + string(tc.CacheStatusReported) + `') ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it could cause other problems. i.e. does an admin_down mid still need to be marked as updated from a upd_pending / reval_pending perspective? if it is excluded and an admin_down mid is flipped to reported suddenly, it might not have the required regex_revalidate.config file that it needs to invalidate content correctly.

my point is, since the perl endpoint only excluded OFFLINE, maybe that was done for a very valid reason.

https://github.com/apache/trafficcontrol/blob/master/traffic_ops/app/lib/UI/Server.pm#L917

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the ideal solution would handle that. But the alternative is Edges never being able to finish a Reval, if any Mid is ADMIN_DOWN and not running ORT. Until we can get the ideal solution in place, this seems like the less bad option.

Copy link
Member

@mitchell852 mitchell852 Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rob05c - do you know off the top of your head what happens when a cache is moved from admin_down/offline/pre_prod to online/reported? i'm guessing nothing happens (except queue updates is triggered on all "child" caches").

my point is - if a mid is moved from admin_down/offline/pre_prod to online/reported, it could end up with an incorrect regex_revalidate.config file (which could screw up content invalidation) but maybe it's standard operating procedure to "badass" a cache when moving it to online/reported status?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding my thoughts, belive it may help some way.

I assume, the ADMIN_DOWN for the cache server will be set by the admin person through trafficops UI( ie., OFFLINE the server)
Consider the below setup and scenario:

SETUP:
mid_cache_group1: mid1 and mid2
edge_cache_group1: edge1 (Its parent is mid_cache_group1)

SCENARIO:
If operator ADMIN_DOWN mid1 for some reason(say mid1 is dead).
Then upd_pending should set to true for mid1,mid2 and edge1. Hope this will done by the operator through trafficops Queue update.
But immediate ORT run on edge1 should wait only for mid2 (for to complete upd_pending) NOT for mid1 (since mid1 is ADMIN_DOWN).
Though mid1 is ADMIN_DOWN, on its cron schedule ORT job, it will update its required configuration since upd_pending is set to True.

SUGGESTION:
Like OFFLINE in that query, can we handle something like <<<...WHERE pstatus.name != 'OFFLINE' AND pstatus.name != 'ADMIN_DOWN')..>>

use_reval_pending AS (SELECT value::boolean FROM parameter WHERE name = 'use_reval_pending' AND config_file = 'global' UNION ALL SELECT FALSE FETCH FIRST 1 ROW ONLY)
SELECT s.id, s.host_name, type.name AS type, (s.reval_pending::boolean) as server_reval_pending, use_reval_pending.value, s.upd_pending, status.name AS status, COALESCE(bool_or(ps.upd_pending), FALSE) AS parent_upd_pending, COALESCE(bool_or(ps.reval_pending), FALSE) AS parent_reval_pending FROM use_reval_pending, server s
LEFT JOIN status ON s.status = status.id
Expand Down