Traffic Monitor start doesn't recover when Traffic Ops is unavailable #6129

Closed
smalenfant opened this issue Aug 20, 2021 · 2 comments · Fixed by #6146

Comments

@smalenfant
Contributor

smalenfant commented Aug 20, 2021

This Bug Report affects these Traffic Control components:

  • Traffic Monitor

Current behavior:

When Traffic Ops is offline and Traffic Monitor is restarted, the following happens:

  • It doesn't use a backup CRConfig (by design, probably)
  • It doesn't recover when Traffic Ops comes back
  • A manual Traffic Monitor restart is required

Logs:

ERROR: opsconfig.go:77: 2021-08-20T12:58:06.738891318Z: OpsConfigManager: Error getting Traffic Ops data: Error getting CRconfig from Traffic Ops: nil session
WARNING: monitorconfig.go:107: 2021-08-20T12:58:10.05145308Z: MonitorConfigPoller: skipping this iteration, Session is nil
WARNING: monitorconfig.go:107: 2021-08-20T12:58:15.051570293Z: MonitorConfigPoller: skipping this iteration, Session is nil
WARNING: monitorconfig.go:107: 2021-08-20T12:58:20.050974835Z: MonitorConfigPoller: skipping this iteration, Session is nil
WARNING: monitorconfig.go:107: 2021-08-20T12:58:25.051700515Z: MonitorConfigPoller: skipping this iteration, Session is nil
WARNING: monitorconfig.go:107: 2021-08-20T12:58:30.050987553Z: MonitorConfigPoller: skipping this iteration, Session is nil

Expected behavior:

Traffic Monitor should recover on its own once Traffic Ops comes back online.

Steps to reproduce:

systemctl stop traffic_ops # Stop Traffic Ops
systemctl restart traffic_monitor # Restart Traffic Monitor while Traffic Ops is unavailable
systemctl start traffic_ops # Start Traffic Ops
  • Observe the Traffic Monitor logs
@smalenfant smalenfant added the bug (something isn't working as intended) label Aug 20, 2021
@smalenfant smalenfant added this to the 6.0.0 milestone Aug 20, 2021
@ocket8888 ocket8888 added the Traffic Monitor (related to Traffic Monitor) label Aug 20, 2021
@ocket8888
Contributor

> Doesn't use a back CR-Config (by design, probably)

Is that by design? I would think we'd want it to use the backup.

Also, is this a regression? Did it use to restart okay while TO was down?

@rawlinp
Contributor

rawlinp commented Aug 20, 2021

Even if it's not an actual regression, it does seem like one, since this is one scenario I think we were supposed to have covered by introducing the snapshot backups. If it restarts when TO is unavailable, it should use the backup snapshots.

@rawlinp rawlinp self-assigned this Aug 23, 2021
rawlinp added a commit to rawlinp/trafficcontrol that referenced this issue Aug 27, 2021
Always set non-nil TO sessions that can eventually authenticate
themselves when TO recovers. Additionally, instead of exclusively using
one major client version after starting up and logging in, always
attempt the latest major version and fall back to the legacy version in
case of error. This will allow TM to seamlessly transition between using
either major version no matter which order TM and TO are upgraded in.

Closes: apache#6129
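
The two ideas in this commit message (a session that is never nil and logs in lazily, plus trying the latest client version before falling back to the legacy one) can be sketched as below. This is an illustrative sketch under stated assumptions, not the code from the linked commit: the `client` interface, `lazyClient`, `session`, and `fakeTO` are all hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// client is a hypothetical stand-in for one major version of a Traffic Ops client.
type client interface {
	Login() error
	CRConfig() (string, error)
}

// lazyClient defers authentication until a request is made, and keeps
// retrying the login on later requests until it succeeds.
type lazyClient struct {
	inner    client
	loggedIn bool
}

func (l *lazyClient) crConfig() (string, error) {
	if !l.loggedIn {
		if err := l.inner.Login(); err != nil {
			return "", fmt.Errorf("login: %w", err)
		}
		l.loggedIn = true
	}
	return l.inner.CRConfig()
}

// session is always non-nil once constructed, even if Traffic Ops was
// unreachable at startup, so pollers never have to skip on a nil session.
type session struct {
	latest *lazyClient
	legacy *lazyClient
}

func newSession(latest, legacy client) *session {
	return &session{latest: &lazyClient{inner: latest}, legacy: &lazyClient{inner: legacy}}
}

// CRConfig tries the latest client first and falls back to legacy on any error.
func (s *session) CRConfig() (string, error) {
	cfg, err := s.latest.crConfig()
	if err == nil {
		return cfg, nil
	}
	cfg, legacyErr := s.legacy.crConfig()
	if legacyErr != nil {
		return "", fmt.Errorf("latest: %v; legacy: %w", err, legacyErr)
	}
	return cfg, nil
}

// fakeTO simulates a Traffic Ops endpoint that can be toggled up or down.
type fakeTO struct {
	up      *bool
	version string
}

func (f fakeTO) Login() error {
	if !*f.up {
		return errors.New("traffic ops unavailable")
	}
	return nil
}

func (f fakeTO) CRConfig() (string, error) {
	if !*f.up {
		return "", errors.New("traffic ops unavailable")
	}
	return "crconfig via " + f.version, nil
}

func main() {
	up := false // Traffic Ops is down when the monitor starts
	s := newSession(fakeTO{&up, "latest"}, fakeTO{&up, "legacy"})

	_, err := s.CRConfig()
	fmt.Println("while TO is down:", err)

	up = true // Traffic Ops comes back; no restart of the monitor is needed
	cfg, err := s.CRConfig()
	fmt.Println("after TO recovers:", cfg, err)
}
```

Because the version choice is made per request rather than once at startup, the same session keeps working regardless of the order in which the two services are upgraded.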
zrhoffman pushed a commit that referenced this issue Aug 30, 2021
* Make TM recover from TO being unavailable at startup

Always set non-nil TO sessions that can eventually authenticate
themselves when TO recovers. Additionally, instead of exclusively using
one major client version after starting up and logging in, always
attempt the latest major version and fall back to the legacy version in
case of error. This will allow TM to seamlessly transition between using
either major version no matter which order TM and TO are upgraded in.

Closes: #6129

* Organize imports, add UsingDummyTO back
zrhoffman pushed a commit that referenced this issue Aug 30, 2021
(cherry picked from commit 16dda64)