Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operators can more easily see when routes and endpoints are added or removed from a given gorouter's routing table in the gorouter logs #172

Closed
mcwumbly opened this issue Jul 9, 2020 · 3 comments

Comments

@mcwumbly
Copy link
Member

@mcwumbly mcwumbly commented Jul 9, 2020

Summary

Our logs have very limited information about the state of the routing table as we only log when routes are unregistered not when they are registered.

In addition, the unregister messages themselves are noisier than need be because we log any time we receive an "unregister-route" message, and these messages are repeated a number of times by the route-emitter make sure they are received.

Rather than logging what messages we receive, it would be nice if instead we had the following Info level logs if and only if those messages result in an change to the routing table:

  • "route-added"
  • "route-removed"
  • "endpoint-added"
  • "endpoint-removed"

Expected Results

Given A new route is added
When a given gorouter adds the route to its routing table
Then I see exactly one log message in gorouter.stdout.log indicating the route was added.

Given A route is removed or pruned
When a given gorouter removes the route from its routing table
Then I see exactly one log message in gorouter.stdout.log indicating the route was added.

Given An endpoint is added for a route (e.g. when an app is scaled up from N to N+1 instances)
When a given gorouter adds the endpoint to its routing table
Then I see exactly one log message in gorouter.stdout.log for each endpoint added.

Given An endpoint is removed or pruned for a route (e.g. when an app is scaled down from N to N-1 instances)
When a given gorouter removes the endpoint from its routing table
Then I see exactly one log message in gorouter.stdout.log for each endpoint removed.

Context

An ongoing source of support burden circles around questions of "stale routes" which are often suspected as a possible cause of folks 'issues when getting errors in their requests to apps, and our documentation for troubleshooting error responses has an entire section devoted to diagnosing stale routes.

The gorouter routing table is eventually consistent (and therefore often inconsistent). The docs cover how to check the routing table at any moment using the /var/vcap/jobs/gorouter/bin/retreive-local-routes script that gets put on the router VM. But that only tells you what routes the router has in its table now. When handling support issues, we usually start with logs.

Our logs have very limited information about the state of the routing table as we only log when routes are unregistered not when they are registered. More specifically we log any time we receive an "unregister-route" message.

"register-route" messages and "unregister-route" messages are both sent repeatedly to make sure they are received, so logging them all is noisy. While still noisy, we do log "unregister-route" messages because they stop after some time. But "register-route" messages are repeated indefinitely, so we don't log them at all.

Rather than logging what messages we receive at Info level, it would be nice if instead we logged at Info level any time a register-route message results in a route or endpoint actually being added to or removed from the routing table.

@mcwumbly
Copy link
Member Author

@mcwumbly mcwumbly commented Aug 27, 2020

I've spiked this out and here are the results so far:

branch: https://github.com/cloudfoundry/gorouter/tree/log-route-table-changes
commit: cloudfoundry/gorouter@56c787d

The log messages below are the only log messages I see now... no more repeated "unregister-route" or "start-pruning" messages...

I am getting a couple duplicate messages when scaling apps up and down, but only single repeats... haven't dug into what's going on there yet.

Add a new route:

$ cf map-route foo springgreen.cf-app.com --hostname a
Mapping route a.springgreen.cf-app.com to app foo in org o / space s as admin...
OK

gorouter.stdout.log:

{"log_level":1,"timestamp":"2020-08-27T22:59:43.462087363Z","message":"route-registered","source":"vcap.gorouter.registry","data":{"uri":"a.springgreen.cf-app.com"}}
{"log_level":1,"timestamp":"2020-08-27T22:59:43.462279999Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"a.springgreen.cf-app.com","backend":"10.0.1.11:61002","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}

Scale app from 1 to 2 instances

$ cf scale foo -i 2
Scaling app foo in org o / space s as admin...

Instances starting...

Showing current scale of app foo in org o / space s as admin...

name:              foo
requested state:   started
routes:            foo.springgreen.cf-app.com, a.springgreen.cf-app.com
last uploaded:     Thu 27 Aug 15:27:34 PDT 2020
stack:             cflinuxfs3
buildpacks:        go

type:           web
sidecars:
instances:      1/2
memory usage:   32M
     state      since                  cpu    memory        disk          details
#0   running    2020-08-27T22:59:17Z   0.3%   8.1M of 32M   9.5M of 32M
#1   starting   2020-08-27T22:59:56Z   0.0%   0 of 32M      0 of 32M

gorouter.stdout.log:

{"log_level":1,"timestamp":"2020-08-27T22:59:59.350998043Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"a.springgreen.cf-app.com","backend":"10.0.1.11:61006","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2020-08-27T22:59:59.351131999Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"foo.springgreen.cf-app.com","backend":"10.0.1.11:61006","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}

Scale app from 2 to 1 instances

$ cf scale foo -i 1
Scaling app foo in org o / space s as admin...

Instances starting...

Showing current scale of app foo in org o / space s as admin...

name:              foo
requested state:   started
routes:            foo.springgreen.cf-app.com, a.springgreen.cf-app.com
last uploaded:     Thu 27 Aug 15:27:34 PDT 2020
stack:             cflinuxfs3
buildpacks:        go

type:           web
sidecars:
instances:      1/1
memory usage:   32M
     state     since                  cpu    memory        disk          details
#0   running   2020-08-27T22:59:17Z   0.3%   8.1M of 32M   9.5M of 32M

gorouter.stdout.log:

{"log_level":1,"timestamp":"2020-08-27T23:00:27.122616625Z","message":"endpoint-unregistered","source":"vcap.gorouter.registry","data":{"uri":"a.springgreen.cf-app.com","backend":"10.0.1.11:61006","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2020-08-27T23:00:27.123043785Z","message":"endpoint-unregistered","source":"vcap.gorouter.registry","data":{"uri":"foo.springgreen.cf-app.com","backend":"10.0.1.11:61006","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}

Remove a route

$ cf unmap-route foo springgreen.cf-app.com --hostname a
Removing route a.springgreen.cf-app.com from app foo in org o / space s as admin...
OK

gorouter.stdout.log:

{"log_level":1,"timestamp":"2020-08-27T23:00:46.702876112Z","message":"endpoint-unregistered","source":"vcap.gorouter.registry","data":{"uri":"a.springgreen.cf-app.com","backend":"10.0.1.11:61002","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2020-08-27T23:00:46.703133349Z","message":"route-unregistered","source":"vcap.gorouter.registry","data":{"uri":"a.springgreen.cf-app.com"}}

@mcwumbly
Copy link
Member Author

@mcwumbly mcwumbly commented Aug 27, 2020

I am getting a couple duplicate messages when scaling apps up and down, but only single repeats... haven't dug into what's going on there yet.

Oh, this is because I had two routes mapped to the app :)

mcwumbly added a commit to cloudfoundry/gorouter that referenced this issue Aug 30, 2020
With this change, log noise is significantly reduced and signal is
significantly boosted. Now, no messsages are emitted at steady state,
and all changes to the routing table are logged. This should enable one
to reconstruct the routing table at any point in time based on log
messages alone.

Additional details:

Log noise reduction:

Before, route and endpoint registrations and unregistrations were only
logged at debug level, so these messages were often unavailable in
troubleshooting scenarios.

Furthermore, unregistration _messages_ were logged at info level, which
increases log noise, as these messages are repeated several times.

Finally, log messages marking the start and end of the route/endpoint
pruning polling cycles are now reduced to debug level. In current
versions of gorouter, these rarely result in any activity, as the
pruning strategy for stale routes is now based on lazy evaluation of application identity
based on the TLS connection to backends. And in the case where this
pruning cycle is still meaningful, while the polling cycle is not
demarcated, messages for each route/endpoint that are pruned are still
logged at info level.

Signal boosting:

The following log messages are emitted any time the routing table
changes:

- `route-registered`: a new route is added to the table
- `route-unregistred`: an existing route is removed from the table
- `endpoint-registered`: a new backend is added to the table
   e.g. an app is scaled up and a new app instance is started
- `endpoint-unregistered`: a backend is removed from the table
   e.g. an app is scaled down and an app instance is stopped

cloudfoundry/routing-release#172
@jrussett
Copy link
Contributor

@jrussett jrussett commented Sep 1, 2020

Hi 👋

This functionality is now available in Routing Release 0.207.0.

Thanks,
@jrussett

@jrussett jrussett closed this Sep 1, 2020
@jrussett jrussett removed the scheduled label Sep 1, 2020
stefanlay added a commit to stefanlay/gorouter that referenced this issue Jan 29, 2021
With this change, log noise is significantly reduced and signal is
significantly boosted. Now, no messsages are emitted at steady state,
and all changes to the routing table are logged. This should enable one
to reconstruct the routing table at any point in time based on log
messages alone.

Additional details:

Log noise reduction:

Before, route and endpoint registrations and unregistrations were only
logged at debug level, so these messages were often unavailable in
troubleshooting scenarios.

Furthermore, unregistration _messages_ were logged at info level, which
increases log noise, as these messages are repeated several times.

Finally, log messages marking the start and end of the route/endpoint
pruning polling cycles are now reduced to debug level. In current
versions of gorouter, these rarely result in any activity, as the
pruning strategy for stale routes is now based on lazy evaluation of application identity
based on the TLS connection to backends. And in the case where this
pruning cycle is still meaningful, while the polling cycle is not
demarcated, messages for each route/endpoint that are pruned are still
logged at info level.

Signal boosting:

The following log messages are emitted any time the routing table
changes:

- `route-registered`: a new route is added to the table
- `route-unregistred`: an existing route is removed from the table
- `endpoint-registered`: a new backend is added to the table
   e.g. an app is scaled up and a new app instance is started
- `endpoint-unregistered`: a backend is removed from the table
   e.g. an app is scaled down and an app instance is stopped

cloudfoundry/routing-release#172
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants