This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

fix(logspout): discover logger connection continuously #5009

Merged
merged 4 commits into deis:master on Apr 6, 2016

Conversation

bacongobbler
Member

Logspout would only discover the logger's host/port information at
boot. When the logger was rebooted and re-scheduled onto another
host, the logspout agents were not updated. This change makes the
agent check etcd every 10 seconds and update the logger route
information, ensuring it is sending logs to the right address.

fixes #4293
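
In outline, the change turns logger discovery into a polling loop along these lines. This is a minimal, self-contained sketch: the types are stubbed stand-ins for logspout's own, and getLoggerAddr is a hypothetical helper in place of the real etcd lookup (the actual key line from the diff follows below).

package main

import "time"

// Stand-ins for logspout's route types; the real definitions live in logspout.
type Target struct {
    Type, Addr, Protocol string
}

type Route struct {
    ID     string
    Target Target
}

type RouteManager struct {
    routes map[string]Route
}

// Add stores the route under its ID, overwriting any previous entry.
func (rm *RouteManager) Add(r *Route) {
    rm.routes[r.ID] = *r
}

// getLoggerAddr is a hypothetical helper standing in for the etcd lookup
// of the logger's current host/port.
func getLoggerAddr() (host, protocol string, err error) {
    return "10.21.1.77:514", "udp", nil
}

func main() {
    router := &RouteManager{routes: make(map[string]Route)}
    for {
        if host, protocol, err := getLoggerAddr(); err == nil {
            router.Add(&Route{ID: "etcd", Target: Target{Type: "syslog", Addr: host, Protocol: protocol}})
        }
        time.Sleep(10 * time.Second)
    }
}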

// NOTE(bacongobbler): router.Add() will overwrite an existing route. No connection
// cleanup is required; logspout spawns a one-off connection for every route then
// immediately closes them.
router.Add(&Route{ID: "etcd", Target: Target{Type: "syslog", Addr: host, Protocol: protocol}})
@bacongobbler
Member Author


This is the most significant change that makes this all possible: by using a static ID for the route (etcd), we can overwrite its value since RouteManager just stores them in a map[string]Route, where the string is the route's ID.
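
The overwrite behaviour relied on here is plain Go map assignment; a tiny illustration with stub types (not logspout's actual code):

package main

import "fmt"

type Route struct{ ID, Addr string }

func main() {
    // RouteManager keeps routes in a map keyed by the route's ID,
    // so re-adding under the same key replaces the old entry.
    routes := map[string]Route{}
    routes["etcd"] = Route{ID: "etcd", Addr: "10.21.1.77:514"}
    routes["etcd"] = Route{ID: "etcd", Addr: "10.21.2.88:514"} // same key: overwritten
    fmt.Println(len(routes), routes["etcd"].Addr)              // 1 10.21.2.88:514
}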

@nathansamson
Contributor

Will this add additional load on etcd (the network) or is it just purely a "CPU" operation reading from what is currently in cache on the current etcd node? (which I actually assume goes over localhost but not over a real network)?
Would it make sense to make the timeout configurable?

Previously the config loading was done synchronously before proceeding. It looks to me like there is now a possibility that the code has already advanced before the etcd route is added. I'm not sure whether this will actually cause problems (probably a few missing logs).
Maybe load the etcd route once first to preserve the old behaviour?

@bacongobbler
Member Author

Will this add additional load on etcd (the network) or is it just purely a "CPU" operation reading from what is currently in cache on the current etcd node? (which I actually assume goes over localhost but not over a real network)?

Yes, this creates one more client connection to etcd every ten seconds. The trade-off is not having to restart logspout every time the logger jumps hosts.
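
For a sense of scale, each poll amounts to a couple of small etcd reads, roughly like the following. This sketch uses the coreos/go-etcd client of that era; the /deis/logs/host and /deis/logs/port keys are assumptions here, not confirmed paths.

package main

import (
    "fmt"

    "github.com/coreos/go-etcd/etcd"
)

func main() {
    client := etcd.NewClient([]string{"http://127.0.0.1:4001"})
    // Two small GETs per poll against the local etcd node.
    host, err := client.Get("/deis/logs/host", false, false)
    if err != nil {
        fmt.Println("lookup failed:", err)
        return
    }
    port, err := client.Get("/deis/logs/port", false, false)
    if err != nil {
        fmt.Println("lookup failed:", err)
        return
    }
    fmt.Printf("logger at %s:%s\n", host.Node.Value, port.Node.Value)
}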

Would it make sense to make the timeout configurable?

The timeouts on the logger's publish intervals are not configurable either, so I did not want to make that change here.

Maybe load the etcd route once first to preserve the old behaviour?

Sure.
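
For illustration, preserving the old startup behaviour might look like this; updateRoute is a hypothetical helper that performs the etcd lookup and route add described above:

package main

import (
    "log"
    "time"
)

// updateRoute is a hypothetical helper: look up the logger in etcd and
// (re)add the "etcd" route.
func updateRoute() error { return nil }

func main() {
    // Load the route synchronously once before proceeding, preserving the
    // old boot-time behaviour.
    if err := updateRoute(); err != nil {
        log.Printf("initial logger discovery failed: %v", err)
    }
    // Then keep it fresh in the background every ten seconds.
    go func() {
        for range time.Tick(10 * time.Second) {
            if err := updateRoute(); err != nil {
                log.Printf("logger discovery failed: %v", err)
            }
        }
    }()
    select {} // block forever; the real agent runs its main loop here
}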

@mboersma
Member

mboersma commented Apr 5, 2016

I'm seeing duplicated log lines when testing this interactively:

$ deis logs
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
... (repeated 50 times)
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
... (you get the idea)

I reverted to master logspout just as a sanity check and deployed a new app:

$ deis logs
2016-04-05T16:47:05UTC zephyr-huntress[deis-controller]: matt created initial release
2016-04-05T16:48:42UTC zephyr-huntress[deis-controller]: matt deployed 6b142e5
2016-04-05T16:49:18UTC zephyr-huntress[deis-controller]: matt scaled containers web=1
2016-04-05T16:49:48UTC zephyr-huntress[web.1]: Server listening on port 5000 in production mode

I don't see any significant CPU load as a result of this change, however. I watched top on each of the 3 CoreOS nodes while testing.

@bacongobbler
Member Author

Thanks @mboersma, and sorry about that! My refactor dropped a single line of code that was very important. It's been added back in 4d610ca

@mboersma
Member

mboersma commented Apr 5, 2016

Nice. Works well now, still no noticeable CPU load. 🚢 🇮🇹

@carmstrong
Contributor

Is there a chance of lost log messages if they are sent between when the router removes the old "etcd" route and adds the new one, i.e. while there are zero routes?

@bacongobbler
Member Author

Yes, there is a window where logs coming in from the Docker socket cannot be sent to the logger. Unfortunately I don't see a way around this: calling router.Remove("etcd") immediately after router.Add() will clear the new route we just added. We also cannot edit an existing route's Target in the routing table, because that value is immediately shipped off to a goroutine which spawns a TCP connection; it cannot be modified on the fly, only killed off with .Remove.

I think the best course of action would be to fetch the etcd route from the routing table, compare against the route we want to add and see if their targets are identical. If they differ then we can remove the old one since that route is stale and add the new one. If they're identical then it's a no-op. Does that seem fair?
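
A sketch of that check, with stub types again; Get is an assumed accessor here, while Add and Remove mirror the router methods mentioned above:

package main

type Target struct{ Type, Addr, Protocol string }

type Route struct {
    ID     string
    Target Target
}

type RouteManager struct{ routes map[string]Route }

func (rm *RouteManager) Get(id string) (Route, bool) { r, ok := rm.routes[id]; return r, ok }
func (rm *RouteManager) Remove(id string)            { delete(rm.routes, id) }
func (rm *RouteManager) Add(r *Route)                { rm.routes[r.ID] = *r }

// updateEtcdRoute only touches the routing table when the target actually
// changed, so an unchanged logger address is a no-op and no zero-route
// window is opened.
func updateEtcdRoute(rm *RouteManager, next Route) {
    if old, ok := rm.Get("etcd"); ok && old.Target == next.Target {
        return // identical target: nothing to do
    }
    rm.Remove("etcd") // drop the stale route (a no-op if absent)
    rm.Add(&next)
}

func main() {
    rm := &RouteManager{routes: map[string]Route{}}
    r := Route{ID: "etcd", Target: Target{Type: "syslog", Addr: "10.21.1.77:514", Protocol: "udp"}}
    updateEtcdRoute(rm, r)
    updateEtcdRoute(rm, r) // second call is a no-op
}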

@bacongobbler
Member Author

Done. We now check the old route against the new proposed route. If they're identical it's a no-op.

Matthew Fisher added 4 commits April 5, 2016 12:25
@bacongobbler bacongobbler merged commit e2aeace into deis:master Apr 6, 2016
Development

Successfully merging this pull request may close these issues.

Logspout only discovers logger at startup