This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

fix(logspout): discover logger connection continuously #5009

Merged
merged 4 commits into deis:master on Apr 6, 2016

Conversation

bacongobbler
Member

Logspout would only discover the logger's host/port information at
boot. When the logger was rebooted and re-scheduled onto another
host, the logspout agents were not updated. This change makes the
agent check etcd every 10 seconds and update the logger route
information, ensuring it is sending logs to the right address.

fixes #4293
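
In outline, the change turns logger discovery into a polling loop along these lines. This is a minimal, self-contained sketch: the types are stubbed stand-ins for logspout's own, and getLoggerAddr is a hypothetical helper in place of the real etcd lookup (the actual key line from the diff follows below).

package main

import "time"

// Stand-ins for logspout's route types; the real definitions live in logspout.
type Target struct {
    Type, Addr, Protocol string
}

type Route struct {
    ID     string
    Target Target
}

type RouteManager struct {
    routes map[string]Route
}

// Add stores the route under its ID, overwriting any previous entry.
func (rm *RouteManager) Add(r *Route) {
    rm.routes[r.ID] = *r
}

// getLoggerAddr is a hypothetical helper standing in for the etcd lookup
// of the logger's current host/port.
func getLoggerAddr() (host, protocol string, err error) {
    return "10.21.1.77:514", "udp", nil
}

func main() {
    router := &RouteManager{routes: make(map[string]Route)}
    for {
        if host, protocol, err := getLoggerAddr(); err == nil {
            router.Add(&Route{ID: "etcd", Target: Target{Type: "syslog", Addr: host, Protocol: protocol}})
        }
        time.Sleep(10 * time.Second)
    }
}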

// NOTE(bacongobbler): router.Add() will overwrite an existing route. No connection
// cleanup is required; logspout spawns a one-off connection for every route then
// immediately closes them.
router.Add(&Route{ID: "etcd", Target: Target{Type: "syslog", Addr: host, Protocol: protocol}})
@bacongobbler
Member Author


This is the most significant change that makes this all possible: by using a static ID for the route (etcd), we can overwrite its value since RouteManager just stores them in a map[string]Route, where the string is the route's ID.
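
The overwrite behaviour relied on here is plain Go map assignment; a tiny illustration with stub types (not logspout's actual code):

package main

import "fmt"

type Route struct{ ID, Addr string }

func main() {
    // RouteManager keeps routes in a map keyed by the route's ID,
    // so re-adding under the same key replaces the old entry.
    routes := map[string]Route{}
    routes["etcd"] = Route{ID: "etcd", Addr: "10.21.1.77:514"}
    routes["etcd"] = Route{ID: "etcd", Addr: "10.21.2.88:514"} // same key: overwritten
    fmt.Println(len(routes), routes["etcd"].Addr)              // 1 10.21.2.88:514
}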

@nathansamson
Contributor

Will this add additional load on etcd (the network) or is it just purely a "CPU" operation reading from what is currently in cache on the current etcd node? (which I actually assume goes over localhost but not over a real network)?
Would it make sense to make the timeout configurable?

Previously the config loading was done synchronously before proceeding. It looks to me like there is now a possibility that the code has already advanced before the etcd route is added. I'm not sure whether this will actually cause problems (probably a few missing logs).
Maybe load the etcd route once first to preserve the old behaviour?

@bacongobbler
Member Author

Will this add additional load on etcd (the network) or is it just purely a "CPU" operation reading from what is currently in cache on the current etcd node? (which I actually assume goes over localhost but not over a real network)?

Yes, this creates one more client connection to etcd every ten seconds. The trade-off is not having to restart logspout every time the logger jumps hosts.
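
For a sense of scale, each poll amounts to a couple of small etcd reads, roughly like the following. This sketch uses the coreos/go-etcd client of that era; the /deis/logs/host and /deis/logs/port keys are assumptions here, not confirmed paths.

package main

import (
    "fmt"

    "github.com/coreos/go-etcd/etcd"
)

func main() {
    client := etcd.NewClient([]string{"http://127.0.0.1:4001"})
    // Two small GETs per poll against the local etcd node.
    host, err := client.Get("/deis/logs/host", false, false)
    if err != nil {
        fmt.Println("lookup failed:", err)
        return
    }
    port, err := client.Get("/deis/logs/port", false, false)
    if err != nil {
        fmt.Println("lookup failed:", err)
        return
    }
    fmt.Printf("logger at %s:%s\n", host.Node.Value, port.Node.Value)
}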

Would it make sense to make the timeout configurable?

The timeouts on the logger's publish intervals are not configurable either, so I did not want to make that change here.

Maybe load the etcd route once first to preserve the old behaviour?

Sure.
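
For illustration, preserving the old startup behaviour might look like this; updateRoute is a hypothetical helper that performs the etcd lookup and route add described above:

package main

import (
    "log"
    "time"
)

// updateRoute is a hypothetical helper: look up the logger in etcd and
// (re)add the "etcd" route.
func updateRoute() error { return nil }

func main() {
    // Load the route synchronously once before proceeding, preserving the
    // old boot-time behaviour.
    if err := updateRoute(); err != nil {
        log.Printf("initial logger discovery failed: %v", err)
    }
    // Then keep it fresh in the background every ten seconds.
    go func() {
        for range time.Tick(10 * time.Second) {
            if err := updateRoute(); err != nil {
                log.Printf("logger discovery failed: %v", err)
            }
        }
    }()
    select {} // block forever; the real agent runs its main loop here
}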

@mboersma
Member

mboersma commented Apr 5, 2016

I'm seeing duplicated log lines when testing this interactively:

$ deis logs
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
2016-04-05T16:38:33UTC elfish-tinkerer[deis-controller]: matt created initial release
... (repeated 50 times)
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
2016-04-05T16:39:41UTC elfish-tinkerer[deis-controller]: matt deployed 6b142e5
... (you get the idea)

I reverted to master logspout just as a sanity check and deployed a new app:

$ deis logs
2016-04-05T16:47:05UTC zephyr-huntress[deis-controller]: matt created initial release
2016-04-05T16:48:42UTC zephyr-huntress[deis-controller]: matt deployed 6b142e5
2016-04-05T16:49:18UTC zephyr-huntress[deis-controller]: matt scaled containers web=1
2016-04-05T16:49:48UTC zephyr-huntress[web.1]: Server listening on port 5000 in production mode

I don't see any significant CPU load as a result of this change, however. I watched top on each of the 3 CoreOS nodes while testing.

@bacongobbler
Member Author

Thanks @mboersma, and sorry about that! My refactor dropped a single line of code that was very important. It's been added back in 4d610ca

@mboersma
Member

mboersma commented Apr 5, 2016

Nice. Works well now, still no noticeable CPU load. 🚢 🇮🇹

@carmstrong
Contributor

Is there a chance of lost log messages if they are sent between when the router removes the old "etcd" route and adds the new one, i.e. while there are zero routes?

@bacongobbler
Member Author

Yes, there is a window where logs coming in from the Docker socket cannot be sent to the logger. Unfortunately I don't see a way around this: calling router.Remove("etcd") immediately after router.Add() will clear the new route we just added. We also cannot edit an existing route's Target in the routing table, because that value is immediately shipped off to a goroutine which spawns a TCP connection; it cannot be modified on the fly, only killed off with .Remove.

I think the best course of action would be to fetch the etcd route from the routing table, compare against the route we want to add and see if their targets are identical. If they differ then we can remove the old one since that route is stale and add the new one. If they're identical then it's a no-op. Does that seem fair?
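
A sketch of that check, with stub types again; Get is an assumed accessor here, while Add and Remove mirror the router methods mentioned above:

package main

type Target struct{ Type, Addr, Protocol string }

type Route struct {
    ID     string
    Target Target
}

type RouteManager struct{ routes map[string]Route }

func (rm *RouteManager) Get(id string) (Route, bool) { r, ok := rm.routes[id]; return r, ok }
func (rm *RouteManager) Remove(id string)            { delete(rm.routes, id) }
func (rm *RouteManager) Add(r *Route)                { rm.routes[r.ID] = *r }

// updateEtcdRoute only touches the routing table when the target actually
// changed, so an unchanged logger address is a no-op and no zero-route
// window is opened.
func updateEtcdRoute(rm *RouteManager, next Route) {
    if old, ok := rm.Get("etcd"); ok && old.Target == next.Target {
        return // identical target: nothing to do
    }
    rm.Remove("etcd") // drop the stale route (a no-op if absent)
    rm.Add(&next)
}

func main() {
    rm := &RouteManager{routes: map[string]Route{}}
    r := Route{ID: "etcd", Target: Target{Type: "syslog", Addr: "10.21.1.77:514", Protocol: "udp"}}
    updateEtcdRoute(rm, r)
    updateEtcdRoute(rm, r) // second call is a no-op
}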

@bacongobbler
Member Author

Done. We now check the old route against the new proposed route. If they're identical it's a no-op.

Matthew Fisher added 4 commits April 5, 2016 12:25
@bacongobbler bacongobbler merged commit e2aeace into deis:master Apr 6, 2016
Development

Successfully merging this pull request may close these issues.

Logspout only discovers logger at startup