Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re #149 support event streaming #151

Merged
merged 7 commits into from
Nov 12, 2015

Conversation

bsideup
Copy link
Contributor

@bsideup bsideup commented Jul 25, 2015

I'm happy to contribute this small, but very helpful improvement - event stream listener.

Starting from Marathon 0.9.0 it provides /v2/events endpoint, which is implementation of Server-sent Events. It's much better compared to event callbacks since there is no need to register Bamboo as callback on Marathon.

Implementation is backward compatible with 0.7.0 because this mode is disabled by default. It could be enabled in config like that:

{
  "Marathon": {
    "Endpoint": "...",
    "UseEventStream": true
  }
}

NB: if "UseEventStream" is enabled then callbacks are not auto-registered.

When Bamboo will drop support for Marathon < 0.9.0 then this property could be true by default.

Implementation was tested with Marathon 0.9.0, covered cases:

  1. Listening for events
  2. Re-connection when Marathon is not responding
  3. Bad data from Marathon
  4. Many events at a time


resp, err := client.Do(req)
if err != nil {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be logged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to handle stopped Marathon? Should I use exponential backoff (like this one: https://github.com/cenkalti/backoff) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that much complexity is needed, something as simple as giving every gorouting having a time.Ticker(time.Second), and then waiting for it before creating a new connection would be enough to ensure that Bamboo isn't spamming Marathon too hard, while still allowing the odd disconnection without error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just log error and sleep for 1 second then? Usually backoff is used in such sutiations, but if you fill it complex then simple delay here should be fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticker is redundant in case of just closed connection because closed connection doesn't mean to be a disaster, it could mean just "reconnect to me"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me describe whole scheme:

  1. try to open connection
  2. if failed, wait 1 sec and go to 1)
  3. if not, read data from connection
  4. if failed, log and go to 1 without delay
  5. if not, read event and go to 3)

Which means that:
If failure happened while reading event on open connection, then whole cycle will start from 1)
If failure happened while connection opening then each attempt will happen not more than once per second.

So:
It's ok to wait 1 second if it's impossible to open connection to marathon.
If error happened during "read" phase, then we should retry connection immediately because it could be just dropped connection, but Marathon is still able to serve requests. If it's not, then it will loop on 1)-2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, it's roughly the same in the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No :) EOF (as well as other read errors) will not happen if your server is down because connection will not be open :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/Byzantine_fault_tolerance Treating particular errors specially isn't reliable. There are many things that can go wrong other than a server just being up or down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meh...... ok, I will add Ticker. Still not sure that it's required here :)

@lclarkmichalek
Copy link
Contributor

Looks good for the most part, I've added a couple of comments. On a broader note, I'd love to see some of this broken out into smaller functions; a function to parse a single message and return an event would be great, as it'd allow for very easy testing. At the moment, this is a rather had function to test.

@bsideup
Copy link
Contributor Author

bsideup commented Jul 26, 2015

@bluepeppers I've made some changes, including:

  1. Reuse code from EventSubscriptionAPI for parsing and dispatching events
  2. Use Ticker for retrying
  3. Log every error

Currently listenToEventStream contains only SSE handling logic. Should I split it more?

@lclarkmichalek
Copy link
Contributor

I like this. I'd like to split it up more, and break the whole thing out behind an interface, but this is a move in the right direction. Seeing as this is opt in, I'd be happy merging it now, but will probably wait till Monday, and confer with @activars about the best way to test this mode with some decent traffic.

@bsideup
Copy link
Contributor Author

bsideup commented Jul 27, 2015

@bluepeppers ok, lets wait what @activars will say :)

registerMarathonEvent(conf)
if conf.Marathon.UseEventStream {
// Listen events stream from Marathon
listenToEventStream(conf, eventSubAPI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor consistency issue. Do you think i's better to align with the callback model function name? e.g. renaming listenToEventSteam to registerMarathonEventStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right. Fixing...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, stop :) listenTo<...> was used to be consistent with listenToZookeeper. registerMarathonEvent will register Bamboo in Marathon, but listenToEventStream will listen stream and will not register anything. So I will go with listenToEventStream if you don't mind

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Maybe this is more accurate?
listenToMarathonEventStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, no objections :) Will rename it now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

@j1n6
Copy link
Contributor

j1n6 commented Jul 27, 2015

Looks good 👍
We will test this branch on our dev cluster.

@j1n6
Copy link
Contributor

j1n6 commented Jul 27, 2015

don't worry about the abstraction too much, we will refactor some common features in next release.

@drewrobb
Copy link

docker build with this was broken for me-- use of the http.Client timeout requires go 1.4, the dockerfile uses go 1.2

@drewrobb
Copy link

You also may want to wait until the fix for mesosphere/marathon#1926 is included in a marathon release (probably 0.11). Otherwise there could in some cases be a delay for bamboo to receive events.

@lclarkmichalek
Copy link
Contributor

@bsideup hey, sorry we haven't done anything on this. I've finally automated some bamboo integration testing stuff, and so I'll give this a shot on the staging cluster in the next week.

@bsideup
Copy link
Contributor Author

bsideup commented Sep 8, 2015

Hi @bluepeppers ! Sounds great :) Ping me if anything will not work for you :)

@lclarkmichalek
Copy link
Contributor

One minor thing; we have a username and password for marathon configurable: https://github.com/QubitProducts/bamboo/blob/master/configuration/marathon.go#L10-L15. The event stream request should probably support HTTP basic auth also

@bsideup
Copy link
Contributor Author

bsideup commented Sep 14, 2015

@bluepeppers it was introduced after this PR :) But I can support it if you want

@bsideup
Copy link
Contributor Author

bsideup commented Sep 14, 2015

@drewrobb good catch. Looks like ubuntu base image comes with Go 1.2 by default. Any idea how to fix it?

@bsideup
Copy link
Contributor Author

bsideup commented Sep 14, 2015

@bluepeppers rebased on top of last master

@bsideup
Copy link
Contributor Author

bsideup commented Sep 14, 2015

@bluepeppers implemented basic auth

@lclarkmichalek
Copy link
Contributor

The go vet message at https://travis-ci.org/QubitProducts/bamboo/builds/80229954 was very useful. Without that change, the goroutine was listening to the same (last) marathon always, which causes the problem @drewrobb linked with marathon to be much more of an issue. I was having problems getting event streaming to work with this patch, but I think that go vet may have found the problem.

@bsideup
Copy link
Contributor Author

bsideup commented Sep 14, 2015

@bluepeppers yes, already fixed :)

@bsideup
Copy link
Contributor Author

bsideup commented Oct 15, 2015

@bluepeppers any news?

@drewrobb
Copy link

+1 to getting this in, I have been using it for a while now successfully

@lavcraft
Copy link

+1

@j1n6
Copy link
Contributor

j1n6 commented Oct 27, 2015

Appologize we haven't got chance to test this out because there are a lot of other tasks we need to work on internally.
If anyone has time to test it out & sign-off on the status (with 3 thumbs up), we can speed up the release process.

We will deploy this on an internal test cluster to test it out in November.

@bsideup
Copy link
Contributor Author

bsideup commented Oct 27, 2015

Since this is an optional mode I think it's pretty safe to merge, I know at
least 2 production deployments running my fork so it's battle tested:)
On Tue, 27 Oct 2015 at 6:59 PM Jing Dong notifications@github.com wrote:

Appologize we haven't got chance to test this out because there are a lot
of other tasks we need to work on internally.
If anyone has time to test it out & sign-off on the status (with 3 thumbs
up), we can speed up the release process.

We will deploy this on an internal test cluster to test it out in
November.


Reply to this email directly or view it on GitHub
#151 (comment).

@lclarkmichalek
Copy link
Contributor

I'll merge this today (absent any issues). Sorry it took so long

@lclarkmichalek lclarkmichalek merged commit 7f0a0f8 into QubitProducts:master Nov 12, 2015
@bsideup
Copy link
Contributor Author

bsideup commented Nov 12, 2015

Awesome, thanks!
On Thu, 12 Nov 2015 at 8:58 PM Laurie Clark-Michalek <
notifications@github.com> wrote:

Merged #151 #151.


Reply to this email directly or view it on GitHub
#151 (comment).

@lavcraft
Copy link

Yeah, thanks guys!

@bsideup
Copy link
Contributor Author

bsideup commented Nov 13, 2015

@bluepeppers was it released btw?

@lclarkmichalek
Copy link
Contributor

No. The next release will probably be 0.3, with this and #180 being the features I want to test the most. I'm running master + #180 in dev atm, though there's still some extra work to be done around the UI for #180.

@bsideup
Copy link
Contributor Author

bsideup commented Nov 13, 2015

@bluepeppers ok, keep me in touch then :) Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants