Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework how the client connects to brokers. #10

Merged
merged 4 commits into from
Aug 14, 2013

Conversation

eapache
Copy link
Contributor

@eapache eapache commented Aug 14, 2013

@burke @fw42 Thorough review please - I want this concurrent locking pattern to make sense to somebody else before I merge it.

CC @tobi

Fixes #9.

This ended up being more complicated than I had hoped and touched several
different areas. TL;DR is that we now connect to the other brokers in the
cluster asynchronously. Errors connecting only show up when somebody tries to
use that broker.

This is better than the old behaviour since it means that if some brokers in a
cluster go down but the topics we care about are still available, we just keep
going instead of blowing up for no reason.

The complicated part is that simply calling go broker.Connect() doesn't do
what we want, so I had to write a broker.AsyncConnect(). The problem occurs if
you've got code like this:

go broker.Connect()
// do some stuff
broker.SendSomeMessage()

What can happen is that SendSomeMessage can be run before the Connect
goroutine ever gets scheduled, in which case SendSomeMessage will simply return
NotConnected. The desired behaviour is that SendSomeMessage waits for Connect
to finish, which means AsyncConnect has to synchronously take the broker lock
before it launches the asynchronous connect goroutine. Lots of fun.

And bonus change in this commit: rather than special-casing leader == -1 in
client.cachedLeader and adding a big long comment to the LEADER_NOT_AVAILABLE
case explaining the fallthrough statement, just delete that partition from the
hash. So much easier to follow, I must have been on crack when I wrote the old
way.

Fixes #9.

This ended up being more complicated than I had hoped and touched several
different areas. TL;DR is that we now connect to the other brokers in the
cluster asynchronously. Errors connecting only show up when somebody tries to
use that broker.

This is better than the old behaviour since it means that if some brokers in a
cluster go down but the topics we care about are still available, we just keep
going instead of blowing up for no reason.

The complicated part is that simply calling `go broker.Connect()` doesn't do
what we want, so I had to write a `broker.AsyncConnect()`. The problem occurs if
you've got code like this:
    go broker.Connect()
    // do some stuff
    broker.SendSomeMessage()
What can happen is that SendSomeMessage can be run before the Connect()
goroutine ever gets scheduled, in which case SendSomeMessage will simply return
NotConnected. The desired behaviour is that SendSomeMessage waits for Connect()
to finish, which means Connect() has to *synchronously* take the broker lock
before it launches the asynchronous connect call. Lots of fun.

And bonus change in this commit: rather than special-casing leader == -1 in
`client.cachedLeader` and adding a big long comment to the LEADER_NOT_AVAILABLE
case explaining the fallthrough statement, just delete that partition from the
hash. So much easier to follow, I must have been on crack when I wrote the old
way.
@tobi
Copy link

tobi commented Aug 14, 2013

It looks like a good change. Afaik the db.Sql package works just like this.
In that case I think they use sql.Open() instead of AsyncConnect so it may
make sense to match their pattern.

  • tobi
    CEO Shopify

On Tue, Aug 13, 2013 at 7:01 PM, Evan Huus notifications@github.com wrote:

@burke https://github.com/burke @fw42 https://github.com/fw42Thorough review please - I want this concurrent locking pattern to make
sense to somebody else before I merge it.

CC @tobi https://github.com/tobi

Fixes #9 #9.

This ended up being more complicated than I had hoped and touched several
different areas. TL;DR is that we now connect to the other brokers in the
cluster asynchronously. Errors connecting only show up when somebody tries
to
use that broker.

This is better than the old behaviour since it means that if some brokers
in a
cluster go down but the topics we care about are still available, we just
keep
going instead of blowing up for no reason.

The complicated part is that simply calling go broker.Connect() doesn't do
what we want, so I had to write a broker.AsyncConnect(). The problem
occurs if
you've got code like this:

go broker.Connect()
// do some stuff
broker.SendSomeMessage()

What can happen is that SendSomeMessage can be run before the Connect
goroutine ever gets scheduled, in which case SendSomeMessage will simply
return
NotConnected. The desired behaviour is that SendSomeMessage waits for
Connect
to finish, which means AsyncConnect has to synchronously take the
broker lock
before it launches the asynchronous connect goroutine. Lots of fun.

And bonus change in this commit: rather than special-casing leader == -1in
client.cachedLeader and adding a big long comment to the
LEADER_NOT_AVAILABLE
case explaining the fallthrough statement, just delete that partition from
the
hash. So much easier to follow, I must have been on crack when I wrote the
old

way.

You can merge this Pull Request by running

git pull https://github.com/Shopify/sarama handle_unreachable_brokers

Or view, comment on, or merge it at:

#10
Commit Summary

  • Rework how the client connects to brokers.

File Changes

Patch Links:

if b.conn != nil {
return AlreadyConnected
}
b.conn_err = nil

addr, err := net.ResolveIPAddr("ip", b.host)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider that you could do addr, b.conn_err = .... Just a thought; feel free to ignore if you have some reason not to do that.

@burke
Copy link
Contributor

burke commented Aug 14, 2013

Change looks good to me. Nice work :shipit:

@eapache
Copy link
Contributor Author

eapache commented Aug 14, 2013

@tobi does db.Sql not provide a synchronous open then? That seems an odd choice.

Nevermind, I RTFM, that's a pattern worth matching.

Replaces Connect and AsyncConnect with Open and Connected
@eapache
Copy link
Contributor Author

eapache commented Aug 14, 2013

ping @burke for a quick re-review of the new pattern please?

@burke
Copy link
Contributor

burke commented Aug 14, 2013

Looks great. :shipit:

eapache added a commit that referenced this pull request Aug 14, 2013
Rework how the client connects to brokers.
@eapache eapache merged commit 7626e3d into master Aug 14, 2013
@eapache eapache deleted the handle_unreachable_brokers branch August 14, 2013 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling Unreachable Brokers
3 participants