
Concurrent metadata refresh #172

Closed
wants to merge 8 commits

Conversation

crxpandion

Changes broker metadata to refresh concurrently rather than sequentially. If all brokers error out, the error returned by the last call to Broker.GetMetadata(...) is exposed and the client enters the resurrection retry loop.

This also makes users of sarama.Client (e.g. consumer/producer) solely responsible for disconnecting brokers that time out or misbehave.

Related to #15
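
For illustration only, here is a minimal sketch of the concurrent-fanout idea described above. The fanoutResult type, the fanOutMetadata helper, and the exact GetMetadata signature are simplified assumptions for the sketch (taken to live inside the sarama package), not the code in this PR.

```go
// Hypothetical sketch of concurrent metadata fanout; not the actual patch.
type fanoutResult struct {
	metadata *MetadataResponse
	err      error
}

// fanOutMetadata queries every broker at once and returns the first
// successful response; if every broker fails, the last error is returned.
// Assumes at least one broker and a simplified Broker.GetMetadata signature.
func fanOutMetadata(brokers []*Broker, req *MetadataRequest) (*MetadataResponse, error) {
	results := make(chan fanoutResult, len(brokers))
	for _, broker := range brokers {
		b := broker // capture the loop variable for the goroutine
		go func() {
			meta, err := b.GetMetadata(req)
			results <- fanoutResult{metadata: meta, err: err}
		}()
	}

	var lastErr error
	for range brokers {
		res := <-results
		if res.err == nil {
			return res.metadata, nil // first success wins
		}
		lastErr = res.err
	}
	return nil, lastErr // all brokers errored; expose the last error
}
```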

@crxpandion
Author

Seems to have an issue in the Go 1.1 build. Investigating...

@crxpandion
Author

Unable to replicate this reliably locally; are you aware of any issues with functional_test.go?

@wvanbergen
Contributor

Yes, it seems like you got caught by our functional-test heisenbug, which strikes every now and then :(
I triggered a rebuild; let's see if that fixes it.

@wvanbergen
Contributor

The tests are passing now. If you have any idea what could cause this bug (most likely due to concurrency), please let us know.

```go
// Public channels
Get chan *fanoutResult
// optional channel to notify outside when clean up is complete
Done chan struct{}
```
Contributor

Why do these channels have to be public? This seems like an internal thing.

Contributor

Can you add some godoc to explain what these struct members are for? The name Get is not very enlightening to me.

@wvanbergen
Contributor

@eapache thoughts?

@eapache
Contributor

eapache commented Nov 5, 2014

I'm not sure I like this approach in general. To quote old me in #15, "connecting to all the brokers just to pick one and drop the others is gross." It's also inefficient, and could cause issues on clusters with a large number of brokers. Originally in that issue I'd suggested that we should wait a small amount of time before trying the next broker, but I now think that's basically over-complicated. The much better solution to achieve that behaviour is just to reduce the connect timeout on the brokers to something reasonable for LANs (if it takes more than 2 seconds to connect over a LAN, something is probably wrong).
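
For context, with the present-day sarama configuration API a shorter connect timeout looks roughly like the following; the configuration API at the time of this PR was shaped differently, so treat this as a sketch rather than the exact knob being discussed (broker addresses are placeholders).

```go
package main

import (
	"log"
	"time"

	"github.com/Shopify/sarama"
)

func main() {
	// Fail fast on dead brokers so a metadata refresh can move on quickly.
	config := sarama.NewConfig()
	config.Net.DialTimeout = 2 * time.Second // the shipped default is 30s

	client, err := sarama.NewClient([]string{"kafka1:9092", "kafka2:9092"}, config)
	if err != nil {
		log.Fatalln(err)
	}
	defer client.Close()
}
```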

@crxpandion why did you decide to implement this? What problem precisely were you trying to solve?

@crxpandion
Author

@eapache - I started on this path when testing sarama's ability to weather broker failures. I noticed that even with smaller timeouts, as you suggest, the metadata refresh can take a long time, particularly in large clusters. That delay is not only frustrating when running lots of integration-type tests, but could be detrimental in a production system.

As for the inefficiency: I can see how it's not computationally efficient, but it seems we should mainly be concerned with getting the cluster metadata quickly, so that the user of this library can get on with their computation. The problems with large clusters (20-100s of brokers) could be solved by throttling the number of outbound attempts.

Overall I think that abstracting the approach to fetching metadata (regardless of strategy) out of the refreshMetadata function, which mainly contains retry logic, goes a long way toward making this section of code less gross.

Anyways I appreciate the feedback.

@wvanbergen re the functional_test heisenbug: I think there are at least two failure modes that I see locally. The one that is easy to diagnose is the result of the OOM killer in the VM (I am not sure how Travis CI works, so this might not be the issue we saw above), and judging from recent commits it seems you are already aware of that one. An alternate approach that might make the tests' memory usage more manageable would be to reduce the number of brokers to 3.

Judging from stack traces taken when metadata is refreshed, I think there is at least one other failure mode besides the OOM killer, but I haven't been able to isolate it yet. I might have time later this week to look into it again if you want.

@eapache
Contributor

eapache commented Nov 6, 2014

OK, so I'm not opposed to the general concept of trying multiple brokers concurrently for metadata, but:

  • we want to avoid overkill in the common case (where the first broker responds quickly)
  • Sarama is used in very large deployments (thousands of clients connecting to a reasonably large cluster); the current approach is likely to DDoS the cluster (and/or ZooKeeper) in that case by generating (#clients * #brokers) metadata requests simultaneously when a broker fails over
  • there are probably smaller deployments where "try all the brokers right away" is the best approach

Given the above, here are a few suggestions:

  • the fanout timing and growth pattern should be configurable so that users can tune it appropriately for their use case
  • the default should not be too aggressive; perhaps linearly try another broker every 500ms? (a rough sketch of this staggered approach follows below)
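
A rough sketch of that staggered fanout, reusing the hypothetical types from the earlier sketch; the identifiers and the interval handling are illustrative assumptions, not part of this patch.

```go
// staggeredFanout tries the first broker immediately and launches one more
// attempt every staggerInterval (e.g. 500ms) until a response arrives or
// every broker has failed. Assumes len(brokers) > 0 and that "time" is imported.
func staggeredFanout(brokers []*Broker, req *MetadataRequest, staggerInterval time.Duration) (*MetadataResponse, error) {
	results := make(chan fanoutResult, len(brokers))
	launched, failed := 0, 0

	launch := func() {
		b := brokers[launched]
		launched++
		go func() {
			meta, err := b.GetMetadata(req)
			results <- fanoutResult{metadata: meta, err: err}
		}()
	}

	launch() // the common case: the first broker answers quickly
	ticker := time.NewTicker(staggerInterval)
	defer ticker.Stop()

	var lastErr error
	for failed < len(brokers) {
		select {
		case res := <-results:
			if res.err == nil {
				return res.metadata, nil
			}
			lastErr = res.err
			failed++
			if launched < len(brokers) {
				launch() // a broker failed outright; move on to the next
			}
		case <-ticker.C:
			if launched < len(brokers) {
				launch() // the outstanding attempt is slow; fan out to one more
			}
		}
	}
	return nil, lastErr
}
```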

@crxpandion
Author

Sure, both of those suggestions make sense; I'll work on amending my patch. I'm coming down with a cold, so it might take me a few days. :(

I feel that the "thundering herd" problem is often solved by adding random jitter to the delay before attempting a refresh. What are your thoughts on this strategy? Obviously we'd want to use it in conjunction with fanout throttling.
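
For what it's worth, the jitter being discussed is usually just a small random addition to the fanout delay; a fragment of what that might look like (staggerInterval is an assumed name, and math/rand and time are assumed imports):

```go
// Add up to one extra interval of random jitter before the next fanout
// attempt, so that many clients do not all retry at the same instant.
// Assumes staggerInterval > 0.
delay := staggerInterval + time.Duration(rand.Int63n(int64(staggerInterval)))
time.Sleep(delay)
```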

@eapache
Contributor

eapache commented Nov 7, 2014

I don't have a problem with jitter to avoid a thundering herd. I'm not sure it's necessary for the first connection of a refresh (clusters should be OK as long as each client makes only one request, not n requests), but it may be worth adding as the fanout occurs.

Let us know when you've got a new version of the patch ready.

@crxpandion
Author

@eapache I added a throttle into the fanout. The default is 1 connection attempt at a time. The logic is a tad more complicated.
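
Roughly, the throttle bounds how many connection attempts may be in flight at once. A minimal sketch of that shape, again with assumed names and types rather than the patch itself:

```go
// throttledFanout allows at most maxInFlight concurrent metadata attempts
// (a default of 1 keeps the refresh effectively sequential). Returns the
// first successful response, or the last error if every broker fails.
func throttledFanout(brokers []*Broker, req *MetadataRequest, maxInFlight int) (*MetadataResponse, error) {
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	results := make(chan fanoutResult, len(brokers))

	go func() {
		for _, broker := range brokers {
			b := broker
			sem <- struct{}{} // blocks while maxInFlight attempts are outstanding
			go func() {
				defer func() { <-sem }()
				meta, err := b.GetMetadata(req)
				results <- fanoutResult{metadata: meta, err: err}
			}()
		}
	}()

	var lastErr error
	for range brokers {
		res := <-results
		if res.err == nil {
			return res.metadata, nil
		}
		lastErr = res.err
	}
	return nil, lastErr
}
```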

```go
} else {
	for _, broker := range client.brokers {
		Logger.Printf("Fanning out %v\n", broker)
		b := broker
```
Contributor

this intermediate variable seems unnecessary?

@crxpandion
Author

I rebased on shopify/sarama to accommodate the seedBroker change, so apologies for the messed-up history of this branch.

The way seedBrokers are handled definitely complicates the code. Is there a reason for the logical separation between user-supplied broker addresses and brokers found via metadata requests?

@eapache
Contributor

eapache commented Nov 13, 2014

Per the comment in client.go:

// the broker addresses given to us through the constructor are not guaranteed to be returned in
// the cluster metadata (I *think* it only returns brokers who are currently leading partitions?)
// so we store them separately

There are also some bits documented in my initial post on #15.

@eapache
Contributor

eapache commented Dec 22, 2014

@crxpandion do you plan to revisit this?

@crxpandion
Author

The PR as it stands fulfills what was requested; however, I feel that it does not help alleviate the root issues with this section of the code. Therefore I'm inclined to stash it until the client gets sorted out further.

Thoughts?

@eapache
Contributor

eapache commented Dec 24, 2014

I'm not sure how much more change there is really going to be to the client at this point. The current design certainly has issues, but as per the fairly extensive conversation in the other tickets there doesn't seem to be a magic bullet that doesn't introduce other issues. I wouldn't hold your breath waiting for that to happen.

@eapache
Contributor

eapache commented Feb 24, 2015

The client does need to be worked on before this can be tackled properly, but once that is done this will probably have to be rewritten anyway, so there's not much point in leaving the PR open. It can/should be mined for ideas at some point.
