Add least-loaded pickers #19
Conversation
TCN-1913 Add least-loaded picker
To avoid pathological/poor load balancing in the face of low request volume (e.g. most connections always have just zero or one active request), this needs to be smarter than just a simple heap. Ideally, we'd either pick a backend at random in the event of a tie or use round-robin across backends that tie. Round-robin is deterministic and thus easier to test. But random may be easier to implement? We may even want to drive this (round-robin vs random) via user-provided flag. So we may end up with two different implementations, one for each "tie breaker" strategy. |
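For illustration, here is a minimal, hypothetical sketch of the random tie-breaker idea (not the PR's code; `loadedConn` and `pickLeastLoaded` are made-up names). It uses reservoir sampling so ties are broken uniformly in a single pass:

```go
package main

import (
	"fmt"
	"math/rand"
)

// loadedConn is a hypothetical stand-in for a connection that tracks
// its number of active requests.
type loadedConn struct {
	name string
	load int
}

// pickLeastLoaded returns a connection with the minimum load, breaking
// ties uniformly at random: each tied candidate replaces the current
// choice with probability 1/ties (reservoir sampling), so no second
// pass over the slice is needed.
func pickLeastLoaded(rnd *rand.Rand, conns []*loadedConn) *loadedConn {
	var best *loadedConn
	ties := 0
	for _, c := range conns {
		switch {
		case best == nil || c.load < best.load:
			best, ties = c, 1
		case c.load == best.load:
			ties++
			if rnd.Intn(ties) == 0 {
				best = c
			}
		}
	}
	return best
}

func main() {
	rnd := rand.New(rand.NewSource(42))
	conns := []*loadedConn{{"a", 0}, {"b", 1}, {"c", 0}}
	fmt.Println(pickLeastLoaded(rnd, conns).name) // "a" or "c", never "b"
}
```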
balancer/picker/leastloaded.go (outdated)

```go
func newLeastLoadedHeap(prev Picker, allConns conn.Connections) *connHeap {
	var newHeap = new(connHeap)
```
I don't think this is going to work correctly. I think the heap actually needs to be shared. Otherwise, for any operations in progress with `prev`, their "when done" call will be decrementing a heap entry in a different heap. So to the new picker, those calls will be "stuck" with that baseline load.

What I think you instead need is a set (`map[conn.Conn]struct{}`) of the actual conns the picker uses (those passed to the factory). If you pop an entry from the heap that is not in the set, discard it and pop another; see the sketch below.

As far as when to add the new connections to the heap, I think the best choice is probably to share the entire picker. So basically, if the previous picker is a least-loaded picker, you need to reconcile its set of connections and then return the same picker. And if it's not a least-loaded picker (generally, this would only happen for new pools, where the previous picker is nil), create a new picker with all connections set to zero load.

Does that make sense?
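A rough sketch of the live-set idea, assuming the heap holds `*connItem` entries (all type names here are stand-ins, not the PR's actual code):

```go
package picker

import "container/heap"

// Stand-in types; the PR's connItem/connHeap may differ.
type Conn interface{ Addr() string }

type connItem struct {
	conn Conn
	load uint64
}

type connHeap []*connItem

func (h connHeap) Len() int           { return len(h) }
func (h connHeap) Less(i, j int) bool { return h[i].load < h[j].load }
func (h connHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *connHeap) Push(x any)        { *h = append(*h, x.(*connItem)) }
func (h *connHeap) Pop() any {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

type leastLoadedPicker struct {
	heap connHeap
	live map[Conn]struct{} // conns currently managed by this picker
}

// popLive pops heap entries until it finds one whose conn is still in
// the live set; stale entries (conns dropped by an earlier
// reconciliation) are simply discarded.
func (p *leastLoadedPicker) popLive() *connItem {
	for p.heap.Len() > 0 {
		item := heap.Pop(&p.heap).(*connItem)
		if _, ok := p.live[item.conn]; ok {
			return item
		}
	}
	return nil
}
```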
This is the relevant bit from the docs for `Factory.New`: https://github.com/bufbuild/go-http-balancer/blob/main/balancer/picker/picker.go#L47-L50
// The previous picker may still be in use, concurrently, while the factory
// is creating the new one, and even for some small amount of time after this
// method returns. Also, operations might be started with the previous picker
// but not completed until after the new picker is put in use.
I can add to this doc that it is safe to simply mutate the previous picker and return it (as long as the mutation is thread-safe, given it is still being used concurrently).
Yep, that was definitely not going to work. I would be okay with that clarification btw, as I actually did think the intent was that we'd return a new picker each time. I've rewritten this to not return a new picker each time. It took a while, but I think the resulting code is far less complex.
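For future readers, the "reconcile and return the same picker" shape discussed above might look roughly like this. These are simplified, hypothetical signatures; the repo's actual `Factory`/`Picker` interfaces live in balancer/picker/picker.go:

```go
package picker

import "sync"

// Simplified stand-ins for the repo's Picker and conn.Connections
// types; see balancer/picker/picker.go for the real interfaces.
type Conn interface{ Addr() string }
type Picker interface{ Pick() Conn }

type leastLoadedPicker struct {
	mu   sync.Mutex
	load map[Conn]int // per-conn active-request counts (heap elided)
}

func (p *leastLoadedPicker) Pick() Conn {
	// Least-loaded selection elided; see the heap sketches elsewhere
	// in this thread.
	return nil
}

// reconcile keeps counts for surviving conns, starts brand-new conns
// at zero load, and drops conns that are no longer in the set.
func (p *leastLoadedPicker) reconcile(allConns []Conn) {
	next := make(map[Conn]int, len(allConns))
	for _, c := range allConns {
		next[c] = p.load[c] // zero value for conns we've never seen
	}
	p.load = next
}

type leastLoadedFactory struct{}

// New mutates and returns the previous picker when possible, so
// operations still in flight against prev keep decrementing the same
// shared state instead of a detached copy.
func (f *leastLoadedFactory) New(prev Picker, allConns []Conn) Picker {
	ll, ok := prev.(*leastLoadedPicker)
	if !ok {
		ll = &leastLoadedPicker{load: map[Conn]int{}}
	}
	ll.mu.Lock()
	defer ll.mu.Unlock()
	ll.reconcile(allConns)
	return ll
}
```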
Looks great overall! Just one remark about how data is shared.

Re: lock contention, I think there is no other way to do a least-loaded picker, so I think it's fine. The whole reason power-of-two exists is to get some of the benefit of least-loaded without the cost of maintaining the heap (which is mainly a cost because it must be done under a mutex). I suppose you could do something clever and lock-free with a skip-list, but that seems much harder to get right (unless you know of a solid and well-tested concurrent skip-list map implementation in Go -- I suppose you could transliterate the one from Java's standard library). So I think what you have looks great.

Re: random, I think it's probably fine.
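For contrast, a minimal sketch of the power-of-two-choices idea the comment refers to (hypothetical names; a real implementation would use atomic load counters, which is the whole appeal over a mutex-guarded heap):

```go
package picker

import "math/rand"

// counted pairs a backend with its active-request count. In real code
// the count would be an atomic so that no global lock is needed.
type counted struct {
	addr string
	load int64
}

// pickPowerOfTwo samples two backends uniformly at random and returns
// the less loaded of the pair.
func pickPowerOfTwo(rnd *rand.Rand, conns []*counted) *counted {
	if len(conns) == 1 {
		return conns[0]
	}
	a := conns[rnd.Intn(len(conns))]
	b := conns[rnd.Intn(len(conns))]
	if b.load < a.load {
		return b
	}
	return a
}
```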
Looks great! I just left a couple of comments that I think need to be addressed. But they should be minor changes, and then let's merge this.
balancer/picker/leastloaded.go (outdated)

```go
func (h *connHeap) Push(x any) {
	n := len(*h)
	item, _ := x.(*connItem)
```
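For context, here is a plausible completion of the excerpt above into a full `container/heap` implementation. The `index` field is an assumption (it lets entries be fixed up in place when a conn's load changes); the PR's real type may differ:

```go
package picker

import "container/heap"

type connItem struct {
	conn  any // stand-in for the repo's conn.Conn
	load  uint64
	index int // position in the heap, kept current by Swap/Push/Pop
}

type connHeap []*connItem

var _ heap.Interface = (*connHeap)(nil)

func (h connHeap) Len() int           { return len(h) }
func (h connHeap) Less(i, j int) bool { return h[i].load < h[j].load }

func (h connHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}

func (h *connHeap) Push(x any) {
	n := len(*h)
	item, _ := x.(*connItem) // the form under review; revised below
	item.index = n
	*h = append(*h, item)
}

func (h *connHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	old[n-1] = nil // don't retain the pointer in the shrunk slice
	*h = old[:n-1]
	return item
}
```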
We should remove the second assignee here. I think it will be better for a panic to pinpoint this line as an invalid type assertion, vs. a panic on the next line with a nil pointer dereference (i.e. the former panic would make it more obvious how to troubleshoot). None of this will panic anyway, the way the code is written, so just use a `//nolint:forcetypeassert` comment to make the linter happy.
If I do that, I get an errcheck error. I need to figure out how to ignore multiple linters per line. Edit: oh, a comma works. Well, never mind then.
`//nolint:forcetypeassert,errcheck`
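Putting both comments together, the revised line would presumably read as follows (again assuming the `index` bookkeeping from the sketch above):

```go
func (h *connHeap) Push(x any) {
	n := len(*h)
	item := x.(*connItem) //nolint:forcetypeassert,errcheck
	item.index = n
	*h = append(*h, item)
}
```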
Hmm, this one was hard. I'm almost positive it will need some iteration. I have two primary concerns: … `[0, len(heap))`, and we shuffled it on each cycle? Perhaps I have missed the obvious way to do something like this? Is there simply a better data structure choice I missed?
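The comment above is truncated, but the surviving fragment reads like the idea of keeping a shuffled permutation of `[0, len(heap))`. A minimal sketch of that idea, with hypothetical names:

```go
package picker

import "math/rand"

// shuffledIndex keeps a permutation of [0, n) that can be re-shuffled
// once per selection cycle, so equally loaded entries are visited in a
// different order on every pass.
type shuffledIndex struct {
	rnd   *rand.Rand
	order []int
}

func newShuffledIndex(rnd *rand.Rand, n int) *shuffledIndex {
	s := &shuffledIndex{rnd: rnd, order: make([]int, n)}
	for i := range s.order {
		s.order[i] = i
	}
	return s
}

// reshuffle randomizes the visit order in place.
func (s *shuffledIndex) reshuffle() {
	s.rnd.Shuffle(len(s.order), func(i, j int) {
		s.order[i], s.order[j] = s.order[j], s.order[i]
	})
}
```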