
Add least-loaded pickers #19

Merged
jchadwick-buf merged 3 commits into main from jchadwick/leastloaded on Jun 9, 2023
Conversation

jchadwick-buf (Member)

Hmm, this one was hard. I'm almost positive it will need some iteration. I have two primary concerns:

  • Lock contention. I'm a bit worried about contention caused by the factory here. I used a mixed locks+atomics solution to try to minimize it without needing a ridiculous number of locks. Now the lock contention in the factory constructor is mostly limited to a memcpy. Can/should we do better? Is this too complicated?
  • The random picker. I'm fairly confident it's bad, but I didn't think of an obviously better way to do it. Maybe it would be better if the random values were drawn from a permutation of the half-open interval [0, len(heap)), reshuffled on each cycle (see the sketch below)? Perhaps I have missed the obvious way to do something like this? Is there simply a better data structure choice I missed?
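
A minimal, self-contained sketch of that shuffled-permutation idea (an illustration, not the code in this PR; n stands in for len(heap)):

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	n := 8                // stand-in for len(heap)
	perm := rand.Perm(n)  // random permutation of [0, n)
	cursor := 0

	// next returns indices in permutation order, reshuffling once per
	// full cycle so each cycle visits every index exactly once in a
	// fresh random order.
	next := func() int {
		if cursor == len(perm) {
			rand.Shuffle(len(perm), func(i, j int) {
				perm[i], perm[j] = perm[j], perm[i]
			})
			cursor = 0
		}
		idx := perm[cursor]
		cursor++
		return idx
	}

	for i := 0; i < 2*n; i++ {
		fmt.Println(next())
	}
}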

@jchadwick-buf jchadwick-buf requested a review from jhump June 8, 2023 16:45

linear bot commented Jun 8, 2023

TCN-1913 Add least-loaded picker

To avoid pathological/poor load balancing in the face of low request volume (e.g. most connections always have just zero or one active request), this needs to be smarter than just a simple heap.

Ideally, we'd either pick a backend at random in the event of a tie or use round-robin across backends that tie. Round-robin is deterministic and thus easier to test. But random may be easier to implement?

We may even want to drive this choice (round-robin vs. random) via a user-provided flag, so we may end up with two implementations, one per tie-breaker strategy. A round-robin sketch follows below.
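
For illustration, a round-robin tie breaker over a flat load slice might look like this (a hedged sketch with stand-in types; the real picker would sit on top of the heap):

package main

import "fmt"

// pickRoundRobin returns the index of the next least-loaded backend,
// rotating among backends that tie for the minimum load. rrCursor is
// advanced on every pick so ties are served in round-robin order.
func pickRoundRobin(loads []int, rrCursor *int) int {
	min := loads[0]
	for _, l := range loads[1:] {
		if l < min {
			min = l
		}
	}
	// Scan from the cursor so successive calls visit tied backends in
	// rotation rather than always taking the first one.
	n := len(loads)
	for i := 0; i < n; i++ {
		idx := (*rrCursor + i) % n
		if loads[idx] == min {
			*rrCursor = (idx + 1) % n
			return idx
		}
	}
	return 0 // unreachable when loads is non-empty
}

func main() {
	loads := []int{0, 0, 1, 0}
	cursor := 0
	for i := 0; i < 4; i++ {
		fmt.Println(pickRoundRobin(loads, &cursor)) // 0, 1, 3, 0
	}
}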

}

func newLeastLoadedHeap(prev Picker, allConns conn.Connections) *connHeap {
	var newHeap = new(connHeap)
jhump (Member)

I don't think this is going to work correctly. I think the heap actually needs to be shared.

Otherwise, for any operations in progress with prev, their "when done" call will be decrementing a heap entry in a different heap. So to the new picker, those calls will be "stuck" with that baseline load.

What I think you instead need is a set (map[conn.Conn]struct{}) of the actual conns the picker uses (those passed to the factory). If you pop an entry from the heap that is not in the set, discard it and pop another (see the sketch after this comment).

As far as when to add the new connections to the heap, I think the best choice is probably to share the entire picker. So basically, if the previous picker is a least-loaded picker, reconcile its set of connections and then return the same picker. And if it's not a least-loaded picker (generally, this would only happen for new pools, where the previous picker is nil), create a new picker with all connections set to zero load.

Does that make sense?
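
A minimal, self-contained sketch of that pop-and-discard idea (Conn here is a stand-in for this repo's conn.Conn, and these connItem/connHeap definitions are illustrative assumptions, not the PR's actual code):

package picker

import "container/heap"

// Conn stands in for this repo's conn.Conn.
type Conn interface{ Address() string }

type connItem struct {
	conn Conn
	load int64
}

type connHeap []*connItem

func (h connHeap) Len() int           { return len(h) }
func (h connHeap) Less(i, j int) bool { return h[i].load < h[j].load }
func (h connHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *connHeap) Push(x any)        { *h = append(*h, x.(*connItem)) }
func (h *connHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	old[n-1] = nil // avoid holding a stale pointer
	*h = old[:n-1]
	return item
}

// popActive pops entries until it finds one whose conn is still in the
// picker's active set; stale entries left over from a previous
// configuration are discarded lazily as they surface.
func popActive(h *connHeap, active map[Conn]struct{}) *connItem {
	for h.Len() > 0 {
		item := heap.Pop(h).(*connItem)
		if _, ok := active[item.conn]; ok {
			return item
		}
	}
	return nil
}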


@jhump jhump Jun 8, 2023

This is the relevant bit from the docs for Factory.New: https://github.com/bufbuild/go-http-balancer/blob/main/balancer/picker/picker.go#L47-L50

// The previous picker may still be in use, concurrently, while the factory
// is creating the new one, and even for some small amount of time after this
// method returns. Also, operations might be started with the previous picker
// but not completed until after the new picker is put in use.

I can add to this doc that it is safe to simply mutate the previous picker and return it (as long as the mutation is thread-safe, given it is still being used concurrently).

jchadwick-buf (Member, Author)

Yep, that was definitely not going to work. I'd be okay with that doc clarification, by the way; I actually did think the intent was that we'd return a new picker each time. I've rewritten this so it no longer returns a new picker each time (roughly the pattern sketched below). It took a while, but I think the resulting code is far less complex.
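
The reuse pattern being described might look roughly like this (a sketch against the documented Factory contract; leastLoadedFactory, leastLoadedPicker, updateConns, and newLeastLoadedPicker are hypothetical names, and the New signature is inferred from the snippet above):

// New reuses the previous picker when it is one of ours, mutating its
// connection set in place instead of building new state. updateConns
// must lock internally, since prev may still be in use concurrently.
func (f *leastLoadedFactory) New(prev Picker, allConns conn.Connections) Picker {
	if p, ok := prev.(*leastLoadedPicker); ok {
		p.updateConns(allConns)
		return p
	}
	// No usable previous picker (e.g. a brand-new pool where prev is
	// nil): start every connection at zero load.
	return newLeastLoadedPicker(allConns)
}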


jhump commented Jun 8, 2023

> Hmm, this one was hard.

Looks great overall! Just one remark about how data is shared.

Re: lock contention, I think there is no other way to do a least-loaded picker, so I think it's fine. The whole reason power-of-two-choices exists is to get some of the benefit of least-loaded without the cost of maintaining the heap (a cost mainly because it must be done under a mutex); a sketch follows below. I suppose you could do something clever and lock-free with a skip list, but that seems much harder to get right (unless you know of a solid, well-tested concurrent skip-list map implementation in Go -- I suppose you could transliterate the one from Java's standard library). So I think what you have looks great.

Re: random, I think it's probably fine.
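
For contrast, the power-of-two-choices approach mentioned above needs no heap and no mutex: sample two distinct backends at random and take the less loaded one (a generic sketch, not code from this repo; assumes at least two backends and atomically maintained load counters):

package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// pickPowerOfTwo samples two distinct backends uniformly at random and
// returns the index of the less-loaded one. Load counters only need
// atomic reads, so there is no shared heap to lock.
func pickPowerOfTwo(loads []atomic.Int64) int {
	a := rand.Intn(len(loads))
	b := rand.Intn(len(loads) - 1)
	if b >= a { // shift past a so the two samples are distinct
		b++
	}
	if loads[b].Load() < loads[a].Load() {
		return b
	}
	return a
}

func main() {
	loads := make([]atomic.Int64, 4)
	loads[0].Store(3)
	loads[2].Store(5)
	fmt.Println(pickPowerOfTwo(loads))
}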

@jchadwick-buf jchadwick-buf requested a review from jhump June 9, 2023 13:06

@jhump jhump left a comment

Looks great! I just left a couple of comments that I think need to be addressed. But they should be minor changes, and then let's merge this.

balancer/picker/leastloaded.go (resolved)
balancer/picker/leastloaded.go (outdated, resolved)

func (h *connHeap) Push(x any) {
	n := len(*h)
	item, _ := x.(*connItem)
jhump (Member)

We should remove the second assignment target (the blank identifier) here. I think it is better for a panic to pinpoint this line as an invalid type than to panic on the next line with a nil pointer dereference; the former makes it much more obvious how to troubleshoot. None of this will panic anyway, the way the code is written, so just use a //nolint:forcetypeassert comment to make the linter happy.


@jchadwick-buf jchadwick-buf Jun 9, 2023

If I do that, I get an errcheck error. I need to figure out how to ignore multiple linters per line. Edit: oh, a comma works. Well, never mind then.


@jhump jhump Jun 9, 2023

//nolint:forcetypeassert,errcheck
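
Putting both suggestions together, the resolved snippet would look something like this (reconstructed from the thread; the body below the assertion follows the standard container/heap example rather than the final diff):

func (h *connHeap) Push(x any) {
	n := len(*h)
	// Single-assignment form: a wrong type panics here with a clear
	// type-assertion message instead of a nil dereference a line later.
	item := x.(*connItem) //nolint:forcetypeassert,errcheck
	item.index = n
	*h = append(*h, item)
}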

@jchadwick-buf jchadwick-buf merged commit 452e157 into main Jun 9, 2023
@jchadwick-buf jchadwick-buf deleted the jchadwick/leastloaded branch June 9, 2023 14:30