Investigate why channels are faster #13
Check the …
Why do you think c.Poll only fetches a single event? My understanding is that it wraps the Kafka fetch request, which can return any number of events (you can even configure a minimum). In addition, c.Poll(100) will only block if there are no new requests, so in a standard benchmark where the topic is pre-loaded, I wouldn't expect any blocking at all.
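To illustrate the "pre-loaded topic, no blocking" scenario described above, here is a minimal sketch. `mockConsumer`, `Event`, and `consumeAll` are hypothetical stand-ins of mine, not the confluent-kafka-go API; the buffered channel plays the role of librdkafka's pre-filled event queue, so `Poll` can always return immediately:

```go
package main

import "fmt"

// Event is a stand-in for the library's Event type.
type Event interface{}

// mockConsumer simulates a consumer whose queue is pre-loaded,
// so Poll never has to block.
type mockConsumer struct {
	queue chan Event
}

// Poll returns one buffered event, or nil when none are ready
// (simulating a timeout) via the non-blocking default case.
func (c *mockConsumer) Poll(timeoutMs int) Event {
	select {
	case ev := <-c.queue:
		return ev
	default:
		return nil // "timeout": no events ready
	}
}

// consumeAll drains the consumer one Poll call at a time and
// counts how many events it saw before hitting a nil result.
func consumeAll(c *mockConsumer) int {
	n := 0
	for {
		if ev := c.Poll(100); ev == nil {
			return n // nothing left; a real loop would keep polling
		}
		n++
	}
}

func main() {
	c := &mockConsumer{queue: make(chan Event, 10)}
	for i := 0; i < 10; i++ {
		c.queue <- i
	}
	fmt.Println(consumeAll(c)) // 10 — all events consumed, no blocking
}
```

Note that each event still costs one full `Poll` round trip, which is the overhead being discussed in this thread.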
The third argument in the call to `eventPoll` is hardcoded to 1:

```go
// Poll the consumer for messages or events.
//
// Will block for at most timeoutMs milliseconds
//
// The following callbacks may be triggered:
//   Subscribe()'s rebalanceCb
//
// Returns nil on timeout, else an Event
func (c *Consumer) Poll(timeoutMs int) (event Event) {
	ev, _ := c.handle.eventPoll(nil, timeoutMs, 1, nil)
	return ev
}
```

Combined with …, WRT the blocking: yes, that's why I said that …
Perhaps it would make sense to put a configurable receive batch size in …
@binary132 That would mean:

```go
ch := make(chan []Event, 10)

go func() {
	events, err := c.Poll(100, 10) // 10 being the max events to fetch
	if err != nil {
		log.Warn(err) // or something
	}
	ch <- events
}()

go func() {
	events, err := c.Poll(100, 5)
	if err != nil {
		log.Warn(err)
	}
	ch <- events
}()
```

Wouldn't this pose problems WRT the order of the messages and how they're pushed onto the channel? And all things aside, the implementation would have to change along these lines:

```go
// current implementation:
ev, _ := c.handle.eventPoll(nil, timeoutMs, 1, nil)

// should be:
tch := make(chan bool)
ch := make(chan Event, maxEvents) // maxEvents being the second param
_, err := c.handle.eventPoll(ch, timeoutMs, maxEvents, tch)
// handle err, check/wait for <-tch etc... (all the stuff consumeReader does, basically)
close(ch)
es := make([]Event, 0, maxEvents)
for e := range ch {
	es = append(es, e)
}
return es
```

And this is grossly oversimplifying it, in part because I'm going through the source while writing this response. Everything that you need to do, just to get …
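The close-then-range drain step at the end of that sketch does work with plain Go channels, independent of Kafka. Here is a self-contained version, with `drainToSlice` and the `Event` alias being my own stand-ins for illustration:

```go
package main

import "fmt"

// Event stands in for the library's Event type.
type Event interface{}

// drainToSlice shows the close-then-range pattern: once the
// producer has finished and closed the channel, ranging over it
// yields every buffered event and then terminates.
func drainToSlice(ch chan Event, maxEvents int) []Event {
	es := make([]Event, 0, maxEvents)
	for e := range ch { // terminates because ch is closed
		es = append(es, e)
	}
	return es
}

func main() {
	maxEvents := 4
	ch := make(chan Event, maxEvents)
	// stand-in for eventPoll writing events into ch
	for i := 0; i < 3; i++ {
		ch <- i
	}
	close(ch) // must close before ranging, or the range blocks forever
	fmt.Println(len(drainToSlice(ch, maxEvents))) // 3
}
```

The subtlety is the `close(ch)`: forgetting it turns the range into a deadlock, which is one of the coordination costs the comment alludes to.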
On a quick first glance I had assumed … I looked over the librdkafka documentation, and now I see why a slice return value isn't right here. Maybe a compromise could be to offer the user a lower-level ….

Introducing more channels introduces more mutex thrashing, and introducing more goroutines introduces more concurrent (i.e. unpredictable) behaviors the package user is unaware of. I know goroutines are used in other packages such as …
@binary132 Adding a …:

```go
// type for brevity
type EventCallback func(Event)

func (c *Consumer) PollQueue(timeoutMs, maxEvents int, callback EventCallback) int {
	// poll underlying consumer and return the actual number of events processed?
}
```
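To make the proposal concrete, here is a runnable mock of what such a `PollQueue` could behave like. Everything here (`mockConsumer`, the channel-backed queue, the non-blocking timeout handling) is a hypothetical sketch of mine, not the real consumer:

```go
package main

import "fmt"

// Event stands in for the library's Event type.
type Event interface{}

type EventCallback func(Event)

// mockConsumer is a stand-in for the real consumer; its buffered
// channel plays the role of the underlying librdkafka event queue.
type mockConsumer struct {
	queue chan Event
}

// PollQueue invokes callback on up to maxEvents buffered events
// and returns how many were actually processed. The timeout is
// ignored in this mock (the default case returns immediately).
func (c *mockConsumer) PollQueue(timeoutMs, maxEvents int, callback EventCallback) int {
	n := 0
	for n < maxEvents {
		select {
		case ev := <-c.queue:
			callback(ev)
			n++
		default:
			return n // nothing more ready within the "timeout"
		}
	}
	return n
}

func main() {
	c := &mockConsumer{queue: make(chan Event, 8)}
	for i := 0; i < 6; i++ {
		c.queue <- i
	}
	seen := 0
	n := c.PollQueue(100, 4, func(Event) { seen++ })
	fmt.Println(n, seen) // 4 4
}
```

Note this still pays one function call per event, which is the stack-thrashing concern raised in the reply below.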
Yes, I'm saying a blocking call without the use of goroutines or callbacks would be a desirable API feature to some users, especially those for whom performance is the critical feature of this package.

As to callbacks: Go isn't NodeJS; callbacks in an exported API are almost never idiomatic Go, and invoking a callback on every message would reintroduce the stack-thrashing performance drawback of the current synchronous ….

My concern with simply exposing …: the typical Go idiom is that the package user implements concurrency as desired. You'll notice there are nearly zero async calls in the Go stdlib, and even the async internals of …
This is important because Go has the capacity to match well-written C++ in performance, but in practice it seldom does, and at times it is considered a second-class citizen of the high-performance world. This is an issue at the user and package level rather than at the language level. My team is using a channel-driven Go implementation of a Kafka client (…).

It is certainly worth exposing a synchronous client API as something besides a layer on top of a concurrent wrapper that itself wraps a synchronous poll.
@binary132 I was just suggesting that, if you feel there is a clear, valid use-case for a synchronous, (potentially) blocking call, the best course of action would be to open a separate issue requesting that feature. The suggestion/notion of callbacks stems from having used ….

Either way, it might be worth opening a new issue to get some feedback on whether or not this feature is likely to be added. The author of librdkafka also seems to be very active on this repo; there probably is no better place to ask than here 😄
I'm curious why channels are faster than the "function-based" consumer API. Has anyone profiled this?