
Variable events buffer #17

Closed

Conversation

EVODelavega

Having the maxEvents param in the c.handle.eventPoll call hard-coded to 1000 seems wrong, given that the buffer size can be changed through config parameters, or the channel can be full. In case maxEvents works out to 0, perhaps a sleep should be added?
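For context, the call being discussed and the direction the patch takes, quoted and adapted from later in this thread (eventsChanSize is the configured channel size the patch introduces):

// Current: maxEvents is hard-coded to 1000, independent of the configured buffer.
_, term := c.handle.eventPoll(c.events, 100, 1000, termChan)

// Patched direction: let maxEvents track the remaining space in the channel.
_, term := c.handle.eventPoll(c.events, 100, eventsChanSize-len(c.events), termChan)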

@ghost commented Dec 6, 2016

Hey @EVODelavega,
thank you for your Pull Request.

It looks like you haven't signed our Contributor License Agreement yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence. (Wikipedia)

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

@EVODelavega
Author

[clabot:check]

@ghost commented Dec 6, 2016

@confluentinc It looks like @EVODelavega just signed our Contributor License Agreement. 👍

Always at your service,

clabot

_, term := c.handle.eventPoll(
	c.events,
	100,
	eventsBufferSize-len(c.events), // max events depends on available buffer
Contributor

Must check that the computed value is > 0, otherwise this will just busy-loop.
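A minimal sketch of the kind of guard being asked for; the helper name and the exact fallback are illustrative, not part of the patch:

// If the channel is already full, eventsChanSize-len(c.events) is 0 and
// eventPoll would be asked for zero events, spinning without ever blocking.
// Clamping the value (or sleeping/blocking instead) avoids the busy loop.
func pollCount(chanCap, chanLen int) int {
	n := chanCap - chanLen
	if n < 1 {
		return 1 // the right fallback is a judgment call; see the discussion below
	}
	return n
}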

@@ -329,15 +329,20 @@ func (c *Consumer) rebalance(ev Event) bool {
 // consumerReader reads messages and events from the librdkafka consumer queue
 // and posts them on the consumer channel.
 // Runs until termChan closes
-func consumerReader(c *Consumer, termChan chan bool) {
+func consumerReader(c *Consumer, termChan chan bool, eventsBufferSize int) {
Contributor

Call it the same thing as in the caller, eventsChanSize.

@@ -287,10 +287,9 @@ func channelProducer(p *Producer) {

 // channelBatchProducer serves the ProduceChannel channel and attempts to
 // improve cgo performance by using the produceBatch() interface.
-func channelBatchProducer(p *Producer) {
+func channelBatchProducer(p *Producer, batchSize int) {
Contributor

I don't believe there is any use for synchronizing the batch size to the channel size: since the channel is read message-by-message, it is possible (likely and desired) to have a batch larger than the channel size.

The experimental batch interface aims to cut down on the number of C calls (since cgo has a high overhead); there is thus no logical correlation to the channel size.
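A simplified illustration of that point: a batch is accumulated by draining the channel message by message, so nothing ties the batch size to the channel's buffer capacity. Types and names below are stand-ins, not the bindings' internals:

type message struct{ value []byte } // stand-in for the bindings' message type

// collectBatch drains up to batchSize messages currently available on the
// channel into one slice, which is then handed to a single C call.
// batchSize can freely exceed cap(ch); the loop just keeps reading.
func collectBatch(ch <-chan *message, first *message, batchSize int) []*message {
	batch := []*message{first}
	for len(batch) < batchSize {
		select {
		case m := <-ch:
			batch = append(batch, m)
		default:
			return batch // channel momentarily empty; ship what we have
		}
	}
	return batch
}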

Author

Fair enough, but what if I configure the publish stack to be greater than the default? I.e. have batchSize default to the current value, but increase it depending on the config value?

Contributor

I'm not sure it is warranted; this is only an internal optimization in the Go bindings, and this batching does not correlate to the application, librdkafka internals, or Kafka protocol batches.

@edenhill
Contributor

Please revert and rebase your branch to avoid revert-commits.

Something like this, after you've reverted the commits:

$ git rebase -i master

# Mark all follow-up fixes as f (fixup) and put them under your top-level commit

# Be careful to only use --force on PR branches, never on long-lived branches (such as master)
$ git push --force origin <your_pr_branch>

@EVODelavega
Author

@edenhill Done (twice, because I forgot to sync with upstream master 😄 )

_, term := c.handle.eventPoll(
	c.events,
	100,
	eventsChanSize-len(c.events), // max events depends on available buffer
Contributor

Must check that this doesn't reach 0 or it will busy-loop.

Author

Which would be the preferred way to handle this? Sleep for 100ms, default to eventsChanSize, or poll for a single event?

Contributor

Let's back up a bit; what is the original problem you were seeing prior to this fix?

Author

Well, considering you can configure the channel size to less than 1000, the current call in consumer.go (_, term := c.handle.eventPoll(c.events, 100, 1000, termChan)) doesn't quite feel right. If I want to buffer 10,000 events, then surely that should be doable in a single call (rather than 10 calls as is the case now). Similarly, if I set the buffer to 10, it doesn't make sense to have eventPoll loop with a max of 1000 events. Sure, either the channel will be full, or there won't be any messages left to consume, but it feels/looks a bit odd to my eye.

Contributor

Yeah, I agree on the channelSize > pollCnt case: we should try to fetch as many events as possible to fill the channel buffer, but I think we should be more careful with the reverse where the channel buffer size is smaller than the poll count.
For performance we want to avoid idle periods, and I imagine that a small channelSize and pollCnt will result in idle periods when the application side is as fast as the Go client side:

  • Loop 1: Go client polls 100 entries from librdkafka and enqueues them on channel.
  • Loop 2: Go client calculates that pollCnt - len(chan) == 0 and backs off with a delay
  • Application reads all events from the channel
  • Idle period while loop 2's backoff delay times out.
  • Go to loop 1

Compare this to the current behaviour:

  • Loop 1: Go client polls 100 entries from librdkafka and enqueues them on channel.
  • Loop 2: Go client polls another 100 entries from librdkafka but blocks on the first channel produce since the channel is full.
  • Application reads all events from the channel
  • Go client immediately unblocks when channel space is available
  • Go to loop 1

There's probably a golden ratio where the pollCnt is some multiple of channelSize, but this depends on actual usage, so I feel the current approach of simply blocking provides the necessary and desired backpressure mechanism.
Having said that, there is no point in polling 1000 events if the channelSize is 2, so some reasonable multiplier should be in place.
Maybe something like: pollCnt = math.Max(channelSize * 3, 100)
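Spelled out in Go, that suggestion would need an int-valued max rather than math.Max (which operates on float64); a sketch along these lines:

// Illustrative only: clamp the poll count to a multiple of the channel
// size, with a floor of 100, as suggested above.
func pollCnt(channelSize int) int {
	if n := channelSize * 3; n > 100 {
		return n
	}
	return 100
}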

@edenhill closed this Feb 13, 2018
@lintang0 mentioned this pull request Jun 18, 2018