Conversation

Contributor

@macobo macobo commented Jul 26, 2022

Problem

Kafka.produce calls are failing due to message batches exceeding Kafka's maximum message size.

Changes

  • Remove `Buffer.from` usage from the plugin-server produce path - this allows for more accurate message size estimation
  • Improve the buffer size estimation itself
  • Fix: immediately flush if enqueueing a too-large message. This way unrelated messages don't end up in the DLQ and Sentry reports have proper context (see the sketch after this list).
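
A minimal sketch of the flush-ordering fix, assuming a hypothetical wrapper class; the field and method names (`currentBatch`, `currentBatchSize`, `maxBatchSize`, and the limits) are illustrative rather than the plugin-server's exact API:

```ts
type ProducedMessage = { key?: string; value: string }

class KafkaProducerWrapperSketch {
    private currentBatch: ProducedMessage[] = []
    private currentBatchSize = 0
    private lastFlushTime = Date.now()

    // Illustrative limits, not PostHog's actual configuration.
    private maxBatchSize = 900_000
    private maxQueueSize = 1000
    private flushFrequencyMs = 500

    queueMessage(message: ProducedMessage): void {
        // Append first, then flush immediately if the batch is now too large,
        // so one oversized message cannot poison the next caller's flush.
        this.currentBatch.push(message)
        this.currentBatchSize +=
            Buffer.byteLength(message.value) + (message.key ? Buffer.byteLength(message.key) : 0)

        const timeSinceLastFlush = Date.now() - this.lastFlushTime
        if (
            this.currentBatchSize > this.maxBatchSize ||
            timeSinceLastFlush > this.flushFrequencyMs ||
            this.currentBatch.length >= this.maxQueueSize
        ) {
            void this.flush()
        }
    }

    async flush(): Promise<void> {
        const batch = this.currentBatch
        this.currentBatch = []
        this.currentBatchSize = 0
        this.lastFlushTime = Date.now()
        // ... produce `batch` to Kafka here ...
    }
}
```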

How did you test this code?

Tests, plus manually checking that things work.

macobo added 5 commits July 26, 2022 12:29
We're failing to send batches of messages to Kafka on a semi-regular
basis due to message sizes. It's unclear why this is the case, as we try
to limit the size of each message batch.

This PR adds information on these failed batches to Sentry error
messages.

Example error: https://sentry.io/organizations/posthog2/issues/3291755686/?project=6423401&query=is%3Aunresolved+level%3Aerror
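
A hedged sketch of what attaching batch information to the Sentry report could look like, using @sentry/node's capture context; the `flushWithContext` helper and its parameters are hypothetical:

```ts
import * as Sentry from '@sentry/node'
import type { Producer, TopicMessages } from 'kafkajs'

// Hypothetical flush helper: on failure, attach batch details to the Sentry
// report so 'message too large' errors carry context about what was queued.
async function flushWithContext(producer: Producer, topicMessages: TopicMessages[]): Promise<void> {
    try {
        await producer.sendBatch({ topicMessages })
    } catch (error) {
        Sentry.captureException(error, {
            extra: {
                topicCount: topicMessages.length,
                messageCounts: topicMessages.map((t) => t.messages.length),
            },
        })
        throw error
    }
}
```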
This allows us to be much more accurate when estimating message sizes,
hopefully eliminating a class of errors.
@macobo macobo requested a review from tiina303 July 26, 2022 12:02
@macobo macobo changed the title from "WIP: More accurate produce" to "chore(plugin-server): Improve kafka producer wrapper" Jul 26, 2022
@macobo macobo marked this pull request as ready for review July 26, 2022 12:02
macobo added a commit that referenced this pull request Jul 26, 2022
This helps avoid 'message too large' type errors (see
#10968) by compressing in-flight
messages.

I would have preferred to use zstd, but the libraries did not compile
cleanly on my machine.

  expect(producer.currentBatch.length).toEqual(1)
- expect(producer.currentBatchSize).toEqual(40)
+ expect(producer.currentBatchSize).toBeGreaterThan(40)
Contributor

Why not check for exactly what it's now supposed to be?

Contributor Author

Because the result is kind of arbitrary (73), since the estimate now accounts for the sizes of the keys and so on.

This gets the intent across better.
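
For context, a sketch of the kind of estimate being discussed - key and value bytes both counted - with a hypothetical `estimateMessageSize` helper rather than the literal plugin-server function:

```ts
// Hypothetical estimator: count both value and key bytes, so the estimate
// tracks what Kafka actually counts. Key sizes (among other overhead) are
// why the test's total lands at an unround number like 73 instead of 40.
function estimateMessageSize(message: { key?: string; value: string }): number {
    return (
        Buffer.byteLength(message.value, 'utf8') +
        (message.key ? Buffer.byteLength(message.key, 'utf8') : 0)
    )
}
```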

Contributor

Yup, but we might mess up that function in the future so that it returns something way too big.

  const timeSinceLastFlush = Date.now() - this.lastFlushTime
- if (timeSinceLastFlush > this.flushFrequencyMs || this.currentBatch.length >= this.maxQueueSize) {
+ if (
+     this.currentBatchSize > this.maxBatchSize ||
Contributor

> Fix: immediately flush if enqueueing a too-large message. This way unrelated messages don't end up in the DLQ and Sentry reports have proper context.

Does this help here because `this.flush()` threw before (line 55), making us miss the next message if it happened the next time we called `queueMessage`?

Contributor Author

No. Previously, if the batch was empty and a single too-large message came in, we would call `flush()` before appending, and then append it. When the next message came along, that flush would fail, since the message already in the queue was too large.

Now we append and flush immediately. Check the newly added test (sketched below) - it would have failed before.
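
Roughly the shape of the new test - a sketch using Jest and the hypothetical names from the sketch above; `producer` and `tooLargeMessage` are assumed to come from the test's setup:

```ts
it('flushes immediately when a single enqueued message is too large', () => {
    const flushSpy = jest.spyOn(producer, 'flush')

    // Batch starts empty; enqueue one message larger than maxBatchSize.
    producer.queueMessage(tooLargeMessage)

    // New behavior: the oversized message is appended and flushed right away,
    // instead of lingering in the queue and failing the *next* flush.
    expect(flushSpy).toHaveBeenCalled()
})
```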

Contributor

OK, so that's what I thought - clearly I didn't word the question well. Thanks for confirming.

@macobo macobo enabled auto-merge (squash) July 27, 2022 11:25
@macobo macobo merged commit d00d587 into master Jul 27, 2022
@macobo macobo deleted the more-accurate-produce branch July 27, 2022 11:26
macobo added a commit that referenced this pull request Jul 28, 2022
feat(plugin-server): Use Snappy compression codec for kafka production (#10974)

* feat(plugin-server): Use Snappy compression codec for kafka production

This helps avoid 'message too large' type errors (see
#10968) by compressing in-flight
messages.

I would have preferred to use zstd, but the libraries did not compile
cleanly on my machine.

* Update tests
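
For reference, wiring Snappy into kafkajs looks roughly like this; `kafkajs-snappy` is an assumption about which codec package was used, and the broker address and topic name are illustrative:

```ts
import { Kafka, CompressionTypes, CompressionCodecs } from 'kafkajs'
// Assumption: the community kafkajs-snappy package provides the codec.
const SnappyCodec = require('kafkajs-snappy')

// Register the codec once at startup so producers can compress with Snappy.
CompressionCodecs[CompressionTypes.Snappy] = SnappyCodec

async function produceCompressed(): Promise<void> {
    const producer = new Kafka({ brokers: ['localhost:9092'] }).producer()
    await producer.connect()
    // Compressing in-flight batches keeps them below broker size limits.
    await producer.send({
        topic: 'events',
        compression: CompressionTypes.Snappy,
        messages: [{ value: 'hello' }],
    })
    await producer.disconnect()
}
```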