chore(bottlecap): reduce lock contention in logs #256
Conversation
It doesn't make sense to send thousands of events to the main thread and then on to the logs agent; inverting the flow makes more sense, so the Telemetry API can send logs directly to the logs agent instead of to the main thread.
…h://github.com/DataDog/datadog-lambda-extension into jordan.gonzalez/bottlecap/reduce-lock-contention
added some debug logs, found that the issue is in serialization of huge payloads
update how we read from stream
@@ -1,11 +1,10 @@
use serde::Serialize;
use std::collections::VecDeque;
use crossbeam::queue::SegQueue;
FWIW, SegQueue is an unbounded multi-producer, multi-consumer queue. Minor issue: the multi-consumer part requires more coordination in the implementation, so you're leaving performance on the table by not using a single-consumer queue. More pressing: this is an unbounded queue and represents a potential site for unbounded allocations.
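For illustration, a minimal sketch of what a bounded alternative looks like with crossbeam's `ArrayQueue`; the capacity and the `LogEvent` type here are hypothetical, not taken from the PR:

```rust
use crossbeam::queue::ArrayQueue;

// Hypothetical event type standing in for the log payloads.
#[derive(Debug)]
struct LogEvent {
    message: String,
}

fn main() {
    // Bounded, lock-free MPMC queue: capacity is fixed up front, so a burst
    // of telemetry events can no longer allocate without limit.
    let queue: ArrayQueue<LogEvent> = ArrayQueue::new(1_024);

    // `push` fails (handing the event back) instead of growing when the
    // queue is full, forcing the producer to decide whether to drop or retry.
    if let Err(dropped) = queue.push(LogEvent { message: "hello".into() }) {
        eprintln!("queue full, dropping {dropped:?}");
    }

    // Consumer side: `pop` returns `None` once the queue is empty.
    while let Some(event) = queue.pop() {
        println!("forwarding {event:?}");
    }
}
```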
I'm aware of the tradeoffs; I still have to clean up this PR and will mark it as draft – the plan is to use a bounded queue here
SegQueue is also theoretically lock-free, but it uses Acquire/Release semantics internally, so in practice the guarantees are equivalent to a mutex. The difference is that Acq/Rel on atomic structures cuts the Linux scheduler out, so your latency will generally be worse, except in specialized circumstances where the software doing the acq/rel already has the highest CPU priority and is always scheduled first.
Tokio's mpsc is good; it plays well with the Linux scheduler and also bridges nicely between async and sync code.
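For illustration, a minimal sketch of the bounded Tokio mpsc approach suggested here; the `TelemetryEvent` type, channel capacity, and task layout are hypothetical, not taken from the PR:

```rust
use tokio::sync::mpsc;

// Hypothetical event type standing in for the Telemetry API payloads.
#[derive(Debug)]
struct TelemetryEvent {
    message: String,
}

#[tokio::main]
async fn main() {
    // Bounded channel: back-pressures producers instead of allowing
    // unbounded allocations the way SegQueue would.
    let (tx, mut rx) = mpsc::channel::<TelemetryEvent>(1_000);

    // Producer: e.g. the Telemetry API listener pushing events straight
    // to the logs side, bypassing the main thread.
    let producer = tokio::spawn(async move {
        for i in 0..10 {
            // `send` awaits when the channel is full, applying back-pressure.
            let event = TelemetryEvent { message: format!("log {i}") };
            if tx.send(event).await.is_err() {
                break; // receiver dropped
            }
        }
    });

    // Single consumer draining events as they arrive; `recv` returns `None`
    // once all senders are dropped.
    while let Some(event) = rx.recv().await {
        println!("forwarding {event:?}");
    }

    producer.await.unwrap();
}
```

From synchronous code, `Sender::blocking_send` covers the sync-to-async direction without blocking the runtime's async tasks.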
Closing due to other PRs doing this
What?
Release the Aggregator lock as soon as possible.