Performance/Throughput Impact with auto instrumentation in Spring 5 applications #59

kurvatch · 2021-05-18T19:57:11Z

Describe the bug
We are seeing more than a 50% drop in throughput after instrumenting with the OTel agent. Our instrumented application runs on an EKS cluster. An OTel Collector running as a daemon set in the same EKS cluster collects traces and ingests the data into AWS X-Ray.

Steps to reproduce
This is a Spring 5 project with WebFlux and Spring Cloud Stream support, interacting with SQS, DynamoDB, and AWS MSK.

What did you expect to see?
Without the OTel agent, the application could reach up to 250 requests per second with 2Gi of memory.

What did you see instead?
With the OTel agent, we are seeing ~65 requests per second with the same settings. I was expecting some degradation in throughput, but this is more than 50%.
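To make the size of the drop concrete, here is a minimal sketch of the arithmetic using the two figures reported above (250 and ~65 requests per second). The class and method names are hypothetical and are not part of any OTel tooling.

```java
public class ThroughputDrop {
    // Relative throughput loss, expressed as a percentage of the baseline.
    public static double dropPercent(double baselineRps, double observedRps) {
        return (baselineRps - observedRps) / baselineRps * 100.0;
    }

    public static void main(String[] args) {
        // 250 rps without the agent, ~65 rps with it (figures from this report).
        System.out.printf("%.0f%% drop%n", dropPercent(250, 65));
    }
}
```

With these numbers the loss works out to roughly three quarters of the baseline throughput, which is consistent with "more than 50%".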

Additional context
We are using aws-opentelemetry-agent-1.1.0 with default settings for the batch span processor (BSP). Sampling is set to 100% and the metrics exporter is set to logging.

stnor · 2021-05-19T08:39:00Z

What sampler are you using? Why 100% sampling? That will have a perf impact.

I used -Dotel.traces.sampler.traceidratio=true -Dotel.traces.sampler.arg=0.005 and had perf issues (CPU utilization rose by 50%).

Switched to the parent based ratio sampler parentbased_traceidratio and that had a huge impact for me, but I am using a lot of internal spans.

kurvatch · 2021-05-19T10:55:24Z

@stnor parentbased_always_on is the sampler. I am testing with the default configuration. I expected some performance impact with 100% sampling, but was shocked to see throughput degrade by more than 50%. I am doing another round of testing with parentbased_traceidratio and -Dotel.traces.sampler.arg=0.25 for the same application.
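For readers unfamiliar with the two sampler names mentioned here, the following is a simplified, self-contained sketch of the idea behind a parent-based trace-ID-ratio sampler. It is not the agent's actual code: the class, the method names, and the use of a random long in place of a real trace ID are all assumptions made for illustration. The idea is that a span with a parent follows the parent's sampling decision, and only root spans are subject to the ratio.

```java
import java.util.Random;

public class RatioSamplerSketch {
    // Hypothetical helper: keep a root trace when the random part of its
    // trace ID falls below ratio * Long.MAX_VALUE.
    public static boolean sampleRoot(long traceIdRandomPart, double ratio) {
        if (ratio >= 1.0) return true;
        if (ratio <= 0.0) return false;
        long upperBound = (long) (ratio * Long.MAX_VALUE);
        // Clear the sign bit so the value is always non-negative.
        return (traceIdRandomPart & Long.MAX_VALUE) < upperBound;
    }

    // Hypothetical helper: parentSampled is null for a root span; otherwise
    // the child simply inherits the parent's decision.
    public static boolean shouldSample(Boolean parentSampled, long traceIdRandomPart, double ratio) {
        if (parentSampled != null) return parentSampled;
        return sampleRoot(traceIdRandomPart, ratio);
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed, so runs are repeatable
        int total = 100_000;
        int kept = 0;
        for (int i = 0; i < total; i++) {
            if (shouldSample(null, random.nextLong(), 0.25)) kept++;
        }
        // The kept fraction should land near the configured ratio of 0.25.
        System.out.printf("kept %d of %d root traces%n", kept, total);
    }
}
```

Because children inherit the parent's decision, an application that creates many internal spans per request makes one ratio decision per trace rather than one per span.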

stnor · 2021-05-19T20:42:26Z

25% is a very high sampling frequency in my experience.

kurvatch · 2021-05-20T00:30:50Z

Is there a standard recommendation available? It would definitely help to publish benchmark results for different samplers and ratios, using a demo application that interacts with a database and a messaging system.

anuraaga · 2021-05-20T09:38:32Z

Hi @kurvatch - I agree that the performance impact seems much larger than we'd expect. Lowering the sampling rate is great for reducing load on backends, but we wouldn't expect that much overhead at hundreds of QPS.

I have filed open-telemetry/opentelemetry-java-instrumentation#3047 as that repo is where the actual code is and the performance bottlenecks can be investigated.

github-actions · 2022-10-02T20:05:53Z

This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled

github-actions · 2022-11-06T20:06:08Z

This issue was closed because it has been marked as stale for 30 days with no activity.

anuraaga mentioned this issue May 20, 2021

Performance/Throughput Impact with auto instrumentation in Spring 5 applications open-telemetry/opentelemetry-java-instrumentation#3047

Closed

github-actions bot added the stale label Oct 2, 2022

github-actions bot closed this as completed Nov 6, 2022
