-
Notifications
You must be signed in to change notification settings - Fork 467
fix(grpc): leaking spans when using gRPC future interface #14875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We've tracked down some issues related to broken span hierarchy down to the way that the gRPC instrumentation deals with entering spans, which causes the grpc span to leak outside it's call when using the .future() interface. This PR changes the instrumentation to explicitly only enter the span when calling the continuations instead of just entering immediately. Signed-off-by: Sam Schlegel <sam.s@discordapp.com>
|
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 266 ± 6 ms. The average import time from base is: 268 ± 5 ms. The import time difference between this PR and base is: -2.1 ± 0.3 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate PR-13847 (5eeef86) with baseline main (a75ed3e) 📈 Performance Regressions (1 suite)📈 iast_aspects - 40/40✅ re_expand_aspectTime: ✅ 33.893µs (SLO: <40.000µs 📉 -15.3%) vs baseline: +7.1% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_expand_noaspectTime: ✅ 28.598µs (SLO: <40.000µs 📉 -28.5%) vs baseline: ~same Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.8% ✅ re_findall_aspectTime: ✅ 2.933µs (SLO: <10.000µs 📉 -70.7%) vs baseline: +1.0% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.0% ✅ re_findall_noaspectTime: ✅ 1.425µs (SLO: <10.000µs 📉 -85.7%) vs baseline: +0.7% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.8% ✅ re_finditer_aspectTime: ✅ 4.521µs (SLO: <10.000µs 📉 -54.8%) vs baseline: +1.1% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_finditer_noaspectTime: ✅ 1.420µs (SLO: <10.000µs 📉 -85.8%) vs baseline: -0.3% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.8% ✅ re_fullmatch_aspectTime: ✅ 2.958µs (SLO: <10.000µs 📉 -70.4%) vs baseline: 📈 +10.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_fullmatch_noaspectTime: ✅ 1.287µs (SLO: <10.000µs 📉 -87.1%) vs baseline: -0.2% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.6% ✅ re_group_aspectTime: ✅ 2.943µs (SLO: <10.000µs 📉 -70.6%) vs baseline: +0.4% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.0% ✅ re_group_noaspectTime: ✅ 1.614µs (SLO: <10.000µs 📉 -83.9%) vs baseline: +1.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_groups_aspectTime: ✅ 3.101µs (SLO: <10.000µs 📉 -69.0%) vs baseline: +1.0% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.0% ✅ re_groups_noaspectTime: ✅ 1.737µs (SLO: <10.000µs 📉 -82.6%) vs baseline: +2.5% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_match_aspectTime: ✅ 2.767µs (SLO: <10.000µs 📉 -72.3%) vs baseline: +2.4% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.8% ✅ re_match_noaspectTime: ✅ 1.299µs (SLO: <10.000µs 📉 -87.0%) vs baseline: +0.3% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.8% ✅ re_search_aspectTime: ✅ 2.563µs (SLO: <10.000µs 📉 -74.4%) vs baseline: -0.2% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +5.0% ✅ re_search_noaspectTime: ✅ 1.201µs (SLO: <10.000µs 📉 -88.0%) vs baseline: -0.7% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_sub_aspectTime: ✅ 3.434µs (SLO: <10.000µs 📉 -65.7%) vs baseline: +0.4% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.0% ✅ re_sub_noaspectTime: ✅ 1.541µs (SLO: <10.000µs 📉 -84.6%) vs baseline: +0.3% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.0% ✅ re_subn_aspectTime: ✅ 3.654µs (SLO: <10.000µs 📉 -63.5%) vs baseline: +0.7% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.9% ✅ re_subn_noaspectTime: ✅ 1.604µs (SLO: <10.000µs 📉 -84.0%) vs baseline: -0.1% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.0% 🟡 Near SLO Breach (5 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.439ms (SLO: <22.300ms -8.3%) vs baseline: -0.5% Memory: ✅ 65.482MB (SLO: <67.000MB -2.3%) vs baseline: +4.8% ✅ exception-replay-enabledTime: ✅ 1.346ms (SLO: <1.450ms -7.2%) vs baseline: -0.4% Memory: ✅ 64.572MB (SLO: <67.000MB -3.6%) vs baseline: +5.0% ✅ iastTime: ✅ 20.419ms (SLO: <22.250ms -8.2%) vs baseline: ~same Memory: ✅ 65.410MB (SLO: <67.000MB -2.4%) vs baseline: +4.7% ✅ profilerTime: ✅ 15.270ms (SLO: <16.550ms -7.7%) vs baseline: +0.5% Memory: ✅ 53.807MB (SLO: <54.500MB 🟡 -1.3%) vs baseline: +5.0% ✅ resource-renamingTime: ✅ 20.592ms (SLO: <21.750ms -5.3%) vs baseline: +0.1% Memory: ✅ 65.461MB (SLO: <67.000MB -2.3%) vs baseline: +4.8% ✅ span-code-originTime: ✅ 26.186ms (SLO: <28.200ms -7.1%) vs baseline: ~same Memory: ✅ 68.098MB (SLO: <69.500MB -2.0%) vs baseline: +4.8% ✅ tracerTime: ✅ 20.482ms (SLO: <21.750ms -5.8%) vs baseline: -0.3% Memory: ✅ 65.439MB (SLO: <67.000MB -2.3%) vs baseline: +4.8% ✅ tracer-and-profilerTime: ✅ 22.121ms (SLO: <23.500ms -5.9%) vs baseline: +0.1% Memory: ✅ 66.649MB (SLO: <67.500MB 🟡 -1.3%) vs baseline: +4.9% ✅ tracer-dont-create-db-spansTime: ✅ 19.290ms (SLO: <21.500ms 📉 -10.3%) vs baseline: +0.3% Memory: ✅ 65.425MB (SLO: <66.000MB 🟡 -0.9%) vs baseline: +4.8% ✅ tracer-minimalTime: ✅ 16.625ms (SLO: <17.500ms -5.0%) vs baseline: +0.4% Memory: ✅ 65.417MB (SLO: <66.000MB 🟡 -0.9%) vs baseline: +4.8% ✅ tracer-nativeTime: ✅ 20.503ms (SLO: <21.750ms -5.7%) vs baseline: -0.3% Memory: ✅ 71.288MB (SLO: <72.500MB 🟡 -1.7%) vs baseline: +4.7% ✅ tracer-no-cachesTime: ✅ 18.420ms (SLO: <19.650ms -6.3%) vs baseline: ~same Memory: ✅ 65.514MB (SLO: <67.000MB -2.2%) vs baseline: +4.9% ✅ tracer-no-databasesTime: ✅ 18.793ms (SLO: <20.100ms -6.5%) vs baseline: +0.1% Memory: ✅ 65.305MB (SLO: <67.000MB -2.5%) vs baseline: +4.6% ✅ tracer-no-middlewareTime: ✅ 20.226ms (SLO: <21.500ms -5.9%) vs baseline: +0.3% Memory: ✅ 65.468MB (SLO: <67.000MB -2.3%) vs baseline: +4.9% ✅ tracer-no-templatesTime: ✅ 20.331ms (SLO: <22.000ms -7.6%) vs baseline: +0.4% Memory: ✅ 65.454MB (SLO: <67.000MB -2.3%) vs baseline: +4.9% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 18.053ms (SLO: <19.850ms -9.1%) vs baseline: -0.5% Memory: ✅ 65.294MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +4.9% ✅ errortracking-enabled-userTime: ✅ 18.033ms (SLO: <19.400ms -7.0%) vs baseline: ~same Memory: ✅ 65.267MB (SLO: <66.500MB 🟡 -1.9%) vs baseline: +4.8% ✅ tracer-enabledTime: ✅ 18.076ms (SLO: <19.450ms -7.1%) vs baseline: -0.1% Memory: ✅ 65.323MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +4.9% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.583ms (SLO: <4.750ms -3.5%) vs baseline: ~same Memory: ✅ 61.951MB (SLO: <65.000MB -4.7%) vs baseline: +4.8% ✅ appsec-postTime: ✅ 6.597ms (SLO: <6.750ms -2.3%) vs baseline: ~same Memory: ✅ 61.991MB (SLO: <65.000MB -4.6%) vs baseline: +4.9% ✅ appsec-telemetryTime: ✅ 4.564ms (SLO: <4.750ms -3.9%) vs baseline: -0.5% Memory: ✅ 61.971MB (SLO: <65.000MB -4.7%) vs baseline: +4.8% ✅ debuggerTime: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: +0.3% Memory: ✅ 45.456MB (SLO: <47.000MB -3.3%) vs baseline: +4.9% ✅ iast-getTime: ✅ 1.861ms (SLO: <2.000ms -7.0%) vs baseline: +0.1% Memory: ✅ 42.310MB (SLO: <49.000MB 📉 -13.7%) vs baseline: +4.8% ✅ profilerTime: ✅ 1.919ms (SLO: <2.100ms -8.6%) vs baseline: +0.4% Memory: ✅ 46.478MB (SLO: <47.000MB 🟡 -1.1%) vs baseline: +4.7% ✅ resource-renamingTime: ✅ 3.389ms (SLO: <3.650ms -7.1%) vs baseline: +0.2% Memory: ✅ 52.160MB (SLO: <53.500MB -2.5%) vs baseline: +4.8% ✅ tracerTime: ✅ 3.372ms (SLO: <3.650ms -7.6%) vs baseline: -0.2% Memory: ✅ 52.239MB (SLO: <53.500MB -2.4%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 3.370ms (SLO: <3.650ms -7.7%) vs baseline: -0.1% Memory: ✅ 58.347MB (SLO: <60.000MB -2.8%) vs baseline: +5.0% 🟡 otelspan - 22/22✅ add-eventTime: ✅ 42.258ms (SLO: <47.150ms 📉 -10.4%) vs baseline: -0.6% Memory: ✅ 44.495MB (SLO: <47.000MB -5.3%) vs baseline: +5.1% ✅ add-metricsTime: ✅ 316.834ms (SLO: <344.800ms -8.1%) vs baseline: ~same Memory: ✅ 595.960MB (SLO: <600.000MB 🟡 -0.7%) vs baseline: +5.0% ✅ add-tagsTime: ✅ 288.864ms (SLO: <314.000ms -8.0%) vs baseline: +0.6% Memory: ✅ 596.967MB (SLO: <600.000MB 🟡 -0.5%) vs baseline: +4.8% ✅ get-contextTime: ✅ 80.755ms (SLO: <92.350ms 📉 -12.6%) vs baseline: +0.4% Memory: ✅ 39.951MB (SLO: <46.500MB 📉 -14.1%) vs baseline: +4.9% ✅ is-recordingTime: ✅ 38.864ms (SLO: <44.500ms 📉 -12.7%) vs baseline: ~same Memory: ✅ 43.939MB (SLO: <47.500MB -7.5%) vs baseline: +5.0% ✅ record-exceptionTime: ✅ 58.735ms (SLO: <67.650ms 📉 -13.2%) vs baseline: ~same Memory: ✅ 40.174MB (SLO: <47.000MB 📉 -14.5%) vs baseline: +4.6% ✅ set-statusTime: ✅ 44.717ms (SLO: <50.400ms 📉 -11.3%) vs baseline: +0.5% Memory: ✅ 43.968MB (SLO: <47.000MB -6.5%) vs baseline: +4.9% ✅ startTime: ✅ 38.340ms (SLO: <43.450ms 📉 -11.8%) vs baseline: +0.5% Memory: ✅ 43.967MB (SLO: <47.000MB -6.5%) vs baseline: +5.0% ✅ start-finishTime: ✅ 82.898ms (SLO: <88.000ms -5.8%) vs baseline: +0.6% Memory: ✅ 34.524MB (SLO: <46.500MB 📉 -25.8%) vs baseline: +4.7% ✅ start-finish-telemetryTime: ✅ 84.210ms (SLO: <89.000ms -5.4%) vs baseline: ~same Memory: ✅ 34.544MB (SLO: <46.500MB 📉 -25.7%) vs baseline: +4.7% ✅ update-nameTime: ✅ 40.200ms (SLO: <45.150ms 📉 -11.0%) vs baseline: +0.7% Memory: ✅ 44.258MB (SLO: <47.000MB -5.8%) vs baseline: +5.1% 🟡 span - 26/26✅ add-eventTime: ✅ 20.752ms (SLO: <22.500ms -7.8%) vs baseline: +0.5% Memory: ✅ 50.322MB (SLO: <53.000MB -5.1%) vs baseline: +4.8% ✅ add-metricsTime: ✅ 90.561ms (SLO: <93.500ms -3.1%) vs baseline: -0.9% Memory: ✅ 661.193MB (SLO: <961.000MB 📉 -31.2%) vs baseline: +4.8% ✅ add-tagsTime: ✅ 147.558ms (SLO: <155.000ms -4.8%) vs baseline: ~same Memory: ✅ 661.927MB (SLO: <962.500MB 📉 -31.2%) vs baseline: +4.9% ✅ get-contextTime: ✅ 19.638ms (SLO: <20.500ms -4.2%) vs baseline: +2.5% Memory: ✅ 49.045MB (SLO: <53.000MB -7.5%) vs baseline: +4.5% ✅ is-recordingTime: ✅ 19.576ms (SLO: <20.500ms -4.5%) vs baseline: +0.1% Memory: ✅ 49.124MB (SLO: <53.000MB -7.3%) vs baseline: +4.7% ✅ record-exceptionTime: ✅ 38.494ms (SLO: <40.000ms -3.8%) vs baseline: +0.7% Memory: ✅ 42.737MB (SLO: <53.000MB 📉 -19.4%) vs baseline: +4.8% ✅ set-statusTime: ✅ 21.309ms (SLO: <22.000ms -3.1%) vs baseline: +0.7% Memory: ✅ 49.169MB (SLO: <53.000MB -7.2%) vs baseline: +4.7% ✅ startTime: ✅ 18.938ms (SLO: <20.500ms -7.6%) vs baseline: ~same Memory: ✅ 49.184MB (SLO: <53.000MB -7.2%) vs baseline: +4.8% ✅ start-finishTime: ✅ 51.599ms (SLO: <52.500ms 🟡 -1.7%) vs baseline: ~same Memory: ✅ 32.185MB (SLO: <34.000MB -5.3%) vs baseline: +5.1% ✅ start-finish-telemetryTime: ✅ 52.657ms (SLO: <54.500ms -3.4%) vs baseline: -0.3% Memory: ✅ 32.106MB (SLO: <34.000MB -5.6%) vs baseline: +4.8% ✅ start-finish-traceid128Time: ✅ 54.702ms (SLO: <56.000ms -2.3%) vs baseline: -0.4% Memory: ✅ 32.185MB (SLO: <34.000MB -5.3%) vs baseline: +4.8% ✅ start-traceid128Time: ✅ 19.961ms (SLO: <22.500ms 📉 -11.3%) vs baseline: +1.6% Memory: ✅ 49.075MB (SLO: <53.000MB -7.4%) vs baseline: +4.7% ✅ update-nameTime: ✅ 20.091ms (SLO: <22.000ms -8.7%) vs baseline: -0.3% Memory: ✅ 49.731MB (SLO: <53.000MB -6.2%) vs baseline: +4.8%
|
58c086e to
e9fda2c
Compare
We've tracked down some issues related to broken span hierarchy down to the way that the gRPC instrumentation deals with entering spans, which causes the grpc span to leak outside it's call when using the .future() interface. This PR changes the instrumentation to explicitly only enter the span when calling the continuations instead of just entering immediately. ## Description <!-- Provide an overview of the change and motivation for the change --> ## Testing I've added tests, but haven't run them for a while and working on getting it running locally. Will update if I run into any issues ## Risks <!-- Note any risks associated with this change, or "None" if no risks --> ## Additional Notes --------- Signed-off-by: Sam Schlegel <sam.s@discordapp.com> Co-authored-by: Sam Schlegel <sam.s@discordapp.com> (cherry picked from commit 75a8246)
|
@samschlegel thank you so much for the contribution, apologies on the very long delay getting this merged. |
We've tracked down some issues related to broken span hierarchy down to the way that the gRPC instrumentation deals with entering spans, which causes the grpc span to leak outside it's call when using the .future() interface. This PR changes the instrumentation to explicitly only enter the span when calling the continuations instead of just entering immediately. ## Description <!-- Provide an overview of the change and motivation for the change --> ## Testing I've added tests, but haven't run them for a while and working on getting it running locally. Will update if I run into any issues ## Risks <!-- Note any risks associated with this change, or "None" if no risks --> ## Additional Notes --------- Signed-off-by: Sam Schlegel <sam.s@discordapp.com> Co-authored-by: Sam Schlegel <sam.s@discordapp.com> (cherry picked from commit 75a8246)
….16] (#14900) Backport 75a8246 from #14875 to 3.16. We've tracked down some issues related to broken span hierarchy down to the way that the gRPC instrumentation deals with entering spans, which causes the grpc span to leak outside it's call when using the .future() interface. This PR changes the instrumentation to explicitly only enter the span when calling the continuations instead of just entering immediately. ## Description <!-- Provide an overview of the change and motivation for the change --> ## Testing I've added tests, but haven't run them for a while and working on getting it running locally. Will update if I run into any issues ## Risks <!-- Note any risks associated with this change, or "None" if no risks --> ## Additional Notes Co-authored-by: Sam Schlegel <sam.s@discordapp.com> Signed-off-by: Sam Schlegel <sam.s@discordapp.com> Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com> Co-authored-by: Sam Schlegel <sam.s@discordapp.com> Co-authored-by: Christophe Papazian <114495376+christophe-papazian@users.noreply.github.com>
We've tracked down some issues related to broken span hierarchy down to the way that the gRPC instrumentation deals with entering spans, which causes the grpc span to leak outside it's call when using the .future() interface. This PR changes the instrumentation to explicitly only enter the span when calling the continuations instead of just entering immediately.
Description
Testing
I've added tests, but haven't run them for a while and working on getting it running locally. Will update if I run into any issues
Risks
Additional Notes
Co-authored-by: Sam Schlegel sam.s@discordapp.com