feat(profiling): profile threading.Semaphore primitives with Python Lock profiler#15327
feat(profiling): profile threading.Semaphore primitives with Python Lock profiler#15327vlad-scherbich merged 3 commits intomainfrom
threading.Semaphore primitives with Python Lock profiler#15327Conversation
|
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 255 ± 4 ms. The average import time from base is: 259 ± 5 ms. The import time difference between this PR and base is: -3.9 ± 0.2 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate vlad/proflock-implement-semaphore (cec6168) with baseline main (2590370) 📈 Performance Regressions (2 suites)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 5.117µs (SLO: <10.000µs 📉 -48.8%) vs baseline: 📈 +24.5% Memory: ✅ 40.088MB (SLO: <41.000MB -2.2%) vs baseline: +4.4% ✅ ospathbasename_noaspectTime: ✅ 1.080µs (SLO: <10.000µs 📉 -89.2%) vs baseline: -0.2% Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.6% ✅ ospathjoin_aspectTime: ✅ 6.129µs (SLO: <10.000µs 📉 -38.7%) vs baseline: +0.3% Memory: ✅ 40.088MB (SLO: <41.000MB -2.2%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 2.268µs (SLO: <10.000µs 📉 -77.3%) vs baseline: -0.8% Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.8% ✅ ospathnormcase_aspectTime: ✅ 3.452µs (SLO: <10.000µs 📉 -65.5%) vs baseline: +0.9% Memory: ✅ 40.128MB (SLO: <41.000MB -2.1%) vs baseline: +4.9% ✅ ospathnormcase_noaspectTime: ✅ 0.563µs (SLO: <10.000µs 📉 -94.4%) vs baseline: -1.6% Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +5.0% ✅ ospathsplit_aspectTime: ✅ 4.701µs (SLO: <10.000µs 📉 -53.0%) vs baseline: -0.6% Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +5.0% ✅ ospathsplit_noaspectTime: ✅ 1.589µs (SLO: <10.000µs 📉 -84.1%) vs baseline: -0.7% Memory: ✅ 40.167MB (SLO: <41.000MB -2.0%) vs baseline: +4.7% ✅ ospathsplitdrive_aspectTime: ✅ 3.581µs (SLO: <10.000µs 📉 -64.2%) vs baseline: -1.7% Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.9% ✅ ospathsplitdrive_noaspectTime: ✅ 0.696µs (SLO: <10.000µs 📉 -93.0%) vs baseline: +0.2% Memory: ✅ 40.088MB (SLO: <41.000MB -2.2%) vs baseline: +4.4% ✅ ospathsplitext_aspectTime: ✅ 4.536µs (SLO: <10.000µs 📉 -54.6%) vs baseline: +0.5% Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.9% ✅ ospathsplitext_noaspectTime: ✅ 1.380µs (SLO: <10.000µs 📉 -86.2%) vs baseline: ~same Memory: ✅ 40.324MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +4.9% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.355µs (SLO: <20.000µs 📉 -83.2%) vs baseline: 📈 +14.3% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +5.0% ✅ 1-count-metrics-100-timesTime: ✅ 200.462µs (SLO: <220.000µs -8.9%) vs baseline: -0.6% Memory: ✅ 34.701MB (SLO: <35.500MB -2.2%) vs baseline: +4.4% ✅ 1-distribution-metric-1-timesTime: ✅ 3.230µs (SLO: <20.000µs 📉 -83.8%) vs baseline: -1.7% Memory: ✅ 34.701MB (SLO: <35.500MB -2.2%) vs baseline: +4.7% ✅ 1-distribution-metrics-100-timesTime: ✅ 212.794µs (SLO: <230.000µs -7.5%) vs baseline: -1.5% Memory: ✅ 34.701MB (SLO: <35.500MB -2.2%) vs baseline: +4.6% ✅ 1-gauge-metric-1-timesTime: ✅ 2.150µs (SLO: <20.000µs 📉 -89.2%) vs baseline: ~same Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.8% ✅ 1-gauge-metrics-100-timesTime: ✅ 136.049µs (SLO: <150.000µs -9.3%) vs baseline: +0.5% Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ 1-rate-metric-1-timesTime: ✅ 3.037µs (SLO: <20.000µs 📉 -84.8%) vs baseline: -1.7% Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +5.1% ✅ 1-rate-metrics-100-timesTime: ✅ 213.037µs (SLO: <250.000µs 📉 -14.8%) vs baseline: -0.7% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +5.2% ✅ 100-count-metrics-100-timesTime: ✅ 20.201ms (SLO: <22.000ms -8.2%) vs baseline: +0.2% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.7% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.236ms (SLO: <2.300ms -2.8%) vs baseline: -0.6% Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +5.0% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.401ms (SLO: <1.550ms -9.6%) vs baseline: +0.5% Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +5.0% ✅ 100-rate-metrics-100-timesTime: ✅ 2.193ms (SLO: <2.550ms 📉 -14.0%) vs baseline: -0.4% Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ flush-1-metricTime: ✅ 4.511µs (SLO: <20.000µs 📉 -77.4%) vs baseline: -0.6% Memory: ✅ 35.114MB (SLO: <35.500MB 🟡 -1.1%) vs baseline: +5.0% ✅ flush-100-metricsTime: ✅ 175.315µs (SLO: <250.000µs 📉 -29.9%) vs baseline: ~same Memory: ✅ 35.075MB (SLO: <35.500MB 🟡 -1.2%) vs baseline: +4.3% ✅ flush-1000-metricsTime: ✅ 2.172ms (SLO: <2.500ms 📉 -13.1%) vs baseline: -0.5% Memory: ✅ 35.979MB (SLO: <36.500MB 🟡 -1.4%) vs baseline: +5.0% 🟡 Near SLO Breach (15 suites)🟡 coreapiscenario - 10/10 (1 unstable)
|
334e0f5 to
703ce51
Compare
threading.Semaphore objects in Python Lock Profiler
threading.Semaphore objects in Python Lock Profilerthreading.Semaphore objects in Python Lock profiler
e4ffe7b to
36b7ed5
Compare
threading.Semaphore objects in Python Lock profilerthreading.Semaphore primitives in Python Lock profiler
threading.Semaphore primitives in Python Lock profilerthreading.Semaphore primitives in Python Lock profiler
threading.Semaphore primitives in Python Lock profilerthreading.Semaphore primitives with Python Lock profiler
d57fa82 to
ae2c3c9
Compare
15e5ed6 to
b6d6cf5
Compare
61f7f5c to
8068688
Compare
KowalskiThomas
left a comment
There was a problem hiding this comment.
Makes sense to me overall.
8068688 to
cec6168
Compare
https://datadoghq.atlassian.net/browse/PROF-12727
Description
This PR adds profiling support for
threading.Semaphoreto the Lock Profiler, including critical double-counting prevention when multiple lock collectors are active simultaneously.What
threading.Semaphoreusagethreading.pyinternalsWhy
Without direct Semaphore profiling:
threading.pyinternals (e.g.,Condition.__enter__at line 502)The double-counting problem:
Without prevention, one user operation generates multiple profile samples, making metrics inaccurate.
Changes
1. New Semaphore Collector (
ddtrace/profiling/collector/threading.py)2. Double-Counting Prevention (
ddtrace/profiling/collector/_lock.py)3. Drive-by fixes (
tests/profiling_v2/collector/test_threading.py)Testing
New Unit Tests
test_stack_trace_points_to_user_codetest_internal_lock_marked_correctlytest_no_double_counting_with_lock_collectornote: location of manual test scripts: https://github.com/DataDog/dd-trace-py/compare/vlad/proflock-semaphores-demo?expand=1
UI: BEFORE
Flame graph -- shows internal lib locks in
threading.py:threading.py:401:_lockin Semaphore.wait_for method (its internalLockmember)UI: AFTER
Flamegraph -- shows actual
threading.Semaphorename from user code:connection_pool_semaphoreImplementation Notes
Semaphore → Condition → Lockchain is not visiblePerformance Impact
__slots__field (~8% increase from optimized version)sys._getframe(2)call, +1 path comparison (one-time per lock)if self.is_internalcheck (negligible)Risk
is_internaldetection (mitigated by testing)