-
Notifications
You must be signed in to change notification settings - Fork 470
fix(iast): wrong memory address in subprocess in mcp servers #15514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 247 ± 2 ms. The average import time from base is: 251 ± 2 ms. The import time difference between this PR and base is: -4.3 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
Performance SLOsComparing candidate avara1986/APPSEC-60135_iast_potencial_error (88aa593) with baseline main (21b25aa) ❌ Test Failures (1 suite)❌ packagespackageforrootmodulemapping - 3/4❌ cache_offTime: ✅ 345.408ms (SLO: <354.300ms -2.5%) vs baseline: -1.0% Memory: ❌ 41.625MB (SLO: <41.500MB +0.3%) vs baseline: +5.1% ✅ cache_onTime: ✅ 0.381µs (SLO: <10.000µs 📉 -96.2%) vs baseline: -0.4% Memory: ✅ 40.170MB (SLO: <41.000MB -2.0%) vs baseline: +5.3% 📈 Performance Regressions (2 suites)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 5.113µs (SLO: <10.000µs 📉 -48.9%) vs baseline: 📈 +24.1% Memory: ✅ 38.496MB (SLO: <41.000MB -6.1%) vs baseline: +0.3% ✅ ospathbasename_noaspectTime: ✅ 1.082µs (SLO: <10.000µs 📉 -89.2%) vs baseline: -0.8% Memory: ✅ 38.555MB (SLO: <41.000MB -6.0%) vs baseline: +0.3% ✅ ospathjoin_aspectTime: ✅ 5.971µs (SLO: <10.000µs 📉 -40.3%) vs baseline: -4.0% Memory: ✅ 38.240MB (SLO: <41.000MB -6.7%) vs baseline: ~same ✅ ospathjoin_noaspectTime: ✅ 2.300µs (SLO: <10.000µs 📉 -77.0%) vs baseline: +0.3% Memory: ✅ 38.457MB (SLO: <41.000MB -6.2%) vs baseline: +0.5% ✅ ospathnormcase_aspectTime: ✅ 3.449µs (SLO: <10.000µs 📉 -65.5%) vs baseline: +0.2% Memory: ✅ 38.555MB (SLO: <41.000MB -6.0%) vs baseline: +0.4% ✅ ospathnormcase_noaspectTime: ✅ 0.575µs (SLO: <10.000µs 📉 -94.2%) vs baseline: +1.5% Memory: ✅ 38.555MB (SLO: <41.000MB -6.0%) vs baseline: +0.6% ✅ ospathsplit_aspectTime: ✅ 4.864µs (SLO: <10.000µs 📉 -51.4%) vs baseline: +1.3% Memory: ✅ 38.476MB (SLO: <41.000MB -6.2%) vs baseline: -0.1% ✅ ospathsplit_noaspectTime: ✅ 1.592µs (SLO: <10.000µs 📉 -84.1%) vs baseline: ~same Memory: ✅ 38.221MB (SLO: <41.000MB -6.8%) vs baseline: -0.4% ✅ ospathsplitdrive_aspectTime: ✅ 3.636µs (SLO: <10.000µs 📉 -63.6%) vs baseline: -1.0% Memory: ✅ 38.417MB (SLO: <41.000MB -6.3%) vs baseline: ~same ✅ ospathsplitdrive_noaspectTime: ✅ 0.694µs (SLO: <10.000µs 📉 -93.1%) vs baseline: -0.9% Memory: ✅ 38.437MB (SLO: <41.000MB -6.3%) vs baseline: +0.1% ✅ ospathsplitext_aspectTime: ✅ 4.579µs (SLO: <10.000µs 📉 -54.2%) vs baseline: +1.2% Memory: ✅ 38.516MB (SLO: <41.000MB -6.1%) vs baseline: +0.3% ✅ ospathsplitext_noaspectTime: ✅ 1.379µs (SLO: <10.000µs 📉 -86.2%) vs baseline: -0.2% Memory: ✅ 38.594MB (SLO: <41.000MB -5.9%) vs baseline: +0.4% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.414µs (SLO: <20.000µs 📉 -82.9%) vs baseline: 📈 +15.4% Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +5.0% ✅ 1-count-metrics-100-timesTime: ✅ 202.229µs (SLO: <220.000µs -8.1%) vs baseline: ~same Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ 1-distribution-metric-1-timesTime: ✅ 3.341µs (SLO: <20.000µs 📉 -83.3%) vs baseline: +0.8% Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ 1-distribution-metrics-100-timesTime: ✅ 218.085µs (SLO: <230.000µs -5.2%) vs baseline: ~same Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.7% ✅ 1-gauge-metric-1-timesTime: ✅ 2.167µs (SLO: <20.000µs 📉 -89.2%) vs baseline: -1.1% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.9% ✅ 1-gauge-metrics-100-timesTime: ✅ 137.210µs (SLO: <150.000µs -8.5%) vs baseline: -1.2% Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +5.0% ✅ 1-rate-metric-1-timesTime: ✅ 3.088µs (SLO: <20.000µs 📉 -84.6%) vs baseline: -0.1% Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +5.0% ✅ 1-rate-metrics-100-timesTime: ✅ 216.509µs (SLO: <250.000µs 📉 -13.4%) vs baseline: -0.3% Memory: ✅ 34.701MB (SLO: <35.500MB -2.2%) vs baseline: +4.8% ✅ 100-count-metrics-100-timesTime: ✅ 20.354ms (SLO: <22.000ms -7.5%) vs baseline: ~same Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.8% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.286ms (SLO: <2.550ms 📉 -10.3%) vs baseline: +0.3% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +5.2% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.434ms (SLO: <1.550ms -7.5%) vs baseline: +0.9% Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.7% ✅ 100-rate-metrics-100-timesTime: ✅ 2.237ms (SLO: <2.550ms 📉 -12.3%) vs baseline: +0.4% Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +5.2% ✅ flush-1-metricTime: ✅ 4.613µs (SLO: <20.000µs 📉 -76.9%) vs baseline: +0.4% Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +4.3% ✅ flush-100-metricsTime: ✅ 173.688µs (SLO: <250.000µs 📉 -30.5%) vs baseline: +0.4% Memory: ✅ 35.095MB (SLO: <35.500MB 🟡 -1.1%) vs baseline: +4.9% ✅ flush-1000-metricsTime: ✅ 2.170ms (SLO: <2.500ms 📉 -13.2%) vs baseline: -0.1% Memory: ✅ 35.960MB (SLO: <36.500MB 🟡 -1.5%) vs baseline: +4.8% 🟡 Near SLO Breach (15 suites)🟡 coreapiscenario - 10/10 (1 unstable)
|
e676dc2 to
05a9d46
Compare
|
The backport to To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-4.0 4.0
# Navigate to the new working tree
cd .worktrees/backport-4.0
# Create a new branch
git switch --create backport-15514-to-4.0
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d1b4fd8d387aefa177afd726f3c44f616927c0b7
# Push it to GitHub
git push --set-upstream origin backport-15514-to-4.0
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-4.0Then, create a pull request where the |
|
The backport to To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-3.19 3.19
# Navigate to the new working tree
cd .worktrees/backport-3.19
# Create a new branch
git switch --create backport-15514-to-3.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d1b4fd8d387aefa177afd726f3c44f616927c0b7
# Push it to GitHub
git push --set-upstream origin backport-15514-to-3.19
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-3.19Then, create a pull request where the |
IAST-enabled applications using Gunicorn/Uvicorn workers were experiencing segmentation faults (~33% crash rate on MCP streaming requests) due to memory corruption when processes fork. - C++ global singletons (`taint_engine_context`, `initializer`) initialized at module load - Taint maps storing PyObject pointers by memory address - Child processes after fork inherited stale pointers from parent process memory - Accessing these stale pointers → use-after-free → SIGSEGV crash - Implemented `pthread_atfork` handler that automatically resets C++ global state in child processes after every fork: - Added comprehensive null-check wrappers around all native functions to prevent crashes when native state is - Fixed test regression issues where context slots weren't being freed: **[AddressSanitizer (ASAN)](https://github.com/google/sanitizers/wiki/AddressSanitizer)** is a fast memory error detector that catches use-after-free, buffer overflows, and other memory corruption bugs at runtime. **1. Runtime Environment (No Recompilation Required)** The simplest way to test is using LD_PRELOAD with the system's libasan: ```bash ASAN_LIB=$(gcc -print-file-name=libasan.so) LD_PRELOAD=$ASAN_LIB \ ASAN_OPTIONS="detect_leaks=0:symbolize=1:abort_on_error=0" \ python3 -m pytest tests/appsec/iast/test_fork_handler_regression.py -v ``` **ASAN_OPTIONS explained:** - `detect_leaks=0` - Disable leak detection (Python has many false positives) - `symbolize=1` - Show human-readable stack traces - `abort_on_error=0` - Continue after first error (collect all errors) **2. Build with ASAN (Optional, for deeper analysis)** For more thorough testing, compile the native extension with ASAN: ```bash export CFLAGS="-fsanitize=address -fno-omit-frame-pointer -g" export CXXFLAGS="-fsanitize=address -fno-omit-frame-pointer -g" export LDFLAGS="-fsanitize=address" pip install --no-build-isolation --force-reinstall -e . ASAN_OPTIONS="detect_leaks=0:symbolize=1:abort_on_error=0" \ python3 -m pytest tests/appsec/iast/test_fork_handler_regression.py -v ``` This script demonstrates the fork safety fix and can be used to verify ASAN finds no errors: ```python """Minimal fork safety reproduction test for ASAN verification.""" import os from ddtrace.appsec._iast._taint_tracking import OriginType from ddtrace.appsec._iast._taint_tracking._native import initialize_native_state from ddtrace.appsec._iast._taint_tracking._taint_objects import taint_pyobject from ddtrace.appsec._iast._taint_tracking._context import ( start_request_context, debug_context_array_free_slots_number, debug_num_tainted_objects ) def main(): print(f"[Parent PID {os.getpid()}] Initializing IAST...") # Initialize native state initialize_native_state() # Create context and tainted objects in parent ctx_id = start_request_context() print(f"[Parent] Context created: {ctx_id}") # Create some tainted objects (populates native maps) for i in range(10): taint_pyobject(f"data_{i}", "source", f"value_{i}", OriginType.PARAMETER) tainted_count = debug_num_tainted_objects(ctx_id) print(f"[Parent] Tainted objects: {tainted_count}") # Fork pid = os.fork() if pid == 0: # Child process print(f"[Child PID {os.getpid()}] Started after fork") # Verify pthread_atfork reset worked free_slots = debug_context_array_free_slots_number() print(f"[Child] Free slots: {free_slots}") if free_slots > 0: print("[Child] ✅ Context slots were reset (pthread_atfork worked!)") os._exit(0) # Success else: print("[Child] ❌ Context slots NOT reset") os._exit(1) # Failure else: # Parent waits for child _, status = os.waitpid(pid, 0) exit_code = os.WEXITSTATUS(status) if exit_code == 0: print(f"[Parent] ✅ Child exited cleanly - Fork safety verified!") return 0 else: print(f"[Parent] ❌ Child failed with exit code {exit_code}") return 1 if __name__ == "__main__": exit(main()) ``` **Run with ASAN:** ```bash LD_PRELOAD=$(gcc -print-file-name=libasan.so) \ ASAN_OPTIONS="detect_leaks=0:symbolize=1:abort_on_error=0" \ python3 test_fork_asan.py ``` **Expected output (success):** ``` [Parent PID 12345] Initializing IAST... [Parent] Context created: 0 [Parent] Tainted objects: 10 [Child PID 12346] Started after fork [Child] Free slots: 2 [Child] ✅ Context slots were reset (pthread_atfork worked!) [Parent] ✅ Child exited cleanly - Fork safety verified! ``` **What ASAN would report WITHOUT this fix:** ``` ==12346==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 ==12346==The signal is caused by a READ memory access. #0 0x7f... in get_tainted_object_map_by_ctx_id #1 0x7f... in debug_context_array_free_slots_number ``` (cherry picked from commit d1b4fd8)
IAST-enabled applications using Gunicorn/Uvicorn workers were experiencing segmentation faults (~33% crash rate on MCP streaming requests) due to memory corruption when processes fork. - C++ global singletons (`taint_engine_context`, `initializer`) initialized at module load - Taint maps storing PyObject pointers by memory address - Child processes after fork inherited stale pointers from parent process memory - Accessing these stale pointers → use-after-free → SIGSEGV crash - Implemented `pthread_atfork` handler that automatically resets C++ global state in child processes after every fork: - Added comprehensive null-check wrappers around all native functions to prevent crashes when native state is - Fixed test regression issues where context slots weren't being freed: **[AddressSanitizer (ASAN)](https://github.com/google/sanitizers/wiki/AddressSanitizer)** is a fast memory error detector that catches use-after-free, buffer overflows, and other memory corruption bugs at runtime. **1. Runtime Environment (No Recompilation Required)** The simplest way to test is using LD_PRELOAD with the system's libasan: ```bash ASAN_LIB=$(gcc -print-file-name=libasan.so) LD_PRELOAD=$ASAN_LIB \ ASAN_OPTIONS="detect_leaks=0:symbolize=1:abort_on_error=0" \ python3 -m pytest tests/appsec/iast/test_fork_handler_regression.py -v ``` **ASAN_OPTIONS explained:** - `detect_leaks=0` - Disable leak detection (Python has many false positives) - `symbolize=1` - Show human-readable stack traces - `abort_on_error=0` - Continue after first error (collect all errors) **2. Build with ASAN (Optional, for deeper analysis)** For more thorough testing, compile the native extension with ASAN: ```bash export CFLAGS="-fsanitize=address -fno-omit-frame-pointer -g" export CXXFLAGS="-fsanitize=address -fno-omit-frame-pointer -g" export LDFLAGS="-fsanitize=address" pip install --no-build-isolation --force-reinstall -e . ASAN_OPTIONS="detect_leaks=0:symbolize=1:abort_on_error=0" \ python3 -m pytest tests/appsec/iast/test_fork_handler_regression.py -v ``` This script demonstrates the fork safety fix and can be used to verify ASAN finds no errors: ```python """Minimal fork safety reproduction test for ASAN verification.""" import os from ddtrace.appsec._iast._taint_tracking import OriginType from ddtrace.appsec._iast._taint_tracking._native import initialize_native_state from ddtrace.appsec._iast._taint_tracking._taint_objects import taint_pyobject from ddtrace.appsec._iast._taint_tracking._context import ( start_request_context, debug_context_array_free_slots_number, debug_num_tainted_objects ) def main(): print(f"[Parent PID {os.getpid()}] Initializing IAST...") # Initialize native state initialize_native_state() # Create context and tainted objects in parent ctx_id = start_request_context() print(f"[Parent] Context created: {ctx_id}") # Create some tainted objects (populates native maps) for i in range(10): taint_pyobject(f"data_{i}", "source", f"value_{i}", OriginType.PARAMETER) tainted_count = debug_num_tainted_objects(ctx_id) print(f"[Parent] Tainted objects: {tainted_count}") # Fork pid = os.fork() if pid == 0: # Child process print(f"[Child PID {os.getpid()}] Started after fork") # Verify pthread_atfork reset worked free_slots = debug_context_array_free_slots_number() print(f"[Child] Free slots: {free_slots}") if free_slots > 0: print("[Child] ✅ Context slots were reset (pthread_atfork worked!)") os._exit(0) # Success else: print("[Child] ❌ Context slots NOT reset") os._exit(1) # Failure else: # Parent waits for child _, status = os.waitpid(pid, 0) exit_code = os.WEXITSTATUS(status) if exit_code == 0: print(f"[Parent] ✅ Child exited cleanly - Fork safety verified!") return 0 else: print(f"[Parent] ❌ Child failed with exit code {exit_code}") return 1 if __name__ == "__main__": exit(main()) ``` **Run with ASAN:** ```bash LD_PRELOAD=$(gcc -print-file-name=libasan.so) \ ASAN_OPTIONS="detect_leaks=0:symbolize=1:abort_on_error=0" \ python3 test_fork_asan.py ``` **Expected output (success):** ``` [Parent PID 12345] Initializing IAST... [Parent] Context created: 0 [Parent] Tainted objects: 10 [Child PID 12346] Started after fork [Child] Free slots: 2 [Child] ✅ Context slots were reset (pthread_atfork worked!) [Parent] ✅ Child exited cleanly - Fork safety verified! ``` **What ASAN would report WITHOUT this fix:** ``` ==12346==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 ==12346==The signal is caused by a READ memory access. #0 0x7f... in get_tainted_object_map_by_ctx_id #1 0x7f... in debug_context_array_free_slots_number ``` (cherry picked from commit d1b4fd8)
IAST-enabled applications using Gunicorn/Uvicorn workers were experiencing segmentation faults (~33% crash rate on MCP streaming requests) due to memory corruption when processes fork.
Root Cause
taint_engine_context,initializer) initialized at module loadSolution
pthread_atforkhandler that automatically resets C++ global state in child processes after every fork:ASAN Verification
AddressSanitizer (ASAN) is a fast memory error detector that catches use-after-free, buffer overflows, and other memory corruption bugs at runtime.
1. Runtime Environment (No Recompilation Required)
The simplest way to test is using LD_PRELOAD with the system's libasan:
ASAN_OPTIONS explained:
detect_leaks=0- Disable leak detection (Python has many false positives)symbolize=1- Show human-readable stack tracesabort_on_error=0- Continue after first error (collect all errors)2. Build with ASAN (Optional, for deeper analysis)
For more thorough testing, compile the native extension with ASAN:
Minimal Reproduction Test
This script demonstrates the fork safety fix and can be used to verify ASAN finds no errors:
Run with ASAN:
Expected output (success):
What ASAN would report WITHOUT this fix: