fix: compile ffi from source per-runtime so AppSec loads on Ruby 3.2 #152
Closed
zarirhamza wants to merge 1 commit into
dd-trace-rb 2.30's AppSec component (which v3.28.0 bundled into all
Datadog-Ruby{3-2,3-3,3-4,4-0} layers) `require`s libddwaf → ffi → ffi_c
at boot when DD_APPSEC_ENABLED=true. The `ffi-1.17.4-x86_64-linux-gnu`
precompiled gem on rubygems ships ABI-specific subdirs
(`lib/3.3/ffi_c.so`, `lib/3.4/ffi_c.so`, etc.) and ffi 1.17 does NOT
include a Ruby 3.2 subdir. Result: every cold start on Datadog-Ruby3-2
crashes with
Init<LoadError>: cannot load such file -- ffi_c
…stranding any Ruby 3.2 Lambda customer who turns on AppSec.
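The failure mode above can be modeled in a few lines of Ruby. This is a simplified sketch, not ffi's actual loader source: `resolve_ffi_c`, `abi_dir`, and the `PRECOMPILED` / `SOURCE_BUILT` lists are hypothetical names introduced for illustration, and the assumption that a source build yields a single unversioned `ffi_c` is labeled in the code.

```ruby
# Simplified model of the lookup described above. This is NOT ffi's actual
# source; `available` is a hypothetical stand-in for the files in the gem.

# "3.2.8" -> "3.2": the ABI subdir name is the major.minor Ruby version.
def abi_dir(ruby_version)
  ruby_version[/\A\d+\.\d+/]
end

# Try the ABI-specific path first, then the bare name; raise if neither exists.
def resolve_ffi_c(ruby_version, available)
  candidates = ["#{abi_dir(ruby_version)}/ffi_c", "ffi_c"]
  found = candidates.find { |path| available.include?(path) }
  raise LoadError, "cannot load such file -- ffi_c" if found.nil?
  found
end

# Illustrative contents only, based on the subdirs named in this PR.
PRECOMPILED  = ["3.3/ffi_c", "3.4/ffi_c", "4.0/ffi_c"].freeze
# Assumption: a source build yields one unversioned ffi_c for the build Ruby.
SOURCE_BUILT = ["ffi_c"].freeze

puts resolve_ffi_c("3.4.1", PRECOMPILED)
puts resolve_ffi_c("3.2.8", SOURCE_BUILT)
begin
  resolve_ffi_c("3.2.8", PRECOMPILED)
rescue LoadError => e
  puts "LoadError: #{e.message}"
end
```

Under this model, Ruby 3.2 finds nothing in the precompiled layout, while a source build resolves through the unversioned fallback on any Ruby version.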
Force ffi to be (re)installed from source in the per-runtime builder
right after the datadog gem install. The build container is the
matching `ruby:X.Y` Docker image, so the resulting `ffi_c.so` is
compiled against the same Ruby ABI as the target Lambda runtime by
construction, regardless of which ABIs the rubygems precompiled bundle
chooses to ship.
Adds `make libffi-dev pkg-config` to apt-get so `gem install --platform
ruby` has the headers + linker it needs.
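One way to confirm the source build took effect would be a small load check run with the build container's own Ruby. The sketch below is hypothetical and not part of this PR; `loadable?` and `missing_features` are illustrative helper names, and only `Kernel#require` and `LoadError` are real Ruby APIs here.

```ruby
# Hypothetical post-install smoke check; not part of this PR.

# Returns true if `require feature` succeeds under the running Ruby.
def loadable?(feature)
  require feature
  true
rescue LoadError
  false
end

# Returns the subset of `features` that fail to load.
def missing_features(features)
  features.reject { |feature| loadable?(feature) }
end

# Possible use in the builder, after the ffi reinstall step:
#   missing = missing_features(%w[ffi])
#   abort("ruby #{RUBY_VERSION}: cannot load #{missing.join(', ')}") unless missing.empty?
```

Because `require 'ffi'` pulls in `ffi_c`, a missing or ABI-mismatched extension should surface here as a `LoadError` at build time rather than at Lambda init.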
Confirmed via single-lambda sandbox repro 2026-05-15:
- Datadog-Ruby3-2:28 + DD_APPSEC_ENABLED=true → Init<LoadError>
- Datadog-Ruby3-4:28 + DD_APPSEC_ENABLED=true → HTTP 200, "ok"
After this PR ships and the next prod layer publishes, the Ruby 3.2
case should also return HTTP 200. xfailed in
DataDog/serverless-e2e-tests#223 in the meantime.
Also defensive against future ffi releases dropping additional Ruby
ABIs from their precompiled bundle.
What does this PR do?
Fixes a boot crash on the `Datadog-Ruby3-2` layer when customers set `DD_APPSEC_ENABLED=true`. Every cold start currently fails with `Init<LoadError>: cannot load such file -- ffi_c`.
Motivation
The `v3.28.0` release (#149) globally bumped the bundled `datadog` (dd-trace-rb) gem from `2.12` → `2.30` so the layer could install on Ruby 4.0. dd-trace-rb 2.30 added a real AppSec component: when `DD_APPSEC_ENABLED=true`, the tracer's boot path is now libddwaf → ffi → ffi_c.
The precompiled `ffi-1.17.4-x86_64-linux-gnu` rubygems package ships ABI-specific subdirs (`lib/3.3/ffi_c.so`, `lib/3.4/ffi_c.so`, etc.), and `ffi 1.17` dropped Ruby 3.2 from its precompiled bundle. On Ruby 3.2, the loader's rescue fires, falls back to `require 'ffi_c'` (no subdir), fails again, and propagates `LoadError`. dd-trace-rb has a `rescue` in `appsec/component.rb:66` that catches the initial error, but the re-raise via the next `require 'datadog/auto_instrument'` is unrescued, killing the function at init.
The same combination works on Ruby 3.4 and Ruby 4.0 because their ABI subdirs are present in the precompiled bundle. Customer impact is bounded because AWS has deprecated Ruby 3.2 (no new function creation), but existing Ruby 3.2 Lambda functions are affected.
Fix
After the regular `gem install datadog` step, force-reinstall `ffi` from source in the per-runtime builder. The build container is the matching `ruby:X.Y` Docker image (passed via `--build-arg image=ruby:${1}` in `.gitlab/scripts/build_layer.sh`), so the resulting `ffi_c.so` is compiled against the same Ruby ABI as the target Lambda runtime by construction, for every Ruby version. This is also defensive against future `ffi` releases dropping additional Ruby ABIs.
`apt-get install make libffi-dev pkg-config` is added so the source build has the headers and build tools it needs.
Testing Guidelines
Same flow as `v3.28.0`:
- Run `publish layer sandbox (3.2, amd64)` (and the other 7 if you want full coverage) to publish test layers to the sandbox account `425362996713`.
- Deploy a Lambda function with `DD_APPSEC_ENABLED=true` + the sandbox test layer. The exact repro script that produced the original `Init<LoadError>` is at `/tmp/appsec-repro/` on `zarir.hamza`'s laptop and reproducible in ~3 minutes (deploy → invoke once → check CloudWatch → delete). Expected after this fix: `StatusCode: 200`, `"ok"`, no `ffi_c` LoadError in logs.
Integration tests in this PR should continue to pass: they don't set `DD_APPSEC_ENABLED=true`, so they don't exercise the libddwaf load path.
Additional Notes
Layer size impact
Source-built `ffi` is roughly comparable in size to the precompiled variant: the precompiled gem ships multiple ABI subdirs, while the source build ships exactly one. Net size delta should be small (likely ~1–2 MB reduction); `check_layer_size.sh` will catch any regression.
CI time impact
Building `ffi` from source adds ~30s per `(ruby_version, arch)` build, so ~4 minutes total across the 8 combos. Acceptable.
Why not pin `ffi` to a version that still ships Ruby 3.2 binaries?
`libddwaf 1.30.x` has a transitive dependency range on `ffi` that we'd have to keep manually aligned, and pinning `ffi` would freeze a known-CVE-able transitive dep. Source-compiling is more robust and future-proof.
Coordinated with
DataDog/serverless-e2e-tests#223: `(ruby32, appsec-tracer)` xfails for the 7 affected lambda-features tests. Once this PR lands and the prod `Datadog-Ruby3-2` layer republishes, those xfails turn into xpasses, and the xfail markers need to be removed.