
fix: compile ffi from source per-runtime so AppSec loads on Ruby 3.2#152

Closed
zarirhamza wants to merge 1 commit into main from zarir.hamza/fix-ruby32-appsec-ffi

@zarirhamza
Contributor

What does this PR do?

Fixes a boot crash on the Datadog-Ruby3-2 layer when customers set DD_APPSEC_ENABLED=true. Every cold start currently fails with:

Init<LoadError>: cannot load such file -- ffi_c

Motivation

The v3.28.0 release (#149) globally bumped the bundled datadog (dd-trace-rb) gem from 2.12 to 2.30 so the layer could install on Ruby 4.0. dd-trace-rb 2.30 added a real AppSec component. With DD_APPSEC_ENABLED=true, the tracer's boot path is now:

require 'datadog/lambda'                                 (user handler)
→ Datadog::Lambda.configure_apm
  → Datadog.configure
    → AppSec::Component.build_appsec_component             (NEW in 2.30)
      → require 'libddwaf-1.30.0.0.2-x86_64-linux/libddwaf'
        → require 'ffi-1.17.4-x86_64-linux-gnu/ffi'
          → require 'ffi_c'                              💥 LoadError on Ruby 3.2

The precompiled ffi-1.17.4-x86_64-linux-gnu rubygems package ships ABI-specific subdirs:

gems/ffi-1.17.4-x86_64-linux-gnu/
  lib/
    ffi.rb                  (loader → require "#{RUBY_VERSION[0..2]}/ffi_c")
    3.3/ffi_c.so            ✓
    3.4/ffi_c.so            ✓
    (no 3.2 subdir)         ← problem

ffi 1.17 dropped Ruby 3.2 from its precompiled bundle. On Ruby 3.2, the versioned require fails, the loader's rescue falls back to require 'ffi_c' (no subdir), and that fails too, so the LoadError propagates. dd-trace-rb has a rescue in appsec/component.rb:66 that catches the initial error, but the error is raised again by the next require 'datadog/auto_instrument', where nothing rescues it, killing the function at init.
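
The two-step lookup described above can be modelled without touching the real load path. This is a minimal sketch, not the ffi gem's actual loader: `resolve_ffi_c`, `PRECOMPILED`, and `SOURCE_BUILT` are hypothetical names, the `available` list stands in for files on disk, and the top-level `ffi_c` entry is an assumption about what a source build produces.

```ruby
# Simplified model of the loader's versioned require plus fallback.
# No real `require` happens; `available` lists the loadable feature names.
def resolve_ffi_c(ruby_version, available)
  abi = ruby_version[0..2]                 # "3.2.4" -> "3.2", as in the loader line quoted above
  versioned = "#{abi}/ffi_c"
  return versioned if available.include?(versioned)

  # Fallback mirrors the loader's rescue: try the unversioned extension.
  return "ffi_c" if available.include?("ffi_c")

  raise LoadError, "cannot load such file -- ffi_c"
end

PRECOMPILED  = ["3.3/ffi_c", "3.4/ffi_c"].freeze  # subdirs listed in this PR
SOURCE_BUILT = ["ffi_c"].freeze                    # assumed layout of a source build

resolve_ffi_c("3.4.1", PRECOMPILED)   # => "3.4/ffi_c"
resolve_ffi_c("3.2.4", SOURCE_BUILT)  # => "ffi_c"
# resolve_ffi_c("3.2.4", PRECOMPILED) raises LoadError
```

In this model the Ruby 3.2 failure comes purely from the missing subdir: the fallback only helps when an unversioned `ffi_c` exists.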

The same combination works on Ruby 3.4 and Ruby 4.0 because their ABI subdirs are present in the precompiled bundle. Customer impact is bounded because AWS has deprecated Ruby 3.2 (no new function creation), but existing Ruby 3.2 Lambda functions are affected.

Fix

After the regular gem install datadog step, force-reinstall ffi from source in the per-runtime builder. The build container is the matching ruby:X.Y Docker image (passed via --build-arg image=ruby:${1} in .gitlab/scripts/build_layer.sh), so the resulting ffi_c.so is compiled against the same Ruby ABI as the target Lambda runtime by construction, for every Ruby version. This also guards against future ffi releases dropping additional Ruby ABIs.

Adds make, libffi-dev, and pkg-config to the apt-get install step so the source build has the headers and build tools it needs.
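
One way to sanity-check a built layer is to look for an ffi_c extension that the target Ruby could load. This is a minimal sketch, not part of this PR: `loadable_ffi_c` is a hypothetical helper, it only inspects the gem's lib/ directory, and it assumes a source build leaves ffi_c at the top level of lib/ while the precompiled gem uses per-ABI subdirs.

```ruby
require "rbconfig"

# Returns the files under an installed ffi gem's lib/ directory that the
# given Ruby could load as ffi_c: the versioned subdir first, then the
# top-level file a source build is assumed to produce.
def loadable_ffi_c(gem_lib_dir, ruby_version: RUBY_VERSION, dlext: RbConfig::CONFIG["DLEXT"])
  abi = ruby_version[/\d+\.\d+/]           # "3.2.4" -> "3.2"
  candidates = [
    File.join(gem_lib_dir, abi, "ffi_c.#{dlext}"),
    File.join(gem_lib_dir, "ffi_c.#{dlext}")
  ]
  candidates.select { |path| File.file?(path) }
end
```

Run inside the build container, an empty result for the ffi gem's lib/ directory would indicate the layer would hit the same LoadError at init.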

Testing Guidelines

Same flow as v3.28.0:

  1. Wait for build + integration tests in this PR's gitlab pipeline.
  2. Manually click publish layer sandbox (3.2, amd64) (and the other 7 if you want full coverage) to publish test layers to the sandbox account 425362996713.
  3. Smoke-test via a single throwaway Ruby 3.2 lambda + DD_APPSEC_ENABLED=true + the sandbox test layer. The exact repro script that produced the original Init<LoadError> is at /tmp/appsec-repro/ on zarir.hamza's laptop and reproducible in ~3 minutes (deploy → invoke once → check CloudWatch → delete). Expected after this fix: StatusCode: 200, "ok", no ffi_c LoadError in logs.
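
The pass criteria in step 3 can be written down as a small check. This is a sketch only: `smoke_test_passed?` is a hypothetical helper, not the repro script mentioned above, and it assumes the status code, payload, and CloudWatch log lines have already been collected.

```ruby
# Hypothetical helper: decides whether one smoke-test invocation passed,
# given the invoke status code, the returned payload, and the log lines.
def smoke_test_passed?(status_code:, payload:, log_lines:)
  # Any log line mentioning both LoadError and ffi_c counts as the original failure.
  ffi_failure = log_lines.any? { |line| line.include?("LoadError") && line.include?("ffi_c") }
  status_code == 200 && payload.include?("ok") && !ffi_failure
end

smoke_test_passed?(
  status_code: 200,
  payload: '"ok"',
  log_lines: ["INIT_START Runtime Version: ruby:3.2"]  # illustrative log line
) # => true
```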

Integration tests in this PR should continue to pass — they don't set DD_APPSEC_ENABLED=true so they don't exercise the libddwaf load path.

Additional Notes

Layer size impact

Source-built ffi is roughly comparable in size to the precompiled variant — the precompiled gem ships multiple ABI subdirs, the source build ships exactly one. Net size delta should be small (likely ~1–2 MB reduction); check_layer_size.sh will catch any regression.

CI time impact

Building ffi from source adds ~30s per (ruby_version, arch) build, so ~4 minutes total across the 8 combos. Acceptable.

Why not pin ffi to a version that still ships Ruby 3.2 binaries?

libddwaf 1.30.x has a transitive dependency range on ffi that we'd have to keep manually aligned, and pinning ffi would freeze a transitive dependency at a version that may later need security fixes. Source-compiling is more robust and future-proof.

Coordinated with

  • DataDog/serverless-e2e-tests#223 added (ruby32, appsec-tracer) xfails for the 7 affected lambda-features tests. Once this PR lands and the prod Datadog-Ruby3-2 layer is republished, those xfails will turn into xpasses and need to be removed.

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests (integration tests exercise the standard boot path; AppSec-specific verification documented in Testing Guidelines)
  • This PR collects user input/sensitive content into Datadog

dd-trace-rb 2.30's AppSec component (which v3.28.0 bundled into all
Datadog-Ruby{3-2,3-3,3-4,4-0} layers) `require`s libddwaf → ffi → ffi_c
at boot when DD_APPSEC_ENABLED=true. The `ffi-1.17.4-x86_64-linux-gnu`
precompiled gem on rubygems ships ABI-specific subdirs
(`lib/3.3/ffi_c.so`, `lib/3.4/ffi_c.so`, etc.) and ffi 1.17 does NOT
include a Ruby 3.2 subdir. Result: every cold start on Datadog-Ruby3-2
crashes with

    Init<LoadError>: cannot load such file -- ffi_c

…stranding any Ruby 3.2 Lambda customer who turns on AppSec.

Force ffi to be (re)installed from source in the per-runtime builder
right after the datadog gem install. The build container is the
matching `ruby:X.Y` Docker image, so the resulting `ffi_c.so` is
compiled against the same Ruby ABI as the target Lambda runtime by
construction, regardless of which ABIs the rubygems precompiled bundle
chooses to ship.

Adds `make libffi-dev pkg-config` to apt-get so `gem install --platform
ruby` has the headers + linker it needs.

Confirmed via single-lambda sandbox repro 2026-05-15:

- Datadog-Ruby3-2:28 + DD_APPSEC_ENABLED=true → Init<LoadError>
- Datadog-Ruby3-4:28 + DD_APPSEC_ENABLED=true → HTTP 200, "ok"

After this PR ships and the next prod layer publishes, the Ruby 3.2
case should also return HTTP 200. xfailed in
DataDog/serverless-e2e-tests#223 in the meantime.

Also defensive against future ffi releases dropping additional Ruby
ABIs from their precompiled bundle.
@zarirhamza zarirhamza requested review from a team as code owners May 15, 2026 16:34
@zarirhamza zarirhamza marked this pull request as draft May 15, 2026 16:37
@zarirhamza zarirhamza closed this May 18, 2026