android: patch DT_NEEDED on GPU samplers so they actually load (fixes #270) #271

Open
prithidevghosh wants to merge 1 commit into DenisovAV:main from prithidevghosh:fix/android-sampler-dt-needed

prithidevghosh commented May 12, 2026

Summary

Fixes #270. The two GPU sampler .so files that flutter_gemma ships in prebuilt/android_arm64/ (libLiteRtTopKOpenClSampler.so and libLiteRtTopKWebGpuSampler.so) silently fail to dlopen at runtime: they reference LiteRtCreateEnvironment (and related symbols) as undefined but declare no NEEDED dependency on the library that provides them. The engine then falls back to CPU sampling, which costs roughly 3× in end-to-end decode throughput on Gemma 4 E2B INT4 (~3 tok/s vs ~8.7 tok/s; upstream google-ai-edge/LiteRT-LM#2211 measured a 2.87× speedup after the equivalent patch).

Fix

One new step (8b.) in native/litert_lm/build_android.sh: after copying Google's prebuilt companion .so files into prebuilt/android_arm64/, run patchelf --add-needed libLiteRtLm.so on both samplers. The verification step at the end of the script now also prints the post-patch DT_NEEDED list, so the fix is visible in CI logs.

The upstream workaround in google-ai-edge/LiteRT-LM#2211 targets libLiteRt.so. This distribution links LiteRt symbols statically into the rebuilt libLiteRtLm.so (see the comment above the bazelisk build, lines 96–102), so the correct target here is libLiteRtLm.so. I verified with llvm-readelf --dyn-syms that the existing libLiteRtLm.so build exports LiteRtCreateEnvironment with GLOBAL binding and DEFAULT visibility, so adding it as NEEDED is sufficient and no other changes are required.
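The export check described above can be scripted along these lines. This is a minimal sketch: the helper name is_exported is hypothetical, and it assumes the GNU-style column layout that llvm-readelf --dyn-syms prints (Num, Value, Size, Type, Bind, Vis, Ndx, Name) and an unversioned symbol name.

```bash
# Hypothetical helper, not part of build_android.sh.
# Reads `llvm-readelf --dyn-syms` style output on stdin and succeeds only if
# symbol $1 is defined (Ndx is not UND) with GLOBAL binding and DEFAULT visibility.
is_exported() {
  awk -v sym="$1" '
    $8 == sym && $5 == "GLOBAL" && $6 == "DEFAULT" && $7 != "UND" { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage (path is illustrative):
#   llvm-readelf --dyn-syms prebuilt/android_arm64/libLiteRtLm.so \
#     | is_exported LiteRtCreateEnvironment && echo "exported"
```

Run against one of the samplers instead, the same helper should fail, since there the symbol appears with Ndx UND.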

patchelf is added as a host-side build prerequisite (brew install patchelf / apt-get install patchelf); the script exits early with a clear message if it's missing. The --add-needed step is idempotent — it skips if libLiteRtLm.so is already in the NEEDED list, which lets the script be re-run safely.
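A rough sketch of what such an idempotent step could look like follows. The function names add_needed_once and patch_samplers are hypothetical and are not taken from build_android.sh; the only patchelf options used are --print-needed and --add-needed.

```bash
# Hypothetical helper: add $2 as a DT_NEEDED entry of $1 unless already listed.
add_needed_once() {
  local so="$1" dep="$2" needed
  needed="$(patchelf --print-needed "$so")" || return 1
  if printf '%s\n' "$needed" | grep -qxF -- "$dep"; then
    echo "skip: $so already lists $dep"
  else
    patchelf --add-needed "$dep" "$so" || return 1
    echo "patched: $so now lists $dep"
  fi
}

# Hypothetical driver: patch both GPU samplers found in directory $1.
patch_samplers() {
  local dir="$1" so
  if ! command -v patchelf >/dev/null 2>&1; then
    echo "error: patchelf not found (brew install patchelf / apt-get install patchelf)" >&2
    return 1
  fi
  for so in libLiteRtTopKOpenClSampler.so libLiteRtTopKWebGpuSampler.so; do
    add_needed_once "$dir/$so" libLiteRtLm.so || return 1
  done
}

# Usage (path is illustrative):
#   patch_samplers prebuilt/android_arm64
```

Checking the existing NEEDED list first is what makes a second run a no-op; patchelf itself is only invoked with --add-needed when the entry is missing.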

Test plan

  • CI build runner has patchelf installed (or installs it before running build_android.sh).
  • After build, llvm-readelf -d prebuilt/android_arm64/libLiteRtTopKOpenClSampler.so | grep NEEDED shows libLiteRtLm.so.
  • Same for libLiteRtTopKWebGpuSampler.so.
  • On a real Android arm64 device with flutter_gemma's example app + Gemma 4 E2B INT4, logcat no longer prints "OpenCL sampler not available" / "WebGPU sampler not available" / "GPU sampler unavailable. Falling back to CPU sampling.".
  • Measured decode rate on Gemma 4 E2B INT4 returns to GPU-sampling speeds (≈8–9 tok/s on flagship arm64 hardware, vs. the ~3 tok/s seen with the regression).
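
The two llvm-readelf checks in the plan above could be automated roughly as follows. This is a sketch only: has_needed and verify_samplers are hypothetical names, and the parsing assumes the usual `(NEEDED) Shared library: [name]` line format of llvm-readelf -d.

```bash
# Hypothetical helper: reads `llvm-readelf -d` output on stdin and succeeds
# if library $1 appears as a NEEDED entry.
has_needed() {
  grep -F '(NEEDED)' | grep -qF -- "[$1]"
}

# Hypothetical driver: check both samplers in directory $1; returns non-zero
# if either one is missing the libLiteRtLm.so dependency.
verify_samplers() {
  local dir="$1" so status=0
  for so in libLiteRtTopKOpenClSampler.so libLiteRtTopKWebGpuSampler.so; do
    if llvm-readelf -d "$dir/$so" | has_needed libLiteRtLm.so; then
      echo "ok: $so"
    else
      echo "MISSING libLiteRtLm.so in $so" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (path is illustrative):
#   verify_samplers prebuilt/android_arm64
```

A non-zero return value makes it easy to fail a CI job when the patch step was skipped or did not take effect.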

Notes for the maintainer

  • 16KB page alignment (-Wl,-z,max-page-size=16384) is preserved: patchelf --add-needed only touches the dynamic section (PT_DYNAMIC), not PT_LOAD alignment. Recent patchelf (≥0.18) handles this correctly.
  • A long-term fix belongs in upstream LiteRT-LM's Bazel linkopts; google-ai-edge/LiteRT-LM#2211 is still open. When upstream lands the proper fix, this patchelf step becomes a no-op (the idempotency check short-circuits it) and can be removed.
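
The 16 KB alignment note above can be spot-checked after patching with something like the sketch below. The helper name loads_16k_aligned is hypothetical, and the parsing assumes that each LOAD program header is printed on a single line with the alignment as the last column, as llvm-readelf -l does for 64-bit ELF files.

```bash
# Hypothetical helper: reads `llvm-readelf -l` output on stdin and succeeds
# only if at least one LOAD segment is present and every LOAD segment reports
# an alignment of exactly 0x4000 (16384 bytes). Larger alignments are treated
# as a failure here to keep the check strict.
loads_16k_aligned() {
  awk '
    $1 == "LOAD" { seen = 1; if ($NF != "0x4000") bad = 1 }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Usage (path is illustrative):
#   llvm-readelf -l prebuilt/android_arm64/libLiteRtTopKOpenClSampler.so \
#     | loads_16k_aligned && echo "16 KB alignment preserved"
```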

gh pr create --repo DenisovAV/flutter_gemma --base main --head prithidevghosh:fix/android-sampler-dt-needed --title "android: patch DT_NEEDED on GPU samplers so they actually load (fixes #270)" --body-file /tmp/pr_body.md 2>&1 | tail -5

The two sampler .sos in upstream LiteRT-LM's prebuilt/android_arm64/
(libLiteRtTopKOpenClSampler.so, libLiteRtTopKWebGpuSampler.so) reference
LiteRtCreateEnvironment as undefined but declare no NEEDED dependency on
the library that provides it. Bionic's per-library linker namespace then
refuses to resolve the symbol at dlopen, the samplers fail to load, and
the engine silently falls back to CPU sampling.

Measured impact in the wild: ~3 tok/s decode on Gemma 4 E2B INT4 instead
of ~8.7 tok/s with GPU sampling (upstream google-ai-edge/LiteRT-LM#2211
reports a 2.87x speedup after the equivalent patch).

This distribution links LiteRt symbols statically into the rebuilt
libLiteRtLm.so, so unlike the upstream workaround (which adds
libLiteRt.so) we add libLiteRtLm.so as the NEEDED dependency. patchelf
is added as a host-side build dep (brew/apt).

Fixes DenisovAV#270.
Development

Successfully merging this pull request may close these issues.

Android: GPU samplers silently fall back to CPU — libLiteRtTopK{OpenCl,WebGpu}Sampler.so missing DT_NEEDED libLiteRtLm.so