Skip to content

ci: revert test-elixir cache path fix#5831

Merged
max-sixty merged 1 commit intomainfrom
fix/issue-5830
Apr 26, 2026
Merged

ci: revert test-elixir cache path fix#5831
max-sixty merged 1 commit intomainfrom
fix/issue-5830

Conversation

@prql-bot
Copy link
Copy Markdown
Collaborator

Problem

After #5826 merged at 08:44 UTC and ran on main (cache miss → fresh build → cache saved), the next run on main (the 11:04 UTC nightly, run 24955057135) hit the cache and failed:

** (UndefinedFunctionError) function PRQL.Native.compile/2 is undefined (module PRQL.Native is not available)
Failed to load NIF library: '.../prqlc/bindings/elixir/_build/test/lib/prql/priv/native/prql.so: cannot open shared object file: No such file or directory

Root cause

The _build cache from #5826 doesn't include the Rustler-built NIF. By default Mix symlinks _build/<env>/lib/<app>/priv to priv/ in the project root, and priv/ sits outside the cached path. The saved cache is only ~288KB (beam files only — a real NIF would be multi-MB).

On a cache hit:

  1. _build/ is restored (beam files + dangling priv symlink)
  2. priv/native/prql.so is gone
  3. mix compile sees beam files as up-to-date → skips Rustler → no rebuild
  4. mix test tries to load the NIF → ENOENT

The deps cache is fine on its own, but #5826 added them as a pair and the _build half is the broken one.

Solution

Revert #5826. This restores the prior state where the cache silently never restored — slow but correct. A proper re-add needs:

  • Either include prqlc/bindings/elixir/priv in the cache paths, and make the cache key sensitive to Rust source (native/**, the prqlc crate sources) so a stale prql.so doesn't get reused when Rust changes;
  • Or drop just the _build cache and keep the deps cache, accepting a fresh ~1.5 min Rust rebuild every run.

Leaving that design call to a maintainer.

Testing

The verification is the next nightly (or any tests run on main after this lands) showing test-elixir green again — same posture as pre-#5826, since this restores the exact prior workflow.


Closes #5830 — automated triage

Reverts #5826. The corrected cache paths exposed a latent bug: the
`_build` cache restores beam files but not the Rustler-built NIF
(`prql.so`), because `_build/test/lib/prql/priv` is a symlink to
`priv/` outside the cached path. After the cache hit, `mix compile`
sees beam files as up-to-date and skips Rustler, leaving the test
runtime with no NIF to load.

Reverting restores the prior state where the cache silently never
restored — slow but correct. A maintainer can redesign caching with
attention to the priv/ symlink and Rust-source-aware cache keys.

Closes #5830

Co-Authored-By: Claude <noreply@anthropic.com>
@prql-bot prql-bot mentioned this pull request Apr 26, 2026
@prql-bot
Copy link
Copy Markdown
Collaborator Author

A separate ci-fix session landed on the same root cause but went forward instead of back. Posting the diff here so a maintainer can pick between the two — happy with whichever lands.

The forward fix is also small: cache priv/ alongside _build, bump the cache name to invalidate the stale entry from #5826, and add Cargo.lock to the key so Rust dep changes invalidate the build cache.

--- a/.github/workflows/test-elixir.yaml
+++ b/.github/workflows/test-elixir.yaml
@@ -68,19 +68,29 @@ jobs:
       # Step: Define how to cache the `_build` directory. After the first run,
       # this speeds up tests runs a lot. This includes not re-compiling our
       # project's downloaded deps every run.
+      #
+      # Also cache `priv/`: Rustler builds the NIF via cargo into the workspace
+      # `target/` and then copies it to `priv/native/prql.so`. `_build` only
+      # holds Elixir artifacts (and a symlink to `priv`), so without caching
+      # `priv` the restored cache is missing the .so but mix still considers
+      # the project compiled and skips the rebuild.
+      #
+      # Cargo.lock is in the key because the .so depends on Rust deps; an
+      # update there should invalidate the build cache.
       - name: Cache compiled build
         id: cache-build
         uses: actions/cache@v4
         env:
-          cache-name: cache-compiled-build
+          cache-name: cache-compiled-build-v2
         with:
-          path: prqlc/bindings/elixir/_build
+          path: |
+            prqlc/bindings/elixir/_build
+            prqlc/bindings/elixir/priv
           key:
             ${{ runner.os }}-mix-${{ env.cache-name }}-${{
-            hashFiles('**/mix.lock') }}
+            hashFiles('**/mix.lock', '**/Cargo.lock') }}
           restore-keys: |
             ${{ runner.os }}-mix-${{ env.cache-name }}-
-            ${{ runner.os }}-mix-

Trade-off vs. the revert here: the revert is a one-line round-trip that gets CI green tonight and leaves caching for later. The forward fix keeps caching working — the elixir job currently spends ~2 min on the Rust build, so reverting concedes that on every run, including for the deps cache half (which was always correct on its own; #5826 just bundled it with the broken _build half).

Source for the priv/native location: the successful PR run for #5826 logged Copying /home/runner/work/prql/prql/target/release/libprql.so to priv/native/prql.so (job 73060377548) — the priv/ is relative to the job's working-directory prqlc/bindings/elixir/, not under _build/, which is why _build-only caching loses it. Also dropped the loose ${{ runner.os }}-mix- restore-key as a small drive-by — it would cross-restore the deps cache into the build slot.

@max-sixty max-sixty merged commit 8ff3a61 into main Apr 26, 2026
41 checks passed
@max-sixty max-sixty deleted the fix/issue-5830 branch April 26, 2026 16:16
@max-sixty
Copy link
Copy Markdown
Member

do a full fix and link from here please

@prql-bot
Copy link
Copy Markdown
Collaborator Author

Forward fix opened as #5832 — caches priv/ alongside _build, bumps the cache name, and includes Cargo.lock + prqlc/**/*.rs in the build cache key so the .so is rebuilt whenever any Rust code that produces it changes. Drops restore-keys on the build cache to avoid partial-prefix restores delivering a stale .so. Description there has full reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Nightly tests failed

2 participants