[PRI-280] Nightly notebook install check #901
Conversation
Catches the failure mode that caused PRI-280: a transitive dependency in one of the example notebooks silently caps `tabpfn` below the latest PyPI release, so Colab users get an outdated model. Two files:

- tests/test_notebook_installs.py
  Parses pip-install lines from each `examples/notebooks/*.ipynb` and reproduces the install sequence in a fresh per-notebook venv. Asserts the resolved `tabpfn` version matches the current latest on PyPI. Skipped by default; enabled via RUN_NOTEBOOK_INSTALL_CHECK=1.
  Sets UV_NO_CONFIG=1 in the subprocess env so any [tool.uv] config in the repo's pyproject.toml (e.g. `exclude-newer = "7 days"`) doesn't bleed into the simulated user environment. Without this, the test would produce false positives when a freshly released dep is excluded by exclude-newer and the resolver falls back to a stale capped version.
  Network failures during install or PyPI metadata fetch are treated as infra issues (test skip, not fail) so a flaky PyPI doesn't open false tracking issues from the nightly run.
- .github/workflows/nightly-notebook-check.yml
  Daily cron at 07:17 UTC + workflow_dispatch. Runs the test against the live PyPI index. On failure, opens or comments on a tracking GitHub issue with the `nightly-failure` label — scheduled-run failures don't notify maintainers by default, so we make the failure visible by surfacing it in the issue tracker.
  `pytest --noconftest` is used because tests/conftest.py imports numpy/torch and the runner here doesn't install them (the test creates its own per-notebook venvs).

Verified end-to-end locally: with the cap fix in tabpfn-extensions 0.3.0 now live on PyPI, both notebooks resolve `tabpfn==7.1.1` and the test passes. With UV_NO_CONFIG omitted, the test correctly fails because TabPFN's repo-local `exclude-newer = "7 days"` excludes the 3-day-old 0.3.0 release and the resolver falls back to 0.2.2 → 6.4.1. That negative test confirms the assertion catches the real bug class.
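The parsing step described above can be sketched roughly as follows. This is illustrative only (`pip_install_args` is a hypothetical name, and the real test reads `.ipynb` files from disk and replays each install in a fresh uv venv with UV_NO_CONFIG=1 set):

```python
import json
import re


def pip_install_args(notebook_json: str) -> list[str]:
    """Extract the arguments of `pip install ...` lines from a notebook's code cells.

    Sketch only: takes the notebook's JSON text rather than a file path.
    """
    nb = json.loads(notebook_json)
    args = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for line in cell.get("source", []):
            # Colab-style installs look like `!pip install tabpfn` or `%pip install -q foo`
            m = re.match(r"^\s*[!%]\s*pip install (.+?)\s*$", line)
            if m:
                args.append(m.group(1))
    return args
```

Each extracted argument string would then be handed to `uv pip install` inside the per-notebook venv.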
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
This pull request introduces a new test suite in tests/test_notebook_installs.py designed to verify that example notebooks correctly resolve the tabpfn package to its latest PyPI release. The test simulates a fresh installation environment using uv to detect potential transitive dependency conflicts. Feedback focuses on optimizing the PyPI version lookup through caching and improving the portability of the test suite by handling cross-platform path differences for virtual environments.
- Suppress S603/S607 on subprocess calls (uv args come from controlled notebook source, and uv is intentionally found via PATH).
- Suppress S310 on the urlopen (hardcoded https://pypi.org URL, no scheme ambiguity) — same pattern as src/tabpfn/model_loading.py.
- Cast the JSON-loaded version string to str so mypy stops complaining about returning Any.
- Add explicit check=False on the install subprocess (the return code is inspected manually).
- Wrap the over-long stderr-tail line.
- Unquote the cron expression to satisfy yamllint.
- Apply ruff format.

Verified: `ruff check` + `ruff format --check` both pass; the test still passes locally with RUN_NOTEBOOK_INSTALL_CHECK=1.
- @functools.lru_cache on _latest_tabpfn_version: hit PyPI once per session instead of once per notebook. Also keeps the assertion stable if a tabpfn release happens to land mid-test-session.
- Cross-platform PATH/bin construction: use os.pathsep and `Scripts` on Windows. CI is ubuntu-only, but local-on-Windows now works.
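A minimal sketch of both changes (helper names here are illustrative, not the test's actual identifiers):

```python
import functools
import json
import os
from pathlib import Path
from urllib.request import urlopen


@functools.lru_cache(maxsize=None)
def latest_pypi_version(package: str = "tabpfn") -> str:
    """Hit PyPI once per session; repeat calls return the cached value,
    so the asserted version stays stable even if a release lands mid-run."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:  # noqa: S310
        return str(json.load(resp)["info"]["version"])


def venv_bin_dir(venv: Path) -> Path:
    """Windows venvs put executables under Scripts/, POSIX venvs under bin/."""
    return venv / ("Scripts" if os.name == "nt" else "bin")


# When spawning commands inside the venv, prepend its bin dir portably:
#   env["PATH"] = str(venv_bin_dir(venv)) + os.pathsep + env["PATH"]
```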
Previously the test caught network-shaped errors and converted them to pytest.skip — to avoid false-positive nightly-failure issues from transient PyPI blips. In practice this hides real problems behind a "skipped" status and adds non-trivial complexity (retry loops, backoff, NETWORKY string matching, an "unreachable" RuntimeError fallback).

New approach: any error is a test failure. The workflow's failure-issue filer catches it; the maintainer triages and reruns via workflow_dispatch if it was just a transient network issue. One click vs. 30 lines of defensive code. Concretely:

- Drop the retry loop in _latest_tabpfn_version; one urlopen, raise on failure.
- Drop the NETWORKY constant + the network-error-detection branch in the install loop.
- Use check=True on the install subprocess and drop capture_output — uv's progress now streams directly to the GH Actions log, which is more useful for debugging than a captured stderr tail.
- Drop the now-unused imports: time, urllib.error.

13 insertions, 39 deletions.
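The simplified install step amounts to something like this (a sketch, assuming a hypothetical `run_streaming` wrapper rather than the test's actual code):

```python
import subprocess
import sys


def run_streaming(cmd: list[str]) -> None:
    """Run a command with stdout/stderr inherited from the parent process.

    No capture_output: progress goes straight to the CI log. check=True turns
    any non-zero exit into CalledProcessError, which pytest reports as a
    failure -- no manual returncode inspection, no skip-on-network branch.
    """
    subprocess.run(cmd, check=True)


# e.g. run_streaming(["uv", "pip", "install", *install_args]) inside the test loop
```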
- Drop the speculative "likely cause" line from the assertion message and the issue body. Just state that the installed version doesn't match PyPI latest; let the maintainer triage.
- Drop the "catches transitive caps" framing from the test docstring and workflow header — describe what the workflow does, not what it's supposed to catch.
- Add changelog/901.added.md so the towncrier changelog check passes.
- Remove the "skip if no install line mentions tabpfn" check. It's half-broken (matches tabpfn-client, which doesn't install tabpfn) and defends against a scenario that shouldn't exist: an example notebook that doesn't install tabpfn. If that ever happens, the test failing loudly is the right outcome.
- Drop --python 3.12 from `uv venv`. The workflow's actions/setup-python step already pins the runner Python to 3.12, so the flag was duplicate configuration. Keeping the version in one place means future bumps (e.g. when Colab moves off 3.12) only need a single edit.
Catches Python-version-specific resolver outcomes (e.g. a transitive dep with `python_version`-conditional pins, or a wheel that ships for some versions but not others). Also future-proofs against Colab moving off 3.12 — the matrix will catch a regression on the new version without requiring a workflow edit.

- strategy.matrix.python-version: ["3.11", "3.12", "3.13"]
- fail-fast: false, so 3.11 failing doesn't hide the 3.12 / 3.13 results.
- Each failing matrix cell files or comments on the same shared tracking issue, with the Python version included in the body for triage.

Documented race condition: if all three matrix cells fail simultaneously and all see an empty issue list, multiple issues may get filed. Acceptable — the maintainer closes duplicates.
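The matrix shape described above would look roughly like this in the workflow (a sketch; the step layout and job name are assumed, not copied from the PR):

```yaml
jobs:
  notebook-install-check:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false                      # report all versions, even if one fails
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: RUN_NOTEBOOK_INSTALL_CHECK=1 pytest --noconftest tests/test_notebook_installs.py -v
```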
Previously the issue body only had the run URL and Python version. The
maintainer had to click through to the GH Actions log to see which
notebook failed and what version was installed.
Now the test step tees pytest output to pytest_output.txt (alongside the
live stream to the runner log). The github-script step reads the file
and extracts lines starting with "FAILED " — which pytest -v prints in
the short summary section, including the test ID (with notebook name)
and the full assertion message. Both are factual content, not
speculation.
Example body the maintainer would see:
Run: https://...
Python: 3.13
```
FAILED tests/test_notebook_installs.py::...[TabPFN_Demo_Local.ipynb]
- AssertionError: TabPFN_Demo_Local.ipynb: installed tabpfn==6.4.1,
PyPI latest is 7.1.1.
```
See logs for full output.
If the file is missing or contains no FAILED lines (e.g. a step crashed
before pytest ran), the body falls back to a "(no FAILED lines parsed;
see logs)" placeholder.
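The real extraction runs in the workflow's github-script step (JavaScript); the same logic, sketched in Python with an illustrative helper name:

```python
from pathlib import Path

FALLBACK = "(no FAILED lines parsed; see logs)"


def failure_summary(output_file: Path) -> str:
    """Pull pytest's short-summary `FAILED ...` lines out of the teed log."""
    try:
        text = output_file.read_text()
    except OSError:
        # step crashed before pytest ran, so no output file exists
        return FALLBACK
    failed = [ln for ln in text.splitlines() if ln.startswith("FAILED ")]
    return "\n".join(failed) if failed else FALLBACK
```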
To be reverted before merge. Lets us exercise the workflow end-to-end on a GitHub-hosted runner (matrix across Python 3.11/3.12/3.13) without waiting for the nightly cron. The `paths:` filter prevents the workflow from firing on unrelated PRs if this commit accidentally lands on main.
Was added to validate the workflow on a PR runner before merge. Now that we've seen it run end-to-end (https://github.com/PriorLabs/TabPFN/actions/runs/25063167744), revert to the schedule + workflow_dispatch shape.
bejaeger
left a comment
Thanks for working on this! Looks mostly good to me, just some suggestions.
Also, do you have an estimate of how long that test would run?
```js
owner: context.repo.owner,
repo: context.repo.repo,
state: "open",
labels: ["nightly-failure"],
```
is that label already added?
```js
  if (lines.length > 0) failedLines = lines.join("\n");
} catch (e) { /* output file missing — fall through to default */ }

const title = "Nightly: notebook install check failed";
```
Let's add the python version to this title? Otherwise we get three issues with the same title if all python versions fail.
Summary
Adds a nightly workflow that reproduces every `examples/notebooks/*.ipynb` install sequence in a fresh venv across Python 3.11 / 3.12 / 3.13, and asserts `tabpfn` resolves to PyPI's latest. Failures open or comment on a tracking issue (no email by default for scheduled runs). Catches the regression class behind PRI-280 — a transitive dep capping `tabpfn` below the latest release.

Files

- `tests/test_notebook_installs.py` — pytest test, gated by RUN_NOTEBOOK_INSTALL_CHECK=1. Parameterized across notebooks.
- `.github/workflows/nightly-notebook-check.yml` — daily cron + workflow_dispatch, matrix over Python 3.11/3.12/3.13.
- `changelog/901.added.md` — towncrier fragment.

Verified end-to-end
Sample run on this PR: https://github.com/PriorLabs/TabPFN/actions/runs/25063167744/job/73422829076?pr=901