Resolved issue saving & loading the predictor (Regressor/Classifier) model on different devices (cuda / cpu) #486
Conversation
… cuda device and loaded on a cpu device. The fix consists of always moving the model to CPU before saving. Reason: once a "cuda tensor" is saved and later loaded on a CPU-only machine, joblib.load() throws an error. One could monkey-patch torch.load(..., map_location="cpu") and restore it afterwards, but moving all tensors to CPU once during saving is cleaner, has minimal one-time performance cost (~10 ms), and is a robust fix without potential side effects. Also extended the "test_save_load_happy_path" test with `loading_device` and `saving_device` parameters.
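For illustration, a minimal sketch of the approach described above (the function names `save_predictor`/`load_predictor` and the attribute scan are hypothetical, not necessarily TabPFN's actual code):

```python
import joblib
import torch


def save_predictor(predictor, path: str) -> None:
    # Move every tensor/module attribute to CPU so the serialized
    # file can be loaded on CPU-only machines.
    for name, value in vars(predictor).items():
        if isinstance(value, (torch.Tensor, torch.nn.Module)):
            setattr(predictor, name, value.to("cpu"))
    joblib.dump(predictor, path)


def load_predictor(path: str, device: str = "cpu"):
    # Safe on any machine: the file contains only CPU tensors.
    predictor = joblib.load(path)
    for name, value in vars(predictor).items():
        if isinstance(value, (torch.Tensor, torch.nn.Module)):
            setattr(predictor, name, value.to(device))
    return predictor
```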
… on a cuda device and later loading on a cpu device. Also added tests for the different cuda/cpu saving-loading combinations.
Code Review
This pull request effectively resolves the issue of saving and loading models across different devices (CPU/CUDA) by ensuring all tensor-like objects are moved to the CPU before serialization. The renaming of save_state_expect_model_weights to save_state_except_model_weights is a good clarification. The addition of comprehensive tests for various device combinations is a great improvement. I have a few suggestions to enhance the robustness of the implementation and the tests.
…loading.py checked for tensors and nn.modules
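A hedged sketch of what such a check might look like; the recursive handling of containers is an assumption for illustration, not necessarily the PR's exact code:

```python
import torch


def to_device(obj, device: str):
    """Recursively move tensors and nn.Modules to `device`;
    other values pass through unchanged."""
    if isinstance(obj, (torch.Tensor, torch.nn.Module)):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_device(v, device) for v in obj]
    return obj
```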
noahho left a comment:
LGTM, thanks for the work. Please just address the one comment I left.
Co-authored-by: noahho <noah@priorlabs.ai>
…ressor/Classifier) model on different devices (cuda / cpu) (#156)

* Record copied public PR 486
* Resolved issue saving & loading the predictor (Regressor/Classifier) model on different devices (cuda / cpu) (#486)

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: MagnusBuehler <111045718+MagnusBuehler@users.noreply.github.com>
Co-authored-by: noahho <noah@priorlabs.ai>
Motivation and Context
Issue #416 reported a crash when a model saved on a CUDA device was later loaded on a CPU-only machine, due to tensors being serialized as CUDA tensors. This has been fixed by ensuring all tensor-like elements of the predictor are saved as CPU tensors, allowing safe loading on any device. The desired device can be specified (or automatically inferred) at loading time. This adds a negligible one-time overhead (~10 ms, measured over 10 runs). Corresponding tests now cover all combinations of fitting, saving, and loading devices.
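To make the failure mode concrete, here is roughly what went wrong before the fix (a sketch; the error text is paraphrased from PyTorch's actual message):

```python
import joblib
import torch

# On a GPU machine:
t = torch.randn(3, device="cuda")
joblib.dump({"weights": t}, "model.joblib")

# On a CPU-only machine, unpickling raises (paraphrased):
#   RuntimeError: Attempting to deserialize object on a CUDA device
#   but torch.cuda.is_available() is False.
joblib.load("model.joblib")

# torch.load(path, map_location="cpu") would avoid this, but joblib pickles
# the tensors directly, so saving CPU tensors in the first place is the
# robust fix.
```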
Further, I added uv.lock to the .gitignore file and renamed a function in "src/tabpfn/inference.py" from save_state_expect_model_weights to save_state_except_model_weights, as it better matches what the function does.
Public API Changes
How Has This Been Tested?
In a CUDA environment and locally on CPU. Also added corresponding pytests.
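The device-combination tests could look roughly like this (the `saving_device`/`loading_device` parameter names come from the commit message; the rest is a sketch):

```python
import pytest
import torch

DEVICES = ["cpu", "cuda"]


@pytest.mark.parametrize("saving_device", DEVICES)
@pytest.mark.parametrize("loading_device", DEVICES)
def test_save_load_happy_path(tmp_path, saving_device, loading_device):
    if "cuda" in (saving_device, loading_device) and not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    # Fit the predictor on `saving_device`, save it under tmp_path,
    # reload it on `loading_device`, and assert the reloaded model's
    # predictions match the pre-save predictions.
```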
Checklist
CHANGELOG.md (if relevant for users).