Select multiple gpus for inference, if available. Don't use them yet. #496
Conversation
If the specified device is "auto" and multiple cuda gpus are available, then select all the gpus. This PR doesn't actually implement multi-gpu inference. Instead, all the inference engines just use the first.

Public API changes:
- `TabPFNClassifier/Regressor.device_` has been renamed to `.devices_`, and is now a tuple rather than a single device.
- The `device` argument of `TabPFNClassifier/Regressor.__init__()` is unchanged, as is the `.device` property. Multiple gpus will only be used when this is set to "auto".

Notes:
- `TabPFNRegressor.znorm_space_bardist_` is always placed on the first device, as we're not aiming to parallelise this portion of inference.
- `infer_devices()` no longer accepts `device=None`: I couldn't find anywhere using this.
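The "auto" selection behaviour described above can be sketched as follows. This is a hypothetical reconstruction, not the actual `infer_devices()` implementation: `select_devices` is an invented name, and `cuda_device_count` stands in for `torch.cuda.device_count()` so the sketch stays torch-free.

```python
def select_devices(device: str, cuda_device_count: int) -> tuple:
    """Hypothetical sketch of the "auto" device logic described in this PR.

    cuda_device_count is a stand-in for torch.cuda.device_count(); the real
    infer_devices() may differ in details.
    """
    if device == "auto":
        if cuda_device_count > 0:
            # Select ALL available cuda gpus, even though inference
            # currently only uses the first one.
            return tuple(f"cuda:{i}" for i in range(cuda_device_count))
        return ("cpu",)
    # An explicitly requested device becomes a single-element tuple.
    return (device,)
```

For example, `select_devices("auto", 2)` yields `("cuda:0", "cuda:1")`, while `select_devices("cpu", 2)` yields `("cpu",)`.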
Code Review
This pull request makes the necessary preparatory changes for multi-GPU inference by updating the device handling logic. The device_ attribute is correctly renamed to devices_ and now holds a tuple of devices. The new infer_devices utility function is well-implemented and tested, correctly selecting all available CUDA GPUs when device="auto". The changes are consistent across the codebase. I've left a few minor suggestions regarding a typo in the changelog, some repeated code in the inference engines, and a confusing test case name. Overall, this is a solid step towards multi-GPU support.
@noahho are you happy with these public api changes?
noahho left a comment:
LGTM!
PriorLabs/TabPFN#496 renamed infer_device_and_type() to infer_devices(), and updated it to return a tuple of devices. Update tabpfn-extensions to use this new method, and select the first device if multiple devices are returned.
… 2.1.4. (#165) PriorLabs/TabPFN#496 renamed infer_device_and_type() to infer_devices(), and updated it to return a tuple of devices. Update tabpfn-extensions to use this new method, and select the first device if multiple devices are returned. Fix seed in random forest test to fix flakiness.
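The downstream fix in tabpfn-extensions can be sketched as a small normalisation step. `as_single_device` is a hypothetical helper name; the shapes of the old and new return values are taken from the description above.

```python
def as_single_device(inferred):
    """Hypothetical helper: reduce the result of device inference to one device.

    infer_device_and_type() returned a single device; infer_devices()
    (tabpfn >= 2.1.4) returns a tuple of devices. Downstream code that
    only supports one device picks the first entry.
    """
    if isinstance(inferred, tuple):
        return inferred[0]  # new API: tuple of devices, use the first
    return inferred  # old API: already a single device
```

This lets callers work against either tabpfn version without branching on the library version string.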
* Add TabEBM implementation for synthetic data generation using SGLD
  - Introduced the TabEBM class for generating synthetic tabular data based on an energy-based model.
  - Implemented methods for data preprocessing, SGLD sampling, and energy computation.
  - Added support for various input formats including numpy arrays, PyTorch tensors, and pandas DataFrames.
  - Created a caching mechanism for fitted models to optimize repeated sampling.
  - Developed comprehensive unit tests to validate functionality and performance of the TabEBM class.
  - Included error handling for invalid input types and mismatched shapes.
* Update src/tabpfn_extensions/tabebm/TabEBM.py (three commits co-authored by Copilot, two by gemini-code-assist[bot])
* Fix formatting in author section of README.md
* Fix import paths in TabEBM example and test files
* Refactor code for improved readability and consistency in TabEBM class and tests
* Remove redundant test for invalid input in to_numpy method in TestTabEBMStaticMethods
* Reorder import statements for consistency in test_tabebm.py
* Update src/tabpfn_extensions/tabebm/tabebm.py (co-authored by gemini-code-assist[bot])
* Update src/tabpfn_extensions/tabebm/README.md (co-authored by Klemens Flöge)
* Rename 'debug' parameter to 'verbose' in TabEBM class methods for clarity and consistency
* Add script for augmenting real-world data using TabEBM. This script replicates the functionality of the notebook, utilizing sklearn for data loading and preprocessing. It includes functions for loading the adult dataset, identifying feature types, subsampling data, preprocessing, and training models on both original and augmented datasets. The script demonstrates the effectiveness of data augmentation with TabEBM by comparing the performance of models trained on real data versus those trained on augmented data.
* Remove verbose output during SGLD updates in TabEBM class to streamline performance
* Fix bug in AutoTabPFN due to renamed infer_devices() method in tabpfn 2.1.4. (#165) PriorLabs/TabPFN#496 renamed infer_device_and_type() to infer_devices(), and updated it to return a tuple of devices. Update tabpfn-extensions to use this new method, and select the first device if multiple devices are returned. Fix seed in random forest test to fix flakiness.
* Run the tests on push to main, and also when requested. (#166) At the moment we only run the tests in PRs, but this can mean that the tests are broken on main due to interactions with concurrent PRs. Also add a "workflow_dispatch" trigger so we can run the tests manually.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Klemens Flöge <117587964+klemens-floege@users.noreply.github.com>
Co-authored-by: Oscar Key <oscar@priorlabs.ai>
…Don't use them yet. (#143)
* Record copied public PR 496
* Select multiple gpus for inference, if available. Don't use them yet. (#496)

  If the specified device is "auto" and multiple cuda gpus are available, then select all the gpus. This PR doesn't actually implement multi-gpu inference. Instead, all the inference engines just use the first.

  Public API changes:
  - `TabPFNClassifier/Regressor.device_` has been renamed to `.devices_`, and is now a tuple rather than a single device.
  - The `device` argument of `TabPFNClassifier/Regressor.__init__()` is unchanged, as is the `.device` property. Multiple gpus will only be used when this is set to "auto". An alternative would be to update this to `devices`. I kept it like this as most users probably only want a single device, it avoids changing the api, and we don't actually support multi-device inference in this PR.

  Notes:
  - `TabPFNRegressor.znorm_space_bardist_` is always placed on the first device, as we're not aiming to parallelise this portion of inference.
  - `infer_devices()` no longer accepts `device=None`: I couldn't find anywhere using this.

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Oscar Key <oscar@priorlabs.ai>
If the specified device is "auto" and multiple cuda gpus are available, then select all the gpus. This PR doesn't actually implement multi-gpu inference. Instead, all the inference engines just use the first.
Public API changes:
- `TabPFNClassifier/Regressor.device_` has been renamed to `.devices_`, and is now a tuple rather than a single device.
- The `device` argument of `TabPFNClassifier/Regressor.__init__()` is unchanged, as is the `.device` property. Multiple gpus will only be used when this is set to "auto". An alternative would be to update this to `devices`. I kept it like this as most users probably only want a single device, it avoids changing the api, and we don't actually support multi-device inference in this PR. But maybe it is confusing to have both `device` and `devices_`. What do you think?

Notes:
- `TabPFNRegressor.znorm_space_bardist_` is always placed on the first device, as we're not aiming to parallelise this portion of inference.
- `infer_devices()` no longer accepts `device=None`: I couldn't find anywhere using this.
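The convention of keeping non-parallelised state on the first selected device can be sketched as follows. `RegressorSketch` and `primary_device` are invented names for illustration, not TabPFN's real class or API; only the `devices_` attribute and the first-device rule come from the PR description.

```python
class RegressorSketch:
    """Minimal sketch (not TabPFN's real class) of the devices_ convention.

    devices_ holds every selected device as a tuple; components that are
    not parallelised (e.g. the regressor's znorm_space_bardist_) live on
    devices_[0].
    """

    def __init__(self, devices):
        # Fitted-attribute style: trailing underscore, always a tuple.
        self.devices_ = tuple(devices)

    @property
    def primary_device(self):
        # Hypothetical accessor: the device used for non-parallel work.
        return self.devices_[0]
```

A single-device setup simply produces a one-element tuple, so `devices_[0]` is well-defined in both the single- and multi-gpu cases.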