
scikit-learn-intelex integration #1316

Open
ethanglaser opened this issue Aug 15, 2023 · 4 comments

Comments

ethanglaser commented Aug 15, 2023

Context

The Intel(R) Extension for Scikit-learn (sklearnex) accelerates popular classical machine learning algorithms on both CPU and GPU. Given TPOT's heavy reliance on scikit-learn algorithms, we believe there is a compelling case for integrating sklearnex's optimized regression and classification algorithms in some form. Initial experimentation has shown potential for significant performance improvements; see this Jupyter notebook for further detail.
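For context, the standard sklearnex usage pattern is a single patch call that swaps the accelerated implementations into scikit-learn. A minimal, TPOT-independent sketch:

```python
# Minimal sklearnex usage, independent of TPOT: patch_sklearn() replaces
# supported scikit-learn estimators with accelerated implementations.
from sklearnex import patch_sklearn
patch_sklearn()  # must run before the sklearn imports it should affect

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)  # accelerated fit
print(clf.score(X, y))
```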

[Benchmark charts showing the measured performance improvements; see the linked notebook.]

Proposal

There are a few directions this could take:

  1. Integrate sklearnex into the TPOT backend and let users set a use_sklearnex flag when initializing their TPOT classifier or regressor, in which case their config would use sklearnex implementations of algorithms instead of the default scikit-learn implementations where possible (a minimal sketch follows this list). See an example of what this might look like in the backend in this fork, and how it could translate into performance improvements in the notebook.
  • Pros: any config could be accelerated, with no algorithms excluded (TPOT would fall back to the default scikit-learn implementation when an algorithm is not supported by sklearnex), and the integration is relatively clean, as shown in the branch above.
  • Cons: configs and workloads that do not rely heavily on sklearnex-supported algorithms would see little improvement (e.g. sklearnex has no implementation of neural_network.MLPClassifier).
  2. Create a separate sklearnex config classifier and regressor that yields pipelines built from sklearnex-supported algorithms (possibly something like this regressor_config_dict_sklearnex).
  • Pros: all or most algorithms in this config would be accelerated by sklearnex, yielding the best performance improvements.
  • Cons: this config would not cover every algorithm a user might want to compare, and, similarly, the existing TPOT configs that users are familiar with would not be accelerated.
  3. A combination of 1 and 2: integrate into the TPOT backend with a flag for users who want to accelerate existing configs, plus a separate config focused on the sklearnex-accelerated algorithms.
  • Pros: provides users the most flexibility; they can use the new config (option 2), accelerate existing configs (option 1), or use the original configs without acceleration as usual, so it is fully backwards compatible.
  • Cons: none, other than being the most involved integration (though still fairly simple).
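As a rough illustration of option 1, the use_sklearnex flag could simply trigger sklearnex's global patching before TPOT builds its operators. The wiring below is a hypothetical sketch (the function and flag names are illustrative, not the fork's actual implementation):

```python
# Hypothetical sketch of option 1: a use_sklearnex flag that patches
# scikit-learn via sklearnex before TPOT instantiates any estimators.
def make_tpot_classifier(use_sklearnex=False, **tpot_kwargs):
    if use_sklearnex:
        from sklearnex import patch_sklearn
        patch_sklearn()  # algorithms without an sklearnex version fall back to stock sklearn
    # Import after patching so the config dicts resolve to the patched classes.
    from tpot import TPOTClassifier
    return TPOTClassifier(**tpot_kwargs)
```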

In any of these cases, there would be corresponding docs/tests updates and a new tutorial to support a smooth integration, along with any other additions you feel would be necessary.

Thank you for your consideration; we look forward to continuing this discussion.

ethanglaser (Author) commented:

Just following up on this to see if it would be of interest.

perib (Contributor) commented Sep 20, 2023

We have shifted development to TPOT2, a refactored version of TPOT1 that is hopefully easier to work with (we will pin something about it to the issues page soon). You can find it here: https://github.com/EpistasisLab/tpot2

But yes, I would be interested in exploring this. I think option 2 makes the most sense. There are other similar accelerated packages we were considering, such as cuML, and option 2 would give them all the same interface.
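For reference, a regressor_config_dict_sklearnex in the original TPOT config-dict format might look something like the sketch below. The estimator paths are real sklearnex modules, but the selection and hyperparameter grids are illustrative, not a final proposal:

```python
# Illustrative sketch of an sklearnex-only TPOT config dict (option 2).
# Keys are import paths that TPOT resolves; values are hyperparameter grids.
regressor_config_dict_sklearnex = {
    "sklearnex.ensemble.RandomForestRegressor": {
        "n_estimators": [100],
        "max_features": [0.05, 0.5, 1.0],
        "min_samples_split": range(2, 21),
    },
    "sklearnex.neighbors.KNeighborsRegressor": {
        "n_neighbors": range(1, 101),
        "weights": ["uniform", "distance"],
    },
    "sklearnex.svm.SVR": {
        "C": [1e-2, 1e-1, 1.0, 10.0],
        "tol": [1e-4, 1e-3, 1e-2],
    },
}
```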

ethanglaser (Author) commented Oct 3, 2023

Great, I can open a PR reflecting the integration described in option 2 to continue the discussion here. I see that the TPOT2 configs differ in format from the original library, and I am not seeing the cuML config or other similar custom ones - any suggestions on how to approach this?

perib (Contributor) commented Oct 3, 2023

The configuration setup is different in TPOT2. Rather than a single configuration dictionary, TPOT2 takes in three: one each for the leaves, roots, and inner nodes. Additionally, multiple configurations can be selected simultaneously, and the configuration dictionary has been broken up into modular pieces (selection, transformers, classifiers, regressors, etc.). Some configurations are also not fixed and depend on the shape of your dataset. More information on how to set this up can be found in tutorial 2 here.
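A hedged sketch of what this looks like from the user side, based on the description above (the parameter names, config strings, and required arguments are assumptions; see tutorial 2 for the exact API):

```python
# Hedged sketch of TPOT2's three-dictionary configuration (names assumed
# from the description above, not verified against the current API).
import tpot2

est = tpot2.TPOTEstimator(
    root_config_dict=["classifiers"],                 # final step of the pipeline
    inner_config_dict=["selectors", "transformers"],  # intermediate nodes
    leaf_config_dict=["transformers"],                # first steps applied to raw data
    scorers=["roc_auc"],
    scorers_weights=[1],
    classification=True,
)
```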

To add a custom configuration to TPOT2, add a file defining the search space to the configs folder here, then register it in this function so that it becomes selectable from the TPOTEstimator.

This approach could be used to add support for either cuML or sklearnex.

We still need to add cuML to TPOT2, which is on the to-do list.
