This repository contains a comprehensive set of tools to upgrade a wide variety of Scikit-learn models from older versions (e.g., <=1.3.x) to be compatible with newer Scikit-learn environments.
During the upgrade process, some models are refit with their original training data, while others have their internal components directly upgraded. It is crucial to understand which models are affected:
- Models that ARE refit:
  - KNeighborsClassifier
  - KNeighborsRegressor
  - NearestNeighbors
- Why is this necessary?
  - For these models, the internal data structures are not easily portable across Scikit-learn versions. To ensure compatibility, the tool extracts the original training data (X and, for the supervised models, y) and uses it to refit a new model instance in the target environment.
- What does this mean for you?
  - Refitting makes your model compatible with the new environment, but the result is a new model object trained on the same data rather than a copy of the old one. Its predictions should match those of the original model.
- Pipelines:
  - If your Pipeline, FeatureUnion, or ColumnTransformer contains one of the models listed above, that specific step will be refit, while other steps will be upgraded directly.
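The refit described above can be sketched as follows. This is a minimal illustration in a single environment, not the tool's own code. It assumes the fitted estimator keeps its training data in the private attributes `_fit_X` and `_y`, which are implementation details of Scikit-learn and may differ between versions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data standing in for the original training set.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["a", "a", "b", "b"])
old_model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Assumption: the training data is kept in the private attributes
# `_fit_X` and `_y`. For classifiers, `_y` holds indices into `classes_`,
# so the original labels are recovered by indexing.
X_train = np.asarray(old_model._fit_X)
y_train = old_model.classes_[old_model._y]

# Refit a fresh estimator with the same hyperparameters on the same data.
new_model = KNeighborsClassifier(**old_model.get_params()).fit(X_train, y_train)
```

In the real workflow the extraction happens in the old environment and the refit in the new one; the sketch only shows why the refit model behaves like the original.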
All other supported models are upgraded by transferring their learned attributes to a new model instance without refitting.
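To see in advance which parts of a composite model fall into each category, a small helper along these lines may be useful. It is a hypothetical sketch, not part of this tool: `plan_upgrade` and `REFIT_MODELS` are illustrative names, and the helper only inspects class names and the `steps`, `transformer_list`, and `transformers` attributes that Pipeline, FeatureUnion, and ColumnTransformer expose.

```python
# Class names of the models that are refit, as listed above.
REFIT_MODELS = {"KNeighborsClassifier", "KNeighborsRegressor", "NearestNeighbors"}


def _children(estimator):
    """Yield (name, sub_estimator) pairs for composite estimators."""
    for attr in ("steps", "transformer_list", "transformers"):
        entries = getattr(estimator, attr, None)
        if not isinstance(entries, list):
            continue
        for entry in entries:
            # Pipeline/FeatureUnion entries are (name, estimator);
            # ColumnTransformer entries are (name, estimator, columns).
            name, child = entry[0], entry[1]
            # Skip placeholders such as "drop", "passthrough", or None.
            if child is None or isinstance(child, str):
                continue
            yield name, child


def plan_upgrade(estimator, prefix=""):
    """Return {path: "refit" | "direct"} for every leaf estimator."""
    children = list(_children(estimator))
    if not children:
        kind = "refit" if type(estimator).__name__ in REFIT_MODELS else "direct"
        return {prefix or type(estimator).__name__: kind}
    plan = {}
    for name, child in children:
        path = f"{prefix}/{name}" if prefix else name
        plan.update(plan_upgrade(child, path))
    return plan
```

Calling `plan_upgrade(pipeline)` on a pipeline with a scaler step named `scale` and a nearest-neighbor step named `knn` returns a dictionary mapping `scale` to `"direct"` and `knn` to `"refit"`.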
Note: Our internal saving logic uses pickle, not joblib. If you load converted models yourself rather than through the command-line interface, do not use joblib to load them.
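A minimal sketch of loading an upgraded model with the standard-library pickle module is shown below. The function name `load_upgraded_model` is illustrative and not part of this tool.

```python
import pickle
from pathlib import Path


def load_upgraded_model(path):
    """Load a model file written by the upgrade step using pickle."""
    with Path(path).open("rb") as fh:
        # Only unpickle files you created yourself: pickle can execute
        # arbitrary code while loading.
        return pickle.load(fh)
```

For example, `model = load_upgraded_model("path/to/upgraded_model.pkl")` run inside the new environment returns the upgraded estimator.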
- Model Converter: Extracts and upgrades the internal components of a Scikit-learn model.
- Model Upgrader: Reconstructs and updates the model using the upgraded components.
- Extensive Model Support: Handles a wide range of models, from simple scalers to complex pipelines.
- Modular Architecture: Uses a handler-based system that can be easily extended to support new model types.
- Unified CLI: A single, easy-to-use command-line interface for all operations.
This tool supports a wide range of Scikit-learn models, including:
- Tree-Based Models: DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor
- Scalers: StandardScaler, MinMaxScaler
- Nearest Neighbor Models: KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors
- Support Vector Machines (SVMs): SVC, SVR
- Linear Models: LogisticRegression, Ridge, Lasso
- Decomposition Models: PCA, IncrementalPCA, KernelPCA
- Preprocessing Models: PolynomialFeatures, OneHotEncoder
- Pipeline Models: Pipeline, FeatureUnion, ColumnTransformer
- Clone the repository:

      git clone https://github.com/Infinitode/Scikit-learn-Model-Updater.git
      cd Scikit-learn-Model-Updater
- Set up your environments: This tool requires two separate Python environments:
  - old_environment: Where your original model was trained and saved. It should have the older versions of scikit-learn, numpy, and other relevant libraries.
  - new_environment: Where you want to use the upgraded model. It should have the newer/target versions of scikit-learn and numpy.

  It is highly recommended to use virtual environments (e.g., venv or conda) to manage these dependencies.
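Before running either step, it can help to confirm which library versions the active environment actually provides. The following is a small sketch using only the standard library; the function names are illustrative and not part of this tool.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe_environment(packages=("scikit-learn", "numpy")):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in describe_environment().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running the script once in each environment shows at a glance whether the old and new environments really differ in their scikit-learn and numpy versions.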
The model upgrade is a two-step process: convert and upgrade.
This step extracts the model's internal components and prepares them for the upgrade.
- Activate your old_environment.
- Run the convert command:

      python main.py convert \
        --model_path path/to/original_model.pkl \
        --shell_path path/to/model_shell.pkl \
        --upgraded_tree_state_path path/to/upgraded_state.pkl \
        --model_json_path path/to/model_metadata.json

  This will generate three files: a model "shell", the upgraded state, and a JSON metadata file.
This step reconstructs the model in the new environment using the files generated in the convert step.
- Activate your new_environment.
- Run the upgrade command:

      python main.py upgrade \
        --shell_path path/to/model_shell.pkl \
        --upgraded_tree_state_path path/to/upgraded_state.pkl \
        --model_json_path path/to/model_metadata.json \
        --output_path path/to/upgraded_model.pkl

  This will create the final, upgraded model file.
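If you upgrade many models, the two commands can be driven from a script instead of activating each environment by hand. The sketch below is not part of this tool. It assumes each environment is reached through its own Python interpreter path (`old_python` and `new_python` are hypothetical values you supply) and reuses the command-line flags shown above; the intermediate file names are illustrative.

```python
import subprocess
from pathlib import Path


def build_convert_cmd(python_exe, model_path, shell_path, state_path, json_path):
    """Build the argument list for the convert step."""
    return [
        python_exe, "main.py", "convert",
        "--model_path", str(model_path),
        "--shell_path", str(shell_path),
        "--upgraded_tree_state_path", str(state_path),
        "--model_json_path", str(json_path),
    ]


def build_upgrade_cmd(python_exe, shell_path, state_path, json_path, output_path):
    """Build the argument list for the upgrade step."""
    return [
        python_exe, "main.py", "upgrade",
        "--shell_path", str(shell_path),
        "--upgraded_tree_state_path", str(state_path),
        "--model_json_path", str(json_path),
        "--output_path", str(output_path),
    ]


def run_two_step_upgrade(old_python, new_python, model_path, output_path,
                         work_dir, repo_dir="."):
    """Run convert with the old interpreter, then upgrade with the new one."""
    work = Path(work_dir)
    # Intermediate files shared between the two steps.
    shell = work / "model_shell.pkl"
    state = work / "upgraded_state.pkl"
    meta = work / "model_metadata.json"
    # repo_dir must be the cloned repository so that main.py resolves.
    subprocess.run(
        build_convert_cmd(old_python, model_path, shell, state, meta),
        check=True, cwd=repo_dir,
    )
    subprocess.run(
        build_upgrade_cmd(new_python, shell, state, meta, output_path),
        check=True, cwd=repo_dir,
    )
```

Because the commands run with `cwd=repo_dir`, pass absolute paths for the model, output, and working directory. `check=True` stops the run if the convert step fails, so the upgrade step never runs on missing files.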
To ensure the tool is working correctly, you can run the included tests. The tests are named as follows:
- test_conversion.py: Run in the old environment to test old model conversion.
- test_upgrade.py: Run in the new environment to test model upgrade.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request with your changes.
This project was inspired by the need to adapt Scikit-learn models to newer versions and ensure compatibility in various environments.