Java library for converting [Scikit-Learn](http://scikit-learn.org/) models to PMML.
- Supported Estimator types:
  - [`DecisionTreeClassifier`](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html)
  - [`DecisionTreeRegressor`](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html)
  - [`LinearRegression`](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)
  - [`LogisticRegression`](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
  - [`RandomForestClassifier`](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
  - [`RandomForestRegressor`](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)
- Supported Transformer types:
  - [`Binarizer`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html)
  - [`Imputer`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html)
  - [`LabelBinarizer`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html)
  - [`LabelEncoder`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
  - [`MinMaxScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)
  - [`StandardScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
- Production quality:
  - Complete test coverage.
  - Fully compliant with the [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) library.
- Python 3.4 or newer.
- [`scikit-learn`](https://pypi.python.org/pypi/scikit-learn) 0.16.0 or newer.
- [`pandas`](https://pypi.python.org/pypi/pandas) 0.16.2 or newer.
- [`sklearn-pandas`](https://pypi.python.org/pypi/sklearn-pandas) 0.0.10 or newer.
- [`joblib`](https://pypi.python.org/pypi/joblib) 0.8.4 or newer.
- [`numpy`](https://pypi.python.org/pypi/numpy) 1.9.2 or newer.

The Python installation can be validated as follows:

    import sklearn, pandas, sklearn_pandas, joblib, numpy

    print(sklearn.__version__)
    print(pandas.__version__)
    print(sklearn_pandas.__version__)
    print(joblib.__version__)
    print(numpy.__version__)
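
The printed versions can also be compared against the minimums listed above programmatically. The following is a minimal sketch, not part of JPMML-SkLearn: the `MINIMUM_VERSIONS` table and the helpers `parse_version`, `at_least` and `check_module` are hypothetical names introduced for illustration, and the comparison only looks at the leading numeric part of each version component.

```python
import importlib
import re

# Minimum versions copied from the list above.
# This table is an illustrative assumption, not a JPMML-SkLearn API.
MINIMUM_VERSIONS = {
    "sklearn": "0.16.0",
    "pandas": "0.16.2",
    "sklearn_pandas": "0.0.10",
    "joblib": "0.8.4",
    "numpy": "1.9.2",
}


def parse_version(version):
    """Turn a string such as '0.16.2' or '1.9.2rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_module(name, minimum):
    """Return (ok, installed_version); installed_version is None if it cannot be determined."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False, None
    installed = getattr(module, "__version__", None)
    if installed is None:
        return False, None
    return at_least(installed, minimum), installed


if __name__ == "__main__":
    for name, minimum in sorted(MINIMUM_VERSIONS.items()):
        ok, installed = check_module(name, minimum)
        status = "OK" if ok else "PROBLEM"
        print("{}: installed={}, required>={} [{}]".format(name, installed, minimum, status))
```

Comparing integer tuples rather than raw strings matters here: as plain strings, `"0.0.9"` would sort after `"0.0.10"`.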
- Java 1.7 or newer.

Enter the project root directory and build using [Apache Maven](http://maven.apache.org/):

    mvn clean install

The build produces an executable uber-JAR file `target/converter-executable-1.0-SNAPSHOT.jar`.

A typical workflow can be summarized as follows:

- Use Python to train a model.
- Serialize the model in `pickle` data format to a file in a local filesystem.
- Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

Load data into a `pandas.DataFrame` object:

    import pandas

    iris_df = pandas.read_csv("Iris.csv")

Describe the data and the data pre-processing actions by creating an appropriate `sklearn_pandas.DataFrameMapper` object:

    from sklearn.preprocessing import StandardScaler
    from sklearn_pandas import DataFrameMapper

    iris_mapper = DataFrameMapper([
        ("Sepal.Length", StandardScaler()),
        ("Sepal.Width", StandardScaler()),
        ("Petal.Length", StandardScaler()),
        ("Petal.Width", StandardScaler()),
        # None: pass the label column through without a transformation
        ("Species", None)
    ])

    iris_df = iris_mapper.fit_transform(iris_df)
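
For readers unfamiliar with `StandardScaler`: it centers each column on its mean and divides by its standard deviation. The pure-Python sketch below works through that arithmetic for a single column. The `standardize` helper is hypothetical and only illustrates the transformation; it is not how scikit-learn or JPMML-SkLearn implement it, and the sample values are made up.

```python
import math


def standardize(values):
    """Center on the mean and scale by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    std = math.sqrt(variance) or 1.0
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    # Made-up sepal lengths, for illustration only.
    sample = [5.0, 6.0, 7.0]
    print(standardize(sample))
```

After scaling, the column has zero mean and unit variance, which is why the four measurement columns above end up on a comparable scale.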

Train an appropriate estimator object:

    from sklearn.ensemble import RandomForestClassifier

    # fit_transform() returned an array: columns 0-3 hold the scaled features, column 4 the label
    iris_X = iris_df[:, 0:4]
    iris_y = iris_df[:, 4]

    iris_forest = RandomForestClassifier(min_samples_leaf = 5)
    iris_forest.fit(iris_X, iris_y)
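
Serialize the `sklearn_pandas.DataFrameMapper` object and the estimator object in `pickle` data format:

    # sklearn.externals.joblib was removed in scikit-learn 0.23;
    # on newer versions use the standalone joblib package listed above (import joblib)
    from sklearn.externals import joblib

    joblib.dump(iris_mapper, "mapper.pkl")
    joblib.dump(iris_forest, "estimator.pkl")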

Please see the test script file [main.py](https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py) for more classification (binary and multi-class) and regression workflows.

Converting the estimator pickle file `estimator.pkl` to a PMML file `estimator.pmml`:

    java -jar target/converter-executable-1.0-SNAPSHOT.jar --pkl-input estimator.pkl --pmml-output estimator.pmml

Converting the `sklearn_pandas.DataFrameMapper` pickle file `mapper.pkl` and the estimator pickle file `estimator.pkl` to a PMML file `mapper-estimator.pmml`:

    java -jar target/converter-executable-1.0-SNAPSHOT.jar --pkl-mapper-input mapper.pkl --pkl-estimator-input estimator.pkl --pmml-output mapper-estimator.pmml

Getting help:

    java -jar target/converter-executable-1.0-SNAPSHOT.jar --help
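
The converter invocations above can also be assembled from Python, for example at the end of a training script. This is a sketch under stated assumptions: `build_converter_command` and `convert` are hypothetical helpers, not part of JPMML-SkLearn. They use only the command-line options shown above and assume that `java` is on the `PATH` and that the JAR has already been built.

```python
import subprocess

# Path of the uber-JAR produced by the Maven build described above.
CONVERTER_JAR = "target/converter-executable-1.0-SNAPSHOT.jar"


def build_converter_command(estimator_pkl, pmml_output, mapper_pkl=None, jar=CONVERTER_JAR):
    """Assemble the argument list for the command-line converter (hypothetical helper)."""
    command = ["java", "-jar", jar]
    if mapper_pkl is None:
        # Estimator only: mirrors the first invocation above.
        command += ["--pkl-input", estimator_pkl]
    else:
        # Mapper plus estimator: mirrors the second invocation above.
        command += ["--pkl-mapper-input", mapper_pkl, "--pkl-estimator-input", estimator_pkl]
    command += ["--pmml-output", pmml_output]
    return command


def convert(estimator_pkl, pmml_output, mapper_pkl=None):
    """Run the converter; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.check_call(build_converter_command(estimator_pkl, pmml_output, mapper_pkl))


if __name__ == "__main__":
    # Only prints the command line; call convert(...) to actually run the converter.
    print(" ".join(build_converter_command("estimator.pkl", "mapper-estimator.pmml", "mapper.pkl")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.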

JPMML-SkLearn is dual-licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html) and a commercial license.

Please contact [info@openscoring.io](mailto:info@openscoring.io)