
Lack of reproducibility between TPOTRegressor and .fitted_pipeline_ attribute #1305

Open
dlmolloy97 opened this issue Jun 22, 2023 · 2 comments


dlmolloy97 commented Jun 22, 2023

It is currently not possible to reproduce the score reported by the TPOTRegressor class using the pipeline stored in its .fitted_pipeline_ attribute.

Context of the issue

Currently, the accuracy score from the .score() method of a TPOTClassifier instance and the output of sklearn.metrics.accuracy_score on the best pipeline are identical. This is not the case with pipelines from TPOTRegressor instances.

Process to reproduce the issue

Classifier (correct/reproducible results)

The following code creates a TPOTClassifier, trains it on the iris dataset, and then prints the accuracy score:

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    train_size=0.75, test_size=0.25)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

>>>Optimization Progress: 77%
>>>154/200 [01:49<00:46, 1.00s/pipeline]


>>>2.01 minutes have elapsed. TPOT will close down.
>>>TPOT closed during evaluation in one generation.
>>>WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.


>>>TPOT closed prematurely. Will use the current best pipeline.

>>>Best pipeline: MLPClassifier(input_matrix, alpha=0.01, learning_rate_init=0.001)
>>>1.0

When the sklearn.metrics.accuracy_score function is called on the y_test data and the predictions from the best pipeline created by the TPOTClassifier instance, the result is identical:

pipeline = tpot.fitted_pipeline_
from sklearn.metrics import accuracy_score
y_pred = pipeline.predict(X_test)
accuracy_score(y_test, y_pred)
>>> 1.0
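This equality is expected: for scikit-learn classifiers (and hence for the fitted pipeline), .score() defaults to mean accuracy, which is exactly what accuracy_score computes. A minimal sketch of that relationship, using a plain LogisticRegression as a fast stand-in for the TPOT pipeline (since a full TPOT run is slow and nondeterministic):

```python
# Sketch: a fitted sklearn classifier's .score() defaults to mean accuracy,
# so it matches sklearn.metrics.accuracy_score on the same predictions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.75, random_state=0)

# Stand-in for tpot.fitted_pipeline_
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# .score() and accuracy_score agree exactly
assert clf.score(X_test, y_test) == accuracy_score(y_test, clf.predict(X_test))
```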

Regressor (incorrect/nonreproducible results)

With TPOTRegressor, the results are not identical.

from tpot import TPOTRegressor
#from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

X = X[:1500]
y = y[:1500]

X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                    train_size=0.75, test_size=0.25, random_state=42)


tpot = TPOTRegressor(generations=5, population_size=5, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(-1*tpot.score(X_test, y_test))
>>>Best pipeline: RandomForestRegressor(SelectFromModel(ElasticNetCV(input_matrix, l1_ratio=0.75, tol=0.01), >>>max_features=0.15000000000000002, n_estimators=100, threshold=0.0), bootstrap=True, max_features=0.4, min_samples_leaf=7, >>>min_samples_split=17, n_estimators=100)
>>>2572.133297426151

Unfortunately, calling .predict on the best pipeline from the TPOTRegressor object with the test data does not reproduce that score:

from sklearn.metrics import mean_absolute_error
pipeline = tpot.fitted_pipeline_
y_pred = pipeline.predict(X_test)
mean_absolute_error(y_test,y_pred)
>>> 40.08185512072557

I would have expected the last line to return 2572.133297426151.

@dlmolloy97 changed the title from "Lack of reproducibility between TPOTRegressor and exported pipeline" to "Lack of reproducibility between TPOTRegressor and .fitted_pipeline_ attribute" on Jun 22, 2023
@AdamFinkle

Succinctly, the problem is that you expected:
some_tpot_classifier.score(X_test, y_test) == accuracy_score(y_test, classifier_predictions)
some_tpot_regressor.score(X_test, y_test) == mean_absolute_error(y_test, regressor_predictions)
but instead got:
some_tpot_classifier.score(X_test, y_test) == accuracy_score(y_test, classifier_predictions)
some_tpot_regressor.score(X_test, y_test) != mean_absolute_error(y_test, regressor_predictions)

I doubt what you got is intended, at least for some datasets; the first step would be consolidating your demonstration code into an automated test.

def test_accuracy_scoring(...):
    ...
    assert some_tpot_classifier.score(X_test, y_test) == accuracy_score(y_test, classifier_predictions)
    assert some_tpot_regressor.score(X_test, y_test) == mean_absolute_error(y_test, regressor_predictions)
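The test sketch above could be made runnable along these lines, using plain sklearn estimators as fast stand-ins for the TPOT classes (a real TPOT run would take minutes). Note that a regressor's .score() is not accuracy; for plain sklearn regressors it is R², so the regression side below compares against r2_score rather than accuracy_score:

```python
# Hypothetical test: estimator.score() should match the corresponding
# sklearn.metrics function computed on the same predictions.
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split


def test_score_matches_metric():
    # Classifier: .score() defaults to mean accuracy
    X, y = load_iris(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    assert clf.score(Xte, yte) == accuracy_score(yte, clf.predict(Xte))

    # Regressor: .score() defaults to R^2, not accuracy (and not MAE)
    X, y = load_diabetes(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
    reg = LinearRegression().fit(Xtr, ytr)
    assert np.isclose(reg.score(Xte, yte), r2_score(yte, reg.predict(Xte)))


test_score_matches_metric()
```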

@perib
Contributor

perib commented Sep 14, 2023

The default scoring for TPOTRegressor is 'neg_mean_squared_error', so tpot.score() returns the negative mean squared error, but you are comparing it to the mean absolute error.

If you want to optimize mean absolute error, you can pass that in as a scorer.

If you change your estimator to the following, you get the same results in your example.
tpot = TPOTRegressor(generations=5, population_size=5, verbosity=2, random_state=42, scoring='neg_mean_absolute_error')
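The relationship perib describes can be verified without a TPOT run: the reported score is the configured scorer applied to the fitted pipeline, and the default 'neg_mean_squared_error' scorer is simply the negated MSE. A quick sketch, using a RandomForestRegressor as a stand-in for the fitted TPOT pipeline:

```python
# Sketch: the 'neg_mean_squared_error' scorer applied to a fitted regressor
# is exactly -mean_squared_error on its predictions, which is why
# tpot.score() (neg MSE) and mean_absolute_error give different numbers.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import get_scorer, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Stand-in for tpot.fitted_pipeline_
reg = RandomForestRegressor(random_state=42).fit(X_train, y_train)

scorer = get_scorer('neg_mean_squared_error')
neg_mse = scorer(reg, X_test, y_test)

# Negating the scorer's output recovers the plain MSE of the predictions
assert np.isclose(-neg_mse, mean_squared_error(y_test, reg.predict(X_test)))
```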
