
Lack of reproducibility between TPOTRegressor and .fitted_pipeline_ attribute #1305

Open
dlmolloy97 opened this issue Jun 22, 2023 · 2 comments


dlmolloy97 commented Jun 22, 2023

It is currently not possible to reproduce the score reported by the TPOTRegressor class using the pipeline stored in its .fitted_pipeline_ attribute.

Context of the issue

Currently, the accuracy score from the .score() method of a TPOTClassifier instance and the output of sklearn.metrics.accuracy_score on the best pipeline are identical. This is not the case with pipelines from TPOTRegressor instances.

Process to reproduce the issue

Classifier (correct/reproducible results)

The following code creates a TPOTClassifier, trains it on the iris dataset, and then prints the accuracy score:

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    train_size=0.75, test_size=0.25)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

>>>Optimization Progress: 77%
>>>154/200 [01:49<00:46, 1.00s/pipeline]


>>>2.01 minutes have elapsed. TPOT will close down.
>>>TPOT closed during evaluation in one generation.
>>>WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.


>>>TPOT closed prematurely. Will use the current best pipeline.

>>>Best pipeline: MLPClassifier(input_matrix, alpha=0.01, learning_rate_init=0.001)
>>>1.0

When the sklearn.metrics.accuracy_score function is called on the y_test data and the predictions from the best pipeline created by the TPOTClassifier instance, the result is identical:

pipeline = tpot.fitted_pipeline_
from sklearn.metrics import accuracy_score
y_pred = pipeline.predict(X_test)
accuracy_score(y_test, y_pred)
>>> 1.0
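This equality is expected: for scikit-learn classifiers (and hence for the fitted pipeline), .score() defaults to mean accuracy, which is exactly what accuracy_score computes. A minimal sketch of that relationship, using a plain LogisticRegression as a fast stand-in for the TPOT pipeline (since a full TPOT run is slow and nondeterministic):

```python
# Sketch: a fitted sklearn classifier's .score() defaults to mean accuracy,
# so it matches sklearn.metrics.accuracy_score on the same predictions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.75, random_state=0)

# Stand-in for tpot.fitted_pipeline_
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# .score() and accuracy_score agree exactly
assert clf.score(X_test, y_test) == accuracy_score(y_test, clf.predict(X_test))
```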

Regressor (incorrect/nonreproducible results)

With TPOTRegressor, the results are not identical.

from tpot import TPOTRegressor
#from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

X = X[:1500]
y = y[:1500]

X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                    train_size=0.75, test_size=0.25, random_state=42)


tpot = TPOTRegressor(generations=5, population_size=5, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(-1*tpot.score(X_test, y_test))
>>>Best pipeline: RandomForestRegressor(SelectFromModel(ElasticNetCV(input_matrix, l1_ratio=0.75, tol=0.01), >>>max_features=0.15000000000000002, n_estimators=100, threshold=0.0), bootstrap=True, max_features=0.4, min_samples_leaf=7, >>>min_samples_split=17, n_estimators=100)
>>>2572.133297426151

Unfortunately, calling .predict on the best pipeline from the TPOTRegressor object with the test data does not reproduce that score:

from sklearn.metrics import mean_absolute_error
pipeline = tpot.fitted_pipeline_
y_pred = pipeline.predict(X_test)
mean_absolute_error(y_test,y_pred)
>>> 40.08185512072557

I would have expected the last line to return 2572.133297426151.

@dlmolloy97 changed the title from "Lack of reproducibility between TPOTRegressor and exported pipeline" to "Lack of reproducibility between TPOTRegressor and .fitted_pipeline_ attribute" on Jun 22, 2023
@AdamFinkle

Succinctly, the problem is that you expected:
some_tpot_classifier.score(X_test, y_test) == accuracy_score(y_test, classifier_predictions)
some_tpot_regressor.score(X_test, y_test) == mean_absolute_error(y_test, regressor_predictions)
but instead got:
some_tpot_classifier.score(X_test, y_test) == accuracy_score(y_test, classifier_predictions)
some_tpot_regressor.score(X_test, y_test) != mean_absolute_error(y_test, regressor_predictions)

I doubt what you got is intended, at least for some datasets; the first step would be consolidating your demonstration code into an automated test.

def test_accuracy_scoring(...):
    ...
    assert some_tpot_classifier.score(X_test, y_test) == accuracy_score(y_test, classifier_predictions)
    assert some_tpot_regressor.score(X_test, y_test) == mean_absolute_error(y_test, regressor_predictions)
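The test sketch above could be made runnable along these lines, using plain sklearn estimators as fast stand-ins for the TPOT classes (a real TPOT run would take minutes). Note that a regressor's .score() is not accuracy; for plain sklearn regressors it is R², so the regression side below compares against r2_score rather than accuracy_score:

```python
# Hypothetical test: estimator.score() should match the corresponding
# sklearn.metrics function computed on the same predictions.
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split


def test_score_matches_metric():
    # Classifier: .score() defaults to mean accuracy
    X, y = load_iris(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    assert clf.score(Xte, yte) == accuracy_score(yte, clf.predict(Xte))

    # Regressor: .score() defaults to R^2, not accuracy (and not MAE)
    X, y = load_diabetes(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=42)
    reg = LinearRegression().fit(Xtr, ytr)
    assert np.isclose(reg.score(Xte, yte), r2_score(yte, reg.predict(Xte)))


test_score_matches_metric()
```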

@perib
Contributor

perib commented Sep 14, 2023

The default scoring for TPOTRegressor is 'neg_mean_squared_error', so tpot.score() returns the negative mean squared error, but you are comparing it to the mean absolute error.

If you want to optimize mean absolute error, you can pass that in as a scorer.

If you change your estimator to the following, you get the same results in your example.
tpot = TPOTRegressor(generations=5, population_size=5, verbosity=2, random_state=42, scoring='neg_mean_absolute_error')
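The relationship perib describes can be verified without a TPOT run: the reported score is the configured scorer applied to the fitted pipeline, and the default 'neg_mean_squared_error' scorer is simply the negated MSE. A quick sketch, using a RandomForestRegressor as a stand-in for the fitted TPOT pipeline:

```python
# Sketch: the 'neg_mean_squared_error' scorer applied to a fitted regressor
# is exactly -mean_squared_error on its predictions, which is why
# tpot.score() (neg MSE) and mean_absolute_error give different numbers.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import get_scorer, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Stand-in for tpot.fitted_pipeline_
reg = RandomForestRegressor(random_state=42).fit(X_train, y_train)

scorer = get_scorer('neg_mean_squared_error')
neg_mse = scorer(reg, X_test, y_test)

# Negating the scorer's output recovers the plain MSE of the predictions
assert np.isclose(-neg_mse, mean_squared_error(y_test, reg.predict(X_test)))
```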
