Tests for the `AutoML` class rely on `is_classification=False` even when the task is classification, and crash when corrected #1212

eddiebergman · 2021-08-08T13:01:28Z

The `is_classification` parameter of `AutoML.fit` defaults to `False`, and this test never sets it, even though iris is a classification problem.

```python
# The test as is
def test_fit(dask_client):
    ...  # iris dataset
    automl.fit(
        X_train, Y_train, task=MULTICLASS_CLASSIFICATION,
    )
    ...
```

Passing `is_classification=True`, which matches the task, makes the test fail:

```python
# Adding is_classification=True causes the test to fail
automl.fit(
    X_train, Y_train, task=MULTICLASS_CLASSIFICATION, is_classification=True
)
```

Extra context:
`AutoML.fit` creates an `InputValidator`, which is the only consumer of the `is_classification` parameter. The parameter defaults to `False` unless it is passed explicitly:

```python
self.InputValidator = InputValidator(
    is_classification=is_classification,
    feat_type=feat_type,
    logger_port=self._logger_port,
)
self.InputValidator.fit(X_train=X, y_train=y, X_test=X_test, y_test=y_test)
X, y = self.InputValidator.transform(X, y)
```
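To illustrate why the unmodified test passes, here is a minimal sketch of the gating described above. `ToyTargetValidator` is hypothetical and is not auto-sklearn's target validator; it assumes, consistent with the default test passing, that no label encoder is fitted when `is_classification=False`, so whatever `score` hands to `transform` is returned unchanged.

```python
class ToyTargetValidator:
    """Hypothetical stand-in for the target validator created by InputValidator.

    Assumption (not auto-sklearn's code): labels are only encoded when
    is_classification=True; otherwise targets pass through unchanged.
    """

    def __init__(self, is_classification=False):
        self.is_classification = is_classification

    def fit(self, y):
        if self.is_classification:
            # Map each distinct label to an integer code, in sorted order.
            self.mapping_ = {label: i for i, label in enumerate(sorted(set(y)))}
        return self

    def transform(self, y):
        if not self.is_classification:
            return y  # no encoder was fitted, so any shape is accepted
        # Each element must be a single label; a row of probabilities is rejected.
        for value in y:
            if isinstance(value, (list, tuple)):
                raise ValueError(
                    f"expected one label per sample, got {len(value)} values"
                )
        return [self.mapping_[value] for value in y]


y_train = ["setosa", "versicolor", "virginica"]
proba = [[0.99, 0.01, 0.0], [0.0, 0.96, 0.04]]  # shape (n_samples, n_classes)

# Default is_classification=False: probabilities pass through unchanged.
print(ToyTargetValidator().fit(y_train).transform(proba))

# is_classification=True: the same call is rejected, as in the failing test.
try:
    ToyTargetValidator(is_classification=True).fit(y_train).transform(proba)
except ValueError as err:
    print("ValueError:", err)
```

The point of the sketch is only that the default path hides the problem: nothing checks the shape of what reaches `transform` until an encoder is actually fitted.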

Error:

```
___________________________________ test_fit ___________________________________

dask_client = <Client: 'inproc://192.168.178.28/175589/1' processes=2 threads=2, memory=7.69 GiB>

    def test_fit(dask_client):

        X_train, Y_train, X_test, Y_test = putil.get_dataset('iris')
        automl = autosklearn.automl.AutoML(
            time_left_for_this_task=30,
            per_run_time_limit=5,
            metric=accuracy,
            dask_client=dask_client,
        )
        automl.fit(
            X_train, Y_train, task=MULTICLASS_CLASSIFICATION, is_classification=True
        )
>       score = automl.score(X_test, Y_test)

test/test_automl/test_automl.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
autosklearn/automl.py:1430: in score
    prediction = self.InputValidator.target_validator.transform(prediction)
autosklearn/data/target_validator.py:235: in transform
    y = self.encoder.transform(y)
.venv/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:805: in transform
    X_int, X_mask = self._transform(X, handle_unknown=self.handle_unknown)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
X = array([[0.99609375, 0.00390625, 0.        ],
       [1.        , 0.        , 0.        ],
       [0.99609375, 0.003906...12, 0.15429688, 0.        ],
       [0.        , 0.9609375 , 0.0390625 ],
       [0.        , 0.        , 1.        ]])
handle_unknown = 'use_encoded_value', force_all_finite = True

    def _transform(self, X, handle_unknown='error', force_all_finite=True):
        X_list, n_samples, n_features = self._check_X(
            X, force_all_finite=force_all_finite)

        X_int = np.zeros((n_samples, n_features), dtype=int)
        X_mask = np.ones((n_samples, n_features), dtype=bool)

        if n_features != len(self.categories_):
>           raise ValueError(
                "The number of features in X is different to the number of "
                "features of the fitted data. The fitted data had {} features "
                "and the X has {} features."
                .format(len(self.categories_,), n_features)
            )
E           ValueError: The number of features in X is different to the number of features of the fitted data. The fitted data had 1 features and the X has 3 features.

.venv/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:120: ValueError
---------------------------- Captured stdout setup -----------------------------
Started Dask client=<Client: 'inproc://192.168.178.28/175589/1' processes=2 threads=2, memory=7.69 GiB>
--------------------------- Captured stdout teardown ---------------------------
Closed Dask client=<Client: 'inproc://192.168.178.28/175589/1' processes=2 threads=2, memory=7.69 GiB>
```
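The traceback suggests the failure is a shape mismatch: the target encoder was fitted on the labels (one column), but `score` passes it `prediction`, which in this run looks like a matrix of per-class probabilities (rows such as `[0.99609375, 0.00390625, 0.]` sum to 1). The pure-Python sketch below reproduces only that feature-count check. `ToyOrdinalEncoder` is a hypothetical stand-in, not scikit-learn's class, and the argmax step at the end is one illustrative way to get label-shaped input, not the fix adopted by auto-sklearn.

```python
class ToyOrdinalEncoder:
    """Hypothetical, heavily simplified stand-in for sklearn's OrdinalEncoder.

    Only the feature-count check seen in the traceback is reproduced.
    """

    def fit(self, X):
        # X is a list of rows; record the sorted categories of each column.
        n_features = len(X[0])
        self.categories_ = [sorted({row[j] for row in X}) for j in range(n_features)]
        return self

    def transform(self, X):
        n_features = len(X[0])
        # Same condition that raises in OrdinalEncoder._transform in the traceback.
        if n_features != len(self.categories_):
            raise ValueError(
                "The number of features in X is different to the number of "
                "features of the fitted data. The fitted data had {} features "
                "and the X has {} features.".format(len(self.categories_), n_features)
            )
        return [[cats.index(v) for v, cats in zip(row, self.categories_)]
                for row in X]


# Targets as a single column: the encoder is fitted on 1 feature.
y_train = [[0], [1], [2], [1]]
encoder = ToyOrdinalEncoder().fit(y_train)

# Per-class probabilities, shaped (n_samples, n_classes) like X in the traceback.
proba = [[0.99, 0.01, 0.0], [0.0, 0.96, 0.04], [0.0, 0.0, 1.0]]

try:
    encoder.transform(proba)
except ValueError as err:
    print(err)  # reports 1 fitted feature versus 3 features in X

# Collapsing each row to its most probable class gives one label per sample,
# which matches the fitted shape.
labels = [[max(range(len(p)), key=p.__getitem__)] for p in proba]
print(encoder.transform(labels))  # [[0], [1], [2]]
```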


project-bot bot added this to LabelMe in Maintenance Aug 8, 2021

eddiebergman added the maintenance Internal maintenance label Aug 8, 2021

eddiebergman added the bug label Jun 10, 2022

eddiebergman mentioned this issue in Type estimators #1542 (Open) Jul 16, 2022
