[BUG] _DummyModel's predict function should be able to predict on `dataset.data` #2202

OlivierBinette · 2022-12-17T17:55:34Z

Describe the bug

A _DummyModel created on a given dataset is not able to predict on dataset.data. It can only predict on dataset.features_columns.

This behavior is inconsistent with sklearn models, which can predict on any dataframe that contains the necessary columns, even if there are extraneous ones. Data validation in _DummyModel's predict function should be more forgiving.

This can cause issues in checks where y_pred is passed instead of a model and the run_logic expects the context's model (which is a dummy model) to be able to predict on dataset.data like a sklearn model would.

To Reproduce

from deepchecks.tabular.context import _DummyModel
from deepchecks.tabular.datasets.regression import wine_quality

dataset = wine_quality.load_data()[0]
model = wine_quality.load_fitted_model()
y_pred = model.predict(dataset.data)

dummy = _DummyModel(train=dataset, test=None, y_pred_train=y_pred)

dummy.predict(dataset.features_columns)
# Works as expected

dummy.predict(dataset.data)
# Raises DeepchecksValueError('Data that has not been seen before passed for inference with static '
#                             'predictions. Pass a real model to resolve this')

Expected behavior

Dummy model should be able to predict on dataset.data, and on any dataframe which contains the necessary columns.
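
To make the requested behavior concrete, here is a minimal sketch of a static-prediction model that narrows its input to the feature columns it was built with before looking up stored predictions. It is not the deepchecks implementation: `StaticPredictor`, the row-keyed lookup, and the dict-of-columns `Table` stand-in for a dataframe are all hypothetical, and the sample values are made up. It uses only the standard library so that it runs anywhere.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a dataframe: column name -> list of values.
Table = Dict[str, List[float]]


class StaticPredictor:
    """Returns stored predictions for rows seen at construction (illustrative only)."""

    def __init__(self, features: Table, y_pred: Sequence[float]) -> None:
        self.feature_names = list(features)
        # Map each feature row (as a tuple) to its stored prediction.
        # Simplification: duplicate rows would share a single prediction.
        self._lookup: Dict[Tuple[float, ...], float] = dict(
            zip(self._rows(features), y_pred)
        )

    def _rows(self, table: Table) -> List[Tuple[float, ...]]:
        # Keep only the known feature columns, in a fixed order.
        columns = [table[name] for name in self.feature_names]
        return list(zip(*columns))

    def predict(self, table: Table) -> List[float]:
        missing = [name for name in self.feature_names if name not in table]
        if missing:
            raise ValueError(f"Missing feature columns: {missing}")
        # Extra columns (for example the label) are ignored, not rejected.
        rows = self._rows(table)
        unseen = [row for row in rows if row not in self._lookup]
        if unseen:
            raise ValueError(f"{len(unseen)} row(s) were not seen at construction time")
        return [self._lookup[row] for row in rows]


# Made-up values; `full` plays the role of dataset.data (features plus label).
features = {"alcohol": [9.4, 9.8], "pH": [3.51, 3.20]}
full = {**features, "quality": [5.0, 6.0]}

model = StaticPredictor(features, y_pred=[5.2, 5.9])
print(model.predict(features))
print(model.predict(full))
```

Both calls return the same stored predictions, because the extra `quality` column is dropped before the lookup.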

Environment (please complete the following information):

OS: Linux (Ubuntu 22)
Python Version: 3.10
Deepchecks Version: 0.10.0


noamzbr · 2022-12-18T09:41:39Z

Thanks for noticing that @OlivierBinette!
Indeed sklearn models can predict when additional (unneeded) columns are present, but it's an ability we do not require from user models (see our model guide). The user may pass a custom model, and in that case it's not required to be able to predict when additional columns are present.

In that light, all internal calls to model.predict() in the tabular package always pass dataset.features_columns rather than dataset.data. That's why we never run into an issue with _DummyModel only working on the features. It is actually helpful for internal development that errors are raised in tests when model.predict(dataset.data) is mistakenly used.

Given that it's an internal mechanism, is there a case in your opinion that this behavior may cause a bug?

OlivierBinette · 2022-12-18T11:05:38Z

I think the current behavior could cause a confusing bug for a user defining a custom check. If the run logic uses context.model.predict(dataset.data), this will work when a sklearn model is passed to the check but throw an error when y_pred is passed to the check.

So from an end-user perspective it might be more convenient for the dummy model to be able to predict from dataset.data. But I see the point that it's not an issue from a development perspective.

noamzbr · 2022-12-18T14:41:29Z

I think we still want to keep the current behavior (if anything, it is useful precisely because it keeps us from assuming that models can handle extra columns), but we can address your concern by making the error in this case more informative.
Closing this issue in favor of the new one, #2206.
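
As a rough illustration of what a more informative error could look like, the sketch below keeps the strict behavior but names the offending columns. Everything in it is an assumption made for illustration: `StrictStaticPredictor`, the dict-of-columns `Table` stand-in for a dataframe, the sample values, and the wording of the message are hypothetical and not taken from deepchecks.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a dataframe: column name -> list of values.
Table = Dict[str, List[float]]


class StrictStaticPredictor:
    """Serves stored predictions only for the exact feature table it was given."""

    def __init__(self, features: Table, y_pred: Sequence[float]) -> None:
        self.feature_names = list(features)
        self._seen = {name: list(values) for name, values in features.items()}
        self._y_pred = list(y_pred)

    def predict(self, table: Table) -> List[float]:
        # Report unexpected columns explicitly instead of a generic failure.
        extra = [name for name in table if name not in self.feature_names]
        if extra:
            raise ValueError(
                f"Unexpected columns {extra}: static predictions are only "
                f"available for the feature columns {self.feature_names}. "
                "Pass only the feature columns, or pass a real model."
            )
        if table != self._seen:
            raise ValueError(
                "Data that has not been seen before was passed for inference "
                "with static predictions."
            )
        return list(self._y_pred)


# Made-up values; `full` plays the role of dataset.data (features plus label).
features = {"alcohol": [9.4, 9.8], "pH": [3.51, 3.20]}
full = {**features, "quality": [5.0, 6.0]}

strict = StrictStaticPredictor(features, y_pred=[5.2, 5.9])
print(strict.predict(features))

try:
    strict.predict(full)
except ValueError as err:
    print(err)
```

The strict check still fails fast when the full table is passed by mistake, which preserves the benefit for internal tests, while the message tells a custom-check author which columns to drop.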

OlivierBinette added the bug label Dec 17, 2022

github-actions bot added the needs triage label (Issue needs to be labeled and prioritized) Dec 17, 2022

noamzbr mentioned this issue Dec 18, 2022

[DEE-6] [FEAT] Better error message when passing illegal columns to _DummyModel #2206

Open

noamzbr closed this as completed Dec 18, 2022
