nan_mode='Max' ignored for predictions #2850
Description
If I pass the option nan_mode='Max' and the data the model is fitted on has no missing values, subsequent calls to .predict() on data with missing values produce the same predictions as a model trained with nan_mode='Min':
import numpy as np
from sklearn.datasets import make_regression
from catboost import CatBoostRegressor

X, y = make_regression(random_state=123)
xnan = np.repeat(np.nan, X.shape[1]).reshape((1, -1))

model_min = CatBoostRegressor(
    depth=3,
    iterations=5,
    random_seed=123,
    thread_count=1,
    save_snapshot=False,
    verbose=0,
    nan_mode="Min",
).fit(X, y)

model_max = CatBoostRegressor(
    depth=3,
    iterations=5,
    random_seed=123,
    thread_count=1,
    save_snapshot=False,
    verbose=0,
    nan_mode="Max",
).fit(X, y)

model_min.predict(xnan), model_max.predict(xnan)

(array([-41.14545892]), array([-41.14545892]))
Since missing values are treated as either 'Min' or 'Max', and none of the splits were learned on missing values, the two results should be different in this case. It shouldn't be necessary to have missing values in the training data in order for them to be handled as 'Max' during predictions.
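To make the expected behaviour concrete, here is a toy sketch of how I would expect a single split to route a missing value under each mode. This is not CatBoost's actual implementation: the function names, the threshold and the leaf values are all made up for illustration, and it only assumes that 'Min' means "smaller than every other value" and 'Max' means "larger than every other value".

```python
import math


def goes_right(value, threshold, nan_mode):
    """Return True if `value` lands on the 'greater than threshold' side of a split.

    Hypothetical helper: NaN is treated as below every threshold under "Min"
    and above every threshold under "Max".
    """
    if math.isnan(value):
        if nan_mode == "Min":
            return False
        if nan_mode == "Max":
            return True
        raise ValueError(f"unsupported nan_mode: {nan_mode!r}")
    return value > threshold


def predict_toy_tree(row, nan_mode):
    """Depth-1 toy tree on feature 0 with an invented threshold and leaf values."""
    return 10.0 if goes_right(row[0], 0.0, nan_mode) else -10.0


row = [float("nan")]
pred_min = predict_toy_tree(row, "Min")  # NaN goes to the left leaf
pred_max = predict_toy_tree(row, "Max")  # NaN goes to the right leaf
print(pred_min, pred_max)
```

Under these assumptions an all-NaN row ends up in opposite leaves for the two modes, which is why I would expect model_min and model_max above to give different predictions.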