nan_mode='Max' ignored for predictions #2850

@david-cortes

If I pass the option nan_mode='Max' and the data the model is fitted to has no missing values, subsequent calls to .predict() on data with missing values compute predictions as if nan_mode='Min' had been used:

import numpy as np
from sklearn.datasets import make_regression
from catboost import CatBoostRegressor

X, y = make_regression(random_state=123)  # training data has no missing values
xnan = np.repeat(np.nan, X.shape[1]).reshape((1,-1))  # a single row where every feature is NaN

model_min = CatBoostRegressor(
    depth=3,
    iterations=5,
    random_seed=123,
    thread_count=1,
    save_snapshot=False,
    verbose=0,
    nan_mode="Min",
).fit(X, y)
model_max = CatBoostRegressor(
    depth=3,
    iterations=5,
    random_seed=123,
    thread_count=1,
    save_snapshot=False,
    verbose=0,
    nan_mode="Max",
).fit(X, y)

model_min.predict(xnan), model_max.predict(xnan)
# -> (array([-41.14545892]), array([-41.14545892]))

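Since missing values are treated either as smaller than every other value ('Min') or as larger than every other value ('Max'), and no splits were learned from missing values, the two models should route an all-NaN row to opposite sides of their splits, so the results should differ in this case. Having missing values in the training data shouldn't be required in order for missing values to be handled as 'Max' at prediction time.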
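To illustrate why the two models would be expected to disagree, here is a simplified sketch of how a NaN could be routed through a symmetric (oblivious) tree of depth 3 if it is replaced by -inf under 'Min' and by +inf under 'Max'. The splits, borders, leaf values, and the helpers `leaf_index` and `predict` are hypothetical, and the `value > border` convention is an assumption; this is not CatBoost's actual implementation.

```python
import math
from typing import List, Tuple

# Hypothetical oblivious tree: one (feature_index, border) pair per level,
# shared by every node at that level, plus 2**depth leaf values.
Split = Tuple[int, float]


def leaf_index(row: List[float], splits: List[Split], nan_mode: str) -> int:
    """Return the leaf a row falls into, treating NaN as -inf ('Min') or +inf ('Max')."""
    if nan_mode not in ("Min", "Max"):
        raise ValueError("nan_mode must be 'Min' or 'Max'")
    nan_value = -math.inf if nan_mode == "Min" else math.inf
    index = 0
    for level, (feature, border) in enumerate(splits):
        value = row[feature]
        if math.isnan(value):
            value = nan_value
        # Assumed convention: the bit for this level is 1 when the value exceeds the border.
        if value > border:
            index |= 1 << level
    return index


def predict(row: List[float], splits: List[Split], leaf_values: List[float], nan_mode: str) -> float:
    return leaf_values[leaf_index(row, splits, nan_mode)]


splits = [(0, 0.5), (1, -1.2), (2, 3.0)]    # made-up borders, depth 3
leaf_values = [float(i) for i in range(8)]  # made-up leaf values 0.0 .. 7.0
all_nan = [math.nan, math.nan, math.nan]

print(predict(all_nan, splits, leaf_values, "Min"))  # 0.0
print(predict(all_nan, splits, leaf_values, "Max"))  # 7.0
```

Under these assumptions an all-NaN row always lands in the first leaf with 'Min' and in the last leaf with 'Max', so identical predictions from the two models would suggest that 'Max' is not being applied.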
