A UX wrapper over CatBoost — readable errors, pre-flight data validation, ergonomic custom losses, sklearn pipeline compatibility, structured logging, lossless save/load, and exception-safe callbacks.
Not a fork. Not a replacement. A wrapper. Use it where it helps; mix freely with stock catboost.
```bash
pip install catboost-utils
# or, with sklearn pipeline support:
pip install "catboost-utils[sklearn]"
```

Requires Python 3.10+ and CatBoost 1.2+.
```python
from catboost_utils import CBXClassifier

model = CBXClassifier(
    iterations=500,
    auto_cat_features=True,  # detects str/category/bool columns automatically
    nan_fill="__NA__",       # explicit handling of NaN in cat features (no magic)
    early_stopping="auto",   # enables sane defaults when eval_set is given
)
model.fit(X_train, y_train, eval_set=(X_val, y_val))
```

`isinstance(model, CatBoostClassifier)` is still `True`. `clone()`, `GridSearchCV`, and `pickle` work out of the box.
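The `auto_cat_features` rule above (str/category/bool columns) can be sketched with plain pandas; `detect_cat_features` is a hypothetical helper for illustration, not the library's code:

```python
import pandas as pd

def detect_cat_features(X: pd.DataFrame) -> list[str]:
    """Columns whose dtype is object/string, categorical, or bool."""
    cats = []
    for col in X.columns:
        dtype = X[col].dtype
        if dtype == object or isinstance(dtype, pd.CategoricalDtype) or dtype == bool:
            cats.append(col)
    return cats

X = pd.DataFrame({
    "city": ["NYC", "LA"],               # object   -> categorical
    "vip": [True, False],                # bool     -> categorical
    "tier": pd.Categorical(["a", "b"]),  # category -> categorical
    "age": [31, 45],                     # int      -> numeric, skipped
})
```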
Every module is independent. Use only what you need.
```python
from catboost import CatBoostClassifier
from catboost_utils import wrap, CBXError

m = wrap(CatBoostClassifier(iterations=10))
try:
    m.fit(X, y)  # X has a string column not declared in cat_features
except CBXError as e:
    print(e.human_message)  # e.g. "Feature 'city' (index 5) has invalid type ..."
    print(e.hint)           # e.g. "Convert float values and NaN to strings ..."
```

`wrap()` swaps the model's class to a CBX-enhanced subclass — isinstance checks keep working, and pickle round-trips correctly.
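The class-swap trick that `wrap()` describes can be sketched in plain Python; `Model` and `Enhanced` here are stand-ins for illustration, not the library's actual classes:

```python
class Model:  # stand-in for CatBoostClassifier
    def fit(self, X, y):
        return self

def wrap(model):
    """Swap the instance's class for a dynamically created subclass."""
    class Enhanced(type(model)):
        def fit(self, X, y):
            # pre-flight checks / friendlier error messages would go here
            return super().fit(X, y)
    Enhanced.__name__ = f"CBX{type(model).__name__}"
    model.__class__ = Enhanced  # isinstance(model, Model) stays True
    return model

m = wrap(Model())
```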
```python
from catboost_utils import validate

report = validate(X, y, cat_features=["city"])
report                    # in Jupyter: rich HTML table of issues + warnings
report.raise_if_failed()  # raises ValidationError if any blocking issue
```

Catches NaN-in-cat-features, inf, single-class targets, undeclared object columns, datetime columns, class-weight conflicts, and GPU/multi-thread non-determinism — before training crashes with a cryptic message.
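One of those checks, NaN inside a declared categorical column, can be sketched standalone; `check_cat_nans` is a hypothetical helper, not the package's implementation:

```python
import pandas as pd

def check_cat_nans(X: pd.DataFrame, cat_features: list[str]) -> list[str]:
    """Collect human-readable issues instead of letting training crash."""
    issues = []
    for col in cat_features:
        n = int(X[col].isna().sum())
        if n:
            issues.append(
                f"Feature '{col}': {n} NaN value(s); CatBoost rejects NaN in "
                f"categorical features. Hint: fill with a sentinel string."
            )
    return issues

X = pd.DataFrame({"city": ["NYC", None, "LA"], "age": [31, 22, 45]})
issues = check_cat_nans(X, cat_features=["city"])
```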
```python
import numpy as np
from catboost import CatBoostRegressor
from catboost_utils.objectives import objective, metric

@objective(task="regression")
def my_huber(y_true: np.ndarray, y_pred: np.ndarray):
    delta = 1.0
    err = y_pred - y_true
    is_small = np.abs(err) <= delta
    grad = np.where(is_small, err, delta * np.sign(err))
    hess = np.where(is_small, 1.0, 0.0)
    return grad, hess

@metric(task="regression", name="MAE", higher_is_better=False)
def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

model = CatBoostRegressor(loss_function=my_huber, eval_metric=mae)
```

The decorator handles all CatBoost-isms (list-of-list approxes, sign convention, weights, sigmoid/softmax internal transform). Functions are JIT-compiled with numba; multiclass works at C-speed despite CatBoost's per-object API.
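For context on what the decorator abstracts away: CatBoost's native custom-objective protocol is an object with a per-object `calc_ders_range(approxes, targets, weights)` returning first/second derivatives of the *negated* loss (CatBoost maximizes). A standalone sketch of the same Huber in that raw convention, assuming `delta=1.0`:

```python
class HuberObjective:
    """CatBoost-style custom objective: per-object derivatives of the
    negated Huber loss (CatBoost maximizes, hence the sign flip)."""
    def __init__(self, delta: float = 1.0):
        self.delta = delta

    def calc_ders_range(self, approxes, targets, weights):
        result = []
        for i, approx in enumerate(approxes):
            w = weights[i] if weights is not None else 1.0
            err = targets[i] - approx
            if abs(err) <= self.delta:
                der1, der2 = err, -1.0  # quadratic region
            else:                       # linear region
                der1 = self.delta * (1.0 if err > 0 else -1.0)
                der2 = 0.0
            result.append((w * der1, w * der2))
        return result

# Raw usage would be: CatBoostRegressor(loss_function=HuberObjective())
ders = HuberObjective().calc_ders_range([0.0, 5.0], [0.5, 0.0], None)
```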
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from catboost_utils import CBXRegressor

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", CBXRegressor(iterations=100)),
])
pipe.fit(X, y)
```

Works inside `Pipeline`, `GridSearchCV`, `cross_val_score`, and `clone()`.
```python
import logging
from catboost_utils.logging import setup_logging, attach

setup_logging(level=logging.INFO, structured=False)
attach(model)
model.fit(X, y)
# INFO catboost_utils.training - iteration=10 learn_loss=0.423 test_loss=0.451 ...
```

Use `structured=True` for JSON output. Each parsed line carries a `cbx_iteration` extra dict for downstream log processors.
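Downstream, an extra dict like that can be consumed by any standard `logging.Formatter`. A minimal JSON formatter sketch; `JsonFormatter` is illustrative and only the `cbx_iteration` key comes from the text above:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, merging any cbx_iteration extra."""
    def format(self, record):
        payload = {"level": record.levelname, "msg": record.getMessage()}
        payload.update(getattr(record, "cbx_iteration", {}))
        return json.dumps(payload)

logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("iteration", extra={"cbx_iteration": {"iteration": 10, "learn_loss": 0.423}})
```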
```python
from catboost_utils.explain import feature_importance, shap_values, check_early_stopping

fi = feature_importance(model, X)  # sorted DataFrame with feature names
sv = shap_values(model, X)         # DataFrame: features + expected_value
check_early_stopping(model, eval_set=eval_set)  # raises CBXError if misconfigured
```

```python
from catboost_utils.io import save, load

save(model, "artifact.cbm")      # writes artifact.cbm + artifact.cbm.meta.json
restored = load("artifact.cbm")  # restores best_iteration, feature_names, etc.
```

The sidecar bundles `best_iteration`, `feature_names`, `cat_features`, `class_names`, training params, and version info. `load()` works without a sidecar (logs a warning) so external `.cbm` files keep loading.
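The sidecar mechanism is, at heart, a JSON file written next to the model artifact. A standalone sketch; `save_sidecar`/`load_sidecar` are hypothetical names, not the package's API:

```python
import json
from pathlib import Path

def save_sidecar(path: str, meta: dict) -> None:
    """Write artifact metadata next to the model file."""
    Path(path + ".meta.json").write_text(json.dumps(meta, indent=2))

def load_sidecar(path: str) -> dict:
    sidecar = Path(path + ".meta.json")
    if not sidecar.exists():
        return {}  # external .cbm: keep loading, caller logs a warning
    return json.loads(sidecar.read_text())

save_sidecar("artifact.cbm", {"best_iteration": 87, "feature_names": ["city", "age"]})
meta = load_sidecar("artifact.cbm")
```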
```python
from catboost_utils.callbacks import safe

cb = safe(my_callback)
model.fit(X, y, callbacks=[cb])
cb.raise_if_failed()  # surfaces any exception your callback raised, with original traceback
```

- Backwards compatible — anything that works in CatBoost works through catboost_utils.
- Opt-in — every module is independent. Use what you need; ignore the rest.
- No magic — no silent data transformations. Auto-fixes are always parameters the user passes explicitly (`nan_fill="..."`, `auto_cat_features=True`).
- Strict typing — every public function fully annotated; `mypy --strict` clean.
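A wrapper in the spirit of the exception-safe `safe()` above can be sketched against CatBoost's callback protocol (objects exposing `after_iteration(info)` and returning `True` to continue training); `SafeCallback` here is illustrative, not the library's implementation:

```python
class SafeCallback:
    """Wrap a CatBoost-style callback so its exceptions never kill training."""
    def __init__(self, callback):
        self._callback = callback
        self._error = None

    def after_iteration(self, info):
        try:
            return self._callback.after_iteration(info)
        except Exception as exc:  # stash it, traceback and all
            self._error = exc
            return True           # keep training going

    def raise_if_failed(self):
        if self._error is not None:
            raise self._error

class Flaky:
    def after_iteration(self, info):
        raise ValueError("boom")

cb = SafeCallback(Flaky())
cb.after_iteration(info=None)  # exception is swallowed; training would continue
```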
- Python: 3.10, 3.11, 3.12
- CatBoost: ≥ 1.2, < 2.0
- sklearn: 1.3+ (optional, only for the `pipeline` module)
Pre-1.0 (0.x.y). Any minor bump may include breaking changes — see CHANGELOG.md. 1.0.0 will be cut once the public API is frozen and CI is green across the matrix.
Apache 2.0 — see LICENSE.