cop1cat/catboost-utils

catboost_utils

A UX wrapper over CatBoost — readable errors, pre-flight data validation, ergonomic custom losses, sklearn pipeline compatibility, structured logging, lossless save/load, and exception-safe callbacks.

Not a fork. Not a replacement. A wrapper. Use it where it helps; mix freely with stock catboost.

Install

pip install catboost-utils
# or, with sklearn pipeline support:
pip install "catboost-utils[sklearn]"

Requires Python 3.10+ and CatBoost 1.2+.

Quick start

from catboost_utils import CBXClassifier

model = CBXClassifier(
    iterations=500,
    auto_cat_features=True,        # detects str/category/bool columns automatically
    nan_fill="__NA__",             # explicit handling of NaN in cat features (no magic)
    early_stopping="auto",         # enables sane defaults when eval_set is given
)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

isinstance(model, CatBoostClassifier) is still True. clone(), GridSearchCV, and pickle work out of the box.
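Those guarantees follow from CBXClassifier being a plain subclass of CatBoostClassifier. A toy sketch with stand-in classes (not the library's real ones) shows why isinstance and pickle then come for free:

```python
import pickle

class CatBoostClassifierStub:                      # stand-in for catboost.CatBoostClassifier
    def __init__(self, iterations=1000):
        self.iterations = iterations

class CBXClassifierStub(CatBoostClassifierStub):   # stand-in for catboost_utils.CBXClassifier
    def __init__(self, iterations=1000, auto_cat_features=False):
        super().__init__(iterations=iterations)
        self.auto_cat_features = auto_cat_features

m = CBXClassifierStub(iterations=500, auto_cat_features=True)
assert isinstance(m, CatBoostClassifierStub)       # base-class isinstance still holds
clone2 = pickle.loads(pickle.dumps(m))             # ordinary pickle round-trip
```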

What's in the box

Every module is independent. Use only what you need.

errors — readable error messages

from catboost import CatBoostClassifier
from catboost_utils import wrap, CBXError

m = wrap(CatBoostClassifier(iterations=10))
try:
    m.fit(X, y)   # X has a string column not declared in cat_features
except CBXError as e:
    print(e.human_message)  # e.g. "Feature 'city' (index 5) has invalid type ..."
    print(e.hint)           # e.g. "Convert float values and NaN to strings ..."

wrap() swaps the model's class to a CBX-enhanced subclass — isinstance checks keep working, and pickle round-trips correctly.
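The class swap can be pictured with a toy version (illustrative names only, not the library's internals): build and cache a subclass that mixes the enhancements into the model's own class, then reassign __class__ on the live instance. The real wrap() also keeps pickle working, which needs extra machinery not shown here.

```python
_subclass_cache = {}

class ReadableErrorsMixin:
    def human_fit(self, *args, **kwargs):   # hypothetical enhanced method
        return "enhanced"

def wrap_sketch(model):
    base = type(model)
    if base not in _subclass_cache:
        # Reuse the base class name so repr() stays unsurprising.
        _subclass_cache[base] = type(base.__name__, (ReadableErrorsMixin, base), {})
    model.__class__ = _subclass_cache[base]  # swap the live instance's class
    return model

class FakeModel:                             # stand-in for a CatBoost model
    pass

m = wrap_sketch(FakeModel())
assert isinstance(m, FakeModel)              # isinstance keeps working
```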

validation — pre-flight checks

from catboost_utils import validate

report = validate(X, y, cat_features=["city"])
report                  # in Jupyter: rich HTML table of issues + warnings
report.raise_if_failed()  # raises ValidationError if any blocking issue

Catches NaN-in-cat-features, inf, single-class targets, undeclared object columns, datetime columns, class-weights conflicts, and GPU/multi-thread non-determinism — before training crashes with a cryptic message.
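One of those checks, NaN in a declared categorical column, can be sketched in a few lines over plain columns (the real report object is richer than a list of strings):

```python
import math

def check_cat_nans(columns, cat_features):
    """Sketch of one pre-flight check: NaN inside declared cat features."""
    issues = []
    for name in cat_features:
        bad = sum(1 for v in columns[name]
                  if isinstance(v, float) and math.isnan(v))
        if bad:
            issues.append(f"cat feature '{name}' has {bad} NaN value(s); "
                          f"fill them explicitly (e.g. nan_fill='__NA__')")
    return issues

cols = {"city": ["NYC", float("nan"), "SF"], "age": [30, 41, 25]}
issues = check_cat_nans(cols, ["city"])
```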

objectives — custom losses, numba-jit'ed

import numpy as np
from catboost import CatBoostRegressor
from catboost_utils.objectives import objective, metric

@objective(task="regression")
def my_huber(y_true: np.ndarray, y_pred: np.ndarray):
    delta = 1.0
    err = y_pred - y_true
    is_small = np.abs(err) <= delta
    grad = np.where(is_small, err, delta * np.sign(err))
    hess = np.where(is_small, 1.0, 0.0)
    return grad, hess

@metric(task="regression", name="MAE", higher_is_better=False)
def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

model = CatBoostRegressor(loss_function=my_huber, eval_metric=mae)

The decorator handles all CatBoost-isms (list-of-list approxes, sign convention, weights, sigmoid/softmax internal transform). Functions are JIT-compiled with numba; multiclass works at C-speed despite CatBoost's per-object API.
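For context on the sign convention: stock CatBoost's Python custom objective is an object whose calc_ders_range(approxes, targets, weights) returns per-object (first, second) derivative pairs of the objective CatBoost maximizes. The decorator's core job is essentially the adapter below, a hedged sketch (not the actual implementation) that flips the sign of a minimized loss's grad/hess and applies weights:

```python
import numpy as np

class LossAdapter:
    """Sketch: adapt a (grad, hess)-of-a-minimized-loss function to
    CatBoost's custom-objective protocol. Names are illustrative."""
    def __init__(self, grad_hess_fn):
        self.fn = grad_hess_fn

    def calc_ders_range(self, approxes, targets, weights):
        y_pred = np.asarray(approxes, dtype=float)
        y_true = np.asarray(targets, dtype=float)
        grad, hess = self.fn(y_true, y_pred)
        if weights is not None:
            w = np.asarray(weights, dtype=float)
            grad, hess = grad * w, hess * w
        # CatBoost maximizes its objective; a minimized loss flips sign.
        return list(zip(-grad, -hess))

def squared(y_true, y_pred):
    return y_pred - y_true, np.ones_like(y_pred)

ders = LossAdapter(squared).calc_ders_range([2.0], [1.0], None)
# loss grad at (pred=2, true=1) is +1, so the maximized-objective der1 is -1
```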

pipeline — sklearn-friendly classes

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from catboost_utils import CBXRegressor

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", CBXRegressor(iterations=100)),
])
pipe.fit(X, y)

Works inside Pipeline, GridSearchCV, cross_val_score, and clone().

logging — structured training output

import logging
from catboost_utils.logging import setup_logging, attach

setup_logging(level=logging.INFO, structured=False)
attach(model)
model.fit(X, y)
# INFO catboost_utils.training - iteration=10 learn_loss=0.423 test_loss=0.451 ...

Use structured=True for JSON output. Each parsed line carries a cbx_iteration extra dict for downstream log processors.
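A downstream processor can pick that extra dict off the log record. Sketch of a JSON formatter consuming it (the field name cbx_iteration comes from the text above; its exact keys are an assumption here):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {"msg": record.getMessage()}
        extra = getattr(record, "cbx_iteration", None)  # attached via logging's extra=
        if extra:
            payload.update(extra)
        return json.dumps(payload)

logger = logging.getLogger("catboost_utils.training")
record = logger.makeRecord(
    logger.name, logging.INFO, "demo.py", 0,
    "iteration metrics", (), None,
    extra={"cbx_iteration": {"iteration": 10, "learn_loss": 0.423}},
)
line = JsonFormatter().format(record)
```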

explain — feature importance + SHAP, named DataFrames

from catboost_utils.explain import feature_importance, shap_values, check_early_stopping

fi = feature_importance(model, X)            # sorted DataFrame with feature names
sv = shap_values(model, X)                   # DataFrame: features + expected_value
check_early_stopping(model, eval_set=eval_set)  # raise CBXError if misconfigured

io — lossless save/load

from catboost_utils.io import save, load

save(model, "artifact.cbm")              # writes artifact.cbm + artifact.cbm.meta.json
restored = load("artifact.cbm")          # restores best_iteration, feature_names, etc.

The sidecar bundles best_iteration, feature_names, cat_features, class_names, training params, and version info. load() works without a sidecar (logs a warning) so external .cbm files keep loading.
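The sidecar pattern itself is simple enough to sketch: write the model artifact plus a .meta.json next to it, and tolerate the sidecar's absence on load. Keys below mirror the list above; the real on-disk format is the library's own.

```python
import json
import os
import pathlib
import tempfile

def save_sketch(model_bytes, path, meta):
    p = pathlib.Path(path)
    p.write_bytes(model_bytes)                           # the .cbm itself (stubbed as bytes)
    pathlib.Path(str(p) + ".meta.json").write_text(json.dumps(meta))

def load_sketch(path):
    p = pathlib.Path(path)
    sidecar = pathlib.Path(str(p) + ".meta.json")
    # Missing sidecar: the real load() logs a warning instead of failing.
    meta = json.loads(sidecar.read_text()) if sidecar.exists() else None
    return p.read_bytes(), meta

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "artifact.cbm")
    save_sketch(b"model-bytes", path, {"best_iteration": 42, "feature_names": ["city", "age"]})
    blob, meta = load_sketch(path)
```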

callbacks — exception-safe wrapper

from catboost_utils.callbacks import safe

cb = safe(my_callback)
model.fit(X, y, callbacks=[cb])
cb.raise_if_failed()   # surfaces any exception your callback raised, with original traceback
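The wrapping technique can be sketched against CatBoost's callback protocol, an object with after_iteration(info) returning a bool (True to continue). This is an illustrative version, not the library's implementation: catch the exception, stash it, and let training finish.

```python
class SafeCallbackSketch:
    def __init__(self, inner):
        self.inner = inner
        self.error = None

    def after_iteration(self, info):
        if self.error is not None:
            return True                      # already failed once; stay inert
        try:
            return self.inner.after_iteration(info)
        except Exception as exc:             # store instead of propagating mid-training
            self.error = exc
            return True

    def raise_if_failed(self):
        if self.error is not None:
            raise self.error

class FailingCallback:                       # a callback that always blows up
    def after_iteration(self, info):
        raise ValueError("boom")

cb = SafeCallbackSketch(FailingCallback())
assert cb.after_iteration(None) is True      # training would continue anyway
```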

Principles

  • Backwards compatible — anything that works in CatBoost works through catboost_utils.
  • Opt-in — every module is independent. Use what you need; ignore the rest.
  • No magic — no silent data transformations. Auto-fixes are always parameters the user passes explicitly (nan_fill="...", auto_cat_features=True).
  • Strict typing — every public function fully annotated; mypy --strict clean.

Compatibility

  • Python: 3.10, 3.11, 3.12
  • CatBoost: ≥ 1.2, < 2.0
  • sklearn: 1.3+ (optional, only for the pipeline module)

Versioning

Pre-1.0 (0.x.y). Any minor bump may include breaking changes — see CHANGELOG.md. 1.0.0 will be cut once the public API is frozen and CI is green across the matrix.

License

Apache 2.0 — see LICENSE.
