CTBoost

CTBoost is a gradient boosting library built around Conditional Inference Trees, with a native C++17 core, Python bindings via pybind11, optional CUDA support for source builds, and an optional scikit-learn style API.

The current codebase supports end-to-end training and prediction for regression, classification, grouped ranking, and survival. Data handling covers pandas and SciPy sparse ingestion without dense expansion, richer ranking metadata (group_weight, subgroup_id, explicit pairs, and pairs_weight), baseline raw-score inputs, row weights and class imbalance controls, and explicit missing-value handling. Model workflows include configurable validation metrics, stable JSON model persistence, standalone Python export for prepared numeric models, staged prediction, and warm-start continuation. Preparation tooling includes a native C++ feature pipeline for categorical/text/embedding transforms with thin Python wrappers, reusable prepared training-data bundles, and a built-in cross-validation helper.

Current Status

Language mix: Python + C++17, with optional CUDA
Python support: 3.8 through 3.14
Packaging: scikit-build-core
CI/CD: GitHub Actions for CMake validation and cibuildwheel release builds
Repository version: 0.1.39
Status: actively evolving native + Python package

What Works Today

Native gradient boosting backend exposed as ctboost._core
Pool abstraction for dense tabular data, SciPy sparse input, categorical feature indices, optional ranking/query metadata (group_id, group_weight, subgroup_id), explicit ranking pairs plus pairs_weight, and optional baseline raw scores
Native pandas DataFrame and Series support
Automatic categorical detection for pandas category and object columns
Regression training with ctboost.train(...), including raw array/DataFrame inputs plus optional preprocessing and external-memory staging
scikit-learn compatible CTBoostClassifier, CTBoostRegressor, and CTBoostRanker when scikit-learn is installed
Binary and multiclass classification
Grouped ranking with PairLogit, NDCG, MAP, MRR, explicit ranking pairs, subgroup-aware auto-pair generation, and query-level weights
Row weights through Pool(..., weight=...) and sample_weight on sklearn estimators
Class imbalance controls through class_weight, class_weights, auto_class_weights="balanced", and scale_pos_weight
Explicit missing-value handling through nan_mode
Quantization controls through max_bins, max_bin_by_feature, border_selection_method, feature_borders, and nan_mode_by_feature
Row subsampling through subsample plus bootstrap_type="No"|"Bernoulli"|"Poisson"
Bayesian bagging through bootstrap_type="Bayesian" plus bagging_temperature
boosting_type="RandomForest" on top of the existing conditional-inference tree learner
boosting_type="DART" with dropout-style tree normalization on top of the existing conditional-inference tree learner
Monotonic constraints through monotone_constraints
Path-level interaction constraints through interaction_constraints
Additional generic regularization and tree-growth controls through feature_weights, first_feature_use_penalties, random_strength, grow_policy, min_samples_split, and max_leaf_weight
GPU tree growth now also supports monotonic constraints, interaction constraints, feature_weights, first_feature_use_penalties, random_strength, and grow_policy="LeafWise" without replacing the conditional-inference split gate
Survival objectives: Cox, SurvivalExponential
Survival evaluation through CIndex
Early stopping with eval_set, eval_names, early_stopping_rounds, early_stopping_metric, and early_stopping_name
Single- and multi-watchlist evaluation through one or many eval_set entries
Single- and multi-metric evaluation through string or sequence eval_metric values
Per-iteration callback hooks through callbacks, plus built-in ctboost.log_evaluation(...) and ctboost.checkpoint_callback(...)
Validation loss/metric history and evals_result_
Per-iteration prediction through staged prediction and num_iteration
Stable JSON and pickle model persistence for low-level boosters and scikit-learn style estimators
Cross-validation with ctboost.cv(...) when scikit-learn is installed
Regression objectives: RMSE, MAE, Huber, Quantile, Poisson, Tweedie
Generic eval metrics including RMSE, MAE, Poisson, Tweedie, Accuracy, BalancedAccuracy, Precision, Recall, F1, AUC, NDCG, MAP, MRR, and CIndex
Native ctboost.FeaturePipeline logic in _core.NativeFeaturePipeline, with low-level and sklearn integration for ordered CTRs, frequency-style CTRs, categorical crosses, low-cardinality one-hot expansion, rare-category bucketing, text hashing, and embedding-stat expansion
Generic categorical controls around the existing conditional tree learner: one_hot_max_size / max_cat_to_onehot, max_cat_threshold, simple_ctr, combinations_ctr, and per_feature_ctr
ctboost.prepare_pool(...) for low-level raw-data preparation, optional feature-pipeline fitting, and disk-backed external-memory pool staging
ctboost.prepare_training_data(...) plus PreparedTrainingData for one-time raw train/eval preparation that can be reused across repeated fits
Native CPU out-of-core fit through ctboost.train(..., external_memory=True), which now spills quantized feature-bin columns to disk instead of keeping the full histogram matrix resident in RAM
Multi-host distributed training through distributed_world_size, distributed_rank, distributed_root, and distributed_run_id, with a native per-node histogram reduction path and a TCP collective backend available through distributed_root="tcp://host:port"
Distributed eval_set, multi-watchlist or multi-metric evaluation, callbacks, early_stopping_rounds, init_model, grouped ranking shards, and sklearn-estimator wrappers on the TCP collective backend
Filesystem-backed distributed runs now also fall back to a rank-0 coordinator path for advanced eval, callback, ranking, and GPU compatibility flows when TCP is not configured
Distributed GPU training when CUDA is available and distributed_root uses the TCP collective backend
Distributed raw-data feature-pipeline fitting across ranks for native categorical, text, and embedding preprocessing
Feature importance reporting
Leaf-index introspection and path-based prediction contributions
Continued training through init_model and estimator warm_start
Standalone pure-Python deployment export through Booster.export_model(..., export_format="python") and matching sklearn-estimator wrappers for numeric or already-prepared features
Build metadata reporting through ctboost.build_info()
CPU builds on standard CI runners
Optional CUDA compilation when building from source with a suitable toolkit
GPU source builds now keep fit-scoped histogram data resident on device, support shared-memory histogram accumulation, and expose GPU raw-score prediction for regression, binary classification, and multiclass models
Histogram building now writes directly into final-width compact storage when the fitted schema permits <=256 bins, avoiding the old transient uint16 -> uint8 duplication spike
Fitted models now store quantization metadata once per booster instead of duplicating the same schema in every tree
Low-level boosters can export reusable fitted borders through Booster.get_borders() and expose the full shared quantization schema through Booster.get_quantization_schema()
GPU fit now drops the host training histogram bin matrix immediately after the device histogram workspace has been created and warm-start predictions have been seeded
GPU tree building now uses histogram subtraction in the device path as well, so only one child histogram is built explicitly after each split
GPU node search now keeps best-feature selection on device and returns a compact winner instead of copying the full per-feature search buffer back to host each node
Training can emit native histogram/tree timing via verbose=True or CTBOOST_PROFILE=1
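
As an illustration of the class imbalance controls listed above, the sketch below computes "balanced" class weights in plain Python. It assumes the common n_samples / (n_classes * class_count) convention; this README does not state the exact formula CTBoost applies for auto_class_weights="balanced", so treat the helper (balanced_class_weights is a hypothetical name) as a reference point rather than the library's implementation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count).

    Assumption: this is the widely used "balanced" convention, not a
    confirmed description of CTBoost internals.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Six negatives and two positives: the rare class gets the larger weight.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)
print(weights)
```

A mapping like this could be compared against what you pass explicitly through class_weight when you want to control the weights yourself.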

Current Limitations

Ordered CTRs, frequency-style CTRs, categorical crosses, low-cardinality one-hot expansion, rare-category bucketing, text hashing, and embedding expansion now run through a native C++ pipeline, while pandas extraction, raw-data routing, and Pool orchestration remain thin Python glue; ctboost.prepare_training_data(...) reduces that repeated Python work when you need to fit multiple times on the same raw train/eval split
There is now a native sparse training path plus disk-backed quantized-bin staging through ctboost.train(..., external_memory=True) on both CPU and GPU, and distributed training can also use a standalone TCP collective coordinator through distributed_root="tcp://host:port"
The legacy filesystem-based distributed path still exists for the native shard-reduction path; advanced eval, callback, ranking, and GPU compatibility workflows now fall back to a rank-0 coordinator path, while the TCP backend remains the true multi-rank path for those features
Distributed grouped/ranking training requires each group_id to live entirely on one worker shard; cross-rank query groups are rejected
Distributed multi-host training does not yet accept group_weight, subgroup_id, or explicit pairs / pairs_weight metadata on rank-local pools
Dedicated GPU wheel automation now targets Linux x86_64 and Windows amd64 CPython 3.10 through 3.14 release assets
CUDA wheel builds in CI depend on container-side toolkit provisioning
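
Because distributed ranking requires each group_id to live entirely on one worker shard, it can help to check the shard assignment before launching a run. The following is a minimal standalone sketch of such a check; find_split_groups is a hypothetical helper and not part of the CTBoost API.

```python
def find_split_groups(shard_group_ids):
    """Return the sorted group ids that appear on more than one shard.

    shard_group_ids holds one iterable of group ids per rank.
    """
    owner = {}
    split = set()
    for rank, group_ids in enumerate(shard_group_ids):
        # Deduplicate within a shard so a group is only compared across ranks.
        for gid in set(group_ids):
            if gid in owner:
                split.add(gid)
            else:
                owner[gid] = rank
    return sorted(split)


ok_shards = [[0, 0, 1, 1], [2, 2, 3]]
bad_shards = [[0, 0, 1], [1, 2, 2]]  # group 1 straddles rank 0 and rank 1

print(find_split_groups(ok_shards))
print(find_split_groups(bad_shards))
```

An empty result means every query group is rank-local; any returned id marks a group that would need to be moved to a single shard.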

Resolved Fold-Memory Hotspots

The older v0.1.15 GPU fit-memory bottleneck list is now closed in the current tree:

Quantization metadata is stored once per fitted booster and shared by all trees instead of being duplicated per tree
GPU fit releases the host training histogram bin matrix immediately after device workspace creation and warm-start seeding
GPU tree growth uses histogram subtraction, so only one child histogram is built explicitly after a split
GPU split search keeps best-feature selection on device and copies back only the winning feature summary

That means the old per-node GPU bin-materialization issue is no longer the main resident-memory problem in the current codebase. The remaining generic backlog is now in broader distributed runtime ergonomics and additional export or deployment tooling.
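
The histogram subtraction idea mentioned above can be sketched on the CPU in a few lines: once the parent histogram and one child histogram are known, the sibling is their bin-wise difference. This is only an illustration with toy gradient sums; the function names are hypothetical and the real device path is not shown in this README.

```python
def build_histogram(bin_indices, gradients, n_bins):
    """Accumulate gradient sums per quantized bin."""
    hist = [0.0] * n_bins
    for b, g in zip(bin_indices, gradients):
        hist[b] += g
    return hist


def subtract_histograms(parent, child):
    """Derive the sibling histogram as parent minus child, bin by bin."""
    return [p - c for p, c in zip(parent, child)]


bins = [0, 1, 1, 2, 3, 3]                   # quantized feature bin per row
grads = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]  # per-row gradient
left_rows = [0, 1, 3]                       # rows routed to the left child
right_rows = [2, 4, 5]                      # remaining rows

parent_hist = build_histogram(bins, grads, 4)
left_hist = build_histogram(
    [bins[i] for i in left_rows], [grads[i] for i in left_rows], 4
)

# Only the left child is built explicitly; the right child comes for free.
right_hist = subtract_histograms(parent_hist, left_hist)

# Building the right child directly gives the same result.
direct_right = build_histogram(
    [bins[i] for i in right_rows], [grads[i] for i in right_rows], 4
)
print(right_hist, direct_right)
```

In practice the explicitly built child is usually the smaller one, so the subtraction saves the larger scan. Whether CTBoost picks the smaller child is not stated here.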

Benchmark Snapshot

The heavy ordered-target-encoding playground-series-s6e4 replay was last measured on April 12, 2026 with the v0.1.11 source tree. The one-fold Kaggle source-build replay completed successfully with:

build 55.41s
fold preprocess 57.17s
fold fit 2107.10s
fold predict 5.89s
fold total 2170.17s
validation score 0.973213

Since that replay, the source tree has removed additional fit-memory overhead by sharing quantization schema per model, building compact train bins without a second host copy, releasing host train-bin storage after GPU upload, and adding GPU histogram subtraction plus device-side best-feature reduction.

Installation

For local development or source builds:

pip install .

Install development dependencies:

pip install -e .[dev]

Install the optional scikit-learn wrappers and ctboost.cv(...) support:

pip install -e .[sklearn]

Wheels vs Source Builds

pip install ctboost works without a compiler only when PyPI has a prebuilt wheel for your exact Python/OS tag. If no matching wheel exists, pip falls back to the source distribution and has to compile the native extension locally.

The release workflow is configured to publish CPU wheels for current CPython releases on Windows and Linux, plus macOS x86_64 CPU wheels for CPython 3.10 through 3.14, so standard pip install ctboost usage does not depend on a local compiler.

Each tagged GitHub release also attaches the CPU wheels, the source distribution, and dedicated Linux x86_64 plus Windows amd64 CUDA wheels for CPython 3.10 through 3.14. The GPU wheel filenames carry a 1gpu build tag so the release can publish CPU and GPU artifacts for the same Python and platform tags without filename collisions.
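
To make the build-tag convention concrete, here is a small sketch that splits wheel filenames into their standard fields (name, version, optional build tag, Python tag, ABI tag, platform tag) and selects the GPU artifact for a given Python tag. The filenames are hypothetical examples, not actual release assets, and both helper functions are illustrative only.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its fields; build is None when absent."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python,
        "abi": abi,
        "platform": platform,
    }


def pick_gpu_wheel(filenames, py_tag, build_tag="1gpu"):
    """Return the first wheel carrying the GPU build tag for py_tag, else None."""
    for filename in filenames:
        info = parse_wheel_name(filename)
        if info["build"] == build_tag and info["python"] == py_tag:
            return filename
    return None


# Hypothetical asset names for illustration only.
names = [
    "ctboost-0.1.39-cp311-cp311-manylinux_2_28_x86_64.whl",
    "ctboost-0.1.39-1gpu-cp311-cp311-manylinux_2_28_x86_64.whl",
    "ctboost-0.1.39-1gpu-cp312-cp312-win_amd64.whl",
]
print(pick_gpu_wheel(names, "cp311"))
```

The CPU and GPU wheels for cp311 differ only in the build tag, which is why both can be attached to the same release. A real selector would also filter on the platform field.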

The GPU release jobs install the CUDA toolkit in CI, export the toolkit paths into the build environment, and set CTBOOST_REQUIRE_CUDA=ON so the wheel build fails instead of silently degrading to a CPU-only artifact. The release smoke test also checks that ctboost.build_info()["cuda_enabled"] is True before the GPU wheel is uploaded.

Kaggle GPU Install

pip install ctboost still resolves to the CPU wheel on PyPI. On Kaggle, install the matching GPU release wheel from GitHub instead:

import json
import subprocess
import sys
import urllib.request

tag = "v0.1.39"
py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
api_url = f"https://api.github.com/repos/captnmarkus/ctboost/releases/tags/{tag}"

with urllib.request.urlopen(api_url) as response:
    release = json.load(response)

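# Kaggle kernels run Linux, so skip the Windows GPU wheel for the same Python tag.
asset = next(
    item
    for item in release["assets"]
    if item["name"].endswith(".whl")
    and f"-1gpu-{py_tag}-{py_tag}-" in item["name"]
    and "linux" in item["name"]
)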

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-U", asset["browser_download_url"]]
)

After installation, confirm the wheel really contains CUDA support:

import ctboost

info = ctboost.build_info()
if not info["cuda_enabled"]:
    raise RuntimeError(f"Expected a CUDA-enabled CTBoost wheel, got: {info}")
print(info)

CPU-Only Source Build

To force a CPU-only native build:

CMAKE_ARGS="-DCTBOOST_ENABLE_CUDA=OFF" pip install .

On PowerShell:

$env:CMAKE_ARGS="-DCTBOOST_ENABLE_CUDA=OFF"
pip install .

Windows source builds require a working C++ toolchain. In practice that means Visual Studio Build Tools 2022 or a compatible MSVC environment, plus CMake. ninja is recommended, but it does not replace the compiler itself.

CUDA Source Build

CTBoost can compile a CUDA backend when the CUDA toolkit and compiler are available. CUDA is enabled by default in CMake, but the build automatically falls back to CPU-only when no toolkit is detected.

pip install .

You can inspect the compiled package after installation:

import ctboost
print(ctboost.build_info())

Quick Start

scikit-learn Style Classification

import pandas as pd
from sklearn.datasets import make_classification

from ctboost import CTBoostClassifier

X, y = make_classification(
    n_samples=256,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    random_state=13,
)
# make_classification returns an (X, y) tuple, so cast each array separately.
X = X.astype("float32")
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
X["segment"] = pd.Categorical(["a" if i % 2 == 0 else "b" for i in range(len(X))])
y = y.astype("float32")

model = CTBoostClassifier(
    iterations=256,
    learning_rate=0.1,
    max_depth=3,
    alpha=1.0,
    lambda_l2=1.0,
    task_type="CPU",
)

model.fit(
    X.iloc[:200],
    y[:200],
    eval_set=[(X.iloc[200:], y[200:])],
    early_stopping_rounds=20,
)
proba = model.predict_proba(X)
pred = model.predict(X)
importance = model.feature_importances_
best_iteration = model.best_iteration_
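
For intuition about what early_stopping_rounds does in the example above, the sketch below applies a generic patience rule to a list of validation losses. It assumes a lower-is-better metric and a strict-improvement rule; CTBoost's exact tie handling and its treatment of higher-is-better metrics are not described in this README, and early_stopping_point is a hypothetical helper.

```python
def early_stopping_point(val_losses, patience):
    """Return (best_iteration, last_iteration_run) under a patience rule.

    Training stops once `patience` consecutive iterations fail to improve
    on the best validation loss seen so far (lower is better).
    """
    best_iter = 0
    best_loss = float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_iter = i
        elif i - best_iter >= patience:
            return best_iter, i
    return best_iter, len(val_losses) - 1


# The loss bottoms out at iteration 2; three non-improving rounds follow.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.5]
best, stopped_at = early_stopping_point(losses, patience=3)
print(best, stopped_at)
```

Note that the later improvement at index 6 is never seen, which is the usual trade-off when choosing a small patience value.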

Low-Level Training API

import numpy as np

import ctboost

X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]], dtype=np.float32)
y = np.array([0.0, 1.0, 0.5], dtype=np.float32)

pool = ctboost.Pool(X, y)
booster = ctboost.train(
    pool,
    {
        "objective": "Huber",
        "learning_rate": 0.2,
        "max_depth": 2,
        "alpha": 1.0,
        "lambda_l2": 1.0,
        "max_bins": 64,
        "huber_delta": 1.5,
        "eval_metric": "MAE",
        "nan_mode": "Min",
        "task_type": "CPU",
    },
    num_boost_round=10,
)

predictions = booster.predict(pool)
loss_history = booster.loss_history
eval_loss_history = booster.eval_loss_history
exported_borders = booster.get_borders()

Per-feature quantization controls are available on the same low-level API:

booster = ctboost.train(
    pool,
    {
        "objective": "RMSE",
        "learning_rate": 0.1,
        "max_depth": 3,
        "alpha": 1.0,
        "lambda_l2": 1.0,
        "max_bins": 128,
        "max_bin_by_feature": {0: 16, 1: 8},
        "border_selection_method": "Uniform",
        "feature_borders": {1: [-0.5, 0.0, 0.5]},
        "nan_mode_by_feature": {0: "Max"},
    },
    num_boost_round=32,
)

feature_borders lets selected numeric features reuse explicit cut values, max_bin_by_feature overrides the global max_bins budget per column, border_selection_method currently supports Quantile and Uniform, and Booster.get_borders() returns an importable border bundle keyed by fitted feature index.
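
The difference between the Uniform and Quantile border methods is easiest to see on a skewed column. The following plain-Python sketch is an approximation for illustration: the helper names, the quantile indexing rule, and the convention that a value equal to a border falls into the upper bin are all assumptions, not CTBoost's documented behavior.

```python
import bisect


def uniform_borders(values, n_bins):
    """Place n_bins - 1 equally spaced cuts between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * k for k in range(1, n_bins)]


def quantile_borders(values, n_bins):
    """Place cuts at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    borders = []
    for k in range(1, n_bins):
        cut = ordered[min((k * n) // n_bins, n - 1)]
        # Skip duplicate cuts that arise from repeated values.
        if not borders or cut > borders[-1]:
            borders.append(cut)
    return borders


def bin_counts(values, borders):
    """Count rows per bin; a value equal to a border goes to the upper bin."""
    counts = [0] * (len(borders) + 1)
    for v in values:
        counts[bisect.bisect_right(borders, v)] += 1
    return counts


values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 10.0]  # one large outlier
u = uniform_borders(values, 4)
q = quantile_borders(values, 4)
print(u, bin_counts(values, u))
print(q, bin_counts(values, q))
```

With the outlier present, uniform cuts leave most bins empty while quantile cuts spread the rows evenly, which is one reason to pin feature_borders or a per-feature bin budget on heavy-tailed columns.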

The same low-level API also exposes generic regularization and growth controls around the existing conditional tree learner:

booster = ctboost.train(
    pool,
    {
        "objective": "RMSE",
        "learning_rate": 0.1,
        "max_depth": 4,
        "alpha": 1.0,
        "lambda_l2": 1.0,
        "bootstrap_type": "Bayesian",
        "bagging_temperature": 1.0,
        "feature_weights": {0: 2.0, 3: 0.5},
        "first_feature_use_penalties": {2: 1.5},
        "random_strength": 0.2,
        "grow_policy": "LeafWise",
        "max_leaves": 16,
        "min_samples_split": 8,
        "max_leaf_weight": 2.0,
    },
    num_boost_round=64,
)

feature_weights rescales feature preference without replacing the conditional test, first_feature_use_penalties discourages the first use of selected features at the model level, random_strength adds seeded noise to break near-ties in split gain after the conditional gate has already accepted a candidate, and grow_policy="LeafWise" currently means a best-child-first heuristic under the existing max_leaves budget rather than a separate split criterion.

The same low-level API can now prepare raw categorical/text/embedding inputs directly:

import numpy as np
import ctboost

X = np.empty((4, 4), dtype=object)
X[:, 0] = ["berlin", "paris", "berlin", "rome"]
X[:, 1] = [1.0, 2.0, 1.5, 3.0]
X[:, 2] = ["red fox", "blue fox", "red hare", "green fox"]
X[:, 3] = [
    np.array([0.1, 0.4, 0.2], dtype=np.float32),
    np.array([0.7, 0.1, 0.3], dtype=np.float32),
    np.array([0.2, 0.5, 0.6], dtype=np.float32),
    np.array([0.9, 0.2, 0.4], dtype=np.float32),
]
y = np.array([0.5, 1.2, 0.7, 1.6], dtype=np.float32)

booster = ctboost.train(
    X,
    {
        "objective": "RMSE",
        "learning_rate": 0.1,
        "max_depth": 3,
        "alpha": 1.0,
        "lambda_l2": 1.0,
        "ordered_ctr": True,
        "one_hot_max_size": 4,
        "max_cat_threshold": 16,
        "cat_features": [0],
        "simple_ctr": ["Mean", "Frequency"],
        "per_feature_ctr": {0: ["Mean"]},
        "text_features": [2],
        "embedding_features": [3],
    },
    label=y,
    num_boost_round=32,
)

raw_predictions = booster.predict(X)

If you want to reuse the raw-data preparation work across repeated fits on the same split, prepare it once and then train against the prepared bundle:

prepared = ctboost.prepare_training_data(
    X_train,
    {
        "objective": "RMSE",
        "ordered_ctr": True,
        "cat_features": [0],
        "text_features": [2],
    },
    label=y_train,
    eval_set=[(X_valid, y_valid)],
    eval_names=["holdout"],
)

booster = ctboost.train(
    prepared,
    {
        "objective": "RMSE",
        "learning_rate": 0.1,
        "max_depth": 3,
        "alpha": 1.0,
        "lambda_l2": 1.0,
        "ordered_ctr": True,
        "cat_features": [0],
        "text_features": [2],
    },
    num_boost_round=64,
    early_stopping_rounds=10,
)

For disk-backed pool staging on large folds:

pool = ctboost.prepare_pool(
    X_numeric,
    y,
    external_memory=True,
    external_memory_dir="ctboost-cache",
)

booster = ctboost.train(
    X_numeric,
    {
        "objective": "RMSE",
        "learning_rate": 0.1,
        "max_depth": 3,
        "alpha": 1.0,
        "lambda_l2": 1.0,
        "external_memory": True,
        "external_memory_dir": "ctboost-cache",
    },
    label=y,
    num_boost_round=64,
)

Ranking pools can also carry richer query metadata without changing the underlying conditional tree learner:

group_id = np.array([0, 0, 0, 1, 1, 1], dtype=np.int64)
subgroup_id = np.array([0, 0, 1, 0, 1, 2], dtype=np.int64)
group_weight = np.array([2.0, 2.0, 2.0, 1.0, 1.0, 1.0], dtype=np.float32)
pairs = np.array([[0, 2], [3, 5]], dtype=np.int64)
pairs_weight = np.array([1.5, 0.5], dtype=np.float32)
baseline = np.zeros(group_id.shape[0], dtype=np.float32)

rank_pool = ctboost.Pool(
    X_rank,
    y_rank,
    group_id=group_id,
    group_weight=group_weight,
    subgroup_id=subgroup_id,
    pairs=pairs,
    pairs_weight=pairs_weight,
    baseline=baseline,
)
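
Before building a ranking pool with explicit pairs, a quick sanity check can catch pairs that cross query groups. The sketch below assumes that each pair holds two row indices and that both rows are expected to share a group_id, as they do in the example above; this README does not spell out either rule, and pairs_outside_groups is a hypothetical helper.

```python
def pairs_outside_groups(pairs, group_id):
    """Return the pairs whose two row indices belong to different groups."""
    return [(a, b) for a, b in pairs if group_id[a] != group_id[b]]


group_id = [0, 0, 0, 1, 1, 1]
pairs = [(0, 2), (3, 5)]          # same values as the example above
suspect = pairs + [(2, 3)]        # row 2 is in group 0, row 3 in group 1

print(pairs_outside_groups(pairs, group_id))
print(pairs_outside_groups(suspect, group_id))
```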

Working With Categorical Features

Categorical columns can still be marked manually through the Pool API:

import numpy as np
import ctboost

X = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float32)
y = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)

pool = ctboost.Pool(X, y, cat_features=[0])

For pandas inputs, categorical/object columns are detected automatically:

import pandas as pd
import ctboost

frame = pd.DataFrame(
    {
        "value": [1.0, 2.0, 3.0, 4.0],
        "city": pd.Categorical(["berlin", "paris", "berlin", "rome"]),
        "segment": ["retail", "enterprise", "retail", "enterprise"],
    }
)
label = pd.Series([0.0, 1.0, 0.0, 1.0], dtype="float32")

pool = ctboost.Pool(frame, label)
assert pool.cat_features == [1, 2]

For estimator-side ordered CTRs, categorical crosses, one-hot expansion, rare-category bucketing, text hashing, and embedding expansion, use the Python feature pipeline parameters:

import numpy as np
import pandas as pd

from ctboost import CTBoostRegressor

frame = pd.DataFrame(
    {
        "city": ["berlin", "paris", "berlin", "rome"],
        "headline": ["red fox", "blue fox", "red hare", "green fox"],
        "embedding": [
            np.array([0.1, 0.4, 0.2], dtype=np.float32),
            np.array([0.7, 0.1, 0.3], dtype=np.float32),
            np.array([0.2, 0.5, 0.6], dtype=np.float32),
            np.array([0.9, 0.2, 0.4], dtype=np.float32),
        ],
        "value": [1.0, 2.0, 1.5, 3.0],
    }
)
label = np.array([0.5, 1.2, 0.7, 1.6], dtype=np.float32)

model = CTBoostRegressor(
    iterations=32,
    learning_rate=0.1,
    max_depth=3,
    ordered_ctr=True,
    one_hot_max_size=8,
    max_cat_threshold=32,
    cat_features=["city"],
    categorical_combinations=[["city", "headline"]],
    simple_ctr=["Mean", "Frequency"],
    per_feature_ctr={"city": ["Mean"]},
    text_features=["headline"],
    embedding_features=["embedding"],
)
model.fit(frame, label)

one_hot_max_size keeps low-cardinality categoricals as explicit indicator columns, max_cat_threshold buckets higher-cardinality levels down to a capped native categorical domain before the conditional tree learner sees them, and per_feature_ctr lets specific base features or categorical combinations opt into CTR generation without changing the underlying conditional split logic.
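
The bucketing idea behind max_cat_threshold can be illustrated without the library: keep the most frequent levels and fold the rest into a single catch-all level. This is a conceptual sketch only. The helper name, the "__other__" label, and the choice to keep exactly max_levels named levels are assumptions, and the native pipeline may count or order levels differently.

```python
from collections import Counter


def bucket_rare_levels(values, max_levels, other="__other__"):
    """Keep the max_levels most frequent levels and map the rest to `other`."""
    counts = Counter(values)
    keep = {level for level, _ in counts.most_common(max_levels)}
    return [v if v in keep else other for v in values]


cities = ["berlin", "paris", "berlin", "rome", "oslo", "paris", "berlin"]
bucketed = bucket_rare_levels(cities, max_levels=2)
print(bucketed)
```

After bucketing, the column has at most max_levels named levels plus the catch-all, so the categorical domain the tree learner sees stays bounded regardless of raw cardinality.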

Model Persistence, Warm Start, And Cross-Validation

import ctboost

booster.save_model("regression-model.json")
restored = ctboost.load_model("regression-model.json")
restored_predictions = restored.predict(pool)
booster.export_model("standalone_predictor.py", export_format="python")

continued = ctboost.train(
    pool,
    {"objective": "RMSE", "learning_rate": 0.2, "max_depth": 2, "alpha": 1.0, "lambda_l2": 1.0},
    num_boost_round=10,
    init_model=restored,
)

cv_result = ctboost.cv(
    pool,
    {
        "objective": "RMSE",
        "learning_rate": 0.2,
        "max_depth": 2,
        "alpha": 1.0,
        "lambda_l2": 1.0,
    },
    num_boost_round=25,
    nfold=3,
)

The scikit-learn compatible estimators also expose:

save_model(...)
export_model(..., export_format="python") for standalone numeric or already-prepared deployment scoring
load_model(...)
staged_predict(...)
staged_predict_proba(...) for classifiers
predict_leaf_index(...)
predict_contrib(...)
evals_result_
best_score_
sample_weight and baseline on fit(...)
group_weight, subgroup_id, pairs, and pairs_weight on CTBoostRanker.fit(...)
class_weight, scale_pos_weight, eval_metric, nan_mode, nan_mode_by_feature, and warm_start
max_bins, max_bin_by_feature, border_selection_method, and feature_borders
bagging_temperature, feature_weights, first_feature_use_penalties, random_strength, grow_policy, min_samples_split, and max_leaf_weight

Public Python API

The main entry points are:

ctboost.Pool
ctboost.FeaturePipeline
ctboost.PreparedTrainingData
ctboost.prepare_pool
ctboost.prepare_training_data
ctboost.train
ctboost.cv
ctboost.Booster
ctboost.CTBoostClassifier
ctboost.CTBoostRanker
ctboost.CTBoostRegressor
ctboost.CBoostClassifier
ctboost.CBoostRanker
ctboost.CBoostRegressor
ctboost.build_info
ctboost.load_model

Build and Test

Run the test suite:

pytest tests

The latest local release-candidate validation on April 18, 2026 was:

python -m pytest tests/test_data_and_loss.py tests/test_ranking.py tests/test_explainability_and_warm_start.py -q
python -m pytest tests/test_sklearn.py tests/test_metrics_and_objectives.py tests/test_persistence_and_cv.py -q --basetemp=.pytest-tmp

Build an sdist:

python -m build --sdist

Configure and build the native extension directly with CMake:

python -m pip install pybind11 numpy pandas scikit-learn pytest
cmake -S . -B build -DCTBOOST_ENABLE_CUDA=OFF -Dpybind11_DIR="$(python -m pybind11 --cmakedir)"
cmake --build build --config Release --parallel

Wheel builds are configured through cibuildwheel for:

Windows amd64
Linux x86_64 and aarch64 using the current manylinux baseline
macOS x86_64
CPython 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, and 3.14

GitHub Actions workflows:

.github/workflows/cmake.yml: configures, builds, installs, and tests CPU builds on Ubuntu, Windows, and macOS for pushes and pull requests
.github/workflows/publish.yml: builds release wheels and the sdist, runs wheel smoke tests on built artifacts, publishes CPU wheels to PyPI, and attaches both CPU and Linux/Windows GPU wheels to tagged GitHub releases

The standard PyPI release wheel workflow builds CPU-only wheels by setting:

cmake.define.CTBOOST_ENABLE_CUDA=OFF

The GPU release-wheel matrices enable CUDA separately with:

cmake.define.CTBOOST_ENABLE_CUDA=ON
cmake.define.CTBOOST_REQUIRE_CUDA=ON
cmake.define.CMAKE_CUDA_COMPILER=/usr/local/cuda-12.0/bin/nvcc
cmake.define.CUDAToolkit_ROOT=/usr/local/cuda-12.0
cmake.define.CMAKE_CUDA_ARCHITECTURES=60;70;75;80;86;89
wheel.build-tag=1gpu

Project Layout

ctboost/      Python API layer
include/      public C++ headers
src/core/     core boosting, objectives, trees, statistics
src/bindings/ pybind11 extension bindings
cuda/         optional CUDA backend
tests/        Python test suite

License

Apache 2.0. See LICENSE.
