# RAPIDS cuML
## Performance, Boundaries, and Correctness Benchmarks

**Description:** This notebook provides a simple, unified way to benchmark single-GPU cuML algorithms against their scikit-learn counterparts using the `cuml.benchmark` package in RAPIDS cuML. It enables quick measurements of performance, validation of correctness, and investigation of upper bounds.

Each benchmark appends its result records to a shared list. At the end of the notebook, that list is converted to a pandas `DataFrame`, charted, and written to a CSV file.

See the [table of contents](#table_of_contents) for the algorithms that can be benchmarked with this notebook.

In [None]:
import cuml
import pandas as pd

from cuml.benchmark.runners import SpeedupComparisonRunner
from cuml.benchmark.algorithms import algorithm_by_name

import warnings
warnings.filterwarnings('ignore', 'Expected column ')

print(cuml.__version__)

25.02.01


In [None]:
N_REPS = 1  # Number of times each test is repeated

DATA_NEIGHBORHOODS = "blobs"
DATA_CLASSIFICATION = "classification"
DATA_REGRESSION = "regression"

INPUT_TYPE = "numpy"

benchmark_results = []

In [None]:
SMALL_ROW_SIZES = [2**x for x in range(14, 20)]
LARGE_ROW_SIZES = [2**x for x in range(18, 24, 2)]

SKINNY_FEATURES = [64, 512]
WIDE_FEATURES = [1000, 10000]

VERBOSE = True
RUN_CPU = True
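
The size constants above expand to a small grid of dataset shapes. The following sketch uses only the standard library to show the row counts each list comprehension produces. It assumes the runner benchmarks every combination of `bench_rows` and `bench_dims`, which the order of the output lines in this notebook suggests; `shape_grid` is a hypothetical helper and not part of `cuml`.

```python
from itertools import product

# Same expressions as in the configuration cell above.
SMALL_ROW_SIZES = [2**x for x in range(14, 20)]
LARGE_ROW_SIZES = [2**x for x in range(18, 24, 2)]
SKINNY_FEATURES = [64, 512]

print(SMALL_ROW_SIZES)  # [16384, 32768, 65536, 131072, 262144, 524288]
print(LARGE_ROW_SIZES)  # [262144, 1048576, 4194304]

# Hypothetical helper (not part of cuml): enumerate the dataset shapes,
# assuming the runner visits every (rows, features) combination.
def shape_grid(rows, dims):
    return list(product(rows, dims))

grid = shape_grid(SMALL_ROW_SIZES, SKINNY_FEATURES)
print(len(grid))  # 12
print(grid[:2])   # [(16384, 64), (16384, 512)]
```

With the small row sizes and two feature counts, each benchmark cell therefore covers up to twelve dataset shapes, which is worth keeping in mind when a CPU run is slow.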

In [None]:
def enrich_result(algorithm, runner, result):
    """Tag one result record with the algorithm, dataset, and input type."""
    result["algo"] = algorithm
    result["dataset_name"] = runner.dataset_name
    result["input_type"] = runner.input_type
    return result

def execute_benchmark(algorithm, runner, verbose=VERBOSE, run_cpu=RUN_CPU, **kwargs):
    """Run one algorithm through the runner and append its results to benchmark_results."""
    results = runner.run(algorithm_by_name(algorithm), verbose=verbose, run_cpu=run_cpu, **kwargs)
    results = [enrich_result(algorithm, runner, result) for result in results]
    benchmark_results.extend(results)
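
The tagging step can be tried without a GPU. The sketch below repeats `enrich_result` and feeds it a stand-in runner. `fake_runner` and `raw_results` are hypothetical: the record holds only the three fields this notebook reads later (`n_samples`, `n_features`, `speedup`), with the speedup copied from the K-means output, and the real records may carry more fields.

```python
from types import SimpleNamespace

benchmark_results = []

def enrich_result(algorithm, runner, result):
    result["algo"] = algorithm
    result["dataset_name"] = runner.dataset_name
    result["input_type"] = runner.input_type
    return result

# Hypothetical stand-in for SpeedupComparisonRunner: only the two attributes
# read by enrich_result are provided, and no benchmark is actually run.
fake_runner = SimpleNamespace(dataset_name="blobs", input_type="numpy")

# One record with an assumed minimal shape (see the note above).
raw_results = [
    {"n_samples": 16384, "n_features": 64, "speedup": 2.96326815040446},
]

benchmark_results.extend(
    enrich_result("KMeans", fake_runner, r) for r in raw_results
)

print(benchmark_results[0]["algo"])  # KMeans
print(sorted(benchmark_results[0]))
# ['algo', 'dataset_name', 'input_type', 'n_features', 'n_samples', 'speedup']
```

Note that `enrich_result` modifies each record in place, so the dictionaries in `raw_results` and `benchmark_results` are the same objects.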

## Table of Contents<a id="table_of_contents"/>

### Benchmarks
1. [Neighbors](#neighbors)<br>
    1.1 [Nearest Neighbors - Brute Force](#nn_bruteforce)<br>
    1.2 [KNeighborsClassifier](#kneighborsclassifier)<br>
    1.3 [KNeighborsRegressor](#kneighborsregressor)<br>
2. [Clustering](#clustering)<br>
    2.1 [DBSCAN - Brute Force](#dbscan_bruteforce)<br>
    2.2 [K-Means](#kmeans)<br>
3. [Manifold Learning](#manifold_learning)<br>
    3.1 [UMAP - Unsupervised](#umap_unsupervised)<br>
    3.2 [UMAP - Supervised](#umap_supervised)<br>
    3.3 [T-SNE](#tsne)<br>
4. [Linear Models](#linear_models)<br>
    4.1 [Linear Regression](#linear_regression)<br>
    4.2 [Logistic Regression](#logistic_regression)<br>
    4.3 [Ridge Regression](#ridge_regression)<br>
    4.4 [Lasso Regression](#lasso_regression)<br>
    4.5 [ElasticNet Regression](#elasticnet_regression)<br>
    4.6 [Mini-batch SGD Classifier](#minibatch_sgd_classifier)<br>
5. [Decomposition](#decomposition)<br>
    5.1 [PCA](#pca)<br>
    5.2 [Truncated SVD](#truncated_svd)<br>
6. [Ensemble](#ensemble)<br>
    6.1 [Random Forest Classifier](#random_forest_classifier)<br>
    6.2 [Random Forest Regressor](#random_forest_regressor)<br>
    6.3 [FIL](#fil)<br>
    6.4 [Sparse FIL](#sparse_fil)<br>
7. [Random Projection](#random_projection)<br>
    7.1 [Gaussian Random Projection](#gaussian_random_projection)<br>
    7.2 [Sparse Random Projection](#sparse_random_projection)<br>
8. [SVM](#svm)<br>
    8.1 [SVC - Linear Kernel](#svc_linear_kernel)<br>
    8.2 [SVC - RBF Kernel](#svc_rbf_kernel)<br>
    8.3 [SVR - Linear Kernel](#svr_linear_kernel)<br>
    8.4 [SVR - RBF Kernel](#svr_rbf_kernel)<br>
    
### Chart & Store Results
9. [Convert to Pandas DataFrame](#convert_to_pandas)<br>
10. [Chart Results](#chart_results)<br>
11. [Output to CSV](#output_csv)<br>

## Neighbors<a id="neighbors"/>


### Nearest Neighbors - Brute Force<a id="nn_bruteforce"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS,
)

execute_benchmark("NearestNeighbors", runner)

NearestNeighbors (n_samples=16384, n_features=64) [cpu=2.5525217056274414, gpu=0.14844679832458496, speedup=17.194858592007144]
NearestNeighbors (n_samples=16384, n_features=512) [cpu=7.930440902709961, gpu=0.3864150047302246, speedup=20.523118423536875]
NearestNeighbors (n_samples=32768, n_features=64) [cpu=8.843840599060059, gpu=0.3380286693572998, speedup=26.162989712899254]
NearestNeighbors (n_samples=32768, n_features=512) [cpu=30.873115301132202, gpu=1.2724535465240479, speedup=24.26266592243628]
NearestNeighbors (n_samples=65536, n_features=64) [cpu=31.496611833572388, gpu=0.8295607566833496, speedup=37.96781800467318]


KeyboardInterrupt: 
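
Each output line reports a CPU time, a GPU time, and a speedup. Judging from the printed numbers, the speedup appears to be the CPU time divided by the GPU time; this is an inference from the output, not a statement about the `cuml.benchmark` internals. The sketch below parses the first line above with a regular expression and checks that ratio.

```python
import math
import re

# First output line from the NearestNeighbors run above.
line = ("NearestNeighbors (n_samples=16384, n_features=64) "
        "[cpu=2.5525217056274414, gpu=0.14844679832458496, "
        "speedup=17.194858592007144]")

# Pull every key=value pair out of the line.
fields = {k: float(v) for k, v in re.findall(r"(\w+)=([0-9.eE+-]+)", line)}

# Assumed definition: speedup = CPU time / GPU time.
ratio = fields["cpu"] / fields["gpu"]
matches = math.isclose(ratio, fields["speedup"], rel_tol=1e-6)

print(round(ratio, 3))  # 17.195
print(matches)          # True
```

A speedup below 1.0 under this reading means the GPU run was slower than the CPU run for that dataset shape.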

### KNeighborsClassifier<a id="kneighborsclassifier"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("KNeighborsClassifier", runner)

### KNeighborsRegressor<a id="kneighborsregressor"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("KNeighborsRegressor", runner)

## Clustering<a id="clustering"/>

### DBSCAN - Brute Force<a id="dbscan_bruteforce"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("DBSCAN", runner)

DBSCAN (n_samples=16384, n_features=64) [cpu=1.1189565658569336, gpu=0.04033160209655762, speedup=27.743915631668862]
DBSCAN (n_samples=16384, n_features=512) [cpu=6.683979749679565, gpu=0.20073485374450684, speedup=33.297554585049106]
DBSCAN (n_samples=32768, n_features=64) [cpu=5.792907476425171, gpu=0.14844655990600586, speedup=39.023521192361436]
DBSCAN (n_samples=32768, n_features=512) [cpu=27.79302954673767, gpu=0.606020450592041, speedup=45.861537378129334]


KeyboardInterrupt: 

### K-means Clustering<a id="kmeans"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("KMeans", runner)

KMeans (n_samples=16384, n_features=64) [cpu=0.0983431339263916, gpu=0.0331873893737793, speedup=2.96326815040446]
KMeans (n_samples=16384, n_features=512) [cpu=0.5573301315307617, gpu=0.08445119857788086, speedup=6.599434240318]
KMeans (n_samples=32768, n_features=64) [cpu=0.14568138122558594, gpu=0.07829666137695312, speedup=1.8606333739342265]
KMeans (n_samples=32768, n_features=512) [cpu=1.2206439971923828, gpu=0.23358154296875, speedup=5.225772471910112]
KMeans (n_samples=65536, n_features=64) [cpu=0.3057842254638672, gpu=0.15462017059326172, speedup=1.9776477046339072]
KMeans (n_samples=65536, n_features=512) [cpu=2.7848353385925293, gpu=0.5538210868835449, speedup=5.028402501362525]
KMeans (n_samples=131072, n_features=64) [cpu=0.9543838500976562, gpu=0.2854435443878174, speedup=3.3435117691818745]
KMeans (n_samples=131072, n_features=512) [cpu=4.326925992965698, gpu=1.240894079208374, speedup=3.486942250321681]
KMeans (n_samples=262144, n_features=64) [cpu=1.0533945560455322, g

## Manifold Learning<a id="manifold_learning"/>

### UMAP - Unsupervised<a id="umap_unsupervised"/>
The CPU benchmark requires the `umap-learn` package.

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=WIDE_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("UMAP-Unsupervised", runner)

[2025-04-14 14:28:05.225] [CUML] [info] Building knn graph using brute force
UMAP-Unsupervised (n_samples=16384, n_features=1000) [cpu=36.85332441329956, gpu=1.6239778995513916, speedup=22.693242576441428]
[2025-04-14 14:28:44.628] [CUML] [info] Building knn graph using brute force
UMAP-Unsupervised (n_samples=16384, n_features=10000) [cpu=32.27751159667969, gpu=6.11249852180481, speedup=5.280575771354012]
[2025-04-14 14:29:22.571] [CUML] [info] Building knn graph using brute force
UMAP-Unsupervised (n_samples=32768, n_features=1000) [cpu=30.270137786865234, gpu=2.4500648975372314, speedup=12.354831015820162]
[2025-04-14 14:29:57.326] [CUML] [info] Building knn graph using brute force

### UMAP - Supervised<a id="umap_supervised"/>
The CPU benchmark requires the `umap-learn` package.

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=WIDE_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("UMAP-Supervised", runner)

### T-SNE<a id="tsne"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

# Due to its extremely high runtime, the CPU benchmark
# is disabled. Use run_cpu=True to re-enable.

execute_benchmark("TSNE", runner, run_cpu=False)

## Linear Models<a id="linear_models"/>

### Linear Regression<a id="linear_regression"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("LinearRegression", runner)

LinearRegression (n_samples=16384, n_features=64) [cpu=0.04991602897644043, gpu=0.20392751693725586, speedup=0.2447733867705481]
LinearRegression (n_samples=16384, n_features=512) [cpu=0.4313216209411621, gpu=0.0546259880065918, speedup=7.895905166769961]
LinearRegression (n_samples=32768, n_features=64) [cpu=0.05830574035644531, gpu=0.008665084838867188, speedup=6.728813559322034]
LinearRegression (n_samples=32768, n_features=512) [cpu=0.9007878303527832, gpu=0.04964470863342285, speedup=18.14468963861208]
LinearRegression (n_samples=65536, n_features=64) [cpu=0.11041426658630371, gpu=0.011630773544311523, speedup=9.493286595740319]
LinearRegression (n_samples=65536, n_features=512) [cpu=2.210209846496582, gpu=0.11963009834289551, speedup=18.475365958167668]
LinearRegression (n_samples=131072, n_features=64) [cpu=0.3291754722595215, gpu=0.021834850311279297, speedup=15.075691729815903]
LinearRegression (n_samples=131072, n_features=512) [cpu=3.804492473602295, gpu=0.11341142654418945,

### Logistic Regression<a id="logistic_regression"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("LogisticRegression", runner)

LogisticRegression (n_samples=16384, n_features=64) [cpu=0.049225807189941406, gpu=2.1575589179992676, speedup=0.02281551005596136]
LogisticRegression (n_samples=16384, n_features=512) [cpu=0.26046156883239746, gpu=0.04640030860900879, speedup=5.613358545245276]
LogisticRegression (n_samples=32768, n_features=64) [cpu=0.0496373176574707, gpu=0.018589258193969727, speedup=2.670215085482692]
LogisticRegression (n_samples=32768, n_features=512) [cpu=0.527238130569458, gpu=0.04798150062561035, speedup=10.988362675094038]
LogisticRegression (n_samples=65536, n_features=64) [cpu=0.10184907913208008, gpu=0.02131366729736328, speedup=4.778580697122914]
LogisticRegression (n_samples=65536, n_features=512) [cpu=1.3272318840026855, gpu=0.07958745956420898, speedup=16.676394638930663]
LogisticRegression (n_samples=131072, n_features=64) [cpu=0.37027716636657715, gpu=0.03832602500915527, speedup=9.661246275295333]
LogisticRegression (n_samples=131072, n_features=512) [cpu=1.925128698348999, gpu=0.1

### Ridge Regression<a id="ridge_regression"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("Ridge", runner)

### Lasso Regression<a id="lasso_regression"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("Lasso", runner)

### ElasticNet Regression<a id="elasticnet_regression"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("ElasticNet", runner)

### Mini-batch SGD Classifier<a id="minibatch_sgd_classifier"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("MBSGDClassifier", runner, run_cpu=False)

MBSGDClassifier (n_samples=16384, n_features=64) [cpu=0.0, gpu=3.4012086391448975, speedup=0.0]
MBSGDClassifier (n_samples=16384, n_features=512) [cpu=0.0, gpu=3.6186819076538086, speedup=0.0]
MBSGDClassifier (n_samples=32768, n_features=64) [cpu=0.0, gpu=7.3478546142578125, speedup=0.0]
MBSGDClassifier (n_samples=32768, n_features=512) [cpu=0.0, gpu=7.901196241378784, speedup=0.0]
MBSGDClassifier (n_samples=65536, n_features=64) [cpu=0.0, gpu=14.105752229690552, speedup=0.0]
MBSGDClassifier (n_samples=65536, n_features=512) [cpu=0.0, gpu=15.406798124313354, speedup=0.0]


KeyboardInterrupt: 

## Decomposition<a id="decomposition"/>

### PCA<a id="pca"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=WIDE_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("PCA", runner)

PCA (n_samples=16384, n_features=1000) [cpu=0.48946523666381836, gpu=0.21363091468811035, speedup=2.2911723117340546]
PCA (n_samples=16384, n_features=10000) [cpu=4.371657371520996, gpu=6.5629894733428955, speedup=0.6661076311759293]
PCA (n_samples=32768, n_features=1000) [cpu=0.5718538761138916, gpu=0.1176750659942627, speedup=4.859601065715762]


KeyboardInterrupt: 

### Truncated SVD<a id="truncated_svd"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=WIDE_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("TSVD", runner)

## Ensemble<a id="ensemble"/>

### Random Forest Classifier<a id="random_forest_classifier"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("RandomForestClassifier", runner)

RandomForestClassifier (n_samples=16384, n_features=64) [cpu=12.99736499786377, gpu=1.0100727081298828, speedup=12.867751888800138]
RandomForestClassifier (n_samples=16384, n_features=512) [cpu=47.425092458724976, gpu=0.5071637630462646, speedup=93.51041204889623]


KeyboardInterrupt: 

### Random Forest Regressor<a id="random_forest_regressor"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("RandomForestRegressor", runner)

### FIL<a id="fil"/>
The CPU benchmark requires the XGBoost library.

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("FIL", runner)


    E.g. tree_method = "hist", device = "cuda"

Parameters: { "fil_algo", "num_rounds", "output_class", "precision", "silent", "storage_type", "threshold" } are not used.


    E.g. tree_method = "hist", device = "cuda"



Failed to run with 16384 samples, 64 features: [13:40:14] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_64_16384.model": No such file or directory
Stack trace:
  [bt] (0) /usr/local/lib/python3.11/dist-packages/xgboost/lib/libxgboost.so(+0x25c1ac) [0x79988285c1ac]
  [bt] (1) /usr/local/lib/python3.11/dist-packages/xgboost/lib/libxgboost.so(+0xf3770d) [0x79988353770d]
  [bt] (2) /usr/local/lib/python3.11/dist-packages/xgboost/lib/libxgboost.so(+0xf23b25) [0x799883523b25]
  [bt] (3) /usr/local/lib/python3.11/dist-packages/xgboost/lib/libxgboost.so(XGBoosterLoadModel+0x1f3) [0x79988276b8a3]
  [bt] (4) /lib/x86_64-linux-gnu/libffi.so.8(+0x7e2e) [0x799a4a89ae2e]
  [bt] (5) /lib/x86_64-linux-gnu/libffi.so.8(+0x4493) [0x799a4a897493]
  [bt] (6) /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0xa4d8) [0x799a4a8aa4d8]
  [bt] (7) /usr/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-li


Failed to run with 16384 samples, 512 features: [13:40:15] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_512_16384.model": No such file or directory
Failed to run with 32768 samples, 64 features: [13:40:16] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_64_32768.model": No such file or directory
Failed to run with 32768 samples, 512 features: [13:40:19] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_512_32768.model": No such file or directory
Failed to run with 65536 samples, 64 features: [13:40:20] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_64_65536.model": No such file or directory
Failed to run with 65536 samples, 512 features: [13:40:25] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_512_65536.model": No such file or directory
Failed to run with 131072 samples, 64 features: [13:40:27] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_64_131072.model": No such file or directory
Failed to run with 131072 samples, 512 features: [13:40:36] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_512_131072.model": No such file or directory
Failed to run with 262144 samples, 64 features: [13:40:39] /workspace/dmlc-core/src/io/local_filesys.cc:210: Check failed: allow_null:  LocalFileSystem::Open "/tmp/tmpo7upz1b6/xgb_10_100_64_262144.model": No such file or directory

KeyboardInterrupt: 

### Sparse FIL<a id="sparse_fil"/>
Requires the Treelite library.

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("Sparse-FIL-SKL", runner)

## Random Projection<a id="random_projection"/>

### Gaussian Random Projection<a id="gaussian_random_projection"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=WIDE_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("GaussianRandomProjection", runner)

### Sparse Random Projection<a id="sparse_random_projection"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=WIDE_FEATURES,
    dataset_name=DATA_NEIGHBORHOODS,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("SparseRandomProjection", runner)

## SVM<a id="svm"/>

### SVC - Linear Kernel<a id="svc_linear_kernel"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

# Due to its extremely high runtime, the CPU benchmark
# is disabled. Use run_cpu=True to re-enable.

execute_benchmark("SVC-Linear", runner, run_cpu=False)

### SVC - RBF Kernel<a id="svc_rbf_kernel"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_CLASSIFICATION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

# Due to its extremely high runtime, the CPU benchmark
# is disabled. Use run_cpu=True to re-enable.

execute_benchmark("SVC-RBF", runner, run_cpu=False)

### SVR - Linear Kernel<a id="svr_linear_kernel"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

# Due to its extremely high runtime, the CPU benchmark
# is disabled. Use run_cpu=True to re-enable.

execute_benchmark("SVR-Linear", runner, run_cpu=False)

### SVR - RBF Kernel<a id="svr_rbf_kernel"/>

In [None]:
runner = cuml.benchmark.runners.SpeedupComparisonRunner(
    bench_rows=SMALL_ROW_SIZES,
    bench_dims=SKINNY_FEATURES,
    dataset_name=DATA_REGRESSION,
    input_type=INPUT_TYPE,
    n_reps=N_REPS
)

execute_benchmark("SVR-RBF", runner)

## Charting & Storing Results<a id="charting_and_storing_results"/>

### Convert Results to Pandas DataFrame<a id="convert_to_pandas"/>

In [None]:
%matplotlib inline

In [None]:
df = pd.DataFrame(benchmark_results)

### Chart Results<a id="chart_results"/>

In [None]:
def chart_single_algo_speedup(df, algorithm):
    """Bar chart of speedup per n_samples, one bar per n_features, for one algorithm."""
    df = df.loc[df.algo == algorithm]
    df = df.pivot(index="n_samples", columns="n_features", values="speedup")
    df.plot.bar(title=f"{algorithm} Speedup")
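
The pivot in `chart_single_algo_speedup` turns the long list of records into a table with one row per `n_samples` and one column per `n_features`. The sketch below reproduces that reshaping in plain Python, as an illustration rather than the pandas implementation, using speedups copied from the LinearRegression output earlier in this notebook.

```python
from collections import defaultdict

# Speedup values copied from the LinearRegression output in this notebook.
records = [
    {"n_samples": 16384, "n_features": 64, "speedup": 0.2447733867705481},
    {"n_samples": 16384, "n_features": 512, "speedup": 7.895905166769961},
    {"n_samples": 32768, "n_features": 64, "speedup": 6.728813559322034},
    {"n_samples": 32768, "n_features": 512, "speedup": 18.14468963861208},
]

# Plain-Python counterpart of
# df.pivot(index="n_samples", columns="n_features", values="speedup"):
# one row per n_samples, one column per n_features.
table = defaultdict(dict)
for r in records:
    table[r["n_samples"]][r["n_features"]] = r["speedup"]

for n_samples in sorted(table):
    row = table[n_samples]
    print(n_samples, {f: round(s, 2) for f, s in sorted(row.items())})
# 16384 {64: 0.24, 512: 7.9}
# 32768 {64: 6.73, 512: 18.14}
```

Each row of this table becomes one group of bars in the chart, with one bar per feature count.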

In [None]:
def chart_all_algo_speedup(df):
    """Bar chart of the mean speedup for every (algo, n_samples) pair."""
    df = df[["algo", "n_samples", "speedup"]].groupby(["algo", "n_samples"]).mean()
    df.plot.bar()
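
`chart_all_algo_speedup` averages the speedup over all feature counts for each `(algo, n_samples)` pair. Benchmarks run with `run_cpu=False` report `speedup=0.0` (see the MBSGDClassifier output), so they appear as zero-height bars. The sketch below mirrors the grouping in plain Python with values copied from this notebook's output. The filtering step is a suggestion, not part of the original notebook, and `mean_speedup` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Speedups copied from outputs earlier in this notebook. The MBSGDClassifier
# rows were produced with run_cpu=False, so their speedup is reported as 0.0.
records = [
    {"algo": "LinearRegression", "n_samples": 16384, "speedup": 0.2447733867705481},
    {"algo": "LinearRegression", "n_samples": 16384, "speedup": 7.895905166769961},
    {"algo": "MBSGDClassifier", "n_samples": 16384, "speedup": 0.0},
    {"algo": "MBSGDClassifier", "n_samples": 16384, "speedup": 0.0},
]

def mean_speedup(rows):
    """Average speedup per (algo, n_samples), like the groupby(...).mean() above."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["algo"], r["n_samples"])].append(r["speedup"])
    return {key: mean(vals) for key, vals in groups.items()}

all_rows = mean_speedup(records)

# Optional filter (a suggestion, not part of the notebook): drop records
# without a CPU measurement before charting.
measured = mean_speedup([r for r in records if r["speedup"] > 0])

print(all_rows[("MBSGDClassifier", 16384)])  # 0.0
print(sorted(measured))                      # [('LinearRegression', 16384)]
```

Whether to filter depends on the goal: keeping the zero rows shows which algorithms were benchmarked on the GPU only, while dropping them keeps the chart limited to measured comparisons.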

In [None]:
chart_single_algo_speedup(df, "LinearRegression")

In [None]:
chart_all_algo_speedup(df)

### Output Results to CSV<a id="output_csv"/>

In [None]:
df.to_csv("benchmark_results.csv")