Pool does not use multiple cores during initialisation #1558

rivershah · 2021-01-26T21:45:52Z

Problem: Pool does not use multiple cores during initialisation
catboost version: 0.24.4
Operating System: macOS Big Sur / CentOS 7

For large data sets, it is imperative that all preprocessing and pool creation run as fast as possible and use every core available on powerful multi-core machines. That is not happening now. Please run the code below and watch htop or any other core utilisation viewer: only one core is utilised, even on very large processors with up to 128 cores.

Minimal replicating example:

import os
import numpy as np
from catboost import Pool


def test_catboost_pool_threading():
    print("number of cpus: %d " % os.cpu_count())
    # 100,000,000 x 30 float64 values: roughly 24 GB for x_data alone.
    x_data = np.random.normal(loc=0, scale=0.01, size=(100000000, 30))
    y_data = np.random.normal(loc=0, scale=0.01, size=(100000000,))

    data_set = Pool(data=x_data,
                    label=y_data,
                    cat_features=None,
                    text_features=None,
                    embedding_features=None,
                    thread_count=-1)

if __name__ == "__main__":
    test_catboost_pool_threading()
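
As a complement to watching htop, core utilisation can be estimated from inside the script by comparing CPU time with wall-clock time. The sketch below is an illustration only: the helper name `cpu_to_wall_ratio` is hypothetical and not part of catboost, and the workload in the usage section is a stand-in for the `Pool(...)` call above.

```python
import time


def cpu_to_wall_ratio(fn, *args, **kwargs):
    """Run fn and return (result, cpu_seconds / wall_seconds).

    time.process_time() sums CPU time over all threads of this process,
    so a ratio near 1.0 suggests single-core work, while a ratio near N
    suggests roughly N busy cores.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    # Guard against a zero wall-clock interval on very short calls.
    return result, cpu_used / max(wall_used, 1e-9)


if __name__ == "__main__":
    # Stand-in workload; swap in something like
    #   lambda: Pool(data=x_data, label=y_data, thread_count=-1)
    # to measure pool creation instead.
    total, ratio = cpu_to_wall_ratio(sum, range(2_000_000))
    print("cpu/wall ratio: %.2f" % ratio)
```

A ratio that stays close to 1 while the pool is being built would be consistent with the single-core behaviour reported here.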


andrey-khropov · 2024-03-10T18:17:54Z

Since 588e1bc (released in CatBoost 1.2.3) we now use multiple cores for features with float32 data type. Other data types will follow.

Related: #2542
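
Given the comment above, one possible workaround on CatBoost 1.2.3 or later is to convert the feature matrix to float32 before building the pool, since `np.random.normal` returns float64. This is a minimal sketch, not a confirmed recipe: the helper name `as_float32_features` is hypothetical, the array sizes are shrunk so it runs quickly, and float32 carries less precision than float64, which may or may not matter for a given data set.

```python
import numpy as np


def as_float32_features(x):
    """Return x as a float32 NumPy array (copies only if the dtype differs)."""
    return np.asarray(x, dtype=np.float32)


if __name__ == "__main__":
    # Small shapes for illustration; the issue uses (100000000, 30).
    x_data = np.random.normal(loc=0, scale=0.01, size=(1000, 30))  # float64
    y_data = np.random.normal(loc=0, scale=0.01, size=(1000,))

    x32 = as_float32_features(x_data)
    print(x_data.dtype, "->", x32.dtype)

    try:
        from catboost import Pool
    except ImportError:
        Pool = None  # catboost not installed; the conversion above still runs

    if Pool is not None:
        # Same constructor arguments as in the original report.
        data_set = Pool(data=x32, label=y_data, thread_count=-1)
```

Converting a very large float64 array allocates a second array at half the size, so generating or loading the data as float32 in the first place avoids that extra copy.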

kizill added performance python labels Jan 29, 2023

andrey-khropov added the in progress label Mar 10, 2024
