Pool does not use multiple cores during initialisation #1558

Open
rivershah opened this issue Jan 26, 2021 · 1 comment

rivershah commented Jan 26, 2021

Problem: Pool does not use multiple cores during initialisation
catboost version: 0.24.4
Operating System: macOS Big Sur / CentOS 7

For large data sets, preprocessing and pool creation need to run as fast as possible and use every core available on powerful multi-core machines. That is not happening now. Run the code below and watch htop or any other core utilisation viewer: only one core is utilised, even on very large processors with up to 128 cores.

Minimal replicating example:

import os
import numpy as np
from catboost import Pool


def test_catboost_pool_threading():
    print("number of cpus: %d" % os.cpu_count())
    # 100,000,000 x 30 float64 values: about 24 GB for x_data alone.
    x_data = np.random.normal(loc=0, scale=0.01, size=(100000000, 30))
    y_data = np.random.normal(loc=0, scale=0.01, size=(100000000,))

    # thread_count=-1 requests all available cores, yet only one is busy here.
    data_set = Pool(data=x_data,
                    label=y_data,
                    cat_features=None,
                    text_features=None,
                    embedding_features=None,
                    thread_count=-1)


if __name__ == "__main__":
    test_catboost_pool_threading()
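
As an alternative to watching htop, core utilisation can be estimated from inside the script by comparing process CPU time with wall-clock time. This is only a rough sketch: `cpu_to_wall_ratio` and `busy_sum` are hypothetical helper names, not part of CatBoost, and the workload is a single-threaded stand-in. A ratio near 1 suggests one busy core; a ratio well above 1 suggests several.

```python
import time


def cpu_to_wall_ratio(fn, *args, **kwargs):
    """Run fn and return (result, cpu_seconds / wall_seconds).

    time.process_time() sums CPU time over all threads of the process,
    so the ratio roughly approximates the number of busy cores.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    ratio = cpu_used / wall_used if wall_used > 0 else 0.0
    return result, ratio


def busy_sum(n):
    """Single-threaded stand-in workload (hypothetical, for illustration)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result, ratio = cpu_to_wall_ratio(busy_sum, 200_000)
print(f"cpu/wall ratio: {ratio:.2f}")
```

To apply this to the report above, the `Pool(...)` construction could be wrapped in a function or lambda and passed to `cpu_to_wall_ratio` in place of `busy_sum`.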

andrey-khropov (Member) commented

Since commit 588e1bc (released in CatBoost 1.2.3), multiple cores are used for features with the float32 data type. Support for other data types will follow.

Related: #2542
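
Going by the comment above, one possible workaround is to cast the feature matrix to float32 before constructing the `Pool`. The sketch below assumes CatBoost 1.2.3 or later is installed and that the reduced precision is acceptable for the data. `as_float32` and `build_pool` are hypothetical helper names, and the array size is kept small for illustration. Whether memory layout or other factors also matter is not stated in this thread.

```python
import numpy as np


def as_float32(features):
    """Return the feature matrix as float32 (no copy if it already is)."""
    return np.asarray(features, dtype=np.float32)


def build_pool(features, labels, thread_count=-1):
    """Build a catboost.Pool from float32 features.

    Assumes CatBoost >= 1.2.3. The import is done lazily so the casting
    helper can be used without CatBoost installed.
    """
    from catboost import Pool
    return Pool(data=as_float32(features),
                label=labels,
                thread_count=thread_count)


# Demonstrate the cast only; NumPy's normal() produces float64 by default.
rng = np.random.default_rng(0)
x_small = rng.normal(loc=0, scale=0.01, size=(1000, 30))
x_f32 = as_float32(x_small)
print(x_small.dtype, "->", x_f32.dtype)
```

Casting also halves the memory needed for the feature matrix compared with float64, at the cost of roughly seven significant decimal digits of precision.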
