Problem: Pool does not use multiple cores during initialisation
catboost version: 0.24.4
Operating System: MacOS Big Sur / CentOS 7
For large data sets, all preprocessing and Pool creation must run as fast as possible and use every core available on powerful multi-core machines. That is not happening now. Run the code below and watch htop (or any other per-core utilisation viewer): only one core is utilised, even on machines with as many as 128 cores.
Minimal reproducible example:
```python
import os

import numpy as np
from catboost import Pool


def test_catboost_pool_threading():
    print("number of cpus: %d " % os.cpu_count())
    # 100M x 30 float64 matrix: roughly 24 GB in memory
    x_data = np.random.normal(loc=0, scale=0.01, size=(100000000, 30))
    y_data = np.random.normal(loc=0, scale=0.01, size=(100000000,))
    # thread_count=-1 asks CatBoost to use all available cores
    data_set = Pool(data=x_data,
                    label=y_data,
                    cat_features=None,
                    text_features=None,
                    embedding_features=None,
                    thread_count=-1)


if __name__ == "__main__":
    test_catboost_pool_threading()
```
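To put a number on "only one core is utilised" instead of eyeballing htop, one option is to compare the process CPU time with the wall-clock time around the Pool call: their ratio approximates the average number of busy cores. This is a minimal sketch, assuming only the standard library for the measurement and catboost for the usage helper; `measure_parallelism` and `profile_pool` are hypothetical helper names, not part of the catboost API. It also keeps data generation outside the timed region, because `np.random.normal` runs on a single thread and its single-core activity could otherwise be mistaken for Pool's.

```python
import os
import time
from typing import Any, Callable, Tuple


def measure_parallelism(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, float]:
    """Run fn(*args, **kwargs) and return (result, wall_seconds, avg_busy_cores).

    avg_busy_cores = process CPU time / wall-clock time. A value near 1.0
    means the call was effectively single-threaded; a value near
    os.cpu_count() means every core was busy for the whole call.
    """
    cpu_start = time.process_time()    # CPU time summed over all threads of this process
    wall_start = time.perf_counter()   # monotonic wall-clock timer
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    busy_cores = cpu / wall if wall > 0 else 0.0
    return result, wall, busy_cores


def profile_pool(x_data, y_data, thread_counts=(1, -1)):
    """Hypothetical helper: time only the Pool construction for several thread_count values.

    x_data / y_data should be generated beforehand so that the
    single-threaded NumPy sampling is not part of the measurement.
    """
    from catboost import Pool  # imported lazily so measure_parallelism works without catboost

    cpus = os.cpu_count()
    for threads in thread_counts:
        _, wall, cores = measure_parallelism(Pool, data=x_data, label=y_data, thread_count=threads)
        print(f"thread_count={threads}: {wall:.2f} s wall, ~{cores:.1f} busy cores of {cpus}")
```

If Pool construction were parallelised, the `thread_count=-1` line would be expected to report a busy-core figure well above 1 and a shorter wall time than `thread_count=1`; similar numbers for both would match the behaviour described in this issue.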