numpy.core._exceptions._ArrayMemoryError: during CatBoostRegressor Training #2405
Comments
Hi @nikitxskv, thanks a lot for your reply. I need some help switching from NumPy arrays to the FeaturesData class. Most of the columns in my input dataset are categorical. I have attached sample data for reference. The dataset is approximately 3 million rows × 30 columns.
categorical_features_indices = np.where(np.isin(X_train[X_train.columns].dtypes, ['bool', 'object']))[0]
col = "work_time_sec"
Hi @Karrvp!
Hi @Karrvp! P.S. We will fix this bug in the next release!
Hi @nikitxskv, after implementing the change, I am getting a different warning, followed by kernel termination. Re-running the cell also fails.
/opt/conda/anaconda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py:706: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
Try to do
By the way, here is the fix: 733e63a
Problem: model.fit fails with numpy.core._exceptions._ArrayMemoryError: Unable to allocate 71.6 GiB for an array with shape (98000, 98000) and data type float64
catboost version: 1.2
Operating System: Ubuntu
Partial Code:
import numpy as np
import catboost as cb
from sklearn.model_selection import GridSearchCV

# X_train, y_train and categorical_features_indices are defined earlier (not shown).
grid = {
    "learning_rate": [0.01, 0.03],
    "depth": [4, 6],
    "iterations": [10],
}
cbr = cb.CatBoostRegressor(loss_function="RMSE", eval_metric="RMSE", boosting_type="Plain")  # , task_type="GPU")
gscv = GridSearchCV(estimator=cbr, param_grid=grid)  # , cv=3)  # , n_jobs=-1)
gscv.fit(X_train, y_train, cat_features=categorical_features_indices)
Error:
0: learn: 1.5782847 total: 207ms remaining: 1.86s
1: learn: 1.5758104 total: 378ms remaining: 1.51s
2: learn: 1.5733945 total: 550ms remaining: 1.28s
3: learn: 1.5710246 total: 678ms remaining: 1.02s
4: learn: 1.5686950 total: 805ms remaining: 805ms
5: learn: 1.5663753 total: 932ms remaining: 622ms
6: learn: 1.5641121 total: 1.06s remaining: 453ms
7: learn: 1.5618219 total: 1.18s remaining: 296ms
8: learn: 1.5596248 total: 1.31s remaining: 146ms
9: learn: 1.5576024 total: 1.44s remaining: 0us
/opt/conda/anaconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:774: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
  File "/opt/conda/anaconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 761, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/opt/conda/anaconda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 418, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/opt/conda/anaconda/lib/python3.7/site-packages/catboost/core.py", line 5856, in score
    residual_sum_of_squares = np.sum((y - predictions) ** 2)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 71.6 GiB for an array with shape (98000, 98000) and data type float64
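A note on the square shape in the message: (98000, 98000) is what NumPy broadcasting produces when a 1-D array of shape (n,) is subtracted from a column vector of shape (n, 1). The sketch below reproduces this on a tiny array. It assumes, which the thread does not confirm, that the target reached score as a single-column 2-D array while the predictions were 1-D. The variable names are illustrative only, and flattening the target is just one possible workaround until the fix is released.

```python
import numpy as np

n = 5  # small stand-in for the 98000 rows in the failing fold
y = np.arange(n, dtype=np.float64).reshape(n, 1)  # assumed target shape (n, 1)
predictions = np.arange(n, dtype=np.float64)      # shape (n,)

# (n, 1) - (n,) broadcasts to (n, n) instead of an element-wise difference.
diff_broadcast = y - predictions
print(diff_broadcast.shape)

# Flattening the target restores the intended element-wise subtraction.
diff_flat = np.ravel(y) - predictions
print(diff_flat.shape)
rss = np.sum(diff_flat ** 2)

# Size of the intermediate at full scale: 98000 * 98000 float64 values.
size_gib = 98000 * 98000 * np.dtype(np.float64).itemsize / 2**30
print(f"{size_gib:.1f} GiB")
```

Under the same assumption, the equivalent change in the reported code would be to pass np.ravel(y_train) instead of y_train to gscv.fit.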