BLAS : Program is Terminated. Because you tried to allocate too many memory regions. #1020
Comments
Current theory, based on OpenMathLib/OpenBLAS#1882: AutoGluon calls .fit many times in quick succession when sampling KNN with 96 threads, and each fit call creates 96 threads for parallel fitting/inference. Because this repeats so rapidly, Python cannot clean up the threads as fast as they are created, which causes BLAS to error.
I can confirm this is still happening. Any best practices you recommend to alleviate this bug?
@rakshithvasudev I suspect it occurs when you have many CPUs. Could you tell me how many CPUs your machine has? One option is to specify your
@Innixma Thanks for your response. I have I'm very new to autogluon. I'm using the
Including the full log if that helps:
Thanks for the info! To understand how to specify custom hyperparameters, refer to the The default hyperparameters are:
You would want to edit
Another option is to disable KNN entirely:
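As a rough sketch of that second option: the snippet below drops the KNN entry from a hyperparameters dict before it is handed to fit. The dict contents and the helper name `without_model` are hypothetical and only for illustration; they are not AutoGluon's actual defaults. It also assumes that KNN models are configured under the `"KNN"` key.

```python
import copy

# Hypothetical hyperparameters dict, for illustration only.
# These are NOT the real AutoGluon defaults.
example_hyperparameters = {
    "GBM": {},
    "RF": {},
    "KNN": [{"weights": "uniform"}, {"weights": "distance"}],
}


def without_model(hyperparameters, model_key="KNN"):
    """Return a deep copy of the dict with one model family removed."""
    trimmed = copy.deepcopy(hyperparameters)
    # pop with a default so a missing key is not an error
    trimmed.pop(model_key, None)
    return trimmed


no_knn = without_model(example_hyperparameters)
print(sorted(no_knn))  # remaining model keys; the original dict is unchanged
```

The resulting dict would then be passed along the lines of `predictor.fit(train_data, hyperparameters=no_knn)`. Treat that call as an assumption and check it against the AutoGluon version you have installed.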
Thanks @Innixma
This worked for me.
https://www.discoverbits.in/2509/program-terminated-because-tried-allocate-memory-regions
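A workaround often suggested for this OpenBLAS message is to cap the number of BLAS threads through environment variables before any numerical library is imported. The sketch below assumes the process links against OpenBLAS and that a cap of 1 is acceptable for your workload. The helper name `cap_blas_threads` is made up for this example, and whether it resolves the KNN case in this issue has not been verified here.

```python
import os


def cap_blas_threads(n_threads=1):
    """Set common BLAS/OpenMP thread-count variables.

    Call this before importing numpy, scikit-learn, or autogluon:
    the libraries typically read these variables once, at load time.
    """
    value = str(n_threads)
    for name in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[name] = value
    return value


cap_blas_threads(1)
# Import numpy / sklearn / autogluon only after this point.
```

Setting the variables in the shell before launching Python has the same effect and avoids import-order problems.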
When training K-nearest-neighbors (KNN) models, a rare error can occur that crashes the entire process:
It has so far only occurred on machines with >300 GB of memory (2 confirmed instances). In both cases, the system was not low on memory and had plenty to complete the task, yet the error still occurred.