Segfault on optimization process #676
While I'm looking at it... This is a pretty mighty stack. Could it have something to do with the maximum recursion depth like in https://stackoverflow.com/q/10035541? |
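A minimal sketch of the recursion-depth idea, using only the standard library: with the default limit, overly deep pure-Python recursion raises `RecursionError` instead of crashing the process. The helper names `depth` and `exceeds_limit` are made up for illustration. A segfault from recursion usually needs the limit to be raised far enough (or the recursion to happen in C code) that the C stack overflows first, so this only shows how to check the limit, not that it is the cause here.

```python
import sys


def depth(n):
    """Recurse n levels deep and return n (hypothetical helper)."""
    return 0 if n == 0 else 1 + depth(n - 1)


def exceeds_limit(n):
    """Return True if recursing n levels hits the interpreter's recursion limit."""
    try:
        depth(n)
        return False
    except RecursionError:
        return True


limit = sys.getrecursionlimit()
print("recursion limit:", limit)
print("10 levels hits the limit:", exceeds_limit(10))
print("limit + 100 levels hits the limit:", exceeds_limit(limit + 100))
```

If the limit has been raised with `sys.setrecursionlimit`, the protection above no longer applies in the same way, which is the scenario the linked question describes.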
I am not sure why this error showed up. What are the RAM size and the number of CPUs? I just limited the chunk size when n_jobs=-1 or a very large number in #677 (I will merge it into the dev branch soon). Maybe the dev branch is more stable for n_jobs=-1. |
RAM size is 16GB and nCPUs=16 (hyperthreading). I will check the dev branch out and see if it fixes the issue. |
By the way, I think this line in the output above is significant:
|
16GB RAM may not be enough to handle n_jobs=-1. Could you please also try n_jobs=4 instead? |
Of course. In the stable branch? |
Yep |
Nope:
Also, I noticed that it used all available cores, even though n_jobs=4 was specified. |
After running for 1.5 hours, it uses all 16 cores for a short period even though I specified n_jobs=1. Could it be that TPOT spawns several jobs when n_jobs>1 and some estimators used by TPOT spawn several jobs on their own (even if n_jobs=1)? I ran into trouble with this scenario some time ago too. Also, I made some memory measurements during a segfault:
Either the peak was shorter than 0.5 seconds or memory usage is not a problem. I also tried a non-conda Python version, since I read about segfaults in conda's Python. It seems that Python is not the problem here; it crashes too. Another observation I made is that other programs become unstable too when TPOT is run with n_jobs>1: Firefox (sometimes only tabs crash, sometimes the whole thing), conda (segfaults when creating environments), ... |
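One way to rule out a memory peak that is shorter than the sampling interval is to ask the operating system for the peak instead of polling. The sketch below is an assumption about how this could be done, not the measurement used above: it runs a command as a child process and reads `ru_maxrss` from `resource.getrusage` (Unix only). The helper name `peak_child_rss` is hypothetical.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd):
    """Run cmd to completion; return (return code, peak RSS of child processes).

    ru_maxrss covers all children this process has waited for so far,
    so call this from a fresh interpreter for a clean number.
    Units are kilobytes on Linux and bytes on macOS.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


# Stand-in workload; replace the command with the script that runs TPOT.
code, peak = peak_child_rss(
    [sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"]
)
print("return code:", code)  # a negative value -N means killed by signal N
print("peak RSS:", peak)
```

A return code of -11 on Linux would correspond to SIGSEGV, so the same call reports both the crash and the peak memory reached before it.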
None of the operators in TPOT should be spawning new processes, especially
when n_jobs=1. We’ve been careful to make sure of that.
|
I’ll have to take a look when I get back to the office. Thank you for the
minimal working example.
|
I couldn't reproduce the issue on my MacBook with TPOT v0.9.2 that's available through
|
This is interesting. How did you count the threads? Here's what I came up with:
To set up a fresh install I did this:
|
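For comparison, here is a minimal sketch of one way to count OS-level threads on Linux. It reads the `Threads:` field from `/proc/<pid>/status`, which includes native threads started by compiled libraries, unlike `threading.active_count()`, which only sees threads created through Python's `threading` module. The helper name `os_thread_count` is made up, and this is not necessarily the method used in the thread.

```python
import os
import threading
from pathlib import Path


def os_thread_count(pid=None):
    """Return the number of OS threads of a process (Linux only, via /proc)."""
    pid = os.getpid() if pid is None else pid
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise RuntimeError("no Threads field found in /proc status file")


# Small demonstration: the count goes up while a worker thread is alive.
stop = threading.Event()
worker = threading.Thread(target=stop.wait)
before = os_thread_count()
worker.start()
during = os_thread_count()
stop.set()
worker.join()
print("threads before:", before, "while worker runs:", during)
```

Passing the PID of a running TPOT process to `os_thread_count` from a second terminal would show whether extra native threads appear even with n_jobs=1.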
I think the issue about multi-threads in
|
I have found the reason for the initial problem, the segfaults:
This issue can be closed. |
Glad you were able to get to the bottom of the issue! Guess that's about as low-level as an issue can go. |
It's nasty. Due to its unpredictable nature, it makes you slowly distrust every piece of software on the system. Then I removed the module that I disliked most and now. Suddenly |
I am also running into this issue on an AWS instance with 137 GB RAM and 72 CPUs. I created a new instance with the same specs and ran into the same segfault after three generations. It's pretty unlikely that two separate instances would both have defective RAM.
|
Really? I don't think so. Do a memory test. PS: They seem to use ECC RAM. Nevertheless I would try to exclude faulty RAM as a reason, although I may be a little biased now. |
I'm getting a segfault at some stage of the process. I got a traceback with gdb. Code as follows:
Traceback and context:
Versions:
If you need further information, please let me know.
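Alongside a gdb traceback, the standard-library `faulthandler` module can print the Python-level traceback at the moment of a segfault. The sketch below is illustrative only: it starts a child interpreter with `-X faulthandler` and crashes it on purpose with `ctypes.string_at(0)`, standing in for the real workload. The helper name `run_with_faulthandler` is hypothetical, and the signal check assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program that deliberately crashes by reading from address 0.
CRASHING_CODE = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code):
    """Run code in a child interpreter with faulthandler enabled.

    Returns (return code, captured stderr). On POSIX, a return code of
    -signal.SIGSEGV means the child was killed by a segmentation fault.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


returncode, stderr = run_with_faulthandler(CRASHING_CODE)
print("killed by SIGSEGV:", returncode == -signal.SIGSEGV)
print(stderr)  # contains the Python traceback written by faulthandler
```

For a real run, the same effect comes from launching the script with `python -X faulthandler script.py` or calling `faulthandler.enable()` at the top of the script.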