
[Question] TPOT can't run for a long time #1200

Open
neel04 opened this issue Apr 1, 2021 · 1 comment

Comments

@neel04

neel04 commented Apr 1, 2021

I want to run TPOT for a long time on my server (a population size of about 200 and 100 generations). It works with 50 generations, but if I increase the number of generations any further, it doesn't work.

What happens is that the tqdm progress bar gets stuck at one point (nearly always around the 4000th pipeline). While it is stuck, TPOT uses exactly one CPU core, does not save any checkpoint, and makes no further progress. It seems to have stopped doing anything.
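For a sense of scale: the TPOT documentation states that a run evaluates population_size + generations × offspring_size pipelines, and that offspring_size defaults to population_size. The sketch below applies that formula to the settings above. `total_pipelines` and `generation_of` are hypothetical helpers, not TPOT functions, and `generation_of` assumes the progress bar counts pipelines in generation order (initial population first).

```python
import math


def total_pipelines(population_size, generations, offspring_size=None):
    """Nominal number of pipelines a TPOT run evaluates (documented formula)."""
    # TPOT uses population_size as the offspring size when none is given
    if offspring_size is None:
        offspring_size = population_size
    return population_size + generations * offspring_size


def generation_of(pipeline_index, population_size, offspring_size=None):
    """Generation (0 = initial population) containing the 1-based pipeline_index.

    Hypothetical helper: assumes pipelines are counted in generation order.
    """
    if offspring_size is None:
        offspring_size = population_size
    if pipeline_index <= population_size:
        return 0
    return math.ceil((pipeline_index - population_size) / offspring_size)


print(total_pipelines(200, 50))   # 10200
print(total_pipelines(200, 100))  # 20200
print(generation_of(4000, 200))   # 19
```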

Any idea what the issue could be?

@perib
Contributor

perib commented May 9, 2023

This is most likely because TPOT is unable to terminate some pipelines. The current timeout method does not work reliably with certain modules. If a module cannot be timed out and takes a long time to run (SVC, for example), TPOT gets stuck slowly fitting a single pipeline that may never converge.

This could be resolved by using func_timeout: https://pypi.org/project/func-timeout/

We think this has been resolved in the next version of TPOT, TPOT2, found here: https://github.com/EpistasisLab/tpot2

It may be a good idea to bring the same fix to this version of TPOT as well.
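func_timeout's own API is not shown here. As a different, stdlib-only illustration of a hard timeout, the sketch below runs one unit of work in a child process and terminates the process if it has not reported a result within the deadline. A timeout that raises an exception inside the running Python thread only takes effect once that thread executes Python bytecode again, so it cannot interrupt a call blocked in compiled code; killing a separate process can. This is not TPOT's or TPOT2's implementation: `run_with_hard_timeout`, `quick_fit` and `stuck_fit` are hypothetical names, and the code assumes a POSIX server where the `fork` start method is available.

```python
import multiprocessing as mp
import time
from queue import Empty


def _worker(result_queue, func, args):
    # Runs in the child process and reports back either the result or the error.
    try:
        result_queue.put(("ok", func(*args)))
    except Exception as exc:
        result_queue.put(("error", repr(exc)))


def run_with_hard_timeout(func, args=(), timeout_s=1.0):
    """Run func(*args) in a child process, killing it after timeout_s seconds.

    Returns ("ok", result), ("error", message) or ("timeout", None).
    A child that dies without reporting anything is also treated as a timeout.
    Assumes a POSIX system where the "fork" start method is available.
    """
    ctx = mp.get_context("fork")
    result_queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(result_queue, func, args))
    proc.start()
    try:
        # Read the result before join() so a large result cannot block the child.
        status, value = result_queue.get(timeout=timeout_s)
    except Empty:
        # SIGTERM stops the child even while it is inside compiled code.
        proc.terminate()
        proc.join()
        return "timeout", None
    proc.join()
    return status, value


def quick_fit(n):
    # Hypothetical stand-in for a pipeline that finishes quickly.
    return sum(range(n))


def stuck_fit(seconds):
    # Hypothetical stand-in for a pipeline that never finishes in time.
    time.sleep(seconds)
    return "finished"


if __name__ == "__main__":
    print(run_with_hard_timeout(quick_fit, (10,), timeout_s=5))  # ('ok', 45)
    print(run_with_hard_timeout(stuck_fit, (60,), timeout_s=1))  # ('timeout', None)
```

The trade-off is the cost of starting a process for every evaluation, plus the requirement that the result be picklable, since it travels back through a `multiprocessing` queue.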
