
Does TPOT support memory when running dask.distributed? #1228

Open
KrzysztofNawara opened this issue Sep 4, 2021 · 0 comments
I wanted to use TPOT with

  1. dask.distributed running multiple processes on the local machine
  2. memory enabled, to cache common transformations across processes (it's supposed to be multiprocessing-safe)
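For context, a minimal sketch of the caching mechanism in question, using joblib.Memory directly (TPOT's memory parameter is ultimately passed through to sklearn's Pipeline, which uses joblib.Memory under the hood). The cache directory and the transform function are illustrative, not TPOT code:

```python
# Sketch: joblib.Memory caches function results on disk, so a repeated
# call with the same arguments does not re-run the function body.
import tempfile
from joblib import Memory

calls = []  # record real invocations of the function body

with tempfile.TemporaryDirectory() as cachedir:
    memory = Memory(location=cachedir, verbose=0)

    @memory.cache
    def transform(x):
        calls.append(x)
        return x * 2

    first = transform(21)   # cache miss: the body runs
    second = transform(21)  # cache hit: served from disk

assert first == second
assert calls == [21]  # the body ran only once
```

The multiprocessing safety mentioned above comes from the cache living on a shared filesystem rather than in process memory, which is why it would be attractive to combine with dask.distributed workers on one machine.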

But two things make me think this mode of operation is not supported:

  1. Setting a breakpoint inside the joblib.Memory.cache() function - it only gets called to check whether a produced individual is valid (the check_pipeline/_pre_test functions)
  2. Looking at the code that actually evaluates individuals. Everything seems to happen inside dask_ml.model_selection._search.build_graph(). But the way it handles pipelines (if my analysis is correct) is to recursively extract all leaf transformers and estimators, turn them into Dask graph nodes and then, at the end, rebuild the pipelines. No sklearn.Pipeline code appears to be executed (and that's where caching is implemented).
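To make point 2 concrete: in plain sklearn, caching lives in Pipeline's memory argument, which wraps each transformer's fit_transform in joblib.Memory during fit. If dask_ml's build_graph() decomposes the pipeline into individual graph nodes and never calls Pipeline.fit itself, this code path is simply bypassed. A small sketch of the sklearn-side mechanism (the estimators chosen here are arbitrary examples):

```python
# Sketch: sklearn.Pipeline with the `memory` argument caches intermediate
# fit_transform results on disk during fit(). This is the caching that
# appears to be skipped when dask_ml rebuilds the graph node-by-node.
import tempfile
import numpy as np
from joblib import Memory
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.random.RandomState(0).rand(20, 3)
y = np.arange(20) % 2  # toy binary target

with tempfile.TemporaryDirectory() as cachedir:
    pipe = Pipeline(
        [("scale", StandardScaler()), ("clf", LogisticRegression())],
        memory=Memory(location=cachedir, verbose=0),
    )
    # During fit, StandardScaler's fit_transform result is cached on disk;
    # a second fit with identical data/params would reuse it.
    pipe.fit(X, y)

preds = pipe.predict(X)
```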

My questions are as follows:

  1. Is my analysis correct and that mode is indeed unsupported?
  2. What would be the easiest way to add this caching functionality?