# Accelerating Pipeline Search with Dask (Beta)

## Default AutoML Search
By default, the EvalML AutoML search process evaluates generated pipelines one at a time. On most modern multicore systems, this does not make full use of the available cores.


## The Dask Engine

[Dask](https://dask.org/) is a parallel computing library for Python. One of the major benefits of using Dask is the ability to scale computation from a local laptop all the way up to a cluster. As such, it is currently the preferred method for accelerating the EvalML search process.

Through the use of the Engines API, EvalML provides the ability to scale out the pipeline search to multiple workers. For more information about the Engines API and how to create your own engine, see the documentation. 

## Quick Start Example
The easiest way to enable EvalML to evaluate pipelines in parallel locally is to create a new `DaskEngine`. By default, this will create a local Dask cluster with 4 workers.

In [None]:
from evalml.automl.engines import DaskEngine
dask_engine = DaskEngine()

After setting up the AutoML search, pass the engine object to `search`.

During the search process, `AutoMLSearch` will send each pipeline batch to `dask_engine`, which will then map each pipeline to a Dask worker. 

In [None]:
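import evalml
from evalml import AutoMLSearch

# Load the demo dataset and hold out 80% of the rows (test_size=0.8).
X, y = evalml.demos.load_breast_cancer()
X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(X, y, test_size=0.8)

# Run a short search and hand the engine created above to `search`.
automl = AutoMLSearch(problem_type="binary", objective="f1", max_batches=2)
automl.search(X_train, y_train, engine=dask_engine)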
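
The batch dispatch described above can be pictured with a small stand-in that uses only the Python standard library. This is a sketch of the general pattern, not EvalML's or Dask's implementation: `evaluate_pipeline`, `run_batches`, and the pipeline names are hypothetical, the scores are fake, and a thread pool plays the role of the Dask workers.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_pipeline(name):
    """Hypothetical stand-in for fitting and scoring one pipeline."""
    return name, len(name) / 100.0  # fake score, for illustration only


def run_batches(batches, n_workers=4):
    """Process batches in order; within a batch, map pipelines to workers."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for batch in batches:
            # Each pipeline in the batch is handed to a free worker.
            for name, score in pool.map(evaluate_pipeline, batch):
                results[name] = score
    return results


batches = [["baseline"], ["random_forest", "xgboost", "logistic_regression"]]
scores = run_batches(batches)
print(scores)
```

Batches run one after another, while the pipelines inside a batch can run at the same time, which is why a larger batch benefits more from additional workers.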

## Specifying Your Own Dask Client
`DaskEngine` can take in a Dask `Client`. This allows for specifying custom settings such as the number of workers. A Dask client can also point to a remote cluster for processing. See the [Dask documentation](https://distributed.dask.org/en/latest/client.html) for more details.
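
As a minimal sketch of what creating such a client might look like, the helper below builds a Dask `Client` either for a new local cluster with a chosen number of workers or for an existing remote scheduler. `Client` and `LocalCluster` come from `dask.distributed`; the helper name `make_client` and the example scheduler address are hypothetical.

```python
def make_client(n_workers=2, address=None):
    """Return a Dask Client for a remote scheduler or a new local cluster."""
    # Imported inside the function so the helper can be defined
    # even where Dask is not installed.
    from dask.distributed import Client, LocalCluster

    if address is not None:
        # Connect to an already running scheduler,
        # e.g. "tcp://scheduler-host:8786" (hypothetical address).
        return Client(address)

    # Otherwise start a local cluster with an explicit worker count.
    cluster = LocalCluster(n_workers=n_workers, threads_per_worker=1)
    return Client(cluster)
```

The resulting client would then be handed to the engine, for example `DaskEngine(client=make_client(n_workers=2))`. The `client` keyword name is an assumption based on the description above, so check the `DaskEngine` signature in your installed EvalML version.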


## Dask Engine Limitations
- Because pipelines are evaluated and their results are reported to the tuner asynchronously, the search process is not guaranteed to be deterministic from run to run.
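
The source of this nondeterminism can be illustrated with a standard-library stand-in. This is not how EvalML or Dask is implemented: `evaluate_pipeline` and the pipeline names are hypothetical, and the random sleep merely simulates pipelines that take different amounts of time to fit.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate_pipeline(name):
    """Hypothetical stand-in: take a random amount of time, then finish."""
    time.sleep(random.uniform(0.0, 0.05))
    return name


names = ["pipeline_a", "pipeline_b", "pipeline_c", "pipeline_d"]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(evaluate_pipeline, n) for n in names]
    # as_completed yields futures in finish order, not submission order.
    arrival_order = [f.result() for f in as_completed(futures)]

print(arrival_order)  # may differ from `names`, and from one run to the next
```

Anything that consumes results in arrival order, such as a tuner proposing the next batch, can therefore see a different history on each run even when the inputs are identical.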

## Tips for Optimizing Parallel Search Performance
- Be careful not to create too many Dask workers. A good rule of thumb is to create as many workers as cores on your local machine.
- The Dask Engine works best with longer searches and larger datasets. 
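
The rule of thumb for worker count can be sketched with the standard library. The helper name `suggested_worker_count` and its `reserve` parameter are hypothetical; note that `os.cpu_count()` reports logical cores, which may exceed the number of physical cores.

```python
import os


def suggested_worker_count(reserve=0):
    """Return one worker per core, minus `reserve`, but never fewer than one."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, cores - reserve)


n_workers = suggested_worker_count()
print(n_workers)
```

Passing a small `reserve` leaves a core or two free for the scheduler and the main process, which can help on machines that are also used interactively.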