# Distributed Automated Machine Learning with TPOT

TPOT is an [automated machine learning](https://en.wikipedia.org/wiki/Automated_machine_learning) library.
It evaluates many scikit-learn pipelines and hyperparameter combinations to find a model that works well for your data. These evaluations are computationally expensive. Each candidate pipeline can be scored independently, though, so the work parallelizes well. TPOT can use Dask to distribute it across a cluster of machines.
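
Because each candidate is scored independently, the search maps naturally onto a pool of workers. The code below is a minimal stdlib-only sketch of that pattern, not TPOT's code. It makes two assumptions. First, `evaluate` is a hypothetical toy scoring function standing in for cross-validating a pipeline. Second, a local `ThreadPoolExecutor` plays the role of the cluster. A `distributed.Client` can expose a compatible `concurrent.futures` executor through `client.get_executor()`, and you could pass that in instead.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import product


def evaluate(config):
    """Hypothetical stand-in for cross-validating one pipeline.

    Returns (score, config); this toy score peaks at depth=4, lr=0.1.
    """
    depth, lr = config
    score = 1.0 - 0.01 * (depth - 4) ** 2 - abs(lr - 0.1)
    return score, config


def search(configs, executor: Executor):
    # Every config is scored independently, so executor.map is free to
    # run the evaluations concurrently on whatever workers it manages.
    results = list(executor.map(evaluate, configs))
    # Keep the highest-scoring (score, config) pair.
    return max(results)


configs = list(product([2, 3, 4, 5, 6], [0.01, 0.1, 0.3]))
with ThreadPoolExecutor(max_workers=4) as pool:
    best_score, best_config = search(configs, pool)
print(best_config, best_score)
```

The search function only depends on the `Executor` interface, so the same code can run on local threads or on remote workers.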

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from distributed import Client
```

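```python
# With no arguments, Client() starts a local cluster on this machine;
# pass a scheduler address instead to connect to a remote cluster.
client = Client()
client  # in a notebook, this displays a summary of the cluster
```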

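We'll use the digits dataset of 8×8 handwritten digit images. A small 20% sample goes to training, and the remaining 80% is held out as a test set.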

```python
digits = load_digits()

X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    train_size=0.20,
    test_size=0.80
)
```
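
The digits dataset has 1,797 samples, so this split leaves only a few hundred training examples. The sketch below reproduces the size arithmetic and a pure-Python shuffled split. It assumes the same rounding scikit-learn's implementation uses for float fractions: the test count is rounded up and the train count rounded down. `split_sizes` and `shuffled_split` are hypothetical helpers for illustration, not scikit-learn functions.

```python
import math
import random


def split_sizes(n_samples, train_size, test_size):
    # Assumed to mirror scikit-learn's rounding for float fractions:
    # the test count is rounded up, the train count rounded down.
    n_test = math.ceil(test_size * n_samples)
    n_train = math.floor(train_size * n_samples)
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the dataset")
    return n_train, n_test


def shuffled_split(items, train_size, test_size, seed=0):
    # Shuffle the indices, then carve off disjoint train and test portions.
    n_train, n_test = split_sizes(len(items), train_size, test_size)
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    train = [items[i] for i in idx[:n_train]]
    test = [items[i] for i in idx[n_train:n_train + n_test]]
    return train, test


print(split_sizes(1797, 0.20, 0.80))  # (359, 1438)
```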

TPOT automates feature engineering, hyperparameter optimization, and model selection. It uses genetic algorithms to produce new models and hyperparameters to try in the next "generation" of models, favoring variations of the pipelines that scored best in the current one.
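
The generational loop can be sketched in a few lines. This is a toy genetic algorithm over two hypothetical hyperparameters, scored by a made-up `fitness` function. TPOT's real search evolves whole pipeline structures and is considerably more involved. The sketch only illustrates the core idea: score a population, keep the fittest, and breed mutated offspring for the next generation.

```python
import random


def fitness(genome):
    """Hypothetical toy objective standing in for a pipeline's CV score.

    Genomes are (depth, lr) pairs; the optimum is depth=4, lr=0.1.
    """
    depth, lr = genome
    return -((depth - 4) ** 2) - 100 * (lr - 0.1) ** 2


def mutate(genome, rng):
    # Randomly perturb one "hyperparameter" of the genome.
    depth, lr = genome
    if rng.random() < 0.5:
        depth = max(1, depth + rng.choice([-1, 1]))
    else:
        lr = min(1.0, max(0.001, lr * rng.choice([0.5, 2.0])))
    return depth, lr


def crossover(a, b):
    # Combine the depth of one parent with the lr of the other.
    return a[0], b[1]


def evolve(generations=3, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [(rng.randint(1, 10), rng.choice([0.001, 0.01, 0.3, 1.0]))
                  for _ in range(population_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: population_size // 2]
        # Variation: fill the rest of the next generation with offspring.
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = parents + children
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history


best, history = evolve()
print(best, history)
```

Because the parents carry over into each new generation, the best score in `history` can never decrease.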

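```python
tp = TPOTClassifier(
    generations=3,        # rounds of the genetic search
    population_size=50,   # pipelines kept in each generation
    cv=3,                 # 3-fold cross-validation scores each pipeline
    n_jobs=-1,            # use all available cores
    random_state=0,       # seed the search for reproducibility
    verbosity=2,          # print progress while fitting
    use_dask=True         # evaluate pipelines with Dask, via the client above
)
```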
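
These settings determine how much work is sent to the cluster. TPOT's documentation says it evaluates `population_size + generations × offspring_size` pipelines in total. `offspring_size` defaults to `population_size`, and each evaluation is a `cv`-fold cross-validation. The hypothetical helper below turns that into a back-of-the-envelope estimate. It adds one final refit of the best pipeline on the full training set. Treat the result as an estimate, not an exact count of what TPOT runs.

```python
def estimated_fits(generations, population_size, cv, offspring_size=None):
    # offspring_size defaults to population_size in TPOT.
    if offspring_size is None:
        offspring_size = population_size
    # Pipelines scored: the initial population plus each generation's offspring.
    pipelines = population_size + generations * offspring_size
    # Each pipeline is cross-validated with `cv` folds; add one final refit
    # of the best pipeline on the full training set.
    return pipelines, pipelines * cv + 1


print(estimated_fits(generations=3, population_size=50, cv=3))  # (200, 601)
```

Even this small configuration produces several hundred independent model fits, which is why distributing them pays off.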

```python
%time tp.fit(X_train, y_train)  # IPython magic that reports how long the search took
```

```python
# The best pipeline found, refit on the full training set
tp.fitted_pipeline_
```
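
`fitted_pipeline_` is an ordinary scikit-learn `Pipeline`, so it supports `predict` and `score` like any other estimator. TPOT also provides `tp.score(X_test, y_test)` for evaluating on held-out data and `tp.export(...)` for writing the pipeline out as Python code. The sketch below reimplements the pipeline idea in pure Python: every step but the last is a transformer whose output feeds the next step, and the final step is the estimator. `ToyScaler`, `ToyCentroidClassifier`, and `MiniPipeline` are hypothetical illustrative classes, not part of scikit-learn or TPOT.

```python
class ToyScaler:
    """Rescale each feature to [0, 1] using ranges learned in fit."""

    def fit(self, X, y=None):
        cols = list(zip(*X))
        self.lo = [min(c) for c in cols]
        # Use a span of 1.0 for constant features to avoid dividing by zero.
        self.span = [(max(c) - min(c)) or 1.0 for c in cols]
        return self

    def transform(self, X):
        return [[(v - low) / span for v, low, span in zip(row, self.lo, self.span)]
                for row in X]


class ToyCentroidClassifier:
    """Predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {label: [sum(c) / len(rows) for c in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda k: dist(row, self.centroids[k]))
                for row in X]


class MiniPipeline:
    """Chain transformers and a final estimator, in the spirit of sklearn's Pipeline."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # each output feeds the next step
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)  # reuse the parameters learned during fit
        return self.steps[-1].predict(X)

    def score(self, X, y):
        # Mean accuracy, the metric scikit-learn classifiers report from score.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


X = [[0, 0], [1, 1], [9, 9], [10, 10]]
y = ["a", "a", "b", "b"]
model = MiniPipeline([ToyScaler(), ToyCentroidClassifier()]).fit(X, y)
print(model.predict([[2, 2], [8, 8]]))  # ['a', 'b']
```

Fitting the transformers inside the pipeline means the scaling learned from training data is reused unchanged at prediction time.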