# Part 3: TPOT

Without an extensive background in the statistics and mathematics behind different machine learning models, it can be difficult to determine which model is best for a given dataset. The same goes for tuning hyperparameters. As you have probably noticed, the models we've used in this workshop so far have many different hyperparameters, and it's by no means obvious how to tune them.

Moreover, testing many different models, each with many different combinations of hyperparameters, can be extremely time-consuming and impractical.
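
To get a feel for how quickly an exhaustive search grows, here is a minimal sketch that counts the combinations in a small search space. The models, preprocessing options, and parameter values are hypothetical and chosen only for illustration; they are not what TPOT searches over.

```python
from itertools import product

# Hypothetical search space, chosen only for illustration.
search_space = {
    "model": ["logistic_regression", "random_forest", "svm", "knn"],
    "preprocessing": ["none", "standard_scaling", "pca"],
    "regularization": [0.01, 0.1, 1.0, 10.0, 100.0],
    "max_iterations": [100, 500, 1000],
}

# Every combination of one value per key.
combinations = list(product(*search_space.values()))

# Each combination is fitted once per cross-validation fold.
n_folds = 5
n_fits = len(combinations) * n_folds

print(f"{len(combinations)} combinations, {n_fits} fits with {n_folds}-fold cross-validation")
```

Even this tiny space requires hundreds of model fits, and every extra parameter multiplies the total.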

[TPOT](https://github.com/rhiever/tpot) is a tool that automates model selection and hyperparameter tuning using genetic programming. It also determines which preprocessing steps, if any, are needed, such as PCA or standard scaling. It then exports the resulting pipeline to a file with the scikit-learn code written for you.
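
To make the idea of an evolutionary search a little more concrete, here is a toy sketch. It is not TPOT's algorithm: TPOT evolves whole scikit-learn pipelines, whereas each "individual" here is just a number, and the names `evolve` and `toy_fitness` are made up for this example. It only illustrates the loop of scoring a population, keeping the fittest, and mutating them over several generations.

```python
import random


def evolve(fitness, generations=5, population_size=20, seed=0):
    """Toy evolutionary search over a single number in [0, 10]."""
    rng = random.Random(seed)
    population = [rng.uniform(0, 10) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the fitter half.
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: population_size // 2]
        best_per_generation.append(fitness(ranked[0]))
        # Mutation: each survivor produces one slightly perturbed child,
        # clipped back into the [0, 10] range.
        children = [min(10.0, max(0.0, s + rng.gauss(0, 0.5))) for s in survivors]
        population = survivors + children
    best = max(population, key=fitness)
    return best, best_per_generation


def toy_fitness(x):
    # Hypothetical objective: highest (zero) when x is exactly 3.
    return -(x - 3.0) ** 2


best, history = evolve(toy_fitness)
print(best)
print(history)
```

Because the survivors are carried over unchanged, the best fitness in `history` never gets worse from one generation to the next.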

Although it is in your best interest to learn as much about the theory behind machine learning as possible, tools like TPOT can do much of the model search for you.

TPOT can be used for both classification and regression.

Let's set NumPy's random seed so that the results are easier to reproduce.

In [None]:
import numpy as np

np.random.seed(10)
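
As a quick check of what seeding does, the sketch below resets NumPy's global seed twice and draws the same random permutation both times. This matters here because `train_test_split` draws from NumPy's global random state when no `random_state` is passed.

```python
import numpy as np

# Seed, then draw a random ordering of the numbers 0..9.
np.random.seed(10)
first = np.random.permutation(10)

# Re-seeding with the same value replays the same sequence of draws.
np.random.seed(10)
second = np.random.permutation(10)

print(np.array_equal(first, second))
```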

## Classification

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    train_size=0.75, test_size=0.25)

In [None]:
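from tpot import TPOTClassifier

# generations: number of iterations of the optimization process
# population_size: number of candidate pipelines kept in each generation
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))  # score on the held-out test set
tpot.export('tpot_iris_pipeline.py')  # write the best pipeline as scikit-learn code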

Let's look at the model TPOT created for us:

In [None]:
!cat tpot_iris_pipeline.py
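
The contents of the exported file depend on the run, so we won't reproduce them here. As a rough illustration of the kind of scikit-learn pipeline involved, here is a hand-written sketch. The steps (standard scaling, PCA, logistic regression) and their settings are our own assumptions for this example, not TPOT's output.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.75, test_size=0.25, random_state=10
)

# Hand-picked steps for illustration only; TPOT may choose different ones.
pipeline = make_pipeline(
    StandardScaler(),                   # preprocessing: zero mean, unit variance
    PCA(n_components=2),                # preprocessing: reduce to 2 components
    LogisticRegression(max_iter=1000),  # the final classifier
)
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```

TPOT's search amounts to trying many such combinations of preprocessing steps, models, and settings, and keeping the one that scores best under cross-validation.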

## Regression

First we'll load the Boston dataset.

In [None]:
# load_boston was removed in scikit-learn 1.2; this cell needs an older release
from sklearn.datasets import load_boston

boston = load_boston()

Then we split the data into training and test sets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target,
                                                    train_size=0.75, test_size=0.25)

Now we let TPOT search for a regression pipeline.

In [None]:
from tpot import TPOTRegressor

# generations: number of iterations of the optimization process
# population_size: number of candidate pipelines kept in each generation
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
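
A note on reading the printed score: as far as we know, recent TPOT releases score regressors with negative mean squared error by default, so the value is typically negative and closer to zero is better. Check the documentation for your installed version, since the default may differ. The sketch below computes that metric by hand on made-up values; the helper name and the numbers are hypothetical.

```python
def neg_mean_squared_error(y_true, y_pred):
    """Negative of the average squared difference between targets and predictions."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return -sum(errors) / len(errors)


# Made-up targets and predictions, for illustration only.
y_true = [20.0, 22.0, 35.0, 33.0]
y_pred = [21.0, 21.0, 33.0, 33.0]

score = neg_mean_squared_error(y_true, y_pred)
print(score)  # -1.5
```

The sign is flipped so that a higher score always means a better model, which lets the same "maximize the score" search work for both classification and regression.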

In [None]:
!cat tpot_boston_pipeline.py