# Setting up pipeline search

Designing the right machine learning pipeline and picking the best parameters is a time-consuming process that relies on a mix of data science intuition and trial and error. EvalML streamlines the process of selecting the best modeling algorithms and parameters, so data scientists can focus their energy where it is most needed.

## How it works

EvalML selects and tunes machine learning pipelines built from several steps. These include missing value imputation, feature selection, feature scaling, and finally the machine learning model itself. As EvalML tunes pipelines, it uses the objective function selected and configured by the user to guide its search.


At each iteration, EvalML uses cross-validation to estimate the pipeline's performance. If a pipeline's scores vary widely across cross-validation folds, EvalML issues a warning, because such a pipeline may not perform reliably on future data.
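To make the idea of "high variance across folds" concrete, here is a minimal sketch in plain Python. It is not EvalML's actual check: the function name `has_high_cv_variance` and the `threshold` value are hypothetical, and the rule used (coefficient of variation above a cutoff) is only one reasonable way to flag unstable scores.

```python
from statistics import mean, pstdev


def has_high_cv_variance(fold_scores, threshold=0.2):
    """Return True when fold scores vary a lot relative to their mean.

    `threshold` is a hypothetical cutoff on the coefficient of variation
    (standard deviation divided by mean); it is an assumption for this sketch.
    """
    avg = mean(fold_scores)
    if avg == 0:
        return False  # a relative spread around zero is undefined
    return abs(pstdev(fold_scores) / avg) > threshold


stable = [0.81, 0.80, 0.82]     # folds agree closely
unstable = [0.95, 0.55, 0.75]   # folds disagree noticeably

print(has_high_cv_variance(stable))
print(has_high_cv_variance(unstable))
```

With these example scores, only the second list is flagged, since its spread is large compared with its average score.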

EvalML is designed to work well out of the box. However, it provides several ways for you to control the search, described below.

## Selecting problem type

EvalML supports both classification and regression problems. You select your problem type by instantiating the appropriate class:


In [1]:
import evalml

In [2]:
evalml.AutoClassifier()

<evalml.models.auto_classifier.AutoClassifier at 0x110dcb2b0>

In [3]:
evalml.AutoRegressor()

<evalml.models.auto_regressor.AutoRegressor at 0x12b4d4e10>

## Setting the Objective Function

The only required parameter to start searching for pipelines is the objective function. Most domain-specific objective functions require you to specify parameters based on your business assumptions. You can do this before you initialize your pipeline search. For example:

In [4]:
from evalml.objectives import FraudDetection

fraud_objective = FraudDetection(
    retry_percentage=.5,
    interchange_fee=.02,
    fraud_payout_percentage=.75,
    amount_col='amount'
)

evalml.AutoClassifier(objective=fraud_objective)

<evalml.models.auto_classifier.AutoClassifier at 0x12b4f16d8>

## Selecting Model Types

By default, all model types are considered. You can control which model types get searched with the `model_types` parameter:

In [5]:
evalml.AutoClassifier(objective="f1",
                      model_types=["random_forest"])

<evalml.models.auto_classifier.AutoClassifier at 0x12b4f6358>

You can see a list of all supported model types like this:

In [6]:
evalml.list_model_types()

['random_forest', 'linear_model', 'xgboost']

## Limiting Search Time

You can limit the search time by specifying a maximum number of pipelines, a maximum amount of time, or both. EvalML won't build new pipelines after the maximum time has passed or the maximum number of pipelines has been built.

In [7]:
evalml.AutoClassifier(objective="f1",
                      max_time=60,
                      max_pipelines=10)

<evalml.models.auto_classifier.AutoClassifier at 0x12b4f6ba8>
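The stopping behavior described above can be sketched as a simple budgeted loop. This is an illustration of the semantics only, not EvalML's implementation: `run_search` and `build_pipeline` are hypothetical names, and the time budget here is assumed to be in seconds purely for the sake of the sketch.

```python
import time


def run_search(build_pipeline, max_time=None, max_pipelines=None):
    """Call build_pipeline repeatedly until either budget is exhausted.

    Budgets are checked before each build, so a pipeline that is already
    being built is allowed to finish; no new one starts afterwards.
    """
    if max_time is None and max_pipelines is None:
        raise ValueError("Provide max_time, max_pipelines, or both.")

    start = time.monotonic()
    results = []
    while True:
        if max_pipelines is not None and len(results) >= max_pipelines:
            break  # pipeline budget used up
        if max_time is not None and time.monotonic() - start >= max_time:
            break  # time budget used up
        results.append(build_pipeline(len(results)))
    return results


# Stand-in for real pipeline construction: returns a label immediately.
results = run_search(lambda i: f"pipeline_{i}", max_time=60, max_pipelines=10)
print(len(results))
```

Because the stand-in builder returns instantly, the pipeline count is the limit that stops this run; with slow builds, the time limit would typically be reached first.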

## Controlling Cross-Validation

EvalML cross-validates each model it tests during its search. By default, it uses 3-fold cross-validation. You can optionally provide your own cross-validation method.

In [8]:
from sklearn.model_selection import StratifiedKFold

clf = evalml.AutoClassifier(objective="f1",
                            cv=StratifiedKFold(5))
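For readers unfamiliar with stratified splitting, the following plain-Python sketch shows the basic idea: each fold receives roughly the same share of every class. It is a simplified illustration, not scikit-learn's `StratifiedKFold` algorithm (which also supports shuffling and handles edge cases); `stratified_folds` is a hypothetical helper written for this example.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=3):
    """Assign each sample index to a test fold, spreading every class evenly."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's indices round-robin across the folds.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # six negatives, three positives
for test_indices in stratified_folds(labels, n_splits=3):
    print(test_indices)
```

Each of the three folds ends up with two samples of class 0 and one of class 1, mirroring the 2:1 class ratio of the full dataset.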