# Using GAMA for Automated Machine Learning
This notebook is used at the Tutorial on Automated Machine Learning at ODSC Europe 2019.
It shows how to use GAMA for automated machine learning.
Most of the information in this notebook can also be found in the [documentation](https://pgijsbers.github.io/gama/).
If you are reading this notebook at a later time and find it outdated, be sure to check out the documentation instead.

GAMA is hosted on PyPI, so installing it is easy:
```bash
> pip install gama
```

## Using GAMA in Python
First we need to load our data:

In [1]:
import openml
letter_holdout = openml.tasks.get_task(236)
train, test = letter_holdout.get_train_test_split_indices()
x, y = letter_holdout.get_X_and_y()
x_train, x_test, y_train, y_test = x[train, :], x[test, :], y[train], y[test]
f'x_train: {x_train.shape}, x_test: {x_test.shape}, y_train: {y_train.shape}, y_test: {y_test.shape}'

'x_train: (13400, 16), x_test: (6600, 16), y_train: (13400,), y_test: (6600,)'

Now we are ready for AutoML.
Using GAMA through its Python API works much like using a scikit-learn estimator.
We must first initialize either a `GamaClassifier` or `GamaRegressor` and specify its hyperparameters.
For now, we keep it simple and only specify how long we want GAMA to optimize for, and what to communicate back to us:

In [2]:
import logging
from gama import GamaClassifier

automl = GamaClassifier(
    max_total_time=180,  # GAMA can use at most 180 seconds for fit.
    verbosity=logging.INFO,  # We want to see INFO messages.
    scoring='accuracy'
)  

Using GAMA version 19.11.4.
GamaClassifier(post_processing_method=BestFitPostProcessing(),search_method=AsyncEA(),cache_dir=None,keep_analysis_log=D:\repositories\Talks\odsc\gama.log,verbosity=20,n_jobs=None,max_eval_time=None,max_total_time=180,random_state=None,regularize_length=True,scoring=accuracy)


When we call `fit`, GAMA will start searching for a good machine learning pipeline for the data.

In [3]:
automl.fit(x_train, y_train)

preprocessing took 0.0290s.
Starting EA with new population.
Current Pareto front updated: RobustScaler>VarianceThreshold>LinearSVC scores (-inf, -3)
Current Pareto front updated: BernoulliNB scores (0.1132089552238806, -1)
Current Pareto front updated: PCA>GaussianNB scores (0.6905970149253732, -2)
Current Pareto front updated: ExtraTreesClassifier scores (0.9501492537313433, -1)
Current Pareto front updated: ExtraTreesClassifier scores (0.9556716417910448, -1)
Search phase evaluated 135 individuals.
search took 161.0380s.
postprocess took 2.1004s.
Attempting to delete 20191119_152844_e83a_GAMA
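
Each `Current Pareto front updated` line in the log lists a pipeline and a pair of scores. Judging from the pipeline names, the pair looks like (accuracy, negated number of steps), so shorter pipelines do better on the second value. That reading is an assumption based on the log, not something the log states. Under that assumption, the sketch below replays the (rounded) scores from the log through a minimal Pareto front update. `dominates` and `update_front` are illustrative helpers, not GAMA functions.

```python
from math import inf


def dominates(a, b):
    """True if `a` is at least as good as `b` on every score and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_front(front, candidate):
    """Offer `candidate` to the front; return (new_front, whether it changed)."""
    if any(member == candidate or dominates(member, candidate) for member in front):
        return front, False
    # Drop every member that the candidate dominates, then add the candidate.
    survivors = [member for member in front if not dominates(candidate, member)]
    return survivors + [candidate], True


# Assumed meaning: (accuracy, -number of steps), rounded from the log above.
log_scores = [(-inf, -3), (0.1132, -1), (0.6906, -2), (0.9501, -1), (0.9557, -1)]

front = []
for scores in log_scores:
    front, updated = update_front(front, scores)
    print(scores, "updated" if updated else "unchanged", front)
```

With these values every candidate changes the front, which is consistent with the five update messages in the log.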


In [4]:
automl.predict(x_test[:5, :])

array([ 8, 23,  7, 24, 16])

In [5]:
f'{automl._metrics[0].name}: {automl.score(x_test, y_test)}'

'accuracy: 0.9577272727272728'

In [6]:
automl.model.steps

[('1', SimpleImputer(add_indicator=False, copy=True, fill_value=None,
                missing_values=nan, strategy='median', verbose=0)),
 ('0',
  ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='entropy',
                       max_depth=None, max_features=0.6000000000000001,
                       max_leaf_nodes=None, min_impurity_decrease=0.0,
                       min_impurity_split=None, min_samples_leaf=3,
                       min_samples_split=4, min_weight_fraction_leaf=0.0,
                       n_estimators=100, n_jobs=None, oob_score=False,
                       random_state=None, verbose=0, warm_start=False))]

*Note:* `TPOT` and `auto-sklearn` follow the same scikit-learn convention of `fit` and `predict`, so now you know how to use them too! :)
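
To illustrate why the shared convention is convenient, here is a minimal sketch with a toy estimator. `MajorityClassifier` and `holdout_accuracy` are made up for this example and are not part of any of the libraries mentioned; the point is only that code written against `fit` and `predict` does not care which estimator it receives.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def holdout_accuracy(estimator, X_train, y_train, X_test, y_test):
    """Works with any object that offers scikit-learn style fit and predict."""
    estimator.fit(X_train, y_train)
    predictions = estimator.predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


accuracy = holdout_accuracy(
    MajorityClassifier(),
    X_train=[[0], [1], [2]], y_train=["a", "a", "b"],
    X_test=[[3], [4]], y_test=["a", "b"],
)
print(accuracy)  # 0.5
```

In principle an AutoML estimator could be passed to the same helper in place of the toy class.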

## So what's going on?

In this section, we shine some light on *how* GAMA performs automated machine learning.
By default, GAMA uses an evolutionary algorithm to optimize pipelines. We break it down in pseudo-code.

There are some differences between what is described below and what is implemented in GAMA. That said, the code below conveys the general strategy of using evolution to guide search, and is not too far off.

```python
def optimize():
    # Create an initial random population and score every pipeline in it.
    population = [new() for _ in range(20)]
    for pipeline in population:
        score(pipeline)

    # `time_left` and `worst` are placeholders, this is still pseudo-code.
    while time_left():
        new_pipeline = create_from_population(population)
        # The worst pipeline is determined by its assigned scores.
        population.remove(worst(population))

        score(new_pipeline)
        population.append(new_pipeline)
```
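
For readers who want something they can run, here is a toy version of the same loop. It is a sketch under simplifying assumptions, not GAMA's implementation: the budget is a number of evaluations instead of wall-clock time, a "pipeline" is just a number, and `evolve` is a name chosen for this example.

```python
import random


def evolve(new, score, create_from_population, population_size=20, n_evaluations=200):
    """Steady-state evolution: every new candidate replaces the current worst."""
    population = [new() for _ in range(population_size)]
    scores = [score(candidate) for candidate in population]

    for _ in range(n_evaluations):
        child = create_from_population(population)
        # Remove the candidate with the lowest score to make room for the child.
        worst = min(range(len(population)), key=scores.__getitem__)
        del population[worst], scores[worst]
        population.append(child)
        scores.append(score(child))

    best = max(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]


# Toy problem: the best "pipeline" is the number closest to 3.
rng = random.Random(0)
best, best_score = evolve(
    new=lambda: rng.uniform(-10, 10),
    score=lambda x: -(x - 3) ** 2,
    create_from_population=lambda pop: rng.choice(pop) + rng.gauss(0, 0.5),
)
print(best, best_score)
```

Because only the worst candidate is ever removed, the best score in the population cannot get worse from one iteration to the next.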

## Generating New Pipelines

In GAMA, we use scikit-learn base classes to determine whether an algorithm is a preprocessing algorithm (`Transformer`) or an algorithm which builds a predictive model (`Estimator`). To make a new pipeline, we randomly sample an `Estimator` and add zero or more `Transformer` components for preprocessing. Which algorithms to consider, and what their hyperparameters are, is defined in a configuration file (not discussed in this tutorial).

```python
def new() -> Pipeline:
    """Create a new pipeline at random."""
```
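
As a rough illustration of random pipeline generation, the sketch below samples from a small, made-up search space. The dictionaries, the `sample_step` helper, the `rng` argument and the cap of two transformers are all assumptions for this example; GAMA's real configuration and pipeline representation differ.

```python
import random

# Hypothetical search space: component name -> hyperparameter -> allowed values.
TRANSFORMERS = {
    "StandardScaler": {},
    "MinMaxScaler": {},
    "PCA": {"n_components": [2, 4, 8]},
}
ESTIMATORS = {
    "BernoulliNB": {"alpha": [0.1, 1.0, 10.0], "fit_prior": [True, False]},
    "DecisionTreeClassifier": {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 5]},
}


def sample_step(name, space, rng):
    """Pick one value for every hyperparameter of the named component."""
    return name, {param: rng.choice(values) for param, values in space[name].items()}


def new(rng, max_transformers=2):
    """Sample zero or more transformers, followed by exactly one estimator."""
    steps = [
        sample_step(rng.choice(sorted(TRANSFORMERS)), TRANSFORMERS, rng)
        for _ in range(rng.randint(0, max_transformers))
    ]
    steps.append(sample_step(rng.choice(sorted(ESTIMATORS)), ESTIMATORS, rng))
    return steps


rng = random.Random(42)
pipeline = new(rng)
print(" > ".join(name for name, _ in pipeline))
```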

We define a `mutate` function to make changes to a single pipeline:
```python
def mutate(pipeline) -> Pipeline:
    """Create a variation of the given pipeline.

    - Add or remove a step
    - Change the hyperparameter configuration of a step
    - Replace a step (e.g. change the estimator)
    """
```
And a `crossover` function to combine two pipelines into a new one:
```python
def crossover(pipeline, other_pipeline) -> Pipeline:
    """Create a new pipeline by combining two others.

    - Swap (parts of) the preprocessing pipeline
    - Swap (some of) the hyperparameters of a shared step
    """
```
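
The two operators can be sketched on a deliberately simple representation: a pipeline is a list of step names whose last entry is the estimator. This is an assumption made for the example, not how GAMA stores pipelines. Hyperparameter changes are left out to keep it short, and the extra `rng` argument is only there to make the randomness explicit.

```python
import random

# Hypothetical component lists for this example.
TRANSFORMERS = ["StandardScaler", "MinMaxScaler", "PCA"]
ESTIMATORS = ["BernoulliNB", "DecisionTreeClassifier", "ExtraTreesClassifier"]


def mutate(pipeline, rng):
    """Return a copy of `pipeline` with one random structural change."""
    *preprocessing, estimator = pipeline
    # A step can only be removed if there is a preprocessing step to remove.
    options = ["add", "replace"] + (["remove"] if preprocessing else [])
    choice = rng.choice(options)
    if choice == "add":
        position = rng.randint(0, len(preprocessing))
        preprocessing.insert(position, rng.choice(TRANSFORMERS))
    elif choice == "remove":
        preprocessing.pop(rng.randrange(len(preprocessing)))
    else:
        estimator = rng.choice([e for e in ESTIMATORS if e != estimator])
    return preprocessing + [estimator]


def crossover(pipeline, other_pipeline):
    """Keep the estimator of `pipeline`, take the preprocessing of `other_pipeline`."""
    return other_pipeline[:-1] + pipeline[-1:]


rng = random.Random(0)
ss_bnb = ["StandardScaler", "BernoulliNB"]
mm_dt = ["MinMaxScaler", "DecisionTreeClassifier"]
print(mutate(ss_bnb, rng))
print(crossover(ss_bnb, mm_dt))  # ['MinMaxScaler', 'BernoulliNB']
```

Both functions return new lists and leave their inputs untouched, so parents stay available in the population.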

We create new pipelines from the population by randomly mutating a single pipeline, or performing crossover on two pipelines:
```python
def create_from_population(pipelines):
    if ...:  # Select with some probability
        pipeline = select(pipelines, n=1)
        return mutate(pipeline)
    else:
        pl1, pl2 = select(pipelines, n=2)
        return crossover(pl1, pl2)
```

### Selecting Pipelines

The `select` function picks pipelines that are 'good' candidates for creating offspring.
Any selection procedure that takes the scores of the pipelines into account can be used.
For example, the binary tournament used in NSGA-II:
 1. Select two pipelines at random.
 2. Compare them:
     - If they are not on the same Pareto front, pick the one on the better front.
     - If they are on the same Pareto front, pick the one with the larger 'crowding distance'.
 3. Repeat steps 1 and 2 n times.

<img style="height:400px; margin-right: 30%" src="https://raw.githubusercontent.com/PGijsbers/Talks/master/odsc/images/gama/pareto.png">
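
The tournament described above can be sketched in plain Python. This is a simplified illustration, not GAMA's code: it assumes every score is to be maximized, works on score tuples instead of pipelines, and returns the indices of the winners. The helper names and the `rng` argument are chosen for this example.

```python
import random


def dominates(a, b):
    """True if `a` is at least as good as `b` on every score and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores):
    """Group indices into fronts: front 0 is non-dominated, front 1 comes next, ..."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(front, scores):
    """Larger means fewer close neighbours on the front; boundary points get inf."""
    distance = {i: 0.0 for i in front}
    for m in range(len(scores[front[0]])):
        ordered = sorted(front, key=lambda i: scores[i][m])
        low, high = scores[ordered[0]][m], scores[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if high == low:
            continue
        for before, current, after in zip(ordered, ordered[1:], ordered[2:]):
            distance[current] += (scores[after][m] - scores[before][m]) / (high - low)
    return distance


def select(scores, n, rng):
    """Run n binary tournaments and return the indices of the winners."""
    rank, crowding = {}, {}
    for r, front in enumerate(pareto_fronts(scores)):
        distances = crowding_distance(front, scores)
        for i in front:
            rank[i], crowding[i] = r, distances[i]
    winners = []
    for _ in range(n):
        a, b = rng.sample(range(len(scores)), 2)
        # Better (lower) front wins; ties go to the larger crowding distance.
        winners.append(min(a, b, key=lambda i: (rank[i], -crowding[i])))
    return winners


# Made-up (accuracy, -number of steps) pairs for five pipelines.
scores = [(0.95, -1), (0.69, -2), (0.96, -3), (0.11, -1), (0.50, -2)]
print(pareto_fronts(scores))  # [[0, 2], [1, 3], [4]]
print(select(scores, n=2, rng=random.Random(0)))
```

In this example the pipeline at index 4 sits alone on the last front, so it loses every tournament it takes part in.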

## Configuring the AutoML Pipeline

While we want you to specify 'resource' and 'preference' hyperparameters, you can also specify hyperparameters which influence the AutoML process itself. In GAMA you can pick which optimization algorithm to use, and whether to create an ensemble of pipelines after search. Each of these steps also has its own hyperparameters, which may be specified! But don't worry - we do our best to find good defaults so we can keep AutoML automatic. ;)

In [7]:
from gama.search_methods import AsynchronousSuccessiveHalving
from gama.postprocessing import EnsemblePostProcessing 
automl = GamaClassifier(
    search_method=AsynchronousSuccessiveHalving(),
    post_processing_method=EnsemblePostProcessing()
)

This concludes our look at `gama`. 
If you're interested in learning more about the package, visit the [documentation](https://pgijsbers.github.io/gama/).
If you need help, find bugs, have feature requests or want to get in touch, visit our GitHub repository at [PGijsbers/gama](https://github.com/PGijsbers/gama). User feedback is always appreciated!

Looking ahead, we will soon release a first version of our Command Line Interface and Graphical User Interface!

----

## Extra on Mutation and Crossover
The code below illustrates the creation of pipelines in GAMA, in particular the mutation and crossover of pipelines.
First we create two pipelines to modify. You can ignore most of the code; what's important is that we create two pipelines here.
One (`ss_bnb`) transforms the data with **S**tandard**S**caler and creates a model with **B**ernoulli**NB**, the second (`mm_dt`) transforms the data with a **M**in**M**axScaler and creates a model with a **D**ecision**T**ree:

In [8]:
from gama.genetic_programming.components import Individual

ss_bnb = Individual.from_string('BernoulliNB(StandardScaler(data), alpha=1.0, fit_prior=True)', automl._pset)
print(ss_bnb.pipeline_str())

mm_dt = Individual.from_string(
    "DecisionTreeClassifier("
        "MinMaxScaler(data), "
        "DecisionTreeClassifier.criterion='gini', "  # Some hyperparameters here are prefixed,
        "DecisionTreeClassifier.max_depth=3, "       # you can ignore this in the tutorial.
        "min_samples_split=10, "
        "min_samples_leaf=5)",
    automl._pset
)
print(mm_dt.pipeline_str())

BernoulliNB(StandardScaler(data), alpha=1.0, fit_prior=True)
DecisionTreeClassifier(MinMaxScaler(data), DecisionTreeClassifier.criterion='gini', DecisionTreeClassifier.max_depth=3, min_samples_leaf=5, min_samples_split=10)


In the figure below you see the different mutations that are in use by GAMA.
A machine learning pipeline is represented by a set of shapes (circles or squares) connected by arrows.
Each color represents a different algorithm. Two different shapes with the same color represent the same algorithm with a different hyperparameter configuration.

On the left you see the pipelines before mutation, and on the right you see the pipeline after the mutation.

<img style="height:400px; margin-right: 15%" src="https://raw.githubusercontent.com/PGijsbers/Talks/master/odsc/images/gama/mutations_txt.png">

The four different mutations are:
 - Removing a step (first line).
 - Replacing a step (second line).
 - Changing the hyperparameter configuration of a step (third line).
 - Adding a step (fourth line).
 
Run the code block below to get some mutated versions of our `ss_bnb` pipeline!
You might need to try a few times, but you should find that all four mutations can be applied to this pipeline.

In [9]:
mutated = automl._operator_set.mutate(ss_bnb)
print(f'Before: {ss_bnb.pipeline_str()}\nAfter: {mutated.pipeline_str()}')

Before: BernoulliNB(StandardScaler(data), alpha=1.0, fit_prior=True)
After: BernoulliNB(data, alpha=1.0, fit_prior=True)


In the next figure, we visualize crossover in the same way. There are two ways crossover is applied in GAMA:
 - Swapping (part of) the preprocessing pipelines (first line).
 - Changing (a subset of) hyperparameter values when the pipelines share a step.


<img style="height:400px; margin-right: 15%" src="https://raw.githubusercontent.com/PGijsbers/Talks/master/odsc/images/gama/crossover_txt.png">

The code below performs crossover between the `ss_bnb` and `mm_dt` pipelines. As they have no steps in common, they can't swap hyperparameter configurations, so the only crossover method employed is swapping preprocessing pipelines:

In [10]:
mated = automl._operator_set.mate(ss_bnb, mm_dt)
print(f'Parents:\n {ss_bnb.short_name}\n {mm_dt.short_name}\nCreated:')
print(f' {mated.short_name}')

Parents:
 StandardScaler>BernoulliNB
 MinMaxScaler>DecisionTreeClassifier
Created:
 MinMaxScaler>BernoulliNB
