# optimizer-test

## Model search on MNIST

[MNIST](http://yann.lecun.com/exdb/mnist/) is a dataset of handwritten digits, with 60,000 images for training and 10,000 for testing. It is often called the "Hello World" of computer vision and is a standard benchmark for machine learning, so we chose it as the first testing ground for our study.

## Testing the optimization tools

In this notebook, we are going to test implementations of the four optimization methods studied:

* **Random search:** `hyperopt`.
* **Tree of Parzen Estimators (TPE):** `hyperopt`.
* **Gaussian Process (GP) SMBO:** `BayesianOptimization`.
* **Sequential Model-based Algorithm Configuration (SMAC):** `pysmac`.


## Libraries & code

We use pretty much all of the standard machine learning tools out there. The following are some of the most important:

* `pandas`, `scikit-learn`, `XGBoost`, `H2O`, `lasagne`
* `Theano`, `scikit-neuralnetwork`, `Auto-sklearn`
* `hyperopt`, `numpy`, `scipy`, `seaborn`, `matplotlib`
* `BayesianOptimization`, `pysmac`

Let us start by importing all libraries and code. We use a base script to write all important functions, including the proposed pipeline.

In [None]:
# starting up a console attached to this kernel
%matplotlib inline
%qtconsole
import os

# importing base code
os.chdir('C:\\Users\\Guilherme\\Documents\\TCC\\tsne-optim\\code')
from base import *

# changing to competition dir
os.chdir('C:\\Users\\Guilherme\\Documents\\TCC\\tsne-optim')

## Target Function

For this simple demonstration, let's use the function defined in one of the `BayesianOptimization` [examples](https://github.com/fmfn/BayesianOptimization/blob/master/examples/visualization.ipynb):

$$f(x) = e^{-(x - 2)^2} + e^{-\frac{(x - 6)^2}{10}} + \frac{1}{x^2 + 1} $$ 

Let us write it in Python.

In [None]:
def target(x):
    return np.exp(-(x - 2)**2) + np.exp(-(x - 6)**2/10) + 1/ (x**2 + 1)

In [None]:
x = np.linspace(-2, 10, 1000)
y = target(x)
plt.figure(figsize=[16,6])
plt.plot(x, y)
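
Before comparing optimizers, it helps to know roughly where the maximum of the target lies. The following is a minimal sketch, not part of the original pipeline, that evaluates the formula above on an evenly spaced grid over the same interval, using only the standard library. The helper name `grid_maximum` is an assumption made for illustration.

```python
import math

def target(x):
    """Scalar version of the target function defined by the formula above."""
    return math.exp(-(x - 2) ** 2) + math.exp(-(x - 6) ** 2 / 10) + 1 / (x ** 2 + 1)

def grid_maximum(f, low, high, n_points):
    """Evaluate f on an evenly spaced grid and return (best_x, best_value)."""
    step = (high - low) / (n_points - 1)
    grid = [low + i * step for i in range(n_points)]
    best_x = max(grid, key=f)
    return best_x, f(best_x)

# same interval and resolution as the plot: 1000 points between -2 and 10
x_best, y_best = grid_maximum(target, -2.0, 10.0, 1000)
print(f"grid maximum near x = {x_best:.3f}, f(x) = {y_best:.4f}")
```

The grid answer gives a reference point against which the samples of each optimizer can be judged.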

# Optimization algorithms

Let us define the optimization objectives and frameworks for each of the optimization methods.

## `hyperopt`: Random Search and TPE

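Let us start with `hyperopt`, which implements **Random Search** and **Tree of Parzen Estimators**. Since `hyperopt` minimizes a loss and passes each sampled point as a dictionary, we wrap the target function so that it reads `x` from the dictionary and returns the negated target as the loss.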

In [None]:
def target_hyperopt(space):
    return {'loss': -target(space['x']),
            'x': space['x'],
            'status': STATUS_OK
            }

Now we have to define the search space, which is just the interval between -2 and 10.

In [None]:
space = {'x': hp.uniform('x',-2,10)}

Finally, the optimization function. Let us do **Random Search** first.

In [None]:
# object that is going to carry trial information
trials = Trials()

# parameters of optim function
evals = 100
algo = rand.suggest 

# minimization function
fmin(target_hyperopt, space, algo=algo, trials=trials, max_evals=evals)
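
For intuition, the idea behind Random Search on this one-dimensional space can be sketched without `hyperopt`: draw independent uniform samples from the interval and keep the best one. This is a simplified illustration, not the code of `rand.suggest`; the function `random_search` and its `seed` argument are assumptions made for the example.

```python
import math
import random

def target(x):
    """Scalar version of the target function."""
    return math.exp(-(x - 2) ** 2) + math.exp(-(x - 6) ** 2 / 10) + 1 / (x ** 2 + 1)

def random_search(f, low, high, n_evals, seed=0):
    """Maximize f by drawing n_evals independent uniform samples in [low, high]."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_evals):
        x = rng.uniform(low, high)
        history.append((x, f(x)))
    # the best trial is the one with the largest objective value
    best_x, best_y = max(history, key=lambda pair: pair[1])
    return best_x, best_y, history

best_x, best_y, history = random_search(target, -2.0, 10.0, n_evals=100)
print(f"best x = {best_x:.3f}, best value = {best_y:.4f}")
```

Because every sample is drawn independently, earlier results never influence later candidates, which is the behavior the model-based methods below try to improve on.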

Let us plot the results.

In [None]:
# extracting trials
x_trials = [e['result']['x'] for e in trials.trials]
loss_trials = [-e['result']['loss'] for e in trials.trials]

In [None]:
# plotting the samples 
plt.figure(figsize=[16,6])
plt.plot(x, y)
plt.plot(x_trials, loss_trials, 'ro')
plt.title('Objective Samples [Random Search]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(loss_trials)
plt.title('Loss over rounds [Random Search]', fontsize=14)
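
The per-round values are noisy, so a running best is often easier to compare across methods. The sketch below uses a hypothetical helper, `running_best`, that returns the largest value seen up to each round; the example values are made up.

```python
from itertools import accumulate

def running_best(values):
    """Return the best (largest) value observed up to each round."""
    return list(accumulate(values, max))

# hypothetical objective values, in evaluation order
example_values = [0.4, 1.1, 0.7, 1.3, 0.9]
print(running_best(example_values))
```

In this notebook, `loss_trials` already holds the negated loss (that is, the target value), so `plt.plot(running_best(loss_trials))` would draw the curve.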

In [None]:
plt.figure(figsize=[16,6])
plt.plot(x_trials, 'bo')
plt.title('Candidate solutions [Random Search]',fontsize=14)

Let us try **TPE** now.

In [None]:
# object that is going to carry trial information
trials = Trials()

# parameters of optim function
evals = 100
algo = tpe.suggest 

# minimization function
fmin(target_hyperopt, space, algo=algo, trials=trials, max_evals=evals)

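In [None]:
# extracting the TPE trials (otherwise the plots below reuse the Random Search samples)
x_trials = [e['result']['x'] for e in trials.trials]
loss_trials = [-e['result']['loss'] for e in trials.trials]

# plotting the samples 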
plt.figure(figsize=[16,6])
plt.plot(x, y)
plt.plot(x_trials, loss_trials, 'ro')
plt.title('Objective Samples [TPE]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(loss_trials)
plt.title('Loss over rounds [TPE]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(x_trials, 'bo')
plt.title('Candidate solutions [TPE]',fontsize=14)

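**TPE** explores the unpromising areas of the function less than **Random Search**, and it seems to change its behavior in steps of about 20 samples, becoming more focused as time passes. The first step is consistent with how `hyperopt` works: by default its TPE draws the first 20 points (`n_startup_jobs`) at random before using the model. Let us move to **SMAC**!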
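
One way to check this impression numerically is to split the trial sequence into blocks and measure how many candidates in each block fall close to the global maximum. The sketch below is an illustration only: the helpers `fraction_near` and `fraction_by_block`, the radius, and the example sequence are all assumptions.

```python
def fraction_near(xs, center, radius):
    """Fraction of candidates within `radius` of `center`."""
    if not xs:
        return 0.0
    hits = sum(1 for x in xs if abs(x - center) <= radius)
    return hits / len(xs)

def fraction_by_block(xs, center, radius, block_size=20):
    """Apply fraction_near to consecutive blocks of the trial sequence."""
    return [
        fraction_near(xs[start:start + block_size], center, radius)
        for start in range(0, len(xs), block_size)
    ]

# hypothetical trial sequence: scattered at first, then concentrated near x = 2
example_x = [-1.5, 9.0, 4.2, 2.1, 7.7, 6.3, 1.9, 2.2, 2.0, 1.8, 2.4, 2.1]
print(fraction_by_block(example_x, center=2.0, radius=0.5, block_size=6))
```

Applied to the notebook's `x_trials` with the default `block_size=20`, an increasing sequence of fractions would support the claim that the search becomes more focused over time.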

## `pysmac`: SMAC

`pysmac` is a Python wrapper for the Sequential Model-based Algorithm Configuration [library](http://www.cs.ubc.ca/labs/beta/Projects/SMAC/). It works roughly the same way as the previous algorithms. We need to define search bounds and an initial point for each parameter (in our case, only `x`).

In [None]:
# module
import pysmac

# initial point and search space
init_x = random.uniform(-2,10)
space = {'x': ('real', [-2, 10], init_x)}

Next we define the optimization function and run.

In [None]:
# optim object
opt = pysmac.SMAC_optimizer(working_directory='output/smac')

# target for SMAC
def target_smac(x):
    return(-target(x))

# minimizing
value, space = opt.minimize(target_smac, 100, space)

Let us visualize! Getting the trials from SMAC is a bit more complicated, as they are saved in output files.

In [None]:
# reading SMAC trials file
smac_trials = pd.read_csv('output/smac/out/scenario/state-run0/runs_and_results-it36.csv')

# reading parameter trials
from pysmac.utils.smac_output_readers import read_paramstrings_file
param_trials = read_paramstrings_file('output/smac/out/scenario/state-run0/paramstrings-it36.txt')
x_trials = [float(e['x']) for e in param_trials]
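
The iteration number in these file names (`it36` here) can differ from run to run. As a hedged sketch, the helper below picks the file with the highest iteration number in a state directory instead of hard-coding it. The function `latest_iteration_file` is hypothetical and assumes the `<prefix>-it<N><extension>` naming seen in the paths above.

```python
import glob
import os
import re

def latest_iteration_file(state_dir, prefix, extension):
    """Return the path named '<prefix>-it<N><extension>' with the largest N."""
    pattern = re.compile(re.escape(prefix) + r"-it(\d+)" + re.escape(extension))
    candidates = []
    for path in glob.glob(os.path.join(state_dir, prefix + "-it*" + extension)):
        match = pattern.fullmatch(os.path.basename(path))
        if match:
            # keep the numeric iteration so that it10 sorts after it2
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(
            "no '%s-it*%s' file found in %s" % (prefix, extension, state_dir)
        )
    return max(candidates)[1]
```

For example, `latest_iteration_file('output/smac/out/scenario/state-run0', 'runs_and_results', '.csv')` would return the path to pass to `pd.read_csv`.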

In [None]:
# plotting the samples 
plt.figure(figsize=[16,6])
plt.plot(x, y)
plt.plot(x_trials, -smac_trials['Response Value (y)'], 'ro')
plt.title('Objective Samples [SMAC]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(-smac_trials['Response Value (y)'])
plt.title('Loss over rounds [SMAC]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(x_trials, 'bo')
plt.title('Candidate solutions [SMAC]',fontsize=14)

**SMAC** concentrates its exploration around the local and global maxima, and it was very effective here. Now, to the last method, **Gaussian Processes**!

## `BayesianOptimization`: Gaussian Processes

With `BayesianOptimization` we can use Gaussian Processes to optimize functions. The first thing we need to do is define search bounds for the algorithm.

In [None]:
# importing
from bayes_opt import BayesianOptimization

# defining bounds
bounds = {'x': (-2, 10)}

Then, we create the optimization task.

In [None]:
bo = BayesianOptimization(target, bounds, verbose=0)

And finally define the optimization parameters and execute.

In [None]:
bo.maximize(init_points=2, n_iter=30, acq='ucb', kappa=5)
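
The `ucb` acquisition scores each candidate by its upper confidence bound, the posterior mean plus `kappa` times the posterior standard deviation, so a larger `kappa` favors uncertain regions. The sketch below illustrates only this scoring rule; it does not call `BayesianOptimization`, and the candidate points, means, and standard deviations are made-up numbers.

```python
def ucb(mean, std, kappa):
    """Upper confidence bound: posterior mean plus kappa standard deviations."""
    return mean + kappa * std

def next_candidate(candidates, means, stds, kappa):
    """Pick the candidate with the largest UCB score."""
    scores = [ucb(m, s, kappa) for m, s in zip(means, stds)]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]

# hypothetical posterior summaries at three candidate points
candidates = [2.0, 6.0, 9.0]
means = [1.30, 1.00, 0.40]
stds = [0.01, 0.05, 0.30]

print(next_candidate(candidates, means, stds, kappa=0))  # exploitation only
print(next_candidate(candidates, means, stds, kappa=5))  # exploration-heavy
```

With `kappa=0` the point with the highest mean wins, while `kappa=5` selects the most uncertain point, which is why a value as high as 5 makes the optimizer explore widely.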

Let's visualize!

In [None]:
# plotting the samples 
plt.figure(figsize=[16,6])
plt.plot(x, y)
plt.plot(bo.X, bo.Y, 'ro')
plt.title('Objective Samples [Gaussian Process]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(bo.Y)
plt.title('Loss over rounds [Gaussian Process]', fontsize=14)

In [None]:
plt.figure(figsize=[16,6])
plt.plot(bo.X, 'bo')
plt.title('Candidate solutions [Gaussian Process]',fontsize=14)

Remarkable! **GPs** estimate the target function very well. Let us see the function it came up with.