Merge pull request #35 from automl/development

Development

aaronkl committed Jul 11, 2016
2 parents 15e1a7b + 8471d02 commit 228bfab
Showing 75 changed files with 5,007 additions and 418 deletions.
1 change: 0 additions & 1 deletion .travis.yml
@@ -1,6 +1,5 @@
language: python
python:
- "2.6"
- "2.7"
- "3.3"
- "3.4"
4 changes: 4 additions & 0 deletions README.md
@@ -1,6 +1,10 @@
RoBO - a Robust Bayesian Optimization framework.
================================================

[![Build Status](https://travis-ci.org/automl/RoBO.svg?branch=development)](https://travis-ci.org/automl/RoBO)
[![Coverage Status](https://coveralls.io/repos/github/automl/RoBO/badge.svg?branch=development)](https://coveralls.io/github/automl/RoBO?branch=development)
[![Code Health](https://landscape.io/github/automl/RoBO/development/landscape.svg?style=flat)](https://landscape.io/github/automl/RoBO/development)

Documentation
-------------
http://robo-fork.readthedocs.org/en/latest/
40 changes: 39 additions & 1 deletion docs/advanced.rst
@@ -65,4 +65,42 @@ It uses the HMC method implemented in GPy to sample the marginal log-likelihood.
bo.run(10)
RoBO will then compute a marginalised acquisition value by computing the acquisition value for each single GP and summing over all of them.
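
Schematically, with :math:`M` hyperparameter samples :math:`\theta_1, \dots, \theta_M`, this amounts to (a sketch of the description above, not necessarily the exact implementation):

.. math::

    a_{marg}(\bm{x}) = \sum_{i=1}^{M} a(\bm{x} \mid \theta_i)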


Fabolas
-------

The general idea of Fabolas is to extend the traditional objective function :math:`f(\bm{x})` by an additional input :math:`s` that specifies the amount of training data used to evaluate a point :math:`\bm{x}`, i.e. the model is fit to :math:`f(\bm{x}, s)`.


In the end we want to find the best point :math:`\bm{x}_{\star}` on the full dataset, i.e. for :math:`s=s_{max}`. Because of that, Fabolas uses the information gain acquisition function but models the distribution over the minimum :math:`p_{min}(\bm{x}, s)` only on the subspace :math:`s=s_{max}`, i.e. :math:`p_{min}(\bm{x} \mid s=s_{max})`.

By additionally modeling the evaluation time of a point :math:`\bm{x}` and dividing the information gain by the cost :math:`c(\bm{x}, s)` it would take to evaluate :math:`\bm{x}` on :math:`s`, Fabolas evaluates points only on small subsets of the data and extrapolates their error to the full dataset size. For more details, have a look at the paper: http://arxiv.org/abs/1605.07079
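
Schematically, the next point is then chosen by maximising information gain per unit cost (a sketch distilled from the description above, not the exact formulation in the paper):

.. math::

    \bm{x}_{n+1}, s_{n+1} = \underset{\bm{x}, s}{\mathrm{argmax}} \; \frac{a_{IG}(\bm{x}, s)}{c(\bm{x}, s)}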


Fabolas has the same interface as RoBO's fmin function (see :ref:`fmin`). First you have to define your objective function, which now should depend on :math:`\bm{x}` and :math:`s`:

.. code-block:: python

    def objective_function(x, s):
        # Train your algorithm here with x on the dataset subset with length s
        # Estimate the validation error and the cost on the validation data set
        return np.array([[validation_error]]), np.array([[cost]])

Your objective function should return the validation error and the total cost :math:`c(\bm{x}, s)` of the point :math:`\bm{x}`. Normally the cost is the time it took to train and validate :math:`\bm{x}`.
After defining your objective function you also have to define the input bounds for :math:`\bm{x}` and :math:`s`. Make sure that the dataset size :math:`s` is the last dimension.
It is often a good idea to put the dataset size on a log scale.

.. code-block:: python

    X_lower = np.array([-10, -10, s_min])
    X_upper = np.array([10, 10, s_max])

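
For example, if you train on at most ``n_train`` data points and want subsets of at least 100 points, such log-scale bounds could look like the following sketch (``n_train`` is a placeholder name here):

.. code-block:: python

    import numpy as np

    s_min = np.log(100)       # smallest subset: 100 training points
    s_max = np.log(n_train)   # largest subset: the full training set
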
Then you can call Fabolas by:

.. code-block:: python

    x_best = fabolas_fmin(objective_function, X_lower, X_upper, num_iterations=100)

You can find a full example for training a support vector machine on MNIST `here <https://github.com/automl/RoBO/blob/development/examples/example_fmin_fabolas.py>`_.
20 changes: 17 additions & 3 deletions docs/basics.rst
@@ -2,8 +2,11 @@
Basic Usage
===========


.. _fmin:

RoBO in a few lines of code
-------------------------
---------------------------

RoBO offers a simple interface such that you can use it as an optimizer for black-box functions without knowing what's going on inside. In order to do that you first have to
define the objective function and the bounds of the configuration space:
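
A minimal sketch of that interface, based on ``examples/example_fmin.py`` from this repository (the upper bound and the number of iterations below are placeholders, and the exact return value of ``fmin`` may differ between versions):

.. code-block:: python

    import numpy as np

    from robo.fmin import fmin

    # Objective to minimise; x is a numpy array of shape (N, D)
    def objective_function(x):
        return np.sin(3 * x) * 4 * (x - 1) * (x + 2)

    # Bounds of the one-dimensional configuration space
    X_lower = np.array([0])
    X_upper = np.array([6])

    x_best = fmin(objective_function, X_lower, X_upper, num_iterations=20)
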
@@ -159,7 +162,7 @@ Saving output
^^^^^^^^^^^^^

You can save RoBO's output by passing the parameters 'save_dir' and 'num_save'. The first parameter 'save_dir' specifies where the results will be saved and
the second parameter 'num_save' specifies after how many iterations the output should be saved. RoBO will save the output both in CSV and JSON format.

.. code-block:: python
@@ -170,7 +173,7 @@ the second parameter 'num_save' after how many iterations the output should be s
save_dir="path_to_directory",
num_save=1)
RoBO will then save the following information in the CSV file:

- X: The configurations it evaluated so far
- y: Their corresponding function values
@@ -179,6 +182,17 @@ RoBO will then save the following information:
- time_function: The time each function evaluation took
- optimizer_overhead: The time RoBO needed to pick a new configuration

The following information will be saved in the JSON file, in the format shown below.

.. code-block:: javascript

    {
        "Acquisiton": {"type": ...},
        "Model": {"Y": ..., "X": ..., "hyperparameters": ...},
        "Task": {"opt": ..., "fopt": ..., "original_X_lower": ..., "original_X_upper": ...},
        "Solver": {"optimization_overhead": ..., "incumbent_fval": ..., "iteration": ...,
                   "time_func_eval": ..., "incumbent": ..., "runtime": ...}
    }

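
As a quick sanity check, you can load such a dump back with the standard library. This is just a sketch; the file name ``robo_iter_0.json`` is a placeholder and the actual naming depends on your RoBO version:

.. code-block:: python

    import json

    # Load one of the JSON dumps written to save_dir
    with open("path_to_directory/robo_iter_0.json") as fh:
        results = json.load(fh)

    # Inspect the current incumbent and its function value
    print(results["Solver"]["incumbent"], results["Solver"]["incumbent_fval"])
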
Implementing the Bayesian optimization loop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1 change: 0 additions & 1 deletion examples/example_branin.py
@@ -1,6 +1,5 @@
'''
Created on Jun 23, 2015
@author: Aaron Klein
'''

7 changes: 2 additions & 5 deletions examples/example_fmin.py
@@ -1,8 +1,4 @@
'''
Created on Jul 3, 2015

@author: Aaron Klein
'''
import numpy as np

from robo.fmin import fmin
@@ -12,7 +8,8 @@
# It gets a numpy array with shape (N, D) where N >= 1 is the number of
# datapoints and D is the number of features
def objective_function(x):
return np.sin(3 * x) * 4 * (x - 1) * (x + 2)
y = np.sin(3 * x) * 4 * (x - 1) * (x + 2)
return y

# Defining the bounds and dimensions of the input space
X_lower = np.array([0])
125 changes: 125 additions & 0 deletions examples/example_fmin_fabolas.py
@@ -0,0 +1,125 @@
import os
import sys
import time
import numpy as np

from sklearn import svm

from robo.fmin import fabolas_fmin


# Example script to optimize the C and gamma parameters of a
# support vector machine on MNIST with Fabolas.
# Have a look into the paper "Fast Bayesian Optimization of Machine Learning
# Hyperparameters on Large Datasets" (http://arxiv.org/abs/1605.07079)
# to see how it works. Note: in order to run this example you need
# scikit-learn, which you can install via: pip install scikit-learn


def load_dataset():
    # This function loads the MNIST data; it is copied from the Lasagne tutorial.
    # We first define a download function, supporting both Python 2 and 3.
    if sys.version_info[0] == 2:
        from urllib import urlretrieve
    else:
        from urllib.request import urlretrieve

    def download(filename, source='http://yann.lecun.com/exdb/mnist/'):
        print("Downloading %s" % filename)
        urlretrieve(source + filename, filename)

    # We then define functions for loading MNIST images and labels.
    # For convenience, they also download the requested files if needed.
    import gzip

    def load_mnist_images(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the inputs in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: (examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    # We can now download and read the training and test set images and labels.
    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')

    # We reserve the last 10000 training examples for validation.
    X_train, X_val = X_train[:-10000], X_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]

    X_train = X_train.reshape(X_train.shape[0], 28 * 28)
    X_val = X_val.reshape(X_val.shape[0], 28 * 28)
    X_test = X_test.reshape(X_test.shape[0], 28 * 28)

    # We just return all the arrays in order, as expected in main().
    # (It doesn't matter how we do this as long as we can read them again.)
    return X_train, y_train, X_val, y_val, X_test, y_test


# The objective function that we want to optimize.
# It gets a numpy array x with shape (1, D) where D is the number of
# parameters, and a scalar s which is the log of the training set size
# that is used to evaluate this configuration
def objective_function(x, s):

    # Start the clock to determine the cost of this function evaluation
    start_time = time.time()

    # Shuffle the data and split up the requested subset of the training data
    size = int(np.exp(s))
    s_max = y_train.shape[0]
    shuffle = np.random.permutation(np.arange(s_max))
    train_subset = X_train[shuffle[:size]]
    train_targets_subset = y_train[shuffle[:size]]

    # Train the SVM on the training subset
    C = np.exp(float(x[0, 0]))
    gamma = np.exp(float(x[0, 1]))
    clf = svm.SVC(gamma=gamma, C=C)
    clf.fit(train_subset, train_targets_subset)

    # Validate this hyperparameter configuration on the full validation data
    y = 1 - clf.score(X_val, y_val)

    c = time.time() - start_time

    return np.array([[np.log(y)]]), np.array([[c]])

# Load the data
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()


# We optimize s on a log scale, as we expect that the performance varies
# logarithmically across s
s_min = np.log(100)
s_max = np.log(X_train.shape[0])

# Defining the bounds and dimensions of the
# input space (configuration space + environment space)
# We also optimize the hyperparameters of the svm on a log scale
X_lower = np.array([-10, -10, s_min])
X_upper = np.array([10, 10, s_max])

# Start Fabolas to optimize the objective function
x_best = fabolas_fmin(objective_function, X_lower, X_upper, num_iterations=100)

print(x_best)
print(objective_function(x_best[:, :-1], s=x_best[:, None, -1]))
33 changes: 33 additions & 0 deletions examples/example_json_dump.py
@@ -0,0 +1,33 @@
'''
Created on June 5th, 2016
@author: Numair Mansur (numair.mansur@gmail.com)
'''

import george

from robo.maximizers.direct import Direct
from robo.models.gaussian_process import GaussianProcess
from robo.task.synthetic_functions.levy import Levy
from robo.acquisition.ei import EI
from robo.solver.bayesian_optimization import BayesianOptimization


task = Levy()
kernel = george.kernels.Matern52Kernel([1.0], ndim=1)


model = GaussianProcess(kernel)

ei = EI(model, task.X_lower, task.X_upper)

maximizer = Direct(ei, task.X_lower, task.X_upper)

bo = BayesianOptimization(acquisition_func=ei,
                          model=model,
                          maximize_func=maximizer,
                          task=task,
                          save_dir='../JsonDumps/')

print(bo.run(20))
5 changes: 2 additions & 3 deletions examples/example_mcmc.py
@@ -14,10 +14,9 @@
noise = 1.0
cov_amp = 2
exp_kernel = george.kernels.Matern52Kernel([1.0, 1.0], ndim=2)
noise_kernel = george.kernels.WhiteKernel(noise, ndim=2)
kernel = cov_amp * (exp_kernel + noise_kernel)
kernel = cov_amp * exp_kernel

prior = DefaultPrior(len(kernel))
prior = DefaultPrior(len(kernel) + 1)
model = GaussianProcessMCMC(kernel, prior=prior,
chain_length=100, burnin_steps=200, n_hypers=20)

9 changes: 4 additions & 5 deletions examples/example_priors.py
@@ -64,10 +64,9 @@ def sample_from_prior(self, n_samples):
config_kernel = george.kernels.Matern52Kernel(np.ones([task.n_dims]),
ndim=task.n_dims)

noise_kernel = george.kernels.WhiteKernel(0.01, ndim=task.n_dims)
kernel = cov_amp * (config_kernel + noise_kernel)
kernel = cov_amp * config_kernel

prior = MyPrior(len(kernel))
prior = MyPrior(len(kernel) + 1)

model = GaussianProcessMCMC(kernel, prior=prior, burnin=burnin,
chain_length=chain_length, n_hypers=n_hypers)
@@ -82,8 +81,8 @@ def sample_from_prior(self, n_samples):
bo = BayesianOptimization(acquisition_func=acquisition_func,
model=model,
maximize_func=maximizer,
task=task)

task=task
)
bo.run(20)


3 changes: 1 addition & 2 deletions examples/example_rf.py
@@ -24,8 +24,7 @@
# Define the acquisition function
acquisition_func = EI(model,
X_upper=branin.X_upper,
X_lower=branin.X_lower,
par=0.1)
X_lower=branin.X_lower)

# Strategy of estimating the incumbent
rec = PosteriorMeanAndStdOptimization(model, branin.X_lower,
33 changes: 33 additions & 0 deletions examples/example_walker.py
@@ -0,0 +1,33 @@
import george
import numpy as np

from robo.models.gaussian_process_mcmc import GaussianProcessMCMC
from robo.acquisition.ei import EI
from robo.maximizers.direct import Direct
from robo.task.controlling_tasks.walker import Walker
from robo.solver.bayesian_optimization import BayesianOptimization
from robo.priors.default_priors import DefaultPrior
from robo.acquisition.integrated_acquisition import IntegratedAcquisition



task = Walker()
test = '/test'

kernel = 1 * george.kernels.Matern52Kernel(np.ones([task.n_dims]),
                                           ndim=task.n_dims)
prior = DefaultPrior(len(kernel) + 1)
model = GaussianProcessMCMC(kernel, prior=prior,
                            chain_length=100, burnin_steps=200, n_hypers=8)

ei = EI(model, task.X_lower, task.X_upper)
acquisition_func = IntegratedAcquisition(model, ei, task.X_lower, task.X_upper)

maximizer = Direct(acquisition_func, task.X_lower, task.X_upper)

bo = BayesianOptimization(acquisition_func=acquisition_func,
                          model=model,
                          maximize_func=maximizer,
                          task=task,
                          save_dir=test)

print(bo.run(2))
2 changes: 1 addition & 1 deletion requirements.txt
@@ -5,6 +5,6 @@ scipy>=0.13.3
matplotlib>=1.3.1
cma
direct
george
git+https://github.com/sfalkner/george.git
git+https://github.com/SheffieldML/GPy.git
git+https://bitbucket.org/aadfreiburg/random_forest_run
