## Introduction

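Hyperopt provides an optimization interface that takes an evaluation function and a parameter space, and computes the loss value at a point in that space. The user must also specify how each parameter in the space is distributed.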

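Hyperopt has four key ingredients: the function to minimize, the search space, the database of sampled points (the trials database, optional), and the search algorithm (optional).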

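First, define an objective function that takes a single argument and returns a loss value. For example, we might want to minimize the function q(x,y) = x^2 + y^2.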

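Next, specify the search algorithm, that is, the value passed as the algo argument of hyperopt's fmin function. The algorithms currently supported are random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest).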

To set up the parameter space, for example when optimizing the function q, write
* fmin(q,space=hp.uniform('a',0,1))

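The first argument of hp.uniform is the label; every hyperparameter in the space must have a unique label. hp.uniform itself specifies the parameter's distribution. Among the other distributions, hp.choice(label, options) returns one of the options, where options is a list or tuple. The elements of options can themselves be nested expressions, which is how conditional parameters are built.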

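* hp.pchoice(label,p_options): returns one of the options in p_options with the given probability. p_options is a list of (probability, option) pairs, so the options do not have to be equally likely during the search.
* hp.uniform(label,low,high): the parameter is uniformly distributed between low and high.
* hp.quniform(label,low,high,q): the value is round(uniform(low,high)/q)*q; suitable for parameters that take discrete values.
* hp.loguniform(label,low,high): returns exp(uniform(low,high)), so the value lies in [exp(low),exp(high)].
* hp.randint(label,upper): returns a random integer in the half-open interval [0,upper).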
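
The formulas in the list above can be emulated with the standard library alone. The sketch below is not Hyperopt's implementation; the function names simply mirror the hp.* names, and the fixed seed is an assumption made so the draws are repeatable.

```python
import math
import random

rng = random.Random(0)  # fixed seed (assumption) so the sketch is repeatable


def uniform(low, high):
    # Uniform draw between low and high.
    return rng.uniform(low, high)


def quniform(low, high, q):
    # round(uniform(low, high) / q) * q: values snap to multiples of q.
    return round(rng.uniform(low, high) / q) * q


def loguniform(low, high):
    # exp(uniform(low, high)): the result lies in [exp(low), exp(high)].
    return math.exp(rng.uniform(low, high))


def randint(upper):
    # Random integer in the half-open interval [0, upper).
    return rng.randrange(upper)


def pchoice(p_options):
    # p_options is a list of (probability, option) pairs.
    probs, options = zip(*p_options)
    return rng.choices(options, weights=probs, k=1)[0]


print(uniform(0, 1))
print(quniform(0, 10, 2))                 # one of 0, 2, 4, 6, 8, 10
print(loguniform(0, 1))                   # between 1 and e
print(randint(5))                         # one of 0, 1, 2, 3, 4
print(pchoice([(0.8, 'a'), (0.2, 'b')]))  # 'a' about 80% of the time
```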

A search space can be built from lists, tuples, and dictionaries.

In [1]:
from hyperopt import hp

In [2]:
list_space = [
    hp.uniform('a',0,1),
    hp.loguniform('b',0,1)
]

tuple_space = (hp.uniform('a',0,1),hp.loguniform('b',0,1))

dict_space = {'a':hp.uniform('a',0,1),'b':hp.loguniform('b',0,1)}

### A simple example

In [3]:
from hyperopt import hp,fmin,rand,tpe,space_eval

In [4]:
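def q(args):
    # args is the sampled point (x, y); the loss here is x^2 - 2x + y^2,
    # which is smallest at x = 1, y = 0 on this integer grid
    x,y=args
    return x**2-2*x+1*y**2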

In [5]:
space = [hp.randint('x',5),hp.randint('y',5)]

In [6]:
best = fmin(q,space,algo=rand.suggest,max_evals=10)

In [7]:
print(best)

{'x': 1, 'y': 0}


#### MNIST dataset

In [11]:
import numpy as np
import pandas as pd
from keras.datasets import mnist
import xgboost as xgb
from random import shuffle
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import cross_val_score
import pickle
import time
from hyperopt import fmin,tpe,space_eval,rand,Trials,partial,STATUS_OK

In [9]:
(x_train,y_train),(x_test,y_test) = mnist.load_data()

In [12]:
x_train = x_train.reshape(-1,784).astype('float')/255
x_test = x_test.reshape(-1,784).astype('float')/255

In [13]:
# Turn the 10-class labels into a binary task: is the digit a 5?
y_train = y_train==5
y_test = y_test==5

In [24]:
def GBM(argsDict):
    max_depth = argsDict['max_depth'] + 5
    n_estimators = argsDict['n_estimators'] * 5 + 50
    learning_rate = argsDict['learning_rate'] * 0.02 + 0.05
    subsample = argsDict['subsample'] * 0.1 + 0.7
    min_child_weight = argsDict['min_child_weight'] + 1
    print("max_depth: {}".format(max_depth))
    print("n_estimator: {}".format(n_estimators))
    print("learning_rate: {}".format(learning_rate))
    print("subsample: {}".format(subsample))
    print("min_child_weight: {}".format(min_child_weight))
    global x_train,y_train
    
    gbm = xgb.XGBClassifier(nthread=2,
                            max_depth=max_depth,
                            n_estimators=n_estimators,
                            learning_rate=learning_rate,
                            subsample=subsample,
                            min_child_weight=min_child_weight,
                            max_delta_step=10,
                            objective='binary:logistic')
    metric = cross_val_score(estimator=gbm,X=x_train[:1000],y=y_train[:1000],cv=5,scoring='roc_auc').mean()
    print(metric)
    return -metric

In [25]:
space = {"max_depth": hp.randint('max_depth',15),
         'n_estimators': hp.randint('n_estimators',15),
         'learning_rate':hp.randint('learning_rate',6),
         'subsample':hp.randint('subsample',4),
         'min_child_weight':hp.randint('min_child_weight',5)}

In [26]:
algo = partial(tpe.suggest,n_startup_jobs=1)

In [27]:
best = fmin(fn=GBM,space=space,algo=algo,max_evals=40)

max_depth: 10
n_estimator: 70
learning_rate: 0.13
subsample: 0.7
min_child_weight: 2
0.9667390781813578
max_depth: 10
n_estimator: 70
learning_rate: 0.13
subsample: 0.7
min_child_weight: 2
0.9667390781813578
max_depth: 6
n_estimator: 110
learning_rate: 0.11
subsample: 1.0
min_child_weight: 4
0.9689879004069175
max_depth: 6
n_estimator: 65
learning_rate: 0.11
subsample: 1.0
min_child_weight: 4
0.969673333354636
max_depth: 15
n_estimator: 65
learning_rate: 0.11
subsample: 0.7999999999999999
min_child_weight: 1
0.9753883043065944
max_depth: 15
n_estimator: 120
learning_rate: 0.07
subsample: 0.7999999999999999
min_child_weight: 1
0.9774039504484403
max_depth: 8
n_estimator: 120
learning_rate: 0.07
subsample: 0.8999999999999999
min_child_weight: 1
0.9786530180103895
max_depth: 8
n_estimator: 60
learning_rate: 0.05
subsample: 0.8999999999999999
min_child_weight: 5
0.9567555380873205
max_depth: 8
n_estimator: 120
learning_rate: 0.09
subsample: 0.8999999999999999
min_child_weight: 3
0.97015410

In [23]:
print(best)

{'learning_rate': 4, 'max_depth': 4, 'min_child_weight': 0, 'n_estimators': 4, 'subsample': 2}


In [28]:
print(best)

{'learning_rate': 1, 'max_depth': 3, 'min_child_weight': 0, 'n_estimators': 14, 'subsample': 2}
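
Note that best holds the raw integers drawn by hp.randint, not the hyperparameter values the model was trained with. A small sketch of the conversion, reusing the same arithmetic as the GBM function above (the helper name decode is hypothetical):

```python
def decode(best):
    """Apply the same transformations used inside GBM to the raw draws."""
    return {
        'max_depth': best['max_depth'] + 5,
        'n_estimators': best['n_estimators'] * 5 + 50,
        'learning_rate': best['learning_rate'] * 0.02 + 0.05,
        'subsample': best['subsample'] * 0.1 + 0.7,
        'min_child_weight': best['min_child_weight'] + 1,
    }


# The result printed by the last cell above.
best = {'learning_rate': 1, 'max_depth': 3, 'min_child_weight': 0,
        'n_estimators': 14, 'subsample': 2}

params = decode(best)
print(params)
```

For this result the decoded settings are max_depth 8, n_estimators 120, learning_rate about 0.07, subsample about 0.9, and min_child_weight 1, which matches the best-scoring configuration shown in the log.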
