[toc]
# Sklearn GridSearch


## Exhaustive Search: GridSearchCV

Signature

```
class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='deprecated', refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)
```

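Parameters:

- estimator : an estimator (learner) object
- param_grid : a dict mapping parameter names to lists of candidate values
- scoring : str, callable, list/tuple or dict, default=None
    - None is usually fine: if None, the estimator's `score` method is used.
- cv : the cross-validation strategy, typically the number of folds
    - None uses the default 5-fold cross-validation (the default was 3-fold before scikit-learn 0.22)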
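To make the "exhaustive" part concrete, here is a minimal sketch of how a `param_grid` dict expands into candidates. The second parameter (`fit_intercept`) and `max_iter=1000` are assumptions added for illustration and are not part of the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid

X, y = load_iris(return_X_y=True)

# Every combination of the listed values is one candidate: 4 * 2 = 8.
param_grid = {"C": [0.01, 0.1, 0.5, 1], "fit_intercept": [True, False]}
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# max_iter=1000 is an illustrative choice to avoid convergence warnings.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, scoring="accuracy", cv=5
)
search.fit(X, y)

# cv_results_["params"] holds one entry per evaluated candidate.
n_evaluated = len(search.cv_results_["params"])
print(n_candidates, n_evaluated)
```

With `cv=5`, each candidate is fitted once per fold, so the cost of the search grows with the product of the grid size and the number of folds.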

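Attributes:

- best_params_ : the parameter combination that achieved the best cross-validation score
- best_estimator_ : the best estimator object itself (note: it is already fitted, so it can be used directly without retraining; it is only available when `refit=True`)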
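The attributes listed here can be cross-checked against `cv_results_`. The following is a small sketch, assuming the iris data and an illustrative `max_iter=1000`; `best_index_`, `best_score_` and `cv_results_` are additional `GridSearchCV` attributes not covered in the list.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 0.5, 1]}, cv=5
)
search.fit(X, y)

# Mean cross-validation score of each candidate, in grid order.
mean_scores = search.cv_results_["mean_test_score"]

# best_index_ points at the winning row of cv_results_.
best_from_results = search.cv_results_["params"][search.best_index_]
print(search.best_params_, search.best_score_)

# best_estimator_ is configured with best_params_ and already fitted.
print(search.best_estimator_.C, hasattr(search.best_estimator_, "coef_"))
```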


In [1]:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

iris = load_iris()

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)

lr = LogisticRegression()
tuned_params = {'C': [0.01, 0.1, 0.5, 1, ]}

clf = GridSearchCV(lr, tuned_params, cv=5)
print(clf.fit(X_train, y_train))

print(clf.best_params_)
# Use the best estimator
clf.best_estimator_  # the best estimator (already fitted)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=100, multi_class='auto',
                                          n_jobs=None, penalty='l2',
                                          random_state=None, solver='lbfgs',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': [0.01, 0.1, 0.5, 1]}, pre_dispatch='2*n_jobs',
             refit=True, return_train_score=False, scoring=None, verbose=0)
{'C': 1}


0.9473684210526315

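After fitting, the `GridSearchCV` object delegates both scoring and prediction to the best estimator. The value 0.947 above is presumably the test-set accuracy, as returned by a call such as `clf.score(X_test, y_test)`.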

In [3]:
clf.predict(X_test)  # equivalent to clf.best_estimator_.predict(X_test)

array([1, 1, 1, 2, 2, 0, 2, 1, 1, 1, 2, 1, 0, 0, 0, 2, 2, 1, 1, 2, 2, 2,
       1, 2, 2, 2, 0, 0, 2, 0, 0, 2, 0, 2, 0, 2, 1, 1])
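The delegation to the best estimator can be verified directly. This is a minimal sketch; `random_state=0` and `max_iter=1000` are assumptions added for reproducibility and are not in the original cells.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 0.5, 1]}, cv=5
)
search.fit(X_train, y_train)

# predict() on the search object forwards to best_estimator_.
same_pred = np.array_equal(
    search.predict(X_test), search.best_estimator_.predict(X_test)
)

# With scoring=None, score() also uses the best estimator's own score method.
same_score = np.isclose(
    search.score(X_test, y_test), search.best_estimator_.score(X_test, y_test)
)
print(same_pred, same_score)
```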

## Custom Scoring Function

`GridSearchCV` has a `scoring` parameter that specifies the scoring function. Below we define a custom one.

In [11]:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
import numpy as np

iris = load_iris()

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)

lr = LogisticRegression()
tuned_params = {'C': [0.01, 0.1, 0.5, 1, ]}

def mse(y, yhat):
    return np.mean((y-yhat) ** 2)

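# A custom scoring function must be wrapped with make_scorer.
# greater_is_better=False marks mse as a loss (lower is better), so the scorer negates it.
scorer = make_scorer(mse, greater_is_better=False)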

clf = GridSearchCV(lr, tuned_params, cv=5, scoring=scorer)
clf.fit(X_train, y_train)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=100, multi_class='auto',
                                          n_jobs=None, penalty='l2',
                                          random_state=None, solver='lbfgs',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': [0.01, 0.1, 0.5, 1]}, pre_dispatch='2*n_jobs',
             refit=True, return_train_score=False,
             scoring=make_scorer(mse, greater_is_better=False), verbose=0)
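One consequence of `greater_is_better=False` is easy to miss: the scorer returns the negated loss, so the scores stored by the search are less than or equal to zero and the "best" one is the one closest to zero. The sketch below illustrates this under the same setup; `max_iter=1000` is an assumption added to avoid convergence warnings, and applying MSE to integer class labels is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def mse(y, yhat):
    """Mean squared error between true and predicted labels."""
    return np.mean((y - yhat) ** 2)


X, y = load_iris(return_X_y=True)

# greater_is_better=False makes the scorer return -mse(y, yhat).
scorer = make_scorer(mse, greater_is_better=False)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 0.5, 1]},
    cv=5,
    scoring=scorer,
)
search.fit(X, y)

# Stored scores are negated losses, so they are never positive.
mean_scores = search.cv_results_["mean_test_score"]

# Flip the sign to recover the best mean MSE across folds.
best_mse = -search.best_score_
print(mean_scores, best_mse)
```

Because the search always maximizes the score, negating the loss lets the same selection logic work for both metrics and losses.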