## The perceptron is a linear model for binary classification and the foundation of neural networks and support vector machines (SVM).

### MODEL: $f(x)=sign(w^T\cdot x+b)$, $sign(x)=\left\{\begin{array}{cc} +1, & x\geq 0\\ -1, & x<0 \end{array}\right.$

### LOSS: $loss=-\sum_{x_i\in M}y_i(w^T\cdot x_i+b)$, where $M$ is the set of misclassified samples $x_i$.

### OPTIMIZER: Stochastic Gradient Descent, $w_{k+1}=w_{k}+\eta y_i\cdot x_i, b_{k+1}=b_{k}+\eta y_i$ if $x_i \in M_k$, where $k$ counts the updates, $M_k$ is the set misclassified by $(w_k, b_k)$, and $\eta$ is the step size
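
The model, loss, and update rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the toy data, and the convention of treating $y_i(w^T\cdot x_i+b)\leq 0$ as "misclassified" are assumptions made for the example.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """Loss from the text: -sum of y_i (w^T x_i + b) over the misclassified set M."""
    margins = y * (X @ w + b)
    return -margins[margins <= 0].sum()

def train_perceptron(X, y, eta=1.0, max_epochs=1000, seed=0):
    """SGD: visit samples in shuffled order, update only on misclassified ones."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        n_updates = 0
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) <= 0:   # x_i is in M (assumed convention)
                w += eta * y[i] * X[i]
                b += eta * y[i]
                n_updates += 1
        if n_updates == 0:                   # M is empty, nothing left to fix
            break
    return w, b

# Toy linearly separable data, made up for illustration
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

w, b = train_perceptron(X, y)
pred = np.where(X @ w + b >= 0, 1, -1)       # sign(.) with sign(0) = +1
print(w, b, perceptron_loss(w, b, X, y))
```

Once every sample is classified correctly, $M$ is empty and the loss is zero, which is the stopping condition used here.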

### THEORY: 
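1. Solvability: if the data are linearly separable, SGD is guaranteed to find a solution.

proof: For convenience, absorb $b$ into $w$ so that $w^T\cdot x+b$ is written as $w^T\cdot x$. Assume $w_{0}=0$, let $w^*$ with $||w^*||=1$ be a solution, $i.e.\ y_i w^{*T}\cdot x_i \geq \gamma >0$ for all $i$, and let $||x_i||\leq R$. Let $(x_k, y_k)$ denote the misclassified sample used in the $(k+1)$-th update.

Then $w^{*T}w_{k+1}=w^{*T}w_{k}+\eta y_k w^{*T}\cdot x_k\geq w^{*T}w_{k}+\eta\gamma\geq (k+1)\eta\gamma$

Moreover, since $x_k$ is misclassified by $w_k$, we have $y_k w_k^T\cdot x_k\leq 0$, so $||w_{k+1}||^2=||w_{k}||^2+2\eta y_k w_k^T\cdot x_k+\eta^2||x_k||^2\leq||w_{k}||^2+\eta^2R^2\leq (k+1)\eta^2R^2$

By the Cauchy-Schwarz inequality and $||w^*||=1$, $||w_{k+1}||\geq w^{*T}w_{k+1}$, $i.e.\ (k+1)\eta^2R^2 \geq (k+1)^2\eta^2\gamma^2 \Longrightarrow k+1\leq\frac{R^2}{\gamma^2}$. So the number of updates is finite, and SGD reaches a separating solution (not necessarily an optimal one) after finitely many steps.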
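
The bound of $R^2/\gamma^2$ updates can be checked numerically. The sketch below assumes a synthetic dataset built around a known unit-norm separator `w_star` through the origin (so no bias term is needed, matching the proof's convention); the data, the margin threshold 0.1, and all variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([0.6, 0.8])                  # unit-norm separator, chosen arbitrarily
X = rng.uniform(-1.0, 1.0, size=(200, 2))
X = X[np.abs(X @ w_star) >= 0.1]               # keep points with margin at least 0.1
y = np.where(X @ w_star > 0, 1, -1)

R = np.linalg.norm(X, axis=1).max()            # radius: max ||x_i||
gamma = (y * (X @ w_star)).min()               # margin achieved by w_star

w = np.zeros(2)                                # w_0 = 0, as in the proof
eta = 1.0
updates = 0
changed = True
while changed:                                 # terminates because the data are separable
    changed = False
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:               # misclassified under the current w
            w += eta * y_i * x_i
            updates += 1
            changed = True

print(updates, "<=", (R / gamma) ** 2)
```

Note that with $w_0=0$ the bound does not depend on $\eta$, since $\eta$ cancels in the final inequality.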

2. The more often a sample $x_i$ triggers an update, the closer it lies to the separating hyperplane.

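TRICK: Assume $w_{0}=0$. Then $w_{k}=\sum_{i=0}^{k-1}\eta y_ix_i=\sum_{j=1}^{n}\alpha_jy_jx_j$, where in the first sum $(x_i, y_i)$ is the sample used in the $i$-th update, and in the second sum $\alpha_j$ is the step size times the number of updates triggered by the $j$-th of the $n$ training samples. Each evaluation of $sign(w^T\cdot x+b)$ on a training sample can then reuse the inner products $x_i^T\cdot x_j$, computed once and stored in advance. This speeds up training, especially when the dimension is large. This is probably also where the idea behind the SVM kernel trick comes from.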
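
A minimal sketch of this dual form, assuming the samples are visited in a fixed cyclic order and that $y_i\cdot score\leq 0$ counts as misclassified. The function name and the toy data are made up for illustration.

```python
import numpy as np

def train_dual_perceptron(X, y, eta=1.0, max_epochs=1000):
    n = len(X)
    gram = X @ X.T                  # all inner products x_i^T x_j, computed once
    alpha = np.zeros(n)             # alpha_j = eta * (number of updates on sample j)
    b = 0.0
    for _ in range(max_epochs):
        n_updates = 0
        for i in range(n):
            # w^T x_i = sum_j alpha_j y_j x_j^T x_i, read off the Gram matrix
            score = (alpha * y) @ gram[:, i] + b
            if y[i] * score <= 0:
                alpha[i] += eta
                b += eta * y[i]
                n_updates += 1
        if n_updates == 0:          # every sample is classified correctly
            break
    return alpha, b

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

alpha, b = train_dual_perceptron(X, y)
w = (alpha * y) @ X                 # recover the primal weights from alpha
print(alpha, w, b)                  # [2. 0. 5.] [1. 1.] -3.0
```

On this toy data the third sample is updated most often (5 times) and is also the closest to the resulting hyperplane $x^{(1)}+x^{(2)}-3=0$, which is consistent with point 2 above.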

## The Perceptron in sklearn
class sklearn.linear_model.Perceptron
Supports multiclass classification using the OVA (One Versus All) strategy
Parameters:
- penalty: regularization term, option: None, ‘l2’ or ‘l1’ or ‘elasticnet’,default=None
- alpha : Constant that multiplies the regularization term if regularization is used. Defaults to 0.0001
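- fit_intercept: whether to fit the intercept b. Defaults to True.
- max_iter : maximum number of passes over the training data (epochs).
- tol : stopping criterion; training stops when loss > previous_loss - tol. Defaults to 1e-3 since version 0.21 (None before that).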
- shuffle : bool, optional, default True. Whether or not the training data should be shuffled after each epoch.
- verbose: integer, verbosity level, defaults to 0
- eta0: double, step size, defaults to 1.
- n_jobs: integer, number of CPUs to use for the OVA computation. -1 means ‘all CPUs’. Defaults to 1.
- random_state: int, random seed, default None
- class_weight: dict, {class_label: weight} or “balanced” or None, optional. Weight of each class; with “balanced”, the weights are inversely proportional to the class frequencies
- warm_start : bool, optional. Whether to continue training from the result of the previous fit.
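
To make the “balanced” option concrete, the sketch below compares the formula `n_samples / (n_classes * np.bincount(y))` with sklearn's `compute_class_weight` helper. The label vector is made up for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 0, 0, 1, 1])        # made-up labels: 6 of class 0, 2 of class 1
classes = np.unique(y)

# Weights inversely proportional to class frequencies
manual = len(y) / (len(classes) * np.bincount(y))
auto = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(manual, auto)
```

The rarer class receives the larger weight, so its misclassifications contribute more to the updates.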

Attributes:
- coef_ : the weights w, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]
- intercept_ : the intercept b, shape = [1] if n_classes == 2 else [n_classes]
- n_iter_ : number of iterations actually run

Methods:
- decision_function(X): returns the raw score $w^T\cdot x+b$ (before $sign$ is applied); shape (n_samples,) for binary classification, (n_samples, n_classes) for n classes
- fit(X, y, coef_init=None, intercept_init=None, sample_weight=None): Fit linear model with Stochastic Gradient Descent.
- get_params(deep=True): Get parameters for this estimator.
- predict(X): Predict class labels for samples in X.
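- score(X, y[, sample_weight]): returns the mean accuracy
- set_params(*args, **kwargs): Set the parameters of this estimator.
- sparsify(), densify(): convert the coefficient matrix to sparse format, and back to dense format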
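
The relationship between `decision_function`, `coef_`, `intercept_`, and `predict` can be illustrated on synthetic data. This is a sketch; the dataset from `make_classification` and the `random_state` values are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = Perceptron(random_state=0).fit(X, y)

scores = clf.decision_function(X)                    # raw w^T x + b, shape (n_samples,)
manual = X @ clf.coef_.ravel() + clf.intercept_[0]   # same quantity from the attributes
labels = clf.classes_[(scores > 0).astype(int)]      # thresholding the score at 0

print(scores.shape)
print(np.allclose(scores, manual))
print((labels == clf.predict(X)).all())
```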

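P.S. Perceptron and SGDClassifier share the same underlying implementation. Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).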
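
A quick way to see this relationship is to fit both estimators on the same data and compare the learned parameters. The sketch below assumes the settings given in the sklearn documentation for the equivalence (`eta0=1`, `learning_rate="constant"`, `penalty=None`), plus a shared `random_state` so both shuffle the data identically. The synthetic dataset is made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron, SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

perc = Perceptron(random_state=0).fit(X, y)
sgd = SGDClassifier(
    loss="perceptron",
    eta0=1.0,
    learning_rate="constant",
    penalty=None,
    random_state=0,
).fit(X, y)

same = np.allclose(perc.coef_, sgd.coef_) and np.allclose(perc.intercept_, sgd.intercept_)
print(same)
```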

In [62]:
from sklearn.datasets import load_breast_cancer  # breast cancer dataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, make_scorer
data = load_breast_cancer()
train_feats, val_feats, train_labs, val_labs = train_test_split(data["data"], data["target"], test_size=0.2, random_state = 0)

In [79]:
model = Perceptron(penalty="l1", alpha=1)  # adding regularization turned out to help on the validation set
model = model.fit(train_feats, train_labs)
print("train:",model.score(train_feats, train_labs))
print("val:",model.score(val_feats, val_labs))

train: 0.9142857142857143
val: 0.9122807017543859


In [77]:
from sklearn.metrics import make_scorer, accuracy_score
score = make_scorer(accuracy_score)

model = Perceptron()
param_grid = {"penalty":["l1","l2"], "alpha":[0.01, 0.1, 1, 2]}
gsearch = GridSearchCV(model, param_grid, scoring=score)
gsearch = gsearch.fit(train_feats, train_labs)
print(gsearch.best_params_)
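
As a follow-up sketch, the refitted best model from the grid search can be scored on the held-out validation split. The block repeats the data loading so that it runs on its own; `random_state=0` on the Perceptron and the string scorer `"accuracy"` are assumptions added for reproducibility and brevity, not part of the cells above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, train_test_split

data = load_breast_cancer()
train_feats, val_feats, train_labs, val_labs = train_test_split(
    data["data"], data["target"], test_size=0.2, random_state=0
)

param_grid = {"penalty": ["l1", "l2"], "alpha": [0.01, 0.1, 1, 2]}
gsearch = GridSearchCV(Perceptron(random_state=0), param_grid, scoring="accuracy")
gsearch.fit(train_feats, train_labs)

# best_estimator_ is refit on the full training split with the best parameters
val_acc = gsearch.best_estimator_.score(val_feats, val_labs)
print(gsearch.best_params_)
print("cv:", gsearch.best_score_)
print("val:", val_acc)
```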