The ordinary linear model:
$$
y = w^Tx+b
$$
The generalized linear model (here the link function $g$ is chosen so that $y$ is mapped into $(0,1)$):
$$
y = g^{-1}(w^Tx+b)
$$
A common choice is the logistic function:
$$
y = \frac{1}{1+e^{-(w^Tx+b)}}
$$
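Notes:

$x=(x_1,\dots,x_d)$ denotes the attributes of a sample.

Here $y$ can be read as the class posterior probability $p(y=1|x)$: simply treat it as the probability that $y=1$ when the attributes are $x$.
Hence $1-y = p(y=0|x)$.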
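
As a quick illustration of the probabilistic reading above, here is a minimal sketch of the logistic function in NumPy. The parameter values `w`, `b` and the sample `x` are made up for the example and do not come from the melon data.

```python
import numpy as np

def logistic(z):
    """Logistic function 1 / (1 + e^{-z}); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and sample, chosen only for illustration.
w = np.array([1.0, -2.0])
b = 0.5
x = np.array([0.3, 0.1])

y = logistic(w @ x + b)   # read as p(y=1 | x)
print(y, 1.0 - y)         # the two class probabilities; they sum to 1
```

Since `logistic(0)` is exactly 0.5, the line $w^Tx+b=0$ is where the model is undecided between the two classes.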

Next, we determine the parameters $w, b$.

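For convenience, let
$$
\boldsymbol{\beta} = (w;b),\quad \hat{\boldsymbol{x}} = (x;1)
$$
so that $w^Tx+b = \boldsymbol{\beta}^T\hat{\boldsymbol{x}}$.

Stating the result directly: we need to minimize the following expression, which is the negative log-likelihood of the $m$ training samples,
$$
\ell(\boldsymbol{\beta}) = \sum_{i=1}^{m}\big(-y_i\boldsymbol{\beta}^T\hat{\boldsymbol{x}}_i+\ln(1+e^{\boldsymbol{\beta}^T\hat{\boldsymbol{x}}_i})\big)
$$
Finding the $\boldsymbol{\beta}$ that minimizes this expression gives us the parameters $w,b$.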
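
The objective above can be evaluated in a few lines. The following is a sketch under assumptions: the function name `neg_log_likelihood` and the toy data are hypothetical, and `np.logaddexp(0, z)` is used for $\ln(1+e^{z})$ only because it avoids overflow for large $z$.

```python
import numpy as np

def neg_log_likelihood(X_hat, Y, beta):
    """sum_i ( -y_i * beta^T x_i + ln(1 + exp(beta^T x_i)) )."""
    z = X_hat @ beta
    # np.logaddexp(0, z) computes ln(e^0 + e^z) = ln(1 + e^z) stably.
    return np.sum(-Y * z + np.logaddexp(0.0, z))

# Hypothetical toy data: two attributes plus the constant 1 in the last column.
X_hat = np.array([[0.2, 0.1, 1.0],
                  [0.9, 0.7, 1.0],
                  [0.4, 0.8, 1.0],
                  [0.6, 0.3, 1.0]])
Y = np.array([0.0, 1.0, 1.0, 0.0])

loss_at_zero = neg_log_likelihood(X_hat, Y, np.zeros(3))
print(loss_at_zero)
```

With $\boldsymbol{\beta}=0$ every sample contributes $\ln 2$, so the value for these four samples is $4\ln 2$, a handy sanity check.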

$$
\boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^t-\left(\frac{\partial^2\ell(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}\partial\boldsymbol{\beta}^\mathrm{T}}\right)^{-1}\frac{\partial\ell\left(\boldsymbol{\beta}\right)}{\partial\boldsymbol{\beta}}
$$
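This is the Newton's method update for minimizing $\ell(\boldsymbol{\beta})$. Writing $p_1(\hat{\boldsymbol{x}};\boldsymbol{\beta}) = \dfrac{e^{\boldsymbol{\beta}^\mathrm{T}\hat{\boldsymbol{x}}}}{1+e^{\boldsymbol{\beta}^\mathrm{T}\hat{\boldsymbol{x}}}}$ for the predicted probability of $y=1$, the first and second derivatives are
$$
\begin{aligned}
\frac{\partial\ell\left(\boldsymbol{\beta}\right)}{\partial\boldsymbol{\beta}}& =-\sum_{i=1}^m\hat{\boldsymbol{x}}_i(y_i-p_1(\hat{\boldsymbol{x}}_i;\boldsymbol{\beta})),  \\
\frac{\partial^2\ell\left(\boldsymbol{\beta}\right)}{\partial\boldsymbol{\beta}\partial\boldsymbol{\beta}^\mathrm{T}}& =\sum_{i=1}^m\hat{\boldsymbol{x}}_i\hat{\boldsymbol{x}}_i^\mathrm{T}p_1(\hat{\boldsymbol{x}}_i;\boldsymbol{\beta})(1-p_1(\hat{\boldsymbol{x}}_i;\boldsymbol{\beta})). 
\end{aligned}
$$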
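
One way to gain some confidence in the first-derivative formula above is to compare it against a central finite difference of $\ell(\boldsymbol{\beta})$. The sketch below does this on randomly generated data; the helper names (`loss`, `gradient`, `numerical_gradient`) and the data are assumptions made for the example, not part of the exercise.

```python
import numpy as np

def loss(X_hat, Y, beta):
    """Objective: sum_i ( -y_i * z_i + ln(1 + e^{z_i}) ) with z_i = beta^T x_i."""
    z = X_hat @ beta
    return np.sum(-Y * z + np.logaddexp(0.0, z))

def gradient(X_hat, Y, beta):
    """Analytic first derivative: -sum_i x_i * (y_i - p_1(x_i; beta))."""
    p1 = 1.0 / (1.0 + np.exp(-(X_hat @ beta)))
    return -X_hat.T @ (Y - p1)

def numerical_gradient(f, beta, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at beta."""
    g = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        g[j] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return g

# Hypothetical random data: 20 samples, two attributes plus a constant column.
rng = np.random.default_rng(0)
X_hat = np.hstack([rng.normal(size=(20, 2)), np.ones((20, 1))])
Y = rng.integers(0, 2, size=20).astype(float)
beta = np.array([0.5, 2.0, 1.0])

analytic = gradient(X_hat, Y, beta)
numeric = numerical_gradient(lambda b: loss(X_hat, Y, b), beta)
grad_gap = np.max(np.abs(analytic - numeric))
print(grad_gap)   # should be tiny if the formula is implemented correctly
```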

In [1]:
""" 
P69, Exercise 3.3
Import the required data and packages.
"""
import sys
sys.path.append('../dataset')
import data3
import numpy as np

"""
Build the data that we need.
"""
data = data3.createDataSet()

# 17 samples: columns 0-1 hold the two numeric attributes (fields 6:8 of each
# record), column 2 holds the label.
Data = np.zeros((17,3))
for i in range(len(data[0])):
    Data[i,:-1] = data[0][i][6:8]
Data[:8,2] = 1   # 1 means good, 0 means bad, Data = [x;y]

In [2]:
"""
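Each sample is extended to x_hat = (x, 1); x comes from Data.
"""
X = np.ones((17,3))        # last column stays 1 (the constant term paired with b)
X[:,:-1] = Data[:,:-1]     # first two columns: the attributes
Y = Data[:,-1]             # labels: 1 = good, 0 = bad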

In [3]:
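def p(_x,_beta):
    """p_1(x; beta) = e^{beta^T x} / (1 + e^{beta^T x}), the predicted probability of y = 1."""
    return np.e**(_beta@_x)/(1+np.e**(_beta@_x))

def dbeta(_X,_Y,_beta):
    """First derivative of l(beta): -sum_i x_i * (y_i - p_1(x_i; beta))."""
    # Subarray dtypes in np.fromiter need NumPy 1.23 or later.
    iterable = (x*(y-p(x,_beta)) for x, y in zip(_X, _Y))
    return -np.sum(np.fromiter(iterable, dtype=np.dtype((float, 3))), axis=0)

def ddbeta(_X,_beta):
    """Second derivative (Hessian) of l(beta): sum_i x_i x_i^T * p_1 * (1 - p_1)."""
    iterable = (np.outer(x,x)*p(x,_beta)*(1-p(x,_beta)) for x in _X)
    return np.sum(np.fromiter(iterable, dtype=np.dtype((float, (3,3)))), axis=0)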
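
The per-sample generators in `dbeta` and `ddbeta` can also be written as matrix products. The following is a sketch of that alternative, not a replacement for the cell above: the names `p_vec`, `dbeta_vec`, `ddbeta_vec` and the random demo data are hypothetical, and the loop-based sums are included only to check that both forms agree.

```python
import numpy as np

def p_vec(X, beta):
    """Predicted probability of y = 1 for every row of X (or for a single row)."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def dbeta_vec(X, Y, beta):
    """Gradient as one matrix product: -X^T (Y - p)."""
    return -X.T @ (Y - p_vec(X, beta))

def ddbeta_vec(X, beta):
    """Hessian as X^T diag(p * (1 - p)) X, without forming the diagonal matrix."""
    prob = p_vec(X, beta)
    weights = prob * (1.0 - prob)
    return (X * weights[:, None]).T @ X

# Hypothetical demo data with the same shape as X and Y in the notebook.
rng = np.random.default_rng(1)
X_demo = np.hstack([rng.uniform(size=(17, 2)), np.ones((17, 1))])
Y_demo = (rng.uniform(size=17) < 0.5).astype(float)
beta_demo = np.array([0.5, 2.0, 1.0])

# Reference values computed sample by sample, mirroring the formulas directly.
grad_loop = -sum(x * (y - p_vec(x, beta_demo)) for x, y in zip(X_demo, Y_demo))
hess_loop = sum(np.outer(x, x) * p_vec(x, beta_demo) * (1 - p_vec(x, beta_demo))
                for x in X_demo)

print(np.allclose(grad_loop, dbeta_vec(X_demo, Y_demo, beta_demo)))
print(np.allclose(hess_loop, ddbeta_vec(X_demo, beta_demo)))
```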

In [4]:
""" 
Initial guess beta=(w;b)
"""
beta = np.array([0.5,2,1])
max_iteration = 1000
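for it in range(max_iteration):
    # Full Newton step from the current beta
    beta_new = beta - np.linalg.inv(ddbeta(X,beta))@dbeta(X,Y,beta)
    if it%100 == 0:
        # Size of the full (undamped) Newton step, used as the error measure
        print("error is:",np.max(np.abs(beta_new-beta)))
    # Damped update: move only 10% of the way toward the Newton iterate
    beta = 0.1*beta_new+0.9*beta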

error is: 12.049592391541125
error is: 0.0005829916964312787
error is: 1.548661643369087e-08
error is: 4.121147867408581e-13
error is: 2.220446049250313e-15
error is: 2.220446049250313e-15
error is: 2.220446049250313e-15
error is: 2.220446049250313e-15
error is: 2.220446049250313e-15
error is: 2.220446049250313e-15
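
The loop above always runs 1000 damped iterations, although the printed error has already reached machine precision by iteration 400. A possible variant, sketched here under assumptions, stops as soon as the Newton step falls below a tolerance and solves the linear system with `np.linalg.solve` instead of forming the inverse. The function name `newton_logistic`, the tolerance, the undamped step, and the synthetic data are all choices made for this example, not the notebook's method.

```python
import numpy as np

def newton_logistic(X_hat, Y, beta0, tol=1e-10, max_iter=100):
    """Undamped Newton iteration; returns (beta, iterations used, converged flag)."""
    beta = np.asarray(beta0, dtype=float).copy()
    for n_iter in range(1, max_iter + 1):
        prob = 1.0 / (1.0 + np.exp(-(X_hat @ beta)))
        grad = -X_hat.T @ (Y - prob)
        hess = (X_hat * (prob * (1.0 - prob))[:, None]).T @ X_hat
        step = np.linalg.solve(hess, grad)   # solves hess @ step = grad
        beta = beta - step
        if np.max(np.abs(step)) < tol:       # stop once the update is negligible
            return beta, n_iter, True
    return beta, max_iter, False

# Synthetic, non-separable data drawn from a hypothetical "true" beta.
rng = np.random.default_rng(0)
X_hat = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])
true_beta = np.array([1.0, -1.0, 0.5])
Y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(X_hat @ true_beta)))).astype(float)

beta_hat, n_iter, converged = newton_logistic(X_hat, Y, np.zeros(3))
print(beta_hat, n_iter, converged)
```

Undamped steps usually need far fewer iterations, but they can diverge from a poor starting point or on linearly separable data, which is one reason a damped update like the one in the cell above can be a reasonable choice.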


In [10]:
for index, x in enumerate(X[:8]):
    print("the probability of", index,"is good melon is", p(x,beta))

the probability of 0 is good melon is 0.9715913420159087
the probability of 1 is good melon is 0.9384079673856944
the probability of 2 is good melon is 0.7066382101925186
the probability of 3 is good melon is 0.8135342097453221
the probability of 4 is good melon is 0.5048058213295722
the probability of 5 is good melon is 0.4530055563123081
the probability of 6 is good melon is 0.26036934431626446
the probability of 7 is good melon is 0.3997031501480675


In [13]:
for index, x in enumerate(X[8:]):
    print("the probability of", index+8,"is good melon is", p(x,beta))

the probability of 8 is good melon is 0.2339772217892385
the probability of 9 is good melon is 0.42110689643646904
the probability of 10 is good melon is 0.050146188396586834
the probability of 11 is good melon is 0.10851898057605383
the probability of 12 is good melon is 0.4025673048474584
the probability of 13 is good melon is 0.5312977379539784
the probability of 14 is good melon is 0.7926504989306007
the probability of 15 is good melon is 0.11608022112013412
the probability of 16 is good melon is 0.2955993485038269
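
To summarize the two lists of probabilities, one option is to threshold them at 0.5 and count how many training samples land on the correct side. The sketch below copies the printed values, rounded to four decimals; the 0.5 threshold is an assumption made for this example.

```python
import numpy as np

# Probabilities copied from the outputs above, rounded to four decimals.
prob = np.array([
    0.9716, 0.9384, 0.7066, 0.8135, 0.5048, 0.4530, 0.2604, 0.3997,          # samples 0-7 (good)
    0.2340, 0.4211, 0.0501, 0.1085, 0.4026, 0.5313, 0.7927, 0.1161, 0.2956,  # samples 8-16 (bad)
])
labels = np.array([1] * 8 + [0] * 9)   # 1 = good melon, 0 = bad melon

pred = (prob >= 0.5).astype(int)       # predict "good" when p >= 0.5
n_correct = int(np.sum(pred == labels))
accuracy = n_correct / labels.size
print(n_correct, "of", labels.size, "correct; accuracy =", accuracy)
```

With this threshold, 12 of the 17 training samples are classified correctly: samples 5, 6, and 7 are good melons scored below 0.5, and samples 13 and 14 are bad melons scored above 0.5.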
