# Multiclass Logistic Regression and Neural Network Practice

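## Project Overview

#### Background:
* 1. The project is based on Professor Andrew Ng's *Machine Learning* course.
* 2. All data comes from the course's companion materials.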


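#### Objectives:
* 1. Recognize handwritten digits (0-9) with multiclass logistic regression.
* 2. Recognize handwritten digits (0-9) with a neural network.
* 3. Implement the cost function and the gradient function for multiclass logistic regression.
* 4. Implement the prediction (forward propagation) function of the neural network.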


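#### Data description:
* 1. Goal: learn a model that recognizes handwritten digits from the 5,000 handwritten digits in the training set (20×20 pixels each) and their correct labels.
* 2. ex3data1: the X and y datasets
    * X: 5000 rows × 400 columns, i.e. 5000 handwritten digits; each row is one digit and each column is the grayscale value of one pixel.
    * y: 5000 rows × 1 column, the corresponding 5000 labels (the digit 0 is stored as label 10).
* 3. ex3weights: the neural network parameter (weight) file
    * Theta1: Layer 1 parameters (input layer to hidden layer)
    * Theta2: Layer 2 parameters (hidden layer to output layer)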

# Import the required libraries

In [264]:
import numpy as np
import pandas as pd

import plotly.express as px
import plotly as py
import plotly.graph_objs as go
import cufflinks as cf
from plotly.offline import iplot,init_notebook_mode
cf.go_offline(connected=True)
init_notebook_mode(connected=True)

import scipy.io as sio # the data are MATLAB .mat files, so they are loaded with scipy
from sklearn.metrics import classification_report

# 1. Load the dataset

In [307]:
data = sio.loadmat(r"E:\Learning\python\data\wudaen Machine Learning\code\ex3-neural network\ex3data1.mat")  # raw string so the backslashes are not read as escape sequences

In [308]:
X=data["X"]
y=data['y'].reshape(data['y'].shape[0])

# 2. Data visualization

In [267]:
# Look at one random sample
pick_one = np.random.randint(0, 5000) # no seed is set, so each run shows a different digit
img = np.array(X[pick_one,:].reshape((20,20)).T)
fig = px.imshow(img,color_continuous_scale='gray')
fig.update_layout(width=100,height=100,coloraxis_showscale=False,
                  margin=dict(l=10, r=10, b=10, t=10),
                  xaxis=dict(showticklabels=False),yaxis=dict(showticklabels=False))
fig.show()
print('This is the digit {}'.format(y[pick_one] % 10))  # % 10 turns label 10 back into the digit 0

This is the digit 4
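The `.T` in the visualization cell is needed because the .mat file comes from MATLAB, which (we assume here, consistent with that transpose) stores each 20×20 image in column-major order. A minimal sketch with a synthetic image shows that reshaping and then transposing is the same as reading the vector back with `order='F'`:

```python
import numpy as np

# a synthetic 20x20 "image" standing in for one digit
img = np.arange(400).reshape(20, 20)
flat = img.flatten(order='F')                    # column-major flattening, as MATLAB stores it

restored_T = flat.reshape(20, 20).T              # the approach used in the visualization cell
restored_F = flat.reshape((20, 20), order='F')   # equivalent: read it back in column-major order

print(np.array_equal(restored_T, img), np.array_equal(restored_F, img))  # True True
```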


# 3. Multiclass logistic regression

## 3.1. Preparing X and y

In [309]:
# Add an x0 column of all ones (the bias/intercept term)
X = np.insert(X, 0, values=np.ones(X.shape[0]), axis=1)
X.shape

(5000, 401)

In [312]:
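# There are 10 classes, so build one 0/1 label vector per class (10 rows)
y_matrix = []

for k in range(1, 11):
    y_matrix.append((y == k).astype(int))
# Label k=10 stands for the digit 0, so move the last row to the front (row index = digit)
y_matrix = [y_matrix[-1]] + y_matrix[:-1]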
y_matrix = np.array(y_matrix)
y_matrix.shape

(10, 5000)

* Row k (k = 0-9) marks, for every sample, whether it is the digit k: 1 means yes, 0 means no.
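The loop above can also be written as a single broadcast comparison. A minimal sketch, assuming labels 1-10 with 10 standing for the digit 0 as in ex3data1; `one_vs_all_labels` is a hypothetical helper name:

```python
import numpy as np

def one_vs_all_labels(y):
    """Return a (10, m) 0/1 matrix; row k marks the samples whose digit is k.
    Assumes labels 1-10, with 10 standing for the digit 0."""
    digits = y % 10                                   # map label 10 -> digit 0
    return (digits[None, :] == np.arange(10)[:, None]).astype(int)

y = np.array([10, 1, 2, 9, 10, 5])  # hypothetical labels
Y = one_vs_all_labels(y)
print(Y.shape)   # (10, 6)
print(Y[0])      # [1 0 0 0 1 0], i.e. the samples labeled 10 (digit 0)
```

Broadcasting a (1, m) row against a (10, 1) column produces the same row ordering as the loop-plus-reorder version, without the list manipulation.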

## 3.2. Define the cost function and the gradient function

### 1) Sigmoid function
g denotes the logistic function, a commonly used S-shaped (sigmoid) function:
$$g(z)=\frac{1}{1+e^{-z}}$$
Putting the two together, the hypothesis of the logistic regression model is:
$$h_{\theta}(x)=g\left(\theta^{T}x\right)=\frac{1}{1+e^{-\theta^{T}x}}$$

In [284]:
def sigmoid(z):
    h=1 / (1+np.exp(-z))
    return h
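For very negative inputs, `np.exp(-z)` in the hand-written sigmoid overflows to `inf` (NumPy emits a RuntimeWarning, although the result still rounds to 0). A sketch comparing it with `scipy.special.expit`, SciPy's built-in logistic function, which we suggest here as an optional drop-in replacement rather than what this notebook uses:

```python
import numpy as np
from scipy.special import expit

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(np.allclose(sigmoid(z), expit(z)))   # True: same values in the normal range

with np.errstate(over='ignore'):           # silence the overflow warning from np.exp(1000)
    manual = sigmoid(np.array([-1000.0]))
extreme = expit(np.array([-1000.0, 1000.0]))  # no overflow warning
print(manual, extreme)
```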

### 2) Regularized cost function
$$J\left( \theta  \right)=\frac{1}{m}\sum\limits_{i=1}^{m}{[-{{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)]}+\frac{\lambda }{2m}\sum\limits_{j=1}^{n}{\theta _{j}^{2}}$$

In [314]:
def reg_cost(theta, X, y, reg_rate):
    h = sigmoid(X @ theta)                       # predicted probabilities, computed once
    temp = (-y) * np.log(h) - (1 - y) * np.log(1 - h)
    J = np.sum(temp) / len(X)
    # theta_0 (the bias) is not regularized
    reg = (reg_rate / (2 * len(X))) * np.sum(np.power(theta[1:], 2))
    return J + reg

In [323]:
# Cost of the "is it the digit 0?" classifier (row 0 of y_matrix) with theta = 0
theta = np.zeros(X.shape[1])
reg_cost(theta,X,y_matrix[0],1)

0.6931471805599454
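As a sanity check of this value: with $\theta=\mathbf{0}$, every prediction is $g(0)=0.5$ and the regularization term vanishes, so the cost equals $\ln 2$ whatever the labels are:

$$J(\mathbf{0})=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\tfrac{1}{2}-\left(1-y^{(i)}\right)\log\tfrac{1}{2}\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}0^{2}=-\log\tfrac{1}{2}=\ln 2\approx 0.6931$$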

### 3) Regularized gradient function
* Partial derivatives, in vectorized form: $\frac{1}{m} X^T\left(g(X\theta) - y\right)+\frac{\lambda}{m}\theta_j$, where the regularization term applies only for $j \ge 1$ ($\theta_0$ is not regularized):
$$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{j}^{(i)}+\frac{\lambda}{m}\theta_j$$

In [273]:
# Regularized gradient (theta_0 gets no regularization term)
def reg_gradient(theta, X, y, reg_rate):
    
    reg= np.concatenate((np.array([0]),(reg_rate / len(X)) * theta[1:]),axis=0)
    grad = (X.T @ (sigmoid(X @ theta) - y)) / len(X)
    
    return grad+reg

In [324]:
# Gradient of the same "is it the digit 0?" classifier at theta = 0; check its shape
theta = np.zeros(X.shape[1])
reg_gradient(theta,X,y_matrix[0],1).shape

(401,)
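Checking only the shape does not confirm the gradient is correct. A common extra check is to compare it with a finite-difference approximation via `scipy.optimize.check_grad`. A minimal sketch on small random data (the data and sizes are made up for illustration; the cost and gradient mirror the functions above):

```python
import numpy as np
from scipy.optimize import check_grad

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def reg_cost(theta, X, y, reg_rate):
    h = sigmoid(X @ theta)
    J = np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
    return J + reg_rate / (2 * len(X)) * np.sum(theta[1:] ** 2)

def reg_gradient(theta, X, y, reg_rate):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(X)
    reg = np.concatenate(([0.0], reg_rate / len(X) * theta[1:]))  # no penalty on theta_0
    return grad + reg

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 4))])  # bias column + 4 random features
y = (rng.random(50) > 0.5).astype(float)
theta = rng.normal(scale=0.1, size=5)

# norm of (analytic gradient - finite-difference gradient); should be tiny
err = check_grad(reg_cost, reg_gradient, theta, X, y, 1.0)
print(err)
```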

## 3.3. Logistic regression training function

In [275]:
from scipy import optimize as opt
def logistic_regression(X, y, reg_rate):
    theta = np.zeros(X.shape[1])
    
    # Minimize the cost with the truncated Newton (TNC) method, using the analytic gradient
    res = opt.minimize(fun=reg_cost,
                       x0=theta,
                       args=(X, y, reg_rate),
                       method='TNC',
                       jac=reg_gradient,
                       options={'disp': True})
    # The optimized theta
    final_theta = res.x

    return final_theta

In [325]:
k_theta = np.array([logistic_regression(X, y_matrix[k],1) for k in range(10)])
print(k_theta.shape)

(10, 401)
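The whole one-vs-all procedure (train one classifier per class, then predict with argmax) can be sketched end to end. Since ex3data1.mat is not bundled here, this sketch assumes scikit-learn's `load_digits` (8×8 images, labels already 0-9) as a stand-in, scales pixels to [0, 1], and uses `np.logaddexp` in the cost so it stays finite even when a probability rounds to exactly 0 or 1; `one_vs_all` and `predict` are hypothetical helper names:

```python
import numpy as np
from scipy import optimize as opt
from scipy.special import expit as sigmoid
from sklearn.datasets import load_digits

def reg_cost(theta, X, y, reg_rate):
    z = X @ theta
    # log(1 + e^-z) = -log(h) and log(1 + e^z) = -log(1 - h), computed stably
    J = np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))
    return J + reg_rate / (2 * len(X)) * np.sum(theta[1:] ** 2)

def reg_gradient(theta, X, y, reg_rate):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(X)
    reg = np.concatenate(([0.0], reg_rate / len(X) * theta[1:]))
    return grad + reg

def one_vs_all(X, labels, n_classes, reg_rate):
    """Train one binary classifier per class; row k is 'class k vs the rest'."""
    all_theta = np.zeros((n_classes, X.shape[1]))
    for k in range(n_classes):
        res = opt.minimize(reg_cost, np.zeros(X.shape[1]),
                           args=(X, (labels == k).astype(float), reg_rate),
                           method='TNC', jac=reg_gradient)
        all_theta[k] = res.x
    return all_theta

def predict(all_theta, X):
    # pick the class whose classifier gives the highest probability
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)

digits = load_digits()
X = np.insert(digits.data / 16.0, 0, 1.0, axis=1)  # scale pixels to [0, 1], add bias column
labels = digits.target                              # labels are already 0-9
all_theta = one_vs_all(X, labels, 10, reg_rate=1.0)
train_acc = np.mean(predict(all_theta, X) == labels)
print(all_theta.shape, round(train_acc, 3))
```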


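## 3.4. Evaluate model accuracy
$$h_{\Theta}(X)=g\left(X\Theta^{T}\right)$$
Here $\Theta$ is `k_theta`, the 10×401 matrix whose rows are the ten trained classifiers. Row i of the result holds the 10 class probabilities for sample i, and the predicted digit is the column with the highest probability.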

In [326]:
np.set_printoptions(suppress=True)
k_theta

array([[-5.40542751,  0.        ,  0.        , ..., -0.00011661,
         0.00000788,  0.        ],
       [-2.38187334,  0.        ,  0.        , ...,  0.00130433,
        -0.        ,  0.        ],
       [-3.18303389,  0.        ,  0.        , ...,  0.00446341,
        -0.00050887,  0.        ],
       ...,
       [-1.9049144 ,  0.        ,  0.        , ..., -0.00052751,
         0.00006621,  0.        ],
       [-7.98700752,  0.        ,  0.        , ..., -0.00008946,
         0.00000721,  0.        ],
       [-4.57358931,  0.        ,  0.        , ..., -0.00133391,
         0.00009969,  0.        ]])

In [330]:
prob_matrix = sigmoid(X @ k_theta.T)
y_pred = np.argmax(prob_matrix, axis=1)  # index of the largest probability in each row = predicted digit

In [331]:
y_pred

array([0, 0, 0, ..., 9, 9, 7], dtype=int64)

In [332]:
y_pred.shape

(5000,)

In [333]:
# The dataset stores the digit 0 as label 10; map it back to 0 to match the argmax indices
y_answer = y.copy()
y_answer[y_answer==10] = 0

In [334]:
print(classification_report(y_answer, y_pred))

              precision    recall  f1-score   support

           0       0.97      0.99      0.98       500
           1       0.95      0.99      0.97       500
           2       0.95      0.92      0.93       500
           3       0.95      0.91      0.93       500
           4       0.95      0.95      0.95       500
           5       0.92      0.92      0.92       500
           6       0.97      0.98      0.97       500
           7       0.95      0.95      0.95       500
           8       0.93      0.92      0.92       500
           9       0.92      0.92      0.92       500

    accuracy                           0.94      5000
   macro avg       0.94      0.94      0.94      5000
weighted avg       0.94      0.94      0.94      5000



## 3.5. Using scikit-learn

In [339]:
X2=data["X"]
y2=data['y'].reshape(data['y'].shape[0])

In [340]:
from sklearn.linear_model import LogisticRegression
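# 'ovr' = one-vs-rest (one binary classifier per class);
# 'multinomial' = a single softmax model trained on all 10 classes at once (not one-vs-one)
for multi_class in ('multinomial', 'ovr'):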
    clf = LogisticRegression(solver='sag',max_iter=200,random_state=42,
                             multi_class=multi_class).fit(X2, y2)

    # print the training scores
    print("training score : %.3f (%s)" % (clf.score(X2, y2), multi_class))

training score : 0.962 (multinomial)
training score : 0.945 (ovr)
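Recent scikit-learn releases deprecate the `multi_class` parameter used above; one-vs-rest is then obtained by wrapping the estimator in `OneVsRestClassifier`, while multinomial is the default for multiclass targets. A sketch under that assumption, again using `load_digits` as a stand-in for ex3data1 (note these are training scores, as above, not held-out accuracy):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0   # scale pixel values to [0, 1]

# One-vs-rest: one binary LogisticRegression per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# Multinomial (softmax): default behaviour of LogisticRegression with the lbfgs solver
softmax = LogisticRegression(max_iter=1000).fit(X, y)

ovr_score = ovr.score(X, y)
softmax_score = softmax.score(X, y)
print("training score : %.3f (ovr)" % ovr_score)
print("training score : %.3f (multinomial)" % softmax_score)
```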


# 4. Neural network
<img style="float: left;" src="code/img/nn_model.png">

## 4.1. Load the weight file

In [341]:
weights=sio.loadmat(r'E:\Learning\python\data\wudaen Machine Learning\code\ex3-neural network\ex3weights.mat')  # raw string for the Windows path

In [343]:
theta1=weights['Theta1']
theta2=weights['Theta2']
theta1.shape, theta2.shape

((25, 401), (10, 26))

In [347]:
X3 = data["X"]
y3 = data['y'].reshape(data['y'].shape[0])
X3 = np.insert(X3, 0, values=np.ones(X3.shape[0]), axis=1) 
X3.shape,y3.shape

((5000, 401), (5000,))

## 4.2. Prediction with forward propagation

In [353]:
# Input layer (bias column already added) to the hidden layer's weighted input
a1 = X3
z2 = a1 @ theta1.T
z2.shape

(5000, 25)

In [357]:
# Hidden layer: apply the sigmoid first, then add the bias unit a0 = 1
a2 = sigmoid(z2)
a2 = np.insert(a2, 0, values=np.ones(a2.shape[0]), axis=1)
z3 = a2 @ theta2.T
z3.shape

(5000, 10)

In [371]:
# Output layer: column j holds the probability of label j+1 (labels run 1 to 10, 10 = digit 0)
a3 = sigmoid(z3)
y_pred3 = np.argmax(a3, axis=1) + 1
y_pred3.shape

(5000,)
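The forward-propagation steps above can be collected into one function. A minimal sketch: `feed_forward` and `predict` are hypothetical names, and the random weights and inputs only mimic the shapes of ex3weights.mat and ex3data1 (they are not the trained parameters). The key detail is that the hidden-layer bias unit is added after the sigmoid, so it equals exactly 1:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feed_forward(X, theta1, theta2):
    """Forward pass for the 3-layer network; X has no bias column yet."""
    a1 = np.insert(X, 0, 1.0, axis=1)     # (m, 401): add bias to the input layer
    a2 = sigmoid(a1 @ theta1.T)           # (m, 25): hidden activations
    a2 = np.insert(a2, 0, 1.0, axis=1)    # (m, 26): bias added AFTER the sigmoid
    return sigmoid(a2 @ theta2.T)         # (m, 10): output activations

def predict(X, theta1, theta2):
    # column j of the output corresponds to label j + 1 (label 10 stands for digit 0)
    return np.argmax(feed_forward(X, theta1, theta2), axis=1) + 1

# hypothetical random weights with the same shapes as Theta1 and Theta2
rng = np.random.default_rng(0)
theta1 = rng.normal(scale=0.1, size=(25, 401))
theta2 = rng.normal(scale=0.1, size=(10, 26))
X = rng.random((5, 400))                  # 5 fake 20x20 images, flattened

a3 = feed_forward(X, theta1, theta2)
pred = predict(X, theta1, theta2)
print(a3.shape, pred)
```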

## 4.3. Evaluate model accuracy

In [365]:
print(classification_report(y, y_pred3))

              precision    recall  f1-score   support

           1       0.97      0.98      0.97       500
           2       0.98      0.97      0.97       500
           3       0.98      0.96      0.97       500
           4       0.97      0.97      0.97       500
           5       0.98      0.98      0.98       500
           6       0.97      0.99      0.98       500
           7       0.98      0.97      0.97       500
           8       0.98      0.98      0.98       500
           9       0.97      0.96      0.96       500
          10       0.98      0.99      0.99       500

    accuracy                           0.98      5000
   macro avg       0.98      0.98      0.98      5000
weighted avg       0.98      0.98      0.98      5000



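# Acknowledgements:
Thanks to Dr. Huang Haiguang for his study notes and other materials. I will keep pushing forward on my machine learning journey!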