# Deriving Gradient Ascent for Logistic Regression

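## Preliminaries  
$X$: the matrix of independent variables, $n \times p$, i.e. $X$ has $n$ rows and $p$ columns.    
$\beta$: the coefficient vector, $p \times 1$, i.e. $\beta$ has $p$ rows and 1 column. It corresponds to `weights` in the code.    
$y$: the vector of dependent variables, $n \times 1$, i.e. $y$ has $n$ rows and 1 column.    
$\mathbf{p}$: the vector of predicted probabilities, $n \times 1$, i.e. $\mathbf{p}$ has $n$ rows and 1 column. (The plain symbol $p$ also denotes the number of columns of $X$; a subscripted $p_i$ always means the predicted probability for sample $i$.)
That is: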

\begin{equation*}
    \mathbf{X} = \left(
      \begin{array}{cccc}
        x_{11} & x_{12} & \ldots & x_{1p}\\
        x_{21} & x_{22} & \ldots & x_{2p}\\
        \vdots & \vdots & \ddots & \vdots\\
        x_{n1} & x_{n2} & \ldots & x_{np}
      \end{array} \right) ,\quad
       \mathbf{y} = \left(
      \begin{array}{c}
        y_{1} \\
        y_{2} \\
        \vdots \\
        y_{n}
      \end{array} \right) ,\quad
       \boldsymbol{\beta} = \left(
      \begin{array}{c}
        \beta_{1} \\
        \beta_{2} \\
        \vdots \\
        \beta_{p}
      \end{array} \right), \quad
       \mathbf{p} = \left(
      \begin{array}{c}
        p_{1} \\
        p_{2} \\
        \vdots \\
        p_{n}
      \end{array} \right)
  \end{equation*}

## Derivation

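The objective for logistic regression starts from the likelihood of the observed labels:

$$L(\beta) = \prod_{i=1}^n p_{i}^{y_i} (1-p_i)^{1-y_i}, \quad p_i = \frac{1}{1+\exp(-X^{(i)} \beta)}, \quad \text{where } X^{(i)} \text{ is the } i\text{-th row of } X$$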

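Taking the logarithm of the expression above gives the log-likelihood, which is denoted $J(\beta)$ in the rest of this derivation:

$$J(\beta) = \sum_{i=1}^n \left[ y_i \log(p_{i}) + (1-y_i) \log(1-p_i) \right], \quad p_i = \frac{1}{1+\exp(-X^{(i)} \beta)}, \quad \text{where } X^{(i)} \text{ is the } i\text{-th row of } X$$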
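As a quick sanity check on the log-likelihood, here is a minimal NumPy sketch. The function names `sigmoid` and `log_likelihood` and the toy data are assumptions made for illustration, not part of the original code, and 1-D arrays stand in for the column vectors $\beta$ and $y$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, beta):
    """J(beta) = sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]."""
    p = sigmoid(X @ beta)  # p_i for every row X^(i), shape (n,)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: n = 4 samples, 2 features (first column is a constant 1).
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
beta = np.zeros(2)

# With beta = 0 every p_i equals 0.5, so J(beta) should equal n * log(0.5).
J = log_likelihood(X, y, beta)
print(J, 4 * np.log(0.5))
```

Gradient ascent then looks for the $\beta$ that makes this value as large as possible.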

Differentiate $J(\beta)$ with respect to $\beta_{j}$. By the chain rule, each sample $i$ contributes one term to the sum:

$$\frac{\partial J(\beta)}{\partial \beta_j}  = \sum_{i=1}^n \frac{\partial J(\beta)}{\partial p_i} \cdot \frac{\partial p_i}{\partial (X^{(i)} \beta)} \cdot \frac{\partial (X^{(i)} \beta)}{\partial \beta_j}$$

Next, compute each of the three partial derivatives:

\begin{align*}
\frac{\partial J(\beta)}{\partial p_i} &= \frac{y_i}{p_i} - \frac{1-y_i}{1-p_i} = y_i \cdot (1+\exp(-X^{(i)} \beta)) - (1-y_i)(1+\exp(X^{(i)} \beta))
\end{align*}


\begin{align*}
\frac{\partial p_i}{\partial (X^{(i)} \beta)} = \frac{\exp(-X^{(i)} \beta)}{(1+\exp(-X^{(i)} \beta))^2} = p_i (1-p_i)
\end{align*}

$$\frac{\partial (X^{(i)} \beta)}{\partial \beta_j} = x_{ij}, \quad \text{where } x_{ij} \text{ is the entry in row } i \text{ and column } j \text{ of } X$$

Substituting these results into $\frac{\partial J(\beta)}{\partial \beta_j}  = \sum_{i=1}^n \frac{\partial J(\beta)}{\partial p_i} \cdot \frac{\partial p_i}{\partial (X^{(i)} \beta)} \cdot \frac{\partial (X^{(i)} \beta)}{\partial \beta_j}$ gives

\begin{align*}
\frac{\partial J(\beta)}{\partial \beta_j} &= \sum_{i=1}^n \frac{\partial J(\beta)}{\partial p_i} \cdot \frac{\partial p_i}{\partial (X^{(i)} \beta)} \cdot \frac{\partial (X^{(i)} \beta)}{\partial \beta_j}\\
& = \sum_{i=1}^n \left[y_i \cdot (1+\exp(-X^{(i)} \beta)) - (1-y_i)(1+\exp(X^{(i)} \beta)) \right] \cdot \frac{\exp(-X^{(i)} \beta)}{(1+\exp(-X^{(i)} \beta))^2} \cdot x_{ij}\\
& = \sum_{i=1}^n \left[y_i \cdot \frac{\exp(-X^{(i)} \beta)}{1+\exp(-X^{(i)} \beta)} - (1-y_i)\frac{1}{1+\exp(-X^{(i)} \beta)} \right]  \cdot x_{ij}\\
& = \sum_{i=1}^n \left[y_i \cdot (1-p_i) - (1-y_i)p_i \right]  \cdot x_{ij}\\
& = \sum_{i=1}^n (y_i - p_i) \cdot x_{ij}
\end{align*}
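One way to gain confidence in the result $\sum_{i=1}^n (y_i - p_i) \cdot x_{ij}$ is to compare it against a finite-difference approximation of the log-likelihood. The sketch below assumes NumPy and randomly generated data; the helper names `analytic_gradient` and `numeric_gradient` are hypothetical and chosen only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, beta):
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(X, y, beta):
    # dJ/dbeta_j = sum_i (y_i - p_i) * x_ij, computed for all j at once
    p = sigmoid(X @ beta)
    return X.T @ (y - p)

def numeric_gradient(X, y, beta, eps=1e-6):
    # Central finite differences, one coordinate beta_j at a time
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (log_likelihood(X, y, beta + step)
                   - log_likelihood(X, y, beta - step)) / (2 * eps)
    return grad

# Hypothetical random problem: n = 20 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
beta = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(X, y, beta)
                         - numeric_gradient(X, y, beta)))
print(max_diff)  # should be very close to zero
```

If the derivation is right, the two gradients agree up to the numerical error of the finite-difference scheme.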


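This gives the gradient ascent update for the logistic regression coefficients:

\begin{align*}
\beta_j &\leftarrow \beta_j + \alpha \cdot \frac{\partial J(\beta)}{\partial \beta_j}\\
& = \beta_j + \alpha \cdot \sum_{i=1}^n (y_i - p_i) \cdot x_{ij}
\end{align*}


**Note 1:** Replacing the sum over all $n$ samples with the term for a single sample $i$, and collecting the components $j = 1, \ldots, p$ into one vector, gives the **stochastic gradient ascent** update used in the code:

$$\beta \leftarrow \beta + \alpha \cdot (y_i - p_i) \cdot (X^{(i)})^{T}$$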

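**Note 2:** $X^{(i)}$ is the $i$-th row of the matrix $X$.

**Note 3:** In the formulas above, $i = 1,2,3,\ldots,n$ and $j = 1,2,3,\ldots,p$.

**Note 4:** `weights` in the code is $\beta$ in this document, and `alpha` is $\alpha$. In the stochastic gradient ascent code, `h` is $p_i$.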
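The stochastic update can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the original code: only the names `weights`, `alpha`, and `h` come from the notes above, while the function name `stochastic_gradient_ascent`, the parameter `num_passes`, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_gradient_ascent(X, y, alpha=0.01, num_passes=200):
    """Update the weights one sample at a time."""
    n, p = X.shape
    weights = np.zeros(p)                # beta in the derivation
    for _ in range(num_passes):          # num_passes is an assumed parameter
        for i in range(n):
            h = sigmoid(X[i] @ weights)  # p_i for sample i
            # beta <- beta + alpha * (y_i - p_i) * X^(i)
            weights = weights + alpha * (y[i] - h) * X[i]
    return weights

# Hypothetical toy data: a constant column plus one feature whose sign gives the label.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

weights = stochastic_gradient_ascent(X, y)
predictions = (sigmoid(X @ weights) > 0.5).astype(float)
print(weights, predictions)
```

Each inner step touches a single row of $X$, so the cost per update does not grow with $n$.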

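For programming convenience, we want the code to take the vector $\beta$, the vector $y$, the vector $\mathbf{p}$, and the matrix $X$ as inputs. Stacking the full-batch update $\beta_j \leftarrow \beta_j + \alpha \cdot \sum_{i=1}^n (y_i - p_i) \cdot x_{ij}$ over $j = 1, \ldots, p$ gives the vectorized form

$$\beta \leftarrow \beta + \alpha \cdot X^{T} \cdot (y - \mathbf{p})$$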
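A minimal NumPy sketch of this vectorized update is shown below, using column vectors so the shapes match the preliminaries ($\beta$ is $p \times 1$, $y$ is $n \times 1$). The function name `gradient_ascent`, the parameter `num_iters`, the variable name `prob`, and the toy data are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, alpha=0.01, num_iters=500):
    """Full-batch update: beta <- beta + alpha * X^T (y - p)."""
    n, p = X.shape
    beta = np.zeros((p, 1))                      # p x 1 coefficient vector
    for _ in range(num_iters):                   # num_iters is an assumed parameter
        prob = sigmoid(X @ beta)                 # n x 1 vector of predicted probabilities
        beta = beta + alpha * (X.T @ (y - prob)) # p x 1 gradient step
    return beta

# Hypothetical toy data, with y stored as an n x 1 column vector.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

beta = gradient_ascent(X, y)
predictions = (sigmoid(X @ beta) > 0.5).astype(float)
print(beta.ravel(), predictions.ravel())
```

Compared with the stochastic version, every iteration here uses all $n$ samples in a single matrix product.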

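This completes the derivation of both the gradient ascent and the stochastic gradient ascent update formulas for logistic regression.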