### 1. Model
- Aim: estimate the probability that $Y=1$ given the data $X$:
$$ P(Y=1|X) = \pi(X) \in [0,1]$$
Since $X$ is the only input, we model this probability as a function of a linear score in $X$:
$$ P(Y=1|X) = z(\mathbf{W}^T\mathbf{X}) $$
where the function $z$ maps any real number into $[0,1]$. Equivalently, we can apply a link function $g$ to the probability:
$$g(P(Y=1|X)) = g(\pi(X)) = \mathbf{W}^T\mathbf{X}$$
where $g$ maps $(0,1)$ onto $(-\infty,\infty)$, so that $z = g^{-1}$.
- Choose the log-odds (logit) $g(p) = \log\frac{p}{1-p}$, which takes values in $(-\infty,\infty)$:
$$ \log\frac{P(Y=1|X)}{1-P(Y=1|X)} = \log\frac{P(Y=1|X)}{P(Y=0|X)} = \mathbf{W}^T\mathbf{X} $$
Solving for $P(Y=1|X)$ gives:
$$ P(Y=1|X) = \frac{e^{\mathbf{W}^T\mathbf{X}}}{1+e^{\mathbf{W}^T\mathbf{X}}} = \frac{1}{1+e^{-\mathbf{W}^T\mathbf{X}}}$$
$$ P(Y=0|X) = \frac{1}{1+e^{\mathbf{W}^T\mathbf{X}}} $$
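
The two formulas above can be checked numerically. The following is a minimal sketch, not a reference implementation: the names `sigmoid`, `logit`, and `predict_proba`, the weights, and the convention that each input carries a leading 1 for the intercept are all assumptions made for illustration.

```python
import math

def sigmoid(t):
    """Map a real-valued score t = W^T X to a probability in (0, 1)."""
    # Branch on the sign so exp() never receives a large positive argument.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logit(p):
    """Log-odds log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def predict_proba(w, x):
    """P(Y=1 | X=x); x is assumed to start with 1.0 for the intercept."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return sigmoid(score)

w = [-1.0, 2.0]   # hypothetical weights: intercept -1, slope 2
x = [1.0, 0.5]    # leading 1.0 is the intercept term
p1 = predict_proba(w, x)   # score = -1 + 2 * 0.5 = 0
p0 = 1.0 - p1              # P(Y=0 | X)
```

Because `logit` is the inverse of `sigmoid`, `logit(predict_proba(w, x))` recovers the linear score $\mathbf{W}^T\mathbf{X}$ up to rounding.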

### 2. Strategy
Instead of minimizing a loss function directly, we maximize the likelihood function to obtain the optimal model:
- Likelihood function:
$$ L(\mathbf{W}) = \prod_{i=1}^n (P(Y=1|X_{i}))^{y_{i}} (1-P(Y=1|X_{i}))^{1-y_{i}} $$
Taking the logarithm turns the product into a sum, which is easier to optimize:
$$ \log L(\mathbf{W}) = \sum_{i=1}^n \left( y_{i}\log P(Y=1|X_{i}) + (1-y_{i})\log (1-P(Y=1|X_{i})) \right)$$
$$ \log L(\mathbf{W}) = \sum_{i=1}^n \left( y_{i} \log\frac{P(Y=1|X_{i})}{1-P(Y=1|X_{i})} + \log(1-P(Y=1|X_{i})) \right)$$
$$ \log L(\mathbf{W}) = \sum_{i=1}^n \left( y_{i} \mathbf{W}^T\mathbf{X}_{i} - \log (1+e^{\mathbf{W}^T\mathbf{X}_{i}}) \right)$$
- Optimization problem:
$$ \hat{\mathbf{W}} = \arg\max_{\mathbf{W}} \log L(\mathbf{W}) = \arg\max_{\mathbf{W}} \sum_{i=1}^n \left( y_{i} \mathbf{W}^T\mathbf{X}_{i} - \log (1+e^{\mathbf{W}^T\mathbf{X}_{i}}) \right)$$
- The optimal model for X:
$$ P(Y=1|X) = \frac{e^{\hat{\mathbf{W}}^T\mathbf{X}}}{1+e^{\hat{\mathbf{W}}^T\mathbf{X}}} = \frac{1}{1+e^{-\hat{\mathbf{W}}^T\mathbf{X}}}$$
$$ P(Y=0|X) = \frac{1}{1+e^{\hat{\mathbf{W}}^T\mathbf{X}}} $$
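
As a sanity check on the log transformation, the sketch below evaluates the log-likelihood in two ways: from the Bernoulli terms $y_i \log p_i + (1-y_i)\log(1-p_i)$, and from the closed form $y_i t_i - \log(1+e^{t_i})$ with $t_i = \mathbf{W}^T\mathbf{X}_i$. The toy data, the weights, and the function names are hypothetical, chosen only for illustration.

```python
import math

def log_likelihood_bernoulli(w, xs, ys):
    """sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ], p_i = sigmoid(w . x_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        t = sum(wj * xj for wj, xj in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-t))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def log_likelihood_simplified(w, xs, ys):
    """sum_i [ y_i * t_i - log(1 + exp(t_i)) ], t_i = w . x_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        t = sum(wj * xj for wj, xj in zip(w, x))
        total += y * t - math.log1p(math.exp(t))
    return total

# Hypothetical toy data: each row is [1, feature]; the leading 1 is the intercept term.
xs = [[1.0, -2.0], [1.0, -0.5], [1.0, 0.3], [1.0, 1.8]]
ys = [0, 0, 1, 1]
w = [0.1, 1.2]

a = log_likelihood_bernoulli(w, xs, ys)
b = log_likelihood_simplified(w, xs, ys)
# a and b agree up to floating-point rounding
```

Both versions are written for small scores; for very large $|t_i|$ a numerically stable form of $\log(1+e^{t})$ would be needed.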

### 3. Algorithm
- How to solve the optimization problem: gradient ascent or gradient descent
- Gradient descent: convert the maximization into an equivalent minimization of the negative log-likelihood
![alt text](images/2.1.1.png)
![alt text](images/2.1.2.png)
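
A minimal sketch of batch gradient ascent on the log-likelihood follows. It assumes the standard gradient $\sum_i (y_i - \sigma(\mathbf{W}^T\mathbf{X}_i))\,\mathbf{X}_i$; the learning rate, iteration count, toy data, and function names are hypothetical choices, and the update averages the gradient over the $n$ samples to keep the step size stable.

```python
import math

def sigmoid(t):
    # Sign-split form avoids overflow in exp() for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_likelihood(w, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        t = sum(wj * xj for wj, xj in zip(w, x))
        # log(1 + e^t) computed stably as max(t, 0) + log1p(e^{-|t|})
        total += y * t - (max(t, 0.0) + math.log1p(math.exp(-abs(t))))
    return total

def fit_gradient_ascent(xs, ys, lr=0.1, n_iter=500):
    """Return weights after n_iter ascent steps, starting from all zeros."""
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = y - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        # Ascent: move along the gradient of the mean log-likelihood.
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical, non-separable toy data; each row is [1, feature].
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
ys = [0, 0, 1, 0, 1, 1]
w_hat = fit_gradient_ascent(xs, ys)
```

Gradient descent on the negative log-likelihood produces the same iterates: flipping the sign of the objective flips the sign of the gradient, and the update subtracts it instead of adding it.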

### 4. Multiclass classification
- Multinomial logistic regression model:
https://thinkgamer.blog.csdn.net/article/details/85209496?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.control&dist_request_id=1619621304019_30327&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.control
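
For orientation, here is a small sketch of the usual softmax parameterization of multinomial logistic regression, $P(Y=k|X) = e^{\mathbf{W}_k^T\mathbf{X}} / \sum_j e^{\mathbf{W}_j^T\mathbf{X}}$. This is an assumption about the form intended here; the linked post may use a different but equivalent parameterization (for example, one with a reference class). The weights and the name `softmax_proba` are hypothetical.

```python
import math

def softmax_proba(W, x):
    """Class probabilities for one input x, given one weight vector per class in W."""
    scores = [sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in W]
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical weights for 3 classes; each input starts with 1.0 for the intercept.
W = [[0.0, 0.0], [0.5, 1.0], [-0.5, 2.0]]
x = [1.0, 0.8]
probs = softmax_proba(W, x)   # three probabilities that sum to 1
```

With two classes and the first weight vector fixed at zero, the second probability reduces to the binary sigmoid $\frac{1}{1+e^{-\mathbf{W}^T\mathbf{X}}}$ from the Model section.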

$$\frac{\partial l}{\partial w_{0}} = \sum_{i=1}^n \left( \frac{y_{i}}{\sigma (z_{i})} - \frac{1 - y_{i}}{1-\sigma (z_{i})} \right) \frac{\partial}{\partial w_{0}} \sigma (z_{i})$$

$$= \sum_{i=1}^n \frac{y_{i} - \sigma (z_{i})}{\sigma (z_{i})(1- \sigma (z_{i}))} \sigma (z_{i})(1- \sigma (z_{i})) = \sum_{i=1}^n (y_{i} - \sigma (z_{i}))$$
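
Since the $\sigma(z_i)(1-\sigma(z_i))$ factors cancel, the derivative with respect to the intercept reduces to $\sum_i (y_i - \sigma(z_i))$. The sketch below compares that expression with a central finite difference of the log-likelihood. It assumes a single feature with $z_i = w_0 + w_1 x_i$; the data, weights, and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(w0, w1, xs, ys):
    """l = sum_i [ y_i log s_i + (1 - y_i) log(1 - s_i) ], s_i = sigmoid(w0 + w1 * x_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(w0 + w1 * x)
        total += y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total

def dl_dw0(w0, w1, xs, ys):
    """Analytic derivative with respect to the intercept: sum_i (y_i - sigmoid(z_i))."""
    return sum(y - sigmoid(w0 + w1 * x) for x, y in zip(xs, ys))

# Hypothetical toy data and weights.
xs = [-2.0, -0.5, 0.3, 1.8]
ys = [0, 1, 0, 1]
w0, w1 = 0.2, 0.7

h = 1e-6
numeric = (log_lik(w0 + h, w1, xs, ys) - log_lik(w0 - h, w1, xs, ys)) / (2 * h)
analytic = dl_dw0(w0, w1, xs, ys)
# numeric and analytic should agree to several decimal places
```

The same check works for any other weight $w_j$; the analytic derivative then picks up the extra factor $x_{ij}$ from $\partial z_i / \partial w_j$.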

$$ \frac{1}{n} \sum_{i=1}^n (y_{i}-f(x_{i}))^2 $$

$$ 