# Logistic Regression 
- Logistic regression is a supervised learning model for classification; it outputs the probability that an example belongs to the positive class ($y = 1$)

### Sigmoid function
$$ g(z) = \frac{1}{1+e^{-z}}$$
Intuition from the formula:
- if $z$ is a large positive number, $g(z)$ approaches $1$, because $e^{-z} \to 0$
- if $z$ is a large negative number, $g(z)$ approaches $0$, because $e^{-z} \to \infty$
- if $z = 0$, then $g(z) = 0.5$
![image.png](attachment:c978f4a8-a2fb-4f52-bcf2-e47cdcb46c01.png)
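
The three cases above can be checked numerically. The following is a minimal sketch in plain Python; the function name `sigmoid` and the branch on the sign of `z` (used only to keep `math.exp` from overflowing) are choices made for this illustration.

```python
import math

def sigmoid(z):
    """Return g(z) = 1 / (1 + e^(-z)) for a scalar z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z),
    # so exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative z -> close to 0, z = 0 -> 0.5, large positive z -> close to 1
for z in (-10, 0, 10):
    print(f"g({z}) = {sigmoid(z):.5f}")
```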

- here, $ g(z) = f_{\vec w, b}(\vec x)$
- $ z = \vec w \cdot \vec x + b$
$$ g(z) = g(\vec w \cdot \vec x +b) = \frac{1}{1+e^{-(\vec w \cdot \vec x +b)}} $$

### Decision Boundary
- The decision boundary is the boundary that separates the predicted classes
- It is the set of points where $z = 0$, i.e. where $g(z) = 0.5$: the model predicts $\hat y = 1$ when $z \geq 0$ and $\hat y = 0$ when $z < 0$
#### For a linear decision boundary
$$ z = \vec w \cdot \vec x + b = 0$$
- For example: with two features $x_1$ and $x_2$
$$ w_1 x_1 + w_2 x_2 + b = 0 $$
$$ \text{when } w_1 = 1,\ w_2 = 1 \text{ and } b = -3 $$
- The decision boundary is: $x_1 + x_2 = 3$
![image.png](attachment:52b7e1c1-6ae1-435e-a1d2-a8c08ab32844.png)

- In the figure, $\hat y = 1$ when $x_1 + x_2 \geq 3$; otherwise $\hat y = 0$
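
As a rough illustration of this rule, the sketch below evaluates the model with $w_1 = w_2 = 1$ and $b = -3$ at a few points. The helper name `predict`, the `threshold` parameter, and the sample points are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Return (probability, predicted label) for one example x."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    prob = sigmoid(z)
    # prob >= 0.5 is equivalent to z >= 0
    return prob, int(prob >= threshold)

w, b = [1.0, 1.0], -3.0  # parameters from the example above

# One point below the line x1 + x2 = 3, one on it, one above it
for x in ([1.0, 1.0], [1.0, 2.0], [3.0, 2.0]):
    prob, label = predict(x, w, b)
    print(x, round(prob, 3), label)
```

The point on the line gets probability exactly $0.5$, which is why the boundary is defined by $z = 0$.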

#### For a non-linear (polynomial) decision boundary
$$ z = 0 $$
- For example: with polynomial terms in two features $x_1$ and $x_2$
$$ z = w_1 x_1^2 + w_2 x_2^2 + b $$
$$ \text{when } w_1 = w_2 = 1 \text{ and } b = -1 $$
- The decision boundary is: $x_1^2 + x_2^2 = 1$
![image.png](attachment:c2a61bf9-9840-4b43-b590-63884d786d7d.png)
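
The circular boundary can be checked the same way. This is a small sketch under the example's parameters; the function names `z_circle` and `predict_label` and the test points are hypothetical.

```python
def z_circle(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    """z = w1*x1^2 + w2*x2^2 + b, with the example's parameters as defaults."""
    return w1 * x1**2 + w2 * x2**2 + b

def predict_label(x1, x2):
    # z >= 0 is equivalent to g(z) >= 0.5, so the sigmoid is not needed here
    return int(z_circle(x1, x2) >= 0)

# A point inside the unit circle, one on it, and one outside it
for point in ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)):
    print(point, z_circle(*point), predict_label(*point))
```

With these parameters, points inside the circle are predicted as $\hat y = 0$ and points on or outside it as $\hat y = 1$.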

> Note: With higher-order polynomial terms in $z$, much more complex decision boundaries can be achieved

# Cost function for Logistic Regression
$$ J(\vec w, b) = \frac{1}{m} \sum_{i=0}^{m-1} L\left(f_{\vec w, b}(\vec x^{(i)}), y^{(i)}\right) $$
- where loss function $ L\left(f_{\vec w, b}(\vec x^{(i)}), y^{(i)}\right)$ is given as:

$$ L\left(f_{\vec w, b}(\vec x^{(i)}), y^{(i)}\right) = \begin{cases} -\log \left(f_{\vec w,b}(\vec x^{(i)})\right) & \text{if } y^{(i)} = 1 \\ -\log \left(1-f_{\vec w,b}(\vec x^{(i)})\right) & \text{if } y^{(i)} = 0 \end{cases} $$

- The formula can be simplified to:
$$ L\left(f_{\vec w, b}(\vec x^{(i)}), y^{(i)}\right) = \left[- y^{(i)}\log \left(f_{\vec w,b}(\vec x^{(i)})\right) - \left(1-y^{(i)}\right)\log \left(1 -f_{\vec w,b} (\vec x^{(i)})\right)\right] $$
- The first term vanishes when the actual label $y^{(i)}$ is $0$, and the second term vanishes when $y^{(i)}$ is $1$, so this single expression reproduces both cases
- Hence, the cost function for logistic regression becomes:
$$ J(\vec w, b) = - \frac{1}{m} \sum_{i=0}^{m-1} \left[y^{(i)}\log \left(f_{\vec w,b}(\vec x^{(i)})\right) + \left(1-y^{(i)}\right)\log \left(1 -f_{\vec w,b} (\vec x^{(i)})\right)\right] $$
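
The cost formula translates almost directly into code. The sketch below uses plain Python lists; the function name `compute_cost` and the toy data set are assumptions made for illustration, not data from these notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def compute_cost(X, y, w, b):
    """Average logistic loss J(w, b) over the m examples in X."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        z_i = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        f_i = sigmoid(z_i)
        # Only one of the two terms is non-zero, depending on the label y_i
        total += -y_i * math.log(f_i) - (1 - y_i) * math.log(1 - f_i)
    return total / m

# Hypothetical toy data: two features per example, labels 0 or 1
X = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]]
y = [0, 0, 0, 1, 1, 1]

cost_zero = compute_cost(X, y, w=[0.0, 0.0], b=0.0)   # every prediction is 0.5, so J = log(2)
cost_line = compute_cost(X, y, w=[1.0, 1.0], b=-3.0)  # boundary x1 + x2 = 3
print(cost_zero, cost_line)
```

On this toy data the parameters $w_1 = w_2 = 1$, $b = -3$ give a lower cost than all-zero parameters, since they separate the two classes.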

![image.png](attachment:9e0099dd-5e97-4949-a5a4-b95731ee0af2.png)

- Since the prediction of logistic regression is always between $0$ and $1$, the loss only needs to be considered over that range. When the actual label $y=1$, the loss $-\log(f)$ looks like:
![image.png](attachment:c10ec34e-4bde-4546-9294-017c8ee62f67.png)

- When the actual label $y=0$, the loss $-\log(1-f)$ looks like:
![image.png](attachment:20665bd8-9bfd-425a-a910-a2da2d83a55c.png)

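## Intuition behind the above two figures
```python
import math

def loss(y_predict, y):
    """Logistic loss for one example; y is the actual label (0 or 1)."""
    if y == 1:
        # y_predict closer to zero -> high error
        # y_predict closer to one  -> low error
        return -math.log(y_predict)
    # y == 0:
    # y_predict closer to zero -> low error
    # y_predict closer to one  -> high error
    return -math.log(1 - y_predict)

# Compare a prediction near 0 and a prediction near 1 for each label
for y in (1, 0):
    for y_predict in (0.1, 0.9):
        print(f"y={y}, y_predict={y_predict}: loss={loss(y_predict, y):.3f}")
```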

# Gradient Descent in Logistic Regression
- The gradient descent update rules are:

$$ w_j = w_j - \alpha \frac{\partial J(\vec w, b)}{\partial w_j} \tag 1$$
$$ b = b - \alpha \frac{\partial J(\vec w, b)}{\partial b} \tag 2$$
- where the partial derivatives of the cost are:
$$ \frac{\partial J(\vec w, b)}{\partial w_j} = \frac{1}{m}\sum_{i=0}^{m-1}{\left(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\right) x_j^{(i)}} \tag 3$$
$$ \frac{\partial J(\vec w, b)}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}{\left(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\right)} \tag 4$$

- Substituting equations $(3)$ and $(4)$ into $(1)$ and $(2)$, and updating $w_j$ (for every $j$) and $b$ simultaneously until convergence:
$$ \text{Repeat until convergence } \{ $$
$$ w_j = w_j - \alpha \frac{1}{m}\sum_{i=0}^{m-1}{\left(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\right) x_j^{(i)}}$$
$$ b = b - \alpha \frac{1}{m}\sum_{i=0}^{m-1}{\left(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\right)}$$
$$\}$$
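
The update loop can be sketched in plain Python as follows. The function names, the toy data, the learning rate `alpha=0.1`, and the fixed iteration count (standing in for a real convergence check) are all assumptions made for this example.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # equivalent form that avoids overflow for negative z
    return ez / (1.0 + ez)

def compute_gradient(X, y, w, b):
    """Return (dJ/dw as a list, dJ/db) averaged over all examples."""
    m, n = len(X), len(w)
    dj_dw = [0.0] * n
    dj_db = 0.0
    for x_i, y_i in zip(X, y):
        err = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b) - y_i
        for j in range(n):
            dj_dw[j] += err * x_i[j]
        dj_db += err
    return [g / m for g in dj_dw], dj_db / m

def gradient_descent(X, y, w, b, alpha, num_iters):
    for _ in range(num_iters):
        # Both gradients are computed from the old (w, b) before either is
        # changed, which is what makes the update simultaneous.
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = [w_j - alpha * g_j for w_j, g_j in zip(w, dj_dw)]
        b = b - alpha * dj_db
    return w, b

# Hypothetical toy data: two features per example, labels 0 or 1
X = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]]
y = [0, 0, 0, 1, 1, 1]

w, b = gradient_descent(X, y, w=[0.0, 0.0], b=0.0, alpha=0.1, num_iters=1000)
print(w, b)
```

In practice the loop would usually stop when the cost or the parameters change by less than some tolerance, rather than after a fixed number of iterations.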

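# This looks exactly the same as in linear regression
- The update rules have the same form, but the model $f_{\vec w,b}(\vec x^{(i)})$ is different: linear regression uses $f_{\vec w,b}(\vec x^{(i)}) = \vec w \cdot \vec x^{(i)} + b$, whereas logistic regression uses
$$f_{\vec w,b}(\vec x^{(i)}) = \frac{1}{1 + e^{-(\vec w \cdot \vec x^{(i)} + b)}}$$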
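
One way to see why the gradient keeps the same form is to apply the chain rule to the loss of a single example. The derivation below is a sketch; the shorthand $f = f_{\vec w,b}(\vec x^{(i)})$, $y = y^{(i)}$ and $z = \vec w \cdot \vec x^{(i)} + b$ is introduced here only to keep it short.

$$
\begin{aligned}
L &= -y\log f - (1-y)\log(1-f), \qquad f = g(z) = \frac{1}{1+e^{-z}} \\
\frac{\partial L}{\partial f} &= -\frac{y}{f} + \frac{1-y}{1-f} = \frac{f-y}{f(1-f)} \\
\frac{\partial f}{\partial z} &= f(1-f), \qquad \frac{\partial z}{\partial w_j} = x_j^{(i)}, \qquad \frac{\partial z}{\partial b} = 1 \\
\frac{\partial L}{\partial w_j} &= \frac{\partial L}{\partial f}\,\frac{\partial f}{\partial z}\,\frac{\partial z}{\partial w_j} = (f-y)\,x_j^{(i)}, \qquad \frac{\partial L}{\partial b} = f-y
\end{aligned}
$$

Averaging these over the $m$ examples gives equations $(3)$ and $(4)$. The factor $f(1-f)$ from the sigmoid cancels against the denominator of $\partial L / \partial f$, which is why the result has the same form as in linear regression.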