# Logistic Regression Model

## Cost Function

The sum of squared errors (SSE) is a poor cost function here: applied to the sigmoid output, it gives a non-convex cost surface with many local minima where gradient descent can get stuck.

![ ](bad-cost.png)

Define the loss on a single training example as:

$loss = L(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)})$

Defining the loss as a piecewise function of the label makes the overall cost convex. The cost is the average of the per-example losses:

$J(\vec{w}, b) = \frac{1}{m}\sum_{i=1}^m L(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)})$

Where $L$ is defined as (yes, it removes the $\frac{1}{2}$ from the linear regression model):

$L(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}) = 
\begin{cases}
 -\log(f_{\vec{w},b}(\vec{x}^{(i)}))   &\text{if } y^{(i)} = 1\\
 -\log(1 - f_{\vec{w},b}(\vec{x}^{(i)}))   &\text{if } y^{(i)} = 0
\end{cases}$

Since $f$ is the output of logistic regression, it always lies between 0 and 1. Plotting the relevant part of the curve for $y^{(i)} = 1$, with the prediction $f$ on the x-axis and the loss $L$ on the y-axis:

![ ](neg-log.png)

As $f_{\vec{w},b}(\vec{x}^{(i)}) \to 1$ then $loss \to 0$  
As $f_{\vec{w},b}(\vec{x}^{(i)}) \to 0$ then $loss \to \infty$

Plotting the relevant part of the curve for $y^{(i)} = 0$, with the prediction $f$ on the x-axis and the loss $L$ on the y-axis:

![ ](one-minus-neg-log.png)

As $f_{\vec{w},b}(\vec{x}^{(i)}) \to 0$ then $loss \to 0$  
As $f_{\vec{w},b}(\vec{x}^{(i)}) \to 1$ then $loss \to \infty$
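
The limiting behaviour of the two branches can be checked numerically. This is a minimal sketch: the sample predictions in `f` are arbitrary illustrative values, not data from the lab.

```python
import numpy as np

# Arbitrary sample predictions strictly inside (0, 1), in increasing order.
f = np.array([0.001, 0.1, 0.5, 0.9, 0.999])

loss_if_y1 = -np.log(f)        # branch used when y = 1
loss_if_y0 = -np.log(1.0 - f)  # branch used when y = 0

for f_i, l1, l0 in zip(f, loss_if_y1, loss_if_y0):
    print(f"f = {f_i:5.3f}   loss(y=1) = {l1:6.3f}   loss(y=0) = {l0:6.3f}")
```

The `y = 1` column shrinks toward 0 as `f` approaches 1, and the `y = 0` column shrinks toward 0 as `f` approaches 0; each grows without bound at the opposite end.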

--- 

In both cases, the further the prediction $f_{\vec{w},b}(\vec{x}^{(i)})$ is from the target $y^{(i)}$, the higher the loss $L$. Plotting the cost after the piecewise loss function is applied shows a smooth, convex surface:

![ ](good-cost.png)
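
The difference in curvature between the two costs can be probed numerically. The sketch below assumes a hypothetical one-example dataset ($x = 1$, $y = 1$, no bias) and a locally defined `sigmoid`; it only shows where the squared-error curve stops being convex along one slice of $w$, not the many local minima that appear with more data.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical one-example dataset: x = 1, y = 1, no bias term.
w = np.linspace(-6.0, 6.0, 121)
f = sigmoid(w * 1.0)

squared_error_cost = 0.5 * (f - 1.0) ** 2  # squared error on the sigmoid output
log_loss_cost = -np.log(f)                 # y = 1 branch of the piecewise loss

# Discrete second differences approximate curvature on the grid.
# A negative entry means the curve bends downward there, so it is not convex.
sq_curvature = np.diff(squared_error_cost, n=2)
log_curvature = np.diff(log_loss_cost, n=2)

print("squared error has a concave region:", bool(np.any(sq_curvature < 0)))
print("log loss curves upward on the whole grid:", bool(np.all(log_curvature > 0)))
```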

Why this function? Statistics! Specifically, it comes from maximum likelihood estimation.
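
A short sketch of that connection, under the standard assumption (not spelled out in these notes) that each label is a Bernoulli draw with $P(y^{(i)} = 1 \mid \vec{x}^{(i)}) = f_{\vec{w},b}(\vec{x}^{(i)})$. Both label values can then be covered by one probability:

$P(y^{(i)} \mid \vec{x}^{(i)}) = f_{\vec{w},b}(\vec{x}^{(i)})^{y^{(i)}} (1 - f_{\vec{w},b}(\vec{x}^{(i)}))^{1 - y^{(i)}}$

Taking the negative logarithm gives

$-\log P(y^{(i)} \mid \vec{x}^{(i)}) = -y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)})) - (1 - y^{(i)})\log(1 - f_{\vec{w},b}(\vec{x}^{(i)}))$

which equals $-\log(f_{\vec{w},b}(\vec{x}^{(i)}))$ when $y^{(i)} = 1$ and $-\log(1 - f_{\vec{w},b}(\vec{x}^{(i)}))$ when $y^{(i)} = 0$, matching the piecewise loss. If the $m$ examples are treated as independent, maximizing the product of these probabilities is the same as minimizing the average of their negative logarithms, which is the cost $J$.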

## Simplifying the Cost Function

The piecewise loss can be written as a single expression, which is easier to use with gradient descent (GD).

Writing out the complete loss equation $L$ (equivalent to the piecewise form):

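$L(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}) = -y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)})) - (1 - y^{(i)})\log(1 - f_{\vec{w},b}(\vec{x}^{(i)}))$

Because $y^{(i)}$ is either 0 or 1, exactly one of the two terms is nonzero, which recovers the matching case of the piecewise form.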
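
The equivalence between the piecewise form and the single expression $-y\log(f) - (1 - y)\log(1 - f)$ can be spot-checked in code. This is a minimal sketch: `piecewise_loss` and `combined_loss` are hypothetical helper names, and the predictions tried are arbitrary.

```python
import numpy as np

def piecewise_loss(f, y):
    """Piecewise form: -log(f) if y == 1, -log(1 - f) if y == 0."""
    return -np.log(f) if y == 1 else -np.log(1.0 - f)

def combined_loss(f, y):
    """Single-expression form: -y*log(f) - (1 - y)*log(1 - f)."""
    return -y * np.log(f) - (1 - y) * np.log(1.0 - f)

# Compare both forms for each label on a few arbitrary predictions.
for f in (0.1, 0.5, 0.9):
    for y in (0, 1):
        print(f, y, piecewise_loss(f, y), combined_loss(f, y))
```

At exactly `f = 0` or `f = 1` the single expression multiplies 0 by `log(0)`, which NumPy evaluates to `nan`, so the sketch stays strictly inside (0, 1).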

Using the cost:

$J(\vec{w}, b) = \frac{1}{m}\sum_{i=1}^m L(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)})$

Plugging in $L$ gives the cost used for logistic regression with GD:

$J(\vec{w}, b) = -\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)})) + (1 - y^{(i)})\log(1 - f_{\vec{w},b}(\vec{x}^{(i)}))\right]$


```python
import numpy as np
# Notebook magic for interactive plots; it needs the ipympl package installed.
%matplotlib widget
import matplotlib.pyplot as plt
from lab_utils_common import plot_data, sigmoid, dlc
plt.style.use('./deeplearning.mplstyle')

X_train = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])  # (m,n)
y_train = np.array([0, 0, 0, 1, 1, 1])                                            # (m,)
```

```python
def compute_cost_logistic(X, y, w, b):
    """
    Computes the logistic regression cost (average loss over all examples)

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      cost (scalar): cost
    """

    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i], w) + b   # linear part of the model for example i
        f_wb_i = sigmoid(z_i)       # prediction in (0, 1)
        # loss for example i: -y*log(f) - (1 - y)*log(1 - f)
        cost += -y[i]*np.log(f_wb_i) - (1 - y[i])*np.log(1 - f_wb_i)

    cost = cost / m                 # average over the m examples
    return cost
```

```python
w_tmp = np.array([1, 1])
b_tmp = -3
print(compute_cost_logistic(X_train, y_train, w_tmp, b_tmp))
```
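
The same cost can also be computed without the explicit loop. This is a minimal vectorized sketch, not the lab's implementation: `compute_cost_logistic_vectorized` is a hypothetical name, and `sigmoid` is defined locally on the assumption that the helper in `lab_utils_common` is the standard logistic function.

```python
import numpy as np

def sigmoid(z):
    """Local stand-in for the lab helper (assumed: standard logistic function)."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic_vectorized(X, y, w, b):
    """Average of -y*log(f) - (1 - y)*log(1 - f) over all m examples."""
    f_wb = sigmoid(X @ w + b)                               # predictions, shape (m,)
    losses = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb) # per-example loss, shape (m,)
    return losses.mean()

# Same training data and parameters as in the cells above.
X_train = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

cost = compute_cost_logistic_vectorized(X_train, y_train, np.array([1, 1]), -3)
print(cost)  # about 0.367
```

Working on whole arrays avoids the Python-level loop, which matters more as `m` grows; on this six-example dataset the two versions should agree up to floating-point rounding.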