### Logistic Regression - Model Representation
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = \text{sigmoid}(z^{(i)})\tag{2}$$

Training learns the weights $w$ and bias $b$ that minimize the cost function.
The cost is the average cross-entropy loss over all $m$ training examples:

$$ J = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1-y^{(i)}) \log(1-\hat{y}^{(i)}) \right]\tag{3}$$
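
As a quick sanity check of the cost, the sketch below evaluates it by hand-sized example: each example contributes $y \log(\hat{y}) + (1-y)\log(1-\hat{y})$, and the cost is the negative mean of those terms. The function name `binary_cross_entropy` and the three toy values are assumptions made up for illustration; they do not come from the data set used later.

```python
import numpy as np

def binary_cross_entropy(y, y_hat):
    """Average cross-entropy cost for labels y and predicted probabilities y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Per-example term: y*log(y_hat) + (1 - y)*log(1 - y_hat), then negative mean
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical toy values, chosen only for illustration
y_toy = np.array([1, 0, 1])
y_hat_toy = np.array([0.9, 0.2, 0.6])

cost_toy = binary_cross_entropy(y_toy, y_hat_toy)
print(cost_toy)  # -(ln 0.9 + ln 0.8 + ln 0.6) / 3, about 0.2798
```

Confident, correct predictions (such as 0.9 for a positive label) add little to the cost, while the hesitant 0.6 contributes most of it.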


In [130]:
import numpy as np

np.set_printoptions(suppress = True,
   formatter = {'float_kind':'{:f}'.format})

In [131]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

In [132]:
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))

sigmoid([0, 2]) = [0.500000 0.880797]
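
The one-line `sigmoid` evaluates `np.exp(-x)`, which overflows in float64 once `x` drops below roughly -709. The result is still a valid 0, but NumPy emits a warning. The following is a minimal sketch of one common workaround, not the notebook's own implementation; the name `stable_sigmoid` is an assumption introduced here.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only exponentiates non-positive numbers, so np.exp cannot overflow."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # always in [0, 1]
    # For x >= 0: 1 / (1 + e^-x). For x < 0: e^x / (1 + e^x), the same value rewritten.
    return np.where(x >= 0, 1 / (1 + e), e / (1 + e))

values = stable_sigmoid(np.array([-1000.0, 0.0, 2.0, 1000.0]))
print(values)
```

Both branches are algebraically identical to `1/(1+np.exp(-x))`; the rewrite only changes which exponent is computed.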


### Gradient Descent for Logistic Regression
$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J}{\partial w_j} \tag{1}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and the partial derivatives are

$$
\begin{align}
\frac{\partial J}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (\hat{y}^{(i)} - y^{(i)})x_{j}^{(i)} \tag{2}  \\
\frac{\partial J}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (\hat{y}^{(i)} - y^{(i)}) \tag{3}
\end{align}
$$
* m is the number of training examples in the data set

* $\hat{y}^{(i)}$ is the model's prediction, while $y^{(i)}$ is the target value for the i-th example:

$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = \text{sigmoid}(z^{(i)})\tag{2}$$
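
One way to gain confidence in the two gradient formulas is to compare them with central finite differences of the cost. The sketch below does that on small random data. All function names, the random inputs, and the step size `h` are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, y, w, b):
    # Average cross-entropy over all examples
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_gradients(X, y, w, b):
    # Closed-form gradients: mean of (y_hat - y) * x_j, and mean of (y_hat - y)
    m = X.shape[0]
    error = sigmoid(X @ w + b) - y
    return X.T @ error / m, np.sum(error) / m

def numeric_gradients(X, y, w, b, h=1e-6):
    # Central differences: (J(p + h) - J(p - h)) / (2h) for each parameter p
    dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * h)
    db = (cost(X, y, w, b + h) - cost(X, y, w, b - h)) / (2 * h)
    return dw, db

# Small random problem, used only for the check
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(20, 3))
y_chk = (rng.random(20) > 0.5).astype(float)
w_chk = rng.normal(size=3)
b_chk = 0.1

dw_a, db_a = analytic_gradients(X_chk, y_chk, w_chk, b_chk)
dw_n, db_n = numeric_gradients(X_chk, y_chk, w_chk, b_chk)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))
```

If the formulas are implemented correctly, both printed differences should be tiny compared with the gradients themselves.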
  


In [133]:

# Perform one gradient descent step and return the updated parameters
def gradient_descend(X, y, w, b, lr):
    # Get the number of samples
    m = X.shape[0]
    y_pred = np.dot(X, w) + b
    y_pred = sigmoid(y_pred)
    #print ("y_pred", y_pred)

    dw = np.dot(X.T, (y_pred-y)) / m
    #print ("dw", dw)
    db = np.sum(y_pred-y) / m
    #print ("db", db)

    # Update both parameters using the gradients computed above
    w = w - (lr * dw)
    b = b - (lr * db)
    return w, b

In [134]:
def log_loss(y, y_pred):
    # Calculate log loss (cross-entropy loss)
    epsilon = 1e-15  # Small value to avoid log(0)
    m = y.shape[0]
    return -1/m * np.sum(y*np.log(y_pred + epsilon) + (1-y)*np.log(1-y_pred + epsilon))


In [135]:
def fit(X, y, lr = 0.01, n_iters = 1000):
    n_samples, n_features = X.shape
    # Parameters
    w = np.zeros(n_features)
    b = 0
    
    # Iteratively make updates
    for epoch in range(n_iters): 
        w, b = gradient_descend(X, y, w, b, lr)
        # Debugging - Calculate the cost and print it every 100 epochs
        if epoch % 100 == 0:
            y_pred = np.dot(X, w) + b
            y_pred = sigmoid(y_pred)
            # compute cost
            cost = log_loss(y, y_pred)
            print(f'After {epoch} iterations the cost is {cost}')
    
    return w, b

In [136]:
def predict(X, w, b):
    y_pred = np.dot(X, w) + b
    y_pred = sigmoid(y_pred)
    return y_pred

In [137]:
def accuracy(y_pred, y_test):
    return np.sum(y_pred == y_test) / len(y_test)
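
Since `predict` returns probabilities, they have to be turned into 0/1 labels before `accuracy` is meaningful. A minimal vectorized sketch of that step follows. The helper names `to_labels` and `accuracy_np`, the 0.5 default threshold, and the toy values are assumptions for illustration.

```python
import numpy as np

def to_labels(probabilities, threshold=0.5):
    """Map probabilities to class labels: 1 if strictly above the threshold, else 0."""
    return (np.asarray(probabilities) > threshold).astype(int)

def accuracy_np(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Hypothetical toy values
probs_toy = np.array([0.1, 0.7, 0.5, 0.9])
labels_toy = to_labels(probs_toy)                 # 0.5 is not above the threshold, so it maps to 0
acc_toy = accuracy_np([0, 1, 1, 1], labels_toy)
print(labels_toy, acc_toy)
```

Converting both arguments with `np.asarray` keeps the comparison elementwise even when one of them is a plain Python list.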

### Testing the Model

In [138]:
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

# Print the first 2 samples
print("X_train - 2 samples", X_train[:2])
print("y_train - 2 values", y_train[:2])

w, b = fit(X_train, y_train, lr=0.01)
y_pred = predict(X_test, w, b)
# Convert the probabilities to binary values
y_pred = [1 if i > 0.5 else 0 for i in y_pred]
acc = accuracy(y_pred, y_test)

print("Weights: ", w)
print("Bias: ", b)
print("y_pred", y_pred)
print("Accuracy: ", acc)

X_train - 2 samples [[12.880000 18.220000 84.450000 493.100000 0.121800 0.166100 0.048250
  0.053030 0.170900 0.072530 0.442600 1.169000 3.176000 34.370000
  0.005273 0.023290 0.014050 0.012440 0.018160 0.003299 15.050000
  24.370000 99.310000 674.700000 0.145600 0.296100 0.124600 0.109600
  0.258200 0.088930]
 [11.130000 22.440000 71.490000 378.400000 0.095660 0.081940 0.048240
  0.022570 0.203000 0.065520 0.280000 1.467000 1.994000 17.850000
  0.003495 0.030510 0.034450 0.010240 0.029120 0.004723 12.020000
  28.260000 77.800000 436.600000 0.108700 0.178200 0.156400 0.064130
  0.316900 0.080320]]
y_train - 2 values [1 1]
After 0 iterations the cost is 21.86190681699841
After 100 iterations the cost is 21.853509003667817
After 200 iterations the cost is 5.374454308137391
After 300 iterations the cost is 6.518523819521247
After 400 iterations the cost is 5.083635581070126
After 500 iterations the cost is 3.3803829891794175
After 600 iterations the cost is 4.738829858040198
After 700 ite

  return 1/(1+np.exp(-x))
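
The logged cost rises and falls instead of decreasing steadily. One plausible cause, though not one confirmed by this notebook, is that the raw features span very different scales (some columns are in the hundreds, others below 0.01), so $z$ becomes large and the sigmoid saturates. A common remedy is to standardize the features using statistics from the training split only. The sketch below uses a hypothetical helper `standardize` and synthetic stand-in data rather than the breast cancer data set.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale both splits with the mean and standard deviation of the training split."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X_train - mean) / std, (X_test - mean) / std

# Synthetic stand-in data with very different column scales (illustrative only)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=[10.0, 500.0], scale=[2.0, 150.0], size=(100, 2))
X_test_demo = rng.normal(loc=[10.0, 500.0], scale=[2.0, 150.0], size=(20, 2))

X_train_s, X_test_s = standardize(X_train_demo, X_test_demo)
print(X_train_s.mean(axis=0), X_train_s.std(axis=0))
```

In this notebook the helper would be applied just before `fit`, for example `X_train, X_test = standardize(X_train, X_test)`. Whether that removes the oscillation on this data set has not been checked here.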
