<div class="alert alert-success">
    <h1>Logistic Regression Algorithm</h1>
</div>

In [1]:
#importing the essential libraries

import numpy as np # numpy imported for vectorization

## Mathematical expression of the algorithm ##

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$ 
Loss:
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})\tag{3}$$

The cost is then computed by averaging the loss over all $m$ training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{4}$$
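
As a small worked illustration of equations (3) and (4), the sketch below evaluates the loss for three examples and averages it into the cost. The activation and label values are made up for illustration and do not come from any dataset used here.

```python
import numpy as np

# Hypothetical activations a^(i) and labels y^(i) for m = 3 examples
a = np.array([0.9, 0.2, 0.6])
y = np.array([1.0, 0.0, 1.0])

# Per-example loss, equation (3)
losses = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Cost, equation (4): the mean of the per-example losses
J = losses.mean()

print(losses)
print(J)
```

Confident correct predictions (such as $a = 0.9$ for $y = 1$) contribute a small loss, while less confident ones (such as $a = 0.6$ for $y = 1$) contribute more.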


<div class="alert alert-warning">
    <h3>Sigmoid function</h3>
</div>

$sigmoid(x) = \frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning.

<img src="images/Sigmoid.png" style="width:500px;height:228px">

$$ \text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix} $$


In [2]:
# creating the sigmoid function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

In [3]:
## TEST

w, b, X = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]])

sigmoid(np.dot(w.T,X) + b)

array([[0.99987661, 0.99999386, 0.00449627]])
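
For inputs with a very large magnitude, `np.exp(-x)` can overflow and NumPy emits a runtime warning. One possible way around this, sketched below, is to pick an algebraically equivalent form depending on the sign of the input. The name `stable_sigmoid` is a hypothetical helper, not part of this notebook, and the sketch assumes array inputs.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid for array inputs that avoids overflow in np.exp (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)), where exp(x) < 1
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

z = np.array([-1000.0, -5.4, 0.0, 9.0, 1000.0])
print(stable_sigmoid(z))
```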

### Sigmoid gradient
The sigmoid gradient is computed to optimize loss functions using **back propagation**.
The formula is: $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{1}$$

Useful consideration:
$$s = \sigma(x)\tag{2}$$
$$\sigma'(x) = s(1-s)\tag{3}$$


In [4]:
# computing the sigmoid derivative
def sigmoid_derivative(x):
    s = 1/(1 + np.exp(-x)) # evaluate the sigmoid once and reuse it
    return s*(1 - s)
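
One way to gain confidence in the derivative formula is to compare it with a central finite difference. The sketch below restates both functions so that it runs on its own; the step size `eps` and the test points are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-2.0, 0.0, 3.0])
eps = 1e-6  # step size for the central difference (arbitrary small value)

# Numerical slope of the sigmoid around each point
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

# Largest disagreement between the two estimates
print(np.max(np.abs(numeric - analytic)))
```

The derivative peaks at $x = 0$, where $\sigma(0) = 0.5$ and so $\sigma'(0) = 0.25$.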

### Initializing parameters w and b
So that the computations can be vectorized, the weights `w` are stored as a NumPy column vector of zeros with shape `(Nx, 1)`, and the bias `b` is a scalar initialized to 0.

In [5]:
# initialization of w and b
def initialize_param_zero(Nx):
    
    w = np.zeros([Nx,1])
    b = 0
    
    #assert(w.shape == (Nx, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b
    

In [6]:
w, b = initialize_param_zero(2)
print(w,'\n\n',b)

[[0.]
 [0.]] 

 0


<div class="alert alert-warning">
    <h3>Forward and Back propagation</h3>
</div>

In forward propagation, the activations (the predicted outputs) and the cost are computed. In back propagation, the gradients of the cost with respect to $w$ and $b$ are computed.

Formulas:
- Activation:    $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})\tag{1}$
- Cost:  $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]\tag{2}$

Gradients:
$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{3}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{4}$$
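
The gradient formulas follow from the chain rule. As a short derivation sketch for a single example (dropping the superscript $(i)$), with $a = \sigma(z)$ and $\sigma'(z) = a(1-a)$:

$$ \frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{\partial a}{\partial z} = a(1-a) $$
$$ \frac{\partial \mathcal{L}}{\partial z} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right) a(1-a) = -y(1-a) + (1-y)a = a - y $$

Since $z = w^T x + b$, we have $\frac{\partial z}{\partial w} = x$ and $\frac{\partial z}{\partial b} = 1$, so each example contributes $x(a - y)$ to the gradient with respect to $w$ and $(a - y)$ to the gradient with respect to $b$. Averaging these contributions over the $m$ examples gives the two gradient formulas.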

In [7]:
# creating the function for the propagation
def propagate(w, b, X, Y):
    
    m = X.shape[1] # number of training examples
    
    # FORWARD PROPAGATION
    A = sigmoid(np.dot(w.T,X) + b) # computing activation
    J = (-1/m)*np.sum(Y*np.log(A) + (1 - Y)*np.log(1 - A)) # computing the cost
    
    # BACK PROPAGATION - Computing the gradients
    dw = (1/m)*np.dot(X, (A - Y).T)
    db = (1/m)*np.sum(A - Y)
    
    assert(dw.shape == w.shape) # the gradient has the same shape as w
    assert(db.dtype == float)
    cost = np.squeeze(J)
    assert(cost.shape == ())
    
    gradnts = {"dw" : dw, "db" : db}
    
    return gradnts, cost

In [8]:
## TEST

w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

dw = [[0.99845601]
 [2.39507239]]
db = 0.001455578136784208
cost = 5.801545319394553
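
The analytic gradients can also be checked against finite differences of the cost. The following is a minimal sketch under a few assumptions: it restates the cost and gradient formulas in a hypothetical helper `cost_and_grads` so that it runs on its own, and it uses small illustrative parameter values so that the sigmoid does not saturate, which would make the numerical estimate less precise.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost_and_grads(w, b, X, Y):
    """Cost and gradients from the formulas above (hypothetical helper)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return J, dw, db

# Illustrative values only
w = np.array([[0.1], [-0.2]])
b = 0.3
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])

J, dw, db = cost_and_grads(w, b, X, Y)

eps = 1e-6  # finite-difference step (arbitrary small value)

# Perturb one weight at a time and measure the change in cost
dw_numeric = np.zeros_like(w)
for k in range(w.shape[0]):
    step = np.zeros_like(w)
    step[k, 0] = eps
    J_plus = cost_and_grads(w + step, b, X, Y)[0]
    J_minus = cost_and_grads(w - step, b, X, Y)[0]
    dw_numeric[k, 0] = (J_plus - J_minus) / (2 * eps)

# Same check for the bias
db_numeric = (cost_and_grads(w, b + eps, X, Y)[0]
              - cost_and_grads(w, b - eps, X, Y)[0]) / (2 * eps)

print(np.max(np.abs(dw - dw_numeric)), abs(db - db_numeric))
```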


### Optimization

The goal is to learn $w$ and $b$ by minimizing the cost function $J$. 
- For a parameter $\theta$, the update rule is $ \theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.
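
To see the update rule in isolation, here is a minimal sketch that applies it to a one-dimensional toy function, $f(\theta) = (\theta - 3)^2$, rather than to the logistic regression cost. The function, the starting point and the learning rate are illustrative assumptions.

```python
# Toy problem: minimize f(theta) = (theta - 3)^2, whose derivative is 2 * (theta - 3)
theta = 0.0   # starting point (arbitrary)
alpha = 0.1   # learning rate (arbitrary)

for _ in range(100):
    dtheta = 2 * (theta - 3)        # gradient of f at the current theta
    theta = theta - alpha * dtheta  # update rule

print(theta)  # approaches the minimizer theta = 3
```

Each step shrinks the distance to the minimizer by a constant factor here; with a learning rate that is too large, the same rule would overshoot and diverge.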

In [9]:
# creating the optimization function
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    
    costs = []
    
    for i in range(num_iterations):
        
        
        # Cost and gradient calculation
        gradnts, cost = propagate(w, b, X, Y)
        
        # Retrieve derivatives from gradnts
        dw = gradnts["dw"]
        db = gradnts["db"]
        
        # update rule 
        w = w - learning_rate*dw
        b = b - learning_rate*db
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    gradnts = {"dw": dw,
             "db": db}
    
    return params, gradnts, costs

In [10]:
## TEST

params, gradnts, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(gradnts["dw"]))
print ("db = " + str(gradnts["db"]))

w = [[0.19033591]
 [0.12259159]]
b = 1.9253598300845747
dw = [[0.67752042]
 [1.41625495]]
db = 0.21919450454067652


### Predicting the labels

The activations are computed and then converted to labels: an activation greater than 0.5 is mapped to 1, and an activation less than or equal to 0.5 is mapped to 0.

In [11]:
# creating the function to predict
def predict(w, b, X):
    
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    A = sigmoid(np.dot(w.T,X) + b)
    
    for i in range(A.shape[1]):
        if A[0,i] <= 0.5:
            Y_prediction[0,i] = 0
        else:
            Y_prediction[0,i] = 1
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction

In [12]:
# TEST

w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))

predictions = [[1. 1. 0.]]
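
The thresholding loop can also be written as a single vectorized comparison. The sketch below shows one way to do that; `predict_vectorized` is a hypothetical name, and the sigmoid is restated so that the block runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict_vectorized(w, b, X):
    """Label each column of X as 1.0 if its activation exceeds 0.5, else 0.0."""
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    # Boolean comparison on the whole row, converted to floats
    return (A > 0.5).astype(float)

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print("predictions = " + str(predict_vectorized(w, b, X)))  # predictions = [[1. 1. 0.]]
```

With the same inputs as the test above, this gives the same labels as the loop version.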


<div class="alert alert-info">
    <h2>The Resulting Model</h2>
All the functions are brought together to build the model.
</div>

In [13]:
# The model combines the functions created above:
# initialize the parameters, optimize them, then predict on both sets

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    
    w, b = initialize_param_zero(X_train.shape[0])

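    # train on the training labels, using the hyperparameters passed to model()
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations=num_iterations, learning_rate=learning_rate, print_cost=print_cost)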
    
    w = parameters["w"]
    b = parameters["b"]
    print(w.shape)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    
    print("Train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("Test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d
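
The accuracy expression used in `model` relies on labels and predictions both being 0 or 1, so the mean absolute difference is the fraction of misclassified examples. The sketch below, with made-up labels, compares it with a direct count of matches.

```python
import numpy as np

# Hypothetical true labels and predictions for 5 examples
Y_true = np.array([[1, 0, 1, 1, 0]])
Y_pred = np.array([[1., 0., 0., 1., 0.]])

# Expression used in model(): 100 minus the percentage of mismatches
acc_from_abs = 100 - np.mean(np.abs(Y_pred - Y_true)) * 100

# Equivalent formulation: percentage of exact matches
acc_from_match = np.mean(Y_pred == Y_true) * 100

print(acc_from_abs, acc_from_match)
```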

In [14]:
# TESTING THE MODEL 

from utils import load_dataset

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

In [15]:
# flatten each image into one column, giving shape (features, examples)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

In [16]:
train_set_x.shape

(12288, 209)
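
The functions defined earlier take `X` with one column per example, that is, shape (number of features, number of examples), as in the small tests. The sketch below uses a synthetic image batch to show how flattening followed by a transpose produces that layout. It assumes the images are stacked along the first axis; the array sizes are made up and `load_dataset` is not used.

```python
import numpy as np

# Synthetic stand-in for an image batch: 5 images of 4x4 pixels with 3 channels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 4, 4, 3))

# Flatten each image into a row, then transpose so that each image is a column
flat = images.reshape(images.shape[0], -1).T

# Scale pixel values from [0, 255] to [0, 1]
scaled = flat / 255.

print(flat.shape)  # (48, 5)
```

Column `k` of `flat` holds all pixel values of image `k`, so `X.shape[1]` is the number of examples, as `propagate` and `predict` expect.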

In [None]:
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

<div class="alert alert-success">
    <center><i>Written on Pydroid3</i></center>
</div>