### Logistic regression from scratch
**We will:**
- Build the general architecture of a learning algorithm, including:
    - Initializing parameters
    - Calculating the cost function and its gradient
    - Using an optimization algorithm (gradient descent) 
- Gather all three functions above into a main model function, in the right order.

## 1 - Packages ##

Import all the packages needed in this notebook.
- [numpy](www.numpy.org) is the fundamental package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) is a widely used library for plotting graphs in Python.
- [scikit-learn](http://scikit-learn.org/stable/) is a library with simple and efficient tools for data mining and data analysis.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

%matplotlib inline
np.random.seed(42)

## 2 - Dataset ##

We will use a synthetic dataset generated by sklearn's `make_classification`.

The following code generates the data: 100,000 samples with 100 features, all of them informative, and two classes.

In [2]:
X,Y=datasets.make_classification(n_samples=100000, n_features=100,
                                    n_informative=100,n_classes=2, n_redundant=0,
                                    random_state=42)



## - Data-split  ##

We split the data as follows:
- 99%: training set
- 1%: test set

We use sklearn's `train_test_split` for this.

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.01,
                                                    random_state=42)

For convenience, we transpose the data so that each column represents one example. After this step, `X_train` has shape (n_features, m_train) and `X_test` has shape (n_features, m_test), while the labels `y_train` and `y_test` become row vectors of shape (1, m_train) and (1, m_test).

In [4]:
# Reshape so that each column is one example: X -> (n_features, m), y -> (1, m)
X_train=X_train.reshape(X_train.shape[0],-1).T
X_test=X_test.reshape(X_test.shape[0],-1).T
y_train=y_train.reshape(y_train.shape[0],-1).T
y_test=y_test.reshape(y_test.shape[0],-1).T
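
To make the layout concrete, here is a minimal sketch of the same reshape applied to a small made-up array with 4 examples and 3 features. The names `X_small`, `y_small`, `X_cols` and `y_row` are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical toy data in the layout returned by train_test_split:
# one row per example
X_small = np.arange(12, dtype=float).reshape(4, 3)   # shape (m, n_features)
y_small = np.array([0, 1, 1, 0])                      # shape (m,)

# Same reshape as in the cell above: one column per example
X_cols = X_small.reshape(X_small.shape[0], -1).T      # shape (n_features, m)
y_row = y_small.reshape(y_small.shape[0], -1).T       # shape (1, m)

print(X_cols.shape, y_row.shape)
```

The first column of `X_cols` holds the features of the first example, and `y_row` holds one label per column.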


## 3 - Building the parts of our algorithm ## 

The main steps for building the model (the same ones used for a neural network) are:
1. Define the model structure (such as number of input features) 
2. Initialize the model's parameters
3. Loop:
    - Calculate current loss (forward propagation)
    - Calculate current gradient (backward propagation)
    - Update parameters (gradient descent)

You often build 1-3 separately and integrate them into one function we call `model()`.

### 3.1 - Helper functions

**Sigmoid:** we implement `sigmoid()`, which computes $\sigma(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$ and is used to make predictions.

In [5]:
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s=1/(1+np.exp(-z))
    return s
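
For inputs that are very negative, `np.exp(-z)` overflows and NumPy emits a warning, even though the result still rounds to 0. The sketch below shows one possible overflow-free variant based on `np.logaddexp`. The name `stable_sigmoid` is an assumption for illustration and is not used elsewhere in this notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid written as exp(-log(1 + exp(-z))), evaluated with np.logaddexp."""
    # np.logaddexp(0, -z) computes log(exp(0) + exp(-z)) without overflowing
    return np.exp(-np.logaddexp(0, -z))

z = np.array([-2.0, 0.0, 3.0])
print(stable_sigmoid(z))          # agrees with 1 / (1 + np.exp(-z))
print(stable_sigmoid(-1000.0))    # no overflow for a very negative input
```

For the moderate values of `z` that appear in this notebook, the two versions give the same result up to floating-point rounding.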

### 3.2 - Initializing parameters

**Parameter initialization:** we initialize `w` as a vector of zeros and `b` as 0.

In [6]:
def parameter_initiliazation(dimension):
    """
    This function creates a vector of zeros of shape (dimension, 1) for w and initializes b to 0.
    
    Argument:
    dimension -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dimension, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    w=np.zeros((dimension,1),dtype=float)
    b=0
    return w,b

### 3.3 - Forward propagation

**Forward propagation:** implementing the forward pass
- We get X
- We compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$


In [7]:
def forward_propagation(X,W,b):
    """
    This function runs forward propagation by computing Z and then applying the sigmoid function
    
    Arguments:
        X -> input matrix
        W -> weights vector
        b -> bias scalar
    Returns:
        A -> activations vector
    """
    Z=np.dot(W.T,X)+b
    A=sigmoid(Z)
    return A
    

### 3.4 - Compute cost

**Cost function:** computing the cost
- We calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$



In [8]:
def compute_cost(A,Y,m):
    """
    This function computes the cost of the logistic regression
    
    
    Arguments:
        A-> Activations from the forward propagation
        Y-> The correct labels 
        m-> The number of examples in the set
    Returns:
        cost-> logistic regression cost
    """
    cost=-(1/m)*np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
    return cost
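
If an activation ever reaches exactly 0 or 1, `np.log` returns `-inf` and the cost can become `nan`. One common guard, sketched below, is to clip the activations before taking the logarithm. The function name `compute_cost_clipped` and the value of `eps` are assumptions made for this example.

```python
import numpy as np

def compute_cost_clipped(A, Y, m, eps=1e-12):
    """Cross-entropy cost with activations clipped away from 0 and 1 (eps is an assumed value)."""
    A = np.clip(A, eps, 1 - eps)
    return -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# With zero weights every activation is 0.5, so the cost is log(2)
Y = np.array([[1, 0, 1, 0]])
A_half = np.full((1, 4), 0.5)
cost_half = compute_cost_clipped(A_half, Y, m=4)
print(cost_half)

# Saturated activations of exactly 1.0 or 0.0 would give 0 * log(0) = nan without clipping
A_saturated = np.array([[1.0, 0.0, 1.0, 0.0]])
cost_saturated = compute_cost_clipped(A_saturated, Y, m=4)
print(cost_saturated)
```

The first value, log(2) ≈ 0.6931, is also the cost printed at iteration 0 when the model is trained later in this notebook, because the weights start at zero.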

 
### 3.5 - Back propagation

**Back propagation:** We compute the gradients using the following formulas. These can be verified with calculus (partial derivatives).
$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$
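
As a sketch of where these formulas come from, write $z^{(i)} = w^T x^{(i)} + b$ and $a^{(i)} = \sigma(z^{(i)})$, and let $\mathcal{L}^{(i)}$ denote the loss of a single example (a symbol introduced here for convenience, so that $J = \frac{1}{m}\sum_{i=1}^m \mathcal{L}^{(i)}$). Using $\sigma'(z) = \sigma(z)(1-\sigma(z))$:

$$ \frac{\partial \mathcal{L}^{(i)}}{\partial z^{(i)}} = \left(-\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}\right) a^{(i)}(1-a^{(i)}) = a^{(i)} - y^{(i)} $$

Since $\frac{\partial z^{(i)}}{\partial w} = x^{(i)}$ and $\frac{\partial z^{(i)}}{\partial b} = 1$, averaging over the m examples gives the two formulas above.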

In [9]:
def back_propagation(A,X,Y,m):
    """
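    This function computes the gradients of the cost with respect to w and b

    Arguments:
        A-> activations of shape (1, m); X-> input matrix of shape (n_features, m)
        Y-> correct labels of shape (1, m); m-> number of examples in the set
    Returns:
        dw-> gradient for w, shape (n_features, 1); db-> gradient for b, a scalar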
    """
    dw=1/m*np.dot(X,(A-Y).T)
    db=1/m*np.sum(A-Y)
    return dw,db
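
The gradient formulas can also be checked numerically. The sketch below compares the analytic gradients with central finite differences on a small random problem. The helper `cost_fn`, the toy data and the step size `eps` are assumptions made for this check and are not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, X, Y):
    """Cross-entropy cost as a function of the parameters (helper for the check)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Small random problem: 3 features, 5 examples
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = rng.integers(0, 2, size=(1, 5))
w = rng.normal(size=(3, 1)) * 0.1
b = 0.2
m = X.shape[1]

# Analytic gradients, same formulas as back_propagation
A = sigmoid(np.dot(w.T, X) + b)
dw = 1 / m * np.dot(X, (A - Y).T)
db = 1 / m * np.sum(A - Y)

# Numerical gradients by central differences
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    dw_num[j, 0] = (cost_fn(w + step, b, X, Y) - cost_fn(w - step, b, X, Y)) / (2 * eps)
db_num = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)

# Both differences should be close to zero
print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

If the analytic formulas were wrong (for example, a missing transpose or sign), the printed differences would be of the same order as the gradients themselves rather than close to zero.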

### 3.6 - Update parameters

**Update parameters:** after getting the gradients, we update the parameters with the following formulas, where $\alpha$ is the learning rate:
- $W = W - \alpha \, dw$
- $b = b - \alpha \, db$

In [10]:
def update_parameters(W,b,dw,db,learning_rate):
    """
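    This function updates the parameters W and b
    
    Arguments:
        W- weights vector
        b- bias scalar
        dw- gradient of the cost with respect to W
        db- gradient of the cost with respect to b
        learning_rate- step size of the gradient descent update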
    """
    W=W-learning_rate*dw
    b=b-learning_rate*db
    return W,b

We can now use w and b to predict the labels for a dataset X.

1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

2. Convert the entries of A into 0 (if activation < 0.5) or 1 (if activation >= 0.5), and store the predictions in a vector

In [11]:
def predict(W,b,X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (W, b)
    
    Arguments:
    W -- weights, a numpy array of size (features, 1)
    b -- bias, a scalar
    X -- data of size (features, number of examples)
    
    Returns:
    -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    z=np.dot(W.T,X)+b
    A=sigmoid(z)
    return np.choose(A < 0.5,[1,0])
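
The `np.choose(A < 0.5, [1, 0])` call picks 0 where the activation is below 0.5 and 1 otherwise, so an activation of exactly 0.5 is mapped to 1. As a small sketch on made-up activations, the same thresholding can be written as a comparison followed by a cast:

```python
import numpy as np

# Hypothetical activations, including the boundary value 0.5
A = np.array([[0.1, 0.5, 0.51, 0.9]])

preds_choose = np.choose(A < 0.5, [1, 0])   # form used in predict()
preds_compare = (A >= 0.5).astype(int)      # equivalent comparison-and-cast form

print(preds_choose, preds_compare)
```

Both forms return an array with the same shape as `A`, holding one 0/1 prediction per example.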

##  - Merge all functions into a model ##

We will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.



In [12]:
def model(X_train,y_train,X_test,y_test,num_iterations,learning_rate=0.1,print_costs=True):
    """
    Builds the logistic regression model by calling the functions implemented previously
    
    Arguments:
        X_train -- training set represented by a numpy array of shape (n_features, m_train)
        y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
        X_test -- test set represented by a numpy array of shape (n_features, m_test)
        y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
        num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
        learning_rate -- hyperparameter representing the learning rate used in the update rule of update_parameters()
        print_costs -- Set to True to print the cost every 10 iterations
    
    Returns:
         d-> a dictionary containing the various parameters and results
    """
    W,b=parameter_initiliazation(X_train.shape[0])
    m=X_train.shape[1]
    costs=[]
    for i in range(num_iterations):
        A=forward_propagation(X_train,W,b)
        cost=compute_cost(A,y_train,m)
        costs.append(cost)
        dw,db=back_propagation(A,X_train,y_train,m)
        W,b=update_parameters(W,b,dw,db,learning_rate)
        if i%10==0 and print_costs:
            print("cost after {} iterations is {}".format(i,cost))
       
    Y_prediction_train=predict(W,b,X_train)
    Y_prediction_test=predict(W,b,X_test)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : W, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

In [13]:
d=model(X_train,y_train,X_test,y_test,learning_rate=0.01,num_iterations=100)

cost after 0 iterations is 0.6931471805599453
cost after 10 iterations is 0.4643811720781759
cost after 20 iterations is 0.4197591934353743
cost after 30 iterations is 0.4005632032867893
cost after 40 iterations is 0.3900921286446589
cost after 50 iterations is 0.3836970762049911
cost after 60 iterations is 0.3795215404661102
cost after 70 iterations is 0.3766711245953468
cost after 80 iterations is 0.37466210379608506
cost after 90 iterations is 0.3732114377241076
train accuracy: 83.55757575757576 %
test accuracy: 83.2 %


#### Our model reaches about 83% accuracy on both the train and test sets
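
The accuracy expression used in `model()`, `100 - np.mean(np.abs(prediction - label)) * 100`, relies on predictions and labels both being 0 or 1, so that the absolute difference is 1 exactly when a prediction is wrong. A small sketch on made-up labels shows that it matches the fraction of equal entries:

```python
import numpy as np

# Hypothetical labels and predictions in the (1, m) layout used above
y_true = np.array([[1, 0, 1, 1, 0]])
y_pred = np.array([[1, 0, 0, 1, 1]])

acc_abs = 100 - np.mean(np.abs(y_pred - y_true)) * 100   # form used in model()
acc_eq = np.mean(y_pred == y_true) * 100                  # fraction of matching entries

print(acc_abs, acc_eq)
```

sklearn's `accuracy_score`, imported in the first cell, should give the same fraction (not multiplied by 100) once the (1, m) arrays are flattened with `.ravel()`.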

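### TensorFlow implementation of the logistic regression

This section uses the TensorFlow 1.x graph API (`tf.placeholder`, `tf.Session`). On TensorFlow 2.x, the same calls are available under `tf.compat.v1` once `tf.compat.v1.disable_v2_behavior()` has been called.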

In [14]:
import tensorflow as tf

In [15]:
def create_placeholders(n_x,n_y):
    """
    This function creates the placeholders for the inputs X and labels Y.
    We set the second dimension of the shape to None since we don't want to fix the number of examples we feed
    
    Arguments:
        n_x -> dimension of the input features X
        n_y -> dimension of the labels Y
    Returns:
        X->placeholder for X
        Y-> placeholder for Y   
        
    """
    X=tf.placeholder(dtype=tf.float32,shape=(n_x,None),name="X")
    Y=tf.placeholder(dtype=tf.float32,shape=(n_y,None),name="Y")
    return X,Y

In [16]:
def create_variables(dimension):
    """
    This function creates the variables for weights W and bias b
    
    Argument:
        dimension-> dimension of the weights in regards to input X
    Returns:
        W: variable for weights
        b: variable for bias
    """
    W=tf.get_variable(dtype=tf.float32,shape=(dimension,1),name="W",initializer=tf.zeros_initializer(dtype=tf.float32))
    b=tf.get_variable(dtype=tf.float32,name="b",initializer=tf.constant(0.))
    
    return W,b
    

In [17]:
def propagate(X,W,b):
    """
    This function runs a forward pass for the network
    
    Arguments:
        X-> input features X
        W-> weights matrix
        b->bias
    Returns:
        Activations A
    """
    
    Z=tf.add(tf.matmul(tf.transpose(W),X),b)
    A=tf.sigmoid(Z)  # Used when the cost is not computed with tf.nn.sigmoid_cross_entropy_with_logits
    return A
    
        

In [18]:
def compute_cost(A,Y,m):
    """
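    This function computes the cost of the logistic regression.
    Note: it redefines the NumPy compute_cost from In [8] under the same name.
    
    Arguments:
        A-> activations from propagate(), or the logits when tf.nn.sigmoid_cross_entropy_with_logits is used
        Y-> true labels 
        m-> the number of examples in the set
    Returns:
        cost-> the logistic regression cost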
    
    """
    cost=-(1/m)*tf.reduce_sum(Y*tf.log(A)+(1-Y)*tf.log(1-A))
    # Alternative (numerically stable): pass the logits Z instead of A, i.e. drop the sigmoid in propagate()
    #cost=tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=A,labels=Y))
    
    return cost
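
The TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits` describes the per-element loss, for logits x and labels z, as max(x, 0) - x * z + log(1 + exp(-|x|)). The NumPy function below is a sketch of that formula for illustration, not TensorFlow's implementation. The name `sigmoid_cross_entropy_from_logits` and the toy values are assumptions.

```python
import numpy as np

def sigmoid_cross_entropy_from_logits(logits, labels):
    """Per-element cross-entropy computed from logits, avoiding log(0) and overflow."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Hypothetical logits and labels
Z = np.array([[-3.0, 0.0, 2.5]])
Y = np.array([[0.0, 1.0, 1.0]])

# Naive version: apply the sigmoid first, then take logarithms
A = 1 / (1 + np.exp(-Z))
naive = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))

stable = sigmoid_cross_entropy_from_logits(Z, Y)
print(naive, stable)

# The logits form stays finite even for extreme inputs
extreme = sigmoid_cross_entropy_from_logits(np.array([1000.0, -1000.0]), np.array([0.0, 1.0]))
print(extreme)
```

Taking the mean of these per-element values over the examples gives the same cost as the explicit formula used in `compute_cost`, as long as the activations are not saturated.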
    
        

In [27]:
def tensorflow_model(X_train,X_test,y_train,y_test,learning_rate=0.1,num_iterations=100,print_costs=True):
    """
    Builds the TensorFlow logistic regression model using the functions above
    
    Arguments:
        X_train -- training set represented by a numpy array of shape (n_features, m_train)
        X_test -- test set represented by a numpy array of shape (n_features, m_test)
        y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
        y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
        learning_rate -- hyperparameter representing the learning rate passed to the gradient descent optimizer
        num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
        print_costs -- Set to True to print the cost every 10 iterations
    
    Returns:
         d-> a dictionary containing the various parameters and results
    """
    tf.reset_default_graph()
    m=X_train.shape[1]
    X,Y=create_placeholders(X_train.shape[0],y_train.shape[0])
    W,b=create_variables(X_train.shape[0])
    A=propagate(X,W,b)
    cost=compute_cost(A,Y,m)
    
    costs_list=[]
    optimizer=tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for i in range(num_iterations):
            _,costs=sess.run([optimizer,cost],feed_dict={X:X_train,Y:y_train})
            costs_list.append(costs)  
            if i%10==0 and print_costs:
                 print("cost after {} iterations is {}".format(i,costs))
                

        # Threshold the activations at 0.5 to get 0/1 predictions, then compare them with the labels
        predictions_train=np.choose(sess.run(A,feed_dict={X:X_train}) < 0.5,[1,0])
        predictions_test=np.choose(sess.run(A,feed_dict={X:X_test}) < 0.5,[1,0])
        train_correct_prediction = tf.equal(predictions_train, y_train)
        test_correct_prediction = tf.equal(predictions_test, y_test)


        # Calculate accuracy on the train and test sets
        train_accuracy = tf.reduce_mean(tf.cast(train_correct_prediction, "float"))
        test_accuracy = tf.reduce_mean(tf.cast(test_correct_prediction, "float"))
        print("The training accuracy is {} and test accuracy is {}".format(train_accuracy.eval()*100,test_accuracy.eval()*100))

        d={'costs':costs_list,
          'train_accuracy':train_accuracy.eval(),
           'test_accuracy':test_accuracy.eval(),
           "w" : W.eval(), 
            "b" : b.eval(),
          "learning_rate" : learning_rate,
          "num_iterations": num_iterations
           
          }
        return d

    

    

In [28]:
d=tensorflow_model(X_train,X_test,y_train,y_test,learning_rate=0.1,num_iterations=10)

cost after 0 iterations is 0.693192720413208
The training accuracy is 83.56666564941406 and test accuracy is 82.99999833106995
