# Logistic Regression 
In this Jupyter notebook we start by defining a model and a cost function, and then apply gradient descent to a binary classification problem. A later notebook will walk through multiclass logistic regression. 
## Model 
You might remember that linear regression uses the model $f(x) = wx+b$. Applying it directly to classification leads to many misclassified cases, because its output is unbounded while the targets are only 0 or 1. We therefore need a different model, which passes the linear output through the sigmoid function: 
 $f(x) = \frac {1} {1 + \exp(-(wx+b))}$
Now we will code the model and then apply it. 

In [5]:
# import necessary libraries 
import numpy as np 

In [6]:
# define the model 
def linear_regression_model (X,W,b) : 
    # X is the feature vector of one training example (or a matrix of examples) 
    # W is the vector of weights 
    # b is the bias parameter 
    f_X = np.dot (X,W) # compute the dot product 
    f_X += b # add the bias 
    return f_X # return the linear output 

In [24]:
# define the sigmoid 
def sigmoid (z) : 
    sigmoid = 1 / (1+np.exp(-z))
    return sigmoid 

In [25]:
# define the logistic regression 
def logistic_regression_model (X,W,b): 
    return sigmoid(linear_regression_model(X,W,b))

In [26]:
# define a x 
x = np.array ([
    [1,2,3] , 
    [2,3,4] , 
    [6,7,8]
])
# define a w 
w = [1,2,3] 
# define a b
b = 10 
print (f'the result of the logistic regression is : {logistic_regression_model(x[2],w,b)}')# sending only one training example 

the result of the logistic regression is : 1.0
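
The result above is exactly 1.0 because the linear score for `x[2]` is large (6 + 14 + 24 + 10 = 54), so the sigmoid saturates in floating point. The following is a minimal sketch of that behaviour. The scores in `z` and the 0.5 decision threshold are assumptions chosen for illustration; they are not part of the notebook's code.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Raw linear scores z = w.x + b, chosen for illustration only
z = np.array([-5.0, 0.0, 5.0, 54.0])
probs = sigmoid(z)
print(probs)

# Assumed decision rule: predict class 1 when the probability is at least 0.5
labels = (probs >= 0.5).astype(int)
print(labels)  # [0 1 1 1]
```

A score of 0 maps to a probability of 0.5, while very large positive or negative scores push the output towards 1 or 0.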


## Cost function 
Now we will define the cost function. You might think of reusing the squared-error cost that we saw in linear regression. However, combined with the sigmoid model that cost is no longer convex: if you plot it, you will find that it has many local minima, and gradient descent can get stuck in one of them before reaching the global minimum. We therefore define a new loss function for a single training example and average it over the whole training set to get the cost function: 
$ loss^{(i)} = -y^{(i)} \log(f(x^{(i)})) - (1-y^{(i)}) \log(1-f(x^{(i)})) $
Summing the loss over all $m$ training examples and dividing by $m$ gives the cost function. 
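
To get a feel for the loss formula, here is a small sketch that evaluates it for a few hand-picked predictions. The helper name `cross_entropy_loss` and the example values are hypothetical and only serve as an illustration.

```python
import numpy as np

def cross_entropy_loss(y, f):
    # Loss for one example: y is the true label (0 or 1),
    # f is the predicted probability of class 1
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

# (label, predicted probability) pairs chosen for illustration
for y, f in [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]:
    print(f"y={y}, f(x)={f}: loss={cross_entropy_loss(y, f):.3f}")
```

Confident correct predictions give a small loss (about 0.105 here), while confident wrong predictions are penalised much more heavily (about 2.303).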

In [27]:
# define the loss function 
def loss (X,Y,W,b) : 
    # X is the feature vector of one training example 
    # Y is the target of that training example 
    # W is the vector of weights 
    # b is the bias parameter 
    f_x = logistic_regression_model(X,W,b)
    epsilon = 1e-10
    first_term = Y * np.log(f_x +epsilon) 
    second_term = (1-Y) * np.log(1-f_x + epsilon) 
    loss = -(first_term + second_term) 
    return loss 


In [28]:
# define the cost function 
def logistic_regression_cost (X,Y,W,b) : 
    # X is the feature matrix of the whole training set 
    # Y is the vector of targets of the whole training set 
    # W is the vector of weights 
    # b is the bias parameter 
    m = X.shape[0]
    cost = 0.0 
    for i in range (m) : 
        cost += loss (X[i] ,Y[i] , W , b) 
    cost /= (m) 
    return cost 


In [29]:
# test the function 
# define a x 
x = np.array ([
    [1,2,3] , 
    [2,3,4] , 
])
# define y 
y = np.array ([1,0])
# define a w 
w = [1,2,3] 
# define a b
b = 10 
# print the cost 
print (f'the cost for the above data : {logistic_regression_cost(x,y,w,b)}')


the cost for the above data : 11.512458279376084
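
As a cross-check, the same cost can be computed without an explicit Python loop. This is a sketch under the assumption that the features are stored as a NumPy matrix with one row per example; `vectorized_cost` and the `*_check` variables are hypothetical names that do not appear elsewhere in the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vectorized_cost(X, Y, W, b, epsilon=1e-10):
    # Hypothetical helper: same cross-entropy cost as the loop version,
    # computed for all training examples at once
    f_x = sigmoid(np.dot(X, W) + b)
    losses = -(Y * np.log(f_x + epsilon) + (1 - Y) * np.log(1 - f_x + epsilon))
    return np.mean(losses)

# Same toy data as in the test cell above
x_check = np.array([[1, 2, 3], [2, 3, 4]])
y_check = np.array([1, 0])
w_check = np.array([1, 2, 3])
b_check = 10
cost_check = vectorized_cost(x_check, y_check, w_check, b_check)
print(cost_check)  # approximately 11.5125
```

The value is large because the second example has target 0 but a predicted probability very close to 1.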


## Gradient 
Here we implement gradient descent for logistic regression. We start with a function that computes the gradient of the cost, and then implement the gradient descent optimization loop that uses it. With $\hat{y}^{(i)} = f(x^{(i)})$ denoting the prediction for example $i$, the gradient is given by the following formulas: 

$ \frac{\partial J(W, b)}{\partial W_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) X_j^{(i)} $

$ \frac{\partial J(W, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) $


In [34]:
# define the gradient for logistic regression 
def logistic_regression_gradient(X, Y, W, b):
    predictions = logistic_regression_model(X, W, b)
    error = predictions - Y
    dj_db = np.sum(error) / X.shape[0]
    dj_dw = np.dot(X.T, error) / X.shape[0]
    
    return dj_dw, dj_db
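
One way to gain confidence in the gradient formulas is to compare them with a finite-difference approximation of the cost. The sketch below does this on a small made-up dataset. All names ending in `_toy`, the helper functions, and the step size `h` are assumptions for illustration, and the cost here omits the epsilon term because the toy predictions stay away from 0 and 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, W, b):
    # Mean cross-entropy over all examples (no epsilon, see the note above)
    f = sigmoid(X @ W + b)
    return -np.mean(Y * np.log(f) + (1 - Y) * np.log(1 - f))

def analytic_gradient(X, Y, W, b):
    # Same formulas as the gradient section: mean of (prediction - target) * feature
    error = sigmoid(X @ W + b) - Y
    return X.T @ error / X.shape[0], np.sum(error) / X.shape[0]

def numerical_gradient(X, Y, W, b, h=1e-6):
    # Central differences: (J(theta + h) - J(theta - h)) / (2h)
    dW = np.zeros_like(W)
    for j in range(W.size):
        step = np.zeros_like(W)
        step[j] = h
        dW[j] = (cost(X, Y, W + step, b) - cost(X, Y, W - step, b)) / (2 * h)
    db = (cost(X, Y, W, b + h) - cost(X, Y, W, b - h)) / (2 * h)
    return dW, db

# Hypothetical toy data
X_toy = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.3]])
Y_toy = np.array([1, 0, 1, 0])
W_toy = np.array([0.1, -0.2])
b_toy = 0.05

dW_a, db_a = analytic_gradient(X_toy, Y_toy, W_toy, b_toy)
dW_n, db_n = numerical_gradient(X_toy, Y_toy, W_toy, b_toy)
print(np.allclose(dW_a, dW_n, atol=1e-6), np.isclose(db_a, db_n, atol=1e-6))
```

If the two gradients agree to within the tolerance, the analytic formulas are very likely implemented correctly.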


In [35]:
# define the gradient descent 
def Gradient_descent_logistic (X,Y,W,b , alpha ) :
    # X is the feature matrix of the training set 
    # Y is the vector of targets of the training set 
    # W is the vector of weights 
    # b is the bias parameter 
    # alpha is the learning rate 
    W = np.array(W, dtype=np.float64)  # Ensure W is float64
    b = float(b)  
    current_cost = logistic_regression_cost (X,Y,W,b)
    for i in range (1000) : 
        dw , db = logistic_regression_gradient (X, Y, W, b) # compute the gradient  
        W -= (alpha * dw ) # update the W
        b -= (alpha * db ) # update the b 
        next_cost = logistic_regression_cost (X,Y,W,b) 
        if current_cost > next_cost : 
            current_cost = next_cost 
            print (f'the cost at iteration {i} is : {current_cost}' )
        else : 
            break 
    print (f'The new values of W {W} and b {b}')
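
The loop above only prints its results. As a sketch, here is a self-contained variant that returns the parameters and the cost history, run on a tiny made-up one-feature dataset. The function name `gradient_descent_sketch`, the `num_iters` parameter, the toy data, and the learning rate of 0.1 are all assumptions for illustration; the stopping rule mirrors the one used above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, W, b, eps=1e-10):
    # Mean cross-entropy with a small epsilon to avoid log(0)
    f = sigmoid(X @ W + b)
    return -np.mean(Y * np.log(f + eps) + (1 - Y) * np.log(1 - f + eps))

def gradient_descent_sketch(X, Y, W, b, alpha, num_iters=1000):
    W = np.array(W, dtype=np.float64)  # copy so the caller's array is untouched
    b = float(b)
    history = [cost(X, Y, W, b)]
    for _ in range(num_iters):
        error = sigmoid(X @ W + b) - Y
        W -= alpha * (X.T @ error) / X.shape[0]  # update the weights
        b -= alpha * np.sum(error) / X.shape[0]  # update the bias
        new_cost = cost(X, Y, W, b)
        if new_cost >= history[-1]:
            break  # stop as soon as the cost no longer decreases
        history.append(new_cost)
    return W, b, history

# Hypothetical toy data: small feature values have label 0, large ones label 1
X_toy = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
Y_toy = np.array([0, 0, 0, 1, 1, 1])

W_fit, b_fit, history = gradient_descent_sketch(X_toy, Y_toy, [0.0], 0.0, alpha=0.1)
print(f"initial cost {history[0]:.4f}, final cost {history[-1]:.4f}")
```

Starting from zero weights, every prediction is 0.5, so the initial cost is about log(2) ≈ 0.6931, and it should decrease from there.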

## Testing 
Now I will test the logistic regression model we have implemented on a dataset from Kaggle. The dataset I use is meant for detecting heart disease. I will load the data and then run gradient descent on it, but first we need to import pandas. 

In [36]:
# import pandas 
import pandas as pd 

In [37]:
# load the data 
data = pd.read_csv('test.csv') # patient data with a label for whether each patient has heart disease 
# split it into features (all columns but the last) and target (last column) 
X = data.iloc[:,:-1].to_numpy() 
Y = data.iloc [:,-1].to_numpy() 
# initialize the weights 
W = np.zeros(X.shape[1], dtype=np.float64)
# initialize the bias 
b = 10 
# set the learning rate alpha 
alpha = 0.00001
Gradient_descent_logistic(X,Y,W,b,alpha)


the cost at iteration 0 is : 7.446610225779649
the cost at iteration 1 is : 6.832054866225674
the cost at iteration 2 is : 6.217728240287832
the cost at iteration 3 is : 5.6039149313075365
the cost at iteration 4 is : 4.991262646113674
the cost at iteration 5 is : 4.381252450227888
the cost at iteration 6 is : 3.777250682600214
the cost at iteration 7 is : 3.186653947789589
the cost at iteration 8 is : 2.6242147501394673
the cost at iteration 9 is : 2.113944018208496
the cost at iteration 10 is : 1.6837726133388269
the cost at iteration 11 is : 1.3521037178942255
the cost at iteration 12 is : 1.1168491540901826
the cost at iteration 13 is : 0.9589754325043346
the cost at iteration 14 is : 0.855338292337225
the cost at iteration 15 is : 0.7871823771201852
the cost at iteration 16 is : 0.7417359897240411
the cost at iteration 17 is : 0.7108799473797095
the cost at iteration 18 is : 0.6895414210470278
the cost at iteration 19 is : 0.6745305533422933
the cost at iteration 20 is : 0.6638084