# Regularising Logistic Regression

Implementation of regularized logistic regression.

The regularized cost function in logistic regression is

$$ J(W) = \frac{1}{n} \left[ -Y \log\left(h\left( X \right) \right) - \left( 1 - Y\right) \log \left( 1 - h\left( X \right) \right) \right] + \frac{\lambda}{2n} \sum_{j=1}^m w_j^2 \qquad \text{ for } j > 0$$

We should not regularize the intercept parameter $w_0$. The gradient of the cost function is a vector defined as follows:

$$ \frac{\partial J(W)}{\partial w_0} =\nabla_{w_0} = \frac{1}{n} \left( h \left( X \right) - Y \right) X_0 \qquad \text{ where } X_0 \text{ is a vector of 1's, corresponding to intercept } w_0 $$

$$ \frac{\partial J(W)}{\partial w_j} =\nabla_{w_j} = \frac{1}{n} \left( h \left( X \right) - Y \right) X_j + \frac{\lambda}{n}w_j \qquad \text{ for } j > 0, \text{ where } X_j \text{ is the column of } X \text{ corresponding to } w_j $$
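
One way to gain confidence in these gradient formulas is to compare them against a central finite-difference approximation of the cost. The sketch below is illustrative only: the helper names (`cost`, `analytic_grad`, `numeric_grad`) and the random demo data are assumptions made for this example, not part of the assignment code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, X, Y, lam):
    # Regularized cost; the intercept W[0] is left out of the penalty.
    n = Y.size
    h = sigmoid(X @ W)
    data_term = -(Y @ np.log(h) + (1 - Y) @ np.log(1 - h)) / n
    return data_term + lam / (2 * n) * np.sum(W[1:] ** 2)

def analytic_grad(W, X, Y, lam):
    # Gradient from the formulas above: lambda / n * w_j for j > 0 only.
    n = Y.size
    h = sigmoid(X @ W)
    grad = X.T @ (h - Y) / n
    grad[1:] += lam / n * W[1:]
    return grad

def numeric_grad(W, X, Y, lam, eps=1e-6):
    # Central difference along each coordinate of W.
    grad = np.zeros_like(W)
    for j in range(W.size):
        step = np.zeros_like(W)
        step[j] = eps
        grad[j] = (cost(W + step, X, Y, lam) - cost(W - step, X, Y, lam)) / (2 * eps)
    return grad

# Made-up demo data: 20 examples, an intercept column plus 3 features in [0, 1).
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((20, 1)), rng.random((20, 3))])
Y_demo = (rng.random(20) > 0.5).astype(float)
W_demo = rng.normal(size=4)

max_diff = np.max(np.abs(analytic_grad(W_demo, X_demo, Y_demo, 2.0)
                         - numeric_grad(W_demo, X_demo, Y_demo, 2.0)))
print(max_diff)
```

A very small `max_diff` suggests the analytic gradient and the cost agree, including the $\frac{\lambda}{n}$ factor on the penalty term.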



In [22]:
import numpy as np
import pandas as pd
# DO NOT use any other import statements for this question

df = pd.read_csv('titanic.csv')
data = df[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch']].dropna()
data.loc[data["Sex"] == "male", "Sex"] = 1
data.loc[data["Sex"] == "female", "Sex"] = 0
data = np.array(data)
X, Y = data[:, 1:], data[:, 0]

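# Min-max scale every column to [0, 1]. Note that this form is reversed:
# each column's maximum maps to 0 and its minimum maps to 1.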
for c in range(X.shape[1]):
    X[:,c] = (max(X[:,c]) -  X[:,c])/(max(X[:,c]) - min(X[:,c]))
    
# break into train/test
split = int(0.8 * data.shape[0])

X_train = X[:split]
X_test = X[split:]
Y_train = Y[:split]
Y_test = Y[split:]

# Add intercept term to X
X_train = np.concatenate([np.ones((X_train.shape[0], 1)), X_train], axis=1)
X_test = np.concatenate([np.ones((X_test.shape[0], 1)), X_test], axis=1)
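
The scaling used in the cell above is `(max - x) / (max - min)`, which is the mirror image of the more common `(x - min) / (max - min)`. The short sketch below, on made-up values, shows that the two differ only by a flip (`reversed = 1 - standard`), so both land in [0, 1]; the learned weights for the scaled columns would simply change sign, with the intercept absorbing the shift.

```python
import numpy as np

# Made-up column values, purely for illustration.
col = np.array([22.0, 38.0, 26.0, 35.0])

span = col.max() - col.min()
reversed_scaled = (col.max() - col) / span   # form used in the cell above
standard_scaled = (col - col.min()) / span   # the more common min-max form

print(reversed_scaled)
print(standard_scaled)
```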

In [23]:
def sigmoid(z):
    """
    Compute sigmoid function given the input z.
    
    Parameters
    ----------
    z : array_like
    
    Returns
    -------
    g : array_like
        The computed sigmoid function. g has the same shape as z, since
        the sigmoid is computed element-wise on z.
        
    Instructions
    ------------
    Computing the sigmoid of each value of z (z can be a matrix, vector or scalar).
    """
    # converting input to a numpy array
    z = np.array(z).astype("float")
        
    # ====================== CODE HERE ======================  
    g = 1 / (1 + np.exp(-z)) # This is the basic formula for sigmoid
    # =============================================================
    return g
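
For inputs with a large negative value, `np.exp(-z)` can overflow and trigger a runtime warning. A possible alternative, sketched here under the hypothetical name `stable_sigmoid`, only ever exponentiates a non-positive number. It is not required for the assignment and is shown only as an illustration.

```python
import numpy as np

def stable_sigmoid(z):
    """Element-wise sigmoid that avoids overflow in np.exp."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in [0, 1], so it cannot overflow
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```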

In [24]:
def logreg_costFunctionReg(W, X, Y, lambda_):
    """
    Compute the regularized cost for logistic regression.
    
    Parameters
    ----------
    W : Logistic regression vector of m parameters,
        where m is the number of features including any intercept.
    
    X : The data set with shape (n,m). n is the number of examples, and
        m is the number of features.
    
    Y : The vector of data labels of size n.
    
    lambda_ : float
        The regularization parameter. 
    
    Returns
    -------
    J : float
        The computed value for the regularized cost function. 
   
    """
    # Initialize some useful values
    n = Y.size  # number of training examples

    # You need to return the following variables correctly 
    J = 0
    # ===================== CODE HERE ======================
    W_temp = np.copy(W)
    W_temp[0] = 0  # zero out w_0 so the intercept is excluded from the regularization term
    h = sigmoid(np.dot(X, W))
    J = (-1/n) * (np.dot(Y, np.log(h)) + np.dot((1-Y) , np.log(1-h))) + (lambda_/(2*n)) * np.sum(np.square(W_temp))
    
    return J
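
A quick sanity check for a cost function of this form: with all weights at zero, every prediction is 0.5, so the cost should equal $\ln 2 \approx 0.693$ whatever the data or $\lambda$. The sketch below uses a compact stand-in named `reg_cost` and made-up toy data, both of which are assumptions for illustration. It also checks that the penalty contributes exactly $\frac{\lambda}{2n}\sum_{j>0} w_j^2$.

```python
import numpy as np

def reg_cost(W, X, Y, lambda_):
    # Compact stand-in for the regularized cost; W[0] is not penalized.
    n = Y.size
    h = 1.0 / (1.0 + np.exp(-(X @ W)))
    data_term = -(Y @ np.log(h) + (1 - Y) @ np.log(1 - h)) / n
    return data_term + lambda_ / (2 * n) * np.sum(W[1:] ** 2)

# Made-up toy data: 4 examples, an intercept column plus 2 features.
X_toy = np.array([[1.0, 0.2, 0.7],
                  [1.0, 0.9, 0.1],
                  [1.0, 0.5, 0.5],
                  [1.0, 0.0, 1.0]])
Y_toy = np.array([1.0, 0.0, 1.0, 0.0])

J_zero = reg_cost(np.zeros(3), X_toy, Y_toy, lambda_=2.0)

W_toy = np.array([0.0, 1.0, -1.0])
# Difference between lambda = 2 and lambda = 0 is the penalty alone:
# 2 / (2 * 4) * (1^2 + (-1)^2) = 0.5
penalty = reg_cost(W_toy, X_toy, Y_toy, 2.0) - reg_cost(W_toy, X_toy, Y_toy, 0.0)

print(J_zero, np.log(2), penalty)
```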


In [25]:
def logreg_GradFunctionReg(W, X, Y, lambda_):
    """
    Compute the gradient of the regularized cost for logistic regression.
    
    Parameters
    ----------
    W : Logistic regression vector of m parameters,
        where m is the number of features including any intercept.
    
    X : The data set with shape (n,m). n is the number of examples, and
        m is the number of features.
    
    Y : The vector of data labels of size n.
    
    lambda_ : float
        The regularization parameter. 
    
    Returns
    -------
    grad : A vector of size m which is the gradient of the cost
        function with respect to W, at the current values of W.
    
    """
    # Initializing some useful values
    n = Y.size  # number of training examples

    # To return the following variables correctly 
    grad = np.zeros(W.shape)

    # ===================== CODE HERE ======================
    h = sigmoid(np.dot(X, W))
    W_temp = np.copy(W)
    W_temp[0] = 0
    grad = (1 / n) * (np.dot((h - Y) , X))
    grad = grad + (lambda_ / n) * W_temp
    return grad

In [26]:
def logreg_gradient_descent_reg(X, Y, W, cost_function, gradient_function, alpha, num_iters, lambda_): 
    """
    Performs batch gradient descent to learn W. Updates W by taking 
    num_iters gradient steps with learning rate alpha.
    
    Args:
      X                   : Data ndarray array, n examples with m features
      Y                   : ndarray vector of target n values
      W                   : ndarray vector of initial m model parameters 
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      lambda_ (float)     : regularization parameter
      
    Returns:
      J (float)           : Cost at the final parameters
      W (ndarray)         : Updated values of parameters 
      """
    
    # =====================CODE HERE ======================
    J = cost_function(W, X, Y, lambda_)  # cost at the initial W, so J is defined even if num_iters == 0
    for i in range(num_iters):
        grad = gradient_function(W, X, Y, lambda_)
        W = W -  alpha * grad
        J = cost_function(W, X, Y, lambda_)
    # =============================================================
    
    return J, W
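
When tuning `alpha` and `num_iters`, it can help to keep the cost from every iteration rather than only the last one, so that convergence (or divergence) is visible. The following is a minimal sketch of that idea. The function name `gradient_descent_with_history`, the stand-in `toy_cost` / `toy_grad` helpers, and the random demo data are all assumptions for illustration.

```python
import numpy as np

def toy_cost(W, X, Y, lambda_):
    n = Y.size
    h = 1.0 / (1.0 + np.exp(-(X @ W)))
    data_term = -(Y @ np.log(h) + (1 - Y) @ np.log(1 - h)) / n
    return data_term + lambda_ / (2 * n) * np.sum(W[1:] ** 2)

def toy_grad(W, X, Y, lambda_):
    n = Y.size
    h = 1.0 / (1.0 + np.exp(-(X @ W)))
    grad = X.T @ (h - Y) / n
    grad[1:] += lambda_ / n * W[1:]  # intercept is not regularized
    return grad

def gradient_descent_with_history(X, Y, W, cost_function, gradient_function,
                                  alpha, num_iters, lambda_):
    # Same update rule as batch gradient descent, but records the cost
    # after every step so the trajectory can be inspected.
    W = np.asarray(W, dtype=float).copy()
    history = np.empty(num_iters)
    for i in range(num_iters):
        W = W - alpha * gradient_function(W, X, Y, lambda_)
        history[i] = cost_function(W, X, Y, lambda_)
    return history, W

# Made-up demo data: 20 examples, an intercept column plus 3 features in [0, 1).
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((20, 1)), rng.random((20, 3))])
Y_demo = (rng.random(20) > 0.5).astype(float)

history, W_fit = gradient_descent_with_history(
    X_demo, Y_demo, np.zeros(4), toy_cost, toy_grad,
    alpha=0.1, num_iters=200, lambda_=2.0)
print(history[0], history[-1])
```

If the recorded cost ever increases from one step to the next, the learning rate is probably too large for the problem.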

In [27]:
# initialize parameters
initial_W = np.array([-40]*X_train.shape[1])

# some gradient descent settings
iterations = 20000
alpha = 0.02
lambda_ = 2
W = 0
J = 0
acc = 0

"""
Apply functions coded above to calculate:
    final W after training,
    cost J for training set after training
    accuracy for test set
Using given datasets and parameters
"""

# ===================== CODE HERE ======================
J, W = logreg_gradient_descent_reg(X_train, Y_train, initial_W, logreg_costFunctionReg, logreg_GradFunctionReg, alpha, iterations, lambda_)
y_pred = np.round(sigmoid(np.dot(X_test, W)))
correct = np.sum(y_pred == Y_test)
acc = correct / len(Y_test)

# ===========================================================

print('Please copy the following result line to Question 4 "(sumW = )"')
print(np.round(np.sum(W), 2))
print('Please copy the following result line to Question 4 "(J = )"')
print(np.round(J,2))
print('Please copy the following result line to Question 4 "(Accuracy = )"')
print(np.round(acc,2))

Please copy the following result line to Question 4 "(sumW = )"
1.95
Please copy the following result line to Question 4 "(J = )"
0.49
Please copy the following result line to Question 4 "(Accuracy = )"
0.8
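
The prediction step above turns probabilities into labels with `np.round`, which rounds halves to the nearest even value, so a probability of exactly 0.5 becomes class 0. An explicit threshold makes that decision rule visible and adjustable. The sketch below uses hypothetical helper names (`predict_labels`, `accuracy`) and made-up probabilities. It is an alternative formulation, not the code that produced the results above.

```python
import numpy as np

def predict_labels(probabilities, threshold=0.5):
    # Class 1 when the predicted probability reaches the threshold.
    return (np.asarray(probabilities, dtype=float) >= threshold).astype(int)

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Made-up probabilities and labels, purely for illustration.
probs = np.array([0.1, 0.5, 0.9, 0.4])
labels = np.array([0, 1, 1, 1])

preds = predict_labels(probs)
acc_demo = accuracy(labels, preds)
print(preds, acc_demo)
```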
