
# Classification problem: Linear Classification (LC) vs Logistic Regression (LR)

## Linear Classification Model
The parameter vector $\theta$ includes the bias, so we append a 1 to the feature vector $x$.
The prediction is $y = \operatorname{sign}(h_{\theta}(x)) = \operatorname{sign}(\theta^Tx)$.
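
To make the bias trick concrete, here is a minimal sketch on made-up data. The arrays `features` and `theta_demo` are hypothetical values chosen for illustration, and the bias is assumed to be the last entry of the parameter vector because the 1 is appended at the end of each sample.

```python
import numpy as np

# Hypothetical toy data: 3 samples with 2 features each.
features = np.array([[ 2.0,  1.0],
                     [-1.0, -3.0],
                     [ 0.5, -0.5]])

# Append a constant 1 to every sample so the last entry of theta acts as the bias.
x_aug = np.hstack([features, np.ones((features.shape[0], 1))])

# Hypothetical parameters: two weights followed by the bias.
theta_demo = np.array([[1.0], [1.0], [-0.5]])

scores = x_aug @ theta_demo   # theta^T x for every sample, shape (3, 1)
labels = np.sign(scores)      # +1 or -1 (0 only if a score is exactly 0)
print(labels.ravel())
```

The scores are 2.5, -4.5 and -0.5, so the first sample is assigned to class +1 and the other two to class -1.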

In [70]:
##########################################################
# Compute the prediction of a linear classification model
##########################################################

import numpy as np

def predict_lc(x, theta):
    '''
    Inputs:
        x: np.ndarray of shape (num_samples, num_features + 1)
        theta: np.ndarray of shape (num_features + 1, 1)
    Outputs:
        y: np.ndarray of shape (num_samples, 1) with entries in {-1, 0, +1}
           (0 only when theta . x is exactly 0)
    '''
    # Row i of x.dot(theta) is theta . x_i, so no transpose is needed
    y = np.sign(np.dot(x, theta))
    return y

In [74]:
#########################################################
# Loss function: mean squared error (in vectorised form)
#########################################################

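def mse(y_true, y_pred):
    '''
    Inputs:
        y_true: np.ndarray of shape (m, 1)
        y_pred: np.ndarray of shape (m, 1)
        (m is the number of samples)
    Outputs:
        J: np.ndarray of shape (1, 1) holding the loss value
    '''
    m = len(y_pred)  # = len(y_true), number of samples
    cost = y_pred - y_true  # residuals, shape (m, 1)
    # cost.T . cost is the sum of squared residuals; the mean is halved
    # so that the gradient carries no factor of 2
    J = np.dot(cost.T, cost) / (2 * m)
    return J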

In [75]:
####################################
# Gradient of the MSE loss function
####################################

def gradient(y_true, y_pred, x):
    '''
    Inputs:
        y_true: np.ndarray of shape (m,) or (m, 1)
        y_pred: np.ndarray of shape (m,) or (m, 1)
        x: np.ndarray of shape (m, num_features + 1)
    Outputs:
        dJ: np.ndarray of shape (num_features + 1, 1)
    '''
    # Reshape both vectors to columns so that the subtraction does not broadcast
    y_true = y_true.reshape(-1, 1) # new shape (m, 1)
    y_pred = y_pred.reshape(-1, 1) # new shape (m, 1)

    m = len(y_pred) # = len(y_true), number of samples
    dJ = np.dot(x.T, y_pred - y_true) / m
    return dJ
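
A quick way to gain confidence in an analytic gradient is to compare it with central finite differences. The sketch below does this for the halved MSE loss with a linear prediction $h = x\theta$. The helper names `mse_loss` and `mse_gradient`, the random data and the step size `eps` are assumptions made for this illustration; they are not part of the notebook.

```python
import numpy as np

def mse_loss(theta, x, y):
    # J(theta) = ||x theta - y||^2 / (2m), i.e. the halved MSE with h = x theta
    residual = x @ theta - y
    return np.sum(residual ** 2) / (2 * len(y))

def mse_gradient(theta, x, y):
    # Analytic gradient: x^T (x theta - y) / m
    return x.T @ (x @ theta - y) / len(y)

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.normal(size=(20, 1))
theta = rng.normal(size=(3, 1))

# Central finite differences, one coordinate of theta at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j, 0] = eps
    numeric[j, 0] = (mse_loss(theta + step, x, y) - mse_loss(theta - step, x, y)) / (2 * eps)

max_abs_diff = np.max(np.abs(numeric - mse_gradient(theta, x, y)))
print(max_abs_diff)
```

If the analytic expression is right, `max_abs_diff` should be tiny compared with the gradient entries themselves.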

In [78]:
#######################################################
# Optimize Linear Classification with Gradient Descent
#######################################################

def gradient_descent(x, y, activation_function, cost_function, gradient_function,
                     epochs=400, seed=1234, learning_rate=0.01, print_every=10):
    '''
    Inputs:
        x: np.ndarray input data of shape [num_samples, num_feat + 1]
        y: np.ndarray target data of shape [num_samples, 1]
        ...
        gradient_function: the gradient of the cost function
        ...
    Outputs:
        theta: the optimised parameter vector, of shape [num_feat + 1, 1]
        loss: np.ndarray with the loss recorded at every epoch
    '''
    # Initialize theta with small random values (reproducible through the seed)
    np.random.seed(seed)
    theta = np.random.normal(0, 0.001, size=(x.shape[1], 1)) / np.sqrt(2)

    loss = []
    print('Training...')
    print('=' * 40)

    # Iterations of gradient descent (epoch runs from 0 to epochs inclusive)
    for epoch in range(epochs + 1):
        # Model prediction (with possible activation function)
        z = x.dot(theta)
        h = activation_function(z) if activation_function is not None else z

        # Update loss (with the chosen cost function)
        loss.append(cost_function(y, h))

        # Gradient computation and parameter update (with the chosen gradient function and learning rate)
        dJ = gradient_function(y, h, x)
        theta = theta - learning_rate * dJ

        # Print loss information
        if epoch % print_every == 0:
            print(f'Epoch {epoch}: Loss {loss[-1]}')

    # Flatten the per-epoch losses (scalars or (1, 1) arrays) into a 1-D array
    loss = np.array(loss).reshape(-1)
    return theta, loss
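
The update rule can be seen in isolation on a small synthetic problem. The sketch below is self-contained and only mirrors the loop above: the two Gaussian blobs, the zero initialisation of `theta` and the variable names are assumptions for illustration, not the notebook's dataset or initialisation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: two Gaussian blobs labelled -1 and +1.
num_per_class = 50
neg = rng.normal(loc=-2.0, scale=1.0, size=(num_per_class, 2))
pos = rng.normal(loc=+2.0, scale=1.0, size=(num_per_class, 2))
x = np.hstack([np.vstack([neg, pos]), np.ones((2 * num_per_class, 1))])  # bias column
y = np.vstack([-np.ones((num_per_class, 1)), np.ones((num_per_class, 1))])

theta = np.zeros((x.shape[1], 1))   # simplification: start from zero
learning_rate = 0.01
losses = []
for epoch in range(400):
    h = x @ theta                                        # no activation: linear model
    losses.append(np.sum((h - y) ** 2) / (2 * len(y)))   # halved MSE
    dJ = x.T @ (h - y) / len(y)                          # gradient of the halved MSE
    theta = theta - learning_rate * dJ                   # gradient descent step

accuracy = np.mean(np.sign(x @ theta) == y)
print(losses[0], losses[-1], accuracy)
```

Training minimises the squared error of the raw score $\theta^Tx$; the sign function is applied only at prediction time, since it has zero gradient almost everywhere.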

In [None]:
################################################################################
# Evaluate Linear Classification (using "theta", i.e. the optimised parameters,
# and predict_lc(x, theta) = sign(theta . x))
################################################################################

def evaluate_lc(x, theta, y):
    # Note: y (the target labels) is accepted but not used; only the predictions are returned
    return predict_lc(x, theta)
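
One common way to turn predictions into an evaluation score is classification accuracy, the fraction of samples whose predicted label matches the target. The helper `accuracy` below is a hypothetical addition, not a function defined in the notebook, and the labels are made up.

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Hypothetical helper: fraction of samples whose predicted label equals the target.
    # Both arrays are flattened so that (m,) and (m, 1) inputs compare element-wise.
    return float(np.mean(y_true.reshape(-1) == y_pred.reshape(-1)))

y_true = np.array([[1], [-1], [1], [-1]])
y_pred = np.array([[1], [-1], [-1], [-1]])
acc = accuracy(y_true, y_pred)   # 3 of the 4 labels agree
print(acc)
```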

## Logistic Regression Model

The logistic regression model does not use the sign function; instead it uses the sigmoid function

$$g(z) = \frac{1}{1 + e^{-z}}$$

as an activation function:

$$h_{\theta}(x) = g(\theta^Tx).$$

The prediction is obtained by thresholding $h_{\theta}(x)$ at 0.5:

$$y = \operatorname{round}(h_{\theta}(x)) = \operatorname{round}\left(\frac{1}{1 + e^{-\theta^Tx}}\right)$$
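
Because the sigmoid is increasing and $g(0) = 0.5$, thresholding the probability at 0.5 is the same as thresholding the score $\theta^Tx$ at 0. The sketch below checks this on a few hypothetical scores `z`. Note that the resulting labels are in $\{0, 1\}$, whereas the sign function of the linear classifier produces labels in $\{-1, +1\}$.

```python
import numpy as np

z = np.array([-3.0, -0.1, 0.2, 4.0])   # hypothetical scores theta^T x
h = 1.0 / (1.0 + np.exp(-z))           # sigmoid g(z)

by_rounding = np.round(h)              # threshold the probability at 0.5
by_score = (z > 0).astype(float)       # threshold the raw score at 0
print(by_rounding, by_score)
```

A score of exactly 0 gives $h = 0.5$, which `np.round` maps to 0 because NumPy rounds halves to the nearest even value.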

In [79]:
################################
# A common sigmoid function
################################

def sigmoid(z):
    '''
    Inputs:
        z: np.ndarray of any shape
    Outputs:
        s: np.ndarray of the same shape as z, with g applied element-wise
    '''
    return 1 / (1 + np.exp(-z))

## Cross-entropy loss function

$$J(\theta) = \frac{1}{m}\sum_{i=1}^m \left[ -y^{(i)}\log (h_{\theta}(x^{(i)})) - (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right]$$
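
As a short worked derivation (a sketch that uses only the definitions above and the identity $g'(z) = g(z)(1 - g(z))$), the gradient of this loss has the same form as the gradient of the MSE loss of the linear model. For a single sample,

$$\frac{\partial}{\partial \theta_j}\log h_{\theta}(x) = (1 - h_{\theta}(x))\,x_j, \qquad \frac{\partial}{\partial \theta_j}\log(1 - h_{\theta}(x)) = -h_{\theta}(x)\,x_j$$

so that

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^m \left[ -y^{(i)}(1 - h_{\theta}(x^{(i)})) + (1 - y^{(i)})\,h_{\theta}(x^{(i)}) \right] x_j^{(i)} = \frac{1}{m}\sum_{i=1}^m \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

In vectorised form, with $X$ the matrix whose rows are the samples, this reads $\nabla_{\theta} J = \frac{1}{m} X^T (h - y)$, the same expression computed by `gradient` once $h$ is the sigmoid output.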




In [80]:
##############################
# Cross entropy loss function 
##############################

def xent(y_true, y_pred):
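    '''
    Inputs:
        y_true: np.ndarray of shape (m,) or (m, 1), with entries in {0, 1}
        y_pred: np.ndarray of shape (m,) or (m, 1) ( y_pred == h_theta(x^i) )
    Outputs:
        J: float
    '''
    # Flatten so that column vectors (m, 1) are handled as well as (m,)
    y_true = y_true.reshape(-1)
    y_pred = y_pred.reshape(-1)
    m = len(y_true) # = len(y_pred) == num_samples
    J = -(np.dot(y_true, np.log(y_pred)) + np.dot(1 - y_true, np.log(1 - y_pred))) / m
    return J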
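
When the sigmoid saturates, a prediction can become exactly 0 or 1 in floating point, and the logarithm then produces `-inf` (and `0 * -inf` gives `nan`). A common remedy is to clip the probabilities slightly away from 0 and 1 before taking the log. The function `xent_clipped` and the value of `eps` below are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def xent_clipped(y_true, y_pred, eps=1e-12):
    # Hypothetical variant of the cross-entropy loss: probabilities are kept
    # inside [eps, 1 - eps] so that log never receives 0.
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.clip(np.asarray(y_pred, dtype=float).reshape(-1), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([1.0, 0.0, 0.5])   # two saturated (and correct) predictions
loss = xent_clipped(y_true, y_pred)
print(loss)
```

Here the two saturated predictions contribute almost nothing, so the loss is close to $\log(2)/3$, the contribution of the third sample averaged over three samples.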

In [81]:
################################################################################
# Prediction of Logistic Regression (using "theta", i.e. the optimised parameters):
# predict_lr(x, theta) = round(sigmoid(theta . x))
################################################################################

def predict_lr(x, theta):
    '''
    Inputs: x of shape (num_samples, num_features + 1), theta of shape (num_features + 1, 1)
    Outputs: np.ndarray of shape (num_samples, 1) with entries in {0, 1}
    '''
    return np.round(sigmoid(np.dot(x, theta)))