## Project Overview
Title: Logistic Regression Model

Author: Jayden Chan

Description: This notebook implements a logistic regression model on a dataset of credit card transactions. The dataset consists of several features, including distance from home, distance from the last transaction, and ratio to median purchase price. The goal of the model is to predict whether a credit card transaction is fraudulent.

Credits: The dataset used in this notebook has been sourced from [Kaggle](https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud). Some code has been sourced from [Coursera](https://www.coursera.org/learn/machine-learning) and edited to fit project requirements.

## Packages
- [numpy](https://www.numpy.org)
- [matplotlib](http://matplotlib.org)
- `utility.py` contains helper functions used in this notebook, such as `load_data`, `feature_scaling`, `map_features`, `logistic_graph`, and `sigmoid`.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from utility import *
import copy
import math

%matplotlib inline

## Logistic Regression

### The Data
- `categories` contains the feature names (the column headers of the dataset).
- `x_array` contains the features in a 2D array.
- `y_array` contains the labels.
    - `y_array` = 1 if the transaction was fraudulent
    - `y_array` = 0 if the transaction was legitimate
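`load_data` comes from `utility.py`, which is not shown here. As a rough illustration of how it might produce these three arrays, here is a minimal sketch that assumes the CSV has one header row and stores the 0/1 label in its last column; `load_data_sketch` is a hypothetical name, not the helper's real code.

```python
import numpy as np

def load_data_sketch(path):
    """Hypothetical stand-in for utility.load_data.

    Assumes a comma-separated file with one header row, the feature
    columns first, and the 0/1 label in the last column.
    """
    with open(path) as f:
        header = f.readline().strip().split(",")
    # ndmin=2 keeps a 2D result even if the file has a single data row
    data = np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
    categories = np.array(header[:-1])  # feature names, label column dropped
    x = data[:, :-1]                    # (m, n) feature matrix
    y = data[:, -1]                     # (m,) labels, 1 = fraudulent
    return categories, x, y
```

Returning `categories` as a NumPy array would match the bracketed, comma-free way the names are printed in the next cells.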

In [2]:
categories, x_array, y_array = load_data("card_transdata.csv")

In [3]:
print("The categories are:\n", categories[:])

The categories are:
 ['distance_from_home' 'distance_from_last_transaction'
 'ratio_to_median_purchase_price' 'repeat_retailer' 'used_chip'
 'used_pin_number' 'online_order']


In [4]:
print("First five elements in x_array are:\n", x_array[:5])
print("Type of x_array", type(x_array))

First five elements in x_array are:
 [[57.87785658  0.31114001  1.94593998  1.          1.          0.
   0.        ]
 [10.8299427   0.1755915   1.29421881  1.          0.          0.
   0.        ]
 [ 5.09107949  0.80515259  0.42771456  1.          0.          0.
   1.        ]
 [ 2.24756433  5.60004355  0.36266258  1.          1.          0.
   1.        ]
 [44.190936    0.56648627  2.2227673   1.          1.          0.
   1.        ]]
Type of x_array <class 'numpy.ndarray'>


In [None]:
print("First five elements in y_array are:\n", y_array[:5])
print("Type of y_array", type(y_array))

In [None]:
print ('The shape of x_array is: ' + str(x_array.shape))
print ('The shape of y_array is: ' + str(y_array.shape))
print ('We have m = %d training examples' % (len(y_array)))

### Data Scaling

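We scale the features so that columns with large ranges (such as `distance_from_home`) do not dominate columns with small ranges (such as the binary flags). This helps gradient descent converge.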
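`feature_scaling` is defined in `utility.py` and its method is not shown. One possibility consistent with the plot limits starting at 0 in the visualization cell below is min-max scaling to [0, 1]. The sketch below assumes that method; `min_max_scale` is a hypothetical name, and the real helper may use z-score standardization or something else.

```python
import numpy as np

def min_max_scale(x):
    """Rescale each column of x linearly to the range [0, 1].

    Hypothetical sketch; utility.feature_scaling is not shown and may differ.
    """
    x_min = np.min(x, axis=0)
    x_range = np.max(x, axis=0) - x_min
    x_range[x_range == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (x - x_min) / x_range
```

Binary columns such as `used_chip` that contain both 0 and 1 are left unchanged by this transform. When scoring new data, the training `x_min` and `x_range` would have to be reused rather than recomputed.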

In [None]:
x_array = feature_scaling(x_array)

### Visualize the Data
Change `category1` and `category2` to choose which features are plotted on the x-axis and y-axis, respectively.

Change `x_limiter` and `y_limiter` to set the upper limit of the x-axis and y-axis.

Change `marker_size` to set the size of the markers on the graph.

In [None]:
# ===== User Inputs =====
category1 = 0
category2 = 2

x_limiter = 0.25
y_limiter = 0.25

marker_size = 3

# ===== Graph Code =====
logistic_graph(x_array, y_array, category1, category2, marker_size, pos_label="Fraudulent", neg_label="Legitimate")

plt.xlabel(categories[category1])
plt.ylabel(categories[category2])
plt.title("Scatter plot of training data")
plt.legend(loc="upper right")

plt.xlim(0, x_limiter)
plt.ylim(0, y_limiter)

plt.show()

### Feature Mapping

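Because fraudulent and legitimate transactions are unlikely to be linearly separable in the original features, we use `map_features` to expand each example into additional higher-order features. This lets logistic regression learn a nonlinear decision boundary.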
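`map_features` lives in `utility.py` and is not shown. A common form of such a mapping appends every product of features up to a chosen degree. The sketch below illustrates that idea; `map_features_sketch` and its `degree` parameter are hypothetical, and the real helper may use a different degree or term order.

```python
import numpy as np
from itertools import combinations_with_replacement

def map_features_sketch(x, degree=2):
    """Append every monomial of the columns of x up to `degree`.

    Hypothetical illustration of polynomial feature mapping.
    """
    m, n = x.shape
    columns = [x[:, j] for j in range(n)]  # degree-1 terms: the original features
    for d in range(2, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            # Product of the chosen columns, e.g. x0*x2 or x1**2
            columns.append(np.prod(x[:, list(idx)], axis=1))
    return np.column_stack(columns)
```

With the 7 features here and `degree=2`, this would give 7 + 28 = 35 columns. The count grows quickly with the degree, which is one reason to regularize the model.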

In [None]:
print("Original shape of data:", x_array.shape)

mapped_x = map_features(x_array)
print("Shape after feature mapping:", mapped_x.shape)

In [None]:
print("x_array[0]:", x_array[0])
print("mapped x_array[0]:", mapped_x[0])

### Implement Gradient Descent

Because feature mapping adds many features, we also add regularization to reduce overfitting.

`compute_cost` and `compute_gradient` compute the unregularized cost and gradients used by gradient descent.

`compute_cost_reg` and `compute_gradient_reg` add an L2 regularization term to the cost and gradients.
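For reference, these are the quantities the functions below compute. They are read off the code, with $f_{\mathbf{w},b}(\mathbf{x}) = \sigma(\mathbf{w}\cdot\mathbf{x} + b)$, $\sigma(z) = 1/(1+e^{-z})$, and $\lambda$ standing for `lambda_const`. The numerical clipping in `compute_cost` is omitted.

$$J(\mathbf{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log f_{\mathbf{w},b}(\mathbf{x}^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

$$\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\big)x_j^{(i)} + \frac{\lambda}{m}w_j, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\big)$$

Setting $\lambda = 0$ gives the unregularized versions. The bias $b$ is not penalized.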

In [None]:
def compute_cost(x, y, w, b, optional):
    '''
    Computes the logistic (cross-entropy) cost over the entire dataset.
    
    Args:
        x (ndarray Shape (m,n)) : An array containing the features of the dataset.
        y (ndarray Shape (m,))  : An array containing the labels of the dataset.
        w (ndarray Shape (n,))  : An array containing the parameters of the model.
        b (scalar)              : The bias term.
        optional                : Unused; present so the signature matches compute_cost_reg.
    Returns:
        cost (scalar) : The total cost.
    '''
    
    m = x.shape[0]
    z_wb = np.dot(x, w) + b
    
    f_wb = sigmoid(z_wb)
    
    # Keep predictions away from exactly 0 and 1 so np.log never receives 0
    epsilon = 1e-10
    f_wb = np.clip(f_wb, epsilon, 1 - epsilon)
    
    loss = -np.dot(y.T, np.log(f_wb)) - np.dot((1 - y).T, np.log(1 - f_wb))
    
    cost = (1/m)*loss
    
    return cost

In [None]:
def compute_cost_reg(x, y, w, b, lambda_const):
    '''
    Computes the cost over the entire dataset, including an L2 regularization term.
    
    Args:
        x (ndarray Shape (m,n))      : An array containing the features of the dataset.
        y (ndarray Shape (m,))       : An array containing the labels of the dataset.
        w (ndarray Shape (n,))       : An array containing the parameters of the model.
        b (scalar)                   : The bias term.
        lambda_const (scalar, float) : Regularization constant
    Returns:
        cost (scalar) : The total cost.
    '''
    
    m = x.shape[0]
    
    orig_cost = compute_cost(x, y, w, b, 0)
    
    # L2 penalty on the weights (the bias b is not regularized)
    reg_cost = (lambda_const / (2 * m)) * np.sum(w ** 2)
    
    cost = orig_cost + reg_cost
    
    return cost

In [None]:
def compute_gradient(x, y, w, b, optional):
    '''
    Computes the gradients of the cost with respect to the parameters w and b.
    
    Args:
        x (ndarray Shape (m,n)) : An array containing the features of the dataset.
        y (ndarray Shape (m,))  : An array containing the labels of the dataset.
        w (ndarray Shape (n,))  : An array containing the parameters of the model.
        b (scalar)              : The bias term.
        optional                : Unused; present so the signature matches compute_gradient_reg.
    Returns:
        dj_dw (ndarray Shape (n,)) : The gradients of the cost with respect to the parameters w.
        dj_db (scalar)             : The gradient of the cost with respect to the bias term.
    '''

    m = x.shape[0]
    z_wb = np.dot(x, w) + b
    
    f_wb = sigmoid(z_wb)
        
    dj_dw = np.dot(x.T, f_wb - y)
    dj_db = np.sum(f_wb - y)
        
    dj_dw /= m
    dj_db /= m
    
    return dj_dw, dj_db

In [None]:
def compute_gradient_reg(x, y, w, b, lambda_const):
    '''
    Computes the gradients of the regularized cost with respect to the parameters w and b.
    
    Args:
        x (ndarray Shape (m,n))      : An array containing the features of the dataset.
        y (ndarray Shape (m,))       : An array containing the labels of the dataset.
        w (ndarray Shape (n,))       : An array containing the parameters of the model.
        b (scalar)                   : The bias term.
        lambda_const (scalar, float) : Regularization constant
    Returns:
        dj_dw (ndarray Shape (n,)) : The gradients of the cost with respect to the parameters w.
        dj_db (scalar)             : The gradient of the cost with respect to the bias term.
    '''

    m = x.shape[0]
        
    dj_dw, dj_db = compute_gradient(x, y, w, b, 0)
    
    # Add the derivative of the L2 penalty; the bias gradient is unchanged
    dj_dw = dj_dw + (lambda_const / m) * w
    
    return dj_dw, dj_db
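A standard way to check gradient code is to compare it with central finite differences of the cost. The self-contained sketch below does this. `_sigmoid`, `cost_reg`, `grad_reg`, and `max_gradient_error` are hypothetical names written for illustration; `cost_reg` and `grad_reg` re-implement the same formulas as `compute_cost_reg` (minus clipping) and `compute_gradient_reg`.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg(x, y, w, b, lam):
    # Regularized cross-entropy cost, same formula as compute_cost_reg (no clipping)
    m = x.shape[0]
    f = _sigmoid(x @ w + b)
    data_term = -(y @ np.log(f) + (1 - y) @ np.log(1 - f)) / m
    return data_term + lam / (2 * m) * np.sum(w ** 2)

def grad_reg(x, y, w, b, lam):
    # Analytic gradients, same formula as compute_gradient_reg
    m = x.shape[0]
    err = _sigmoid(x @ w + b) - y
    return x.T @ err / m + (lam / m) * w, np.sum(err) / m

def max_gradient_error(cost_fn, grad_fn, x, y, w, b, lam, eps=1e-6):
    """Largest gap between grad_fn and central differences of cost_fn."""
    dj_dw, dj_db = grad_fn(x, y, w, b, lam)
    worst = 0.0
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        numeric = (cost_fn(x, y, w + step, b, lam) - cost_fn(x, y, w - step, b, lam)) / (2 * eps)
        worst = max(worst, abs(numeric - dj_dw[j]))
    numeric_b = (cost_fn(x, y, w, b + eps, lam) - cost_fn(x, y, w, b - eps, lam)) / (2 * eps)
    return max(worst, abs(numeric_b - dj_db))

# Small random problem: 20 examples, 3 features
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(20, 3))
y_demo = (rng.random(20) < 0.5).astype(float)
w_demo = rng.normal(size=3)
print(max_gradient_error(cost_reg, grad_reg, x_demo, y_demo, w_demo, 0.3, lam=0.01))  # should be far below 1e-6
```

Because the cost and gradient functions are passed in, the same check could be run against the notebook's own functions on a small slice. For example, `max_gradient_error(compute_cost_reg, compute_gradient_reg, mapped_x[:200], y_array[:200], w, b, lambda_const)` once `w` and `b` exist.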

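`gradient_descent` works with or without regularization. Pass `compute_cost_reg` and `compute_gradient_reg` as `cost_function` and `gradient_function` for the regularized model, or `compute_cost` and `compute_gradient` for the unregularized one (in which case `lambda_const` is ignored).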

In [None]:
def gradient_descent(x, y, w_i, b_i, cost_function, gradient_function, alpha, iterations, lambda_const, tolerance):
    '''
    Performs batch gradient descent, simultaneously updating w and b to reduce the cost.
    
    Args:
        x (ndarray Shape (m,n))      : An array containing the features of the dataset.
        y (ndarray Shape (m,))       : An array containing the labels of the dataset.
        w_i (ndarray Shape (n,))     : An array containing the initial parameters of the model.
        b_i (scalar)                 : The initial bias term.
        cost_function                : Function to compute the cost
        gradient_function            : Function to compute the gradients
        alpha (float)                : Learning rate
        iterations (int)             : Maximum number of iterations to run
        lambda_const (scalar, float) : Regularization constant
        tolerance (float)            : Stop early once the cost changes by less than this between iterations
    Returns:
        w_f (ndarray Shape (n,)) : An array containing the updated parameters of the model.
        b_f (scalar)             : The updated bias term.
    '''
    
    # Work on a copy so the caller's initial weights are not modified in place
    w = copy.deepcopy(w_i)
    b = b_i
    
    cost_history = []
    prev_cost = float('inf')
    print_every = max(1, iterations // 10)  # avoids modulo by zero when iterations < 10

    for i in range(iterations):
        dj_dw, dj_db = gradient_function(x, y, w, b, lambda_const)
        
        # Simultaneous update: both gradients are computed before either parameter changes
        w -= alpha * dj_dw
        b -= alpha * dj_db
        
        cost = cost_function(x, y, w, b, lambda_const)
        cost_history.append(cost)
        
        if abs(cost - prev_cost) < tolerance:
            print(f"Convergence reached at iteration {i}. Cost {float(cost):8.4f}")
            break

        prev_cost = cost
        
        if i % print_every == 0 or i == (iterations - 1):
            print(f"Iteration {i:4}: Cost {float(cost):8.4f}")
        
    return w, b

### Perform Batch Gradient Descent

We run regularized batch gradient descent on the mapped features by passing `compute_cost_reg` and `compute_gradient_reg` to `gradient_descent`.

In [None]:
np.random.seed(1)
# Small random initial weights, one per mapped feature
w_i = 0.001 * (np.random.rand(mapped_x.shape[1]) - 0.5)
b_i = 10

# ===== User Settings =====
iterations = 10000
alpha = 0.1
lambda_const = 0.01
tolerance = 1e-6

# =========================
w, b = gradient_descent(mapped_x, y_array, w_i, b_i, compute_cost_reg, compute_gradient_reg, 
                        alpha, iterations, lambda_const, tolerance)

### Visualize the Decision Boundary

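Because the model uses all of the mapped features, its decision boundary is a surface in a high-dimensional feature space. It cannot be drawn on the two-dimensional scatter plot above without fixing or projecting the remaining features.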
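The boundary can still be written down. `predict` thresholds the probability at 0.5, and the sigmoid crosses 0.5 exactly where its input is 0. Writing $\phi(\mathbf{x})$ for the output of `map_features` (notation introduced here), this gives:

$$\sigma(z) = \frac{1}{1+e^{-z}} \ge 0.5 \iff z \ge 0, \qquad \text{so the boundary is } \{\mathbf{x} : \mathbf{w}\cdot\phi(\mathbf{x}) + b = 0\}$$

Only in the simplest case, a model on two unmapped features $x_1, x_2$ with $w_2 \neq 0$, does this reduce to a line that is easy to plot:

$$x_2 = -\frac{w_1 x_1 + b}{w_2}$$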

### Prediction Accuracy

In [None]:
def predict(x, w, b):
    """
    Predict whether each label is 0 or 1.
    
    Args:
        x (ndarray Shape (m,n)) : An array containing the features of the dataset.
        w (ndarray Shape (n,))  : An array containing the parameters of the model.
        b (scalar)              : The bias term.
    Returns:
        p (ndarray Shape (m,)) : The predictions for x, using a threshold of 0.5 on the predicted probability.
    """
    
    z_wb = np.dot(x, w) + b
    
    probabilities = sigmoid(z_wb)
    
    p = (probabilities >= 0.5).astype(int)
        
    return p

In [None]:
p = predict(mapped_x, w, b)

print('Train Accuracy: %f'%(np.mean(p == y_array) * 100))
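Accuracy alone can look high when one class is rare. If most transactions are legitimate, a model that predicts 0 for every transaction already scores an accuracy equal to the share of legitimate transactions. The short sketch below also reports confusion-matrix counts, precision, recall, and F1 for the fraudulent class; `classification_report_sketch` is a hypothetical helper, not part of `utility.py`.

```python
import numpy as np

def classification_report_sketch(y_true, y_pred):
    """Confusion-matrix counts plus precision, recall and F1 for the positive (fraud) class."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_pred = np.asarray(y_pred).ravel().astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # fraud correctly flagged
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # legitimate flagged as fraud
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # fraud missed
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # legitimate correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}
```

In the notebook this would be called as `classification_report_sketch(y_array, p)`. Recall (the fraction of fraudulent transactions that were caught) is often the number of most interest for fraud detection.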