# Overfitting

### Overfitting happens when a model fits the training dataset so closely that it predicts almost every training point perfectly, noise included, and as a result generalizes poorly to new data. Some common causes of overfitting are:

- When there is limited data in the training set
- When the model function is a higher-order polynomial
- When the model's weights take on a wide range of values (hence the importance of feature scaling)
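To make the higher-order polynomial case concrete, here is a minimal sketch using NumPy's `np.polyfit` on made-up 1-D data (the noisy straight line, the random seed and the two degrees are assumptions chosen for illustration, not part of this notebook). A degree-5 polynomial has as many coefficients as there are 6 training points, so it can pass through every one of them.

```python
import numpy as np

# Hypothetical data: a noisy straight line y = 2x + 1 (made up for illustration)
rng = np.random.default_rng(0)
x_fit = np.linspace(0, 1, 6)
y_fit = 2 * x_fit + 1 + rng.normal(0, 0.1, size=x_fit.shape)
x_new = np.linspace(0.05, 0.95, 10)
y_new = 2 * x_new + 1 + rng.normal(0, 0.1, size=x_new.shape)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial (np.polyfit coefficients) on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_fit, y_fit, deg=1)    # matches the true linear trend
complex_ = np.polyfit(x_fit, y_fit, deg=5)  # interpolates all 6 noisy points

print("train MSE  deg 1:", mse(simple, x_fit, y_fit), " deg 5:", mse(complex_, x_fit, y_fit))
print("new   MSE  deg 1:", mse(simple, x_new, y_new), " deg 5:", mse(complex_, x_new, y_new))
```

The degree-5 training error is essentially zero because the curve runs through the noise; the second line lets you compare how each fit does on points it was not trained on (with such a tiny dataset the exact numbers depend on the random noise).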

# Dealing with Overfitting

### An overfit model can be dealt with in several ways

### Basic Measures (Try First)

- Addition of more training examples in the training dataset
- Excluding unnecessary features from the training dataset
- Reducing a higher-order model to a simpler one

### Regularization (If nothing works)

- Regularization reduces overfitting by penalising the model for large parameter values, which keeps the weights small and the learned function smoother

- This is done by adding a new term, the `Regularization Term`, to the cost function of logistic regression:

- $\frac{\lambda}{2m}\sum_{j=0}^{n-1} w_j^2$

- Where $m$ is the number of training examples, $n$ is the number of features, and $\lambda$ (`lambda`) is the `Regularization Parameter`
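The regularization term can be computed in one vectorized line. This is a minimal sketch; the helper name `regularization_term` is hypothetical and not used elsewhere in the notebook.

```python
import numpy as np

def regularization_term(w, m, lambda_):
    """(lambda / 2m) * sum_j w_j**2; the bias b is not part of the sum."""
    return (lambda_ / (2 * m)) * np.sum(w ** 2)

# Same values as the cost check further down: w = [1, 1], m = 6, lambda = 1
# -> (1 / 12) * (1 + 1) = 1/6
reg = regularization_term(np.array([1.0, 1.0]), m=6, lambda_=1.0)
print(reg)  # ≈ 1/6
```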

### Regularization thus shrinks the model's parameters as a whole toward small values, while gradient descent still minimizes the (now regularized) cost function

- `Cost Function`: same as usual + `Regularization Term`

- `Gradient dj_dw`: same as usual + $\frac{\lambda}{m} w_j$

- `Gradient dj_db`: same as usual (the bias $b$ is not regularized)
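Plugging these gradients into the usual update rule shows why regularization shrinks the weights: $w_j := w_j - \alpha\left(\text{usual gradient} + \frac{\lambda}{m}w_j\right) = w_j\left(1 - \frac{\alpha\lambda}{m}\right) - \alpha \cdot \text{usual gradient}$, so every step first scales $w_j$ down slightly. The sketch below checks that equivalence numerically; the helper `regularized_step` and the gradient values are made up for illustration.

```python
import numpy as np

def regularized_step(w, b, dj_dw_unreg, dj_db, alpha, lambda_, m):
    """One gradient-descent update using the regularized gradients.

    dj_dw_unreg is the usual (unregularized) gradient with respect to w;
    regularization adds (lambda / m) * w to it, while dj_db is unchanged.
    """
    w_new = w - alpha * (dj_dw_unreg + (lambda_ / m) * w)
    b_new = b - alpha * dj_db
    return w_new, b_new

w = np.array([2.0, 3.0])
g = np.array([0.5, 0.5])  # made-up unregularized gradient
alpha, lambda_, m = 0.1, 1.0, 6

w_step, _ = regularized_step(w, 1.0, g, 0.5, alpha, lambda_, m)
# "Weight decay" form: shrink w by (1 - alpha * lambda / m), then the usual update
w_decay = w * (1 - alpha * lambda_ / m) - alpha * g
print(np.allclose(w_step, w_decay))  # True
```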

## Creating the model as before with regularization

```python
# Necessary Imports
import copy
import math

import numpy as np
```

## Sigmoid Function

```python
def sigmoid_val(x):
    # Sigmoid (logistic) function: maps any real x into the range (0, 1)
    return 1 / (1 + math.e ** -(x))
```
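`sigmoid_val` works for the small inputs in this notebook, but `math.e ** -(x)` raises `OverflowError` once `x` drops below roughly -709, and it only accepts scalars. A minimal alternative sketch (the name `sigmoid_stable` is hypothetical) uses the identity $\sigma(z) = e^{-\log(1 + e^{-z})}$ with NumPy's overflow-safe `np.logaddexp`, and works element-wise on arrays:

```python
import numpy as np

def sigmoid_stable(z):
    """Vectorized sigmoid: exp(-log(1 + exp(-z))), with log(1 + exp(-z))
    computed as np.logaddexp(0, -z) so large |z| does not overflow."""
    return np.exp(-np.logaddexp(0.0, -np.asarray(z, dtype=float)))

print(sigmoid_stable(np.array([-1000.0, 0.0, 1000.0])))
```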

## Training Data

```python
# 6 training examples with 2 features each, and their binary labels
X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
```

## Regularized Cost Function

```python
def compute_cost_logistic(X, y, w, b, lambda_):
    
    """
    This function calculates the regularized cost of logistic regression: the mean
    loss over all the training examples plus the regularization term
    
    Args:
        X - Features
        y - Targets
        w, b - Parameters of the model
        lambda_ - Regularization Parameter
        
    Returns:
        total_cost - A scalar value of the total cost of the model after iterating over all the training examples
    """
    
    # Size of the training data
    m, n = X.shape
    
    # Total cost value
    total_cost = 0
    
    # Regularization Term
    reg_term = 0
    
    # Iterating over each training example to calculate the cost
    for i in range(m):
        
        # Calculating the linear part z = w . x + b
        z_wb = np.dot(X[i], w) + b
        
        # Mapping the prediction to Sigmoid Function
        f_wb = sigmoid_val(z_wb)
        
        # Calculating the loss
        total_cost += (-y[i] * math.log(f_wb)) - (1 - y[i]) * math.log(1 - f_wb)
        
    
    # Calculating the Regularization Term (the bias b is not regularized)
    for j in range(n):
        reg_term += w[j] ** 2
        
    reg_term *= (lambda_ / (2 * m))
        
    # Taking the mean of all the values and adding the reg term
    total_cost /= m
    total_cost += reg_term
    
    return total_cost
```
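The two loops above can be replaced by array operations. This is a minimal sketch of a vectorized equivalent (the name `compute_cost_logistic_vec` is hypothetical); it computes the same formula, so on the temporary values used below it should agree with the loop version's `0.5335334530721841` up to floating-point rounding.

```python
import numpy as np

def compute_cost_logistic_vec(X, y, w, b, lambda_):
    """Vectorized regularized logistic cost (same formula as the loop version)."""
    m = X.shape[0]
    f_wb = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predictions for all m examples
    loss = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)
    reg_term = (lambda_ / (2 * m)) * np.sum(w ** 2)  # bias b is not regularized
    return float(np.mean(loss) + reg_term)

X_train = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
cost = compute_cost_logistic_vec(X_train, y_train, np.array([1, 1]), -3, 1)
print(cost)
```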

```python
# Checking the performance with temporary values
w_temp = np.array([1, 1])
b_temp = -3

print(f"The cost of the model is: {compute_cost_logistic(X_train, y_train, w_temp, b_temp, 1)}")
```

```
The cost of the model is: 0.5335334530721841
```


## Calculating the Gradients

```python
def compute_gradient_logistic(X, y, w, b, lambda_):
    
    """
    This function computes the gradients required by Gradient Descent to update the model parameters
    
    Args:
        X - Features
        y - Targets
        w, b - Model Parameters
        lambda_ - Regularization Parameter
        
    Returns:
        dj_dw, dj_db - The gradients of the regularized cost with respect to w and b, averaged over
                       all the training examples, for one iteration of gradient descent
    """
    
    # Shape of the training data
    m, n = X.shape
    
    # Initialising the gradient values
    dj_dw = np.zeros((n,))
    dj_db = 0
    
    # Looping over all the training examples to calculate the gradients
    for i in range(m):
        
        # Linear part z = w . x + b
        z_wb = np.dot(X[i], w) + b
        
        # Logistic Regression
        f_wb = sigmoid_val(z_wb)
        
        # Calculating the error
        err = f_wb - y[i]
        
        # Updating the gradients of each feature.
        # (lambda_ / m) * w[j] is added once per example, which sums to lambda_ * w[j];
        # after the division by m below this leaves the required (lambda_ / m) * w[j]
        for j in range(n):
            dj_dw[j] += (err * X[i, j]) + ((lambda_ / m) * w[j])
            
        dj_db += err
        
    # Taking the mean of the gradients
    dj_dw /= m
    dj_db /= m
        
    return dj_dw, dj_db
```
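A vectorized sketch of the same gradients (the name `compute_gradient_logistic_vec` is hypothetical) adds the regularization term once, outside the sum over examples, which makes the $\frac{\lambda}{m}w_j$ term easier to see. With the temporary values used below it should reproduce the loop version's output up to floating-point rounding.

```python
import numpy as np

def compute_gradient_logistic_vec(X, y, w, b, lambda_):
    """Vectorized gradients of the regularized logistic cost."""
    m = X.shape[0]
    err = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y  # f_wb - y for every example
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w   # regularization added once
    dj_db = float(np.mean(err))                   # bias gradient is not regularized
    return dj_dw, dj_db

X_train = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
dj_dw_vec, dj_db_vec = compute_gradient_logistic_vec(X_train, y_train, np.array([2., 3.]), 1., 1)
print(dj_db_vec, dj_dw_vec.tolist())
```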

```python
# Computing the gradient for gradient descent
w_tmp = np.array([2.,3.])
b_tmp = 1.
dj_dw_tmp, dj_db_tmp = compute_gradient_logistic(X_train, y_train, w_tmp, b_tmp, 1)

print(f"dj_db: {dj_db_tmp}" )
print(f"dj_dw: {dj_dw_tmp.tolist()}" )
```

```
dj_db: 0.49861806546328574
dj_dw: [0.8316667266120293, 0.9988394298399669]
```


## Gradient Descent

```python
def gradient_descent(X, y, w_in, b_in, alpha, lambda_, cost_function, gradient_function, num_iterations):
    
    """
    This function runs gradient descent for regularized logistic regression, updating the parameters
    of the model to minimize the regularized cost of the predictions made by the model
    
    Args:
        X - Features
        y - Targets
        w_in, b_in - Initial values of the parameters of the model
        alpha - Learning Rate of the model
        lambda_ - Regularization Parameter
        cost_function - Used to calculate the cost of the model prediction over the training examples
        gradient_function - Used to calculate the gradient of all the parameters used by the model
        num_iterations - The number of times the gradient descent update is run on the given dataset
        
    Returns:
        w, b - Final value of the parameters of the model after gradient descent
        J_history - Cost of the model after each iteration (at most 10000 entries)
    """
    
    # Size of training data
    m, n = X.shape
    
    # Making a deep copy of the parameters to not affect the global variables
    w = copy.deepcopy(w_in)
    b = b_in
    
    # Storing the history of model cost during gradient descent
    J_history = []
    
    # Looping over the training data for num_iterations
    for i in range(num_iterations):
        
        # Compute Gradients
        dj_dw, dj_db = gradient_function(X, y, w, b, lambda_)
        
        # Updating the parameters
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        
        # Storing the cost of every iteration (capped at 10000 entries)
        if i < 10000:
            J_history.append(cost_function(X, y, w, b, lambda_))
            
        # Printing the cost at 10 evenly spaced iterations
        if i % math.ceil(num_iterations / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}   ")
        
    return w, b, J_history 
```

## Optimizing the model after Regularized Gradient Descent

```python
# Some Initial Values of Parameters of the model
w_tmp  = np.zeros_like(X_train[0])
b_tmp  = 0.

# Gradient Descent Parameters
alph = 0.1
iters = 10000

# Computing Gradient Descent
w_out, b_out, _ = gradient_descent(X_train, y_train, w_tmp, b_tmp, alph, 1, compute_cost_logistic, 
                                   compute_gradient_logistic, iters) 
print(f"\nupdated parameters: w:{w_out}, b:{b_out}")
```

```
Iteration    0: Cost 0.6846857000420554   
Iteration 1000: Cost 0.5203321414569039   
Iteration 2000: Cost 0.5203016151380848   
Iteration 3000: Cost 0.5203016077007424   
Iteration 4000: Cost 0.5203016076989254   
Iteration 5000: Cost 0.520301607698925   
Iteration 6000: Cost 0.5203016076989251   
Iteration 7000: Cost 0.520301607698925   
Iteration 8000: Cost 0.520301607698925   
Iteration 9000: Cost 0.520301607698925   

updated parameters: w:[0.90411532 0.73588062], b:-2.3337541849928076
```
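To see the shrinking effect of the regularization parameter directly, one can rerun the same training (alpha = 0.1, 10000 iterations, zero initial parameters) for several values of $\lambda$ and compare the size of the learned weights. The sketch below is a self-contained, vectorized re-implementation (the helper `fit` is hypothetical, not the notebook's `gradient_descent`); for $\lambda = 1$ it should land on essentially the same parameters as the run above, and larger $\lambda$ should give a smaller weight norm.

```python
import numpy as np

X_train = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def fit(X, y, lambda_, alpha=0.1, iters=10000):
    """Batch gradient descent on the regularized logistic cost, starting from zeros."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        err = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y
        w = w - alpha * ((X.T @ err) / m + (lambda_ / m) * w)
        b = b - alpha * float(np.mean(err))
    return w, b

results = {}
for lam in [0.0, 1.0, 10.0]:
    w, b = fit(X_train, y_train, lam)
    results[lam] = (w, b)
    print(f"lambda={lam:>4}: w={w}, b={b:.4f}, ||w||={np.linalg.norm(w):.4f}")
```

Without regularization ($\lambda = 0$) this training set is linearly separable, so the weights keep growing the longer gradient descent runs; the penalty term is what keeps them bounded.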


```python
# Predicting values using the model
X_pred_1 = np.array([1, 1])
y_pred_1 = 0

X_pred_2 = np.array([2, 2])
y_pred_2 = 1

y_hat_1 = sigmoid_val(np.dot(w_out, X_pred_1) + b_out)
y_hat_2 = sigmoid_val(np.dot(w_out, X_pred_2) + b_out)

# Thresholding the predicted probability at 0.5 to get the class label
if y_hat_1 < 0.5:
    print(f"Model Prediction: {y_hat_1} 0 \t Actual Value: {y_pred_1}")
    
else:
    print(f"Model Prediction: {y_hat_1} 1 \t Actual Value: {y_pred_1}")
    
if y_hat_2 < 0.5:
    print(f"Model Prediction: {y_hat_2} 0 \t Actual Value: {y_pred_2}")
    
else:
    print(f"Model Prediction: {y_hat_2} 1 \t Actual Value: {y_pred_2}")
```

```
Model Prediction: 0.3331975531912616 0 	 Actual Value: 0
Model Prediction: 0.720357915618094 1 	 Actual Value: 1
```
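The repeated `if`/`else` blocks can be folded into one vectorized helper that thresholds the predicted probability at 0.5, the same rule used above. This is a minimal sketch (the name `predict` is hypothetical); with the trained parameters printed above it labels all six training examples correctly.

```python
import numpy as np

def predict(X, w, b, threshold=0.5):
    """Return 0/1 labels: 1 where sigmoid(X @ w + b) >= threshold, else 0."""
    probs = 1.0 / (1.0 + np.exp(-(np.atleast_2d(X) @ w + b)))
    return (probs >= threshold).astype(int)

# Parameters copied from the training run above
w_out = np.array([0.90411532, 0.73588062])
b_out = -2.3337541849928076

X_train = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
print(predict(X_train, w_out, b_out))  # [0 0 0 1 1 1]
```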
