In [None]:
import numpy as np
import matplotlib.pyplot as plt
from utils import *
import copy
import math

%matplotlib inline

2 - Logistic Regression
In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.


2.1 Problem Statement
Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams.

You have historical data from previous applicants that you can use as a training set for logistic regression.
For each training example, you have the applicant’s scores on two exams and the admissions decision.
Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams.

2.2 Loading and visualizing the data
You will start by loading the dataset for this task.

The load_data() function shown below loads the data into variables X_train and y_train
X_train contains exam scores on two exams for a student
y_train is the admission decision
y_train = 1 if the student was admitted
y_train = 0 if the student was not admitted
Both X_train and y_train are numpy arrays.

In [None]:
# load dataset
X_train, y_train = load_data("data/ex2data1.txt")

2.3 Sigmoid function
Recall that for logistic regression, the model is represented as

$$f_{\mathbf{w},b}(\mathbf{x}) = g(\mathbf{w}\cdot\mathbf{x} + b)$$
 
where function  𝑔  is the sigmoid function. The sigmoid function is defined as:

$$g(z) = \frac{1}{1+e^{-z}}$$
 
Let's implement the sigmoid function first, so it can be used by the rest of this assignment.


Exercise 1
Please complete the sigmoid function to calculate

$$g(z) = \frac{1}{1+e^{-z}}$$
 
Note that

z is not always a single number, but can also be an array of numbers.
If the input is an array of numbers, we'd like to apply the sigmoid function to each value in the input array.
If you get stuck, you can check out the hints presented after the cell below to help you with the implementation.

In [None]:
# Logistic (sigmoid) function; works element-wise on scalars and NumPy arrays
def sigmoid(z):
  
    g = 1/(1+np.exp(-z))

    return g
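
As a quick check of the note above that z may be a single number or an array, the following sketch applies the same formula to both. The input values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; accepts a scalar or a NumPy array."""
    return 1 / (1 + np.exp(-z))

# Scalar input: sigmoid(0) is exactly 0.5.
scalar_out = sigmoid(0)

# Array input: NumPy applies the formula to every element.
array_out = sigmoid(np.array([-1.0, 0.0, 1.0]))

print(scalar_out)
print(array_out)
```

Because sigmoid(-z) = 1 - sigmoid(z), the first and last entries of the array output sum to 1.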

In [None]:
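# Compute the unregularized logistic regression cost
def compute_cost(X, y, w, b, lambda_=1):
    # lambda_ is unused here; it is accepted so gradient_descent
    # can call this function with five arguments
    m, n = X.shape

    total_cost = 0
    for i in range(m):
        z = np.dot(w, X[i, :]) + b  # linear model output w·x + b
        f = sigmoid(z)              # predicted probability for example i
        loss = -y[i] * np.log(f) - (1 - y[i]) * np.log(1 - f)  # cross-entropy loss for example i
        total_cost += loss
    total_cost = total_cost / m
    return total_cost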
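
One way to sanity-check a cost function like the one above is to evaluate it with all parameters set to zero: every prediction is then 0.5, so the cost is log(2) regardless of the labels. The sketch below uses a vectorized variant (the name compute_cost_vectorized and the toy data are illustrative assumptions, not part of the assignment).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost_vectorized(X, y, w, b):
    """Mean cross-entropy loss, computed without an explicit loop (hypothetical helper)."""
    f = sigmoid(X @ w + b)  # predicted probabilities, shape (m,)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

# Made-up exam scores and labels, only for illustration.
X_toy = np.array([[34.0, 78.0], [60.0, 86.0], [45.0, 56.0], [79.0, 75.0]])
y_toy = np.array([0.0, 1.0, 0.0, 1.0])

# With w = 0 and b = 0 every prediction is 0.5, so the cost is log(2).
cost_at_zero = compute_cost_vectorized(X_toy, y_toy, np.zeros(2), 0.0)
print(cost_at_zero, np.log(2))
```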

In [None]:
# Function for optimizing the parameters using gradient descent
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters, lambda_): 

    m = len(X)

    J_history = []
    w_history = []
    
    for i in range(num_iters):

        dj_db, dj_dw = gradient_function(X, y, w_in, b_in, lambda_)   

        w_in = w_in - alpha * dj_dw               
        b_in = b_in - alpha * dj_db              

        if i<100000:      # prevent resource exhaustion 
            cost =  cost_function(X, y, w_in, b_in, lambda_)
            J_history.append(cost)

        if i% math.ceil(num_iters/10) == 0 or i == (num_iters-1):
            w_history.append(w_in)
            print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}   ")
        
    return w_in, b_in, J_history, w_history #return w and J,w history for graphing
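
To illustrate the update rule used in the loop above, here is a minimal self-contained sketch on a tiny made-up dataset. The helper names (cost, gradient, run_gradient_descent) and the data are assumptions for illustration; the sketch omits regularization and the progress printing.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def gradient(X, y, w, b):
    err = sigmoid(X @ w + b) - y  # prediction error per example
    return np.mean(err), X.T @ err / X.shape[0]

def run_gradient_descent(X, y, w, b, alpha, num_iters):
    """Repeat w := w - alpha * dJ/dw and b := b - alpha * dJ/db."""
    history = []
    for _ in range(num_iters):
        dj_db, dj_dw = gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        history.append(cost(X, y, w, b))
    return w, b, history

# Made-up, linearly separable toy data.
X_toy = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
                  [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y_toy = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

initial_cost = cost(X_toy, y_toy, np.zeros(2), 0.0)
w_fit, b_fit, history = run_gradient_descent(
    X_toy, y_toy, np.zeros(2), 0.0, alpha=0.1, num_iters=1000)
print(f"initial cost {initial_cost:.4f}, final cost {history[-1]:.4f}")
```

With a sufficiently small learning rate the recorded cost should fall from its starting value of log(2).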

In [None]:
# Function for predicting the values using decision boundary as 0.5
def predict(X, w, b): 

    m, n = X.shape   
    p = np.zeros(m)

    for i in range(m):   
        z_wb = 0
        for j in range(n): 
            z_wb += w[j]*X[i,j]
        z_wb += b

        f_wb = sigmoid(z_wb)

        if f_wb >= 0.5:
            p[i] = 1.0
        else:
            p[i] = 0.0

    return p
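
The same thresholding can be written without loops, and the predictions can then be compared with the labels to get an accuracy. This is a sketch under assumed toy values; predict_vectorized is a hypothetical name, not a function from the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_vectorized(X, w, b):
    """Return 1.0 where the predicted probability is at least 0.5, else 0.0."""
    return (sigmoid(X @ w + b) >= 0.5).astype(float)

# Made-up inputs: the linear outputs are 3, -3 and 0.
X_toy = np.array([[1.0, 2.0], [-1.0, -2.0], [0.0, 0.0]])
w_toy = np.array([1.0, 1.0])
b_toy = 0.0

p = predict_vectorized(X_toy, w_toy, b_toy)

# Accuracy is the percentage of predictions that match the labels.
y_toy = np.array([1.0, 0.0, 0.0])
accuracy = np.mean(p == y_toy) * 100
print(p, accuracy)
```

Note that an example lying exactly on the decision boundary (probability 0.5) is classified as 1, matching the `>=` comparison in predict.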

3 - Regularized Logistic Regression
In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.


3.1 Problem Statement
Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests.

From these two tests, you would like to determine whether the microchips should be accepted or rejected.
To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.

3.2 Loading and visualizing the data
Similar to previous parts of this exercise, let's start by loading the dataset for this task and visualizing it.

The load_data() function loads the data into variables X_train and y_train
X_train contains the test results for the microchips from two tests
y_train contains the results of the QA
y_train = 1 if the microchip was accepted
y_train = 0 if the microchip was rejected
Both X_train and y_train are numpy arrays.

While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. In the next parts of the exercise, you will implement regularized logistic regression to fit the data and also see for yourself how regularization can help combat the overfitting problem.


3.4 Cost function for regularized logistic regression
In this part, you will implement the cost function for regularized logistic regression.

Recall that for regularized logistic regression, the cost function is of the form
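$$J(\mathbf{w},b) = \frac{1}{m}\sum_{i=0}^{m-1}\left[-y^{(i)}\log\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right) - \left(1-y^{(i)}\right)\log\left(1-f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=0}^{n-1} w_j^2$$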
 
Compare this to the cost function without regularization (which you implemented above), which is of the form

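$$J(\mathbf{w},b) = \frac{1}{m}\sum_{i=0}^{m-1}\left[-y^{(i)}\log\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right) - \left(1-y^{(i)}\right)\log\left(1-f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)\right]$$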
 
The difference is the regularization term, which is
$$\frac{\lambda}{2m}\sum_{j=0}^{n-1} w_j^2$$
 
Note that the  𝑏  parameter is not regularized.
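
To get a feel for the size of the regularization term, the sketch below evaluates it for made-up values of the weights, lambda and m. All numbers here are illustrative assumptions.

```python
import numpy as np

# Hypothetical values chosen only to make the arithmetic easy to follow.
w_example = np.array([1.0, 2.0])
lambda_example = 1.0
m_example = 4

# (lambda / 2m) * sum_j w_j^2 = (1 / 8) * (1 + 4)
reg_term = lambda_example / (2 * m_example) * np.sum(w_example ** 2)
print(reg_term)  # 0.625
```

The bias b does not appear in this expression, so changing b leaves the penalty unchanged.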

In [None]:
# Function for calculating cost with regularization
def compute_cost_reg(X, y, w, b, lambda_ = 1):
    m, n = X.shape

    cost_without_reg = compute_cost(X, y, w, b) 
    
    reg_cost = 0.
    for j in range(n):
        reg_cost+=w[j]**2

    total_cost = cost_without_reg + (lambda_/(2 * m)) * reg_cost #regularization

    return total_cost

3.5 Gradient for regularized logistic regression
In this section, you will implement the gradient for regularized logistic regression.

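The gradient of the regularized cost function has two components. The first, $\frac{\partial J(\mathbf{w},b)}{\partial b}$, is a scalar; the other is a vector with the same shape as the parameters $\mathbf{w}$, whose $j^{\text{th}}$ element is $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$. They are defined as follows: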

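$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \left(\frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}\right) + \frac{\lambda}{m} w_j \quad \text{for } j = 0 \ldots (n-1)$$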
 
Compare this to the gradient of the cost function without regularization (which you implemented above), which is of the form
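$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) \tag{2}$$
$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} \tag{3}$$
As you can see, $\frac{\partial J(\mathbf{w},b)}{\partial b}$ is the same. The only difference is the following extra term in $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$:
$$\frac{\lambda}{m} w_j \quad \text{for } j = 0 \ldots (n-1)$$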
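
The unregularized gradient in equations (2) and (3) could be sketched as follows. This is one possible vectorized implementation, not necessarily the one used elsewhere in the assignment; the return order (dj_db, dj_dw) and the unused lambda_ parameter are assumptions chosen to match how gradient functions are called in gradient_descent.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_gradient(X, y, w, b, lambda_=None):
    """Unregularized gradient; lambda_ is accepted but ignored (assumption)."""
    m, n = X.shape
    err = sigmoid(X @ w + b) - y   # f(x^(i)) - y^(i) for every example, shape (m,)
    dj_db = np.sum(err) / m        # equation (2)
    dj_dw = X.T @ err / m          # equation (3), one entry per feature
    return dj_db, dj_dw

# Made-up data: at w = 0, b = 0 every prediction is 0.5.
X_toy = np.array([[1.0, 2.0], [3.0, 4.0]])
y_toy = np.array([0.0, 1.0])
dj_db, dj_dw = compute_gradient(X_toy, y_toy, np.zeros(2), 0.0)
print(dj_db, dj_dw)
```

For this toy input the errors are 0.5 and -0.5, so they cancel in dj_db, while each entry of dj_dw is a feature-weighted average of the two errors.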

In [None]:
# Function for calculating regularized gradient
def compute_gradient_reg(X, y, w, b, lambda_ = 1): 
    m, n = X.shape
    
    dj_db, dj_dw = compute_gradient(X, y, w, b)
    for j in range(n):
        dj_dw[j]+=lambda_/m*w[j]

    return dj_db, dj_dw
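
A common way to check an analytic gradient such as the regularized one above is to compare it with central finite differences of the cost. The sketch below does this with self-contained vectorized helpers; all function names, the random toy data and the step size eps are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_reg(X, y, w, b, lambda_):
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    data_cost = np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))
    return data_cost + lambda_ / (2 * m) * np.sum(w ** 2)  # b is not penalized

def gradient_reg(X, y, w, b, lambda_):
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y
    dj_db = np.sum(err) / m
    dj_dw = X.T @ err / m + (lambda_ / m) * w  # extra (lambda/m) * w_j term
    return dj_db, dj_dw

def numerical_gradient(X, y, w, b, lambda_, eps=1e-6):
    """Central-difference approximation of the gradient of cost_reg."""
    num_dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        num_dw[j] = (cost_reg(X, y, w + step, b, lambda_)
                     - cost_reg(X, y, w - step, b, lambda_)) / (2 * eps)
    num_db = (cost_reg(X, y, w, b + eps, lambda_)
              - cost_reg(X, y, w, b - eps, lambda_)) / (2 * eps)
    return num_db, num_dw

# Random toy problem with a fixed seed so the check is repeatable.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 3))
y_toy = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
w_toy = rng.normal(size=3)
b_toy, lambda_toy = 0.5, 0.7

dj_db, dj_dw = gradient_reg(X_toy, y_toy, w_toy, b_toy, lambda_toy)
num_db, num_dw = numerical_gradient(X_toy, y_toy, w_toy, b_toy, lambda_toy)
max_diff = max(abs(dj_db - num_db), np.max(np.abs(dj_dw - num_dw)))
print(max_diff)
```

If the analytic and numerical gradients disagree by much more than rounding error, the usual suspects are a missing regularization term or accidentally regularizing b.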