# Logistic Regression

Logistic regression is a process of modeling the probability of a discrete outcome given an input variable. The most common logistic regression models a binary outcome; something that can take two values such as true/false, yes/no, and so on.

In this week you will be doing logistic regression on breast cancer dataset using sklearn library. Feel free to create any new functions required.

In [67]:
#importinf libraries
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
import numpy as np

Prepare Data

In [68]:
breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

In [69]:
#spliting data for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)   #because fit has already been run for the training data in the fit_transform. Using that mean and variance only, we fit the test_data

Implement Logistic Regression here :)

Print the accuracy and cross entropy loss

In [70]:
def sigmoid(z):
  return 1 / (1 + np.exp(-z))


class LogisticRegression:
    def __init__(self, lr=0.01, iters=1000): #lr (learning rate) & iters (iterations) could be anything of your choice
      self.lr, self.iters = lr, iters

    def fit(self, X, y):
      # self.X, self.y = X, y
      #Initialising the weight and bias vectors; m is the number of datapoints, n is the number of parameters
      m, n = X.shape
      # print(m, n)
      w = np.zeros(n)
      # print(w.shape)
      b = 0
      #optimising the cost function
      for i in range(self.iters):
        y_pred = sigmoid(np.dot(X, w) + b)
        dw = np.mean((np.dot((y_pred - y), X)), axis = 0)
        db = np.mean(y_pred - y)
        w = w - dw * self.lr
        b = b - db * self.lr
      #setting the values of w and b for the model
      self.w = w
      self.b = b

    def predict(self, X):
      self.probability = sigmoid(np.dot(X, self.w) + self.b)
      y_pred = np.where(self.probability > 0.5, 1, 0)
      return y_pred

model = LogisticRegression(lr=0.01, iters=50000)
model.fit(X_train, y_train)
# print(model.w, model.b)

In [71]:
print("Predicted values:\n", model.predict(X_test))
print("Actual values:\n", y_test)

# accuracy_array = np.bitwise_xor(y_test, model.predict(X_test))
matches = y_test == model.predict(X_test)
print("Accuracy of the model: ", (np.sum(matches)/len(y_test))*100, "%", sep="")

Predicted values:
 [1 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 1 1 1 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1
 0 1 1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 1 0
 1 1 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 1 1 1 1 1 1 0 1 0
 1 1 0]
Actual values:
 [1 1 1 1 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 0 0 0 0 1 0 1 1 1 1 1 0 1 1 1 1
 0 0 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0
 1 1 1 0 1 0 1 1 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0
 1 0 0]
Accuracy of the model: 81.57894736842105%


Binary cross entropy loss

In [73]:
#This is the calculation of cost function for the test data
def BCELoss(y,y_pred):
    return -np.mean(y * np.log(y_pred) + (1-y) * np.log(1-y_pred))

print("Value of cost function:", BCELoss(y_test, model.probability))

Value of cost function: 0.39320560047650693
