Assignment: Binary Classification with Logistic Regression

In this assignment, you will work with the Iris dataset to perform binary classification using logistic regression. The Iris dataset contains samples from three species of iris flowers, but for this assignment you will classify Iris Setosa (original class 0, relabeled as 1 in Step 2) against the other two species combined (relabeled as 0).

Here are the steps you need to follow for this assignment:

Step 1: Load the Iris dataset

Load the Iris dataset using sklearn.datasets.load_iris().
Extract the feature matrix X and the target vector y.


Step 2: Preprocess the data

To convert this problem into binary classification, create a new target vector y_binary where Iris Setosa (class 0) is labeled as 1, and the other two classes are labeled as 0.


Step 3: Split the dataset

Split the dataset into training and testing sets using train_test_split() from sklearn.model_selection.
Use 80% of the data for training and 20% for testing. Set the random_state to ensure reproducibility.

Step 4: Define the cost function (logistic loss)

Implement the logistic loss function, which calculates the cost of your model's predictions.

Step 5: Define the training function

Implement a training function that uses gradient descent to optimize the logistic regression model.
The function should take input data, learning rate, number of iterations, and regularization parameter as arguments.

Step 6: Train the model

Use the training function to train your logistic regression model on the training data.
Obtain the weight vector W and bias term b.


Step 7: Define the prediction function

Implement a prediction function that takes input data and the trained model's weights and bias.
The prediction function should use the logistic sigmoid function to make binary predictions (0 or 1).


Step 8: Predict on the test set

Use the prediction function to predict the classes for the test set X_test using the obtained weights and bias.


Step 9: Evaluate the model's performance

Calculate the accuracy of your model using accuracy_score() from sklearn.metrics.
Generate the confusion matrix using confusion_matrix() from sklearn.metrics.
Generate the classification report using classification_report() from sklearn.metrics.
Print out the accuracy, confusion matrix, and classification report to evaluate your model's performance.

Make sure to comment your code and provide explanations for each step. This assignment will help you understand the basics of binary classification, logistic regression, and how to evaluate the performance of your model using various metrics.

In [103]:
import numpy as np
from sklearn.datasets import load_iris
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt


In [104]:
# Step 1: Load the Iris dataset
iris = load_iris()

In [105]:
# Read the shape of the data
iris.data.shape

(150, 4)

In [106]:
X = iris.data  # Feature matrix X
y = iris.target  # Target vector y

In [107]:
# Step 2: Preprocess the data
# Create a binary target vector: Iris Setosa (original class 0) -> 1, other species -> 0
y_binary = np.where(y == 0, 1, 0)
y_binary

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [117]:
#Step 3: Split the dataset
# Use 80% of the data for training and 20% for testing; a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=42)

In [1]:
#Step 4: Define the cost function (logistic loss)
#The binary cross-entropy loss, also known as log loss, for a single sample is calculated using the following formula:

#\[
#L(y, \hat{y}) = -[y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y})]
#\]

#Where:
#- \(y\) is the true binary label (0 or 1).
#- \(\hat{y}\) is the predicted probability of class 1.

#To calculate the average binary cross-entropy loss over a set of samples, you can use the mean:

#\[
#\text{Binary Cross-Entropy Loss} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{y}_i)
#\]

#Where \(N\) is the number of samples.
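
Step 4 asks for an implementation of this loss, but the cell above only states the formula. A minimal sketch of the averaged formula follows. The function name `logistic_loss` and the clipping constant `eps` are assumptions added for illustration; the clipping is there only to keep `log()` finite when a predicted probability is exactly 0 or 1.

```python
import numpy as np

def logistic_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so that log() stays finite (eps is an assumed constant).
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    # Per-sample loss L(y, y_hat), then the average over all N samples.
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return per_sample.mean()

# Toy check with made-up probabilities (not taken from the Iris run):
print(logistic_loss([1, 0], [0.5, 0.5]))  # ln(2), roughly 0.6931
```

With a trained model, the same function could be applied to the sigmoid outputs on `X_train` to monitor whether gradient descent is reducing the loss.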


In [110]:
#Step 5: Define the training function
def train_logistic_regression(X, y, learning_rate, num_iterations, regularization_param):
    """Fit logistic regression by batch gradient descent with L2 regularization on W."""
    n_samples = len(y)

    # Initialize weights and bias to zero
    W = np.zeros(X.shape[1])
    b = 0.0

    # Gradient descent
    for _ in range(num_iterations):
        # Predicted probabilities: sigmoid of the linear scores
        z = np.dot(X, W) + b
        y_pred = 1 / (1 + np.exp(-z))

        # Gradients of the mean log loss; the L2 penalty applies to W only, not to b
        gradient_W = (1 / n_samples) * np.dot(X.T, (y_pred - y)) + (regularization_param / n_samples) * W
        gradient_b = (1 / n_samples) * np.sum(y_pred - y)

        # Update weights and bias
        W -= learning_rate * gradient_W
        b -= learning_rate * gradient_b

    return W, b
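
The gradient expressions in the training loop can be read off from a regularized cost. The derivation below assumes the cost is the mean binary cross-entropy plus an L2 penalty of the form \(\frac{\lambda}{2N}\lVert W \rVert^2\), where \(\lambda\) is `regularization_param` and \(N\) is `len(y)`. That penalty form is inferred from the code and is not stated in the assignment.

```latex
\[
J(W, b) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{y}_i) + \frac{\lambda}{2N} \lVert W \rVert^2,
\qquad
\hat{y}_i = \frac{1}{1 + e^{-(W^\top x_i + b)}}
\]
\[
\frac{\partial J}{\partial W} = \frac{1}{N} X^\top (\hat{y} - y) + \frac{\lambda}{N} W,
\qquad
\frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
\]
```

These two partial derivatives correspond term by term to `gradient_W` and `gradient_b` in the function above.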


In [111]:
#Step 6: Train the model
#Use the training function to train your logistic regression model on the training data. Obtain the weight vector W and bias term b
# Define hyperparameters
learning_rate = 0.01
num_iterations = 1000
regularization_param = 0.1

# Train the logistic regression model
W, b = train_logistic_regression(X_train, y_train, learning_rate, num_iterations, regularization_param)


In [112]:
#Step 7: Define the prediction function

#Implement a prediction function that takes input data and the trained model's weights and bias. The prediction function should use the logistic sigmoid function to make binary predictions (0 or 1).

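def predict(X, W, b):
    # Probability of class 1 (Iris Setosa): sigmoid of the linear score
    z = np.dot(X, W) + b
    y_pred = 1 / (1 + np.exp(-z))
    # Round to the nearest label, i.e. threshold the probability at 0.5
    return np.round(y_pred)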
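
The expression `1 / (1 + np.exp(-z))` works for the small scores that arise here, but `np.exp(-z)` overflows to `inf` and emits a RuntimeWarning when `z` is strongly negative. A minimal sketch of a sigmoid that avoids this is shown below. The names `stable_sigmoid` and `predict_with_threshold`, and the `threshold` parameter, are hypothetical and not part of the assignment.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite sigmoid(z) as exp(z) / (1 + exp(z)).
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

def predict_with_threshold(X, W, b, threshold=0.5):
    """Hypothetical variant of predict() with an explicit decision threshold."""
    probs = stable_sigmoid(np.dot(X, W) + b)
    return (probs >= threshold).astype(int)
```

One behavioral difference from `np.round`: the comparison `probs >= threshold` assigns a probability of exactly 0.5 to class 1, whereas `np.round(0.5)` evaluates to 0.0 because NumPy rounds halves to the nearest even value.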


In [113]:
#Step 8: Predict on the test set

#Use the prediction function to predict the classes for the test set X_test using the obtained weights and bias.

# Make predictions on the test set
y_pred_test = predict(X_test, W, b)
y_pred_test

array([0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0.,
       0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1.])

Step 9: Evaluate the model's performance

Calculate the accuracy of your model using accuracy_score() from sklearn.metrics. Generate the confusion matrix using confusion_matrix() from sklearn.metrics. Generate the classification report using classification_report() from sklearn.metrics. Print out the accuracy, confusion matrix, and classification report to evaluate your model's performance. Make sure to comment your code and provide explanations for each step. This assignment will help you understand the basics of binary classification, logistic regression, and how to evaluate the performance of your model using various metrics.

In [114]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred_test)

# Confusion matrix Generation
confusion = confusion_matrix(y_test, y_pred_test)

# Generate classification report
report = classification_report(y_test, y_pred_test)

# Print out the results
print(f"Accuracy: {accuracy}")
print("Confusion Matrix:\n", confusion)
print("Classification Report:\n", report)

Accuracy: 1.0
Confusion Matrix:
 [[20  0]
 [ 0 10]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        20
           1       1.00      1.00      1.00        10

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
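
To make the link between the confusion matrix and the report explicit, the sketch below recomputes the class-1 (Iris Setosa) metrics by hand from the matrix printed above. It assumes scikit-learn's layout for `confusion_matrix`, where rows are true labels and columns are predicted labels, so that `ravel()` yields `tn, fp, fn, tp` for labels `[0, 1]`.

```python
import numpy as np

# Confusion matrix as printed above: rows = true class, columns = predicted class.
confusion = np.array([[20, 0],
                      [0, 10]])

# For labels [0, 1] the flattened order is: true neg, false pos, false neg, true pos.
tn, fp, fn, tp = confusion.ravel()

accuracy = (tp + tn) / confusion.sum()
precision = tp / (tp + fp)  # of samples predicted as class 1, the fraction that are correct
recall = tp / (tp + fn)     # of true class-1 samples, the fraction that were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 1.0 1.0 1.0 1.0
```

Because both off-diagonal entries are zero, every metric equals 1.0, which matches the classification report.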



In [115]:
# View the Iris data with pandas
# Use a new name so the loaded `iris` dataset object is not overwritten
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2


In [116]:
#iris_df.info()