# Logistic Regression 

**A note on this document**
This document is a Jupyter notebook, which lets text and executable code coexist in an easy-to-read format. Each block (cell) contains either text or executable code. To run a code block, press `Shift+Enter` or `Ctrl+Enter`, or click the arrow on the block. Earlier code blocks must be run before later ones will work.

In the lesson, we covered logistic regression for binary classification. The weights are learned with the gradient descent update rule:
$$ \mathbf{w} \leftarrow \mathbf{w} -\alpha \nabla \mathcal{L}(\mathbf{w}) $$


Here, $\nabla \mathcal{L}(\mathbf{w})$ is defined as:
$$\nabla \mathcal{L}(\mathbf{w}) = \sum_{n=1}^N e_n \mathbf{x}_n$$

In this context, $e_n$ is the prediction error $\hat{t}_n - t_n$, where the prediction $\hat{t}_n$ is determined by:

$$\hat{t}_n = \sigma(\mathbf{w}^\top \mathbf{x}_n) = \sigma(w_0 + w_1x_{n1} + w_2x_{n2} + \cdots + w_Kx_{nK})$$

The sigmoid function $\sigma(\cdot)$, defined as:

$$ \sigma(\gamma) = \frac{1}{1+e^{-\gamma}}$$

is used to calculate $\hat{t}_n$. 
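
As a minimal sketch of how these pieces fit together, the snippet below performs one gradient descent step on a tiny made-up dataset. The names `sigmoid_sketch`, `cross_entropy`, `X_toy`, `t_toy`, `w_toy`, and `alpha` are illustrative assumptions, not part of the lab code, and the gradient is taken here as the sum over samples of (prediction minus target) times $\mathbf{x}_n$, which is the gradient of the summed cross-entropy loss.

```python
import numpy as np


def sigmoid_sketch(z):
    # Element-wise logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(X, t, w):
    # Summed cross-entropy loss over all samples
    p = sigmoid_sketch(X @ w)
    return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))


# Toy data (made up for illustration): a bias column plus one feature
X_toy = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
t_toy = np.array([0.0, 0.0, 1.0, 1.0])
w_toy = np.zeros(2)
alpha = 0.1  # learning rate, chosen arbitrarily for this sketch

loss_before = cross_entropy(X_toy, t_toy, w_toy)

# Gradient of the summed cross-entropy: sum_n (t_hat_n - t_n) * x_n
grad = X_toy.T @ (sigmoid_sketch(X_toy @ w_toy) - t_toy)

# One update: w <- w - alpha * gradient
w_new = w_toy - alpha * grad
loss_after = cross_entropy(X_toy, t_toy, w_new)

print(loss_before, loss_after)
```

With a small enough learning rate, the loss after the step should be lower than before, which is a quick sanity check on the sign of the update.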

It's worth noting that in the case of linear regression, the prediction was given simply by:

$$\hat{t}_n = \mathbf{w}^\top \mathbf{x}_n = w_0 + w_1x_{n1} + w_2x_{n2} + \cdots + w_Kx_{nK}$$

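The only difference between the two prediction functions is the sigmoid: logistic regression passes the same linear combination $\mathbf{w}^\top \mathbf{x}_n$ through $\sigma(\cdot)$, which maps it into the interval $(0, 1)$ so that it can be read as a probability.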


Now, let's represent the predictions vector for logistic regression, denoted as $\hat{\mathbf{t}}$, in a concise manner:


\begin{equation*}
\hat{\mathbf{t}} = 
\begin{bmatrix}
\hat{t}_1 \\
\vdots \\
\hat{t}_N
\end{bmatrix}
= 
\begin{bmatrix}
 \sigma(w_0 + w_1x_{11} + \cdots + w_K x_{1K}) \\ 
 \vdots  \\ 
 \sigma(w_0 + w_1x_{N1} + \cdots + w_Kx_{NK})
\end{bmatrix} 
\end{equation*}


In a more compact notation, we can express $\hat{\mathbf{t}}$ as:
\begin{equation*}
\hat{\mathbf{t}} = \sigma \left( \begin{bmatrix} 1 & x_{11} & \cdots & x_{1K} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{NK} \end{bmatrix} \begin{bmatrix} w_0 \\ \vdots  \\ w_K \end{bmatrix} \right) = \sigma(X\mathbf{w})
\end{equation*}

This compact notation allows you to implement it in Python as:

`predictions = sigmoid(np.dot(X, w))  # or sigmoid(X@w) `  

In this expression, `X` is the feature matrix with dimensions $N\times(K+1)$ (its first column is all ones), `w` is the weight vector of size $(K+1)$, and `sigmoid` is the _sigmoid_ function applied element-wise to the matrix-vector product. A single expression therefore computes all $N$ predictions at once, with no explicit loop over samples.
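
To make the shapes concrete, here is a small sketch that compares the vectorized expression with a row-by-row computation. The helper `sigmoid_sketch` and the arrays `X_demo` and `w_demo` are made-up illustrations (with $N = 3$ and $K = 2$), not the lab's data or its required implementation.

```python
import numpy as np


def sigmoid_sketch(z):
    # Element-wise logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))


# N = 3 samples, K = 2 features, plus a leading column of ones for w_0
X_demo = np.array([[1.0, 2.0, -1.0],
                   [1.0, 0.0, 0.5],
                   [1.0, -3.0, 1.0]])
w_demo = np.array([0.5, -1.0, 2.0])

# Vectorized: one matrix-vector product, then element-wise sigmoid
vectorized = sigmoid_sketch(X_demo @ w_demo)

# Equivalent loop: one dot product per sample
row_by_row = np.array([sigmoid_sketch(np.dot(w_demo, x_n)) for x_n in X_demo])

print(vectorized.shape)
print(np.allclose(vectorized, row_by_row))
```

Both versions should give the same vector of length $N$; the vectorized form is simply shorter and typically faster.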


Next, we'll define a sigmoid function in Python and perform a test. This function will be essential for implementing logistic regression later in this lab.

## Deliverable 1

Complete the `sigmoid` function below.

In [None]:
import numpy as np
import matplotlib.pyplot as plt


# sigmoid (logistic) function
def sigmoid(x):
    """
    Write your code here for Lab3
    """
    return 0


x = np.linspace(-6, 6, 100)
sig = sigmoid(x)

fig = plt.figure(figsize=(5, 3))

plt.plot(x, sig, "b", label="sigmoid")  # label is needed for plt.legend() below
plt.grid(True)
plt.xlabel("x")
plt.ylabel("sigmoid")
plt.legend()

The sigmoid can also be applied to a linear combination of two features, $\hat{t} = \sigma(w_1x_1 + w_2x_2)$. To get a better feel for this two-dimensional case, the surface plot below shows $\sigma(x_1 + x_2)$, that is, the case $w_1 = w_2 = 1$.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


# Create a grid of points in the x and y dimensions
x = np.linspace(-10, 10, 100)  # Adjust the range and granularity as needed
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)

# Compute the sigmoid values for the grid
Z = sigmoid(X + Y)

# Create a 3D surface plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")

# Add labels and a title
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Sigmoid Output")
ax.set_title("Two-Dimensional Sigmoid Function (Surface Plot)")

plt.show()

Here is a simple example of a dataset that we can use to test logistic regression. This dataset represents a binary classification problem where we want to predict whether a student will pass (1) or fail (0) an exam based on the number of hours they studied and the number of hours they slept 😁. The data is loaded from `./data/exam_data.csv` in the code below.

We will fit a logistic regression model to this dataset to predict whether a student passes the exam from the hours they studied and slept. Normally we would split the dataset into a training set and a testing set to evaluate the model's performance, but we will not evaluate performance on held-out data in this lab.
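
For reference only, a train/test split could be sketched as below. The function `train_test_split_sketch`, its parameters, and the arrays `X_fake` and `t_fake` are hypothetical names introduced for illustration, and the sketch assumes the features and targets are NumPy arrays.

```python
import numpy as np


def train_test_split_sketch(X, t, test_fraction=0.2, seed=0):
    # Shuffle the sample indices reproducibly, then cut off the test portion
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(t))
    n_test = int(round(test_fraction * len(t)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], t[train_idx], t[test_idx]


# Placeholder arrays (not the exam data): 10 samples, 2 features
X_fake = np.arange(20, dtype=float).reshape(10, 2)
t_fake = np.array([0, 1] * 5)

X_train, X_test, t_train, t_test = train_test_split_sketch(X_fake, t_fake)
print(X_train.shape, X_test.shape)
```

The model would then be fitted on the training portion only and scored on the test portion.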

Please read the code below carefully. When it comes to your final project, you will be creating everything from the ground up.

In [None]:
import pandas as pd


def read_exam_data():
    # Specify the file name
    file_name = "./data/exam_data.csv"

    # Read the CSV file into a DataFrame
    return pd.read_csv(file_name)


exam_df = read_exam_data()
print(exam_df)

# Separate the data into Passed (Y=1) and Failed (Y=0)
passed = exam_df[exam_df["Passed Exam (Y)"] == 1]
failed = exam_df[exam_df["Passed Exam (Y)"] == 0]

# Create a scatter plot
plt.figure(figsize=(7, 4))
plt.scatter(
    passed["Hours Studied (X1)"],
    passed["Hours Slept (X2)"],
    label="Passed",
    color="r",
    marker="o",
    facecolors="none",
)

plt.scatter(
    failed["Hours Studied (X1)"],
    failed["Hours Slept (X2)"],
    label="Failed",
    color="b",
    marker="o",
    facecolors="none",
)
plt.xlabel("Hours Studied (X1)")
plt.ylabel("Hours Slept (X2)")
plt.title("Exam Data Scatter Plot")
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Create a scatter plot
plt.figure(figsize=(4, 3))
plt.scatter(
    exam_df["Hours Studied (X1)"],
    exam_df["Passed Exam (Y)"],
    color="b",
    marker="o",
    facecolors="none",
)
plt.xlabel("Hours Studied (X1)")
plt.ylabel("Passed Exam (Y)")
plt.grid(True)
plt.show()

# Create a scatter plot
plt.figure(figsize=(4, 3))
plt.scatter(
    exam_df["Hours Slept (X2)"],
    exam_df["Passed Exam (Y)"],
    color="r",
    marker="o",
    facecolors="none",
)
plt.xlabel("Hours Slept (X2)")
plt.ylabel("Passed Exam (Y)")
plt.grid(True)
plt.show()

## Deliverable 2

Implement the `gradient_descent` function provided below. This gradient descent function is intended for logistic regression and closely resembles the one used for linear regression.

In [None]:
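# Define the logistic loss (mean cross-entropy) function
def logistic_loss(X, y, theta):
    m = len(y)  # number of samples
    h = sigmoid(np.dot(X, theta))  # predicted probabilities
    # Clip predictions away from 0 and 1 so np.log never receives 0
    h = np.clip(h, 1e-12, 1 - 1e-12)
    loss = -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return loss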


# Define gradient descent function
def gradient_descent(X, t, learning_rate, num_iterations):
    # Initialize model parameters with zeros
    num_samples, num_features = X.shape
    w = np.zeros(num_features)
    losses = []

    # TODO: Perform gradient descent
    for _ in range(num_iterations):
        """
        Write your code here
        """


        print(w)

    return w, losses

In [None]:
x1 = exam_df["Hours Studied (X1)"]
x2 = exam_df["Hours Slept (X2)"]
t = exam_df["Passed Exam (Y)"]

X = np.column_stack([np.ones_like(x1), x1, x2])
print(X.shape)
print(t.shape)
learning_rate = 0.65
num_iterations = 50000

w, losses = gradient_descent(X, t, learning_rate, num_iterations)

print(w)

Use the code below to evaluate your parameters.

In [None]:
# Threshold the predicted probability at 0.5 (equivalent to X @ w > 0)
predictions = np.where(sigmoid(X @ w) > 0.5, 1, 0)
print(np.column_stack([t, predictions]))
match = np.count_nonzero(t == predictions)

# The number of matches should be no less than 65
print(f"Number of matches is {match} out of {len(t)}")

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# TODO: Update the following line with the optimal parameters (weights)
w = np.array([0,0,0])

# Create a scatter plot
plt.figure(figsize=(7, 4))
plt.scatter(
    passed["Hours Studied (X1)"],
    passed["Hours Slept (X2)"],
    label="Passed",
    color="r",
    marker="o",
    facecolors="none",
)

plt.scatter(
    failed["Hours Studied (X1)"],
    failed["Hours Slept (X2)"],
    label="Failed",
    color="b",
    marker="o",
    facecolors="none",
)
plt.xlabel("Hours Studied (X1)")
plt.ylabel("Hours Slept (X2)")

# Decision boundary
x1_vals = np.linspace(min(x1), 4, 100)
x2_vals = -(w[0] + w[1] * x1_vals) / w[2]
plt.plot(x1_vals, x2_vals, color="green", label="Decision Boundary")

plt.legend()
plt.grid(True)
plt.title("Logistic Regression Decision Boundary")
plt.show()
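
The line plotted above comes from setting $w_0 + w_1x_1 + w_2x_2 = 0$ and solving for $x_2$, since $\sigma(0) = 0.5$. The sketch below checks this numerically with hypothetical weights (`w_example` is made up, not the result of training on the exam data) and also confirms that thresholding the probability at 0.5 is the same as thresholding the linear term at 0.

```python
import numpy as np


def sigmoid_sketch(z):
    # Element-wise logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights [w_0, w_1, w_2], chosen only for illustration
w_example = np.array([-4.0, 1.0, 2.0])

# Points on the boundary: x2 = -(w_0 + w_1 * x1) / w_2
x1_line = np.linspace(0.0, 4.0, 5)
x2_line = -(w_example[0] + w_example[1] * x1_line) / w_example[2]

# Every point on the line should have predicted probability 0.5
X_line = np.column_stack([np.ones_like(x1_line), x1_line, x2_line])
probs_on_line = sigmoid_sketch(X_line @ w_example)
print(probs_on_line)

# sigmoid(z) > 0.5 exactly when z > 0
z = np.array([-2.0, -0.1, 0.1, 3.0])
same = np.array_equal(sigmoid_sketch(z) > 0.5, z > 0)
print(same)
```

Note that the formula divides by $w_2$, so it only applies once the weights have been updated to values with $w_2 \neq 0$.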