# Lecture 27 – Data 100, Fall 2024

[Acknowledgments Page](https://ds100.org/fa24/acks/)

Neurons pass information from one to another using action potentials. They connect with one another at synapses, which are junctions between one neuron's axon and another's dendrite. Information flows from:

1.  The dendrites,
2.  To the cell body,
3.  Through the axons,
4.  To a synapse connecting the axon to the dendrite of the next neuron.

<img src="https://www.cs.toronto.edu/~lczhang/360/lec/w02/imgs/neuron.png" width="400">

An Artificial Neuron is a mathematical function with the following elements:


1.   Inputs
2.   Weighted summation of the inputs
3.   Activation function (the processing unit)
4.   Output

<img src="https://www.cs.toronto.edu/~lczhang/360/lec/w02/imgs/neuron_model.jpeg" width="500">

The mathematical equation for an artificial neuron is as follows, where $x_0 = 1$ so that $\theta_0$ acts as the bias (intercept) term:

\begin{align}
        \hat{y} = f(\vec{\mathbf{\theta}} \cdot \vec{\mathbf{x}}) &= f(\sum_{i=0}^d \theta_i x_i) \\
        &= f(\theta_0 + \theta_1 x_1 + ... + \theta_dx_d).
\end{align}
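
As a minimal sketch of this equation, the snippet below computes the weighted sum in two ways: as a single dot product after prepending a constant feature $x_0 = 1$, and term by term with an explicit $\theta_0$. The weights, the input, and the `neuron` helper are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def neuron(theta, x, f):
    """Apply the activation f to the weighted sum theta . [1, x]."""
    x_with_bias = np.concatenate(([1.0], x))  # prepend x_0 = 1
    return f(np.dot(theta, x_with_bias))

theta = np.array([-1.5, 1.0, 1.0])   # hypothetical [theta_0, theta_1, theta_2]
x = np.array([1.0, 1.0])             # hypothetical input

identity = lambda z: z               # placeholder activation so we can inspect the sum
z_dot = neuron(theta, x, identity)
z_explicit = theta[0] + theta[1] * x[0] + theta[2] * x[1]
print(z_dot, z_explicit)
```

Both expressions give the same number, which is why the sum can start at $i = 0$ without writing the bias separately.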

Assuming that the function $f$ is the logistic (sigmoid) function, the output of the neuron can be interpreted as a probability ($0 < p < 1$). This probability can then be used for a binary classification task, where $p < 0.5$ indicates class $0$ and $p \geq 0.5$ assigns the data point to class $1$. Rewriting the equation above with a sigmoid activation function gives the following:

\begin{align}
        \hat{y} = \sigma(\vec{\mathbf{\theta}} \cdot \vec{\mathbf{x}}) &= \sigma(\sum_{i=0}^d \theta_i x_i) \\
        &= \sigma(\theta_0 + \theta_1 x_1 + ... + \theta_dx_d).
\end{align}
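
The thresholding rule can be sketched with a single sigmoid neuron. The weights below are hand-picked assumptions (not learned, and not from the lecture) chosen so that the neuron behaves like an AND gate on the four noise-free corner inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights chosen by hand so the neuron behaves like an AND gate.
theta_0, theta = -15.0, np.array([10.0, 10.0])

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
p = sigmoid(theta_0 + inputs @ theta)      # probability of class 1 for each input
labels = (p >= 0.5).astype(int)            # threshold the probability at 0.5
print(np.round(p, 3), labels)
```

Only the input `[1, 1]` produces a positive weighted sum, so it is the only one whose probability crosses 0.5.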


The code below contains an implementation of AND, OR, and XOR gates. You will be able to generate data for each of the functions and add the desired noise level to the data. Familiarize yourself with the code and answer the following questions.

## Creating the toy dataset


We'll start by generating synthetic data for a logic gate (e.g., AND, OR, XOR) with Gaussian noise. This data will be used for training and testing the logistic regression model.
The function `generate_data_with_noise` allows customization of the number of samples, the logic gate, and the noise level.
            

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Function to generate a dataset with multiple samples per gate location
def generate_data_with_noise(num_samples = 500, gate = "AND", noise_level = 0.05):
    """
    Generate multiple samples per logic gate configuration with added noise.
    """
    if gate == 'AND':
        base_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        base_y = np.array([0, 0, 0, 1])
    elif gate == 'OR':
        base_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        base_y = np.array([0, 1, 1, 1])
    elif gate == 'XOR':
        base_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        base_y = np.array([0, 1, 1, 0])
    else:
        raise ValueError("Gate must be 'AND', 'OR', or 'XOR'.")
    
    # Repeat each base configuration to create multiple samples.
    # int() guards against float inputs (e.g. from a FloatSlider), which np.repeat rejects.
    reps = int(num_samples) // len(base_X)
    X = np.repeat(base_X, reps, axis=0)
    y = np.repeat(base_y, reps, axis=0)
    
    # Add Gaussian noise to the inputs
    X = X + np.random.normal(0, noise_level, X.shape)
    
    # Shuffle the dataset to avoid ordered samples
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)
    X = X[indices]
    y = y[indices]
    
    return X, y
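
To see the repeat-then-perturb step in isolation, here is a small sketch on a tiny example. The seeded `default_rng` generator and the value `reps = 3` are assumptions made for readability; the function above uses the global `np.random` state.

```python
import numpy as np

rng = np.random.default_rng(0)              # seeded generator (an assumption for reproducibility)
base_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
base_y = np.array([0, 0, 0, 1])             # AND gate labels

reps = 3                                    # copies of each corner
X_rep = np.repeat(base_X, reps, axis=0)     # rows: 3x [0,0], then 3x [0,1], ...
y_rep = np.repeat(base_y, reps)
X_noisy = X_rep + rng.normal(0, 0.05, X_rep.shape)  # jitter each point around its corner

print(X_rep.shape, y_rep)
```

Because the number of copies comes from an integer division by 4, the returned dataset has `4 * (num_samples // 4)` rows, which can be slightly fewer than `num_samples`.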

In this lecture we will use interactive visualizations. These require a live Python environment, so if you are viewing this notebook through the static HTML version, you won't be able to use the interactive features.

In [3]:
from ipywidgets import interact, FloatSlider, Dropdown, Checkbox
import plotly.express as px
import plotly.graph_objects as go


X, y = generate_data_with_noise(500, 'AND', 0.05)

# Make an interactive plot
data_fig = go.FigureWidget()
data_fig.add_trace(go.Scatter(x=X[y == 0, 0], y=X[y == 0, 1], mode='markers', marker=dict(color='red'), name='0'))   
data_fig.add_trace(go.Scatter(x=X[y == 1, 0], y=X[y == 1, 1], mode='markers', marker=dict(color='blue'), name='1'))  
data_fig.update_layout(width=800, height=500,
                       xaxis_range=[-1, 2], yaxis_range=[-1, 2])
# The following code defines a set of interactive widgets (sliders)
# and binds them to an update function that will be run whenever
# a slider is changed.
@interact(num_samples=FloatSlider(min=100, max=1000, step=100, value=500, description='Samples'),
          gate=Dropdown(options=['AND', 'OR', 'XOR'], value='AND', description='Gate'),
          noise_level=FloatSlider(min=0.0, max=1.0, step=0.01, value=0.05, description='Noise Level'))
def update_data_plot(num_samples, gate, noise_level):
    X, y = generate_data_with_noise(num_samples, gate, noise_level)
    with data_fig.batch_update():
        data_fig.data[0].x = X[y == 0, 0]
        data_fig.data[0].y = X[y == 0, 1]
        data_fig.data[1].x = X[y == 1, 0]
        data_fig.data[1].y = X[y == 1, 1]   
        data_fig.update_layout(title=f"Dataset for {gate} Gate with Noise Level {noise_level}")
data_fig


---

## Logistic Regression Using Scikit-learn

This section demonstrates how to perform logistic regression using the scikit-learn library. The dataset is divided into training and testing subsets using an 80%-20% ratio with the `train_test_split` function. A logistic regression model is instantiated and trained on the training dataset using the `.fit()` method. The model's performance is evaluated on both the training and testing data using the `.score()` method, which computes accuracy (the fraction of correctly classified points).


In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Logistic Regression Using Scikit-learn
def perform_logistic_regression(X, y):
    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=140)
    
    # Train a logistic regression model
    model = LogisticRegression().fit(x_train, y_train)

    # Compute training and testing accuracy (.score() returns accuracy, not error)
    train_acc = model.score(x_train, y_train)
    test_acc = model.score(x_test, y_test)
    
    return model, train_acc, test_acc
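
As a self-contained sketch of the same workflow, the snippet below builds a small noisy AND dataset inline and fits scikit-learn's `LogisticRegression` to it. The inline data generation, the seed, and the variable names are assumptions for illustration. Note that `.score()` reports accuracy, so the misclassification rate is one minus that value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X_demo = np.repeat(corners, 125, axis=0) + rng.normal(0, 0.05, (500, 2))
y_demo = np.repeat(np.array([0, 0, 0, 1]), 125)      # AND gate labels

x_tr, x_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=140)
clf = LogisticRegression().fit(x_tr, y_tr)

train_acc = clf.score(x_tr, y_tr)    # fraction of correct predictions
test_acc = clf.score(x_te, y_te)
print(f"accuracy: train={train_acc:.2f}, test={test_acc:.2f}")
print("misclassification rate (test):", 1 - test_acc)
print("theta_0:", clf.intercept_, "theta_1..2:", clf.coef_)
```

The fitted `intercept_` and `coef_` play the roles of $\theta_0$ and $\theta_1, \theta_2$ in the neuron equation.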

We can plot a decision boundary or even a decision surface by plotting predictions on a regular grid of points. This is accomplished using the `meshgrid` function from NumPy. We can then use the model to predict the class of each point on the grid and plot the results. This is a useful way to visualize the decision boundary of a classifier.

In [5]:
def plot_decision_boundary(model, xrange, yrange, num_points=100, probs=True):
    # Generate a grid of points
    xx, yy = np.meshgrid(np.linspace(xrange[0], xrange[1], num_points),
                         np.linspace(yrange[0], yrange[1], num_points))
    grid = np.c_[xx.ravel(), yy.ravel()]
    # Get predictions for the grid
    if probs:
        preds = model.predict_proba(grid)[:,1].reshape(xx.shape)
    else:
        preds = model.predict(grid).reshape(xx.shape)
    return go.Contour(x=xx[0], y=yy[:, 0], z=preds, colorscale=[[0, 'red'], [1, 'blue']], 
                      opacity = 0.5, showscale=False)
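
The grid construction is easier to follow on a tiny example. This sketch uses a 4-by-3 grid and a stand-in decision rule (`x + y > 1.5`, a hypothetical rule in place of a trained model) to show the shapes involved.

```python
import numpy as np

xs = np.linspace(-1, 2, 4)            # 4 x-values (kept tiny for readability)
ys = np.linspace(-1, 2, 3)            # 3 y-values
xx, yy = np.meshgrid(xs, ys)          # default indexing='xy': shape (len(ys), len(xs))
grid = np.c_[xx.ravel(), yy.ravel()]  # one (x, y) row per grid point

# A stand-in "model": predicts class 1 when x + y > 1.5 (hypothetical rule).
preds = (grid[:, 0] + grid[:, 1] > 1.5).astype(int).reshape(xx.shape)
print(xx.shape, grid.shape)
print(preds)
```

Each row of `preds` corresponds to one y-value and each column to one x-value, which is the layout a contour plot expects for its `z` values.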

Again we create an interactive visualization plot:            

In [6]:
pred_fig = go.FigureWidget(data=data_fig.data, layout=data_fig.layout)

model, train_acc, test_acc = perform_logistic_regression(X, y)
boundary = plot_decision_boundary(model, [-1, 2], [-1, 2], probs=False)
pred_fig.add_trace(boundary)

@interact(num_samples=FloatSlider(min=100, max=1000, step=100, value=500, description='Samples'),
          gate=Dropdown(options=['AND', 'OR', 'XOR'], value='AND', description='Gate'),
          noise_level=FloatSlider(min=0.0, max=1.0, step=0.01, value=0.05, description='Noise Level'),
          show_probs=Checkbox(value=False, description='Show Probabilities'))
def update_pred_fig(num_samples, gate, noise_level, show_probs):
    np.random.seed(42)
    X, y = generate_data_with_noise(num_samples, gate, noise_level)
    model, train_acc, test_acc = perform_logistic_regression(X, y)
    with pred_fig.batch_update():
        pred_fig.data[0].x = X[y == 0, 0]
        pred_fig.data[0].y = X[y == 0, 1]
        pred_fig.data[1].x = X[y == 1, 0]
        pred_fig.data[1].y = X[y == 1, 1]   
        pred_fig.data[2].z = plot_decision_boundary(model, [-1, 2], [-1, 2], probs=show_probs).z
        pred_fig.update_layout(title=f"Predictions for {gate} Gate with Noise Level {noise_level} (Train Acc: {train_acc:.2f}, Test Acc: {test_acc:.2f})")

pred_fig


---

## PyTorch

We can now try to repeat the modeling process using PyTorch.  PyTorch is a popular deep learning library that is widely used in the research community. It is a lower-level library than scikit-learn and requires more code to accomplish the same tasks. However, it is more flexible and can be used to build more complex models.
 

In [8]:
# PyTorch
import torch
# Neural network module in PyTorch
import torch.nn as nn 
# Optimizer library in PyTorch (for SGD)
import torch.optim as optim

### Step 0: Working with Data in PyTorch

At the core of PyTorch is the `torch.Tensor` class. This class is similar to `numpy` arrays but with some additional features. PyTorch tensors can be used to store data and perform operations on that data. PyTorch tensors can also be used to store gradients, which are used to update the parameters of a model during training.

To use PyTorch, we need to convert our data to tensors. We can do this using the `torch.tensor` function, or we can use the `torch.from_numpy` function to convert `numpy` arrays to tensors.

Notice that the tensors are created with type `float32`. PyTorch uses `float32` as its default floating-point type, whereas `numpy` defaults to `float64`, because most of the math done on GPUs is in lower precision. The labels (`y`) are reshaped using `.unsqueeze(1)`, which adds a second dimension so that they have shape `(n, 1)`, matching the model's output shape.

In [9]:
def make_tensors(X, y):
    from torch.utils.data import random_split, TensorDataset
    data = TensorDataset(torch.tensor(X, dtype=torch.float32), 
                         torch.tensor(y, dtype=torch.float32).unsqueeze(1))
    torch.manual_seed(140)
    train_data, test_data = random_split(data, [0.8, 0.2]) 
    return train_data, test_data
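
The dtype and shape behavior described above can be checked on a two-row example. The arrays here are made up for illustration.

```python
import numpy as np
import torch

X_np = np.array([[0.0, 1.0], [1.0, 1.0]])     # NumPy defaults to float64
y_np = np.array([0, 1])

X_shared = torch.from_numpy(X_np)              # keeps float64 and shares memory with X_np
X_t = torch.tensor(X_np, dtype=torch.float32)  # copies the data and casts to float32
y_t = torch.tensor(y_np, dtype=torch.float32).unsqueeze(1)

print(X_shared.dtype, X_t.dtype)   # torch.float64 torch.float32
print(y_t.shape)                   # torch.Size([2, 1])
```

`torch.from_numpy` preserves the NumPy dtype, so an explicit cast (or `torch.tensor(..., dtype=torch.float32)`) is needed before feeding the data to a model with `float32` parameters.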


### Step 1: Defining the Logistic Regression Model

The logistic regression model is implemented in PyTorch as a class that inherits from `nn.Module`. In PyTorch, inheriting from `nn.Module` is essential, as it provides the necessary methods to manage layers and parameters within the model. Two versions with the same functional form are shown below. `LogisticRegressionModelA` declares the intercept and the weight vector explicitly as `nn.Parameter` objects and computes the weighted sum with `torch.matmul`. `LogisticRegressionModelB` defines a single linear layer in its `__init__` method using `nn.Linear(input_size, 1)`, which computes the same weighted sum of the input features plus a bias term, the mathematical basis of logistic regression. The `input_size` specifies the number of features in the dataset. In the `forward` method, each model performs the forward pass by applying the linear transformation followed by the sigmoid activation function (`torch.sigmoid`). The sigmoid activation ensures that the output values lie in the interval $(0, 1)$, so the output can be read as a probability for binary classification.

In [10]:
class LogisticRegressionModelA(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.intercept = nn.Parameter(torch.tensor(1.0))
        self.w = nn.Parameter(torch.ones(input_size, 1))
        
    def forward(self, x):
        intercept = self.intercept
        w = self.w
        z = intercept + torch.matmul(x, w)
        return torch.sigmoid(z)

In [11]:
class LogisticRegressionModelB(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.linear = nn.Linear(input_size, 1)
    
    def forward(self, x):
        return torch.sigmoid(self.linear(x))

In [12]:
model = LogisticRegressionModelB(2)
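
As a quick check that a linear layer is the same computation as an explicit intercept plus matrix product, this sketch compares the two on a small made-up input. The seed and the input values are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(2, 1)                 # weight has shape (1, 2), bias has shape (1,)
x = torch.tensor([[0.0, 1.0], [1.0, 1.0]])

with torch.no_grad():
    via_layer = torch.sigmoid(linear(x))
    # Same computation written out: bias + x @ weight^T
    via_matmul = torch.sigmoid(linear.bias + torch.matmul(x, linear.weight.T))

names = [name for name, _ in linear.named_parameters()]
print(torch.allclose(via_layer, via_matmul))
print(names)
```

The two approaches differ mainly in initialization: explicitly declared parameters start wherever the code sets them, while `nn.Linear` draws random initial values.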

### Step 2: Defining the Loss

Just as with the rest of Data 100 and machine learning, we need to define a loss function. For logistic regression, we typically use the binary cross-entropy loss, which is a special case of the more general cross-entropy loss.

The loss function and optimizer are essential components for training a PyTorch model. In this implementation, the binary cross-entropy loss (`nn.BCELoss`) is used as the loss function. It measures the difference between the predicted probabilities and the true labels. 

In [13]:
loss_fn = nn.BCELoss()
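
To make the loss concrete, here is a NumPy sketch of mean binary cross-entropy, $-\frac{1}{n}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The `bce` helper and the example probabilities are assumptions for illustration; averaging over the batch corresponds to the default `reduction='mean'` of `nn.BCELoss`.

```python
import numpy as np

def bce(p, y):
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

y_true = [1, 0, 1, 0]
confident_right = bce([0.9, 0.1, 0.9, 0.1], y_true)   # every point contributes -log(0.9)
confident_wrong = bce([0.1, 0.9, 0.1, 0.9], y_true)   # every point contributes -log(0.1)
print(confident_right, confident_wrong)
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is what pushes the model toward well-calibrated probabilities.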

### Step 3: Optimize the Loss

The optimizer used is stochastic gradient descent (`optim.SGD`). The optimizer updates the model's parameters during training to minimize the loss. It takes the model's parameters and the learning rate as inputs. We could actually try more advanced optimizers like Adam. Try uncommenting the Adam optimizer line.

The training process in PyTorch is handled within a loop that iterates over the dataset for multiple epochs. For each epoch, we iterate over the training data in batches. For each batch, the following steps are performed:
- The gradients from the previous iteration are cleared using `optimizer.zero_grad()`.
- The forward pass is executed by passing the training data through the model, which computes predictions.
- The predictions are compared with the true labels using the loss function, and the loss is calculated.
- Backpropagation is performed using `loss.backward()`, which computes the gradients of the loss with respect to the model’s parameters.
- Finally, the optimizer updates the model’s parameters using these gradients through the `optimizer.step()` method.
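
The steps above can be sketched without PyTorch for a single batch. For a sigmoid output with mean binary cross-entropy, the gradient has the closed form $\frac{1}{n} X^\top (p - y)$ for the weights and $\frac{1}{n}\sum_i (p_i - y_i)$ for the intercept, so one plain SGD step can be written by hand. The data, starting values, and helper names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# One mini-batch: the four AND-gate corners (illustrative data).
X_batch = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_batch = np.array([0, 0, 0, 1], dtype=float)

w, b, lr = np.ones(2), 1.0, 0.5          # assumed starting values and learning rate
p = sigmoid(X_batch @ w + b)             # forward pass
loss_before = bce(p, y_batch)            # loss on this batch

grad_w = X_batch.T @ (p - y_batch) / len(y_batch)   # analytic gradient w.r.t. the weights
grad_b = np.mean(p - y_batch)                       # analytic gradient w.r.t. the intercept
w, b = w - lr * grad_w, b - lr * grad_b             # plain SGD update: theta <- theta - lr * grad

loss_after = bce(sigmoid(X_batch @ w + b), y_batch)
print(loss_before, loss_after)
```

In PyTorch, `loss.backward()` computes these gradients automatically and `optimizer.step()` applies the update, so the closed form never has to be derived by hand.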

In [14]:
def perform_logistic_regression_pytorch(train_dataset, 
                                        test_dataset,
                                        model, loss_fn, 
                                        batch_size=64,
                                        nepochs=20):
    
    from torch.utils.data import DataLoader
    # Create a dataloader for training
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
    # Define the optimizer (this is the update rule)
    optimizer = optim.SGD(model.parameters(), lr=0.5)
    # optimizer = optim.Adam(model.parameters(), lr=0.5)

    for epoch in range(nepochs):
        # Loop through all the batches
        for batch, (X, y) in enumerate(train_loader):
            # Zero the gradients to start the next step
            optimizer.zero_grad()
            # Compute prediction and loss
            pred = model(X)
            loss = loss_fn(pred, y)
            # Backpropagation (compute the gradient)
            loss.backward()
            # Update the parameters using the optimizer's update rule
            optimizer.step()
            

        # Evaluate the model on the test data
        # In practice, we often do this in batches too, since the data is too big to fit in memory
        with torch.no_grad():
            test_loss_sum = 0.0
            for X_test, y_test in test_loader:
                test_pred = model(X_test)
                test_loss = loss_fn(test_pred, y_test)
                test_loss_sum += test_loss.item()
            num_test_batches = len(test_loader)
            print(f"Epoch {epoch}, Loss: {loss.item()}, Test Loss: {test_loss_sum/num_test_batches}")

Let's run the optimizer!

In [15]:
train_dataset, test_dataset = make_tensors(X, y)
model = LogisticRegressionModelA(2)
perform_logistic_regression_pytorch(train_dataset, test_dataset, model, loss_fn)

Epoch 0, Loss: 0.6229490637779236, Test Loss: 0.551544725894928
Epoch 1, Loss: 0.549572229385376, Test Loss: 0.46463291347026825
Epoch 2, Loss: 0.4472154676914215, Test Loss: 0.4199940115213394
Epoch 3, Loss: 0.4101514220237732, Test Loss: 0.38414032757282257
Epoch 4, Loss: 0.3993256986141205, Test Loss: 0.3564014285802841
Epoch 5, Loss: 0.34731197357177734, Test Loss: 0.3294874280691147
Epoch 6, Loss: 0.27998536825180054, Test Loss: 0.3099343776702881
Epoch 7, Loss: 0.25412875413894653, Test Loss: 0.2927774488925934
Epoch 8, Loss: 0.2489190548658371, Test Loss: 0.2768632471561432
Epoch 9, Loss: 0.23951081931591034, Test Loss: 0.2623734176158905
Epoch 10, Loss: 0.24455997347831726, Test Loss: 0.2536010518670082
Epoch 11, Loss: 0.2500861883163452, Test Loss: 0.23901592940092087
Epoch 12, Loss: 0.218313530087471, Test Loss: 0.22828186303377151
Epoch 13, Loss: 0.23970897495746613, Test Loss: 0.21750976890325546
Epoch 14, Loss: 0.30330318212509155, Test Loss: 0.20764698833227158
Epoch 15, 


### Step 4: Crazy Interactive Visualization
After training, the decision boundary of the logistic regression model is visualized. A grid of points covering the feature space is created using `torch.meshgrid`. These grid points are passed through the trained model to predict probabilities. The predictions are reshaped into a format suitable for contour plotting. 

The `update_pred_fig` function integrates the PyTorch logistic regression implementation with an interactive widget. The function regenerates the dataset, retrains the model, and updates the decision boundary based on the selected number of samples, logic gate (`AND`, `OR`, `XOR`), and `noise_level`. 


In [16]:
def plot_decision_boundary_pytorch(model, xrange, yrange, num_points=100, probs=True):
    # Generate a grid of points
    xx, yy = torch.meshgrid(torch.linspace(xrange[0], xrange[1], num_points),
                            torch.linspace(yrange[0], yrange[1], num_points),
                            indexing='ij')
    grid = torch.cat([xx.reshape(-1, 1), yy.reshape(-1, 1)], dim=1)
    with torch.no_grad():
        # Get predictions for the grid
        if probs:
            preds = model(grid).reshape(xx.shape)
        else:
            preds = (model(grid) > 0.5).float().reshape(xx.shape)
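    # With indexing='ij', preds[i, j] is the prediction at (x_i, y_j). Plotly reads
    # z[row][col] as the value at (y_row, x_col), so transpose before plotting.
    return go.Contour(x=xx[:, 0], y=yy[0], z=preds.T, colorscale=[[0, 'red'], [1, 'blue']], 
                      opacity = 0.5, showscale=False)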
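
The `indexing` argument decides which axis of the grid follows x. The sketch below uses NumPy's `meshgrid`, which accepts the same `'ij'` and `'xy'` values as `torch.meshgrid`, with a toy function $f(x, y) = x + y$ (an assumption for illustration).

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])        # 3 x-values
ys = np.array([10.0, 20.0])           # 2 y-values

xx_ij, yy_ij = np.meshgrid(xs, ys, indexing='ij')   # shape (len(xs), len(ys))
xx_xy, yy_xy = np.meshgrid(xs, ys, indexing='xy')   # shape (len(ys), len(xs))

f_ij = xx_ij + yy_ij                  # f_ij[i, j] = f(xs[i], ys[j])
f_xy = xx_xy + yy_xy                  # f_xy[i, j] = f(xs[j], ys[i])

print(f_ij.shape, f_xy.shape)
print(np.array_equal(f_ij.T, f_xy))
```

Plotly's contour and heatmap traces read `z[i][j]` as the value at `y[i]`, `x[j]`, which is the `'xy'` layout, so values computed on an `'ij'` grid need a transpose before plotting.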

In [17]:
# Interactive Widget for Decision Boundary Visualization
pred_fig = go.FigureWidget(data=data_fig.data, layout=data_fig.layout)
model_type = LogisticRegressionModelA

train_dataset, test_dataset = make_tensors(X, y)
model = model_type(2)
perform_logistic_regression_pytorch(train_dataset, test_dataset, model, loss_fn)
boundary = plot_decision_boundary_pytorch(model, [-1, 2], [-1, 2], probs=False)
pred_fig.add_trace(boundary)

display(pred_fig)

@interact(num_samples=FloatSlider(min=100, max=1000, step=100, value=500, description='Samples'),
          gate=Dropdown(options=['AND', 'OR', 'XOR'], value='AND', description='Gate'),
          noise_level=FloatSlider(min=0.0, max=1.0, step=0.01, value=0.05, description='Noise Level'),
          show_probs=Checkbox(value=False, description='Show Probabilities'))
def update_pred_fig(num_samples, gate, noise_level, show_probs):
    np.random.seed(42)
    X, y = generate_data_with_noise(num_samples, gate, noise_level)
    train_dataset, test_dataset = make_tensors(X, y)
    model = model_type(2)
    perform_logistic_regression_pytorch(train_dataset, test_dataset, model, loss_fn)
    boundary = plot_decision_boundary_pytorch(model, [-1, 2], [-1, 2], probs=show_probs)
    with pred_fig.batch_update():
        pred_fig.data[0].x = X[y == 0, 0]
        pred_fig.data[0].y = X[y == 0, 1]
        pred_fig.data[1].x = X[y == 1, 0]
        pred_fig.data[1].y = X[y == 1, 1]   
        pred_fig.data[2].z = boundary.z

