SYSEN 5888 Spring 2026

Jonathan Lloyd

Homework 1, Question 1


Goal: Build and train an artificial neural network for a binary classification task. 

Tools: PyTorch

Data: The input data and their corresponding binary labels are provided in the data file hw1data.dat.
The input data consist of 1000 two-dimensional points that lie within a square of area one. Load the input data and labels by reading the data file with any library of your choice. Note that for binary classification, the -1/1 labels need to be converted into 0/1 labels.

In [5]:
# Import necessary libraries
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

# Import data file
hw1data = np.loadtxt("../HW_01_ANN/hw1data.dat")

# Quick look at the data
print("Shape:", hw1data.shape)
print("First 5 rows:")
print(hw1data[:5])

# Convert data labels -1/1 to 0/1
# if data label in column 3 (index 2) is -1, reset to 0
hw1data[:, 2] = np.where(hw1data[:, 2] == -1, 0, hw1data[:, 2])
print("First 5 rows relabeled:")
print(hw1data[:5])

Shape: (1000, 3)
First 5 rows:
[[ 0.73662472  0.50544176 -1.        ]
 [ 0.71066494  0.56503663 -1.        ]
 [ 0.10533493  0.06889585  1.        ]
 [ 0.95860447  0.16390308  1.        ]
 [ 0.42369288  0.51051878 -1.        ]]
First 5 rows relabeled:
[[0.73662472 0.50544176 0.        ]
 [0.71066494 0.56503663 0.        ]
 [0.10533493 0.06889585 1.        ]
 [0.95860447 0.16390308 1.        ]
 [0.42369288 0.51051878 0.        ]]
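Before training, the relabeled array needs to be split into feature and label tensors. The sketch below uses the five relabeled rows printed above as a stand-in for the full file; the names `sample`, `X`, and `y` are illustrative and not part of the assignment.

```python
import numpy as np
import torch

# Stand-in for hw1data: the first five relabeled rows printed above
sample = np.array([
    [0.73662472, 0.50544176, 0.0],
    [0.71066494, 0.56503663, 0.0],
    [0.10533493, 0.06889585, 1.0],
    [0.95860447, 0.16390308, 1.0],
    [0.42369288, 0.51051878, 0.0],
])

# Features: columns 0-1. Labels: column 2, kept as shape (N, 1)
# so it matches the output of a single-unit sigmoid layer.
X = torch.tensor(sample[:, :2], dtype=torch.float32)
y = torch.tensor(sample[:, 2:3], dtype=torch.float32)

print(X.shape, y.shape)
```

Keeping the labels as a column of floats avoids a shape mismatch when a binary cross-entropy loss compares them with the model output.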


Architecture: Define a Sequential model, wherein the layers are stacked sequentially and each layer has exactly one input tensor and one output tensor. Please build an artificial neural network by adding the following layers to the Sequential model using the configuration below.
- Input - Shape 2
- Dense - Units 5
- Dense - Units 1 - Activation Sigmoid

The initial random weights of each layer can be defined by specifying weight and bias initializers. For each of the above layers, initialize the kernel weights from a Xavier/Glorot uniform distribution with the random seed set to 99, and initialize the bias vector to zeros. The activation function defines a node's output given its inputs; a non-linear activation is required for the artificial neural network to learn a non-linear pattern. The activation function for the first dense layer can be chosen from commonly used options such as Rectified Linear Unit (ReLU), Hyperbolic Tangent (tanh), and Sigmoid.

In [None]:
# Define Model Class
# Architecture: Input(2) -> Dense(5) -> [activation] -> Dense(1) -> Sigmoid -> Output
class Q1SequentialModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, input_activations='relu'): # Default ReLU
        super(Q1SequentialModel, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        activations_options = {'relu': nn.ReLU(), 'tanh': nn.Tanh(), 'sigmoid': nn.Sigmoid()}
        self.input_act = activations_options[input_activations]
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.output_act = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.input_act(x)
        x = self.fc2(x)
        x = self.output_act(x)
        return x

    def _init_weights_biases(self):
        # Initialize kernel weights with Xavier/Glorot uniform, Seed=99
        torch.manual_seed(99)
        for layer in [self.fc1, self.fc2]:
            nn.init.xavier_uniform_(layer.weight) 
            # Initialize biases with zeros
            nn.init.zeros_(layer.bias) 
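
As a sanity check on the initialization: with gain 1, Xavier/Glorot uniform draws weights from U(-a, a) where a = sqrt(6 / (fan_in + fan_out)). The following is a minimal sketch on a standalone layer, assuming the same seed and the 2 -> 5 shape of the first dense layer; the variable names are illustrative.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(99)
layer = nn.Linear(2, 5)  # same shape as the first dense layer
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

# Xavier/Glorot uniform bound with gain 1: sqrt(6 / (fan_in + fan_out))
bound = math.sqrt(6.0 / (2 + 5))

# Small tolerance covers float32 rounding at the edge of the interval
within_bound = bool((layer.weight.abs() <= bound + 1e-6).all())
bias_is_zero = bool((layer.bias == 0).all())
print(f"bound={bound:.4f}, within_bound={within_bound}, bias_is_zero={bias_is_zero}")
```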


Training: The model is compiled by specifying the optimizer, the loss function, and the metrics to be recorded at each step of the training process. For binary classification, it is common practice to use binary cross-entropy as the loss function. Popular deep learning libraries support several optimization algorithms, including stochastic gradient descent (SGD), RMSprop, and ADAM. Please choose accuracy as the metric during model compilation. Finally, train the artificial neural network by fitting the input data and labels with each of the aforementioned optimizers, using the configurations given in the table below. The neural network should be trained until convergence is achieved.

Deliverables: Please report the training accuracy after the training process is carried out for *every combination* of activation function and optimizer.
Plot the loss curves to determine the number of epochs required to achieve convergence.
Report the hyperparameter tuning step.
Predict and report the binary classification results for the data point [0.8, 0.2] with the trained artificial neural network.
Discuss the influence of particular parameters on different optimizers.
It is recommended that the final results be reported in a tabular format as shown below. Please also make sure to submit your working code files along with the final results.

<table>
<thead><tr><th>Optimizer</th><th>Activation function</th><th>Required epochs</th><th>Training accuracy (%)</th><th>Prediction for [0.8, 0.2]</th></tr></thead>
<tbody>
<tr><td rowspan="3">SGD<br>(Learning rate = 0.01, Momentum = [0.0, 0.1, 0.5, 0.9], discuss the impact of momentum values on the convergence behavior of the SGD optimizer)</td><td>ReLU</td><td></td><td></td><td></td></tr>
<tr><td>Tanh</td><td></td><td></td><td></td></tr>
<tr><td>Sigmoid</td><td></td><td></td><td></td></tr>
<tr><td rowspan="3">RMSprop<br>(Learning rate = [0.0001, 0.001, 0.01], discuss the effect of learning rates on learning curves, Epsilon = 10^-6)</td><td>ReLU</td><td></td><td></td><td></td></tr>
<tr><td>Tanh</td><td></td><td></td><td></td></tr>
<tr><td>Sigmoid</td><td></td><td></td><td></td></tr>
<tr><td rowspan="3">ADAM<br>(β₁=[0.85, 0.9], β₂=[0.95, 0.99], discuss the functions of the parameters β₁ and β₂)</td><td>ReLU</td><td></td><td></td><td></td></tr>
<tr><td>Tanh</td><td></td><td></td><td></td></tr>
<tr><td>Sigmoid</td><td></td><td></td><td></td></tr>
</tbody>
</table>
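
The optimizer configurations in the table map onto `torch.optim` constructors roughly as sketched below. The helper name `make_optimizer` and its keyword arguments are hypothetical. The table gives no learning rate for ADAM, so this sketch assumes PyTorch's default.

```python
import torch.nn as nn
import torch.optim as optim

def make_optimizer(name, params, **cfg):
    """Build one of the three optimizers from the table (hypothetical helper)."""
    if name == "SGD":
        return optim.SGD(params, lr=0.01, momentum=cfg.get("momentum", 0.0))
    if name == "RMSprop":
        return optim.RMSprop(params, lr=cfg["lr"], eps=1e-6)
    if name == "ADAM":
        # Assumption: learning rate left at the PyTorch default, since the table omits it
        return optim.Adam(params, betas=(cfg["beta1"], cfg["beta2"]))
    raise ValueError(f"Optimizer not recognized: {name}")

net = nn.Linear(2, 1)  # placeholder parameters for the demo
sgd = make_optimizer("SGD", net.parameters(), momentum=0.9)
rms = make_optimizer("RMSprop", net.parameters(), lr=0.001)
adam = make_optimizer("ADAM", net.parameters(), beta1=0.85, beta2=0.95)
print(sgd.defaults["momentum"], rms.defaults["eps"], adam.defaults["betas"])
```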


In [7]:
# Useful across models
activation_function_selector = ['relu', 'tanh', 'sigmoid']
# Binary cross-entropy on the sigmoid output probabilities
loss_function = nn.BCELoss()

# Helper function to plot loss curve
def plot_loss(curve, act, opt, lr, mom, eps, beta1, beta2):
    plt.plot(curve)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    if opt == "SGD":
        plt.title(f"Binary Cross Entropy Loss Curve. Activation {act}, Optimizer {opt}, Learning Rate {lr}, Momentum {mom}")
    elif opt == "RMSprop":
        plt.title(f"Binary Cross Entropy Loss Curve. Activation {act}, Optimizer {opt}, Learning Rate {lr}, Epsilon {eps}")
    elif opt == "ADAM":
        plt.title(f"Binary Cross Entropy Loss Curve. Activation {act}, Optimizer {opt}, β_1 {beta1}, β_2 {beta2}")
    else:
        print("Optimizer not recognized")
    # Ensure the save folder exists
    output_dir = "Plot JPGs"
    os.makedirs(output_dir, exist_ok=True)
    # Compose the filename based on optimizer and parameters
    if opt == "SGD":
        filename = f"{output_dir}/LossCurve_{opt}_{act}_LR{lr}_Momentum{mom}.jpg"
    elif opt == "RMSprop":
        filename = f"{output_dir}/LossCurve_{opt}_{act}_LR{lr}_Eps{eps}.jpg"
    elif opt == "ADAM":
        filename = f"{output_dir}/LossCurve_{opt}_{act}_B1{beta1}_B2{beta2}.jpg"
    else:
        filename = f"{output_dir}/LossCurve_{opt}_{act}.jpg"
    plt.savefig(filename)
    plt.close()
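
The training cells call a `train` function that is not defined in the cells shown here. One possible shape for it is sketched below, assuming full-batch updates and a fixed epoch budget. The signature, the `n_epochs` parameter, and the returned loss list are hypothetical, and the demo uses random stand-in data rather than hw1data.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, data, loss_function, optimizer, n_epochs=200):
    """Full-batch training loop (hypothetical helper); returns the loss per epoch."""
    X = torch.tensor(data[:, :2], dtype=torch.float32)
    y = torch.tensor(data[:, 2:3], dtype=torch.float32)
    losses = []
    for _ in range(n_epochs):
        optimizer.zero_grad()              # clear gradients from the previous step
        loss = loss_function(model(X), y)  # forward pass and loss
        loss.backward()                    # backpropagate
        optimizer.step()                   # update weights
        losses.append(loss.item())
    return losses

# Demo on random stand-in data (not hw1data): label is 1 when x0 > x1
rng = np.random.default_rng(0)
points = rng.random((200, 2))
demo = np.column_stack([points, (points[:, 0] > points[:, 1]).astype(float)])

torch.manual_seed(99)
demo_model = nn.Sequential(nn.Linear(2, 5), nn.Tanh(), nn.Linear(5, 1), nn.Sigmoid())
demo_opt = optim.SGD(demo_model.parameters(), lr=0.01, momentum=0.9)
curve = train(demo_model, demo, nn.BCELoss(), demo_opt)
print(f"loss: {curve[0]:.4f} -> {curve[-1]:.4f}")
```

Returning the per-epoch losses keeps the loop independent of plotting, so the same list can be passed to `plot_loss` or to a convergence check.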

In [None]:
# Test training sequence
sgdlr = 0.01
sgdmom = 0.0
testmodel = Q1SequentialModel(2, 5, 1, activation_function_selector[0])
testmodel._init_weights_biases()  # apply Xavier weights and zero biases
sgdoptimizer = optim.SGD(testmodel.parameters(), lr=sgdlr, momentum=sgdmom)
# NOTE: train() must be defined in an earlier cell before this runs
train(testmodel, hw1data, loss_function, sgdoptimizer)
# accuracy
# prediction
# plot
# add line to dataframe
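
The accuracy and prediction placeholders above could be filled along the following lines. This is a sketch assuming a 0.5 decision threshold on the sigmoid output; the helper names and the untrained stand-in model are illustrative only.

```python
import torch
import torch.nn as nn

def training_accuracy(model, X, y):
    """Percentage of points whose thresholded prediction matches the 0/1 label."""
    with torch.no_grad():
        predicted = (model(X) >= 0.5).float()
    return 100.0 * (predicted == y).float().mean().item()

def predict_point(model, point):
    """Return (probability, class) for a single 2-D point."""
    with torch.no_grad():
        prob = model(torch.tensor([point], dtype=torch.float32)).item()
    return prob, int(prob >= 0.5)

# Untrained stand-in model and random data, only to show the call pattern
torch.manual_seed(99)
stand_in = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1), nn.Sigmoid())
X = torch.rand(10, 2)
y = (X[:, :1] > 0.5).float()
acc = training_accuracy(stand_in, X, y)
prob, label = predict_point(stand_in, [0.8, 0.2])
print(f"accuracy={acc:.1f}%, p={prob:.3f}, class={label}")
```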

In [None]:
# SGD Optimizer
# Learning rate = 0.01
# Momentum = [0.0, 0.1, 0.5, 0.9]
# Activation function = [ReLU, Tanh, Sigmoid]
# For each run, report required epochs, training accuracy, and prediction for [0.8, 0.2]

# Define model settings 
sgd_learning_rate = 0.01
sgd_momentum_selector = [0.0, 0.1, 0.5, 0.9]


# Training loops: one fresh model per (activation, momentum) combination
for act in activation_function_selector:
    for mom in sgd_momentum_selector:

        # Initialize model, weights, biases (re-initialized so runs are comparable)
        model = Q1SequentialModel(2, 5, 1, act)
        model._init_weights_biases()

        # Optimize using SGD with learning rate and momentum parameters
        sgd = optim.SGD(model.parameters(), lr=sgd_learning_rate, momentum=mom)
        # NOTE: train() must be defined in an earlier cell before this runs
        train(model, hw1data, loss_function, sgd)

        # Save plot as jpg

        # Training Accuracy

        # Predict classification for [0.8, 0.2]

        # Save results to dataframe








In [None]:
# RMSprop Optimizer 
# Learning rate = [0.0001, 0.001, 0.01]
# Epsilon = 10^-6
# Activation function = [ReLU, Tanh, Sigmoid]
# For each run, report required epochs, training accuracy, and prediction for [0.8, 0.2]

# Define model settings
rms_learning_rate_selector = [0.0001, 0.001, 0.01]
epsilon = 1e-6  # 10^-6 (note that 10e-6 evaluates to 1e-5)


# Save results to dataframe


In [None]:
# ADAM Optimizer
# beta_1 = [0.85, 0.9] 
# beta_2 = [0.95, 0.99] 
# Activation function = [ReLU, Tanh, Sigmoid]
# For each run, report required epochs, training accuracy, and prediction for [0.8, 0.2]

# Define model settings
beta_1 = [0.85, 0.9] 
beta_2 = [0.95, 0.99] 
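
The ADAM runs form a grid over activation functions and beta values. The sketch below enumerates that grid with `itertools.product`, assuming every pairing of β₁ and β₂ is meant to be tried; the name `adam_grid` is illustrative.

```python
from itertools import product

activations = ['relu', 'tanh', 'sigmoid']
beta_1 = [0.85, 0.9]
beta_2 = [0.95, 0.99]

# Every (activation, beta_1, beta_2) combination: 3 * 2 * 2 = 12 runs
adam_grid = list(product(activations, beta_1, beta_2))
for act, b1, b2 in adam_grid:
    print(f"activation={act}, betas=({b1}, {b2})")
```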


# Save results to dataframe

In [None]:
# Plot Loss Curves
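
One way to read the "required epochs" off a loss curve is to take the first epoch at which the loss has improved by less than a small tolerance over a trailing window. The sketch below is one such criterion. The function name and the `tol` and `window` values are assumptions to be tuned, not values from the assignment.

```python
def epochs_to_converge(losses, tol=1e-4, window=10):
    """First epoch at which the loss improved by less than tol over the last `window` epochs.

    Returns len(losses) if the criterion is never met.
    """
    for epoch in range(window, len(losses)):
        if losses[epoch - window] - losses[epoch] < tol:
            return epoch
    return len(losses)

# Synthetic curve: exponential decay toward a plateau at 0.2
curve = [0.2 + 0.5 * (0.9 ** t) for t in range(200)]
required = epochs_to_converge(curve)
print("required epochs:", required)
```

A looser `tol` or a shorter `window` reports convergence earlier, so the same settings should be used across all optimizer runs being compared.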



In [None]:
# Predict and Report Results Table

# Combine dataframes and clean up 
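
The per-run results could be collected in a list of rows and turned into a single table with the column headers from the assignment. The sketch below assumes a hypothetical `add_result` helper and uses empty placeholder values only, not real results.

```python
import pandas as pd

columns = ["Optimizer", "Activation function", "Required epochs",
           "Training accuracy (%)", "Prediction for [0.8, 0.2]"]

def add_result(rows, optimizer, activation, epochs, accuracy, prediction):
    """Append one run to the running list of result rows (hypothetical helper)."""
    rows.append(dict(zip(columns, [optimizer, activation, epochs, accuracy, prediction])))

rows = []
# Placeholder entries to show the layout; the None fields are filled after training
add_result(rows, "SGD (momentum=0.9)", "relu", None, None, None)
add_result(rows, "RMSprop (lr=0.001)", "tanh", None, None, None)

results = pd.DataFrame(rows, columns=columns)
print(results.to_string(index=False))
```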

Discussion Section
