# **CSCE 5218 / CSCE 4930 Deep Learning**

## **The Perceptron Assignment**

## **Question 1**
### Why don't we have the following code snippet instead?
```python
output = predict(weights, instance)
error = instance[-1] - output
```
### **Answer:**
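`predict(weights, instance)` thresholds the sigmoid output at 0.5 and returns a hard label (0 or 1), whereas `sigmoid(in_value)` returns a continuous value in (0, 1). Training needs the continuous value for two reasons. First, the weight update is a gradient-descent step whose gradient includes the sigmoid derivative `out * (1 - out)`; the step function applied by `predict` has zero gradient everywhere except at the threshold, so it gives no usable direction. Second, with `predict` the error could only be -1, 0, or 1, so a correctly classified instance would produce no update at all, even when the sigmoid output is barely on the right side of 0.5. With the sigmoid output, the error stays proportional to how far the activation is from the label, which makes learning smoother.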
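
The difference can be seen on a single instance. The following is a minimal sketch that assumes the same sigmoid and 0.5 threshold as the notebook; the weights and the instance are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical weights and instance, laid out as [bias input, feature, label]
weights = [0.0, 0.4]
instance = [-1, 1, 1]

# zip stops at the shorter list, so the label is not part of the dot product
in_value = sum(w * x for w, x in zip(weights, instance))
out = sigmoid(in_value)            # continuous activation in (0, 1)
hard = 1 if out >= 0.5 else 0      # thresholded output, as predict() would return

error_soft = instance[-1] - out    # nonzero: still moves the weights toward the label
error_hard = instance[-1] - hard   # zero: the hard label already matches, so no update

print(f"sigmoid output: {out:.3f}, soft error: {error_soft:.3f}, hard error: {error_hard}")
```

Here the instance is already classified correctly, so the thresholded error vanishes while the sigmoid-based error still reports how far the activation is from 1.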

## **Question 2**
### Training the Perceptron with different hyperparameters

```python
import math

# Read dataset
def read_data(file_name):
    with open(file_name, 'r') as f:
        data = []
        f.readline()  # Skip header line
        for line in f:
            if '\t' not in line:
                continue  # Skip blank or malformed lines
            instance = list(map(int, line.strip().split('\t')))
            instance = [-1] + instance  # Prepend the constant bias input
            data.append(instance)
    return data

# Perceptron helper functions
def dot_product(array1, array2):
    # zip stops at the shorter list, so the trailing label is ignored
    return sum(a * b for a, b in zip(array1, array2))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def output(weights, instance):
    return sigmoid(dot_product(weights, instance))

def predict(weights, instance):
    return 1 if output(weights, instance) >= 0.5 else 0

def get_accuracy(weights, instances):
    correct = sum(predict(weights, instance) == instance[-1] for instance in instances)
    return correct * 100 / len(instances)

def train_perceptron(instances, lr, epochs):
    # One weight per input (bias + features); the label gets no weight
    weights = [0] * (len(instances[0]) - 1)
    for _ in range(epochs):
        for instance in instances:
            in_value = dot_product(weights, instance)
            out = sigmoid(in_value)
            error = instance[-1] - out
            for i in range(len(weights)):
                # out * (1 - out) is the derivative of the sigmoid
                weights[i] += lr * error * out * (1 - out) * instance[i]
    return weights

# Load dataset
train_data = read_data("train.txt")
test_data = read_data("test_small.txt")

# Hyperparameters
tr_percent = [5, 10, 25, 50, 75, 100]
num_epochs = [5, 10, 20, 50, 100]
lr_values = [0.005, 0.01, 0.05]

# Train perceptron and store results
results = []
for tr_p in tr_percent:
    tr_size = int(len(train_data) * (tr_p / 100))
    train_subset = train_data[:tr_size]
    for epochs in num_epochs:
        for lr in lr_values:
            weights = train_perceptron(train_subset, lr, epochs)
            accuracy = get_accuracy(weights, test_data)
            results.append(f"# tr: {tr_size}, epochs: {epochs}, learning rate: {lr:.3f}; Accuracy (test, {len(test_data)} instances): {accuracy:.1f}")

# Display results
for res in results:
    print(res)
```
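
The weight update in `train_perceptron` is consistent with gradient descent on a squared-error loss. The notebook does not state the loss explicitly, so the derivation below is a sketch under that assumption, with $t$ the label `instance[-1]`, $o$ the sigmoid output `out`, $x_i$ the $i$-th input `instance[i]`, and $\eta$ the learning rate `lr`.

$$
\begin{aligned}
E &= \tfrac{1}{2}(t - o)^2, \qquad o = \sigma(z), \qquad z = \sum_i w_i x_i \\
\frac{\partial E}{\partial w_i} &= -(t - o)\,\sigma'(z)\,x_i = -(t - o)\,o\,(1 - o)\,x_i \\
w_i &\leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i} = w_i + \eta\,(t - o)\,o\,(1 - o)\,x_i
\end{aligned}
$$

The last line corresponds term by term to `weights[i] += lr * error * out * (1 - out) * instance[i]`.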


## **Question 3**
### Interpretation of results

```python
import re

import pandas as pd
import matplotlib.pyplot as plt

# Parse each result line back into numbers. The pattern mirrors the f-string
# used to build the results and accepts any test-set size.
pattern = re.compile(
    r"# tr: (\d+), epochs: (\d+), learning rate: ([\d.]+); "
    r"Accuracy \(test, \d+ instances\): ([\d.]+)"
)

# Convert results to DataFrame
data = []
for res in results:
    tr_size, epochs, lr, accuracy = pattern.match(res).groups()
    data.append([int(tr_size), int(epochs), float(lr), float(accuracy)])

df = pd.DataFrame(data, columns=["Training Size", "Epochs", "Learning Rate", "Accuracy"])

# Plot accuracy trends: one line per learning rate, averaged over the epoch
# settings so that each training size contributes a single point
plt.figure(figsize=(10, 6))
for lr in df["Learning Rate"].unique():
    subset = df[df["Learning Rate"] == lr]
    mean_acc = subset.groupby("Training Size")["Accuracy"].mean()
    plt.plot(mean_acc.index, mean_acc.values, marker="o", linestyle="-", label=f"LR={lr}")

plt.xlabel("Training Size")
plt.ylabel("Mean Accuracy (%)")
plt.title("Mean Accuracy vs. Training Size for Different Learning Rates")
plt.legend()
plt.grid(True)
plt.show()
```


### **Analysis and Answers**

- **A. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?**  
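  Not necessarily. Some smaller subsets reach test accuracy similar to the full training set, so the extra instances add little for this test set. Additional instances can also be noisy or unrepresentative of the test data, which may slightly lower test accuracy.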

- **B. Why does training the second run obtain worse accuracy than the first one, even with more data?**  
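  This could be due to a learning rate or epoch count that suits that run poorly, or to overfitting to the training subset. More data does not guarantee better generalization. The test set is also small, so a change in one or two predictions moves the reported accuracy by several points (about 7.1 points per instance for a 14-instance test set).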

- **C. Can you get higher accuracy with additional hyperparameters (higher than 80.0)?**  
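  Yes, potentially. Trying values outside the grid above, such as other learning rates or epoch counts, or using a different activation function might raise accuracy above 80.0, though each candidate has to be checked by rerunning the sweep.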

- **D. Is it always worth training for more epochs?**  
  No. Past some point, additional epochs stop improving test accuracy and can lower it through overfitting to the training data, while still costing training time.
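
Questions A and D can be checked against the sweep by grouping the results and keeping the best accuracy per group. The following is a minimal sketch; `best_by` is a hypothetical helper that is not part of the assignment code, and the example rows are made-up placeholders that only show the expected tuple shape `(tr_size, epochs, lr, accuracy)`.

```python
from collections import defaultdict

def best_by(rows, key_index):
    """Return {key: highest accuracy} for rows of (tr_size, epochs, lr, accuracy)."""
    best = defaultdict(float)
    for row in rows:
        key, accuracy = row[key_index], row[3]
        best[key] = max(best[key], accuracy)
    return dict(sorted(best.items()))

# Placeholder rows with made-up numbers, NOT results from the notebook.
# In practice the rows would be built from the hyperparameter sweep.
example_rows = [
    (20, 5, 0.005, 50.0),
    (20, 10, 0.005, 60.0),
    (40, 5, 0.005, 55.0),
]

print(best_by(example_rows, 0))  # best accuracy per training size
print(best_by(example_rows, 1))  # best accuracy per epoch count
```

If the best accuracy per training size stops increasing before the full dataset is used, that supports answer A; the same comparison per epoch count applies to answer D.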