# **CSCE 5218 / CSCE 4930 Deep Learning**

# **The Perceptron** (20 pt)


In [1]:
# Get the datasets
!!/usr/bin/curl --output test.dat https://raw.githubusercontent.com/huangyanann/CSCE5218/main/test_small.txt
!!/usr/bin/curl --output train.dat https://raw.githubusercontent.com/huangyanann/CSCE5218/main/train.txt


['  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current',
 '                                 Dload  Upload   Total   Spent    Left  Speed',
 '',
 '  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0',
 '100 11645  100 11645    0     0  27898      0 --:--:-- --:--:-- --:--:-- 27925']

In [2]:
# Take a peek at the datasets
!head train.dat
!head test.dat

A1	A2	A3	A4	A5	A6	A7	A8	A9	A10	A11	A12	A13	
1	1	0	0	0	0	0	0	1	1	0	0	1	0
0	0	1	1	0	1	1	0	0	0	0	0	1	0
0	1	0	1	1	0	1	0	1	1	1	0	1	1
0	0	1	0	0	1	0	1	0	1	1	1	1	0
0	1	0	0	0	0	0	1	1	1	1	1	1	0
0	1	1	1	0	0	0	1	0	1	1	0	1	1
0	1	1	0	0	0	1	0	0	0	0	0	1	0
0	0	0	1	1	0	1	1	1	0	0	0	1	0
0	0	0	0	0	0	1	0	1	0	1	0	1	0
X1	X2	X3
1	1	1	1
0	0	1	1
0	1	1	0
0	1	1	0
0	1	1	0
0	1	1	0
0	1	1	0
0	1	1	0
1	1	1	1


### Build the Perceptron Model

You will need to complete some of the function definitions below.  DO NOT import any other libraries to complete this.

In [3]:
import math
import itertools
import re


# Corpus reader, all columns but the last one are coordinates;
#   the last column is the label
def read_data(file_name):
    f = open(file_name, 'r')

    data = []
    # Discard header line
    f.readline()
    for instance in f.readlines():
        if not re.search('\t', instance): continue
        instance = list(map(int, instance.strip().split('\t')))
        # Add a dummy input so that w0 becomes the bias
        instance = [-1] + instance
        data += [instance]
    return data


def dot_product(array1, array2):
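    #TODO: Return dot product of array 1 and array 2
    # zip() stops at the shorter list, so when array1 is the weight vector
    # the trailing label in the instance is ignored
    total = 0
    for a, b in zip(array1, array2):
        total += a * b
    return total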


def sigmoid(x):
    #TODO: Return output of sigmoid function on x
    return 1 / (1 + math.exp(-x))

# The output of the model, which for the perceptron is
# the sigmoid function applied to the dot product of
# the instance and the weights
def output(weight, instance):
    #TODO: return the output of the model
    return sigmoid(dot_product(weight, instance))

# Predict the label of an instance; this is the definition of the perceptron
# you should output 1 if the output is >= 0.5 else output 0
def predict(weights, instance):
    #TODO: return the prediction of the model
    return 1 if output(weights, instance) >= 0.5 else 0


# Accuracy = percent of correct predictions
def get_accuracy(weights, instances):
    # You do not need to write code like this, but get used to it
    correct = sum([1 if predict(weights, instance) == instance[-1] else 0
                   for instance in instances])
    return correct * 100 / len(instances)


# Train a perceptron with instances and hyperparameters:
#       lr (learning rate)
#       epochs
# The implementation comes from the definition of the perceptron
#
# Training consists of fitting the parameters, which are the weights;
# that's the only thing training is responsible for fitting
# (recall that w0 is the bias, and w1..wn are the weights for each coordinate)
#
# Hyperparameters (lr and epochs) are given to the training algorithm
# We are updating weights in the opposite direction of the gradient of the error,
# so with a "decent" lr each update should reduce the error on the current instance
# (the error over the whole dataset is not guaranteed to drop at every step).
def train_perceptron(instances, lr, epochs):

    #Initialising weights
    weights = [0] * (len(instances[0])-1)

    for _ in range(epochs):
        for instance in instances:
            #Error calculation by comparing output vs. true value
            in_value = dot_product(weights, instance)
            output = sigmoid(in_value)
            error = instance[-1] - output
            #Updating weights by minimizing error
            for i in range(0, len(weights)):
                weights[i] += lr * error * output * (1-output) * instance[i]

    return weights
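
The update inside `train_perceptron` is consistent with gradient descent on a squared-error loss for a single instance. The notebook does not state the loss, so the squared error below is an assumption. Here $y$ stands for `instance[-1]`, $o$ for `output`, $x_i$ for `instance[i]`, and $\eta$ for `lr`:

$$
E = \tfrac{1}{2}(y - o)^2, \qquad o = \sigma(z), \qquad z = \sum_i w_i x_i
$$

$$
\frac{\partial E}{\partial w_i} = -(y - o)\,\sigma'(z)\,x_i = -(y - o)\,o\,(1 - o)\,x_i
$$

$$
w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i} = w_i + \eta\,(y - o)\,o\,(1 - o)\,x_i
$$

The last line matches `weights[i] += lr * error * output * (1-output) * instance[i]`.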

## Run it

In [4]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
lr = 0.005
epochs = 5
weights = train_perceptron(instances_tr, lr, epochs)
accuracy = get_accuracy(weights, instances_te)
print(f"#tr: {len(instances_tr):3}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 400, epochs:   5, learning rate: 0.005; Accuracy (test, 14 instances): 71.4


## Questions

Answer the following questions. Include your implementation and the output for each question.



### Question 1

In `train_perceptron(instances, lr, epochs)`, we have the following code:
```
in_value = dot_product(weights, instance)
output = sigmoid(in_value)
error = instance[-1] - output
```

Why don't we have the following code snippet instead?
```
output = predict(weights, instance)
error = instance[-1] - output
```

#### TODO Add your answer here (text only)
We use `sigmoid(in_value)` instead of `predict(weights, instance)` because training relies on gradient-based updates. The sigmoid is a differentiable activation, so the error `instance[-1] - output` varies smoothly with the weights and the update can follow the gradient. `predict` returns a hard 0 or 1, which is a step function with no useful gradient. In this code the problem is even more direct: the update is scaled by `output * (1-output)`, which is exactly 0 whenever `output` is 0 or 1, so the weights would never change. Using `sigmoid(in_value)` ensures that each update is small and proportional to how wrong the output is, which makes learning effective.
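
A minimal sketch of this point, comparing the per-weight update factor from `train_perceptron` for a sigmoid output and for a thresholded output. The helper name `update_factor` is hypothetical and only used for this illustration:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def update_factor(output, label):
    # Multiplier applied to lr * instance[i] in train_perceptron's update
    return (label - output) * output * (1 - output)

in_value = 0.0  # dot product with the all-zero initial weights
label = 0       # true label of the instance

soft = sigmoid(in_value)         # continuous output, as used in training
hard = 1 if soft >= 0.5 else 0   # thresholded output, as returned by predict

soft_factor = update_factor(soft, label)
hard_factor = update_factor(hard, label)
print(soft_factor, hard_factor)
```

With the thresholded output the factor is zero even though the prediction is wrong, so no weight would ever move from its initial value.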




### Question 2
Train the perceptron with the following hyperparameters and calculate the accuracy with the test dataset.

```
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]              # number of epochs
lr = [0.005, 0.01, 0.05]              # learning rate
```

TODO: Write your code below and include the output at the end of each training loop (NOT AFTER EACH EPOCH)
of your code. The output should look like the following:
```
# tr:  20, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  10, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
[and so on for all the combinations]
```
You will get different results with different hyperparameters.

#### TODO Add your answer here (code and output in the format above)


In [13]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]     # number of epochs
lr_array = [0.005, 0.01, 0.05]        # learning rate

for lr in lr_array:
  for tr_size in tr_percent:
    for epochs in num_epochs:
      size =  round(len(instances_tr)*tr_size/100)
      pre_instances = instances_tr[0:size]
      weights = train_perceptron(pre_instances, lr, epochs)
      accuracy = get_accuracy(weights, instances_te)
      print(f"#tr: {len(pre_instances):0}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
            f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 20, epochs:   5, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  10, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  20, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  50, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 40, epochs:   5, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 40, epochs:  10, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 40, epochs:  20, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 40, epochs:  50, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 40, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 100, epochs:   5, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 100, epochs:  10, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 100, epochs:  20, learning rate: 

### Question 3
Write a couple paragraphs interpreting the results with all the combinations of hyperparameters. Drawing a plot will probably help you make a point. In particular, answer the following:
- A. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?
- B. How do you justify that the second run obtains worse accuracy than the first one (despite the second one using more training data)?
   ```
#tr: 100, epochs:  20, learning rate: 0.050; Accuracy (test, 100 instances): 71.0
#tr: 200, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
```
- C. Can you get higher accuracy with additional hyperparameters (higher than `80.0`)?
- D. Is it always worth training for more epochs (while keeping all other hyperparameters fixed)?

#### TODO: Add your answer here (code and text)
**A.** No, we do not need the whole training set to get the best accuracy. For instance, 85.7% accuracy was achieved using only 5% of the training set (tr = 20, epochs = 100, lr = 0.005), so training on 100% of the dataset is not required for the best test performance. This shows that a model can generalize from a portion of the dataset when the hyperparameters are chosen appropriately.
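
One way to check this kind of claim without reading the log by eye is to parse the printed lines and look up the best configuration. The sketch below is only an illustration: `parse_results` is a hypothetical helper, and the sample text is four lines copied from the Question 2 output rather than the full log.

```python
import re

# Matches the result lines printed by the Question 2 loop
LINE = re.compile(
    r"#tr:\s*(\d+), epochs:\s*(\d+), learning rate: ([\d.]+); "
    r"Accuracy \(test, (\d+) instances\): ([\d.]+)"
)

def parse_results(text):
    # Turn each printed line into a dict of hyperparameters and accuracy
    return [
        {"tr": int(m.group(1)), "epochs": int(m.group(2)),
         "lr": float(m.group(3)), "acc": float(m.group(5))}
        for m in LINE.finditer(text)
    ]

sample = """\
#tr: 20, epochs:  50, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 40, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 100, epochs:  10, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
"""

rows = parse_results(sample)
best = max(rows, key=lambda r: r["acc"])
# Smallest training size that reaches the best accuracy in the sample
smallest_tr = min(r["tr"] for r in rows if r["acc"] == best["acc"])
print(best, smallest_tr)
```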


**B.** In the instance given above, the second run was trained on twice the data but got lower accuracy. The two runs also differ in learning rate, so training size is not the only thing that changed. There are two likely reasons:

- Learning rate effect: the first model used a higher learning rate (0.05) than the second (0.005). A smaller learning rate produces smaller weight changes, which can lead to slow convergence or underfitting if the model cannot learn enough within the given number of epochs.

- Data quality and variance: the additional training data in the second run could include more noise or conflicting examples, making it harder for the model to generalize and slightly lowering accuracy.
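
To illustrate the learning-rate point, the sketch below applies a single update from all-zero weights, mirroring the inner loop of `train_perceptron`, for the two learning rates in the example. `first_update` is a hypothetical helper, and the instance is the first row shown by `head train.dat` with the `-1` bias input prepended.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def first_update(instance, lr):
    # One update starting from all-zero weights (label is the last element)
    weights = [0.0] * (len(instance) - 1)
    output = sigmoid(sum(w * x for w, x in zip(weights, instance)))
    error = instance[-1] - output
    return [w + lr * error * output * (1 - output) * x
            for w, x in zip(weights, instance)]

# First training row from the notebook, with the bias input -1 in front
instance = [-1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]

step_fast = first_update(instance, lr=0.05)
step_slow = first_update(instance, lr=0.005)
print(step_fast[0], step_slow[0])
```

The bias weight moves ten times further with `lr = 0.05` than with `lr = 0.005`, so for the same 20 epochs the lower learning rate covers much less ground.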



In [11]:
# Additional hyperparameters
batch_size = [32, 64]  # Batch sizes to explore

# Iterate over the grid with batch size added as an extra dimension.
# NOTE: train_perceptron() takes no batch argument, so `batch` only labels
# the printed line; both batch sizes train identically.
for lr in lr_array:
    for tr_size in tr_percent:
        for epochs in num_epochs:
            for batch in batch_size:
                size = round(len(instances_tr) * tr_size / 100)
                pre_instances = instances_tr[0:size]
                weights = train_perceptron(pre_instances, lr, epochs)
                accuracy = get_accuracy(weights, instances_te)
                print(f"#tr: {len(pre_instances):0}, epochs: {epochs:3}, "
                      f"learning rate: {lr:.3f}, batch: {batch}; "
                      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 20, epochs:   5, learning rate: 0.005, batch: 32; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:   5, learning rate: 0.005, batch: 64; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  10, learning rate: 0.005, batch: 32; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  10, learning rate: 0.005, batch: 64; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  20, learning rate: 0.005, batch: 32; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  20, learning rate: 0.005, batch: 64; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  50, learning rate: 0.005, batch: 32; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs:  50, learning rate: 0.005, batch: 64; Accuracy (test, 14 instances): 71.4
#tr: 20, epochs: 100, learning rate: 0.005, batch: 32; Accuracy (test, 14 instances): 85.7
#tr: 20, epochs: 100, learning rate: 0.005, batch: 64; Accuracy (test, 14 instances): 85.7
#tr: 40, epochs:   5, learning rate: 0.005, batch: 32; Accuracy (test, 14 instances): 71.4

**C.** Yes. Question 2 already reached 85.7% accuracy, above the `80.0` threshold, using only the given hyperparameters (tr = 20, epochs = 100, lr = 0.005).

Adding batch size as an extra hyperparameter did not change any result: each pair of runs with batch 32 and batch 64 reports the same accuracy. This is expected here, because `train_perceptron` is called without the batch value, so the batch size never reaches the training loop.

On more complex datasets or other model architectures, additional hyperparameter tuning could improve generalization and performance. In this experiment, the extra hyperparameter produced no improvement beyond 85.7%.
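
For batch size to have any effect, the training function itself would have to use it. The sketch below shows one possible way to do that; `train_perceptron_minibatch` is a hypothetical variant, not part of the assignment code. It sums the per-instance updates over each batch and applies them once per batch, and the four toy instances and `lr=0.5` are made up for illustration.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def dot_product(array1, array2):
    return sum(a * b for a, b in zip(array1, array2))

def train_perceptron_minibatch(instances, lr, epochs, batch_size):
    # Hypothetical variant: accumulate the update over each batch and
    # apply it once, instead of updating after every instance
    weights = [0.0] * (len(instances[0]) - 1)
    for _ in range(epochs):
        for start in range(0, len(instances), batch_size):
            batch = instances[start:start + batch_size]
            grad = [0.0] * len(weights)
            for instance in batch:
                out = sigmoid(dot_product(weights, instance))
                delta = (instance[-1] - out) * out * (1 - out)
                for i in range(len(weights)):
                    grad[i] += delta * instance[i]
            for i in range(len(weights)):
                weights[i] += lr * grad[i]
    return weights

# Toy data (made up): bias input -1, two features, label equal to feature 2
toy = [[-1, 0, 1, 1], [-1, 1, 0, 0], [-1, 1, 1, 1], [-1, 0, 0, 0]]

w_per_instance = train_perceptron_minibatch(toy, lr=0.5, epochs=1, batch_size=1)
w_full_batch = train_perceptron_minibatch(toy, lr=0.5, epochs=1, batch_size=4)
print(w_per_instance, w_full_batch)
```

With `batch_size=1` this performs the same per-instance updates as `train_perceptron`; larger batches give different weights, which is what would make batch size a real hyperparameter.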

In [14]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [100, 200, 250, 300]     # number of epochs
lr_array = [0.005, 0.01, 0.05]        # learning rate

for lr in lr_array:
  for tr_size in tr_percent:
    for epochs in num_epochs:
      size =  round(len(instances_tr)*tr_size/100)
      pre_instances = instances_tr[0:size]
      weights = train_perceptron(pre_instances, lr, epochs)
      accuracy = get_accuracy(weights, instances_te)
      print(f"#tr: {len(pre_instances):0}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
            f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 20, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 20, epochs: 200, learning rate: 0.005; Accuracy (test, 14 instances): 42.9
#tr: 20, epochs: 250, learning rate: 0.005; Accuracy (test, 14 instances): 42.9
#tr: 20, epochs: 300, learning rate: 0.005; Accuracy (test, 14 instances): 42.9
#tr: 40, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 40, epochs: 200, learning rate: 0.005; Accuracy (test, 14 instances): 85.7
#tr: 40, epochs: 250, learning rate: 0.005; Accuracy (test, 14 instances): 28.6
#tr: 40, epochs: 300, learning rate: 0.005; Accuracy (test, 14 instances): 28.6
#tr: 100, epochs: 100, learning rate: 0.005; Accuracy (test, 14 instances): 71.4
#tr: 100, epochs: 200, learning rate: 0.005; Accuracy (test, 14 instances): 28.6
#tr: 100, epochs: 250, learning rate: 0.005; Accuracy (test, 14 instances): 28.6
#tr: 100, epochs: 300, learning rate: 0.005; Accuracy (test, 14 instances): 28.6
#tr: 200, epochs: 100, learning rate

**D.** No, it is not always worth training for more epochs while keeping the other hyperparameters fixed. In the runs above, more epochs did not consistently improve accuracy and in several cases made it worse: with tr = 20 and lr = 0.005, accuracy fell from 85.7% at 100 epochs to 42.9% at 200 epochs, and with tr = 40 it rose to 85.7% at 200 epochs before dropping to 28.6% at 250. Beyond a certain point the model may have converged or started to overfit the training subset, so additional epochs add training cost without improving test accuracy and can even reduce it. Because the test set has only 14 instances, each accuracy figure is also coarse.
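
Because the test set has only 14 instances, every reported accuracy corresponds to a whole number of correct predictions. The short sketch below inverts the formula in `get_accuracy` to recover those counts; `correct_count` is a hypothetical helper introduced for this illustration.

```python
def correct_count(accuracy_percent, n_instances=14):
    # Invert get_accuracy: accuracy = correct * 100 / n_instances
    return round(accuracy_percent * n_instances / 100)

reported = [85.7, 71.4, 42.9, 28.6]  # accuracies printed in the runs above
counts = {acc: correct_count(acc) for acc in reported}

# Accuracy change caused by a single test instance flipping
step = 100 / 14
print(counts, round(step, 1))
```

The drop from 85.7% to 42.9% is therefore a change of six test predictions, and any single flipped prediction moves the accuracy by about 7.1 points, so small differences between runs should be read with caution.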
