# HW1a The Perceptron (20 pt)

In [3]:
import os

# Download train.dat
os.system("curl -O http://huang.eng.unt.edu/CSCE-5218/train.dat")

# Download test.dat
os.system("curl -O http://huang.eng.unt.edu/CSCE-5218/test.dat")

0

# Build the Perceptron Model

You will need to complete some of the function definitions below. DO NOT import any other libraries to complete this.

In [2]:
import math
import itertools
import re


# Corpus reader, all columns but the last one are coordinates;
#   the last column is the label
def read_data(file_name):
    data = []
    # Use a context manager so the file is closed after reading
    with open(file_name, 'r') as f:
        # Discard header line
        f.readline()
        for instance in f:
            # Skip lines without tab-separated fields (e.g. blank lines)
            if not re.search('\t', instance): continue
            instance = list(map(int, instance.strip().split('\t')))
            # Add a dummy input so that w0 becomes the bias
            instance = [-1] + instance
            data.append(instance)
    return data


def dot_product(array1, array2):
    # Computing the dot product of two arrays
    dot_prod = sum([array1[i] * array2[i] for i in range(len(array1))])
    return dot_prod


def sigmoid(x):
    # Computing the output of sigmoid function
    return 1 / (1 + math.exp(-x))


# The output of the model, which for the perceptron is 
# the sigmoid function applied to the dot product of 
# the instance and the weights
def output(weight, instance):
    # Computing the output of the model
    return sigmoid(dot_product(weight, instance))


# Predict the label of an instance; this is the definition of the perceptron
# you should output 1 if the output is >= 0.5 else output 0
def predict(weights, instance):
    # Predict the label of an instance
    return 1 if output(weights, instance) >= 0.5 else 0


# Accuracy = percent of correct predictions
def get_accuracy(weights, instances):
    # You do not need to write code like this, but get used to it
    correct = sum([1 if predict(weights, instance) == instance[-1] else 0
                   for instance in instances])
    return correct * 100 / len(instances)


# Train a perceptron with instances and hyperparameters:
#       lr (learning rate) 
#       epochs
# The implementation comes from the definition of the perceptron
#
# Training consists of fitting the parameters, which are the weights;
# that is the only thing training is responsible for fitting
# (recall that w0 is the bias, and w1..wn are the weights for each coordinate)
#
# Hyperparameters (lr and epochs) are given to the training algorithm
# We are updating weights in the opposite direction of the gradient of the error,
# so with a "decent" lr we expect the error to decrease as training proceeds.
def train_perceptron(instances, lr, epochs):

    # Initialize the weights to zero
    weights = [0] * (len(instances[0])-1)

    for _ in range(epochs):
        for instance in instances:
            # finding the input value to the sigmoid function
            in_value = dot_product(weights, instance)
            # finding the output of the sigmoid function
            output = sigmoid(in_value)
            # finding the error between the predicted output and the actual label
            error = instance[-1] - output
            # Update the weights
            for i in range(0, len(weights)):
                weights[i] += lr * error * output * (1-output) * instance[i]

    return weights
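
The weight update in the inner loop can be derived as follows. This is a sketch that assumes the error being minimized is the squared error $E = \tfrac{1}{2}(y - o)^2$; the notebook does not name the loss, but this choice is consistent with the `output * (1-output)` factor in the code. Here $y$ stands for `instance[-1]`, $x_i$ for `instance[i]`, $o$ for `output`, and $\eta$ for `lr`.

$$
z = \sum_i w_i x_i, \qquad o = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = o\,(1 - o)
$$

$$
\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial z}\,\frac{\partial z}{\partial w_i} = -(y - o)\; o\,(1 - o)\; x_i
$$

$$
w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i} = w_i + \eta\,(y - o)\; o\,(1 - o)\; x_i
$$

The last line matches `weights[i] += lr * error * output * (1-output) * instance[i]` with `error = instance[-1] - output`.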


# Run it

In [4]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
lr = 0.005
epochs = 5
weights = train_perceptron(instances_tr, lr, epochs)
accuracy = get_accuracy(weights, instances_te)
print(f"#tr: {len(instances_tr):3}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 400, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0


### Question 1

In `train_perceptron(instances, lr, epochs)`, we have the following code:
```
in_value = dot_product(weights, instance)
output = sigmoid(in_value)
error = instance[-1] - output
```

Why don't we have the following code snippet instead?
```
output = predict(weights, instance)
error = instance[-1] - output
```

#### TODO Add your answer here (text only)


In the first snippet, `output` is the continuous sigmoid activation, a value strictly between 0 and 1, and the error is the label minus that activation (`instance[-1] - output`). The update in `train_perceptron` multiplies this error by `output * (1-output)`, the derivative of the sigmoid, so every instance whose activation differs from its label nudges the weights by an amount that depends on how far off the activation is.

In the second snippet, `predict` also applies the sigmoid (through `output(weights, instance)`), but it then thresholds the result at 0.5 and returns a hard 0 or 1. That breaks the update in two ways. First, the error can only be -1, 0, or 1, so it no longer says how far the activation is from the label. Second, with `output` equal to 0 or 1, the factor `output * (1-output)` is always 0, so the weights would never move from their initial zeros.

So we use the first snippet because gradient-based training needs the differentiable sigmoid output; the thresholded prediction is only used when measuring accuracy.
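
To make the difference concrete, the sketch below computes the per-weight update term both ways for a single instance. It redefines minimal versions of the notebook's `sigmoid` and `dot_product` so that it runs on its own; the helper `update_terms` and the example instance are made up for illustration and are not part of the assignment code.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def dot_product(a, b):
    # Same convention as the notebook: iterate over the (shorter) weight vector
    return sum(a[i] * b[i] for i in range(len(a)))


def update_terms(weights, instance, lr, thresholded):
    """Return lr * error * out * (1 - out) * x_i for every weight (hypothetical helper)."""
    out = sigmoid(dot_product(weights, instance))
    if thresholded:
        # What predict() would return instead of the raw activation
        out = 1 if out >= 0.5 else 0
    error = instance[-1] - out
    return [lr * error * out * (1 - out) * instance[i] for i in range(len(weights))]


# Made-up instance: dummy bias input -1, two features, label 0
instance = [-1, 1, 0, 0]
weights = [0, 0, 0]

smooth = update_terms(weights, instance, lr=0.5, thresholded=False)
hard = update_terms(weights, instance, lr=0.5, thresholded=True)

print("sigmoid output:", smooth)
print("thresholded   :", hard)
```

With zero weights the activation is 0.5, so the thresholded prediction is 1 while the label is 0. The instance is misclassified, yet the thresholded update is zero for every weight because `out * (1 - out)` vanishes, whereas the sigmoid-based update is non-zero wherever the input is non-zero.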



### Question 2
Train the perceptron with the following hyperparameters and calculate the accuracy with the test dataset.

```
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]              # number of epochs
lr = [0.005, 0.01, 0.05]              # learning rate
```

TODO: Write your code below and include the output at the end of each training loop (NOT AFTER EACH EPOCH)
of your code. The output should look like the following:
```
# tr:  20, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  10, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
[and so on for all the combinations]
```
You will get different results with different hyperparameters.

#### TODO Add your answer here (code and output in the format above) 


In [8]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100]
num_epochs = [5, 10, 20, 50, 100]
lr = [0.005, 0.01, 0.05]

for tr in tr_percent:
    # Train on the first tr percent of the training instances
    num_instances = int(len(instances_tr) * tr / 100)
    instances = instances_tr[:num_instances]
    for learning_rate in lr:
        for epoch in num_epochs:
            weights = train_perceptron(instances, learning_rate, epoch)
            # Reuse the helper defined above instead of counting by hand
            accuracy = get_accuracy(weights, instances_te)
            print("tr: %3d, epochs: %3d, learning rate: %.3f; Accuracy (test, %d instances): %.1f"
                  % (tr, epoch, learning_rate, len(instances_te), accuracy))

tr:   5, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  10, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  50, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
tr:   5, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:   5, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  10, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  20, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  50, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
tr:   5, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:   5, learning rate: 0.050; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  10, learning rate: 0.050; Accuracy (test, 100 instances): 68.0
tr:   5, epochs:  20, learni

In [9]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]     # number of epochs
lr_array = [0.005, 0.01, 0.05]        # learning rate

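for lr in lr_array:
    for tr_size in tr_percent:
        # The training subset depends only on tr_size, so build it once per size
        size = round(len(instances_tr) * tr_size / 100)
        pre_instances = instances_tr[0:size]
        for epochs in num_epochs:
            weights = train_perceptron(pre_instances, lr, epochs)
            accuracy = get_accuracy(weights, instances_te)
        # This print sits after the epochs loop, so it reports one line per
        # (lr, tr_size) pair, for the last value in num_epochs (100)
        print(f"#tr: {len(pre_instances)}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
              f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")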


#tr: 20, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
#tr: 40, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
#tr: 100, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
#tr: 200, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 74.0
#tr: 300, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 78.0
#tr: 400, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 77.0
#tr: 20, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
#tr: 40, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 68.0
#tr: 100, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 71.0
#tr: 200, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 78.0
#tr: 300, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 80.0
#tr: 400, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 80.0
#tr: 20, epochs: 100

### Question 3
Write a couple paragraphs interpreting the results with all the combinations of hyperparameters. Drawing a plot will probably help you make a point. In particular, answer the following:
- A. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?
- B. How do you justify that the second run obtains worse accuracy than the first one (even though the second one uses more training data)?
```
#tr: 100, epochs:  20, learning rate: 0.050; Accuracy (test, 100 instances): 71.0
#tr: 200, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
```
- C. Can you get higher accuracy with additional hyperparameters (higher than `80.0`)?
- D. Is it always worth training for more epochs (while keeping all other hyperparameters fixed)?

#### TODO: Add your answer here (code and text)






3a. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?

No. In the runs above with learning rate 0.010 and 100 epochs, training on 300 instances already reaches 80.0, the same test accuracy as training on all 400. With learning rate 0.005 and 100 epochs, 300 instances give 78.0 while all 400 give 77.0, although a one-point difference on a 100-instance test set is a single instance. Once the training subset is representative of the data, the model can generalize to the unseen test dataset, and adding more training instances does not necessarily improve test accuracy.
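
As a small check of this point, the sketch below looks for the smallest training size that reaches the best accuracy. The numbers are copied from the learning rate 0.010, 100 epochs rows of the output above; the variable names are only illustrative.

```python
# Test accuracy by number of training instances, copied from the
# "learning rate: 0.010", "epochs: 100" rows printed earlier
acc_by_size = {20: 68.0, 40: 68.0, 100: 71.0, 200: 78.0, 300: 80.0, 400: 80.0}

best_acc = max(acc_by_size.values())
# Smallest training size that already achieves the best accuracy
smallest_best = min(size for size, acc in acc_by_size.items() if acc == best_acc)

print(f"best accuracy {best_acc:.1f} first reached with {smallest_best} instances")
# prints: best accuracy 80.0 first reached with 300 instances
```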

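3b. How do you justify that the second run obtains worse accuracy than the first one (even though the second one uses more training data)?

The two runs differ in more than the amount of training data: the first uses a learning rate of 0.050 and the second 0.005, ten times smaller, with the same 20 epochs. With such a small learning rate the weights, which start at zero, have probably not moved far enough in 20 epochs to change the predictions. The 68.0 figure supports this reading: it is the same accuracy reported by every run on the smallest training subsets, which suggests the model is still assigning the same label to every test instance. The second run is therefore most likely undertrained rather than harmed by the extra data. Noise in the added samples or class imbalance could also play a role, but overfitting is an unlikely explanation for a linear model at this stage.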
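
A rough way to compare the two runs is to multiply the number of weight updates by the learning rate. This "step budget" is only a crude proxy introduced here for illustration (the real step size also depends on the error, the sigmoid derivative, and the inputs), but it shows how much less the second run can move the weights.

```python
# The two runs quoted in the question; "budget" is a hypothetical proxy,
# not a quantity computed anywhere in the assignment code
runs = [
    {"n_train": 100, "epochs": 20, "lr": 0.050, "test_acc": 71.0},
    {"n_train": 200, "epochs": 20, "lr": 0.005, "test_acc": 68.0},
]

for run in runs:
    # One weight update per instance per epoch
    updates = run["n_train"] * run["epochs"]
    run["budget"] = updates * run["lr"]
    print(f"{run['n_train']:3} instances x {run['epochs']} epochs x lr {run['lr']:.3f}"
          f" -> {updates} updates, budget {run['budget']:.0f}")
```

Under this proxy the first run has about five times the budget of the second, despite seeing half as many instances.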

3c. Can you get higher accuracy with additional hyperparameters (higher than 80.0)?

It may be possible to achieve accuracy higher than 80.0 with other hyperparameter values, but there is no guarantee. The accuracy of a perceptron depends on several factors, including the quality and size of the training data, the complexity of the problem, the learning rate, and the number of epochs. Among the combinations tried here, the best result is 80.0, reached for example with all 400 training instances, a learning rate of 0.01, and 100 epochs, as the run below shows. Going beyond that would require further experimentation and tuning.

In [11]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
lr = 0.01
epochs = 100
weights = train_perceptron(instances_tr, lr, epochs)
accuracy = get_accuracy(weights, instances_te)
print(f"#tr: {len(instances_tr):3}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 400, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 80.0


3d. Is it always worth training for more epochs (while keeping all other hyperparameters fixed)?

No, it is not always worth training for more epochs while keeping the other hyperparameters fixed. More epochs can improve performance, but the gain is not guaranteed: with the 5% training subset and learning rate 0.005, the test accuracy stays at 68.0 for 5, 10, 20, 50, and 100 epochs. Extra epochs also cost training time and can lead to overfitting, where the model becomes overly specialized to the training data and does worse on new, unseen input. The ideal number of epochs depends on the task, the data, and the other hyperparameters. It is good practice to monitor the error on both the training data and a held-out validation set, and to stop training once the validation error stops improving.
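
The idea of stopping once a held-out set stops improving can be sketched as follows. This is not part of the assignment: the function `train_with_early_stopping`, the `patience` parameter, and the synthetic data are assumptions made for illustration, and the training step simply mirrors the update rule used in `train_perceptron`.

```python
import math
import random


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def activation(weights, instance):
    # zip stops at the shorter list, so the trailing label is ignored
    return sigmoid(sum(w * x for w, x in zip(weights, instance)))


def accuracy(weights, instances):
    correct = sum((1 if activation(weights, inst) >= 0.5 else 0) == inst[-1]
                  for inst in instances)
    return correct * 100 / len(instances)


def train_one_epoch(weights, instances, lr):
    # Same update as in train_perceptron, applied in place for one pass
    for inst in instances:
        out = activation(weights, inst)
        error = inst[-1] - out
        for i in range(len(weights)):
            weights[i] += lr * error * out * (1 - out) * inst[i]


def train_with_early_stopping(train, val, lr, max_epochs, patience):
    weights = [0.0] * (len(train[0]) - 1)
    best_acc, best_weights, best_epoch = accuracy(weights, val), list(weights), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(weights, train, lr)
        acc = accuracy(weights, val)
        if acc > best_acc:
            best_acc, best_weights, best_epoch = acc, list(weights), epoch
        elif epoch - best_epoch >= patience:
            # No improvement on the validation set for `patience` epochs
            break
    return best_weights, best_epoch, best_acc


# Synthetic data (not train.dat): the label is 1 when x1 + x2 > 0
random.seed(0)


def make_instance():
    x1, x2 = random.randint(-5, 5), random.randint(-5, 5)
    return [-1, x1, x2, 1 if x1 + x2 > 0 else 0]


data = [make_instance() for _ in range(200)]
train, val = data[:150], data[150:]

weights, best_epoch, best_acc = train_with_early_stopping(
    train, val, lr=0.05, max_epochs=100, patience=5)
print(f"best validation accuracy {best_acc:.1f} at epoch {best_epoch}")
```

The function returns the weights from the best validation epoch rather than the last one, so extra epochs that do not help are discarded instead of being kept by default.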