# **CSCE 5218 / CSCE 4930 Deep Learning**

# **HW1a The Perceptron** (20 pt)


In [18]:
# Get the datasets
!wget http://huang.eng.unt.edu/CSCE-5218/test.dat
!wget http://huang.eng.unt.edu/CSCE-5218/train.dat


--2023-02-20 17:54:31--  http://huang.eng.unt.edu/CSCE-5218/test.dat
Resolving huang.eng.unt.edu (huang.eng.unt.edu)... 129.120.123.155
Connecting to huang.eng.unt.edu (huang.eng.unt.edu)|129.120.123.155|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2844 (2.8K)
Saving to: ‘test.dat.2’


2023-02-20 17:54:32 (264 MB/s) - ‘test.dat.2’ saved [2844/2844]

--2023-02-20 17:54:32--  http://huang.eng.unt.edu/CSCE-5218/train.dat
Resolving huang.eng.unt.edu (huang.eng.unt.edu)... 129.120.123.155
Connecting to huang.eng.unt.edu (huang.eng.unt.edu)|129.120.123.155|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11244 (11K)
Saving to: ‘train.dat.2’


2023-02-20 17:54:32 (239 MB/s) - ‘train.dat.2’ saved [11244/11244]



In [19]:
# Take a peek at the datasets
!head train.dat
!head test.dat

A1	A2	A3	A4	A5	A6	A7	A8	A9	A10	A11	A12	A13	
1	1	0	0	0	0	0	0	1	1	0	0	1	0
0	0	1	1	0	1	1	0	0	0	0	0	1	0
0	1	0	1	1	0	1	0	1	1	1	0	1	1
0	0	1	0	0	1	0	1	0	1	1	1	1	0
0	1	0	0	0	0	0	1	1	1	1	1	1	0
0	1	1	1	0	0	0	1	0	1	1	0	1	1
0	1	1	0	0	0	1	0	0	0	0	0	1	0
0	0	0	1	1	0	1	1	1	0	0	0	1	0
0	0	0	0	0	0	1	0	1	0	1	0	1	0
A1	A2	A3	A4	A5	A6	A7	A8	A9	A10	A11	A12	A13
1	1	1	1	0	0	1	1	0	0	0	1	1	0
0	0	0	1	0	0	1	1	0	1	0	0	1	0
0	1	1	1	0	1	1	1	1	0	0	0	1	0
0	1	1	0	1	0	1	1	1	0	1	0	1	0
0	1	0	0	0	1	0	1	0	1	0	0	1	0
0	1	1	0	0	1	1	1	1	1	1	0	1	0
0	1	1	1	0	0	1	1	0	0	0	1	1	0
0	1	0	0	1	0	0	1	1	0	1	1	1	0
1	1	1	1	0	0	1	1	0	0	0	0	1	0


### Build the Perceptron Model

You will need to complete some of the function definitions below.  DO NOT import any other libraries to complete this. 

In [20]:
import math
import itertools
import re


# Corpus reader, all columns but the last one are coordinates;
#   the last column is the label
def read_data(file_name):
    f = open(file_name, 'r')

    data = []
    # Discard header line
    f.readline()
    for instance in f.readlines():
        if not re.search('\t', instance): continue
        instance = list(map(int, instance.strip().split('\t')))
        # Add a dummy input so that w0 becomes the bias
        instance = [-1] + instance
        data += [instance]
    return data


def dot_product(array1, array2):
    return sum(x * y for x, y in zip(array1, array2))


def sigmoid(x):
    return 1 * (1 + math.exp( -x))

# The output of the model, which for the perceptron is 
# the sigmoid function applied to the dot product of 
# the instance and the weights
def output(weight, instance):
    linear_combination = dot_product(weight, instance[:-1])
    output = sigmoid(linear_combination)
    return output

# Predict the label of an instance; this is the definition of the perceptron
# you should output 1 if the output is >= 0.5 else output 0
def predict(weights, instance):
    prediction = sigmoid(dot_product(weights, instance))
    if prediction >= 0.5:
        return 1
    else:
        return 0


# Accuracy = percent of correct predictions
def get_accuracy(weights, instances):
    # You do not to write code like this, but get used to it
    correct = sum([1 if predict(weights, instance) == instance[-1] else 0
                   for instance in instances])
    return correct * 100 / len(instances)


# Train a perceptron with instances and hyperparameters:
#       lr (learning rate) 
#       epochs
# The implementation comes from the definition of the perceptron
#
# Training consists on fitting the parameters which are the weights
# that's the only thing training is responsible to fit
# (recall that w0 is the bias, and w1..wn are the weights for each coordinate)
#
# Hyperparameters (lr and epochs) are given to the training algorithm
# We are updating weights in the opposite direction of the gradient of the error,
# so with a "decent" lr we are guaranteed to reduce the error after each iteration.
def train_perceptron(instances, lr, epochs):
    weights = [0] * (len(instances[0])-1)  # Initialize weights to all zeros
    for _ in range(epochs):
        for instance in instances:
            in_value = dot_product(weights, instance[:-1])  # Calculate input value
            output = sigmoid(in_value)  # Apply sigmoid activation function
            error = instance[-1] - output  # Calculate prediction error
            # Update weights according to Perceptron learning rule
            for i in range(0, len(weights)):
                weights[i] += lr * error * output * (1 - output) * instance[i]
    return weights


## Run it

In [21]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
lr = 0.005
epochs = 5
weights = train_perceptron(instances_tr, lr, epochs)
accuracy = get_accuracy(weights, instances_te)
print(f"#tr: {len(instances_tr):3}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 400, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 32.0


## Questions

Answer the following questions. Include your implementation and the output for each question.



### Question 1

In `train_perceptron(instances, lr, epochs)`, we have the follosing code:
```
in_value = dot_product(weights, instance)
output = sigmoid(in_value)
error = instance[-1] - output
```

Why don't we have the following code snippet instead?
```
output = predict(weights, instance)
error = instance[-1] - output
```

#### TODO Add your answer here (text only)




1A)The only variation between the two code fragments is a small one. In the first code snippet, the output is produced using the sigmoid function after the value is calculated as the dot product of the weights and the instance. The error is determined as the difference between the real label (instance [-1]) and the predicted label in the second code snippet, where the output is produced using the predict function (output). These lines of code are meant to figure out the discrepancy between the real label and the anticipated label so that the weights may be adjusted to lessen the mistake. The difference between the actual label and the anticipated label is used to determine the error in both scenarios, which is the same thing. Therefore, the only difference between the two code snippets is whether the prediction is done using the predict function or the sigmoid function. The specific implementation and model's design would determine which one should be used. The application of the sigmoid function is followed by the prediction being generated using the dot product of the weights and the instance in the first code snippet, which is more specifically designed for a perceptron model.

### Question 2
Train the perceptron with the following hyperparameters and calculate the accuracy with the test dataset.

```
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]              # number of epochs
lr = [0.005, 0.01, 0.05]              # learning rate
```

TODO: Write your code below and include the output at the end of each training loop (NOT AFTER EACH EPOCH)
of your code.The output should look like the following:
```
# tr:  20, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  10, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
[and so on for all the combinations]
```
You will get different results with different hyperparameters.

#### TODO Add your answer here (code and output in the format above) 


In [22]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]     # number of epochs
lr_array = [0.005, 0.01, 0.05]        # learning rate

for lr in lr_array:
  for tr_size in tr_percent:
    for epochs in num_epochs:
      size =  round(len(instances_tr)*tr_size/100)
      pre_instances = instances_tr[0:size]
      weights = train_perceptron(pre_instances, lr, epochs)
      accuracy = get_accuracy(weights, instances_te)
    print(f"#tr: {len(pre_instances):0}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
            f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

#tr: 20, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 32.0
#tr: 40, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 32.0
#tr: 100, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 32.0
#tr: 200, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 32.0
#tr: 300, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 32.0
#tr: 400, epochs: 100, learning rate: 0.005; Accuracy (test, 100 instances): 32.0
#tr: 20, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 32.0
#tr: 40, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 32.0
#tr: 100, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 32.0
#tr: 200, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 32.0
#tr: 300, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 32.0
#tr: 400, epochs: 100, learning rate: 0.010; Accuracy (test, 100 instances): 32.0
#tr: 20, epochs: 100

### Question 3
Write a couple paragraphs interpreting the results with all the combinations of hyperparameters. Drawing a plot will probably help you make a point. In particular, answer the following:
- A. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?
- B. How do you justify that training the second run obtains worse accuracy than the first one (despite the second one uses more training data)?
   ```
#tr: 100, epochs:  20, learning rate: 0.050; Accuracy (test, 100 instances): 71.0
#tr: 200, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
```
- C. Can you get higher accuracy with additional hyperparameters (higher than `80.0`)?
- D. Is it always worth training for more epochs (while keeping all other hyperparameters fixed)?

#### TODO: Add your answer here (code and text)



3A) The accuracy of a machine learning model is affected by several factors, including the size of the training dataset, the complexity of the model, the quality of features, and the presence of overfitting or excessive. -adjustment. However, in general, a larger training dataset can lead to higher accuracy on the test dataset, assuming the model is well designed and can generalize well to new data, yet see.
However, increasing the size of the training dataset does not always lead to improved accuracy. In some cases, this can even lead to redundant pages, where the model becomes too complex and performs well on the training data but poorly on the test data. In such cases, other techniques such as regularization or early stopping can be used to solve the redundancy problem. In summary, the relationship between the size of the training dataset and the accuracy of the test dataset is complex and depends on a number of factors. Using only a large training dataset does not guarantee high accuracy, and a well-designed model that can generalize well to the new data is important for achieving high accuracy.

3B) There can be several reasons why a model trained on a larger training dataset (200 instances) can have worse accuracy (68.0%) compared to a model trained on a smaller training dataset (100 instances) with an accuracy of 71.0%. Some of the reasons can be: OVERFITTING: More data increases the likelihood that the model will memorize the training set and underperform on the test set. Overfitting is what is happening here, and it can make tests less accurate.
LEARNING RATE: In comparison to the first run, the second run shows a lower learning rate (0.005). (0.050). The model may converge too slowly and be less efficient at recognizing patterns in the data if the learning rate is lower. A higher learning rate, on the other hand, may lead the model to fluctuate and fail to reach the best outcome. It can be difficult to strike the perfect balance between these two extremes.
NUMBER OF EPOCHS: The model is trained for 20 epochs in each run. To make sure that the model has seen every example in the training data, a higher number of epochs can be required for a bigger training dataset. On the other side, overfitting can also result from a large increase in the number of epochs. In conclusion, a machine learning model's accuracy is impacted by a variety of parameters, and merely enlarging the training dataset won't result in an increase in accuracy. For reaching high accuracy, other factors including a well-designed model, a good learning rate, and an acceptable number of epochs are crucial.

3C) Yes, adding more hyperparameters can lead to improved accuracy, but this is not a given. The quantity and quality of the training dataset, the architecture of the model, the features used, and the values of the hyperparameters are some of the variables that affect how accurate a machine learning model is.
In some situations, adding more hyperparameters might increase the model's accuracy by enabling it to learn more intricate data representations. This raises the possibility of overfitting, a situation in which a model is too sophisticated and performs well on training data but badly on test data.
A mix of methods, such as feature engineering, model selection, hyperparameter tweaking, and regularization, can be utilized to increase accuracy. For instance, increasing accuracy may be achieved by optimizing the hyperparameters using methods like cross-validation, grid search, or random search.
In conclusion, accuracy higher than 80.0% can be attained by tuning the hyperparameters, but it also depends on the quality of the data, the model, and the features chosen.

3D) No, it isn't always beneficial to train for additional epochs. The size and quality of the training dataset, the model architecture used, the features selected, and the values of the hyperparameters, including the number of epochs, all affect how accurate a machine learning model is.
Underfitting, when the model has not seen enough samples to learn the patterns in the data, can occur from training for too few epochs. On the other side, overfitting, which occurs when the model grows too complicated and memorizes the training data instead of generalizing to new, unknown data, may be caused by training for too many epochs.
The size of the training dataset, the model's complexity, the caliber of the features, and the existence of overfitting or underfitting are some of the variables that affect the ideal number of epochs. In general, increasing the number of epochs can reduce training loss, but after a given number of epochs, test accuracy may begin to plateau.
The ideal number of epochs varies on a variety of parameters and preparing for additional epochs is not always worthwhile. The ideal number of epochs for a particular dataset and model may be determined by keeping an eye on the training loss and test accuracy, utilizing methods like early stopping, and monitoring the results.