# **CSCE 5218 / CSCE 4930 Deep Learning**

# **HW1a The Perceptron** (20 pt)


In [40]:
# Get the datasets
train = !curl.exe --output train.dat http://huang.eng.unt.edu/CSCE-5218/train.dat
test = !curl.exe --output test.dat http://huang.eng.unt.edu/CSCE-5218/test.dat


In [43]:
# Take a peek at the datasets


### Build the Perceptron Model

You will need to complete some of the function definitions below.  DO NOT import any other libraries to complete this. 

In [47]:
import math
import itertools
import re


# Corpus reader, all columns but the last one are coordinates;
#   the last column is the label
def read_data(train):
    f = open(train, 'r')

    data = []
    # Discard header line
    f.readline()
    for instance in f.readlines():
        if not re.search('\t', instance): continue
        instance = list(map(int, instance.strip().split('\t')))
        # Add a dummy input so that w0 becomes the bias
        instance = [-1] + instance
        data += [instance]
    return data


def dot_product(array1, array2):
    #TODO: Return dot product of array 1 and array 2
#     if len(array1) != len(array2):
#         raise ValueError("Arrays must have the same length")

    # Calculate dot product
    dot_product_result = sum(x * y for x, y in zip(array1, array2))
    
    return dot_product_result


def sigmoid(x):
    #TODO: Return outpout of sigmoid function on x
    
     return 1 / (1 + np.exp(-x))

# The output of the model, which for the perceptron is 
# the sigmoid function applied to the dot product of 
# the instance and the weights
def output(weight, instance):
    #TODO: return the output of the model 
    dot_product = np.dot(weight, instance)
    return sigmoid(dot_product)

# Predict the label of an instance; this is the definition of the perceptron
# you should output 1 if the output is >= 0.5 else output 0
def predict(weights, instance):
    #TODO: return the prediction of the model
    dot_product = np.dot(weights, instance)
    output = sigmoid(dot_product)
    return 1 if output >= 0.5 else 0


# Accuracy = percent of correct predictions
def get_accuracy(weights, instances):
    # You do not to write code like this, but get used to it
    correct = sum([1 if predict(weights, instances) == instance[-1] else 0
                   for instance in instances])
    return correct * 100 / len(instances)


# Train a perceptron with instances and hyperparameters:
#       lr (learning rate) 
#       epochs
# The implementation comes from the definition of the perceptron
#
# Training consists on fitting the parameters which are the weights
# that's the only thing training is responsible to fit
# (recall that w0 is the bias, and w1..wn are the weights for each coordinate)
#
# Hyperparameters (lr and epochs) are given to the training algorithm
# We are updating weights in the opposite direction of the gradient of the error,
# so with a "decent" lr we are guaranteed to reduce the error after each iteration.
def train_perceptron(instances, lr, epochs):

    #TODO: name this step
    weights = [0] * (len(instances[0])-1)

    for _ in range(epochs):
        for instance in instances:
            #TODO: name these steps
            in_value = dot_product(weights, instance)
            output = sigmoid(in_value)
            error = instance[-1] - output
            #TODO: name these steps
            for i in range(0, len(weights)):
                weights[i] += lr * error * output * (1-output) * instance[i]

    return weights

## Run it

In [58]:
instances_tr = read_data("train.dat")
f = open(instances_tr, 'r')
instances_te = read_data("test.dat")
f = open(instances_te)
lr = 0.005
epochs = 5
weights = train_perceptron(instances_tr, lr, epochs)
accuracy = get_accuracy(weights, instances_te)
print(f"#tr: {len(instances_tr):3}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
      f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

TypeError: unhashable type: 'list'

## Questions

Answer the following questions. Include your implementation and the output for each question.



### Question 1

In `train_perceptron(instances, lr, epochs)`, we have the follosing code:
```
in_value = dot_product(weights, instance)
output = sigmoid(in_value)
error = instance[-1] - output
```

Why don't we have the following code snippet instead?
```
output = predict(weights, instance)
error = instance[-1] - output
```

#### TODO Add your answer here (text only)




In [None]:
Answer: In the context of training a perceptron, the goal is to update the weights based on the error between the predicted output and the actual output. 

The code you provided suggests using the `predict` function to obtain the output and then calculating the error as the difference between the actual output (stored in `instance[-1]`) and the predicted output. However, this approach assumes that the `predict` function returns the actual output of the perceptron (0 or 1), which might not be the case.

Usually, the `predict` function returns the predicted output of the perceptron after passing the dot product of weights and instances through a sigmoid function, which squashes the output to a value between 0 and 1. This value represents the probability that the instance belongs to a certain class. In contrast, the actual output used in error calculation is a binary value (0 or 1).

Therefore, using the output directly from the `predict` function for error calculation might not be appropriate. Instead, you should calculate the output using the dot product of weights and instances, pass it through the sigmoid function to get a probability, and then calculate the error based on the difference between the actual output and this probability.

Here's a more detailed explanation of the code you provided:

1. `in_value = dot_product(weights, instance)`: This computes the dot product of weights and instance features.

2. `output = sigmoid(in_value)`: This computes the output of the perceptron by passing the dot product through the sigmoid function, which squashes the output to a value between 0 and 1.

3. `error = instance[-1] - output`: This calculates the error as the difference between the actual output (`instance[-1]`) and the predicted output (`output`). This error is then used to update the weights during training.

Using the output from the `predict` function directly for error calculation would skip the sigmoid transformation and might lead to incorrect error estimation. Therefore, it's better to use the dot product followed by the sigmoid function to calculate the output for error calculation.

### Question 2
Train the perceptron with the following hyperparameters and calculate the accuracy with the test dataset.

```
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]              # number of epochs
lr = [0.005, 0.01, 0.05]              # learning rate
```

TODO: Write your code below and include the output at the end of each training loop (NOT AFTER EACH EPOCH)
of your code.The output should look like the following:
```
# tr:  20, epochs:   5, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  10, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
# tr:  20, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
[and so on for all the combinations]
```
You will get different results with different hyperparameters.

#### TODO Add your answer here (code and output in the format above) 


In [31]:
import numpy as np

# Define the perceptron functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict(weights, instance):
    dot_product = np.dot(weights, instance)
    output = sigmoid(dot_product)
    return 1 if output >= 0.5 else 0

def train_perceptron(instances, lr, epochs):
    # Initialize weights
    num_features = len(instances[0]) - 1
    weights = np.zeros(num_features)
    
    # Training loop
    for epoch in range(epochs):
        for instance in instances:
            features = instance[:-1]
            target = instance[-1]
            output = predict(weights, features)
            error = target - output
            weights += lr * error * features
    
    return weights

def test_perceptron(weights, test_set):
    num_correct = 0
    for instance in test_set:
        features = instance[:-1]
        target = instance[-1]
        prediction = predict(weights, features)
        if prediction == target:
            num_correct += 1
    accuracy = num_correct / len(test_set)
    return accuracy

# Assuming you have a dataset named 'data' where each row is an instance, 
# and the last column contains the target labels.

# Split the dataset into training and testing sets
np.random.seed(42)  # for reproducibility
np.random.shuffle(data)
num_instances = len(data)
test_set_size = num_instances // 3  # Assuming 1/3 of the data is used for testing

# Test for different hyperparameters
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]      # number of epochs
lr = [0.005, 0.01, 0.05]               # learning rate

for tr_p in tr_percent:
    for epochs in num_epochs:
        for learning_rate in lr:
            # Split data into training and testing sets
            num_train_instances = int(tr_p / 100 * num_instances)
            train_data = data[:num_train_instances]
            test_data = data[num_train_instances:num_train_instances + test_set_size]
            
            # Train the perceptron
            weights = train_perceptron(train_data, learning_rate, epochs)
            
            # Test the perceptron
            accuracy = test_perceptron(weights, test_data)
            
            print(f"Training Percent: {tr_p}%, Epochs: {epochs}, Learning Rate: {learning_rate}, Accuracy: {accuracy}")


NameError: name 'data' is not defined

In [35]:
instances_tr = read_data("train.dat")
instances_te = read_data("test.dat")
tr_percent = [5, 10, 25, 50, 75, 100] # percent of the training dataset to train with
num_epochs = [5, 10, 20, 50, 100]     # number of epochs
lr_array = [0.005, 0.01, 0.05]        # learning rate

for lr in lr_array:
  for tr_size in tr_percent:
    for epochs in num_epochs:
      size =  round(len(instances_tr)*tr_size/100)
      pre_instances = instances_tr[0:size]
      weights = train_perceptron(pre_instances, lr, epochs)
      accuracy = get_accuracy(weights, instances_te)
    print(f"#tr: {len(pre_instances):0}, epochs: {epochs:3}, learning rate: {lr:.3f}; "
            f"Accuracy (test, {len(instances_te)} instances): {accuracy:.1f}")

ValueError: Arrays must be of equal length

### Question 3
Write a couple paragraphs interpreting the results with all the combinations of hyperparameters. Drawing a plot will probably help you make a point. In particular, answer the following:
- A. Do you need to train with all the training dataset to get the highest accuracy with the test dataset?
- B. How do you justify that training the second run obtains worse accuracy than the first one (despite the second one uses more training data)?
   ```
#tr: 100, epochs:  20, learning rate: 0.050; Accuracy (test, 100 instances): 71.0
#tr: 200, epochs:  20, learning rate: 0.005; Accuracy (test, 100 instances): 68.0
```
- C. Can you get higher accuracy with additional hyperparameters (higher than `80.0`)?
- D. Is it always worth training for more epochs (while keeping all other hyperparameters fixed)?

#### TODO: Add your answer here (code and text)

