# Neural Network

The last model we chose was a neural network.

We chose this model because the dataset is composed of several features, and a neural network can capture complex, non-linear relationships between them.

# Libraries

Import matplotlib, csv, numpy, torch, and the scikit-learn helpers, along with the project's own `DataLoader` and `NeuralNetwork` classes.

In [None]:
from DataLoader import DataLoader
from neural_network import NeuralNetwork

from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV

import matplotlib.pyplot as plt
import torch.optim as optim
import torch.nn as nn
import numpy as np
import torch
import csv
import math

from sklearn.metrics import precision_score
from sklearn.metrics import accuracy_score

%matplotlib inline

# set default size of plots
plt.rcParams['figure.figsize'] = (8.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'

torch.manual_seed(0)


%load_ext autoreload
%autoreload 2

### Metric

The metric we will use is `precision_score`, since we are mostly interested in how reliable the model's predictions are.

Precision isolates the quality of the positive predictions made by the model: for each class, it is the ratio of correctly predicted instances to all instances predicted as that class. How many of the actual positives were recovered (recall) is less important for our purposes.
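
To make the definition concrete, here is a minimal sketch that computes per-class precision by hand on a toy example. The helper `per_class_precision` and the toy labels are hypothetical and only illustrate the ratio described above; the notebook itself relies on sklearn's `precision_score`.

```python
def per_class_precision(y_true, y_pred, n_classes):
    """Return TP / (TP + FP) for each class index."""
    precisions = []
    for c in range(n_classes):
        # True labels of every sample that was predicted as class c
        labels_of_predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if not labels_of_predicted_c:
            # No prediction for this class: mirror zero_division=1
            precisions.append(1.0)
            continue
        true_positives = sum(1 for t in labels_of_predicted_c if t == c)
        precisions.append(true_positives / len(labels_of_predicted_c))
    return precisions


# Toy labels (hypothetical, three classes)
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

toy_precisions = per_class_precision(toy_true, toy_pred, n_classes=3)
print([round(p, 3) for p in toy_precisions])  # [0.5, 0.667, 1.0]
```

Class 1 is predicted three times and is correct twice, so its precision is 2/3, regardless of how many true class 1 samples were missed.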

In [None]:
def get_Score(model, X_train, X_test, y_train, y_test, verbose = 1):
    
    # Retrieve the per-class precision of train & test via sklearn's precision_score.
    # precision_score expects (y_true, y_pred); swapping the two would compute recall instead.
    precision_train = precision_score(y_train, model.predict(X_train), average=None, zero_division=1)
    precision_test = precision_score(y_test, model.predict(X_test), average=None, zero_division=1)
    
    #Graph the scores using the graph_Score function if verbose is set to TRUE
    if verbose:
        graph_Scores((precision_train, precision_test))
        
        print(f"Train Avg Precision : {precision_train.mean():.4f}")
        print(f"Test Avg Precision : {precision_test.mean():.4f}")
        
    return precision_train, precision_test

### Graphing & plots
To visualize our data and scores, we will use matplotlib bar and line plots.

In [None]:
def graph_Scores(scores, title='', ranges = None):
    #Determines the width of each bar in the graph
    width = 0.35

    #X-axis positions for the training and test scores
    br1 = range(len(scores[0])) if ranges is None else ranges
    br2 = [x + width for x in br1] 

    #Create a grouped bar chart to visualize the precision scores for the model
    plt.bar(br1, scores[0], width=width, edgecolor='black', label='Train Score')
    plt.bar(br2, scores[1], width=width, edgecolor='black', label='Test Score')
    plt.title(title)
    plt.ylabel('Precision Score')
    plt.xlabel('Class / Genre')

    #Show integers in the x axis accordingly
    plt.xticks(np.arange(0, 11, 1))
    
    plt.legend()
    plt.show()  

## Feature Set up & Splits

### Music Dataset
We will use the music dataset as our dataset. Each instance is a song described by the following features:
- `Artist Name`       - Name of artist
- `Track Name`        - Name of song
- `Popularity`        - a value between 0 and 100, with 100 being the most popular
- `danceability`      - describes how suitable a track is for dancing
- `energy`            - perceptual measure of intensity and activity
- `key`               - The key the track is in
- `loudness`          - The overall loudness of a track in decibels (dB)
- `mode`              - indicates the modality (major or minor) of a track
- `speechiness`       - detects the presence of spoken words in a track
- `acousticness`      - A measure for whether the track is acoustic
- `instrumentalness`  - Predicts whether a track contains no vocals
- `liveness`          - Detects the presence of an audience in the recording
- `valence`           - Describes the musical positiveness conveyed by a track
- `tempo`             - The overall estimated tempo of a track in beats per minute (BPM)
- `duration_inmin/ms` - Duration in ms
- `time_signature`    - Specifies how many beats are in each bar (or measure)

The songs can be divided into 11 different genres. By searching Google for songs that share a class, we were able to determine which genre each of the 11 classes represents.
- `Pop`        - class 0
- `Hip-hop`    - class 1
- `Blues`      - class 2
- `Indian-pop` - class 3
- `Country`    - class 4
- `Rap`        - class 5
- `Rock`       - class 6
- `Ambient`    - class 7
- `Metal`      - class 8
- `R&B`        - class 9
- `Indie`      - class 10
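
For readable plots and printouts, the mapping above can be kept in a small lookup table. This is only a sketch: the names `GENRE_BY_CLASS` and `genre_name` are hypothetical and are not part of the notebook's `DataLoader`.

```python
# Class index -> genre name, copied from the list above
GENRE_BY_CLASS = {
    0: "Pop",
    1: "Hip-hop",
    2: "Blues",
    3: "Indian-pop",
    4: "Country",
    5: "Rap",
    6: "Rock",
    7: "Ambient",
    8: "Metal",
    9: "R&B",
    10: "Indie",
}


def genre_name(class_index):
    """Translate a class index (int or float) into its genre name."""
    return GENRE_BY_CLASS[int(class_index)]


print(genre_name(8))    # Metal
print(genre_name(3.0))  # Indian-pop
```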


### Dataset

In [None]:
df = DataLoader('Dataset 6 - Music Dataset/music.csv', True, True).df['raw']
df

We will make the results reproducible by fixing `random_state` to an arbitrary value of **42**.

In [None]:
random_state = 42
np.random.seed(random_state)
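
As a quick check of what the seed buys us, the sketch below reseeds NumPy twice and confirms that the same shuffle comes out both times. The helper `shuffled_indices` is hypothetical. Note that NumPy's seed only governs `np.random` calls; PyTorch's generator is seeded separately with `torch.manual_seed`.

```python
import numpy as np


def shuffled_indices(n, seed):
    """Reseed NumPy's global generator, then return a permutation of range(n)."""
    np.random.seed(seed)
    return np.random.permutation(n)


first_run = shuffled_indices(10, seed=42)
second_run = shuffled_indices(10, seed=42)

# The same seed gives the same shuffle
print(bool((first_run == second_run).all()))  # True
```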

# 5.0 Setup the features and splits

### Getting batches
This helper splits the data into batches each time we train the network. In `train` mode the rows are shuffled first; in `test` mode the original order is kept.

In [None]:
def get_batch(X, y, batch_size, mode='train'):
    assert mode in ['train', 'test'], "Mode must be 'train' or 'test'."
    
    indices = np.arange(X.shape[0])
    
    if mode == 'train':
        indices = np.random.permutation(indices)  # Use permutation instead of shuffle to avoid a None assignment
    
    X_batches = []
    y_batches = []
    
    for i in range(0, len(indices), batch_size):
        batch_indices = indices[i:i + batch_size]
        X_batch = X[batch_indices]
        y_batch = y[batch_indices]
        
        X_batches.append(X_batch)
        y_batches.append(y_batch)
    
    return X_batches, y_batches
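
Because the slicing step is `batch_size`, the last batch may be smaller than the others. The sketch below reproduces only the slicing arithmetic of `get_batch`; the helper `batch_sizes` and the sample count of 1000 are hypothetical.

```python
import math


def batch_sizes(n_samples, batch_size):
    """Sizes of the consecutive slices produced by stepping through the indices."""
    return [min(batch_size, n_samples - start)
            for start in range(0, n_samples, batch_size)]


sizes = batch_sizes(1000, 128)

print(len(sizes))   # 8 batches, i.e. ceil(1000 / 128)
print(sizes[-1])    # 104 rows in the final, smaller batch
```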


In [None]:
X = df.drop(columns='Class')
y = df['Class']

print(f'X [{X.shape}] : {X.columns}')
print(f'y [{y.shape}]')

### Perform train test split
Set `test_size` to 0.2 and `stratify` to y to ensure proportional sampling.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=random_state)

print(f'X_train {X_train.shape}')
print(f'y_train {y_train.shape}')
print(f'X_test {X_test.shape}')
print(f'y_test {y_test.shape}')

plt.hist(y_train, label='Train')
plt.hist(y_test, label='Test')
plt.ylabel('Samples Taken')
plt.xlabel('Class / Genre')

plt.legend()
plt.show()
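
To illustrate what `stratify=y` does, here is a simplified sketch of a stratified split in plain Python. It is not sklearn's implementation; the helper `stratified_split` and the toy labels are hypothetical. Each class contributes the same fraction of its rows to the test set, so the class proportions match in both splits.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size, seed):
    """Split indices so every class contributes `test_size` of its rows to the test set."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train_indices, test_indices = [], []
    for class_indices in indices_by_class.values():
        rng.shuffle(class_indices)
        n_test = round(len(class_indices) * test_size)
        test_indices.extend(class_indices[:n_test])
        train_indices.extend(class_indices[n_test:])
    return train_indices, test_indices


# Toy, imbalanced labels: 50 / 30 / 20 rows per class
toy_labels = [0] * 50 + [1] * 30 + [2] * 20
toy_train, toy_test = stratified_split(toy_labels, test_size=0.2, seed=42)

toy_train_counts = Counter(toy_labels[i] for i in toy_train)
toy_test_counts = Counter(toy_labels[i] for i in toy_test)
print(dict(toy_train_counts))  # {0: 40, 1: 24, 2: 16}
print(dict(toy_test_counts))   # {0: 10, 1: 6, 2: 4}
```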

Convert the DataFrames to `torch.Tensor` objects, since PyTorch operates on tensors rather than on DataFrames or NumPy arrays.

In [None]:
# Convert DataFrame to NumPy array
X_train = X_train.to_numpy()
y_train = y_train.to_numpy()
X_test = X_test.to_numpy()
y_test = y_test.to_numpy()

# Convert NumPy array into torch.Tensor array
X_train = torch.Tensor(X_train)
y_train = torch.Tensor(y_train)
X_test = torch.Tensor(X_test)
y_test = torch.Tensor(y_test)

# 3.1 Neural Network using ReLU

Instantiation of the Neural Network with the following parameters:
- `Input` - 13
- `Output` - 11
- `Hidden layers` - 2
- `list_hidden` - (50, 100)
- `activation` - relu
- `weight initialization` - default
- `Verbose` - 1
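
To get a feel for the size of this network, we can count its trainable parameters. The sketch assumes that `NeuralNetwork` stacks fully connected layers with biases (13 → 50 → 100 → 11); its internals are not shown here, so treat the number as an estimate under that assumption. The helper `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total


# Input, two hidden layers, output (assumed fully connected)
assumed_layer_sizes = [13, 50, 100, 11]
n_parameters = count_parameters(assumed_layer_sizes)

# 13*50+50 = 700, 50*100+100 = 5100, 100*11+11 = 1111
print(n_parameters)  # 6911
```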

In [None]:
relu_network = NeuralNetwork(13, 11, list_hidden=(50, 100), activation='relu')
relu_network.create_network()
relu_network.init_weights()
relu_network.forward(X_train, verbose = True)

### Getting the predictions


In [None]:
np.random.seed(random_state)
random_indices = np.random.randint(X_train.shape[0], 
                                   size=10)

relu_scores, probabilities = relu_network.forward(X_train[random_indices])

We'll use Adam as the optimizer with the following parameters:
- `params` - the parameters of the network
- `lr` = 0.001
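
To see what `lr` means for Adam, the sketch below works through the very first Adam update for a single scalar parameter in plain Python. It uses the default `betas=(0.9, 0.999)` and `eps=1e-8` of `optim.Adam`; the helper `adam_first_step` is hypothetical and covers only the first step, not the full optimizer.

```python
import math


def adam_first_step(param, grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the parameter after one Adam update starting from zero moments."""
    m = (1 - beta1) * grad        # first moment estimate
    v = (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1)       # bias correction for step t = 1
    v_hat = v / (1 - beta2)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


after_small_grad = adam_first_step(1.0, grad=0.01)
after_large_grad = adam_first_step(1.0, grad=100.0)

# Both parameters move by roughly lr, whatever the gradient's magnitude
print(f"{1.0 - after_small_grad:.6f}")  # 0.001000
print(f"{1.0 - after_large_grad:.6f}")  # 0.001000
```

Since the step is normalized by the gradient's running magnitude, `lr` acts more like a step size per parameter than a multiplier on the raw gradient.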

In [None]:
optimizer = optim.Adam(relu_network.parameters(), 0.001)
target_classes = torch.Tensor(y_train[random_indices]).long()

Then we instantiate an `nn.CrossEntropyLoss()` object to compute the loss.

In [None]:
criterion = nn.CrossEntropyLoss()
loss = criterion(relu_scores, target_classes)
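
`nn.CrossEntropyLoss` takes the raw scores (logits) and applies log-softmax internally. For one sample, the loss is the negative log-probability of the target class. The plain-Python sketch below computes that value; the helper `cross_entropy` is hypothetical and handles a single sample only, whereas the PyTorch criterion averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


# With 11 equally scored classes the loss equals ln(11)
uniform_loss = cross_entropy([0.0] * 11, target=3)
print(round(uniform_loss, 4))  # 2.3979

# A confident, correct prediction gives a loss close to zero
confident_loss = cross_entropy([10.0, 0.0, 0.0], target=0)
print(confident_loss < 0.001)  # True
```

A loss near 2.40 at the start of training is therefore what we would expect from a network that has not yet learned to separate the 11 genres.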

Then we compute the gradients and update the weights of the network.

In [None]:
# Clear gradients
optimizer.zero_grad()

# Backpropagate to compute the gradients
loss.backward()

# Update the weights
optimizer.step()

## Training the network

In [None]:
e = 0
max_epochs = 300
is_converged = False
previous_loss = 0
losses = []
total_correct_predictions = 0
total_samples = 0


# For each epoch
while e < max_epochs and is_converged is not True:
    
    current_epoch_loss = 0
    
    # Separate training set into batches
    X_batch, y_batch = get_batch(X_train, y_train, 128, mode='train')
    
    # For each batch
    for X, y in zip(X_batch, y_batch):
        X = torch.Tensor(X)
        y = torch.Tensor(y).to(torch.long)

        optimizer.zero_grad()

        relu_scores, probabilities = relu_network.forward(X)
        
        loss = criterion(relu_scores, y)
        
        loss.backward()
        
        optimizer.step()
        
        current_epoch_loss += loss.item()
        
        # Get predicted class indices
        _, predicted_classes = torch.max(probabilities, 1)
        
        # Calculate the number of correct predictions
        total_correct_predictions += (predicted_classes == y).sum().item()
        total_samples += y.size(0)
    
    average_loss = current_epoch_loss / len(X_batch)
    losses.append(average_loss)
    
    # Display the average loss per epoch
    print('Epoch:', e + 1, '\tLoss: {:.6f}'.format(average_loss))
    
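    # Compare per-epoch average losses (plain floats) rather than the last batch's loss tensor
    if abs(previous_loss - average_loss) < 0.00000005:
        is_converged = True
    else:
        previous_loss = average_loss
        e += 1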
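
The stopping rule can be looked at in isolation. The sketch below applies the same tolerance (`5e-8`) to a list of per-epoch average losses and reports the epoch at which training would stop. The helper `stopping_epoch` and the simulated losses are hypothetical.

```python
def stopping_epoch(epoch_losses, tolerance=5e-8, max_epochs=300):
    """Return the 1-based epoch at which training stops."""
    previous = None
    for epoch, loss in enumerate(epoch_losses[:max_epochs], start=1):
        # Converged once two consecutive epoch losses are (almost) identical
        if previous is not None and abs(previous - loss) < tolerance:
            return epoch
        previous = loss
    return min(len(epoch_losses), max_epochs)


# Simulated losses that plateau between epochs 3 and 4
simulated_losses = [2.4, 1.9, 1.6, 1.6, 1.5]
print(stopping_epoch(simulated_losses))  # 4

# Without a plateau, training runs through all available epochs
print(stopping_epoch([3.0, 2.0, 1.0]))  # 3
```

With a tolerance this tight, the check rarely triggers on noisy mini-batch losses, so in practice `max_epochs` is usually what ends training.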

Visualizing the loss per training epoch.

In [None]:
x_values = [i for i in range(len(losses))]
y_values = losses

plt.plot(x_values, y_values)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss for each training epoch')

### Accuracy of the Training Data

In [None]:
relu_training_accuracy = total_correct_predictions / total_samples
relu_training_accuracy

### Trying out the trained network on the test data
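Set the network to evaluation mode first. This switches layers such as dropout and batch normalization to their inference behavior. It does not by itself stop weight updates, which only happen when `optimizer.step()` is called.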

In [None]:
relu_network.eval()

Then perform forward propagation on the test data and get the prediction results.

In [None]:
relu_scores, probabilities = relu_network.forward(X_test)
predictions = relu_network.predict(probabilities)
print("Predictions: ", predictions)

Get the accuracy of the network

In [None]:
relu_accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy: ", relu_accuracy)
print("Training Accuracy: ", relu_training_accuracy)

Get the precision of the network.

In [None]:
precision = get_Score(relu_network, X_train, X_test, y_train, y_test, verbose = 0)

# 3.2 ReLU vs. ELU
We do the same steps as the previous network but we change the activation function to `ELU` and then compare which is better. 
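
The two activations differ only for negative inputs: ReLU clips them to zero, while ELU lets them decay smoothly toward `-alpha`. The sketch below uses the standard definitions with `alpha=1.0` (the default of `torch.nn.ELU`) and assumes that `NeuralNetwork` maps `'relu'` and `'elu'` to these standard functions.

```python
import math


def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for value in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={value:+.1f}  relu={relu(value):.4f}  elu={elu(value):.4f}")
```

Because ELU keeps a non-zero output (and gradient) for negative inputs, units cannot become permanently inactive the way ReLU units can.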

Instantiation of the Neural Network with the following parameters:
- `Input` - 13
- `Output` - 11
- `Hidden layers` - 2
- `list_hidden` - (50, 100)
- `activation` - ELU
- `weight initialization` - default
- `Verbose` - 1

In [None]:
elu_network = NeuralNetwork(13, 11, list_hidden=(50, 100), activation='elu')
elu_network.create_network()
elu_network.init_weights()
elu_network.forward(X_train, verbose = True)

In [None]:
np.random.seed(random_state)
random_indices = np.random.randint(X_train.shape[0], 
                                   size=10)

elu_scores, probabilities = elu_network.forward(X_train[random_indices])

In [None]:
optimizer = optim.Adam(elu_network.parameters(), 0.001)
target_classes = torch.Tensor(y_train[random_indices]).long()

In [None]:
criterion = nn.CrossEntropyLoss()
loss = criterion(elu_scores, target_classes)

In [None]:
optimizer.zero_grad()
loss.backward()
optimizer.step()

In [None]:
e = 0
max_epochs = 300
is_converged = False
previous_loss = 0
losses = []
total_correct_predictions = 0
total_samples = 0


# For each epoch
while e < max_epochs and is_converged is not True:
    
    current_epoch_loss = 0
    
    # Separate training set into batches
    X_batch, y_batch = get_batch(X_train, y_train, 128, mode='train')
    
    # For each batch
    for X, y in zip(X_batch, y_batch):
        X = torch.Tensor(X)
        y = torch.Tensor(y).to(torch.long)

        optimizer.zero_grad()

        elu_scores, probabilities = elu_network.forward(X)
        
        loss = criterion(elu_scores, y)
        
        loss.backward()
        
        optimizer.step()
        
        current_epoch_loss += loss.item()
        
        # Get predicted class indices
        _, predicted_classes = torch.max(probabilities, 1)
        
        # Calculate the number of correct predictions
        total_correct_predictions += (predicted_classes == y).sum().item()
        total_samples += y.size(0)
    
    average_loss = current_epoch_loss / len(X_batch)
    losses.append(average_loss)
    
    # Display the average loss per epoch
    print('Epoch:', e + 1, '\tLoss: {:.6f}'.format(average_loss))
    
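    # Compare per-epoch average losses (plain floats) rather than the last batch's loss tensor
    if abs(previous_loss - average_loss) < 0.00000005:
        is_converged = True
    else:
        previous_loss = average_loss
        e += 1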

In [None]:
x_values = [i for i in range(len(losses))]
y_values = losses

plt.plot(x_values, y_values)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss for each training epoch')

In [None]:
elu_training_accuracy = total_correct_predictions / total_samples
elu_training_accuracy

In [None]:
elu_network.eval()

In [None]:
elu_scores,probabilities = elu_network.forward(X_test)
elu_predictions = elu_network.predict(probabilities)
print("Predictions: ", elu_predictions)

In [None]:
elu_accuracy = accuracy_score(y_test,elu_predictions)
print("Accuracy: ", elu_accuracy)

In [None]:
precision = get_Score(elu_network, X_train, X_test, y_train, y_test, verbose = 0)

### Comparison for ReLU and ELU

Here we can see that the accuracy of the network got worse after changing the activation function from `ReLU` to `ELU`. Because of this, we will continue with ReLU as our activation function.

From 52.08% to 51.42%

In [None]:
print("Accuracy ReLU: ", relu_accuracy)
print("Accuracy ELU: ", elu_accuracy)

# 3.3 Neural Network using ReLU with Weight Initialization
To see whether the loss has really reached its lowest value at convergence, we tested whether a different weight initialization could bring it lower. Since ReLU had better accuracy than ELU, we will use it for this section.

Instantiation of the Neural Network with the following parameters:
- `Input` - 13
- `Output` - 11
- `Hidden layers` - 2
- `list_hidden` - (50, 100)
- `activation` - ReLU
- `weight initialization` - xavier
- `Verbose` - 1

In [None]:
xavier_network = NeuralNetwork(13, 11, list_hidden=(50, 100), activation='relu', init_method='xavier')
xavier_network.create_network()
xavier_network.init_weights()
xavier_network.forward(X_train, verbose = True)
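
For reference, Xavier (Glorot) uniform initialization draws each weight from `U(-a, a)` with `a = sqrt(6 / (fan_in + fan_out))`. The sketch below evaluates that bound for the layer shapes used here. How `NeuralNetwork` implements `init_method='xavier'` is not shown, so the uniform variant and the helper `xavier_uniform_bound` are assumptions.

```python
import math


def xavier_uniform_bound(fan_in, fan_out):
    """Bound a of the uniform distribution U(-a, a) used by Xavier initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# Layer shapes of the 13 -> 50 -> 100 -> 11 network
assumed_sizes = [13, 50, 100, 11]
xavier_bounds = [xavier_uniform_bound(fan_in, fan_out)
                 for fan_in, fan_out in zip(assumed_sizes, assumed_sizes[1:])]

for (fan_in, fan_out), bound in zip(zip(assumed_sizes, assumed_sizes[1:]), xavier_bounds):
    print(f"{fan_in:>3} -> {fan_out:<3}  a = {bound:.4f}")
```

The bounds are in the range of 0.2 to 0.3, which is far larger than a standard deviation of 0.01, so the Xavier network starts training from noticeably larger weights.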

In [None]:
np.random.seed(random_state)
random_indices = np.random.randint(X_train.shape[0], 
                                   size=10)

xavier_scores, probabilities = xavier_network.forward(X_train[random_indices])

In [None]:
optimizer = optim.Adam(xavier_network.parameters(), 0.001)
target_classes = torch.Tensor(y_train[random_indices]).long()

In [None]:
criterion = nn.CrossEntropyLoss()
loss = criterion(xavier_scores, target_classes)

In [None]:
optimizer.zero_grad()
loss.backward()
optimizer.step()

In [None]:
e = 0
max_epochs = 300
is_converged = False
previous_loss = 0
losses = []
total_correct_predictions = 0
total_samples = 0


# For each epoch
while e < max_epochs and is_converged is not True:
    
    current_epoch_loss = 0
    
    # Separate training set into batches
    X_batch, y_batch = get_batch(X_train, y_train, 128, mode='train')
    
    # For each batch
    for X, y in zip(X_batch, y_batch):
        X = torch.Tensor(X)
        y = torch.Tensor(y).to(torch.long)

        optimizer.zero_grad()

        xavier_scores, probabilities = xavier_network.forward(X)
        
        loss = criterion(xavier_scores, y)
        
        loss.backward()
        
        optimizer.step()
        
        current_epoch_loss += loss.item()
        
        # Get predicted class indices
        _, predicted_classes = torch.max(probabilities, 1)
        
        # Calculate the number of correct predictions
        total_correct_predictions += (predicted_classes == y).sum().item()
        total_samples += y.size(0)
    
    average_loss = current_epoch_loss / len(X_batch)
    losses.append(average_loss)
    
    # Display the average loss per epoch
    print('Epoch:', e + 1, '\tLoss: {:.6f}'.format(average_loss))
    
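    # Compare per-epoch average losses (plain floats) rather than the last batch's loss tensor
    if abs(previous_loss - average_loss) < 0.00000005:
        is_converged = True
    else:
        previous_loss = average_loss
        e += 1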

In [None]:
x_values = [i for i in range(len(losses))]
y_values = losses

plt.plot(x_values, y_values)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss for each training epoch')

In [None]:
xavier_training_accuracy = total_correct_predictions / total_samples
xavier_training_accuracy

In [None]:
xavier_network.eval()

In [None]:
xavier_scores, probabilities = xavier_network.forward(X_test)
xavier_predictions = xavier_network.predict(probabilities)
print("Predictions: ", xavier_predictions)

In [None]:
xavier_accuracy = accuracy_score(y_test,xavier_predictions)
print("Accuracy: ", xavier_accuracy)

In [None]:
precision = get_Score(xavier_network, X_train, X_test, y_train, y_test, verbose = 0)

### Comparison for ReLU, ELU, and Xavier + ReLU
Here we can see that with Xavier weight initialization the accuracy of the predictions got worse compared to the default weight initialization. Because of this, we will use the default weight initialization, which uses mean=0.0 and std=0.01.

From 52.08% to 51.83%

In [None]:
print('ReLU Accuracy: ', relu_accuracy)
print('ELU Accuracy: ', elu_accuracy)
print('Xavier + ReLU Accuracy: ', xavier_accuracy)

### Additional
We also experimented with another weight initialization technique, `Kaiming` initialization. Its accuracy was about 0.92 percentage points lower than with `Xavier` initialization.

For reference, these were the computed accuracies for Xavier and Kaiming initialization.

`Xavier`: 0.5183333333333333

`Kaiming`: 0.5091666666666667
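
The gap between the two values can be checked directly. The variable names below are hypothetical and simply hold the accuracies reported above.

```python
# Accuracies reported above
xavier_reported = 0.5183333333333333
kaiming_reported = 0.5091666666666667

# Difference expressed in percentage points
difference_pp = (xavier_reported - kaiming_reported) * 100
print(f"{difference_pp:.2f} percentage points")  # 0.92 percentage points
```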

# 3.4 Neural Network with additional tweaking

**Changes:**
Here we adjusted certain parts to try to increase the accuracy of the ReLU network. 
This was done by adjusting the following:
- `epochs during training` - from 300 -> 500
- `batch sizes` - from 128 -> 64

We chose 500 epochs because, when testing different values, any value above 500 gave the same or very similar results. With these diminishing returns in mind, we used 500 epochs, since that is the point where the outputs stop changing significantly.

We decreased the batch size because smaller batches give more frequent, noisier weight updates in each epoch, which can help the network generalize.
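
Together, the two changes increase the number of weight updates considerably. The sketch below compares both configurations, assuming a hypothetical training set of 4800 rows and no early stopping; the helper `total_updates` is hypothetical.

```python
import math


def total_updates(n_samples, batch_size, epochs):
    """Number of optimizer steps when every epoch runs over all batches."""
    return math.ceil(n_samples / batch_size) * epochs


n_train_rows = 4800  # hypothetical training set size

baseline_updates = total_updates(n_train_rows, batch_size=128, epochs=300)
tweaked_updates = total_updates(n_train_rows, batch_size=64, epochs=500)

print(baseline_updates)  # 38 batches * 300 epochs = 11400
print(tweaked_updates)   # 75 batches * 500 epochs = 37500
```

Under these assumptions the tweaked setup performs a bit more than three times as many updates, each computed from half as many samples.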

Instantiation of the Neural Network with the following parameters:
- `Input` - 13
- `Output` - 11
- `Hidden layers` - 2
- `list_hidden` - (50, 100)
- `activation` - relu
- `weight initialization` - default
- `Verbose` - 1

In [None]:
network = NeuralNetwork(13, 11, list_hidden=(50, 100), activation='relu')
network.create_network()
network.init_weights()
network.forward(X_train, verbose = True)

**Note:** When changing the number of hidden layers, we noticed that the network also performed worse. With that in mind, we opted to keep it at two hidden layers with (50, 100) as its values.

### Getting the predictions

In [None]:
np.random.seed(random_state)
random_indices = np.random.randint(X_train.shape[0], 
                                   size=10)

scores, probabilities = network.forward(X_train[random_indices])

In [None]:
optimizer = optim.Adam(network.parameters(), 0.001)
target_classes = torch.Tensor(y_train[random_indices]).long()

In [None]:
criterion = nn.CrossEntropyLoss()
loss = criterion(scores, target_classes)

In [None]:
# Clear gradients
optimizer.zero_grad()

# Backpropagate to compute the gradients
loss.backward()

# Update the weights
optimizer.step()

### Training the network

In [None]:
e = 0
max_epochs = 500  # raised from 300, as described in the changes for this section
is_converged = False
previous_loss = 0
losses = []
total_correct_predictions = 0
total_samples = 0


# For each epoch
while e < max_epochs and is_converged is not True:
    
    current_epoch_loss = 0
    
    # Separate training set into batches (batch size lowered from 128 to 64)
    X_batch, y_batch = get_batch(X_train, y_train, 64, mode='train')
    
    # For each batch
    for X, y in zip(X_batch, y_batch):
        X = torch.Tensor(X)
        y = torch.Tensor(y).to(torch.long)

        optimizer.zero_grad()

        scores, probabilities = network.forward(X)
        
        loss = criterion(scores, y)  # use this network's scores, not the earlier relu_scores
        
        loss.backward()
        
        optimizer.step()
        
        current_epoch_loss += loss.item()
        
        # Get predicted class indices
        _, predicted_classes = torch.max(probabilities, 1)
        
        # Calculate the number of correct predictions
        total_correct_predictions += (predicted_classes == y).sum().item()
        total_samples += y.size(0)
    
    average_loss = current_epoch_loss / len(X_batch)
    losses.append(average_loss)
    
    # Display the average loss per epoch
    print('Epoch:', e + 1, '\tLoss: {:.6f}'.format(average_loss))
    
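    # Compare per-epoch average losses (plain floats) rather than the last batch's loss tensor
    if abs(previous_loss - average_loss) < 0.00000005:
        is_converged = True
    else:
        previous_loss = average_loss
        e += 1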

Visualizing the loss per training epoch.

In [None]:
x_values = [i for i in range(len(losses))]
y_values = losses

plt.plot(x_values, y_values)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss for each training epoch')

In [None]:
training_accuracy = total_correct_predictions / total_samples
training_accuracy

### Trying out the trained network on the test data

In [None]:
network.eval()

In [None]:
scores, probabilities = network.forward(X_test)
predictions = network.predict(probabilities)
print("Predictions: ", predictions)

In [None]:
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: ", accuracy)

In [None]:
precision = get_Score(network, X_train, X_test, y_train, y_test, verbose = 0)

### Previous Network vs Tweaked Network
Here we can see that with the adjusted number of epochs and batch size, the accuracy of the predictions got slightly better compared to the previous ReLU network.

From 52.08% to 52.19%

In [None]:
print('ReLU Accuracy: ', relu_accuracy)
print('ReLU + Adjustments: ', accuracy)

# Summary
In creating these models, we learned the following...
<ol>
    <li>Neural networks offer many ways to improve performance, including optimization techniques, batch size adjustments, and weight initialization.</li>
    <li>To improve the accuracy, extensive testing is needed to find the right values for the hyperparameters.</li>
    <li>The number of hidden layers affects performance and is worth tuning; in our case, moving away from two hidden layers made the network perform worse.</li>
    <li>The activation function plays a major role in the performance of the network, so finding the right one can improve the network's performance.</li>
    <li>Although adjusting the hyperparameters and optimization techniques did improve the network, the accuracy of the model changed very little.</li>
</ol>

In [None]:
print(f'ReLU + Adjustments Training: {training_accuracy:.4f}')
print(f'ReLU + Adjustments Test: {accuracy:.4f}')

print(f'ReLU Training: {relu_training_accuracy:.4f}')
print(f'ReLU Test: {relu_accuracy:.4f}')

print(f'Xavier + ReLU Training: {xavier_training_accuracy:.4f}')
print(f'Xavier + ReLU Test: {xavier_accuracy:.4f}')

print(f'ELU Training: {elu_training_accuracy:.4f}')
print(f'ELU Test: {elu_accuracy:.4f}')

**Best Scores:**

Train Accuracy: ReLU + Adjustments ~52.50%

Test Accuracy: ReLU + Adjustments ~52.08%

**Worst Scores:**

Train Accuracy: ELU ~51.83%

Test Accuracy: ELU ~51.67%

In [None]:
# Read the ver of the dataframe you want to use
# df = DataLoader('Dataset 6 - Music Dataset/music.csv', True, True).df['raw']