# 8. Feed-forward Neural Network with PyTorch

## 1. About Feed-forward Networks

### 1.1 Logistic Regression Transition to Neural Networks

**Logistic Regression Review**
(Same graphic illustrating the 4 major steps of logistic regression: a linear function produces the logits, the logits are passed to the softmax function, softmax generates the probabilities of the target belonging to each class, and the most probable class becomes the label assigned to the target.)


In [1]:
import torch
import torch.nn as nn

In [2]:
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_size, num_classes):
        super(LogisticRegressionModel, self).__init__()
        # Use the constructor arguments instead of the global input_dim/output_dim
        self.linear = nn.Linear(input_size, num_classes)
    
    def forward(self, x):
        out = self.linear(x)
        return out

In [3]:
input_dim = 28*28
output_dim = 10

model = LogisticRegressionModel(input_dim, output_dim)

In [4]:
print(model)

LogisticRegressionModel (
  (linear): Linear (784 -> 10)
)


Printing the model shows that it consists of a single linear layer that takes a vector of size 784 and outputs a vector of size 10 (our digit classes 0-9).

**Logistic Regression Pitfalls**
* Can represent linear functions well
    * $y=2x+3$
    * $y=x_1+x_2$
    * $y=x_1+3x_2+4x_3$
* Can **not** represent **non-linear** functions
    * $y=4x_1+2x_2^2+3x_3^2$
    * $y=x_1x_2$
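
One way to see the difference between the two groups above is a quick numerical check. A linear function with a bias term (an affine function) always satisfies $f(u) + f(v) = f(u+v) + f(0)$. The sketch below tests that identity on one pair of points; the helper name `is_affine_on` and the test points are my own choices for illustration, not something from the course.

```python
def is_affine_on(f, u, v):
    """Check the affine identity f(u) + f(v) == f(u + v) + f(0) for one pair of points."""
    zero = tuple(0.0 for _ in u)
    total = tuple(a + b for a, b in zip(u, v))
    return abs((f(u) + f(v)) - (f(total) + f(zero))) < 1e-9

linear_fn = lambda x: x[0] + 3 * x[1] + 4 * x[2]   # y = x1 + 3*x2 + 4*x3
product_fn = lambda x: x[0] * x[1]                 # y = x1 * x2

u, v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
linear_ok = is_affine_on(linear_fn, u, v)    # identity holds
product_ok = is_affine_on(product_fn, u, v)  # identity fails
print(linear_ok, product_ok)
```

A single failing pair is enough to show that $y=x_1x_2$ cannot be written as a weighted sum plus a bias, which is all a linear layer can compute.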

### 1.2 Introducing a Non-Linear Function

In this section, the instructor starts with a graphic that shows how a feed-forward neural network squeezes a hidden layer (a linear function followed by a non-linearity) in between the input layer and the linear function layer of logistic regression. The layers carried over from logistic regression, i.e. the logits and the softmax, together form the readout layer.
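
To make the idea concrete, here is a minimal plain-Python sketch of that structure: a hidden layer (linear function plus non-linearity) followed by a linear readout. The weights are hand-picked by me so that the network computes XOR, a function no single linear layer can represent; they are an illustrative assumption, not weights from the course. ReLU is used as the non-linearity (see section 1.3).

```python
def relu(z):
    return max(0.0, z)

def tiny_network(x1, x2):
    # Hidden layer: two linear functions, each followed by a non-linearity
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Readout layer: a linear function of the hidden activations
    return 1.0 * h1 - 2.0 * h2

xor_table = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```

Removing the `relu` calls collapses the two layers into one linear function, and the XOR behaviour disappears.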

### 1.3 Non-linear Function In-depth
* Function: takes a number and performs a mathematical operation
* Common Types of Non-linearity:
    * ReLU (Rectified Linear Unit)
    * Sigmoid
    * Tanh
    
**Sigmoid (Logistic)**
* $\sigma(x) = \frac1{1+e^{-x}}$
* Input number $\to$ [0,1]
    * Large negative number $\to$ 0
    * Large positive number $\to$ 1
* Cons:
    1. Activation saturates at 0 or 1 with **gradients $\approx$ 0**
        * No signal to update weights $\to$ **cannot learn**
        * Solution: have to carefully initialize weights to prevent this
    2. Outputs not centered around 0
        * If the output is always positive $\to$ the gradients on the next layer's weights all share one sign (all positive or all negative) $\to$ bad for gradient updates
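
The saturation problem can be checked numerically. This is a small sketch using only the standard library; it relies on the standard identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$, and the function names are my own.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at x = 0 and shrinks towards 0 for large |x|
for x in (-10.0, 0.0, 10.0):
    print(x, round(sigmoid(x), 5), round(sigmoid_grad(x), 5))
```

The gradient is at most 0.25 (at $x=0$) and is close to zero once the input is far from zero, which is why a saturated unit passes almost no signal back to its weights.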
        
**Tanh**
* $\tanh(x) = 2\sigma(2x) - 1$
    * A scaled and shifted sigmoid function (see the section above)
* Input number $\to$ [-1,1]
* Cons:
    1. Activation saturates at -1 or 1 with **gradients $\approx$ 0**
        * No signal to update weights $\to$ **cannot learn**
        * Solution: have to carefully initialize weights to prevent this
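
The relationship between tanh and the sigmoid, $\tanh(x) = 2\sigma(2x) - 1$, is easy to verify numerically. A minimal sketch with the standard library (the function names and sample points are my own choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    # tanh written as a scaled and shifted sigmoid
    return 2.0 * sigmoid(2.0 * x) - 1.0

# Largest disagreement with math.tanh over a few sample points
max_diff = max(abs(math.tanh(x) - tanh_via_sigmoid(x)) for x in (-3.0, -0.5, 0.0, 0.5, 3.0))
print(max_diff)
```

The two agree up to floating-point rounding, so tanh inherits the sigmoid's saturation behaviour but is centered around 0.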
        
**ReLU**
* $f(x)=\max(0,x)$
* Pros:
    1. Accelerates convergence $\to$ train **faster**
    2. **Less computationally expensive operation** compared to Sigmoid/Tanh exponentials
* Cons:
    1. Many ReLUs "die" $\to$ gradients = 0 forever
        * Solution: Careful learning rate choice
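
A plain-Python sketch of ReLU and its gradient shows both the cheap computation and where "dying" comes from. The convention of using a gradient of 0 at exactly $x=0$ is an assumption on my part; the function names are mine as well.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise (0 at x == 0 by convention here)
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
grads = [relu_grad(x) for x in inputs]
print(outputs, grads)
```

If a unit's input is negative for every training example, its gradient is 0 every time, so its weights never change again. That is the "dead" ReLU case, and a learning rate that is too large can push units into it.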

## 2. Building a Feed-forward Neural Network with PyTorch

### Model A: 1 Hidden Layer Feed-forward Neural Network (Sigmoid Activation)

Back to graphic showing the single layer neural network with the readout function using softmax / logistic regression

### Steps
* Step 1: Load Dataset
* Step 2: Make Dataset Iterable
* Step 3: Create Model Class
* Step 4: Instantiate Model Class
* Step 5: Instantiate Loss Class
* Step 6: Instantiate Optimizer Class
* Step 7: Train Model!

### Step 1: Load MNIST Dataset

Images of handwritten digits from 0 to 9

In [5]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [6]:
train_dataset = dsets.MNIST(root="./data",
                            train=True,
                            transform=transforms.ToTensor(), 
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

### Step 2: Make Dataset Iterable

In [7]:
batch_size = 100
n_iters = 3000
num_epochs = int(n_iters / (len(train_dataset) / batch_size))  #Return as type INT

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
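
The `num_epochs` line above is just arithmetic. Here is the same calculation worked through on its own, assuming the standard MNIST training split of 60,000 images (the variable `train_size` is my stand-in for `len(train_dataset)`).

```python
train_size = 60000   # images in the MNIST training split (stand-in for len(train_dataset))
batch_size = 100
n_iters = 3000

# One epoch is one full pass over the training set
iters_per_epoch = train_size / batch_size      # 60000 / 100 = 600 batches per epoch
num_epochs = int(n_iters / iters_per_epoch)    # 3000 / 600 = 5 epochs
print(iters_per_epoch, num_epochs)
```

So 3000 iterations with batches of 100 means the model sees the full training set 5 times.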

### Step 3: Create Model Class

In [9]:
class FeedForwardNeuralNetModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FeedForwardNeuralNetModel, self).__init__()
        #Linear Function (uses the constructor arguments, not the global input_dim/hidden_dim)
        self.fc1 = nn.Linear(input_size, hidden_size)
        #Non-Linearity
        self.sigmoid = nn.Sigmoid()
        #Linear Function (readout)
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        #Linear Function
        out = self.fc1(x)
        #Non-Linearity
        out = self.sigmoid(out)
        #Linear Func. (readout)
        out = self.fc2(out)
        return out

### Step 4: Instantiate Model Class

* **Input** Dimension: **784**
    * Size of image
    * $28\times28 = 784$
* **Output** Dimension: **10**
    * 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
* **Hidden** Dimension: **100**
    * Can be any number
    * Similar terms
        * Number of neurons
        * Number of non-linear activation functions

In [10]:
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedForwardNeuralNetModel(input_dim, hidden_dim, output_dim)

### Step 5: Instantiate Loss Class

* Feed Forward Neural Network: **Cross-Entropy Loss**
    * Logistic Regression: **Cross-Entropy Loss**
    * Linear Regression: **MSE**

In [11]:
criterion = nn.CrossEntropyLoss() # This calculates our softmax automatically, which is why we don't have it in the model def.
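
To see what "calculates our softmax automatically" means, here is a plain-Python sketch of cross-entropy computed directly from logits for a single sample. It mirrors the math (log-softmax followed by the negative log-likelihood of the target class), not PyTorch's actual implementation; the function name and the example logits are my own.

```python
import math

def cross_entropy_from_logits(logits, target):
    # log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Equals -log(softmax(logits)[target])
    return log_sum_exp - logits[target]

# 10 equal logits: the model is guessing uniformly, so the loss is log(10)
uniform_loss = cross_entropy_from_logits([0.0] * 10, target=3)
# A large logit on the correct class gives a loss close to 0
confident_loss = cross_entropy_from_logits([0.0, 0.0, 0.0, 8.0] + [0.0] * 6, target=3)
print(uniform_loss, confident_loss)
```

This is why the model's `forward` returns raw logits: applying softmax inside the model as well would apply it twice.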

### Step 6: Instantiate Optimizer Class

* Simplified Equation
    * $\theta = \theta - \eta \cdot \nabla_\theta$
        * $\theta$: Parameters (our variables)
        * $\eta$: Learning Rate
        * $\nabla_\theta$: Our Parameters' Gradients
* Even Simpler Equation
    * parameters = parameters - learning rate * parameters' gradients
    * **At every iteration, we update our model's parameters**
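
The update rule can be tried on a toy problem without PyTorch. This sketch minimizes the loss $(\theta - 3)^2$, whose gradient is $2(\theta - 3)$; the toy loss and the starting point are my own choices for illustration.

```python
def grad(theta):
    # Gradient of the toy loss (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 0.0
learning_rate = 0.1
history = []
for _ in range(100):
    # parameters = parameters - learning rate * parameters' gradients
    theta = theta - learning_rate * grad(theta)
    history.append(theta)
print(history[0], theta)
```

Each step moves `theta` a fraction of the way towards the minimum at 3. `torch.optim.SGD` applies the same rule to every parameter tensor in the model.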

In [12]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

### Model Parameters In-depth

In [13]:
print(model.parameters())

print(len(list(model.parameters())))

<generator object Module.parameters at 0x1108fed00>
4


In [14]:
# FC 1 Parameters
print(list(model.parameters())[0].size())

# FC 1 Bias Parameters
print(list(model.parameters())[1].size())

# FC 2 Parameters
print(list(model.parameters())[2].size())

# FC 2 Bias Parameters
print(list(model.parameters())[3].size())

torch.Size([100, 784])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])


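The instructor explains the above using another graphic, walking through the matrix math. A batch of 100 images, each flattened to a 784-element vector, goes in as a (100, 784) matrix. The first linear layer maps it to (100, 100), and the readout layer maps that to (100, 10): 10 logits per image, which softmax turns into class probabilities.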
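
The shapes printed above can be reproduced with plain arithmetic. This sketch relies on the fact that `nn.Linear` stores its weight as `(out_features, in_features)` plus a bias of `(out_features,)`; the helper names are hypothetical.

```python
def linear_shapes(in_features, out_features):
    # Weight is (out_features, in_features); bias is (out_features,)
    return (out_features, in_features), (out_features,)

def count(shape):
    n = 1
    for d in shape:
        n *= d
    return n

batch, input_dim, hidden_dim, output_dim = 100, 784, 100, 10
shapes = [*linear_shapes(input_dim, hidden_dim), *linear_shapes(hidden_dim, output_dim)]
total_params = sum(count(s) for s in shapes)   # 78400 + 100 + 1000 + 10

# Activations flowing through the network for one batch
activation_shapes = [(batch, input_dim), (batch, hidden_dim), (batch, output_dim)]
print(shapes, total_params, activation_shapes)
```

The four shapes match the four parameter tensors listed by `model.parameters()`, for 79,510 trainable values in total.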

### Step 7: Train the model
* Process
    1. Convert inputs/labels to variables
    2. Clear gradient buffers
    3. Get outputs given inputs
    4. Get Loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        * parameters = parameters - learning_rate * parameters_gradients
    7. REPEAT

In [15]:
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1,28*28))
        labels = Variable(labels)
        
        #Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        #Forward pass to get outputs / logits
        outputs = model(images)
        
        #Calculate Loss: softmax --> Cross Entropy Loss
        loss = criterion(outputs, labels)
        
        #Get gradients w.r.t. parameters
        loss.backward()
        
        #Update parameters
        optimizer.step()
        
        iter += 1
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0 
            total = 0
            #Iterate through the test dataset
            for images, labels in test_loader:
                # Load images to Torch Variable
                images = Variable(images.view(-1,28*28))
                
                # Forward pass only to get outputs/logits
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
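                # Total correct predictions (.item() turns the 0-dim tensor into a Python int)
                correct += (predicted == labels).sum().item()
                
            accuracy = 100 * correct / total
            
            # Print loss and accuracy (loss.item() replaces loss.data[0], which fails on PyTorch >= 0.5)
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))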

Iteration: 500. Loss: 0.5383222103118896. Accuracy: 85.51
Iteration: 1000. Loss: 0.3562361001968384. Accuracy: 89.39
Iteration: 1500. Loss: 0.2724452316761017. Accuracy: 90.46
Iteration: 2000. Loss: 0.2643623650074005. Accuracy: 91.16
Iteration: 2500. Loss: 0.2329949140548706. Accuracy: 91.52
Iteration: 3000. Loss: 0.3233594000339508. Accuracy: 91.89
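
The accuracy column comes from taking the index of the largest logit for each image and comparing it with the label. A plain-Python sketch of that step, using made-up logits and labels for a batch of 4 samples and 3 classes:

```python
def argmax(row):
    # Index of the largest value in the row
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical logits and labels, for illustration only
logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 3.0, 0.5],
    [1.2, 1.1, 0.9],
]
labels = [0, 2, 1, 1]

predicted = [argmax(row) for row in logits]
correct = sum(int(p == y) for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)
print(predicted, accuracy)
```

`torch.max(outputs.data, 1)` does the same thing across a whole batch: it returns the maximum values and their indices along dimension 1, and the indices are the predicted classes.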


Not too shabby using sigmoid activation. However, as we'll see below, sigmoid is a decent activation function but not the best one. Next, we explore some of the other activation functions and compare their effectiveness.

### Model B: 1 Hidden Layer Feed Forward Neural Network (Tanh Activation)

### Steps

* Step 1: Load Dataset
* Step 2: Make Dataset Iterable
* Step 3: **Create Model Class**
* Step 4: Instantiate Model Class
* Step 5: Instantiate Loss Class
* Step 6: Instantiate Optimizer Class
* Step 7: Train Model!

In [18]:
# Instructor had all the code from the beginning (even imports) recreated. I'm just going to change the model
# definition part and run through to the end from there. I think I understand enough so I don't have to waste 
# time / space rewriting each section, when I can change just the relevant section

'''
Step 3: Create Model Class (tanh)
'''

class FeedForwardNeuralNetModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FeedForwardNeuralNetModel, self).__init__()
        #Linear Function (uses the constructor arguments, not the global input_dim/hidden_dim)
        self.fc1 = nn.Linear(input_size, hidden_size)
        #Non-Linearity
        self.tanh = nn.Tanh()
        #Linear Function (readout)
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        #Linear Function
        out = self.fc1(x)
        #Non-Linearity
        out = self.tanh(out)
        #Linear Func. (readout)
        out = self.fc2(out)
        return out

'''
Step 4: Instantiate Model Class
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedForwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
Step 5: Instantiate Loss Class
'''

criterion = nn.CrossEntropyLoss()

'''
Step 6: Instantiate Optimizer Class
'''

learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
Step 7: Train the tanh model
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1,28*28))
        labels = Variable(labels)
        
        #Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        #Forward pass to get outputs / logits
        outputs = model(images)
        
        #Calculate Loss: softmax --> Cross Entropy Loss
        loss = criterion(outputs, labels)
        
        #Get gradients w.r.t. parameters
        loss.backward()
        
        #Update parameters
        optimizer.step()
        
        iter += 1
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0 
            total = 0
            #Iterate through the test dataset
            for images, labels in test_loader:
                # Load images to Torch Variable
                images = Variable(images.view(-1,28*28))
                
                # Forward pass only to get outputs/logits
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
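                # Total correct predictions (.item() turns the 0-dim tensor into a Python int)
                correct += (predicted == labels).sum().item()
                
            accuracy = 100 * correct / total
            
            # Print loss and accuracy (loss.item() replaces loss.data[0], which fails on PyTorch >= 0.5)
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))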

Iteration: 500. Loss: 0.3427676856517792. Accuracy: 91.35
Iteration: 1000. Loss: 0.2686253786087036. Accuracy: 92.57
Iteration: 1500. Loss: 0.18530148267745972. Accuracy: 93.35
Iteration: 2000. Loss: 0.1924872249364853. Accuracy: 94.05
Iteration: 2500. Loss: 0.2858823537826538. Accuracy: 94.74
Iteration: 3000. Loss: 0.16669431328773499. Accuracy: 95.18


### Model C: 1 Hidden Layer Feed Forward Neural Network (ReLU Activation)

In [19]:
# Instructor had all the code from the beginning (even imports) recreated. I'm just going to change the model
# definition part and run through to the end from there. I think I understand enough so I don't have to waste 
# time / space rewriting each section, when I can change just the relevant section

'''
Step 3: Create Model Class (relu)
'''

class FeedForwardNeuralNetModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FeedForwardNeuralNetModel, self).__init__()
        #Linear Function (uses the constructor arguments, not the global input_dim/hidden_dim)
        self.fc1 = nn.Linear(input_size, hidden_size)
        #Non-Linearity
        self.relu = nn.ReLU()
        #Linear Function (readout)
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        #Linear Function
        out = self.fc1(x)
        #Non-Linearity
        out = self.relu(out)
        #Linear Func. (readout)
        out = self.fc2(out)
        return out

'''
Step 4: Instantiate Model Class
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedForwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
Step 5: Instantiate Loss Class
'''

criterion = nn.CrossEntropyLoss()

'''
Step 6: Instantiate Optimizer Class
'''

learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
Step 7: Train the ReLU model
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1,28*28))
        labels = Variable(labels)
        
        #Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        #Forward pass to get outputs / logits
        outputs = model(images)
        
        #Calculate Loss: softmax --> Cross Entropy Loss
        loss = criterion(outputs, labels)
        
        #Get gradients w.r.t. parameters
        loss.backward()
        
        #Update parameters
        optimizer.step()
        
        iter += 1
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0 
            total = 0
            #Iterate through the test dataset
            for images, labels in test_loader:
                # Load images to Torch Variable
                images = Variable(images.view(-1,28*28))
                
                # Forward pass only to get outputs/logits
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
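                # Total correct predictions (.item() turns the 0-dim tensor into a Python int)
                correct += (predicted == labels).sum().item()
                
            accuracy = 100 * correct / total
            
            # Print loss and accuracy (loss.item() replaces loss.data[0], which fails on PyTorch >= 0.5)
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))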

Iteration: 500. Loss: 0.3681732416152954. Accuracy: 91.41
Iteration: 1000. Loss: 0.2608332931995392. Accuracy: 92.89
Iteration: 1500. Loss: 0.14521712064743042. Accuracy: 94.03
Iteration: 2000. Loss: 0.1494460254907608. Accuracy: 94.79
Iteration: 2500. Loss: 0.13547107577323914. Accuracy: 95.41
Iteration: 3000. Loss: 0.3660094141960144. Accuracy: 95.72


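Nice! ReLU activation seems to train faster, and it also returns the best accuracy so far (95.72% after 3000 iterations, versus 95.18% for tanh and 91.89% for sigmoid). I made quite a bit of progress today. Also, after defining the new model class, it is necessary to re-run each of the subsequent steps (4 through 7) in order; otherwise `model` and `optimizer` still refer to the old network. Once I did that, I was able to recreate the instructor's results.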

### Model D: 2 Hidden Layer Feed Forward Neural Network (ReLU Activation)

We can keep stacking layers with Feed Forward NN's. Now, we're starting to get to the "deep" part of deep learning. Here's an example. Note the difference in step 3, creating the model class.

### Steps

* Step 1: Load Dataset
* Step 2: Make Dataset Iterable
* Step 3: **Create Model Class**
* Step 4: Instantiate Model Class
* Step 5: Instantiate Loss Class
* Step 6: Instantiate Optimizer Class
* Step 7: Train Model!

In [21]:
'''
Step 3: Create Model Class (relu)
'''

class FeedForwardNeuralNetModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FeedForwardNeuralNetModel, self).__init__()
        #Linear Function 1: 784 --> 100
        self.fc1 = nn.Linear(input_size, hidden_size)
        #Non-Linearity 1
        self.relu1 = nn.ReLU()
        
        #Linear Function 2: 100 --> 100
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        #Non-Linearity 2
        self.relu2 = nn.ReLU()
        
        #Linear Function 3: 100 --> 10 (readout)
        self.fc3 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        #Linear Function 1
        out = self.fc1(x)
        #Non-Linearity 1
        out = self.relu1(out)
        
        #Linear Function 2
        out = self.fc2(out)
        #Non-Linearity 2
        out = self.relu2(out)
        
        #Linear Func. (readout)
        out = self.fc3(out)
        return out

'''
Step 4: Instantiate Model Class
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedForwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
Step 5: Instantiate Loss Class
'''

criterion = nn.CrossEntropyLoss()

'''
Step 6: Instantiate Optimizer Class
'''

learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
Step 7: Train the ReLU model
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1,28*28))
        labels = Variable(labels)
        
        #Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        #Forward pass to get outputs / logits
        outputs = model(images)
        
        #Calculate Loss: softmax --> Cross Entropy Loss
        loss = criterion(outputs, labels)
        
        #Get gradients w.r.t. parameters
        loss.backward()
        
        #Update parameters
        optimizer.step()
        
        iter += 1
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0 
            total = 0
            #Iterate through the test dataset
            for images, labels in test_loader:
                # Load images to Torch Variable
                images = Variable(images.view(-1,28*28))
                
                # Forward pass only to get outputs/logits
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
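                # Total correct predictions (.item() turns the 0-dim tensor into a Python int)
                correct += (predicted == labels).sum().item()
                
            accuracy = 100 * correct / total
            
            # Print loss and accuracy (loss.item() replaces loss.data[0], which fails on PyTorch >= 0.5)
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))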

Iteration: 500. Loss: 0.29816877841949463. Accuracy: 91.45
Iteration: 1000. Loss: 0.3057694137096405. Accuracy: 93.86
Iteration: 1500. Loss: 0.12594181299209595. Accuracy: 94.52
Iteration: 2000. Loss: 0.19790372252464294. Accuracy: 95.87
Iteration: 2500. Loss: 0.09370218217372894. Accuracy: 96.53
Iteration: 3000. Loss: 0.056080807000398636. Accuracy: 96.73


### Model E: 3 Hidden Layer Feed Forward Neural Network (ReLU Activation)

Let's stack another layer into our model. This reminds me of the "Sequential" model building in Keras. Very intuitive - the bonus with PyTorch is that you can see everything in front of your face (at least so far). Clean and simple.

This example may show that adding layers doesn't necessarily mean better accuracy. There's a tradeoff to weigh between accuracy and extraneous parameters to train & computation cost/time. 

In [22]:
'''
Step 3: Create Model Class (relu)
'''

class FeedForwardNeuralNetModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FeedForwardNeuralNetModel, self).__init__()
        #Linear Function 1: 784 --> 100
        self.fc1 = nn.Linear(input_size, hidden_size)
        #Non-Linearity 1
        self.relu1 = nn.ReLU()
        
        #Linear Function 2: 100 --> 100
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        #Non-Linearity 2
        self.relu2 = nn.ReLU()
        
        #Linear Function 3: 100 --> 100
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        #Non-Linearity 3
        self.relu3 = nn.ReLU()
        
        #Linear Function 4: 100 --> 10 (readout)
        self.fc4 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        #Linear Function 1
        out = self.fc1(x)
        #Non-Linearity 1
        out = self.relu1(out)
        
        #Linear Function 2
        out = self.fc2(out)
        #Non-Linearity 2
        out = self.relu2(out)
        
        #Linear Function 3
        out = self.fc3(out)
        #Non-Linearity 3
        out = self.relu3(out)
        
        #Linear Func. 4 (readout)
        out = self.fc4(out)
        return out

'''
Step 4: Instantiate Model Class
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedForwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
Step 5: Instantiate Loss Class
'''

criterion = nn.CrossEntropyLoss()

'''
Step 6: Instantiate Optimizer Class
'''

learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
Step 7: Train the ReLU model
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1,28*28))
        labels = Variable(labels)
        
        #Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        #Forward pass to get outputs / logits
        outputs = model(images)
        
        #Calculate Loss: softmax --> Cross Entropy Loss
        loss = criterion(outputs, labels)
        
        #Get gradients w.r.t. parameters
        loss.backward()
        
        #Update parameters
        optimizer.step()
        
        iter += 1
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0 
            total = 0
            #Iterate through the test dataset
            for images, labels in test_loader:
                # Load images to Torch Variable
                images = Variable(images.view(-1,28*28))
                
                # Forward pass only to get outputs/logits
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
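                # Total correct predictions (.item() turns the 0-dim tensor into a Python int)
                correct += (predicted == labels).sum().item()
                
            accuracy = 100 * correct / total
            
            # Print loss and accuracy (loss.item() replaces loss.data[0], which fails on PyTorch >= 0.5)
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))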

Iteration: 500. Loss: 0.27787747979164124. Accuracy: 91.3
Iteration: 1000. Loss: 0.2696903944015503. Accuracy: 94.33
Iteration: 1500. Loss: 0.12281842529773712. Accuracy: 95.68
Iteration: 2000. Loss: 0.05858999863266945. Accuracy: 95.73
Iteration: 2500. Loss: 0.11681151390075684. Accuracy: 96.47
Iteration: 3000. Loss: 0.03441615030169487. Accuracy: 96.99


Notice that adding a third hidden layer barely registered on the overall accuracy of the model (96.99% versus 96.73% for Model D after 3000 iterations). Little extra value came from the extra complication of another layer.

### Summary: Deep Learning
* 2 ways to expand a neural network
    * more non-linear activation units (neurons) i.e. make it wider
    * more hidden layers i.e. make it deeper
* Cons
    * More layers / neurons require more input data, or else benefits diminish quickly
        * "Curse of dimensionality"
    * Does not necessarily mean higher accuracy (see model E)
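
The cost side of "wider" versus "deeper" can be estimated by counting parameters. This sketch uses the layer sizes of the models in these notes; the helper `mlp_param_count` and the 200-unit "wider" variant are hypothetical additions for comparison.

```python
def mlp_param_count(layer_sizes):
    # Each linear layer contributes n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

counts = {
    "1 hidden layer (Models A-C)": mlp_param_count([784, 100, 10]),
    "2 hidden layers (Model D)": mlp_param_count([784, 100, 100, 10]),
    "3 hidden layers (Model E)": mlp_param_count([784, 100, 100, 100, 10]),
    "1 wider hidden layer (200 units, hypothetical)": mlp_param_count([784, 200, 10]),
}
for name, n in counts.items():
    print(name, n)
```

Each extra 100-unit hidden layer adds 10,100 parameters, while doubling the width of the first hidden layer roughly doubles the total, since most of the weights sit between the 784 inputs and the first hidden layer.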