What is a fully connected network (FCN for short)?

A fully connected network is a type of artificial neural network built from layers in which every neuron (node) in one layer is connected to every neuron in the next layer. For example, take a FCN with 3 layers, where layer 1 has 10 neurons, layer 2 has 26 neurons, and layer 3 has 14 neurons. Each neuron in layer 1 has a connection to every single neuron in layer 2, and the same holds for layer 2 and layer 3. You can also work out the number of connections between two layers: it is N1*N2, where N1 is the number of neurons in the layer on the left and N2 is the number of neurons in the layer on the right. In the example, that gives 10*26 = 260 connections between layers 1 and 2 and 26*14 = 364 connections between layers 2 and 3.
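The count is easy to check in code, since each pair of adjacent layers contributes (neurons on the left) times (neurons on the right) connections. This is only a minimal sketch: the helper name `count_connections` is made up for illustration, and it counts weights only, not bias terms.

```python
def count_connections(layer_sizes):
    """Return the number of connections between each pair of adjacent layers."""
    # Each neuron on the left connects to every neuron on the right,
    # so a pair of layers contributes n_left * n_right connections.
    return [n_left * n_right for n_left, n_right in zip(layer_sizes, layer_sizes[1:])]


layer_sizes = [10, 26, 14]  # the three-layer example from the text
per_pair = count_connections(layer_sizes)
print(per_pair)       # [260, 364]
print(sum(per_pair))  # 624
```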


How do Fully Connected Networks work?

FCNs work by having each layer of neurons/perceptrons apply a linear transformation to the input vector through a weight matrix. A non-linear transformation is then applied to the result through a non-linear activation function f: y = f(x·W + W0), where x is the input vector, W is the weight matrix, and W0 is the bias term.

Basically, we take the product of the input vector x and the weight matrix W, add the bias term W0, and pass the result through the non-linear function. (A bias term is a learnable offset added to the weighted sum, which lets a neuron shift its output independently of the input.) In even simpler terms, we are doing vector-matrix multiplication. For example, say we have an input vector of size 1x9 and a weight matrix of size 9x4. We take the product of (1x9) and (9x4), add the bias, and then apply the activation function f to get an output vector of size (1x4) (show the second image before the example and the third fcl image after the example).
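The 1x9 times 9x4 example can be sketched in plain Python. The function name `dense_forward`, the random values, and the choice of tanh as the activation are all assumptions made for illustration; a real framework does the same arithmetic with optimized matrix routines.

```python
import math
import random


def dense_forward(x, W, b, activation):
    """Compute activation(x @ W + b) for a single input row vector x."""
    n_in, n_out = len(W), len(W[0])
    assert len(x) == n_in and len(b) == n_out
    out = []
    for j in range(n_out):
        # Weighted sum of all inputs for output neuron j, plus its bias.
        z = sum(x[i] * W[i][j] for i in range(n_in)) + b[j]
        out.append(activation(z))
    return out


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(9)]                      # input, shape 1x9
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(9)]  # weights, shape 9x4
b = [0.0] * 4                                                      # bias, shape 1x4

y = dense_forward(x, W, b, math.tanh)
print(len(y))  # 4
```

Every one of the 4 outputs depends on all 9 inputs, which is exactly what "fully connected" means.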

How does a FCN differ from a CNN?

The biggest difference, from what I have seen in my own research, is that FCNs are structure agnostic, meaning that they make no special assumptions about the input, whereas a CNN (convolutional neural network) is designed around the assumption that the input has a grid-like spatial structure, as images do.

This lack of assumptions can be quite useful if one wants to train on different kinds of data. However, because it makes no assumptions, a FCN usually does not perform as well as a neural network that is designed for a specific kind of input, like a CNN on images. Another advantage is that FCNs have more expressive power than CNNs: a convolution is a linear operation with shared, local weights, so a fully connected layer of the right size can represent any convolution, but not the other way around.
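One way to see the expressive-power point is that a convolution can be rewritten as a fully connected layer whose weight matrix is mostly zeros and reuses the same few values. The sketch below uses a 1D signal with no padding to keep it short; the function names and the sample numbers are made up for illustration.

```python
def conv1d_valid(x, kernel):
    """Slide the kernel over x (no padding), as a convolutional layer does."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv_as_dense_matrix(n_inputs, kernel):
    """Build the dense weight matrix (n_inputs x n_outputs) that reproduces the convolution."""
    k = len(kernel)
    n_outputs = n_inputs - k + 1
    W = [[0.0] * n_outputs for _ in range(n_inputs)]
    for out_idx in range(n_outputs):
        for j in range(k):
            # The same kernel weights are reused in every column (weight sharing);
            # all other entries stay zero (local connectivity).
            W[out_idx + j][out_idx] = kernel[j]
    return W


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

W = conv_as_dense_matrix(len(x), kernel)
dense_out = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

print(conv1d_valid(x, kernel))  # [-2.0, -2.0, -2.0]
print(dense_out)                # [-2.0, -2.0, -2.0]
```

A general fully connected layer is free to put any value in any cell of `W`, so it covers this case and many others, at the cost of far more weights to learn.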

A CNN's specific focus is quite useful, as it can process images much more efficiently than a FCN. However, the main disadvantage of this type of neural network is that it only suits inputs with that grid-like structure (images being the usual case), which is where the FCN comes in. Another advantage is that CNNs are more efficient in using their parameters: FCNs tend to require many more parameters to compete with an equivalent CNN.
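A rough parameter count makes the efficiency gap concrete. The setup here is an assumption chosen for illustration: a single layer that maps a 28x28 grayscale image to an output of the same size, once as a fully connected layer and once as a 3x3 convolution with one input and one output channel.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Assumed setup: a 28x28 grayscale image mapped to an output of the same size.
pixels = 28 * 28
print(dense_params(pixels, pixels))  # 615440
print(conv2d_params(1, 1, 3))        # 10
```

The convolution gets away with so few parameters because it reuses the same small kernel at every position in the image.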

So, overall, depending on your needs, a FCN can be better than a CNN and vice versa.

What are some ways that FCNs can be used?

While searching for good real-world examples to better explain FCNs, I came across three different papers that piqued my interest.

The first one is "Intra Prediction using Fully Connected Network for Video Coding" by Jihao Li, Bin Li, Jizheng Xu, and Ruiqin Xiong.

The second one is "Fully Connected Network on Noncompact Symmetric Space and Ridgelet Transform based on Helgason Fourier Analysis" by Sho Sonoda, Isao Ishikawa, and Masahiro Ikeda.

The third one is "How Far Can We go Without Convolution: Improving Fully Connected Networks" by Zhouhan Lin, Roland Memisevic, and Kishore Konda.

I will go over the three real-world examples after my example code.

The goal of my example code is to train a FCN to classify handwritten digits from 0-9, using the MNIST dataset. The code below uses PyTorch, with torchvision for loading the data.

In [31]:
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim

In [32]:
# Define the fully connected neural network model
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

In [None]:

# Set the device for training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# Define hyperparameters
input_size = 784  # Input size of MNIST dataset (28x28 pixels)
hidden_size = 500
num_classes = 10
learning_rate = 0.001
batch_size = 100
num_epochs = 5
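As a quick sanity check on these sizes, the number of trainable parameters in the two-layer model can be worked out by hand. This is a sketch under the assumption that each linear layer has one weight per connection plus one bias per output neuron; the names `fc1_params` and `fc2_params` are only for illustration.

```python
input_size, hidden_size, num_classes = 784, 500, 10

fc1_params = input_size * hidden_size + hidden_size   # weights + biases of the first layer
fc2_params = hidden_size * num_classes + num_classes  # weights + biases of the second layer

print(fc1_params)               # 392500
print(fc2_params)               # 5010
print(fc1_params + fc2_params)  # 397510
```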

In [None]:
# Load the MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=torchvision.transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=torchvision.transforms.ToTensor())

# Create data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

In [33]:

# Initialize the neural network
model = NeuralNetwork(input_size, hidden_size, num_classes).to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

In [34]:
# Train the neural network
total_steps = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape images to (batch_size, input_size)
        images = images.reshape(-1, input_size).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_steps}], Loss: {loss.item():.4f}')

In [35]:
# Test the neural network
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        # The predicted class is the index of the highest output score
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the {total} test images: {(100 * correct / total):.2f}%')
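The accuracy calculation in the test loop comes down to two steps: pick the class with the highest score for each sample, then count how often that matches the label. Here is the same logic in plain Python without tensors; the scores and labels are made up purely for illustration, and the helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_scores, labels):
    """Percentage of rows whose highest-scoring class matches the label."""
    predicted = [argmax(row) for row in batch_scores]
    correct = sum(p == t for p, t in zip(predicted, labels))
    return 100 * correct / len(labels)


# Made-up scores for 3 samples and 4 classes (illustration only).
batch_scores = [
    [0.1, 2.3, -1.0, 0.4],  # highest score at index 1
    [1.5, 0.2, 0.3, 0.1],   # highest score at index 0
    [-0.5, 0.0, 0.2, 0.1],  # highest score at index 2
]
labels = [1, 0, 3]

print(f'{accuracy(batch_scores, labels):.2f}%')  # 2 of the 3 predictions are correct
```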
