In [6]:
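import os

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearNeuron(nn.Module):
    """A single neuron: 3 input features -> 1 output, no activation."""
    def __init__(self):
        super(LinearNeuron, self).__init__()
        self.fc1 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        return x

model = LinearNeuron()

# Dummy input (batch of 1 sample, 3 features) used to run and export the model
x = torch.randn(1, 3)
y = model(x)

# Make sure the output directory exists before saving
os.makedirs("models", exist_ok=True)

onnx_program = torch.onnx.dynamo_export(model, x)
onnx_program.save("models/linear_neuron.onnx")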



Here's a simple neuron that we're well accustomed to by this point. It effectively performs linear regression and outputs a single prediction from whatever set of features we give it (three in this example). To visualize the exported file, visit [Netron App](https://netron.app/) and load the ONNX model file that the cell above saves under `models/`. Here's what the graph looks like when we display it with Netron.

![Linear Neuron Graph](images/linear_neuron.png)
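
To make the computation behind this graph concrete, here is a minimal NumPy sketch of what a single linear neuron does: a dot product of the input with a weight vector, plus a bias. The weight and bias values are made up for illustration (they are not the randomly initialized parameters of `LinearNeuron`), and `linear_neuron` is a hypothetical helper name.

```python
import numpy as np

def linear_neuron(x, w, b):
    """Compute x @ w + b for a batch of inputs.

    x: (batch, features), w: (features,), b: scalar.
    Returns one prediction per row, shape (batch, 1).
    """
    return (x @ w + b).reshape(-1, 1)

# Illustrative values only; nn.Linear(3, 1) would initialize these randomly.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([[1.0, 2.0, 3.0]])  # one sample with 3 features

y = linear_neuron(x, w, b)
print(y.shape)  # (1, 1)
print(y)        # 0.5*1 - 1.0*2 + 2.0*3 + 0.1, i.e. about 4.6
```

The output has one row per input sample and a single column, which matches the shape produced by a `nn.Linear(3, 1)` layer.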

So what would logistic regression look like? For simplicity, let's assume for now that it only handles binary classification.

In [7]:
class LogisticNeuron(nn.Module):
    def __init__(self):
        super(LogisticNeuron, self).__init__()
        self.fc1 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        return x

model = LogisticNeuron()

x = torch.randn(1, 3)
y = model(x)

onnx_program = torch.onnx.dynamo_export(model, x)
onnx_program.save("models/logistic_neuron.onnx")



It's essentially the same as the linear neuron, but now a sigmoid activation is applied to the output of the linear node.

![Logistic Neuron Graph](images/logistic_neuron.png)
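
As a rough sketch of the same idea in NumPy, the logistic neuron is the linear neuron wrapped in a sigmoid, which squashes the output into the interval (0, 1) so it can be read as a probability. The weights, the inputs, and the 0.5 decision threshold are illustrative assumptions, and `logistic_neuron` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_neuron(x, w, b):
    """Linear neuron followed by a sigmoid: one probability per input row."""
    return sigmoid(x @ w + b)

# Illustrative values only, not trained parameters.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([
    [1.0, 2.0, 3.0],    # linear output  4.6 -> probability close to 1
    [1.0, 1.0, -1.0],   # linear output -2.4 -> probability close to 0
])

p = logistic_neuron(x, w, b)
labels = (p >= 0.5).astype(int)  # assumed threshold for the binary decision
print(p)
print(labels)  # [1 0]
```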

Next, to handle multiple classes with logistic regression, we introduced the idea of softmax probabilities. The network now has many outputs, one per class, since we can classify among multiple things.

In [8]:
class SoftmaxLayer(nn.Module):
    def __init__(self):
        super(SoftmaxLayer, self).__init__()
        self.fc1 = nn.Linear(3, 10)
        self.softmax = nn.Softmax(dim=1)  # normalize over the 10 classes, not over the batch
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.softmax(x)
        return x

model = SoftmaxLayer()

x = torch.randn(1, 3)
y = model(x)

onnx_program = torch.onnx.dynamo_export(model, x)
onnx_program.save("models/softmax_layer.onnx")



![Softmax Layer Graph](images/softmax_layer.png)
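
The softmax step can be sketched in NumPy as follows. One detail worth checking is which axis gets normalized: for scores shaped `(batch, classes)`, the probabilities should sum to 1 across the class axis. With a batch of one sample, normalizing across the batch axis instead turns every entry into 1.0. The random scores and the helper name `softmax` are assumptions made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - np.max(z, axis=axis, keepdims=True)  # shift so the largest entry is 0
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(1, 10))  # batch of 1 sample, 10 class scores

over_classes = softmax(scores, axis=1)  # normalize across the 10 classes
over_batch = softmax(scores, axis=0)    # normalize across the batch (1 row)

print(over_classes.sum(axis=1))  # sums to 1, up to floating-point rounding
print(over_batch)                # every entry is 1.0: not a distribution over classes
```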

Now we actually produce 10 neurons, one for each of the 10 outputs of our custom model. We then apply softmax to their outputs to convert them into probabilities. The same multi-output arrangement can also be used with linear regression. In neural-network terms this is called a layer, since there are now multiple neurons that depend on the same input.

A very natural extension is to add many more layers in sequence, which lets neural nets learn powerful non-linear relationships in the data. Each layer takes the previous layer's output as its input. This structure is also known as a feedforward network. For the MNIST dataset, a typical feedforward net uses 256 neurons in the first layer, then a hidden layer with 128 neurons, and finally the softmax layer at the end. The cell below keeps the 3 input features used in the earlier examples; a flattened 28x28 MNIST image would instead have 784. Let's visualize how such a network might look.

In [9]:
class FeedForwardNet(nn.Module):
    def __init__(self):
        super(FeedForwardNet, self).__init__()
        self.fc1 = nn.Linear(3, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
        self.softmax = nn.Softmax(dim=1)  # normalize over the 10 classes, not over the batch
        
    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = self.fc3(x)
        x = self.softmax(x)
        return x

model = FeedForwardNet()

x = torch.randn(1, 3)
y = model(x)

onnx_program = torch.onnx.dynamo_export(model, x)
onnx_program.save("models/feedforward_net.onnx")



![FeedForward Network](images/feedforward_net.png)

It's typical to use a non-linear activation function between the layers. ReLU is a popular choice today. For now, however, let's just stick with sigmoid. This network is a lot more complex than the previous ones we've been working with. It has 2 hidden layers with many neurons each and a softmax layer for the output. In the previous example, there was only one layer, which was the softmax layer. These hidden layers are key to the entire network and allow it to learn powerful representations for datasets like MNIST. Visualizing the network this way helps a lot, as we'll be writing it, along with backpropagation, entirely in NumPy and nothing else.
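
As a preview of that NumPy version, here is a minimal sketch of the forward pass only, using the same layer widths as `FeedForwardNet` (3, 256, 128, 10). It is not the final implementation: the helper names (`init_layer`, `forward`), the small-random-weight initialization, and the `(inputs, outputs)` weight layout are all assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - np.max(z, axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (an illustrative choice)."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, params):
    """Sigmoid after every hidden layer, softmax after the final layer."""
    *hidden, (w_out, b_out) = params
    for w, b in hidden:
        x = sigmoid(x @ w + b)
    return softmax(x @ w_out + b_out, axis=1)

rng = np.random.default_rng(0)
sizes = [3, 256, 128, 10]  # same layer widths as FeedForwardNet
params = [init_layer(rng, n_in, n_out)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, 3))  # batch of 1 sample, 3 features
probs = forward(x, params)
print(probs.shape)           # (1, 10)
print(probs.sum(axis=1))     # sums to 1, up to floating-point rounding

# Weights plus biases per layer: 3*256+256, 256*128+128, 128*10+10
n_params = sum(w.size + b.size for w, b in params)
print(n_params)              # 35210
```

Counting the parameters shows where the capacity sits: almost all of the 35,210 values belong to the 256-to-128 hidden layer.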