# Single neuron with PyTorch
In the following code we define and train a single-neuron model that predicts the $y$ value of a linear function given an $x$ value.

## Define the problem we want to solve
First, we define the linear function $f(x) = 2x + 1$ that the neuron will try to learn.

In [1]:
def f(x):
    return 2*x+1

## Define the neural network
Now we need to define the neuron. We will do this with the **torch.nn** library, which contains the **Module** class. Our model (in this case a single neuron) will always be a subclass of this class. **torch.nn** also contains definitions for different types of layers, such as fully connected linear layers or convolutional layers, as well as loss functions and activation functions.

In [2]:
import numpy as np
import torch
import torch.nn as nn

# Here we define a neural network class called Net. When defining a neural network, it always has to be a subclass of nn.Module
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Here we define the layers of our neural network.
        # Since we want only one neuron, we use a linear layer with one input and one output
        self.fc = nn.Linear(1, 1)
    
    # Every neural network that we define must have a forward function.
    # This function propagates the input through the neural network and produces the output
    def forward(self, x):
        x = self.fc(x)
        return x
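The `fc` layer of **Net** is a single `nn.Linear(1, 1)`, which computes $y = xW^T + b$. The sketch below checks that computation by hand against the layer itself; the fixed seed and the three sample inputs are arbitrary choices made only so the check is reproducible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

layer = nn.Linear(1, 1)  # the same kind of layer Net stores as self.fc

# nn.Linear expects the last dimension to hold the input features,
# so a batch of three scalar inputs has shape (3, 1)
x = torch.tensor([[0.0], [1.0], [2.5]])

with torch.no_grad():
    via_layer = layer(x)
    # The same computation written out explicitly: y = x W^T + b
    by_hand = x @ layer.weight.T + layer.bias

print(torch.allclose(via_layer, by_hand))
```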

Now that we have a class **Net** with the desired structure, we can create an instance of it.

In [3]:
neuron = Net()

We can inspect the weights and biases of our neural network with **state_dict()**.

In [4]:
print(neuron.state_dict())

OrderedDict([('fc.weight', tensor([[0.0838]])), ('fc.bias', tensor([-0.0069]))])
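The keys printed by **state_dict()** can also be used to overwrite the parameters. As an illustration (not a step of the training procedure), the sketch below loads weight 2 and bias 1, the exact solution for $f$, into a fresh **Net** and checks that the neuron then reproduces $f(3) = 7$.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1, 1)

    def forward(self, x):
        return self.fc(x)

neuron = Net()

# Replace the random initial values with the parameters of f(x) = 2x + 1
neuron.load_state_dict({
    'fc.weight': torch.tensor([[2.0]]),
    'fc.bias': torch.tensor([1.0]),
})

# named_parameters() yields the same tensors that state_dict() reports
for name, param in neuron.named_parameters():
    print(name, param.data)

prediction = neuron(torch.tensor([3.0])).item()
print(prediction)  # 2*3 + 1
```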


Of course, since the parameters of the neuron are initialized randomly, the neuron will not be able to predict the correct output for a given input.

In [5]:
x = np.random.uniform(0, 5)  # random input from the interval [0, 5)
target = f(x)
output = neuron(torch.Tensor([x]))
print('Real value: {:.2f} || Prediction: {:.2f}'.format(target, output.detach().numpy()[0]))

Real value: 7.56 || Prediction: 0.27
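The neuron is not limited to one input at a time: a tensor of shape `(N, 1)` is treated as a batch of $N$ inputs and produces $N$ predictions in one call. The sketch below uses a bare `nn.Linear(1, 1)` in place of **Net** (which wraps exactly that layer); the batch size of 5 and the seed are arbitrary.

```python
import torch
import torch.nn as nn

def f(x):
    return 2*x + 1

torch.manual_seed(0)  # arbitrary seed
neuron = nn.Linear(1, 1)  # stand-in for Net(), which wraps one nn.Linear(1, 1)

# Five inputs drawn uniformly from [0, 5), shaped (batch, features) = (5, 1)
xs = 5 * torch.rand(5, 1)
targets = f(xs)  # f works elementwise on tensors as well

with torch.no_grad():  # gradients are not needed just to look at predictions
    preds = neuron(xs)

for x, t, p in zip(xs.flatten(), targets.flatten(), preds.flatten()):
    print('x: {:.2f} || Real value: {:.2f} || Prediction: {:.2f}'.format(x.item(), t.item(), p.item()))
```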


Now that we are able to produce an (incorrect) output for a given input, we need a criterion that tells the neural network how good or bad its prediction was. There are several loss functions available, but we will start with one of the simplest, the mean squared error (MSE) loss, which for a single prediction is
\begin{equation}
\text{MSE}(\text{output}, \text{target}) = (\text{output}-\text{target})^2
\end{equation}
**torch.nn** provides this loss function as **nn.MSELoss()**; for a batch of $N$ predictions it averages the $N$ squared errors by default. We store it in a variable named **criterion** to make explicit that it is our measure of how good or bad the produced output was.

In [6]:
criterion = nn.MSELoss()
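To confirm what **nn.MSELoss()** computes, the sketch below compares it with the squared errors averaged by hand on three made-up values, and then evaluates it on the single prediction printed above (0.27 against a target of 7.56), where it reduces to one squared error.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()  # default reduction='mean'

# Made-up outputs and targets: errors 1, -2 and 4, squared errors 1, 4 and 16
output = torch.tensor([2.0, 0.0, 5.0])
target = torch.tensor([1.0, 2.0, 1.0])

loss = criterion(output, target)
manual = ((output - target) ** 2).mean()
print(loss.item(), manual.item())  # (1 + 4 + 16) / 3

# With a single element the mean is just one squared error: (0.27 - 7.56)^2
single = criterion(torch.tensor([0.27]), torch.tensor([7.56]))
print('{:.2f}'.format(single.item()))
```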

PyTorch records every operation performed on a tensor created with **requires_grad=True**, which is the case for the parameters of an **nn.Module** (this mechanism is called autograd). Because the loss is computed from the network's parameters, it carries the record of all the operations the network performed to obtain it, and we can use it to compute all the gradients needed to train the network.
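A minimal sketch of this bookkeeping, using two plain tensors `w` and `b` (hypothetical stand-ins for the neuron's weight and bias) instead of a full module: after `backward()`, the gradients stored in `.grad` match the analytic derivatives of $L = (wx + b - t)^2$, namely $\partial L/\partial w = 2(wx + b - t)\,x$ and $\partial L/\partial b = 2(wx + b - t)$.

```python
import torch

# Hypothetical parameter values; requires_grad=True makes autograd track them
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(-1.0, requires_grad=True)
x, t = 3.0, 7.0  # one input and its target f(3) = 7

output = w * x + b
loss = (output - t) ** 2
print(loss.grad_fn)  # the last recorded operation of the graph

loss.backward()  # walks the recorded graph backwards and fills .grad

residual = (output - t).item()  # 0.5*3 - 1 - 7 = -6.5
print(w.grad.item(), 2 * residual * x)  # autograd vs. analytic dL/dw
print(b.grad.item(), 2 * residual)      # autograd vs. analytic dL/db
```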

At this point we know how to produce an output with our neural network, and we have a loss function that measures how good that output was. Now we need to backpropagate the loss through the network to obtain the derivative of the loss with respect to each parameter, and then use these derivatives to update the parameters. For the update we will use standard gradient descent, which changes each parameter in the direction that reduces the value of the loss:
\begin{equation}
\text{parameter} = \text{parameter} - \gamma\frac{\partial L}{\partial (\text{parameter})}
\end{equation}
where $\gamma$ is the so-called learning rate. Selecting an appropriate learning rate is important: a value that is too low results in slow training, while a value that is too high produces parameter corrections that make the loss jump around (or even grow) instead of approaching a minimum.
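The effect of the learning rate can be seen without PyTorch. The sketch below applies the update rule above with hand-derived derivatives, using full-batch gradient descent on a fixed grid of 11 points (a simplification of the one-random-point-per-step updates used later in this notebook). The grid, the starting point $w = b = 0$, the three learning rates and the divergence threshold are all illustrative assumptions.

```python
def f(x):
    return 2*x + 1

# Fixed grid of 11 training points on [0, 5] (illustrative choice)
xs = [0.5 * k for k in range(11)]
ts = [f(x) for x in xs]

def mse(w, b):
    """Mean squared error of the prediction w*x + b over the grid."""
    return sum((w*x + b - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

def gradient_descent(lr, steps=2000):
    """Full-batch gradient descent from w = b = 0; returns (w, b, steps_run)."""
    w, b = 0.0, 0.0
    for step in range(steps):
        residuals = [w*x + b - t for x, t in zip(xs, ts)]
        # Analytic derivatives of the mean squared error
        dw = sum(2 * r * x for r, x in zip(residuals, xs)) / len(xs)
        db = sum(2 * r for r in residuals) / len(xs)
        # parameter = parameter - lr * dL/dparameter
        w -= lr * dw
        b -= lr * db
        if mse(w, b) > 1e6:  # stop early once the loss has clearly blown up
            return w, b, step + 1
    return w, b, steps

results = {}
for lr in (0.001, 0.01, 0.2):
    w, b, steps_run = gradient_descent(lr)
    results[lr] = (w, b, steps_run)
    print('lr={}: {} steps, w={:.4f}, b={:.4f}, loss={:.3g}'.format(lr, steps_run, w, b, mse(w, b)))
```

With these settings the smallest learning rate is still visibly away from $w = 2$, $b = 1$ after 2000 steps, the middle one converges, and the largest one makes the loss grow until the loop stops it.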

**torch.optim** provides several optimizers; in this example we use stochastic gradient descent (**optim.SGD**), which applies the update rule above to every parameter it is given.

In [7]:
import torch.optim as optim
# Define the optimizer to be used.
# SGD takes as arguments the parameters of our neural network and a value for the learning rate.
optimizer = optim.SGD(neuron.parameters(), lr=0.01)
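With its default settings (no momentum, no weight decay), one call to `optimizer.step()` should perform exactly the update $\text{parameter} - \gamma\,\partial L/\partial(\text{parameter})$. The sketch below checks this on a single made-up sample ($x = 3$, target $7$), using a bare `nn.Linear(1, 1)` as a stand-in for **Net** and an arbitrary seed.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed
neuron = nn.Linear(1, 1)  # stand-in for Net(), which wraps one nn.Linear(1, 1)
optimizer = optim.SGD(neuron.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.tensor([3.0])
target = torch.tensor([7.0])

optimizer.zero_grad()
loss = criterion(neuron(x), target)
loss.backward()

# Expected result of one plain gradient descent step, computed before the step
expected_w = neuron.weight.detach() - 0.01 * neuron.weight.grad
expected_b = neuron.bias.detach() - 0.01 * neuron.bias.grad

optimizer.step()  # updates the parameters in place

print(torch.allclose(neuron.weight, expected_w), torch.allclose(neuron.bias, expected_b))
```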

## Training the neuron
We can train the neuron with a simple loop that runs once for each point we want to sample during training. Each iteration performs the following steps:
1. Calling **backward()** adds the newly computed gradients to any gradients already stored in the parameters instead of replacing them. Since we do not want gradients from previous iterations to accumulate, we start each step by setting the gradients to zero with **zero_grad()**.
2. We select a random value of $x$ and evaluate the function and the neuron for that input.
3. We obtain the loss by comparing the target value and the output of our neuron by using our criterion (in this case MSELoss).
4. Once we have the loss, we can call the **backward()** function to calculate all the gradients based on the loss obtained.
5. With the gradients calculated, we can use the **step()** function of our optimizer to perform one step of gradient descent based on the gradients and our learning rate.
6. With the previous steps we have performed a small correction of the parameters of the neuron. Repeating these steps many times should make the weight and bias converge to those of the target linear function ($w = 2$, $b = 1$).
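Step 1 matters because of how **backward()** stores gradients. The sketch below, again with a bare `nn.Linear(1, 1)` standing in for **Net** and a made-up sample, calls **backward()** twice without zeroing and shows that the stored gradient doubles; **zero_grad()** then clears it.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed
neuron = nn.Linear(1, 1)  # stand-in for Net()
optimizer = optim.SGD(neuron.parameters(), lr=0.01)
criterion = nn.MSELoss()
x, target = torch.tensor([3.0]), torch.tensor([7.0])

criterion(neuron(x), target).backward()
first = neuron.weight.grad.clone()

# A second backward() without zeroing adds to the stored gradient
criterion(neuron(x), target).backward()
accumulated = neuron.weight.grad.clone()
print(torch.allclose(accumulated, 2 * first))  # the two gradients were summed

# zero_grad() clears the stored gradients before the next iteration
optimizer.zero_grad()
# None in recent PyTorch releases (zero_grad defaults to set_to_none=True),
# a zero tensor in older releases
print(neuron.weight.grad)
```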

In [8]:
# We sample 3000 points
for i in range(3000):
    # The first step is to set to zero all the previously calculated gradients
    optimizer.zero_grad()
    # We sample points from the [0, 5) interval
    x = np.random.uniform(0, 5)
    # Evaluate f(x) for our current point
    target = f(x)
    # Pass our current point through the neuron to obtain the output
    output = neuron(torch.Tensor([x]))
    # Compare the output with the target to obtain the loss
    loss = criterion(output, torch.Tensor([target]))
    # Calculate the gradients based on the obtained loss
    loss.backward()
    # Update the parameters by SGD based on the gradients obtained
    optimizer.step()
    # Print the parameters every 200 samples
    if i%200 == 0:
        print('Iteration {}|| weight:{}, bias:{}'.format(i, neuron.state_dict()['fc.weight'][0], neuron.state_dict()['fc.bias']))

Iteration 0|| weight:tensor([0.2515]), bias:tensor([0.0839])
Iteration 200|| weight:tensor([2.0809]), bias:tensor([0.7398])
Iteration 400|| weight:tensor([2.0548]), bias:tensor([0.8467])
Iteration 600|| weight:tensor([2.0325]), bias:tensor([0.9118])
Iteration 800|| weight:tensor([2.0143]), bias:tensor([0.9453])
Iteration 1000|| weight:tensor([2.0104]), bias:tensor([0.9674])
Iteration 1200|| weight:tensor([2.0074]), bias:tensor([0.9788])
Iteration 1400|| weight:tensor([2.0036]), bias:tensor([0.9864])
Iteration 1600|| weight:tensor([2.0029]), bias:tensor([0.9912])
Iteration 1800|| weight:tensor([2.0016]), bias:tensor([0.9945])
Iteration 2000|| weight:tensor([2.0010]), bias:tensor([0.9967])
Iteration 2200|| weight:tensor([2.0006]), bias:tensor([0.9980])
Iteration 2400|| weight:tensor([2.0004]), bias:tensor([0.9987])
Iteration 2600|| weight:tensor([2.0002]), bias:tensor([0.9993])
Iteration 2800|| weight:tensor([2.0001]), bias:tensor([0.9996])
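The loop above uses one point per step. A common variant, sketched below under our own assumptions (a batch size of 32, inputs drawn with `torch.rand`, a bare `nn.Linear(1, 1)` in place of **Net**, an arbitrary seed), feeds a batch of points to the neuron at every step; **nn.MSELoss()** then averages the squared errors over the batch, and the rest of the loop is unchanged.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def f(x):
    return 2*x + 1

torch.manual_seed(0)  # arbitrary seed
neuron = nn.Linear(1, 1)  # stand-in for Net()
criterion = nn.MSELoss()
optimizer = optim.SGD(neuron.parameters(), lr=0.01)

batch_size = 32  # hypothetical choice
for i in range(3000):
    optimizer.zero_grad()
    xs = 5 * torch.rand(batch_size, 1)   # batch of inputs from [0, 5)
    loss = criterion(neuron(xs), f(xs))  # mean squared error over the batch
    loss.backward()
    optimizer.step()

print('weight: {:.4f}, bias: {:.4f}'.format(neuron.weight.item(), neuron.bias.item()))
```

Averaging over a batch gives a less noisy gradient estimate per step at the cost of more computation per step; for this noise-free linear target both versions approach $w = 2$, $b = 1$.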
