# Training a Simple Model with PyTorch

This is "boilerplate" code to train a simple model with PyTorch.
 
You can adapt this code for a variety of tasks.

There are four major parts:

  1.  Read training data
  2.  Construct a model
  3.  Train the model
  4.  Output the model

These steps are described in more detail below.

As a concrete example, this code reads training data from the file named "example.csv", constructs a simple model, and trains that model to fit the training data.


## Read Training Data

Training data consists of many training examples.  Each training example specifies inputs to the model and the desired output.

The training data is stored in a CSV file.  CSV stands for "comma separated values".  As the name suggests, a CSV file consists of a sequence of lines, where each line contains several values.  Here is an example:

    0.682,1.704,9.740
    0.150,1.252,4.331
    1.741,0.159,12.838
    1.575,1.586,16.1647

Each row in the CSV file represents one training example.  The last number is the desired model output, and the preceding numbers are the model inputs. So, for example, the first row of the CSV above means:

>   "When the model gets the numbers 0.682 and 1.704 as inputs, I want the model to emit 9.740 as output."

Model inputs are often represented by the variable x, and desired outputs by the variable y.  Similarly, the variable xs (the plural of x) represents a list of model inputs, and ys is a list of model outputs.
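
To make the row layout concrete, here is a small sketch that parses the four example rows above and splits each row into x and y the same way the code below does. It reads from a string via `io.StringIO` instead of a real file. That substitution is only so the snippet runs on its own.

```python
import csv
import io

# The four example rows from the CSV above, held in a string instead of a file.
example_csv = """0.682,1.704,9.740
0.150,1.252,4.331
1.741,0.159,12.838
1.575,1.586,16.1647
"""

xs = []  # model inputs, one list per training example
ys = []  # desired outputs, one single-element list per training example

for row in csv.reader(io.StringIO(example_csv)):
    values = [float(v) for v in row]
    xs.append(values[:-1])  # everything except the last value
    ys.append(values[-1:])  # just the last value, still wrapped in a list

print(xs[0], ys[0])  # prints [0.682, 1.704] [9.74]
```

Each y is kept as a one-element list (`row[-1:]` rather than `row[-1]`). That way ys has one column, matching the shape of the model's output: one output value per training example.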

```python
import csv

import torch
import torch.nn as nn
import torch.optim as optim

training_data = "example.csv"  # You'll probably want to change this!

xs = []  # will hold model inputs from all the training examples
ys = []  # will hold desired model outputs from all the training examples

# A "with" block closes the file automatically when we're done.  newline=""
# is what the csv module expects, and "utf-8-sig" skips the byte-order mark
# that some spreadsheet programs put at the start of a file.
with open(training_data, "r", newline="", encoding="utf-8-sig") as f:
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines

        # Convert the entries in the row from strings to numbers
        row = list(map(float, row))

        x = row[:-1]  # model inputs, all but the last number in the row
        y = row[-1:]  # desired model output, the last number in the row

        xs.append(x)
        ys.append(y)

# Print the first five model inputs and desired model outputs.
print("Example model inputs and outputs:")
for x, y in zip(xs[:5], ys[:5]):
    print(x, y)

# Machine learning can involve a TON of computation.  To make this more
# efficient, PyTorch needs training data in a special "tensor" format.
# Here, we do the conversion into that format.
xs = torch.tensor(xs)
ys = torch.tensor(ys)
```

```text
Example model inputs and outputs:
[0.0, 1.0] [-0.020806968]
[3.0, 3.0] [-0.020843983]
[4.0, 6.0] [0.01165545]
[6.0, 8.0] [-0.010773897]
[8.0, 11.0] [-0.021383047]
```


## Construct a Model

We want to construct a model of that training data.  Given an input similar to one in the training data, the model should output a value close to the desired output.

We do part of this job, and PyTorch does the rest.

One of our responsibilities is to define the "model architecture", which is the rough form of the solution that PyTorch should seek.  But we can leave some parameters undetermined.  Later, PyTorch will help us find good settings for those parameters.

PyTorch provides lots of mathematical building blocks that you can use to define a model architecture.  In our concrete example, we will use one of the most common, called a "linear layer".

Each of our training examples specifies two model inputs.  Let's call them x1 and x2.  A linear layer computes the function:

    A * x1 + B * x2 + C

where A, B, and C are parameters.  (PyTorch calls A and B the layer's "weight" and C its "bias".)  Later, we'll use PyTorch to figure out which values of these model parameters give the best approximation of our training data.
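
Here is a sketch of what `nn.Linear(2, 1)` computes for each example, written out in plain Python. It multiplies each input by its weight, adds the results, and adds the bias. The function names and parameter values are illustrative only; PyTorch does the equivalent with tensors.

```python
def linear(x, weights, bias):
    """Compute A * x1 + B * x2 + C, with weights = [A, B] and bias = C."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def linear_batch(xs, weights, bias):
    """Apply the same linear function to every example in a batch."""
    return [linear(x, weights, bias) for x in xs]


# Hypothetical parameter values, chosen only for illustration.
A, B, C = 2.0, -1.0, 0.5
print(linear([3.0, 4.0], [A, B], C))                     # 2*3 + (-1)*4 + 0.5 = 2.5
print(linear_batch([[0.0, 0.0], [1.0, 1.0]], [A, B], C))  # [0.5, 1.5]
```

The batch version shows why the model can take "either a single training example or a batch". The same parameters are simply applied to each example in turn.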

To model more complicated phenomena, you will need to experiment with more powerful architectures.  The goal is to find an architecture complex enough to accurately model the phenomenon (presumably something related to FRC robotics), but otherwise as simple as possible so that the model is easy and inexpensive to use.

```python
class Model(nn.Module):

    def __init__(self):
        super().__init__()

        # Create building blocks of the model here.  If you want to
        # model a more complicated phenomenon, you will need a more
        # complicated model architecture with more building blocks.
        # nn.Linear(2, 1) takes 2 inputs (x1 and x2) and produces 1 output.
        self.linear1 = nn.Linear(2, 1)

    def forward(self, xs):
        # Now we take a training example input and compute the corresponding
        # model output.  In an untrained model, this will likely be far
        # from the desired output specified in the training data.  But as
        # the model trains, this output will get closer and closer to the
        # desired output.
        #
        # Confusingly, we can work with either a single training example or
        # a "batch" consisting of many training examples all at once.
        return self.linear1(xs)


model = Model()
```

## Train the Model

Training a machine learning model involves several steps:

  1. Shove one (or more) training example inputs into the model.
  2. Compare the model's output to the desired output, expressing the difference as a quantity, called the "loss".
  3. Use the loss to adjust the parameters inside the model.

When the loss (the difference between the model's actual output and desired
output) gets small enough, our model can come out of the oven!

All of the steps above are done with mysterious PyTorch incantations.
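
To take some of the mystery out of those incantations, here is a sketch of the same three steps in plain Python for the two-input linear model, with the gradient worked out by hand. It is an illustration, not what PyTorch does internally. It assumes made-up, noise-free data, and it uses plain gradient descent instead of Adam.

```python
# Hypothetical noise-free training data generated from y = 2*x1 - 1*x2 + 0.5.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
ys = [2.0 * x1 - 1.0 * x2 + 0.5 for x1, x2 in xs]

A, B, C = 0.0, 0.0, 0.0  # start from arbitrary parameter values
lr = 0.05                # learning rate: how big a step to take each round
n = len(xs)

for round_num in range(2000):
    # Step 1: push every training input through the model.
    yhats = [A * x1 + B * x2 + C for x1, x2 in xs]

    # Step 2: compare with the desired outputs using mean squared error.
    errors = [yhat - y for yhat, y in zip(yhats, ys)]
    loss = sum(e * e for e in errors) / n

    # Step 3a: the gradient (the job loss.backward() does in PyTorch).
    # For MSE, d(loss)/dA = mean(2 * error * x1), and similarly for B and C.
    # We recompute it from scratch each round, so there is nothing to reset;
    # PyTorch accumulates gradients, which is why it needs zero_grad().
    grad_A = sum(2 * e * x[0] for e, x in zip(errors, xs)) / n
    grad_B = sum(2 * e * x[1] for e, x in zip(errors, xs)) / n
    grad_C = sum(2 * e for e in errors) / n

    # Step 3b: nudge each parameter against its gradient (the job of
    # optimizer.step(), except this is plain gradient descent, not Adam).
    A -= lr * grad_A
    B -= lr * grad_B
    C -= lr * grad_C

    if round_num % 500 == 0:
        print("round", round_num, "loss =", loss)

print("A =", round(A, 3), "B =", round(B, 3), "C =", round(C, 3))
```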

First we create an "optimizer", which is used to adjust the model parameters so that model outputs more closely match desired outputs specified in the training data.  We'll use a popular algorithm called "Adam".

There is an important number below called "LR", which you might want to fuss with, trying different values.

LR stands for "learning rate".  This specifies how aggressively the optimizer should update model parameters.  To understand LR, imagine you are playing a game of "getting warmer, getting colder".  Suppose you take a step and hear, "You're getting warmer!"  Now, how many more steps should you take in the same direction?  You could be cautious and take just 1 step.  Or you could be bold and take 10.  Going 10 steps might get you to your destination faster, but you might overshoot the target.

The optimizer faces the same challenge as it wanders around (varies model parameters) in search of the target (the best model, the one that most accurately reproduces the training data).  By setting the learning rate, you are advising the optimizer what strategy to use.
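
The same trade-off shows up in a toy problem with a single parameter. This sketch is a made-up example, not part of the training code. It uses plain gradient descent to find the value of w that minimizes (w - 3)^2, and tries three different learning rates.

```python
def descend(lr, steps=20, start=0.0):
    """Gradient descent on loss(w) = (w - 3) ** 2, whose best value is w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)  # slope of the loss at the current w
        w -= lr * gradient      # step downhill, scaled by the learning rate
    return w


for lr in [0.01, 0.4, 1.1]:
    print("lr =", lr, "-> w =", round(descend(lr), 4))
```

With lr = 0.01 the steps are so cautious that w is still far from 3 after 20 steps. With lr = 0.4 it lands essentially on 3. With lr = 1.1 every step overshoots by more than the last, so w runs away from the target. Adam adapts its step sizes as it goes, so its behavior is less stark, but the learning rate still matters.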


```python
# lr is the learning rate described above.
optimizer = optim.Adam(model.parameters(), lr=0.001)
```

For each training example, the model will produce some output.  This output will differ somewhat from the desired output specified in the training data.  We need some way to quantify this difference: how bad is the discrepancy?  PyTorch supports many choices, and you can define your own.

In our concrete example, we will use a simple loss function called "mean squared error" or MSE for short.  Here is how it works.

The loss is:

    (model's actual output - desired output) ^ 2

Here, "^ 2" means squared or multiplied by itself.  So, for example, if the model actually outputs 4, but the desired output is 6, then the MSE loss is (4 - 6) ^ 2 = (-2) ^ 2 = 4.  Usually, the loss is averaged over many training examples, which explains the word "mean" in the name.
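
The same calculation in plain Python is below, as a sketch. With its default `reduction="mean"`, `nn.MSELoss` likewise averages the squared differences over all the outputs it is given.

```python
def mse(actual, desired):
    """Mean squared error: average of (actual - desired) ** 2 over all examples."""
    squared_errors = [(a - d) ** 2 for a, d in zip(actual, desired)]
    return sum(squared_errors) / len(squared_errors)


print(mse([4.0], [6.0]))                      # the single example above: (4 - 6) ** 2 = 4.0
print(mse([4.0, 1.0, 3.0], [6.0, 1.0, 2.0]))  # (4 + 0 + 1) / 3, about 1.667
```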

The MSE loss function says small errors are not bad, but big errors are super bad.  Arguably, this could give too much influence to a few faulty training examples produced by some sensor glitch. So this might be a choice worth reconsidering.
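
To see how much one glitch can matter, this sketch compares MSE with mean absolute error (MAE) on two made-up sets of errors. The sets are identical except for one hypothetical sensor glitch.

```python
def mean_squared(errors):
    return sum(e ** 2 for e in errors) / len(errors)


def mean_absolute(errors):
    return sum(abs(e) for e in errors) / len(errors)


clean = [0.5, -0.5, 0.5, -0.5]    # every prediction off by 0.5
glitchy = [0.5, -0.5, 0.5, 20.0]  # same, except one example corrupted by a glitch

for name, errors in [("clean", clean), ("glitchy", glitchy)]:
    print(name, "MSE =", mean_squared(errors), "MAE =", mean_absolute(errors))
# clean MSE = 0.25 MAE = 0.5
# glitchy MSE = 100.1875 MAE = 5.375
```

The single glitch multiplies the MSE by about 400 but the MAE by only about 11. As a result, an optimizer minimizing MSE works much harder to accommodate that one bad example. PyTorch provides `nn.L1Loss` (mean absolute error) and `nn.HuberLoss` (squared for small errors, linear for large ones) as drop-in alternatives to `nn.MSELoss()`. Whether either helps depends on your data.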

```python
loss_function = nn.MSELoss()
```

The actual machine learning happens in this next bit!  This is called the training loop.  We'll repeatedly push training examples forward through the model, compute the loss (discrepancy between actual model output and desired model output), and then work backward through the model, updating the model parameters in hopes of reducing the loss next time.

You might try changing the number of times we go through the loop.  For example, you could try a higher learning rate with fewer rounds, or vice versa.

```python
for round_num in range(20000):  # "round_num" avoids shadowing Python's built-in round()

    # Clear the gradients left over from the previous round.  PyTorch adds
    # new gradients to old ones, so they must be reset before each pass.
    optimizer.zero_grad()

    # Compute the actual model output for each training example.
    # When people describe this process on paper, they often use the
    # variable y with a little hat (^) on top.
    yhats = model(xs)

    # Use the loss function to quantify the discrepancy between the actual
    # model outputs (yhats) and the desired outputs specified in the
    # training data (ys).
    loss = loss_function(yhats, ys)

    # Work backward through the model, determining the best direction
    # to modify each model parameter to reduce the loss, e.g. increase
    # this variable a little, decrease that variable by twice as much, etc.
    # This is called a "gradient".
    loss.backward()

    # Tell the optimizer to actually change the model parameters in hopes
    # of reducing the loss.  The magnitude of these changes is affected
    # by the learning rate and details of the optimization algorithm.
    optimizer.step()

    # Print the loss every now and then so we can track progress in
    # training the model.  Machine learning engineers spend a ridiculous
    # amount of time and emotional energy watching loss numbers drop
    # (and sometimes rise!) over time.  Welcome to the party!
    if round_num % 1000 == 0:
        print("round", round_num, "loss =", loss.item())
```

```text
round 0 loss = 31.97720718383789
round 1000 loss = 0.23800192773342133
round 2000 loss = 0.08449573069810867
round 3000 loss = 0.016991127282381058
round 4000 loss = 0.004862884525209665
round 5000 loss = 0.004318441264331341
round 6000 loss = 0.004315678961575031
round 7000 loss = 0.004315678961575031
round 8000 loss = 0.004315678961575031
round 9000 loss = 0.004315678961575031
round 10000 loss = 0.004315678961575031
round 11000 loss = 0.004315678961575031
round 12000 loss = 0.004315678961575031
round 13000 loss = 0.004315839149057865
round 14000 loss = 0.004315678495913744
round 15000 loss = 0.004316328093409538
round 16000 loss = 0.0043157050386071205
round 17000 loss = 0.004315678495913744
round 18000 loss = 0.004315678961575031
round 19000 loss = 0.004315714351832867
```
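
In the log above, the loss stops improving at around round 6000, and the remaining rounds accomplish little. One common option is to stop once the loss stops improving. The sketch below runs such a check on the logged values. The function name `first_plateau_round`, the window size, and the tolerance are arbitrary, hypothetical choices. In the real loop you would track `loss.item()` values and `break` out when the check fires.

```python
def first_plateau_round(losses, log_every=1000, window=3, tolerance=1e-5):
    """Return the first logged round at which the last `window` logged losses
    all lie within `tolerance` of one another, or None if that never happens."""
    for i in range(window, len(losses) + 1):
        recent = losses[i - window:i]
        if max(recent) - min(recent) < tolerance:
            return (i - 1) * log_every
    return None


# The first eight losses from the log above (rounds 0 through 7000).
logged = [31.97720718383789, 0.23800192773342133, 0.08449573069810867,
          0.016991127282381058, 0.004862884525209665, 0.004318441264331341,
          0.004315678961575031, 0.004315678961575031]

print("loss plateaued by round", first_plateau_round(logged))
```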


## Output and Use the Model

All the hard work is done!  Now we need to display the model on the screen or store the model to a file.  Here, we'll just print out the parameters. The interpretation of these parameters depends on the model architecture you defined above.

```python
for name, parameters in model.named_parameters():
    print(name, parameters)
```

```text
linear1.weight Parameter containing:
tensor([[ 0.0056, -0.0053]], requires_grad=True)
linear1.bias Parameter containing:
tensor([-0.0047], requires_grad=True)
```



For the concrete example, the output is something like:

    linear1.weight Parameter containing:
    tensor([[6.9722, 2.9863]], requires_grad=True)
    linear1.bias Parameter containing:
    tensor([4.0468], requires_grad=True)

This means that for input x1 and x2, the model outputs the value:

    6.9722 * x1 + 2.9863 * x2 + 4.0468
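
As a worked check, plugging x1 = 1 and x2 = 1 into this formula gives 6.9722 + 2.9863 + 4.0468 = 14.0053. Because the model is just this formula, a program could also evaluate it directly in plain Python without PyTorch. The function name `trained_model` is hypothetical. The numbers are copied from the concrete example's output.

```python
def trained_model(x1, x2):
    """Evaluate the learned linear formula (values from the concrete example)."""
    A, B, C = 6.9722, 2.9863, 4.0468  # weights A and B, bias C
    return A * x1 + B * x2 + C


print(trained_model(1.0, 1.0))
```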

The example training data was generated using this formula:

    output = 6 * x1 + 3 * x2 + 4 + random noise
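
To try the notebook without real robot data, you could generate a similar training file yourself. This is a minimal sketch built around the formula above. The input range (0 to 2), the noise (Gaussian, standard deviation 0.5), the row count, and the helper name `make_rows` are all hypothetical, because the original does not say how its noise was produced.

```python
import csv
import random


def make_rows(n, seed=0):
    """Generate n rows of [x1, x2, output] with output = 6*x1 + 3*x2 + 4 + noise."""
    rng = random.Random(seed)  # seeded so the same data comes out every time
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0.0, 2.0)
        x2 = rng.uniform(0.0, 2.0)
        output = 6 * x1 + 3 * x2 + 4 + rng.gauss(0.0, 0.5)  # hypothetical noise
        rows.append([round(x1, 3), round(x2, 3), round(output, 3)])
    return rows


if __name__ == "__main__":
    # Write 200 rows in the same "x1,x2,output" layout the notebook reads.
    with open("example.csv", "w", newline="") as f:
        csv.writer(f).writerows(make_rows(200))
```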

So the model has successfully rediscovered the principle used to produce the training data, despite the addition of random noise to make it harder.

This is a simple model, represented by a simple expression. For a more complex model, rather than inferring the underlying expression and plugging values into it, we can simply ask the model to compute the output:

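```python
# Set the model to evaluation mode.  This matters for layers such as
# dropout; for our single linear layer it is harmless but good practice.
model.eval()

# Create some sample input data: one example with x1 = 1.0 and x2 = 1.0
sample_input = torch.tensor([1.0, 1.0])

# Make predictions.  torch.no_grad() turns off gradient tracking, which is
# only needed for training, so prediction is faster and uses less memory.
with torch.no_grad():
    predictions = model(sample_input)

print("Prediction:", predictions.item())
```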

```text
Prediction: -0.004366518463939428
```


The results of our training (the parameter values) are stored in memory as part of the running program.  When this program ends, we lose that memory.  To save the parameters for future use, such as on the actual robot, we can save the parameters to a file:

```python
torch.save(model.state_dict(), "controlbot.pth")
```

Then, during a separate program execution, we can load the parameters back into the model and run predictions as before:

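```python
# In a separate program, first define the Model class as above and create a
# fresh model with the same architecture, then load the saved parameters.
model = Model()
model.load_state_dict(torch.load("controlbot.pth", weights_only=True))
```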

```text
<All keys matched successfully>
```

## That's all!

Hopefully, you can use machine learning to discover physical principles at play in FRC robotics, despite noisy sensors, motors, and occasionally crashing into stuff!

Good luck!