## Supervised Learning Training Examples

Most practical machine learning uses supervised learning, which learns the relationship (a mapping function) between existing input and output data. The goal is to use the learned relationship to predict the output for new input data.

### Model Training Flow

Training a model involves the following fundamental steps. The model is defined once, and the remaining steps are repeated in a loop:

* Define model.
* Prepare input and target (label) data in a format that can be consumed by the model.
* Run the data through the computations defined by the model.
* Get the prediction (output).
* Compute loss by comparing prediction to target.
* Minimize loss by using an optimization algorithm to adjust the learned variables (weights, biases, ...).

![](img/trainingFlow.png)
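
The steps above can be sketched as a short PyTorch loop. This is only a minimal illustration, not the example developed later in this section: the toy data (`y = 3x + 0.5`), the learning rate, the step count, and the variable names are all assumptions chosen for the sketch.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Step 1: define the model (a single linear layer, assumed for illustration).
model = torch.nn.Linear(1, 1)

# Step 2: prepare input and target data as tensors.
x = torch.randn(32, 1)
y = 3.0 * x + 0.5  # assumed toy relationship

loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    # Steps 3 and 4: run the data through the model to get the prediction.
    y_pred = model(x)

    # Step 5: compute the loss by comparing prediction to target.
    loss = loss_fn(y_pred, y)

    # Step 6: adjust the learned variables to reduce the loss.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses.append(loss.item())

print("first loss:", losses[0], "last loss:", losses[-1])
```

Only the forward pass, the loss computation, and the parameter update sit inside the loop. The model definition and data preparation happen once, before it.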

### Simple Linear Regression

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:

* One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
* The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Linear regression uses a linear equation of the form:

$$
Y = WX + b
$$

Where:
* **X**: input data
* **Y**: predicted data
* **W**: weight to be learned during training
* **b**: bias to be learned

For simplicity, in our linear regression example below, we will ignore the bias term.  So the relationship between our data (X, Y) is just Y = WX. We will try to determine (learn) the value of W.

We will create our data by letting X be a tensor of random values and setting Y to twice X, so the value of W to be learned is 2.

To begin with, all training math operations will be performed manually. These include: gradient and loss calculations, weight adjustment, etc.
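
The manual gradient used in this example follows from the chain rule. As a short derivation, assuming the loss is the sum of squared errors over the $N$ samples (the subscript $i$, the hat notation for predictions, and the learning-rate symbol $\eta$ are notation introduced here):

$$
\hat{Y} = XW, \qquad L(W) = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
$$

$$
\frac{\partial L}{\partial \hat{y}_i} = 2(\hat{y}_i - y_i), \qquad
\frac{\partial L}{\partial W} = \sum_{i=1}^{N} x_i \cdot 2(\hat{y}_i - y_i) = X^{\top} \, 2(\hat{Y} - Y)
$$

$$
W \leftarrow W - \eta \, \frac{\partial L}{\partial W}
$$

In the code, these correspond to `grad_y_pred`, `grad_w`, and the update `w -= learning_rate * grad_w`.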

In [None]:
# Linear Regression
import torch

# N is batch size; D_in is input dimension;
# D_out is output dimension.
N, D_in, D_out = 64, 1, 1

# Prepare sample data
x = torch.randn(N, D_in)
y = 2 * x  # Y is double X, so the weight to learn is 2

# Randomly initialize weights
w = torch.randn(D_in, D_out)
print("Before learning w=", w, w.size())

learning_rate = 1e-4
for t in range(1000):    
    # Forward pass: compute predicted y
    y_pred = x.mm(w)
    
    # Compute and print loss; loss is a scalar stored in a PyTorch tensor
    # of shape (); loss.item() returns its value as a Python number
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 0:
        print(t, " loss=", loss.item(), " weight=", w.item())

    # Backprop: compute the gradient of the loss with respect to w
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w = x.t().mm(grad_y_pred)
  
    # Update weights using gradient descent
    w -= learning_rate * grad_w
      
print("After learning weight=", w)        
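
One way to gain confidence in the hand-written gradient is to compare it with the gradient that PyTorch's autograd computes for the same loss. The sketch below is an optional check, not part of the original example. It assumes the same data shapes and sum-of-squares loss as above, and the name `manual_grad_w` and the comparison tolerances are choices made here.

```python
import torch

torch.manual_seed(0)
N, D_in, D_out = 64, 1, 1

x = torch.randn(N, D_in)
y = 2 * x

# Track w with autograd so that loss.backward() fills in w.grad.
w = torch.randn(D_in, D_out, requires_grad=True)

# Same forward pass and loss as in the manual example.
y_pred = x.mm(w)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual gradient, using the same formulas as the training loop above.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual_grad_w = x.t().mm(grad_y_pred)

print("autograd gradient:", w.grad)
print("manual gradient:  ", manual_grad_w)
print("match:", torch.allclose(w.grad, manual_grad_w, rtol=1e-4, atol=1e-5))
```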

### Simple Linear Regression using PyTorch

Next, we will use functions that PyTorch provides to replace all of the training math operations that were performed manually in the previous example.

* *Define Model*

  The PyTorch `nn` package defines a set of Modules. A Module can be thought of as a neural network layer that produces output from input and may have some trainable weights. In the following example, we will use the `torch.nn.Linear` module.


* *Calculate loss*

  Although the loss and its gradients can be calculated manually, the operations are tedious and error-prone, especially with complex neural networks. In the following example, we will replace the manual loss calculation with a pre-defined PyTorch loss function and let `loss.backward()` compute the gradients.


* *Adjust learning variable(s)*

  PyTorch includes a number of optimization algorithms for adjusting trainable parameters. In the next example, we will use one of those optimizers and call its `.step()` function to adjust the weights.
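
To connect these three pieces to the manual version, the sketch below checks that each one performs the same arithmetic as the hand-written code, for a single update step. It is an illustration under assumptions: a tiny batch of 4 samples, a learning rate of `1e-2`, plain SGD without momentum, and a PyTorch version that accepts `reduction='sum'` (0.4.1 or later). The names `same_forward`, `same_loss`, and `same_update` are introduced here.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 1)
y = 2 * x

# Define model: a linear layer without bias, so it computes x @ W^T.
model = torch.nn.Linear(1, 1, bias=False)
w = model.weight.detach().clone()  # shape (out_features, in_features)

y_pred = model(x)
same_forward = torch.allclose(y_pred, x.mm(w.t()))

# Calculate loss: MSELoss with reduction='sum' is the sum of squared errors.
loss_fn = torch.nn.MSELoss(reduction='sum')
loss = loss_fn(y_pred, y)
same_loss = torch.allclose(loss, (y_pred - y).pow(2).sum())

# Adjust learning variables: one plain SGD step is w - lr * grad.
lr = 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
optimizer.zero_grad()
loss.backward()
grad = model.weight.grad.clone()
optimizer.step()
same_update = torch.allclose(model.weight.detach(), w - lr * grad)

print(same_forward, same_loss, same_update)
```

Note that `torch.nn.Linear` stores its weight with shape `(out_features, in_features)`, which is the transpose of the `(D_in, D_out)` weight used in the manual example.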


In [None]:
# Linear regression using torch.nn.Linear
import torch

# N is batch size; D_in is input dimension;
# D_out is output dimension.
N, D_in, D_out = 64, 1, 1

# Prepare data
x = torch.randn(N, D_in)
y = 2*x

# Use PyTorch pre-defined loss function
loss_fn = torch.nn.MSELoss(reduction='sum')       # for PyTorch 0.4.1 and later
# loss_fn = torch.nn.MSELoss(size_average=False)  # for PyTorch 0.4.0 (argument since deprecated)

# Linear model
model = torch.nn.Linear(D_in, D_out, bias=False)

# Collect the initial weights for printing
w = [p.data for p in model.parameters()]
print("Before learning w=", w)

learning_rate = 1e-4
# Use PyTorch pre-defined optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for t in range(1000):
    
    # Forward pass: compute predicted y
    y_pred = model(x)
    
    # Compute and print loss; loss is a scalar stored in a PyTorch tensor
    # of shape (); loss.item() returns its value as a Python number.
    loss = loss_fn(y_pred, y)

    if t % 100 == 0:
        w = [p.data for p in model.parameters()]
        print(t, " loss=", loss.item(), " weight=", w)
 
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
for p in model.parameters():
    print("parameter:",p.data.size(), p.data)
       
# Predict the output for new input data with the trained model
x = torch.randn(2, D_in)
print('Input', x)
print('Predict', model(x))

## References

* https://github.com/jcjohnson/pytorch-examples
* https://machinelearningmastery.com/linear-regression-for-machine-learning/
* https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
* https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.htm