# Notes on Lesson 4


<b>Tensors</b>

  <u>Tensors</u> are like arrays. PyTorch tensors are better suited to deep learning than NumPy arrays because they can run on GPUs, which makes large computations much faster, and because they support automatic gradient calculation.
  
  
  <u>Rank</u> is the number of axes or dimensions in a tensor.  Types of tensors:
  
    - rank zero: scalar
    - rank one: vector
    - rank two: matrix

  
  <u>Shape</u> is the size of each axis of a tensor.
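
As a small illustration of rank and shape, the sketch below builds one tensor of each rank listed above. It uses `torch.tensor` directly rather than the fastai `tensor` helper used later in these notes, and the variable names are only illustrative.

```python
import torch

scalar = torch.tensor(7)                        # rank 0
vector = torch.tensor([1, 2, 3])                # rank 1
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])   # rank 2

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # t.ndim is the rank; t.shape lists the size of each axis
    print(name, "rank:", t.ndim, "shape:", tuple(t.shape))
```

The rank is always the length of the shape, so the matrix here has rank 2 and shape `(2, 3)`.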
  

In [2]:
#hide
from fastai2.vision.all import *
from utils import *

In [3]:
data = [[1,2,3],[4,5,6]]
arr = array(data)
tns = tensor(data)

In [4]:
tns  # pytorch

tensor([[1, 2, 3],
        [4, 5, 6]])

In [5]:
tns[0]  # select the first row of the tensor

tensor([1, 2, 3])

In [6]:
tns[:,2]  # select the third column (index 2 along the second axis)

tensor([3, 6])

In [7]:
tns+5 # you can perform computations easily on tensors

tensor([[ 6,  7,  8],
        [ 9, 10, 11]])

We need to see how good our model is, so we calculate a metric on a validation data set.  We can do this by writing a function that computes the mean distance between each validation image and the image we are comparing it against, and then derives the error rate from those distances.
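
A minimal sketch of such a metric is shown below. It assumes tiny made-up 2x2 "images" and two hypothetical reference images, `ideal_a` and `ideal_b`; the helper names `mean_abs_distance` and `is_a` are illustrative and do not come from these notes.

```python
import torch

def mean_abs_distance(a, b):
    # Mean absolute difference over the last two axes (the pixels of each image)
    return (a - b).abs().mean((-1, -2))

def is_a(x, ideal_a, ideal_b):
    # Predict class "a" when x is closer to ideal_a than to ideal_b
    return mean_abs_distance(x, ideal_a) < mean_abs_distance(x, ideal_b)

# Hypothetical reference images for two classes (2x2 pixels each)
ideal_a = torch.zeros(2, 2)
ideal_b = torch.ones(2, 2)

# Hypothetical validation images that all truly belong to class "a"
valid_a = torch.tensor([[[0.1, 0.0], [0.2, 0.1]],
                        [[0.9, 0.8], [1.0, 0.7]],   # looks like class "b": a miss
                        [[0.0, 0.3], [0.1, 0.2]],
                        [[0.2, 0.1], [0.0, 0.4]]])

accuracy = is_a(valid_a, ideal_a, ideal_b).float().mean()
error_rate = 1 - accuracy
print(accuracy.item(), error_rate.item())
```

Three of the four validation images are closer to `ideal_a`, so the accuracy is 0.75 and the error rate is 0.25.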

Broadcasting is a feature where PyTorch automatically expands the tensor with the smaller rank to have the same shape as the one with the larger rank, so that the two can be combined element by element.
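
A short sketch of broadcasting, with made-up values: adding a rank-1 tensor to a rank-2 tensor behaves as if the vector had been copied onto every row of the matrix.

```python
import torch

matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])   # shape (2, 3), rank 2
vector = torch.tensor([10, 20, 30])             # shape (3,),   rank 1

# The vector is treated as if it were repeated for each row of the matrix
result = matrix + vector
print(result)
print(result.shape)
```

The result has the shape of the larger tensor, `(2, 3)`, and no explicit loop over the rows is needed.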




## Stochastic Gradient Descent (SGD)

Here are the 7 steps of SGD to turn this function into a machine learning classifier:

1.  **Initialize** the weights: we need to initialize to random values
2.  For each image, use these weights to predict whether it appears to be a three or a seven
3.  **Loss**: Based on these predictions, calculate how good the model is (its loss).  We can see if we need to adjust the weights of a model.
4.  Calculate the **gradient**, which measures, for each weight, how changing that weight would change the loss.
    The gradient only tells us the **slope** of our function; it doesn't tell us exactly how far to adjust the parameters. But it gives us some idea of how far: if the slope is very large, that may suggest we have more adjustments to do, whereas if the slope is very small, that may suggest we are close to the optimal value.
5.  **Step** (that is, change) all weights based on that calculation.  You can try increasing and decreasing the weights slightly to see how the loss changes.  The **learning rate (LR)** is a small number that the gradient is multiplied by to decide how far to move each weight. Rather than computing the gradient over the whole data set, we can compute it over mini-batches (the number of items in each is the batch size), so that each step is more efficient and training doesn't take as long.  A `DataLoader` can take any Python collection and turn it into an iterator over many batches.

        w -= gradient(w) * lr
   

6.  Go back to the second step, and repeat the process
7.  ...until you decide to **stop** the training process (for instance because the model is good enough, or you don't want to wait any longer).  You would typically stop when the accuracy of the model starts to get worse.
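
The seven steps above can be sketched on a toy problem. This is only an illustration under stated assumptions: the data (`y = 2x + 1`), the linear `model`, the `mse` loss, the learning rate and the fixed number of iterations are all made up, and for brevity the gradient is computed over the whole data set in each step rather than over mini-batches.

```python
import torch

torch.manual_seed(0)

# Hypothetical data generated from y = 2x + 1
x = torch.linspace(-1, 1, 20)
y = 2 * x + 1

# Step 1: initialise the weights randomly and ask PyTorch to track gradients
params = torch.randn(2).requires_grad_()
lr = 0.1

def model(x, params):
    w, b = params
    return w * x + b

def mse(pred, target):
    return ((pred - target) ** 2).mean()

losses = []
for epoch in range(200):             # steps 6 and 7: repeat, then stop
    pred = model(x, params)          # step 2: predict
    loss = mse(pred, y)              # step 3: loss
    loss.backward()                  # step 4: gradient
    with torch.no_grad():
        params -= params.grad * lr   # step 5: step
        params.grad.zero_()          # clear gradients before the next pass
    losses.append(loss.item())

print(params.detach(), losses[0], losses[-1])
```

After training, `params` should be close to the slope 2 and intercept 1 used to generate the data, and the last loss should be far smaller than the first.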

**Sigmoid** is a function that squashes any input number into a value between 0 and 1. It is applied to the model's predictions so that the loss is always calculated on values in that range.
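
A minimal sketch of sigmoid, written out by hand and checked against PyTorch's built-in `torch.sigmoid`; the input values are arbitrary.

```python
import torch

def sigmoid(x):
    # 1 / (1 + e^(-x)) maps any real number into the open interval (0, 1)
    return 1 / (1 + torch.exp(-x))

raw = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])
squashed = sigmoid(raw)
print(squashed)
```

Large negative inputs end up close to 0, large positive inputs close to 1, and an input of 0 maps to exactly 0.5.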

**Remember**

- *activations*: numbers that are calculated (both by linear and non-linear layers)
- *parameters*: numbers that are randomly initialised, and optimised (that is, the numbers that define the model)

```python
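# Assumes dl, model, loss_func, lr and parameters (an iterable of tensors
# with requires_grad=True) are already defined.
for x,y in dl:
    pred = model(x)
    loss = loss_func(pred, y)
    loss.backward()               # compute the gradients
    for p in parameters:
        p.data -= p.grad * lr     # step each parameter
        p.grad.zero_()            # reset gradients for the next batch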
```
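
The `dl` that the loop above iterates over yields one mini-batch at a time. As a rough sketch of how a collection becomes batches, the example below uses PyTorch's own `torch.utils.data.DataLoader` (an assumption; these notes import fastai, whose `DataLoader` behaves similarly for this purpose) on a made-up list of 15 integers.

```python
from torch.utils.data import DataLoader

collection = list(range(15))           # any Python collection with a length
dl = DataLoader(collection, batch_size=5, shuffle=True)

batches = list(dl)                     # three batches of five items each
for batch in batches:
    print(batch)
```

With `shuffle=True` the items are drawn in a different random order on each pass, but every item still appears exactly once per pass.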

## Optimizing the Model

- Create a function that can solve any problem to any level of accuracy (the neural network) given the correct set of parameters

- Find the best set of parameters for any function (stochastic gradient descent)

- `learner.fit`:  We can create a **learner** by passing it the data loader collection (which includes the validation data set), together with the model, the optimizer, the loss function and the metrics.

        learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,
                        loss_func=mnist_loss, metrics=batch_accuracy)
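
The names `mnist_loss` and `batch_accuracy` are passed to the `Learner` but are not defined in these notes. As an assumption, the sketch below shows one plausible way to write them for a two-class problem where the targets are 1 or 0; the sample activations and targets are made up.

```python
import torch

def mnist_loss(predictions, targets):
    # Squash raw activations into (0, 1), then measure how far each
    # prediction is from its target (1 or 0) and average the result
    predictions = predictions.sigmoid()
    return torch.where(targets == 1, 1 - predictions, predictions).mean()

def batch_accuracy(xb, yb):
    # A prediction counts as correct when it falls on the target's side of 0.5
    preds = xb.sigmoid()
    correct = (preds > 0.5) == yb.bool()
    return correct.float().mean()

# Hypothetical raw model outputs and their targets
acts = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])
targets = torch.tensor([[1], [0], [0], [0]])

loss = mnist_loss(acts, targets)
acc = batch_accuracy(acts, targets)
print(loss.item(), acc.item())
```

The loss changes smoothly as the activations change, which is what gradient descent needs, while the accuracy only changes when a prediction crosses 0.5, which makes it easier to read as a metric.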