## The Loss Function

In the last two exercises, we predicted the rent of some apartments by feeding them through neural networks. The networks were untrained, so the predictions weren’t very good.

Let’s start improving those predictions!

To make better predictions, we first need to know how bad our current predictions are and how to track improvement as we train our model.

This is the role of the **loss function**. The loss function is a mathematical formula used to measure the error (also known as loss values) between the model predictions and the actual target values (sometimes called labels) the model is trying to predict.

### Difference

Suppose we run a neural network on an apartment with $1000/mo rent, but the network predicts $500/mo. The simplest way to measure the loss in this case is to calculate the difference between the prediction and the actual rent:

$$
500 - 1000 = -500
$$

The difference indicates that the prediction was `500` dollars below the actual rent.

For just one data point, the difference seems reasonable. But imagine we have a second apartment with rent $1500, and the model **overestimates** the rent as $2000. The difference in this case is:

$$
2000 - 1500 = 500
$$

With one loss of `500` and another of `-500`, the average loss for the model is actually `0`! But that doesn’t make sense, since this network isn’t perfectly accurate.

The problem is that the negative loss cancelled out the positive loss. To fix this, we need to force the difference to always be positive.
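To see the cancellation concretely, here is a minimal plain-Python sketch using the two example apartments. The variable names are illustrative choices, not part of the lesson's code.

```python
# Predicted and actual rents for the two example apartments
predictions = [500, 2000]
targets = [1000, 1500]

# Signed differences: prediction minus target
differences = [p - t for p, t in zip(predictions, targets)]

# Averaging the signed differences lets the errors cancel out
mean_difference = sum(differences) / len(differences)

print(differences)      # [-500, 500]
print(mean_difference)  # 0.0
```

Both predictions are off by 500 dollars, yet the average signed difference reports no error at all.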

### Mean Squared Error

One of the most common loss functions is Mean Squared Error (MSE). MSE makes differences positive by squaring them. To calculate MSE on our two example apartments, we would:

- calculate the differences: `-500`, `500`
- square both: `(-500)^2`, `500^2`
- take the average:
$$
\frac{(500 - 1000)^2 + (2000 - 1500)^2}{2} = 250{,}000
$$

A loss of `250,000` seems very high, but remember that we’ve squared all the individual differences. To help interpret MSE, we’ll sometimes take its square root (known as the Root Mean Squared Error, or RMSE), which puts the loss back in the original units of dollars:

$$
\sqrt{250000} = 500
$$

An average loss of `500` makes a lot of sense in this case, since both predictions were off by exactly `500` dollars!
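The same arithmetic can be sketched in plain Python. The helper name `mean_squared_error` below is a hypothetical choice for illustration, not a library function.

```python
import math

# Predicted and actual rents for the two example apartments
predictions = [500, 2000]
targets = [1000, 1500]

def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)

mse = mean_squared_error(predictions, targets)
rmse = math.sqrt(mse)  # square root brings the loss back to dollars

print(mse)   # 250000.0
print(rmse)  # 500.0
```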

### MSE in PyTorch

PyTorch has already implemented most of the common loss functions. To use PyTorch’s implementation of MSE, we’d run:

```python
from torch import nn
loss = nn.MSELoss()
```

Now that we’ve instantiated `loss`, we can compute the mean squared error by passing two inputs:
- the predicted values
- the actual target values

Just as we use `X` to stand for input features in a neural network, it is common in machine learning to use the variable `y` to stand for the target values. In this case, our target isn’t two-dimensional, so we use lowercase `y`.

Let’s calculate MSE for our two example apartments:

```python
import torch

predictions = torch.tensor([500, 2000], dtype=torch.float)
y = torch.tensor([1000, 1500], dtype=torch.float)
print(loss(predictions, y))  # tensor(250000.)
```

### Choosing a Loss Function

The loss function plays a key role in training, so it is important to select the right one. Sometimes it is worth experimenting with a few different loss functions, to see how each behaves.

For example, the squaring process in MSE emphasizes the largest differences between predicted and target values. Sometimes this is helpful, but in other cases it lets a few extreme data points dominate the loss, which can lead to overfitting. In those cases, instead of *squaring* differences we might choose to take the *absolute value* to produce positive values (this is called the **Mean Absolute Error**, or MAE).
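As a rough illustration of how the two losses react to one large miss, here is a plain-Python sketch on made-up data. The four rents and predictions are hypothetical, and the helper names are illustrative. In PyTorch, the analogous built-in for Mean Absolute Error is `nn.L1Loss`.

```python
# Hypothetical data: three close predictions and one large miss
targets = [1000, 1500, 2000, 2500]
predictions = [1100, 1400, 2100, 4500]

def mean_absolute_error(predictions, targets):
    """Average of the absolute differences."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def mean_squared_error(predictions, targets):
    """Average of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

mae = mean_absolute_error(predictions, targets)
mse = mean_squared_error(predictions, targets)

print(mae)  # 575.0
print(mse)  # 1007500.0
```

The single 2000-dollar miss contributes 4,000,000 of the 4,030,000 total squared error, so it almost entirely determines the MSE. Under MAE it contributes 2000 of the 2300 total, which is still the largest share but leaves more weight on the other three apartments.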