# What is a loss function?

A loss function measures the deviation between the predicted value and the actual value.

Let's dive deeper...
$$
x: \; input \\
y: \; expected\,output \\
\hat{y}: \; current\,predicted\,output
$$


When Δ is 0, our prediction is accurate.
$$
\Delta = y - \hat{y}\\
$$

But how do we get there? What can we change in the equation above?
$$
foo(x) \; \rightarrow \; \hat{y} \\
\Delta = y - foo(x)\\
$$

We can change the params in the `foo(x)` function!

```
def foo(x, params):
    # Linear model: params[0] is the slope, params[1] is the intercept
    return x * params[0] + params[1]
```
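To see how the parameters drive Δ, here is a minimal Python sketch. The data point (`x = 2`, `y = 200`) and both parameter guesses are made-up values chosen only for illustration.

```python
def foo(x, params):
    # Linear prediction: slope params[0], intercept params[1]
    return x * params[0] + params[1]

x, y = 2, 200  # hypothetical data point, not taken from the text above

delta_a = y - foo(x, (10, 100))  # a poor guess for the params
delta_b = y - foo(x, (50, 100))  # a better guess for the params

print(delta_a)  # 80
print(delta_b)  # 0
```

Only `params` changed between the two calls, yet Δ went from 80 to 0.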

#### What does this mean in terms of machine learning?

$$
\Delta = Loss\,Function
$$
___
**Exercise 1**: Given the following data points, find the param values and write an algorithm to achieve a loss of 0.

```
data = [(0, 100), (1, 150), (2, 200), (3, 250)]
params = (50, 100)

def pred(x, params):
    p0, p1 = params
    return p0 * x + p1

def loss(data, pred, params):
    total = 0
    for x, y in data:
        # Why square? It keeps the loss in the range 0 to ∞,
        # so errors of opposite sign cannot cancel each other out.
        total += (y - pred(x, params)) ** 2
    return total

print(loss(data, pred, params))  # 0
```
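One possible answer to the "write an algorithm" part is a brute-force grid search. This is only a sketch: the helper `grid_search` and the candidate range (multiples of 10 from 0 to 200) are assumptions made for illustration, and the search only succeeds because the true values happen to lie on that grid.

```python
data = [(0, 100), (1, 150), (2, 200), (3, 250)]

def pred(x, params):
    p0, p1 = params
    return p0 * x + p1

def loss(data, pred, params):
    # Sum of squared differences between expected and predicted outputs
    return sum((y - pred(x, params)) ** 2 for x, y in data)

def grid_search(data, candidates):
    # Hypothetical helper: try every (p0, p1) pair, keep the smallest loss
    best_params, best_loss = None, float("inf")
    for p0 in candidates:
        for p1 in candidates:
            current = loss(data, pred, (p0, p1))
            if current < best_loss:
                best_params, best_loss = (p0, p1), current
    return best_params, best_loss

# Assumed search range: multiples of 10 from 0 to 200
best_params, best_loss = grid_search(data, range(0, 201, 10))
print(best_params, best_loss)  # (50, 100) 0
```

Trying every combination gets expensive quickly as the number of params or the resolution of the grid grows.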

___

#### What does this mean in terms of math?
$$
X: \; A\,set\,of\,input,\; x_{i} \\
Y: \; A\,set\,of\,expected\,output,\; y_{i} \\
P_{0}: \; params[0] \\
P_{1}: \; params[1] \\
$$

> *The subscript i is the index into the array*

$$
\hat{y}_{i} = P_{0}x_{i}+P_{1} \\
L = \sum_{i=0}^{3} (P_{0}x_{i} + P_{1} - y_{i})^{2}
$$
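
As a worked example, assuming the four data points from Exercise 1, the sum has one term per point:

$$
L = (P_{1} - 100)^{2} + (P_{0} + P_{1} - 150)^{2} + (2P_{0} + P_{1} - 200)^{2} + (3P_{0} + P_{1} - 250)^{2}
$$

With $P_{0} = 50$ and $P_{1} = 100$ every term is zero, so $L = 0$.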

Loss as a function:
$$
L (P_{0},P_{1})
$$

To minimize L we need to find the right params values. How do we solve this? Derivatives!



# How to take a derivative?

## Power rule
$$
\frac{\partial}{\partial x} x^{n} = nx^{n-1}
$$

Example:
$$
f(x) = 3x^{2} \\
\frac{\partial f}{\partial x} = 6x
$$
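
A quick way to sanity-check a hand-computed derivative is to compare it with a finite-difference estimate. The sketch below does this for the example above. The helper `numeric_derivative`, the step size `h`, and the test point `x = 2.0` are illustrative choices, not part of the original material.

```python
def f(x):
    return 3 * x ** 2

def f_prime(x):
    # Power rule applied by hand: the derivative of 3x^2 is 6x
    return 6 * x

def numeric_derivative(func, x, h=1e-6):
    # Central difference approximation of the derivative at x
    return (func(x + h) - func(x - h)) / (2 * h)

x = 2.0
analytic = f_prime(x)
numeric = numeric_derivative(f, x)

print(analytic)                        # 12.0
print(abs(analytic - numeric) < 1e-4)  # True
```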


## Chain rule
$$
f(g(x)) \\
\frac{\partial f}{\partial x} = \frac{\partial g}{\partial x} \times  \frac{\partial f}{\partial g}
$$

Example:
$$
g(x)= 3x^{3} \\
f(g)=g^2 \\
\frac{\partial f}{\partial x} = \frac{\partial g}{\partial x} \times \frac{\partial f}{\partial g} = 9x^2 \times 2g \\
= 9x^2 \times 6x^3 \\
= 54x^5
$$
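
Both rules can be combined on the loss $L(P_{0}, P_{1})$ defined earlier. Each term has the form $g^{2}$ with $g = P_{0}x_{i} + P_{1} - y_{i}$, so the outer derivative is $2g$, and the inner derivative is $x_{i}$ with respect to $P_{0}$ and $1$ with respect to $P_{1}$. The sketch below assumes the data from Exercise 1. The function name `gradient` is hypothetical.

```python
data = [(0, 100), (1, 150), (2, 200), (3, 250)]

def loss(params):
    p0, p1 = params
    return sum((p0 * x + p1 - y) ** 2 for x, y in data)

def gradient(params):
    # Chain rule per term: 2 * g * (derivative of g), with g = p0*x + p1 - y
    p0, p1 = params
    d_p0 = sum(2 * (p0 * x + p1 - y) * x for x, y in data)  # dg/dp0 = x
    d_p1 = sum(2 * (p0 * x + p1 - y) for x, y in data)      # dg/dp1 = 1
    return d_p0, d_p1

print(gradient((50, 100)))  # (0, 0)
print(gradient((40, 100)))  # (-280, -120)
```

Both partial derivatives are zero at `(50, 100)`, the params that give a loss of 0. At `(40, 100)` they are negative, which suggests that increasing the params would lower the loss.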
