# Deriving parameter estimates from first principles

In this problem, we are going to move away from using inbuilt methods for fitting regression models and, instead, write our own methods for doing this. We are then going to investigate how different loss functions result in different parameter estimates.

This problem is optional as it is more mathematically involved. If you don't feel like doing this maths, you can skip ahead to the next section.

We are first going to estimate the parameters using the "least-squares" method. This method selects values for $(a,b)$ by minimising the sum of squared errors (SSE) between model predictions and observations:

\begin{equation}
\text{SSE} = \sum_{i=1}^{n} \left(\text{price}_i - (a + b \times \text{points}_i)\right)^2
\end{equation}

Write a function which takes a single two-element vector argument (`params`, with `params[0] = a` and `params[1] = b`) and returns the SSE.
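As a starting point, here is a minimal sketch of what such a function might look like. It assumes the data are held in two NumPy arrays named `points` and `price`; the values below are made-up stand-ins, not the real dataset, so swap in your own columns.

```python
import numpy as np

# Hypothetical stand-in data; replace with the columns from your own dataset.
points = np.array([85.0, 88.0, 90.0, 92.0, 95.0])
price = np.array([15.0, 22.0, 35.0, 50.0, 90.0])


def sse(params):
    """Sum of squared errors for the line price = a + b * points."""
    a, b = params  # params[0] = a (intercept), params[1] = b (slope)
    residuals = price - (a + b * points)
    return np.sum(residuals ** 2)


# Evaluate the loss at the initial guess used later in this problem.
print(sse([-400.0, 4.0]))
```

The function reads `points` and `price` from the enclosing scope so that it has the single-argument signature the optimiser expects.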

We are now going to use SciPy's optimiser to find the parameters which minimise the SSE. To do so, first import the module:

`import scipy.optimize as optimize`

then use the `optimize.minimize` function to find them. One of the function's arguments is an initial guess of the parameter values; for this, choose $a=-400$ and $b=4$. Note also that the result is returned as a dictionary-like object (an `OptimizeResult`), and its `x` entry (accessible as `result.x` or `result["x"]`) holds the estimated parameter values.

Confirm that these estimates are the same as those obtained via the sklearn approach.
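The optimisation and comparison steps could be sketched as follows. This is an illustration under assumptions: the data are simulated (not the real dataset), and `np.polyfit` is used as a stand-in for the sklearn fit, since both return the ordinary least-squares solution for a straight line.

```python
import numpy as np
import scipy.optimize as optimize

# Simulated stand-in data; replace with your own `points` and `price` columns.
rng = np.random.default_rng(0)
points = rng.uniform(80.0, 100.0, size=50)
price = -400.0 + 5.0 * points + rng.normal(0.0, 10.0, size=50)


def sse(params):
    """Sum of squared errors for the line price = a + b * points."""
    a, b = params
    return np.sum((price - (a + b * points)) ** 2)


# Initial guess from the problem statement: a = -400, b = 4.
initial_guess = np.array([-400.0, 4.0])
result = optimize.minimize(sse, initial_guess)

# The estimates live under the "x" key (also available as result.x).
a_hat, b_hat = result["x"]

# Reference least-squares fit; polyfit returns [slope, intercept] for degree 1.
b_ref, a_ref = np.polyfit(points, price, 1)

print("optimiser:", a_hat, b_hat)
print("reference:", a_ref, b_ref)
```

Small differences between the two sets of numbers are expected, because the optimiser stops at a numerical tolerance rather than solving the problem exactly.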

We are now going to use the sum of absolute errors (SAE) as our loss function and use it to determine new estimates:

\begin{equation}
\text{SAE} = \sum_{i=1}^{n} \left|\text{price}_i - (a + b \times \text{points}_i)\right|
\end{equation}

Write a function which returns the SAE. Then use it as an input to SciPy's optimiser to find the new parameter estimates. Compare the regression line from these estimates with the least-squares line.
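One possible sketch of the SAE fit is shown below, again on simulated stand-in data rather than the real dataset. The choice of `method="Nelder-Mead"` is an assumption made here, not something the problem prescribes: the absolute value makes the SAE non-smooth, so a derivative-free method may behave more reliably than the default gradient-based one.

```python
import numpy as np
import scipy.optimize as optimize

# Simulated stand-in data; replace with your own `points` and `price` columns.
rng = np.random.default_rng(1)
points = rng.uniform(80.0, 100.0, size=50)
price = -400.0 + 5.0 * points + rng.normal(0.0, 10.0, size=50)


def sae(params):
    """Sum of absolute errors for the line price = a + b * points."""
    a, b = params
    return np.sum(np.abs(price - (a + b * points)))


# Same initial guess as for the least-squares fit: a = -400, b = 4.
initial_guess = np.array([-400.0, 4.0])

# Derivative-free optimiser (an assumed choice, see the note above).
result_sae = optimize.minimize(sae, initial_guess, method="Nelder-Mead")
a_sae, b_sae = result_sae.x

print("SAE estimates:", a_sae, b_sae)
```

To compare the two regression lines, plot `a_sae + b_sae * points` alongside the least-squares line over the same scatter of the data.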

Compare the estimates of the $b$ parameter from the least-squares and absolute deviation loss functions. Why are they different?