# Fitting

This notebook takes some material from https://github.com/klieret/HEPFittingTutorial/

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize, curve_fit

## Fit a line to data by minimizing the squared distance

The basic idea of fitting is to minimize a cost function by adjusting the parameters of some function template.

Let's define some random data:

In [None]:
x_data = np.linspace(-1, 1, 10)
y_data = -1 + 3 * x_data + 2 * np.random.random_sample(len(x_data))

And take a quick look at it:

In [None]:
plt.plot(x_data, y_data, 'ko')

This looks like a line, so this is what we want to fit:

In [None]:
def line(x, params):
    a, b = params
    return a * x + b

The line is defined as $f(\vec x) = a \vec x + b$ and maps a vector of x coordinates to a vector of y coordinates.
The two parameters a and b are collected in the vector ``params``.

Now the idea is to minimize the distance of our function ``line`` to the y coordinates of the data, so we define
another function ``chi2`` which, for every set of parameters, returns the sum of the squared distances of the data points to the function values:

In [None]:
def chi2(params):
    return np.sum(np.square(y_data - line(x_data, params)))

Let's look at this step by step: 

* ``line(x_data, params)``: Here we pass the parameters of ``chi2`` on to the line function, which we evaluate for all x values of the data. The result is a vector of y values.
* ``y_data - line(x_data, params)``: This is the vector of (signed) differences between the data y values and the y values of our function.
* ``np.square(y_data - line(x_data, params))``: The vector of squared distances
* ``np.sum(np.square(y_data - line(x_data, params)))``: Summing everything up
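To make these steps concrete, here they are on a tiny, made-up data set (`x_small` and `y_small` are illustrative names, not part of the notebook) where every intermediate array can be checked by hand:

```python
import numpy as np

def line(x, params):
    a, b = params
    return a * x + b

# Tiny hand-checkable data set, invented for illustration
x_small = np.array([0.0, 1.0, 2.0])
y_small = np.array([1.0, 3.0, 4.0])
params = (2.0, 1.0)  # a = 2, b = 1

y_model = line(x_small, params)  # [1., 3., 5.]
residuals = y_small - y_model    # [0., 0., -1.]
squared = np.square(residuals)   # [0., 0., 1.]
chi2_value = np.sum(squared)     # 1.0
print(y_model, residuals, squared, chi2_value)
```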

First, try to manually tune the parameters and see how `chi2` changes:

In [None]:
params = [3, 0.1]
plt.plot(x_data, y_data, 'ko', label="data")
plt.plot(x_data, line(x_data, params), label="line")
plt.legend()
print("Chi2: ", chi2(params))

Of course, we don't do this manually in practice; especially with a higher number of parameters this can be challenging.

SciPy provides [`scipy.optimize.minimize`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html). It needs a starting point for the parameters, which we simply set to ``(0, 1)``:

In [None]:
result = minimize(chi2, (0, 1))

Note how ``minimize`` is a higher-order function that takes a function as its first argument!
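A minimal illustration of this interface, assuming a made-up cost function `paraboloid` whose minimum at $(2, -1)$ is known in advance: `minimize` is handed the function object itself (not its value) plus a starting point, and calls it repeatedly while searching.

```python
import numpy as np
from scipy.optimize import minimize

def paraboloid(p):
    # Hypothetical cost function with its minimum at (2, -1)
    return (p[0] - 2) ** 2 + (p[1] + 1) ** 2

# Pass the function object and a start point; minimize evaluates it as needed
res = minimize(paraboloid, x0=(0, 0))
print(res.x, res.fun)

# Any callable works, including an anonymous lambda
res_lambda = minimize(lambda p: paraboloid(p) + 1.0, x0=(0, 0))
print(res_lambda.x, res_lambda.fun)
```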

The result object contains quite a lot of useful information, but we are mainly interested in the fitted parameter values, which are stored in ``result.x``.

The value of `chi2` for the parameters that minimize it is:

In [None]:
chi2(result.x)

Let's plot to see how well it fits:

In [None]:
# Plot the data points and the fitted line
plt.plot(x_data, y_data, 'ko', label="data")
plt.plot(x_data, line(x_data, result.x), label="fit")
plt.legend()

What we just did is called a "least-squares fit".

More generally, the same approach fits an arbitrary (possibly non-linear) function to weighted points, i.e. measurements with uncertainties.

**Basic principle:**  
Minimize $\chi^2$, the sum of squared differences between the measured points and the fit, each weighted by the inverse of its squared uncertainty:

$$ \chi^2 = \sum_i \frac{(y_{\mathrm{meas},i}-y_{\mathrm{fit},i})^2}{(\Delta y_i)^2} $$
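A weighted version of the earlier `chi2` could look like the sketch below. The per-point uncertainties `dy` and the Gaussian scatter are invented for illustration; the extra data are handed to the cost function through the `args` argument of `minimize`, and `np.polyfit` with weights `w = 1/dy` serves as an independent cross-check of the minimum.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 10)
dy = np.full_like(x, 0.3)             # assumed per-point uncertainties (made up)
dy[::2] = 0.6                         # every second point is less precise
y = -1 + 3 * x + rng.normal(0, dy)    # Gaussian scatter matching those uncertainties

def weighted_chi2(params, x, y, dy):
    a, b = params
    # each residual is divided by the uncertainty of its point before squaring
    return np.sum(((y - (a * x + b)) / dy) ** 2)

res = minimize(weighted_chi2, (0, 1), args=(x, y, dy))
print(res.x, res.fun)
```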

The resulting value of $\chi^2$ provides an important check of whether the fitting model (and the assumed uncertainties) are sensible:

$$ \left< \frac{\chi^2}{  (n_{points} - n_{par} )} \right> \approx 1$$

$n_{points} - n_{par}$ is the number of degrees of freedom (ndf) in our optimization problem.
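This statement can be checked with toy experiments. The sketch below assumes Gaussian scatter with a known, common uncertainty `sigma` (an assumption of this example, not of the notebook) and uses `np.polyfit` for the straight-line least-squares fit; averaged over many toys, $\chi^2/\mathrm{ndf}$ should come out close to 1.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-1, 1, 20)
sigma = 0.5                      # assumed known uncertainty, identical for every point
n_par = 2                        # slope and intercept
ndf = len(x) - n_par

reduced = []
for _ in range(2000):            # repeated toy experiments
    y = -1 + 3 * x + rng.normal(0, sigma, size=len(x))
    a_fit, b_fit = np.polyfit(x, y, 1)  # least-squares straight-line fit
    chi2_value = np.sum(((y - (a_fit * x + b_fit)) / sigma) ** 2)
    reduced.append(chi2_value / ndf)

# close to 1 when both the model and the uncertainties are right
print(np.mean(reduced))
```

If `sigma` in the $\chi^2$ were set too small (or too large) compared with the true scatter, the average would move away from 1 accordingly.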

**Note:** In our example with the line we didn't divide by any uncertainty in the definition of `chi2`. This is equivalent to setting $\Delta y = 1$ for every point, so we implicitly assumed that all data points are weighted equally.

## Using `scipy.optimize.curve_fit`

For a least-squares fit we don't need to write the cost function manually; we can use `scipy.optimize.curve_fit` instead.

With `curve_fit` we don't need to put all parameters into one tuple. Instead, the first argument of the model function is interpreted as the independent variable (the x data), and all remaining arguments as the parameters:

In [None]:
def f(x, a, b):
    return line(x, [a, b])

`curve_fit` returns the fitted parameters and their estimated covariance matrix:

In [None]:
pfit, pcov = curve_fit(f, x_data, y_data, p0=(1, 0))

In [None]:
pfit, pcov
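As a quick consistency check, `curve_fit` and `minimize` applied to the hand-written sum of squares should agree on the best-fit parameters. This sketch generates fresh data under new names (`x_demo`, `y_demo`, `model` are illustrative) so that the notebook's `x_data` and `y_data` are not overwritten:

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

rng = np.random.default_rng(0)
x_demo = np.linspace(-1, 1, 10)
y_demo = -1 + 3 * x_demo + 2 * rng.random(len(x_demo))

def model(x, a, b):
    return a * x + b

# curve_fit minimizes the sum of squared residuals internally ...
p_cf, _ = curve_fit(model, x_demo, y_demo, p0=(1, 0))

# ... which is the same cost function as our hand-written chi2
res = minimize(lambda p: np.sum((y_demo - model(x_demo, *p)) ** 2), (0, 1))
print(p_cf, res.x)
```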

### Fit uncertainties and covariances

The covariance matrix contains the variances of the fit parameters on the diagonal and their covariances in the off-diagonal elements.

So the one-standard-deviation uncertainties on the fit parameters are given by the square roots of the diagonal elements:

In [None]:
a, b = pfit
a_err, b_err = np.sqrt(np.diag(pcov))
print("a = {:.2f} +/- {:.2f}".format(a, a_err))
print("b = {:.2f} +/- {:.2f}".format(b, b_err))
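One subtlety: since we passed no `sigma`, `curve_fit` estimates the scale of the uncertainties from the scatter of the residuals. The sketch below, with made-up per-point uncertainties `dy_demo`, contrasts `absolute_sigma=True` (the given `sigma` values are taken at face value) with the default `absolute_sigma=False`, for which the SciPy documentation states that `pcov` is rescaled so that the reduced $\chi^2$ of the fit equals one.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
x_demo = np.linspace(0, 1, 10)
dy_demo = np.full_like(x_demo, 0.2)   # assumed per-point uncertainties (made up)
y_demo = -1 + 3 * x_demo + rng.normal(0, 0.2, len(x_demo))

def model(x, a, b):
    return a * x + b

# sigma taken as absolute uncertainties: pcov is not rescaled
p_abs, pcov_abs = curve_fit(model, x_demo, y_demo, p0=(1, 0),
                            sigma=dy_demo, absolute_sigma=True)
# default: pcov is rescaled by the reduced chi2 of the fit
p_rel, pcov_rel = curve_fit(model, x_demo, y_demo, p0=(1, 0), sigma=dy_demo)

chi2_red = np.sum(((y_demo - model(x_demo, *p_abs)) / dy_demo) ** 2) / (len(x_demo) - 2)
print(np.sqrt(np.diag(pcov_abs)), np.sqrt(np.diag(pcov_rel)), chi2_red)
```

Only the uncertainties differ; the best-fit values are identical in both cases.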

The covariances indicate how strongly the parameters are correlated, that is, to what extent greater values of one parameter go along with greater (positive covariance) or smaller (negative covariance) values of the other one.
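The strength of a correlation is easier to read off from the correlation coefficient $\rho_{ab} = \sigma_{ab} / (\sigma_a \sigma_b)$, which lies between $-1$ and $1$. A small sketch for a symmetric x range and for $x \in [0, 1]$; the helper `correlation_matrix` is our own, not a SciPy function:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

def correlation_matrix(pcov):
    # rho_ij = C_ij / (sigma_i * sigma_j); the diagonal entries become 1
    err = np.sqrt(np.diag(pcov))
    return pcov / np.outer(err, err)

rng = np.random.default_rng(7)
rho = []
for x_demo in (np.linspace(-1, 1, 10), np.linspace(0, 1, 10)):
    y_demo = -1 + 3 * x_demo + rng.random(len(x_demo))
    _, pcov_demo = curve_fit(model, x_demo, y_demo, p0=(1, 0))
    rho.append(correlation_matrix(pcov_demo)[0, 1])
    print(f"x in [{x_demo.min():.0f}, {x_demo.max():.0f}]: rho_ab = {rho[-1]:+.3f}")
```

For an unweighted straight-line fit, $\rho_{ab} = -\sum_i x_i / \sqrt{n \sum_i x_i^2}$ depends only on the x values: it vanishes for the symmetric range and is strongly negative for $x \in [0, 1]$.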

You can try this out: change one of the parameters slightly and see whether you can get a better `chi2` again by changing the other one:

In [None]:
# chi2 at the best-fit values, as a reference for the manual tests below
print("Chi2 at best fit:", ((f(x_data, a, b) - y_data) ** 2).sum())

In [None]:
a_test, b_test = a + 1, b  # move a away from its optimum; try different values for b_test
plt.plot(x_data, y_data, "ko")
yfit = f(x_data, a_test, b_test)
plt.plot(x_data, yfit)
print("Chi2:", ((yfit - y_data) ** 2).sum())

Likewise, if we fix `a` to a value away from its optimum and fit only `b` again, the optimal `b` barely changes:

In [None]:
curve_fit(lambda x, b: f(x, a + 1, b), x_data, y_data, p0=(0,))

In this case, the data points are distributed symmetrically around $x = 0$, so a higher slope (parameter `a`) puts the data points at positive x below the fitted line and the points at negative x above it.

Changing the parameter `b` only shifts the whole line up or down, so it cannot compensate for a suboptimal value of `a`.
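This argument can be made quantitative. For fixed $a$, setting $\partial\chi^2/\partial b = 0$ gives $\hat b(a) = \bar y - a \bar x$, so moving $a$ by $\Delta a$ shifts the best $b$ by $-\bar x\,\Delta a$. The sketch below (the helper `best_b_for_fixed_a` is hypothetical) checks this for the symmetric range and for $x \in [0, 1]$:

```python
import numpy as np

def best_b_for_fixed_a(x, y, a):
    # d/db sum_i (y_i - a x_i - b)^2 = 0  =>  b = mean(y) - a * mean(x)
    return np.mean(y) - a * np.mean(x)

rng = np.random.default_rng(5)
shifts = []
for x_demo in (np.linspace(-1, 1, 10), np.linspace(0, 1, 10)):
    y_demo = -1 + 3 * x_demo + rng.random(len(x_demo))
    a_hat, b_hat = np.polyfit(x_demo, y_demo, 1)  # unconstrained best fit (slope, intercept)
    # change of the best b when the slope is fixed one unit above its optimum
    shifts.append(best_b_for_fixed_a(x_demo, y_demo, a_hat + 1) - b_hat)
    print(f"mean(x) = {x_demo.mean():+.2f}, shift of best b: {shifts[-1]:+.2f}")
```

For the symmetric range $\bar x = 0$, so the best `b` does not move at all; for $x \in [0, 1]$ it drops by 0.5 per unit increase of `a`.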

The situation changes if we only look at data points on the positive x-axis:

In [None]:
x_data2 = np.linspace(0, 1, 10)
y_data2 = -1 + 3 * x_data2 + np.random.random_sample(len(x_data2))

In [None]:
plt.plot(x_data2, y_data2, "ko")

In [None]:
pfit2, pcov2 = curve_fit(f, x_data2, y_data2, p0=(1, 0))
pfit2, pcov2

Here, the off-diagonal elements are clearly non-zero (and negative).

In [None]:
a2, b2 = pfit2

Now we can compensate for a slightly higher value of `a` with a slightly lower value of `b`:

In [None]:
a2_test, b2_test = a2 + 1, b2  # try lowering b2_test to compensate for the higher slope
plt.plot(x_data2, y_data2, "ko")
yfit = f(x_data2, a2_test, b2_test)
plt.plot(x_data2, yfit)
print("Chi2:", ((yfit - y_data2) ** 2).sum())

In [None]:
curve_fit(lambda x, b: f(x, a2 + 1, b), x_data2, y_data2, p0=(0,))

Since we know the "real" distribution of our data points (we generated them, after all), we can use toy experiments to check how well the fit estimated the uncertainties and covariances:

In [None]:
def fit_toy():
    x_data = np.linspace(0, 1, 10)
    y_data = -1 + 3 * x_data + np.random.random_sample(len(x_data))
    pfit, pcov = curve_fit(f, x_data, y_data, p0=(1, 0))
    return pfit, pcov

In [None]:
toy_params = np.array([fit_toy()[0] for _ in range(1000)]).T

In [None]:
plt.scatter(*toy_params, marker=".")

In [None]:
np.cov(toy_params)

In [None]:
pcov2

The two matrices look pretty similar! Note, however, that the resulting intervals are typically interpreted as confidence intervals, so what we would actually like to know is whether 68.27% of all $\pm 1\sigma$ intervals (in repeated experiments, possibly with different true values) contain the true value ([coverage probability](https://en.wikipedia.org/wiki/Coverage_probability)).
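A coverage check along these lines could look like the following sketch. It assumes the generating model of `fit_toy`, for which the line the fit estimates on average has $a = 3$ and $b = -1 + 0.5 = -0.5$ (the uniform noise on $[0, 1)$ has mean 0.5). With only 8 degrees of freedom and an uncertainty scale estimated from the data itself, the observed coverage may come out somewhat below the nominal 68.27%.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

rng = np.random.default_rng(11)
x_toy = np.linspace(0, 1, 10)
a_true, b_true = 3.0, -0.5   # -1 plus the mean 0.5 of the uniform noise

n_toys = 2000
covered = np.zeros(2)
for _ in range(n_toys):
    y_toy = -1 + 3 * x_toy + rng.random(len(x_toy))
    p, cov = curve_fit(model, x_toy, y_toy, p0=(1, 0))
    err = np.sqrt(np.diag(cov))
    # count toys whose +/- 1 sigma interval contains the true value
    covered += np.abs(p - (a_true, b_true)) < err

coverage = covered / n_toys
print(coverage)  # compare with the nominal 0.6827
```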

### Error propagation

One thing we can do with this covariance matrix is to visualize uncertainties on the fit using [linear error propagation](https://en.wikipedia.org/wiki/Propagation_of_uncertainty). In this case we can simply calculate it manually:

$$\sigma_y ^ 2 = \left(\frac{\partial y}{\partial a} \sigma_a \right)^2 + \left(\frac{\partial y}{\partial b} \sigma_b \right)^2 + 2\frac{\partial y}{\partial a}\frac{\partial y}{\partial b}\sigma_{ab}$$

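where $\sigma_a, \sigma_b$ are the standard deviations of the parameters (so $\sigma_a^2, \sigma_b^2$ are their variances) and $\sigma_{ab}$ is their covariance. With $y = ax + b$, i.e. $\partial y/\partial a = x$ and $\partial y/\partial b = 1$, this becomes: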

$$\sigma_y ^ 2 = (x\sigma_a)^2 + \sigma_b^2 + 2x\sigma_{ab}$$

where we can take $\sigma_a^2$, $\sigma_b^2$ and $\sigma_{ab}$ from the covariance matrix:

$$
\begin{pmatrix}
    \sigma_a^2 & \sigma_{ab} \\
    \sigma_{ab} & \sigma_b^2
\end{pmatrix}
$$
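Written in matrix form, the same propagation is $\sigma_y^2 = \vec g^{\,T} C \vec g$ with $\vec g = (\partial y/\partial a, \partial y/\partial b) = (x, 1)$ and $C$ the covariance matrix above; expanding the product gives exactly the three terms of the formula. A sketch that evaluates this for many x values at once, using a made-up covariance matrix `pcov_demo` and a hypothetical helper `sigma_y_line`:

```python
import numpy as np

def sigma_y_line(x, pcov):
    # gradient of y = a x + b with respect to (a, b), one row per x value
    g = np.stack([x, np.ones_like(x)], axis=1)
    # sigma_y^2 = g C g^T, evaluated separately for every x
    return np.sqrt(np.einsum("ni,ij,nj->n", g, pcov, g))

# Made-up covariance matrix with a negative correlation, for illustration only
pcov_demo = np.array([[0.04, -0.018], [-0.018, 0.012]])
x_demo = np.linspace(0, 1, 5)

explicit = np.sqrt((x_demo * np.sqrt(pcov_demo[0, 0])) ** 2 + pcov_demo[1, 1]
                   + 2 * x_demo * pcov_demo[0, 1])
print(sigma_y_line(x_demo, pcov_demo), explicit)
```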

Let's visualize it:

In [None]:
# sigma_y^2 = (x sigma_a)^2 + sigma_b^2 + 2 x sigma_ab, with the entries taken from pcov2
sigma_y = np.sqrt((x_data2 * np.sqrt(pcov2[0, 0])) ** 2 + pcov2[1, 1] + 2 * x_data2 * pcov2[0, 1])

In [None]:
y = line(x_data2, pfit2)
plt.fill_between(x_data2, y - sigma_y, y + sigma_y, alpha=0.5)
plt.plot(x_data2, y)
plt.plot(x_data2, y_data2, "ko")

For more generic templates/functions you can do this automatically. Either use the [uncertainties](https://pythonhosted.org/uncertainties) package (for functions described by simple formulas) or calculate it numerically by varying each parameter $p_i$ up and down by its uncertainty $\sigma_{p_i}$ and using half of the resulting interval of function values as a replacement for $\frac{\partial f}{\partial p_i}\sigma_{p_i}$.
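The numerical recipe can be sketched as follows. `propagate_by_variation` is a hypothetical helper, not part of SciPy or uncertainties; to keep the correlations encoded in the covariance matrix, it combines the half-intervals with the correlation matrix, which is our own choice of how to extend the recipe. For a linear model it reproduces the analytic formula exactly.

```python
import numpy as np

def propagate_by_variation(func, x, p, pcov):
    """Hypothetical helper: linear error propagation in which each term
    (df/dp_i) * sigma_i is replaced by half the change of func(x, *p)
    when p_i is varied by +/- sigma_i."""
    p = np.asarray(p, dtype=float)
    sigma = np.sqrt(np.diag(pcov))
    corr = pcov / np.outer(sigma, sigma)  # correlation matrix of the parameters
    deltas = []
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = sigma[i]
        deltas.append((func(x, *(p + step)) - func(x, *(p - step))) / 2)
    d = np.array(deltas)                  # shape (n_par, n_x)
    # sigma_y^2 = sum_ij d_i rho_ij d_j, so parameter correlations are included
    return np.sqrt(np.einsum("in,ij,jn->n", d, corr, d))

def line_model(x, a, b):
    return a * x + b

# Made-up best-fit values and covariance matrix, for illustration only
p_demo = (3.0, -0.5)
pcov_demo = np.array([[0.04, -0.018], [-0.018, 0.012]])
x_demo = np.linspace(0, 1, 5)

numeric = propagate_by_variation(line_model, x_demo, p_demo, pcov_demo)
analytic = np.sqrt(x_demo ** 2 * pcov_demo[0, 0] + pcov_demo[1, 1]
                   + 2 * x_demo * pcov_demo[0, 1])
print(numeric, analytic)
```

`func` follows the same `(x, *params)` convention as the model functions passed to `curve_fit`. For a non-linear model the result is only an approximation, since the half-interval stands in for the derivative.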