# Fitting

Taking some material from https://github.com/klieret/HEPFittingTutorial/

Also taking some inspiration from http://www.pp.rhul.ac.uk/~cowan/stat_course.html

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize, curve_fit

## Fit a line to data by minimizing the squared distance

The basic idea of fitting consists of minimizing a cost function by adjusting parameters of some function template.

Let's define some random data:

In [None]:
x_data = np.linspace(-1, 1, 10)
y_data = -1 + 3 * x_data + 2 * np.random.random_sample(len(x_data))

And take a quick look at it:

In [None]:
plt.plot(x_data, y_data, 'ko')

This looks like a line, so this is what we want to fit:

In [None]:
def line(x, params):
    a, b = params
    return a * x + b

The line is defined as $f(\vec x) = a \vec x + b$ and maps a vector of x coordinates to y coordinates.
The two parameters $a$ and $b$ are collected in the vector ``params``.

Now the idea is to minimize the distance between our function ``line`` and the y coordinates of the data. To do so, we define
another function ``chi2`` which, for every set of parameters, returns the sum of the squared distances between the data points and the function values:

In [None]:
def chi2(params):
    return np.sum(np.square(y_data - line(x_data, params)))

Let's look at this step by step: 

* ``line(x_data, params)``: Here we pass the parameters of ``chi2`` on to the ``line`` function, which we evaluate for all the data x values. The result is a vector of y values.
* ``y_data - line(x_data, params)``: This is the vector of distances between the data y values and the y values of our function.
* ``np.square(y_data - line(x_data, params))``: The vector of squared distances.
* ``np.sum(np.square(y_data - line(x_data, params)))``: The sum of all squared distances.
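The same steps can be traced with numbers small enough to check by hand. The following is only a sketch: the three data points and the parameter values (`x_demo`, `y_demo`, `params_demo`) are made up for illustration and are not the notebook's random data.

```python
import numpy as np

# Hypothetical three-point data set, chosen so the numbers are easy to follow
x_demo = np.array([-1.0, 0.0, 1.0])
y_demo = np.array([-2.0, 0.5, 3.0])
params_demo = (2.0, 0.0)  # slope a = 2, offset b = 0


def line(x, params):
    a, b = params
    return a * x + b


y_model = line(x_demo, params_demo)  # function values: [-2, 0, 2]
residuals = y_demo - y_model         # distances:       [0, 0.5, 1]
squared = np.square(residuals)       # squared:         [0, 0.25, 1]
chi2_demo = np.sum(squared)          # sum:             1.25
print(chi2_demo)
```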

First, try to manually tune the parameters and see how `chi2` changes:

In [None]:
params = [3, 0.1]
plt.plot(x_data, y_data, 'ko', label="data")
plt.plot(x_data, line(x_data, params), label="line")
plt.legend()
print("Chi2: ", chi2(params))

Of course we don't do this manually in practice; especially with a larger number of parameters this becomes challenging.

SciPy provides [`scipy.optimize.minimize`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) for this. It needs a starting point, which we simply set to ``(0, 1)``:

In [None]:
result = minimize(chi2, (0, 1))

Note that ``minimize`` is a higher-order function: it takes a function as its first argument!

The result object contains quite a lot of useful information, but we just want the values of our parameters, which are stored in ``result.x``.
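As a self-contained sketch of what such a result object holds, the example below minimizes the same kind of cost function on hypothetical noise-free data, so the true parameters (3, -1) are known in advance. The names `x_demo`, `y_demo`, `chi2_demo` and `result_demo` are illustrative and not part of the notebook.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical noise-free data on the line y = 3x - 1
x_demo = np.linspace(-1, 1, 10)
y_demo = 3 * x_demo - 1


def chi2_demo(params):
    a, b = params
    return np.sum(np.square(y_demo - (a * x_demo + b)))


result_demo = minimize(chi2_demo, (0, 1))
print(result_demo.x)        # best-fit parameters (a, b)
print(result_demo.fun)      # value of the cost function at the minimum
print(result_demo.success)  # whether the optimizer reports convergence
print(result_demo.message)  # human-readable termination message
```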

The value of `chi2` for the parameters that minimize it is:

In [None]:
chi2(result.x)

Let's plot to see how well it fits:

In [None]:
# Plot the data points together with the fitted line
plt.plot(x_data, y_data, 'ko', label="data")
plt.plot(x_data, line(x_data, result.x), label="fit")
plt.legend()

What we just did is called a "least-squares fit".

More generally, a least-squares fit adjusts an arbitrary (possibly non-linear) function to weighted points, that is, measurements with uncertainties.

**Basic principle:**  
Minimize $\chi^2$, the quadratic difference between measurement points and fit, weighted by inverse uncertainty squared:

$$ \chi^2 = \sum \frac{(y_{meas}-y_{fit})^2}{  ( \Delta y )^2} $$

The resulting value for $\chi^2$ is an important check whether the fitting model is sensible:

$$ \left< \frac{\chi^2}{  (n_{points} - n_{par} )} \right> \approx 1$$

$n_{points} - n_{par}$ is the number of degrees of freedom (ndf) in our optimization problem.
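The expectation $\chi^2/\mathrm{ndf} \approx 1$ can be checked with a small toy study. This is a sketch under assumptions that differ from the example above: the noise is Gaussian with a known width `sigma_toy`, the seed is fixed for reproducibility, and `np.polyfit` is used as a shortcut for the straight-line fit.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, only for reproducibility
x_toy = np.linspace(-1, 1, 10)
sigma_toy = 0.5                      # assumed known Gaussian uncertainty
n_par = 2

chi2_per_ndf = []
for _ in range(2000):
    y_toy = -1 + 3 * x_toy + rng.normal(0, sigma_toy, size=len(x_toy))
    # With equal uncertainties the weighted and unweighted line fits coincide
    coeffs = np.polyfit(x_toy, y_toy, deg=1)
    pulls = (y_toy - np.polyval(coeffs, x_toy)) / sigma_toy
    chi2_per_ndf.append(np.sum(pulls**2) / (len(x_toy) - n_par))

print(np.mean(chi2_per_ndf))  # expected to be close to 1
```

Individual toys scatter noticeably around 1; only the average over many repetitions approaches it.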

<div class="alert alert-block alert-info">
    <b>Note:</b> In our example with the line we didn't divide by the uncertainty in the definition of <code>chi2</code>, so we made the implicit assumption that all data points are equally weighted. Not knowing the uncertainty of the data points also means we can't use $\chi^2$ to test goodness of fit by comparing it to the number of degrees of freedom.
</div>

## Using `scipy.optimize.curve_fit`

For a least-squares fit we don't need to write the cost function manually, but can instead use [`scipy.optimize.curve_fit`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit).

For `curve_fit` we don't need to put all parameters into one tuple. Instead the first argument of the fit function is interpreted to be the input data, and all remaining arguments the parameters:

In [None]:
def f(x, a, b):
    return a * x + b

`curve_fit` returns the fitted parameters and their covariance matrix:

In [None]:
pfit, pcov = curve_fit(f, x_data, y_data, p0=(1, 0))
pfit, pcov

### Fit uncertainties and covariances

The covariance matrix contains the variances of the fit parameters on the diagonal and their covariances in the off-diagonal elements.

<div class="alert alert-block alert-info">
    <b>Note:</b> Since we did not provide uncertainties on the data points (this would be done using the <code>sigma</code> argument to <code>curve_fit</code>), the overall normalization of $\chi^2$ is undetermined. By default, <code>curve_fit</code> multiplies the covariance matrix by $\chi^2_\mathrm{min}/\mathrm{ndf}$, which effectively scales the uncertainties on the data points to match the observed residuals after the fit. This can be turned off with <code>absolute_sigma=True</code>, which is appropriate when absolute uncertainties are provided.
</div>
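The rescaling described in the note can be checked directly. The sketch below uses made-up measurements and uncertainties (`x_demo`, `y_demo`, `yerr_demo` are hypothetical) and compares the covariance matrix returned by default with the one from `absolute_sigma=True` multiplied by $\chi^2_\mathrm{min}/\mathrm{ndf}$.

```python
import numpy as np
from scipy.optimize import curve_fit


def f_demo(x, a, b):
    return a * x + b


# Hypothetical measurements with made-up uncertainties
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = np.array([0.1, 2.3, 3.8, 6.4, 7.9])
yerr_demo = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

# Default behaviour (absolute_sigma=False): covariance is rescaled
popt, pcov_scaled = curve_fit(f_demo, x_demo, y_demo, sigma=yerr_demo)
# Same fit, but the uncertainties are taken at face value
_, pcov_abs = curve_fit(f_demo, x_demo, y_demo, sigma=yerr_demo, absolute_sigma=True)

chi2_min = np.sum(((y_demo - f_demo(x_demo, *popt)) / yerr_demo) ** 2)
ndf = len(x_demo) - len(popt)

print(pcov_scaled)
print(pcov_abs * chi2_min / ndf)  # should agree with the matrix printed above
```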

The one-standard-deviation uncertainties on the fit parameters are the square roots of the diagonal elements of the covariance matrix:

In [None]:
a, b = pfit
a_err, b_err = np.sqrt(np.diag(pcov))
print("a = {:.2f} +/- {:.2f}".format(a, a_err))
print("b = {:.2f} +/- {:.2f}".format(b, b_err))

The covariances indicate how much the parameters are correlated, that is how much greater values of one parameter correspond to greater values of the other one.

In this case the covariance is very small:

In [None]:
pcov[0, 1]

You can try out why that is. Change one of the parameters slightly and see whether you can recover a better `chi2` by changing the other one:

In [None]:
%matplotlib inline
from ipywidgets import interact

@interact(
    a_test=(a - 0.5, a + 0.5, 0.05),
    b_test=(b - 0.5, b + 0.5, 0.05),
    continuous_update=False
)
def interactive_plot(a_test=a, b_test=b):
    plt.plot(x_data, y_data, "ko")
    yfit = f(x_data, a_test, b_test)
    yfit_min = f(x_data, a, b)
    plt.plot(x_data, yfit, label="test")
    plt.plot(x_data, yfit_min, "--", label="fit")
    plt.ylim(-4, 4)
    plt.legend()
    plt.show()
    print("Chi2:", ((yfit - y_data) ** 2).sum())
    print("Chi2_min:", ((yfit_min - y_data) ** 2).sum())

Also, if we fix `a` to a value away from the optimum (here `a + 1`) and fit `b` again, the optimal `b` won't change significantly:

In [None]:
b

In [None]:
curve_fit(lambda x, b: f(x, a + 1, b), x_data, y_data, p0=(0,))

In this case, we have the same number of points on the positive and negative x-axis, so a higher slope (parameter `a`) puts the data points on the positive x-axis below the fitted line and on the negative x-axis above the fitted line.

Changing the parameter `b` causes an overall shift up or down, so there is no way to compensate for a less optimal value of `a`.

The situation changes if we only look at data points on the positive x-axis:

In [None]:
x_data2 = np.linspace(0, 1, 10)
y_data2 = -1 + 3 * x_data2 + np.random.random_sample(len(x_data2))

In [None]:
plt.plot(x_data2, y_data2, "ko")

In [None]:
pfit2, pcov2 = curve_fit(f, x_data2, y_data2, p0=(1, 0))
pfit2, pcov2

<div class="alert alert-block alert-success">
    <b>Question:</b> What is the correlation coefficient?<br><br>
    <b>Hint:</b> Have a look at <a href="http://localhost:8888/notebooks/notebooks/StatBasics.ipynb">StatBasics.ipynb</a>
</div>

Here, the off-diagonal elements have a significant (negative) non-zero value.
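Why the covariance depends on the x range can be sketched without any random data. For an unweighted straight-line fit, the parameter covariance matrix is proportional to $(A^T A)^{-1}$, where the design matrix $A$ has the columns $x_i$ and $1$. Its off-diagonal element is proportional to $-\sum_i x_i$, which vanishes for points placed symmetrically around zero. The helper `unscaled_covariance` below is a hypothetical name introduced for this illustration.

```python
import numpy as np


def unscaled_covariance(x):
    # Design matrix for y = a*x + b: columns are dy/da = x and dy/db = 1
    A = np.column_stack([x, np.ones_like(x)])
    return np.linalg.inv(A.T @ A)


cov_sym = unscaled_covariance(np.linspace(-1, 1, 10))  # symmetric around zero
cov_pos = unscaled_covariance(np.linspace(0, 1, 10))   # positive x only

print(cov_sym[0, 1])  # vanishes up to rounding
print(cov_pos[0, 1])  # negative
```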

In [None]:
a2, b2 = pfit2

Now we can compensate for a slightly higher value of `a` with a slightly lower value of `b`:

In [None]:
%matplotlib inline
from ipywidgets import interact

@interact(
    a_test=(a2 - 0.5, a2 + 0.5, 0.05),
    b_test=(b2 - 0.5, b2 + 0.5, 0.05),
    continuous_update=False
)
def interactive_plot(a_test=a2, b_test=b2):
    plt.plot(x_data2, y_data2, "ko")
    yfit = f(x_data2, a_test, b_test)
    yfit_min = f(x_data2, a2, b2)
    plt.plot(x_data2, yfit, label="test")
    plt.plot(x_data2, yfit_min, "--", label="fit")
    plt.ylim(-2, 4)
    plt.legend()
    plt.show()
    print("Chi2:", ((yfit - y_data2) ** 2).sum())
    print("Chi2_min:", ((yfit_min - y_data2) ** 2).sum())

In [None]:
curve_fit(lambda x, b: f(x, a2 + 1, b), x_data2, y_data2, p0=(0,))

The covariance matrix gives an idea of how much the fitted values would spread if we were to repeat the fit with new random data, assuming our model describes it.

Since we know the "real" distribution of our data points (we generated them after all) we can try this out with toy experiments:

In [None]:
def fit_toy():
    x_data = np.linspace(0, 1, 10)
    y_data = -1 + 3 * x_data + np.random.random_sample(len(x_data))
    pfit, pcov = curve_fit(f, x_data, y_data, p0=(1, 0))
    return pfit, pcov

In [None]:
toy_params = np.array([fit_toy()[0] for i in range(5000)]).T

In [None]:
plt.scatter(*toy_params, marker=".")

Compare the empirical covariance matrix of these points to the one determined from the fit:

In [None]:
np.cov(toy_params)

In [None]:
pcov2

### Error propagation

One thing we can do with this covariance matrix is to visualize uncertainties on the fit using [linear error propagation](https://en.wikipedia.org/wiki/Propagation_of_uncertainty). In this case we can simply calculate it manually:

$$\sigma_y ^ 2 = \left(\frac{\partial y}{\partial a} \sigma_a \right)^2 + \left(\frac{\partial y}{\partial b} \sigma_b \right)^2 + 2\frac{\partial y}{\partial a}\frac{\partial y}{\partial b}\sigma_{ab}$$

where $\sigma_a^2, \sigma_b^2$ are the variances and $\sigma_{ab}$ is the covariance of the parameters. With $y = ax + b$ this becomes:

$$\sigma_y ^ 2 = (x\sigma_a)^2 + \sigma_b^2 + 2x\sigma_{ab}$$

where we can take $\sigma_a^2, \sigma_b^2, \sigma_{ab}$ from the covariance matrix:

$$
\begin{pmatrix}
    \sigma_a^2 & \sigma_{ab} \\
    \sigma_{ba} & \sigma_b^2
\end{pmatrix}
$$

Let's visualize it:

In [None]:
sigma_y = np.sqrt((x_data2 * np.sqrt(pcov2[0, 0])) ** 2 + pcov2[1, 1] + 2 * x_data2 * pcov2[0, 1])

In [None]:
y = line(x_data2, pfit2)
plt.fill_between(x_data2, y - sigma_y, y + sigma_y, alpha=0.5)
plt.plot(x_data2, y)
plt.plot(x_data2, y_data2, "ko")

For more generic templates/functions you can do that automatically. The [jacobi](https://github.com/hdembinski/jacobi) package provides a convenient `propagate` function for that.

It handles the general case of a function $\vec y = f(\vec x)$ and computes $C_y = J \, C_x \, J^T$ with $J_{ik} = \frac{\partial y_i}{\partial x_k}$, where $C_x$ is the covariance matrix of the input parameters and $C_y$ is the resulting covariance matrix of the function values.
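For the straight line the Jacobian is known analytically, so the matrix formula can be sketched with plain NumPy. The covariance matrix `cov_demo` below is a hypothetical example, not the result of a fit, and the sketch does not use or reproduce the internals of `jacobi`.

```python
import numpy as np

x_demo = np.linspace(0, 1, 5)
# Hypothetical parameter covariance matrix for (a, b); not taken from a fit
cov_demo = np.array([[0.040, -0.020],
                     [-0.020, 0.015]])

# Jacobian of y_i = a * x_i + b: column 0 is dy_i/da = x_i, column 1 is dy_i/db = 1
J = np.column_stack([x_demo, np.ones_like(x_demo)])
ycov_demo = J @ cov_demo @ J.T

# The diagonal of C_y reproduces the manual error propagation formula
sigma_matrix = np.sqrt(np.diag(ycov_demo))
sigma_manual = np.sqrt(
    x_demo**2 * cov_demo[0, 0] + cov_demo[1, 1] + 2 * x_demo * cov_demo[0, 1]
)
print(np.allclose(sigma_matrix, sigma_manual))
```

The off-diagonal elements of `ycov_demo` additionally describe how the function values at different x positions are correlated with each other.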

In [None]:
try:
    from jacobi import propagate
except ModuleNotFoundError:
    !pip install jacobi
    from jacobi import propagate

In [None]:
y, ycov = propagate(lambda params: line(x_data2, params), pfit2, pcov2)

In [None]:
sigma_y2 = np.sqrt(np.diag(ycov))
plt.fill_between(x_data2, y - sigma_y2, y + sigma_y2, alpha=0.5)
plt.plot(x_data2, y)
plt.plot(x_data2, y_data2, "ko")

We see the uncertainties are compatible with our manual calculation:

In [None]:
np.isclose(sigma_y, sigma_y2).all()

Alternatives are the [uncertainties](https://pythonhosted.org/uncertainties) package (for functions described by simple formulas) or a numerical calculation: vary each parameter up and down by its uncertainty and use half of the resulting interval as a replacement for $\frac{\partial f}{\partial x_i}\sigma_{x_i}$.
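The numerical variant can be sketched as follows. The fit result `p_demo` and the covariance matrix `cov_demo` are hypothetical values, and combining the individual shifts via the correlation matrix is an assumption of this sketch about how to account for correlated parameters.

```python
import numpy as np


def line(x, params):
    a, b = params
    return a * x + b


x_demo = np.linspace(0, 1, 5)
p_demo = np.array([3.0, -0.5])         # hypothetical fit result (a, b)
cov_demo = np.array([[0.040, -0.020],  # hypothetical covariance matrix
                     [-0.020, 0.015]])

sigmas = np.sqrt(np.diag(cov_demo))
corr = cov_demo / np.outer(sigmas, sigmas)  # correlation matrix

# Half of the up/down interval for each parameter, one row per parameter
shifts = []
for i, sigma_i in enumerate(sigmas):
    step = np.zeros_like(p_demo)
    step[i] = sigma_i
    up = line(x_demo, p_demo + step)
    down = line(x_demo, p_demo - step)
    shifts.append((up - down) / 2)
shifts = np.array(shifts)

# Combine the shifts, including the correlation between the parameters
sigma_y_demo = np.sqrt(np.einsum("ik,ij,jk->k", shifts, corr, shifts))
print(sigma_y_demo)
```

For a function that is linear in its parameters, such as the line, this agrees with the analytic formula; for non-linear functions it is only an approximation.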

### Goodness of fit

Let's have a look at these data points (this time with uncertainties):

In [None]:
x_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
y_data = np.array([2.7, 3.9, 5.5, 5.8, 6.5, 6.3, 6.7, 6.2, 6.0])
yerr_data = np.array([0.3, 0.5, 0.7, 0.6, 0.4, 0.3, 0.7, 0.8, 0.5])

In [None]:
plt.errorbar(x_data, y_data, yerr=yerr_data, fmt="ko")

And fit a line again:

In [None]:
def f_linear(x, a, b):
    return a * x + b

In [None]:
pfit, pcov = curve_fit(f_linear, x_data, y_data, sigma=yerr_data, absolute_sigma=True)

In [None]:
plt.errorbar(x_data, y_data, yerr=yerr_data, fmt="ko")
plt.plot(x_data, f_linear(x_data, *pfit))

That doesn't look great. How can we quantify the quality of this fit? We look at our $\chi^2$ statistic:

In [None]:
def f_chi2(f, params, x, y, yerr):
    return (((f(x, *params) - y) / yerr) ** 2).sum()

In [None]:
f_chi2(f_linear, pfit, x_data, y_data, yerr_data)

Reminder: As a rule of thumb, the number of degrees of freedom "ndf" (number of data points - number of parameters) should be roughly equal to the $\chi^2$ statistic.

In [None]:
len(x_data) - len(pfit)

So this rule of thumb already indicates this is not a very nice fit. We can be even more quantitative. This rule comes from the fact that the $\chi^2$ statistic actually follows a [Chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution) (which has ndf as expectation value) if we assume the data points follow a normal distribution. Using this we can calculate a p-value that answers the question "how often would we get such a high value of $\chi^2$ in repeated experiments, given that our function describes the data".

SciPy provides many common probability distributions in `scipy.stats`, among them the chi-squared distribution. What we want to calculate is

$$p = \int\limits_{\chi^2_\mathrm{min}}^{\infty}f(\chi^2, \mathrm{ndf})\mathrm{d}\chi^2 = 1 - F(\chi^2_\mathrm{min}, \mathrm{ndf})$$

where $F(\chi^2_\mathrm{min}, \mathrm{ndf})$ is the cumulative distribution function of a chi-squared distribution which we can calculate using `scipy.stats.chi2.cdf`:

In [None]:
import scipy.stats

In [None]:
def chi2_pvalue(chi2, ndf):
    return 1 - scipy.stats.chi2.cdf(chi2, ndf)

In [None]:
chi2_pvalue(
    f_chi2(f_linear, pfit, x_data, y_data, yerr_data),
    len(x_data) - len(pfit)
)

which is rather low, again indicating a bad fit!
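As a side note, `scipy.stats.chi2.sf` (the survival function) evaluates $1 - F$ directly and may be preferable for very small p-values. The sketch below uses hypothetical values for $\chi^2$ and ndf, not the result of the fit above.

```python
import scipy.stats

chi2_demo, ndf_demo = 20.0, 7  # hypothetical values, not the fit result

p_from_cdf = 1 - scipy.stats.chi2.cdf(chi2_demo, ndf_demo)
p_from_sf = scipy.stats.chi2.sf(chi2_demo, ndf_demo)
print(p_from_cdf, p_from_sf)  # both approaches agree here

# For a very large chi2 the subtraction 1 - cdf loses all precision,
# while the survival function still returns a tiny positive number
p_large_cdf = 1 - scipy.stats.chi2.cdf(200.0, ndf_demo)
p_large_sf = scipy.stats.chi2.sf(200.0, ndf_demo)
print(p_large_cdf, p_large_sf)
```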

<div class="alert alert-block alert-success">
    <b>Exercise</b> Fit a quadratic function to the data. What is $\chi^2 / \mathrm{ndf}$ now? What is the p-value?
</div>