# TO DO

- identify discrepancy between ```autograd``` and regular implementation

# Introduction

This notebook will focus on the concepts and fundamentals of $\chi^2$ minimization. For an in-depth review see [J Dobaczewski et al 2014 J. Phys. G: Nucl. Part. Phys. 41 074001](https://iopscience.iop.org/article/10.1088/0954-3899/41/7/074001).

$\chi^2$ minimization is a specific case of an optimization problem. Such problems define an objective (or penalty) function of the model parameters and search for the parameter values that minimize it. Here the objective is $\chi^2$, a dimensionless measure of the disagreement between a theoretical model and experimental data, so its minimum identifies the parameters that best reproduce the data. The $\chi^2$ function is
$$
\chi^2(\vec{p}) = \sum_{i=1}^{N_d} \frac{\left(O_i(\vec{p}) - O_{i}^{\rm exp}\right)^2}{\Delta O_i^2}
$$
where $\vec{p}$ is a vector containing the parameters, e.g. $\vec{p}=(p_0, p_1,...,p_n)$ for distinct parameters $p_i$, $O_i(\vec{p})$ is the $i$-th model or theoretically calculated value, $O_i^{\rm exp}$ is the $i$-th experimental data point, and $\Delta O_i$ is the $i$-th adopted error.

Generally these (penalty) functions can be chosen or tailored to your specific problem. In many cases, using $\chi^2$ is a decent starting point. We will elaborate more on the inner workings of the process as we go, but there are a few general ideas to keep in mind:

1. What type of data do you have? In some cases, it might be better to consider different forms of the data, like taking a log for data which varies by orders of magnitude.

2. The error term $\Delta O_i$ is not *only* the experimental error, but a sum of $\Delta O_i^2 = (\Delta O_i^{\rm exp})^2 + (\Delta O_i^{\rm num})^2 + (\Delta O_i^{\rm the})^2$ for the experimental, numerical, and theoretical errors respectively.

3. When we refer to "residuals", we mean the differences between the experimental and calculated theoretical values.

4. Because $\chi^2$ is dimensionless, we can combine different kinds of observables across the data points $i$: each squared difference between theory and experiment $\left(O_i(\vec{p}) - O_{i}^{\rm exp}\right)^2$ is divided by $\Delta O_i^2$, which carries the same units, so the units cancel.

5. The adopted error $\Delta O_i$ can be somewhat arbitrary because it depends on user input. Regardless, you can think of it as setting a weight $W_i = 1 / \Delta O_i^2$ on each squared residual.
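
To make the definition concrete, here is a minimal sketch that evaluates $\chi^2$ for a handful of made-up numbers. The arrays and the helper name `chi_square_value` are illustrative assumptions and are not part of the course dataset. The error contributions are combined in quadrature as described in point 2.

```python
import numpy as np

def chi_square_value(model, data, exp_error, num_error=0.0, the_error=0.0):
    """Evaluate chi^2 from model values, data points, and errors (hypothetical helper)."""
    # Combine the error contributions in quadrature
    total_error_sq = exp_error**2 + num_error**2 + the_error**2
    # Sum the squared residuals, each weighted by 1 / total_error^2
    return np.sum((model - data)**2 / total_error_sq)

# Made-up numbers purely for illustration
toy_model = np.array([1.0, 2.0, 3.0])
toy_data = np.array([1.5, 2.0, 2.0])
toy_error = np.array([0.5, 0.5, 1.0])

toy_chi_square = chi_square_value(toy_model, toy_data, toy_error)
print(toy_chi_square)
```

In this toy case the first and third points each sit exactly one error bar away from the model and the second point agrees perfectly, so the three terms are 1, 0, and 1.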

Before we get started, we'll import some (hopefully) familiar packages that we'll use a lot later.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Getting the dataset

We assume that you are following along with or participating in the Nuclear Structure Course (PHY981) offered at Michigan State University or a similar course at your current institution. If you are not, a previous assignment titled "introduction_to_reading_data_and_plotting.ipynb" focused on generating the data set we will be using. We recommend you take a look at that notebook now or at minimum the "Assignment" section.

Now, we can read the data from the file we created in the previous assignment (should be a file named ```assignment_output_for_reading_data_and_plotting.csv```).

In [None]:
dataset = pd.read_csv('assignment_output_for_reading_data_and_plotting.csv')
display(dataset)

Let's take a look at our data now using Matplotlib.

In [None]:
# Use the same code as before...
fig, ax = plt.subplots(figsize=[5,5]) # Figure and axis subplot panel respectively

# We will plot data directly onto the axis panel
ax.errorbar(
    dataset['x'], dataset['y'], yerr=2*dataset['error'], # plot x, y, and error data
    fmt='o', capsize=7, # set the format to be circular data points and error bar width size
    label='Data' # Note that we now include a label to go with our legend later
)

# Set the x and y-labels for the plot
ax.set_xlabel('x',fontsize=16)
ax.set_ylabel('y',fontsize=16)

# Set x and y limits
ax.set_xlim(0, 1)
ax.set_ylim(-2,4)

# Set title
ax.set_title('Example data plot', fontsize=16)

# Include a legend on the given axis to the top left quadrant
ax.legend(loc='upper left')

# Setup and Calculate $\chi^2$ minimization

In the following we will create our $\chi^2$ Python function and use the package ```scipy```, which provides many useful functions such as ```optimize.minimize``` for minimizing a given function. Before we go any further, we'll make sure it is set up using the cells below.

In [None]:
!pip3 install scipy

In [5]:
from scipy.optimize import minimize

## Picking the model

In the introduction, we hinted at a key idea which we will expand upon now. All of this minimization discussion is centered around a model we pick. Let's imagine we don't know the form of the random data above and are left to come up with our own model. It seems to have $y$ increase proportionally with $x$ so perhaps a good model for this would be a linear one like
$$
y = mx + b.
$$
Of course, if we knew something about the data, maybe we would decide a linear one is not the best and pick another. For now, we'll assume we have no underlying knowledge about the data form.

In [6]:
def linear_model(x, m, b):
    '''
    Function returning the calculated value of a simple line. Given slope m and intercept b, calculates and returns the linear value y.
    '''
    return m * x + b

Now that we have a function, we will give an example of its use in relation to the data. Since we've picked a model based on a presumed form of the data (that it seems linear), we might want to see if this holds up. For illustration's sake, we pick an arbitrary set of parameters below.

In [None]:
# Set initial parameters for m and b respectively
initial_parameters = [2, -0.25]

# Calculate the value for our model given the initial parameters
modeled_y = linear_model(dataset['x'], initial_parameters[0], initial_parameters[1])

# Use the same code as in the "introduction_to_reading_data_and_plotting.ipynb" notebook...
fig, ax = plt.subplots(figsize=[5,5]) # Figure and axis subplot panel respectively

# We will plot data directly onto the axis panel
ax.errorbar(
    dataset['x'], dataset['y'], yerr=2*dataset['error'], # plot x, y, and error data
    fmt='o', capsize=7, # set the format to be circular data points and error bar width size
    label='Data' # Note that we now include a label to go with our legend later
)

# Now we can create any arbitrary line and plot it
ax.plot(dataset['x'], modeled_y, label='Initial Guess')

# Set the x and y-labels for the plot
ax.set_xlabel('x',fontsize=16)
ax.set_ylabel('y',fontsize=16)

# Set x and y limits
ax.set_xlim(0, 1)
ax.set_ylim(-2,4)

# Set title
ax.set_title('Example data plot', fontsize=16)

# Include a legend on the given axis to the top left quadrant
ax.legend(loc='upper left')

At a quick glance, this doesn't look too bad, but it certainly isn't a systematic or rigorous way to pick the parameters. Let's now follow our procedure to get the optimized parameters.

## Define our $\chi^2$ function

We will define our $\chi^2$ python function below and aim to keep it as general as possible.

In [8]:
# Create function for linear chi^2
def linear_chi_square(parameters, data_frame, theoretical_error_type):
    m, b = parameters
    
    # We will get the column data as numpy arrays to more easily handle the data
    experimental_x = data_frame['x'].to_numpy()
    experimental_y = data_frame['y'].to_numpy()
    experimental_error = data_frame['error'].to_numpy()

    # Calculate the theoretical values
    theory_y = linear_model(experimental_x, m, b)

    # Calculate the residuals
    residuals = experimental_y - theory_y

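    # Calculate the total error from the experimental and theoretical contributions
    # (no numerical error is included). Note that the two contributions are added
    # linearly here, not in quadrature.
    # Depending on our preference, we can either use no theoretical error or a simple RMSE
    if theoretical_error_type.lower() == 'none':
        theoretical_error = 0
    elif theoretical_error_type.lower() == 'rmse':
        theoretical_error = np.sqrt(np.mean(residuals**2)) # Use RMSE to estimate theoretical error
    else:
        # Fail early instead of hitting an undefined variable further down
        raise ValueError("theoretical_error_type must be 'none' or 'rmse'")
    
    # calculate total error
    total_error = experimental_error + theoretical_error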

    # Calculate the chi^2
    chi_square = (residuals**2) / (total_error**2)

    # Return sum over all chi^2 values
    return np.sum(chi_square)

# Create a function which will also calculate the normalized linear function chi^2
def normalized_linear_chi_square(parameters, data_frame, theoretical_error_type):
    chi_square = linear_chi_square(parameters, data_frame, theoretical_error_type)
    n_parameters = len(parameters)
    n_data = len(data_frame['y'])
    return chi_square / (n_data - n_parameters)

Now that we have our $\chi^2$ function, we can minimize it given our initial parameter guess. Recall that the general form of the total error is
$$
\Delta O_i^2 = (\Delta O_i^{\rm exp})^2 + (\Delta O_i^{\rm num})^2 + (\Delta O_i^{\rm the})^2
$$
and that in the function above we used
\begin{align}
\Delta O_i^{\rm exp} &= {\rm experimental\; values} \nonumber \\
\Delta O_i^{\rm num} &= 0 \nonumber \\
\Delta O_i^{\rm the} &= \sqrt{\frac{1}{N_d}\sum_{j=1}^{N_d}(O_j(\vec{p}) - O_{j}^{\rm exp})^2}. \nonumber 
\end{align}
Note that the function above adds the experimental and theoretical errors linearly before squaring, rather than in quadrature as in the general expression. The choice of theoretical uncertainties can be modified if desired.

## Minimize $\chi^2$

Now that we have our $\chi^2$ python function, we can use ```scipy.optimize.minimize``` to minimize it given some initial conditions.

In [None]:
# Perform minimization
minimized_result = minimize(linear_chi_square, initial_parameters, args=(dataset, 'RMSE'))

# Let's look at the optimized parameters
optimized_parameters = minimized_result.x

# We will print only up to the first 3 decimals
print('Optimized m = {:.3f}'.format(optimized_parameters[0]))
print('Optimized b = {:.3f}'.format(optimized_parameters[1]))

# We can also print the minimized chi^2 value
minimum_chi_square = minimized_result.fun

print("Minimum chi^2 value = {:.3f}".format(minimum_chi_square))
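
Before trusting the optimized parameters, it can help to check what the optimizer itself reports. The sketch below does this on synthetic, noise-free data lying on the line $y = 2x + 1$. The data and the names `x_toy`, `y_toy`, `error_toy`, and `toy_chi_square` are assumptions made for illustration, and the errors are held fixed so that the minimum is known in advance.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic, noise-free data on the line y = 2x + 1 (illustrative assumption)
x_toy = np.linspace(0.0, 1.0, 11)
y_toy = 2.0 * x_toy + 1.0
error_toy = np.full_like(x_toy, 0.5)

def toy_chi_square(parameters):
    # Chi^2 of a straight line with constant, parameter-independent errors
    m, b = parameters
    return np.sum((m * x_toy + b - y_toy)**2 / error_toy**2)

toy_result = minimize(toy_chi_square, x0=[0.0, 0.0])

# The returned OptimizeResult carries convergence information alongside the solution
print('Converged:', toy_result.success)
print('Message:', toy_result.message)
print('Parameters:', toy_result.x)
print('Minimum chi^2:', toy_result.fun)
```

The same `success` and `message` attributes are available on `minimized_result` and are worth inspecting whenever a fit looks suspicious.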

Let's now look at our minimized result!

In [None]:
# Set an analytical x domain beyond our dataset just so we can get a feel for how it may extrapolate
x = np.linspace(0, 1.5, 100)

# Calculate the value for our model given the optimized parameters
modeled_y = linear_model(x, optimized_parameters[0], optimized_parameters[1])

# Use the same code as in the "introduction_to_reading_data_and_plotting.ipynb" notebook...
fig, ax = plt.subplots(figsize=[5,5]) # Figure and axis subplot panel respectively

# We will plot data directly onto the axis panel
ax.errorbar(
    dataset['x'], dataset['y'], yerr=2*dataset['error'], # plot x, y, and error data
    fmt='o', capsize=7, # set the format to be circular data points and error bar width size
    label='Data' # Note that we now include a label to go with our legend later
)

# Now we can create any arbitrary line and plot it
ax.plot(x, modeled_y, label='Optimized Parameters')

# Set the x and y-labels for the plot
ax.set_xlabel('x',fontsize=16)
ax.set_ylabel('y',fontsize=16)

# Set x and y limits
# ax.set_xlim(0, 1)
ax.set_ylim(-2,4)

# Set title
ax.set_title('Example data plot', fontsize=16)

# Include a legend on the given axis to the top left quadrant
ax.legend(loc='upper left')

Ultimately, this doesn't look much more impressive compared to our original guess, so why bother with a process like this? Well, the point of minimization isn't only to get a nice fitting line to our data, but also to be able to say something more concrete about why it's a good calibration! We can also see how $\chi^2$ changes with varying values of $m$ and $b$ in the contour plot below. This is just to highlight the minimum, and statistical considerations will be elaborated on more below.

In [None]:
# Define the mesh grid and parameter range for m and b
m_range = np.linspace(-5, 5, 100)  # Range for slope m
b_range = np.linspace(-5, 5, 100)  # Range for intercept b

# Meshgrid will make a 2D array 
M, B = np.meshgrid(m_range, b_range)

# Calculate chi-squared for each combination of m and b
chi_squared_values = np.zeros_like(M)

for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        params = (M[i, j], B[i, j])
        chi_squared_values[i, j] = linear_chi_square(params, dataset, theoretical_error_type='rmse')

# Plot the filled contour with the minimum point highlighted
plt.figure(figsize=(10, 8))
contour = plt.contourf(M, B, chi_squared_values, levels=50, cmap='viridis')
plt.colorbar(contour, label=r'$\chi^2$')

# Highlight the minimum point
plt.plot(optimized_parameters[0], optimized_parameters[1], marker='*', color='gold', markersize=15, label='Minimum $\\chi^2$')

# Add labels and title
plt.title(r'$\chi^2$ Contour Plot with Minimum Highlighted', fontsize=16)
plt.xlabel(r'Slope $m$', fontsize=14)
plt.ylabel(r'Intercept $b$', fontsize=14)
plt.legend(fontsize=12)
plt.grid(alpha=0.3)
plt.show()
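
The double loop above is easy to read but slow for fine grids. When the errors do not depend on $m$ and $b$, the same surface can be computed in one step with ```numpy``` broadcasting. The sketch below uses made-up data and fixed errors (an assumption; it does not reproduce the RMSE option of `linear_chi_square`), and the names `x_toy`, `y_toy`, and `error_toy` are illustrative.

```python
import numpy as np

# Made-up data on the line y = 2x + 1 with fixed errors (illustrative assumption)
x_toy = np.array([0.0, 0.5, 1.0])
y_toy = np.array([1.0, 2.0, 3.0])
error_toy = np.array([0.2, 0.2, 0.2])

m_grid, b_grid = np.meshgrid(np.linspace(-5, 5, 101), np.linspace(-5, 5, 101))

# Add a trailing axis so every (m, b) pair broadcasts against every data point
model = m_grid[..., np.newaxis] * x_toy + b_grid[..., np.newaxis]
chi_square_grid = np.sum((model - y_toy)**2 / error_toy**2, axis=-1)

# Locate the grid point with the smallest chi^2
i_min, j_min = np.unravel_index(np.argmin(chi_square_grid), chi_square_grid.shape)
print('Grid minimum at m =', m_grid[i_min, j_min], 'and b =', b_grid[i_min, j_min])
```

The grid minimum is only as precise as the grid spacing, so it is a visual aid rather than a replacement for the minimizer.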


## Statistical Considerations

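As we discussed above, there are additional tools we can use to make more helpful claims about the fit we just made. One of the first things we'll need is the Hessian matrix, the matrix of second derivatives of $\chi^2$ with respect to the parameters. It describes the curvature of $\chi^2$ around the optimized parameters, and a positive-definite Hessian confirms that we have found a local minimum.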

### Calculating the Hessian

The Hessian can be calculated (as shown in [J Dobaczewski et al 2014 J. Phys. G: Nucl. Part. Phys. 41 074001](https://iopscience.iop.org/article/10.1088/0954-3899/41/7/074001)) by
$$
\mathcal{M}_{\alpha \beta} = \frac{1}{2} \partial_\alpha \partial_\beta \chi^2 |_{\vec{p}_0}
$$
where the partial derivatives are evaluated at the optimized parameter set $\vec{p}_0$. In the case of our linear form, our equation is
$$
\chi^2 = \sum_i \frac{\left(mx_i + b - O_i^{\rm exp}\right)^2}{\Delta O_i^2}
$$
for which we can generate the Hessian
$$
\mathcal{M} = \frac{1}{2} \begin{bmatrix}
\frac{\partial^2}{\partial m^2}\chi^2 & \frac{\partial^2}{\partial m \partial b}\chi^2 \\
\frac{\partial^2}{\partial b \partial m}\chi^2 & \frac{\partial^2}{\partial b^2}\chi^2
\end{bmatrix} = \frac{1}{2} \begin{bmatrix}
\sum_i \frac{2x_i^2}{\Delta O_i^2} & \sum_i \frac{2x_i}{\Delta O_i^2} \\
\sum_i \frac{2x_i}{\Delta O_i^2} & \sum_i \frac{2}{\Delta O_i^2}
\end{bmatrix} = \begin{bmatrix}
\sum_i \frac{x_i^2}{\Delta O_i^2} & \sum_i \frac{x_i}{\Delta O_i^2} \\
\sum_i \frac{x_i}{\Delta O_i^2} & \sum_i \frac{1}{\Delta O_i^2}
\end{bmatrix}.
$$
This can be done analytically as shown and there are other packages like [```autograd```](https://github.com/HIPS/autograd/tree/7c22772cb42455c00dac964c5363242e62529ed6) which can do it for you automatically. We also show how one can do it with ```Sympy``` (a symbolic package) later in the "Extras" section.

The important thing is that we first calculate our Hessian to then get the Covariance Matrix. We use the derived form above for the Hessian and implement the calculations via ```numpy``` array operations.

In [None]:
# Calculate the RMSE for the theoretical error like we did in linear_chi_square (or set to zero if you decided not to do this)
rmse = np.sqrt(np.mean(
    (dataset['y'] - linear_model(dataset['x'], optimized_parameters[0], optimized_parameters[1]))**2
)) # Use RMSE to estimate theoretical error

# Alternatively, comment out the rmse value above and uncomment the line below to set it to zero
# rmse = 0

# Create Hessian matrix first as a 2x2 array of zeros
hessian_matrix = np.zeros((2,2), dtype=float)

# Calculate first diagonal term i.e. partial d^2 / dm^2
hessian_matrix[0,0] = np.sum(dataset['x']**2 / (dataset['error'] + rmse)**2)

# Calculate the off-diagonals
hessian_matrix[0,1] = np.sum(dataset['x'] / (dataset['error'] + rmse)**2)
hessian_matrix[1,0] = np.sum(dataset['x'] / (dataset['error'] + rmse)**2)

# Calculate the second diagonal i.e. partial d^2 / db^2
hessian_matrix[1,1] = np.sum(1 / (dataset['error'] + rmse)**2)

# Show our calculated Hessian
print('Analytically calculated Hessian =')
display(hessian_matrix)
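
A quick way to gain confidence in an analytical Hessian is to compare it against a finite-difference estimate. The sketch below does this for a toy straight-line $\chi^2$ with fixed errors. The data and the helper `finite_difference_hessian` are hypothetical and written only for this check; the factor of $1/2$ follows the definition of $\mathcal{M}$ used above.

```python
import numpy as np

# Made-up data with fixed errors (illustrative assumption)
x_toy = np.array([0.0, 0.5, 1.0])
y_toy = np.array([1.1, 1.9, 3.2])
error_toy = np.array([0.2, 0.3, 0.2])

def toy_chi_square(parameters):
    m, b = parameters
    return np.sum((m * x_toy + b - y_toy)**2 / error_toy**2)

def finite_difference_hessian(func, point, step=1e-3):
    """Central-difference estimate of (1/2) d^2 func / dp_a dp_c (hypothetical helper)."""
    point = np.asarray(point, dtype=float)
    n = point.size
    second_derivatives = np.zeros((n, n))
    for a in range(n):
        for c in range(n):
            shift_a = np.zeros(n)
            shift_c = np.zeros(n)
            shift_a[a] = step
            shift_c[c] = step
            second_derivatives[a, c] = (
                func(point + shift_a + shift_c) - func(point + shift_a - shift_c)
                - func(point - shift_a + shift_c) + func(point - shift_a - shift_c)
            ) / (4 * step**2)
    return 0.5 * second_derivatives

# Analytical Hessian for the straight line, using the sums derived above
analytic = np.array([
    [np.sum(x_toy**2 / error_toy**2), np.sum(x_toy / error_toy**2)],
    [np.sum(x_toy / error_toy**2), np.sum(1 / error_toy**2)],
])
numeric = finite_difference_hessian(toy_chi_square, [2.0, 1.0])
print(np.allclose(analytic, numeric, rtol=1e-6))
```

Because this toy $\chi^2$ is exactly quadratic in $m$ and $b$, the Hessian is the same at every point, so the evaluation point `[2.0, 1.0]` does not need to be the minimum.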

### Calculating the Covariance Matrix

Now that we have our Hessian $\mathcal{M}$ we can calculate the covariance matrix $\mathcal{C}$ via
$$
\mathcal{C} = s \mathcal{M}^{-1}
$$
where $s$ is
$$
s = \frac{\chi^2(\vec{p}_0)}{N_d - N_p}.
$$
$N_d$ and $N_p$ are the number of observables and the number of parameters, respectively. The analytical inverse of the Hessian is
$$
\mathcal{M}^{-1} = \frac{1}{\left(\sum_i \frac{x_i^2}{\Delta O_i^2}\right) \left(\sum_i \frac{1}{\Delta O_i^2}\right) - \left(\sum_i \frac{x_i}{\Delta O_i^2}\right) \left(\sum_i \frac{x_i}{\Delta O_i^2}\right)}
\begin{bmatrix}
\sum_i \frac{1}{\Delta O_i^2} & -\sum_i \frac{x_i}{\Delta O_i^2} \\
-\sum_i \frac{x_i}{\Delta O_i^2} & \sum_i \frac{x_i^2}{\Delta O_i^2}
\end{bmatrix}
$$
where we have used the analytical form of a $2\times 2$ inverse
$$
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}^{-1} = \frac{1}{ad - bc}
\begin{bmatrix}
d & -b \\
-c & a
\end{bmatrix}.
$$

We calculate the inverse manually below using the analytical form.

In [None]:
# Calculate the determinant of the Hessian (serves as our ad - bc value)
hessian_det = np.linalg.det(hessian_matrix)

# Create a zeros array to store the hessian inverse
hessian_inv = np.zeros_like(hessian_matrix)

# Set the values of the inverse according to the hessian locations
hessian_inv[0,0] = hessian_matrix[1,1]
hessian_inv[0,1] = -hessian_matrix[0,1]
hessian_inv[1,0] = -hessian_matrix[1,0]
hessian_inv[1,1] = hessian_matrix[0,0]

# Scale by dividing by determinant
hessian_inv = hessian_inv / hessian_det

print('Analytical inverse of the Hessian =')
display(hessian_inv)

One can also calculate the inverse of a matrix numerically via ```numpy``` using the ```linalg.inv``` function. We note that ```linalg``` contains very helpful linear algebra functions which are fast and fairly user-friendly, so try to use these instead of coding your own!

In [None]:
print('Numerical inverse of the Hessian =')
display(np.linalg.inv(hessian_matrix))

Now that we have the inverse of our Hessian, we can calculate our covariance matrix below.

In [None]:
# Calculate s or the normalized chi^2 using the optimal values
covariance_s = normalized_linear_chi_square(optimized_parameters, dataset, 'RMSE')
print('Value of s =', covariance_s)

# Calculate covariance matrix
covariance_matrix = covariance_s * hessian_inv

print('Covariance matrix =')
display(covariance_matrix)
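
As an independent cross-check of $\mathcal{C} = s \mathcal{M}^{-1}$, one can compare against a weighted straight-line fit from ```numpy```. The sketch below assumes fixed experimental errors and a recent NumPy release in which `np.polyfit(..., cov=True)` scales the covariance by $\chi^2 / (N_d - N_p)$. The data and variable names are made up for illustration.

```python
import numpy as np

# Made-up data with fixed experimental errors (illustrative assumption)
x_toy = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y_toy = np.array([0.9, 1.6, 2.1, 2.4, 3.1])
error_toy = np.array([0.1, 0.2, 0.1, 0.2, 0.1])

# Weighted straight-line fit; polyfit expects weights 1/sigma, not 1/sigma^2
coefficients, cov_polyfit = np.polyfit(x_toy, y_toy, 1, w=1 / error_toy, cov=True)
m_fit, b_fit = coefficients

# Build the Hessian M, the scale s, and the covariance C = s M^{-1} by hand
weights = 1 / error_toy**2
hessian_toy = np.array([
    [np.sum(weights * x_toy**2), np.sum(weights * x_toy)],
    [np.sum(weights * x_toy), np.sum(weights)],
])
chi_square_min = np.sum(weights * (m_fit * x_toy + b_fit - y_toy)**2)
s_toy = chi_square_min / (len(x_toy) - 2)
cov_manual = s_toy * np.linalg.inv(hessian_toy)

print(np.allclose(cov_manual, cov_polyfit))
```

The square roots of the diagonal of either matrix are the one-standard-deviation uncertainties on $m$ and $b$ for this toy fit.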

## Correlation Analysis

Now that we have our covariance matrix, we can analyze the parameters of our model (our straight line). A correlation coefficient tells us how strongly two quantities vary together and can highlight underlying trends in the parameterization. To extract these coefficients, we divide each element of the covariance matrix by the product of the two corresponding standard deviations (the square roots of the diagonal elements of the covariance matrix). We end up with
$$
c_{AB} = \frac{|\overline{\Delta A \Delta B}|}{\sqrt{\overline{\Delta A^2}} \sqrt{\overline{\Delta B^2}}}
$$
or, if you prefer to consider it as a matrix of coefficients in our specific (linear) case,
$$
c = \begin{bmatrix}
\frac{|\mathcal{C}_{mm}|}{\sqrt{\mathcal{C}_{mm}} \sqrt{\mathcal{C}_{mm}}} & \frac{|\mathcal{C}_{mb}|}{\sqrt{\mathcal{C}_{mm}} \sqrt{\mathcal{C}_{bb}}} \\
\frac{|\mathcal{C}_{bm}|}{\sqrt{\mathcal{C}_{bb}} \sqrt{\mathcal{C}_{mm}}} & \frac{|\mathcal{C}_{bb}|}{\sqrt{\mathcal{C}_{bb}} \sqrt{\mathcal{C}_{bb}}}
\end{bmatrix}.
$$

In [16]:
# Get the standard deviations of our covariance matrix parameters along the diagonal
standard_dev = np.sqrt(np.diag(covariance_matrix))

Since we need to divide by the standard deviations element-wise in the array, we can do a fancy trick! ```numpy```'s "outer" function will take two one-dimensional arrays and multiply them as an outer product to make a 2D array for us! For example, we can make the above denominator by doing the outer product
$$
\xi = \begin{bmatrix}
\sqrt{\mathcal{C}_{mm}} \\
\sqrt{\mathcal{C}_{bb}}
\end{bmatrix} \quad {\rm then} \quad
\xi \xi^T = \begin{bmatrix}
\sqrt{\mathcal{C}_{mm}} \sqrt{\mathcal{C}_{mm}} & \sqrt{\mathcal{C}_{mm}} \sqrt{\mathcal{C}_{bb}} \\
\sqrt{\mathcal{C}_{bb}} \sqrt{\mathcal{C}_{mm}} & \sqrt{\mathcal{C}_{bb}} \sqrt{\mathcal{C}_{bb}}
\end{bmatrix}.
$$
This can be implemented in ```numpy``` as shown below.

In [None]:
# Calculate correlation coefficient matrix using an outer product
correlation_coefficients = np.abs(covariance_matrix) / np.outer(standard_dev, standard_dev)

print('Correlation coefficient matrix =')
display(correlation_coefficients)

The two extremes of the correlation coefficient are $c=1$ and $c=0$, where the parameters are completely correlated and uncorrelated, respectively. In our case, the diagonal elements are exactly 1. This is a great sanity check, since each diagonal element compares a parameter to itself, so it should be 1! The off-diagonal elements are still close to 1, so $m$ and $b$ also appear quite correlated.
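
For a worked number, the sketch below applies the same outer-product recipe to a small made-up covariance matrix. The helper name `correlation_from_covariance` and the matrix values are assumptions chosen so that the result is easy to verify by hand.

```python
import numpy as np

def correlation_from_covariance(covariance):
    """Return |C_ij| / (sigma_i * sigma_j) for a covariance matrix (hypothetical helper)."""
    covariance = np.asarray(covariance, dtype=float)
    # Standard deviations are the square roots of the diagonal
    sigma = np.sqrt(np.diag(covariance))
    return np.abs(covariance) / np.outer(sigma, sigma)

# Made-up covariance matrix: variances 4 and 9, covariance -2
toy_covariance = np.array([[4.0, -2.0],
                           [-2.0, 9.0]])
toy_correlation = correlation_from_covariance(toy_covariance)
print(toy_correlation)
```

Here the off-diagonal element is $2 / (2 \times 3) = 1/3$. The absolute value discards the sign, so the fact that this made-up covariance is negative is no longer visible in the result.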

# Assignment

Now that we've shown how to do $\chi^2$ minimization for a line, we ask that you do the same process for the following function:
$$
y = ax^2 + bx + c
$$
Using the same dataset, please:
1. Write a new python function for the above quadratic equation and a $\chi^2$ function to be optimized (similar to how we did ```linear_model``` and ```linear_chi_square```) and minimize this function to get optimal parameters.

2. Plot the optimized model on the same plot as the dataset. (Optional) Make a contour surface plot to highlight the minimum location.

3. Calculate the Hessian and covariance matrices.

4. Determine the correlation coefficients for this new model.

In [18]:
# Define your own quadratic model

# Define your own quadratic chi^2 model


In [19]:
# Optimize your chi^2 model

In [20]:
# Perform statistical analysis below

# Extras

## Using ```sympy```

We show below how you can use ```sympy``` to perform the analytical calculations similar to how one might use other software like Mathematica. ```sympy``` can be convenient and has many helpful functions that other packages may lack. That being said, it can be quite slow at times and should be used with caution.

In [None]:
!pip3 install sympy

Unlike other packages, sympy requires you to import things as needed like Matrices and variable names. These can always be modified afterwards, but you still need to import them like other functions.

In [22]:
from sympy import ordered, Matrix
from sympy import hessian as sympy_hessian

# We import the symbols we will use in our equation
from sympy.abc import m, b, p, x, y

If we wanted to keep things extra simple, we can just consider a case where we drop the sum in our $\chi^2$ function and just look at one term
$$
\frac{(mx_i + b - y_i)^2}{\Delta O_i^2}.
$$
We use this in the following ```sympy``` operations.

In [None]:
# Define our chi^2 equation where y is the experimental data and p is a placeholder for our error
eq = (m*x + b - y)**2 / p**2

# Calling the equation name in Jupyter will print it in a nice LaTeX format
eq

In [24]:
v = [m, b] # Set the two variables we wish to differentiate

In [25]:
# Set a function to calculate the Jacobian which is used to calculate the Hessian
gradient = lambda f, v: Matrix([f]).jacobian(v)

In [None]:
# print our Jacobian
gradient(eq, v)

In [None]:
# We can then print the corresponding Sympy-calculated Hessian
sympy_hessian(eq, v)

In [None]:
# We could also try to calculate the inverse of the above Hessian
sympy_hessian(eq, v).inv()

Well, that error doesn't look good, so how do we reconcile the discrepancy between our analytical form and the ```sympy``` version? Thankfully, the answer is simple: by only considering one term, we are not inverting the correct matrix! The zero-determinant error follows directly from our $2\times 2$ matrix inversion formula from above, since
$$
\begin{bmatrix}
\frac{2x^2}{p^2} & \frac{2x}{p^2} \\
\frac{2x}{p^2} & \frac{2}{p^2}
\end{bmatrix}^{-1} = \frac{1}{\frac{2x^2}{p^2}\frac{2}{p^2} - \frac{2x}{p^2}\frac{2x}{p^2}}
\begin{bmatrix}
\frac{2}{p^2} & -\frac{2x}{p^2} \\
-\frac{2x}{p^2} & \frac{2x^2}{p^2}
\end{bmatrix} = \frac{1}{0}
\begin{bmatrix}
\frac{2}{p^2} & -\frac{2x}{p^2} \\
-\frac{2x}{p^2} & \frac{2x^2}{p^2}
\end{bmatrix}.
$$
If we recall that our actual analytical case is the multiplication of sums, and consider order of operations, we see that
$$
\left(\sum_i \frac{x_i^2}{\Delta O_i^2}\right) \left(\sum_i \frac{1}{\Delta O_i^2}\right) \neq \left(\sum_i \frac{x_i}{\Delta O_i^2}\right) \left(\sum_i \frac{x_i}{\Delta O_i^2}\right).
$$
This means that any shortcut taken in a ```sympy``` calculation, such as dropping the sum, must be checked carefully. We highlight this below by looping over the dataset and printing the single-term matrix for each data point; these are the matrices that must be summed to build the Hessian properly.

In [None]:
# We will loop over each row of the dataset and call that set of values ('y', 'x', and 'error')
for i in range(len(dataset)):
    print(sympy_hessian(eq, v).subs(# For each call of the hessian function, we will then substitute in the proper values for the 'x' and 'p' variable
        {x:dataset['x'].iloc[i], p:dataset['error'].iloc[i]+rmse}
    ))

In [None]:
temp = Matrix([[0,0],[0,0]])
for i in range(len(dataset)):
    temp += sympy_hessian(eq, v).subs(# For each call of the hessian function, we will then substitute in the proper values for the 'x' and 'p' variable
            {x:dataset['x'].iloc[i], p:dataset['error'].iloc[i]+rmse}
        )

calculated_sympy_hessian = 0.5 * temp
calculated_sympy_hessian

Now that we have properly summed over each term to construct the true Hessian, we can try to invert this matrix.

In [None]:
calculated_sympy_hessian.inv()

We see here that this total Hessian can now be inverted properly and matches with our analytical case!
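
The same point can be made symbolically with just two terms. The sketch below is an illustration under simplifying assumptions: both terms share a single error `p`, and the symbols `x1`, `x2`, `y1`, and `y2` are introduced here and are not used elsewhere in the notebook.

```python
from sympy import symbols, factor, simplify
from sympy import hessian as sympy_hessian

m, b, p = symbols('m b p')
x1, x2, y1, y2 = symbols('x1 x2 y1 y2')

# Two-term chi^2 with a shared error p (a simplifying assumption)
chi_square_two = ((m*x1 + b - y1)**2 + (m*x2 + b - y2)**2) / p**2

# Apply the factor of 1/2 from our definition of the Hessian
hessian_two = sympy_hessian(chi_square_two, [m, b]) / 2

# Factor the determinant to see when the matrix is invertible
determinant = factor(hessian_two.det())
print(determinant)
```

The determinant works out to $(x_1 - x_2)^2 / p^4$, which vanishes only when the two data points share the same $x$. With at least two distinct $x$ values the summed Hessian is therefore invertible, unlike the single-term matrix.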

## ERROR: Using ```autograd```

In [None]:
!pip3 install autograd

In [33]:
from autograd import hessian
# Note: this rebinds np to autograd's NumPy wrapper for the rest of the notebook
import autograd.numpy as np

In [None]:
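# autograd differentiates with respect to the first argument (the parameters)
linear_hessian = hessian(linear_chi_square)

# autograd returns the full matrix of second derivatives, so we include the
# factor of 1/2 from our definition of the Hessian before comparing
optimized_hessian = 0.5 * linear_hessian(optimized_parameters, dataset, 'RMSE')

# With 'RMSE' the theoretical error depends on m and b, so autograd also
# differentiates through it, while the analytic form treats the errors as constants.
# The two results are therefore not expected to agree exactly.
print('Autograd Hessian')
display(optimized_hessian)
display(np.linalg.inv(optimized_hessian))

print('Analytic Hessian')
display(hessian_matrix)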