### Phys 629: Statistical Tools for Physics Research
***Anuradha Gupta***

# Homework 5
### Due: Friday, Oct 6 at 11:59 pm CT

## Problem 1

This week we have only one problem worth 20 points. This problem uses a dataset in `/coursework/homeworks/hw_data/`.

1) Read in `hw5_data.npy`. This is a (50 x 2) numpy array, with measurements in the first column and uncertainties in the second column. Using the analytic results for heteroscedastic Gaussian data from lectures, compute the sample mean and the standard error on the sample mean for this data.

2) Reusing some approaches and tools from `lecture_11`, write a ln-likelihood function for heteroscedastic Gaussian data, and use it in a fitting algorithm to find the best-fit mean. *Remember that scipy optimizers are set up to minimize functions.*

3) Using the same numerical technique from `lecture_10`, compute the Fisher uncertainty estimate on the mean.

4) While we have fitted a heteroscedastic Gaussian to this data, let's try something else. Write some code to define a ln-likelihood for a Laplace distribution evaluated on this data. Fit simultaneously for the Laplace location parameter $\mu$ and scale parameter $\Delta$.

5) Compute the AIC values for the heteroscedastic Gaussian model and the Laplacian model. Which model is favored by the data?

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

data = np.load('../../homeworks/hw_data/hw5_data.npy')

# 1.1 Calculate the sample mean and standard error

In [3]:
values, sigma = data[:, 0], data[:, 1]

mu_hat = np.sum(values / sigma ** 2) / np.sum(1 / sigma ** 2)
sigma_mu = 1 / np.sqrt(np.sum(1 / sigma ** 2))

print(f"The maximum likelihood estimate of the mean is {mu_hat}")
print(f"The uncertainty on the mean is {sigma_mu}")

The maximum likelihood estimate of the mean is 3.9179920346060557
The uncertainty on the mean is 0.09481084100510956
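As a cross-check, the inverse-variance weighted mean $\hat{\mu} = \sum_i (x_i/\sigma_i^2) / \sum_i (1/\sigma_i^2)$ used above is what `np.average` returns when it is given weights $1/\sigma_i^2$. The sketch below shows the equivalence on made-up numbers; the arrays `x` and `err` are hypothetical and are not the homework data.

```python
import numpy as np

# Hypothetical measurements and uncertainties, for illustration only
x = np.array([3.8, 4.1, 4.4, 3.6])
err = np.array([0.5, 0.2, 0.4, 0.3])

weights = 1.0 / err**2

# Inverse-variance weighted mean, written out and via np.average
mean_manual = np.sum(weights * x) / np.sum(weights)
mean_numpy = np.average(x, weights=weights)

# Standard error on the weighted mean
std_err = 1.0 / np.sqrt(np.sum(weights))

print(mean_manual, mean_numpy, std_err)
```

The standard error on the weighted mean is always smaller than the smallest individual uncertainty, because every measurement adds to the total weight.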


# 1.2 Best fit for ln-likelihood function for heteroscedastic Gaussian

In [9]:
from scipy import optimize

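def ln_likelihood_gaussian(mu, values, sigma):
    # Despite its name, this returns the NEGATIVE ln-likelihood, because
    # scipy optimizers minimize. The 2*pi factor keeps the normalization
    # complete, so the value can be compared with other models via the AIC.
    sum_residuals_gaussian = np.sum((values - mu) ** 2 / sigma ** 2 + np.log(2 * np.pi * sigma ** 2))
    return 0.5 * sum_residuals_gaussian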

guess_gaussian = np.mean(values)
best_fit = optimize.minimize(ln_likelihood_gaussian, guess_gaussian, args=(values, sigma)).x[0]

print("The best-fit mean is", best_fit, "which is very close to the MLE of the mean given above.")


The best-fit mean is 3.9179920276168665 which is very close to the MLE of the mean given above.


# 1.3 Calculating the Fisher uncertainty matrix for the mean

In [11]:
def compute_FIM(mu, values, sigma):
    # For a Gaussian likelihood with known per-point uncertainties, the Fisher
    # information for the mean is the analytic second derivative of -ln(L),
    # sum(1 / sigma_i^2). It does not depend on mu or on the measured values.
    inv_square = 1 / np.square(sigma)
    fisher_information_matrix = np.sum(inv_square)
    return fisher_information_matrix

fisher_information_matrix = compute_FIM(best_fit, values, sigma)

# The uncertainty on the mean is the inverse square root of the Fisher information
sigma_mu = np.sqrt(1 / fisher_information_matrix)

print("The Fisher uncertainty estimate for the mean is:", sigma_mu)

The Fisher uncertainty estimate for the mean is: 0.09481084100510954
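The problem statement asks for a numerical estimate, whereas the cell above uses the analytic second derivative. A minimal sketch of one numerical route is given below: a central finite difference of the negative ln-likelihood around the best-fit mean. The exact technique used in `lecture_10` is not reproduced here, so the step size `h`, the helper names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def neg_ln_likelihood(mu, x, err):
    """Negative Gaussian ln-likelihood with per-point uncertainties."""
    return 0.5 * np.sum((x - mu) ** 2 / err**2 + np.log(2 * np.pi * err**2))

def fisher_sigma_numeric(mu_best, x, err, h=1e-3):
    """Uncertainty on mu from a central finite-difference second derivative."""
    f_plus = neg_ln_likelihood(mu_best + h, x, err)
    f_zero = neg_ln_likelihood(mu_best, x, err)
    f_minus = neg_ln_likelihood(mu_best - h, x, err)
    second_derivative = (f_plus - 2 * f_zero + f_minus) / h**2
    return 1.0 / np.sqrt(second_derivative)

# Synthetic heteroscedastic data (hypothetical, not the homework file)
rng = np.random.default_rng(0)
err = rng.uniform(0.2, 1.0, size=50)
x = rng.normal(4.0, err)

# Best-fit mean from the analytic weighted-mean formula
mu_best = np.sum(x / err**2) / np.sum(1 / err**2)

sigma_numeric = fisher_sigma_numeric(mu_best, x, err)
sigma_analytic = 1.0 / np.sqrt(np.sum(1 / err**2))
print(sigma_numeric, sigma_analytic)
```

Because the Gaussian negative ln-likelihood is exactly quadratic in $\mu$, the finite-difference result should agree with the analytic value up to floating-point roundoff.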


# 1.4 ln-likelihood for a Laplace distribution

In [14]:
def ln_likelihood_laplacian(params, values):
    # Returns the NEGATIVE ln-likelihood so that it can be passed to a minimizer
    mu, delta = params
    ln_likelihood = -len(values) * np.log(2 * delta) - np.sum(np.abs(values - mu) / delta)
    return -ln_likelihood

guess_laplacian = [np.mean(values), np.std(values)]

mu_mle, delta_mle = optimize.minimize(lambda params: ln_likelihood_laplacian(params, values), guess_laplacian).x

print("Maximum likelihood estimates for the Laplace location and scale parameters (respectively) are:", mu_mle, "and", delta_mle)


Maximum likelihood estimates for the Laplace location and scale parameters (respectively) are: 4.0859516335725 and 0.8822692389909652
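The Laplace fit can be checked against its closed-form maximum likelihood solution: the location estimate is the sample median and the scale estimate is the mean absolute deviation from that median. The sketch below compares the two on synthetic data. The data, the `Nelder-Mead` choice (a derivative-free method, used here because the absolute value is not smooth at the optimum), and the guard against non-positive `delta` are assumptions for illustration, not part of the original solution.

```python
import numpy as np
from scipy import optimize

def neg_ln_likelihood_laplace(params, x):
    """Negative Laplace ln-likelihood for location mu and scale delta."""
    mu, delta = params
    if delta <= 0:
        return np.inf  # the scale parameter must be positive
    return len(x) * np.log(2 * delta) + np.sum(np.abs(x - mu)) / delta

# Synthetic Laplace-distributed data (hypothetical, not the homework file);
# an odd sample size makes the median a single data point
rng = np.random.default_rng(1)
x = rng.laplace(loc=4.0, scale=0.9, size=51)

# Closed-form maximum likelihood estimates
mu_closed = np.median(x)
delta_closed = np.mean(np.abs(x - mu_closed))

# Numerical fit with a derivative-free optimizer
guess = [np.mean(x), np.std(x)]
result = optimize.minimize(neg_ln_likelihood_laplace, guess, args=(x,),
                           method="Nelder-Mead")
mu_fit, delta_fit = result.x

print(mu_closed, mu_fit)
print(delta_closed, delta_fit)
```

Applied to the homework data, the fitted location parameter should match `np.median(values)` if the optimizer has converged.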


# 1.5.1 Compute the AIC for each model

In [16]:
# Refit both models, this time keeping the full optimizer results
output_gaussian = optimize.minimize(ln_likelihood_gaussian, guess_gaussian, args=(values, sigma))
output_laplacian = optimize.minimize(ln_likelihood_laplacian, guess_laplacian, args=(values,))

# Number of fitted parameters in each model
params_gaussian = 1   # mu
params_laplacian = 2  # mu, delta

n_data = len(values)

# Both likelihood functions return -ln(L), so -2 ln(L_max) is twice the value at the minimum.
# The final term in each expression is the small-sample correction (AICc).
AIC_gaussian = 2 * ln_likelihood_gaussian(output_gaussian.x, values, sigma) + 2 * params_gaussian + (2 * params_gaussian * (params_gaussian + 1)) / (n_data - params_gaussian - 1)
AIC_laplacian = 2 * ln_likelihood_laplacian(output_laplacian.x, values) + 2 * params_laplacian + (2 * params_laplacian * (params_laplacian + 1)) / (n_data - params_laplacian - 1)

print("The AIC for Gaussian Model:", AIC_gaussian)
print("The AIC for Laplacian Model:", AIC_laplacian)

# 1.5.2 Which model is favored by the data?

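The model with the smaller AIC value is favored by the data. The Gaussian model was found to have the smaller AIC value, so it is taken as preferred. Two caveats apply. First, having fewer parameters only reduces the penalty term; the maximized likelihood usually dominates the comparison, so the parameter count alone does not confirm the result. Second, the comparison is only valid when both ln-likelihoods include every normalization constant: omitting the $\frac{N}{2}\ln(2\pi)$ term from the Gaussian ln-likelihood would lower its AIC by $N\ln(2\pi) \approx 92$ for $N = 50$, which can be enough to change the ranking.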
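The AIC expression used in 1.5.1 can be packaged as a small helper, which makes the role of each term easier to see. This is a sketch only: the function name `aicc` and the numerical inputs are hypothetical, chosen to illustrate the formula rather than to reproduce the homework result.

```python
def aicc(neg_ln_l_max, n_params, n_data):
    """Corrected AIC computed from the minimized negative ln-likelihood."""
    # Standard AIC: -2 ln(L_max) + 2k
    aic = 2.0 * neg_ln_l_max + 2.0 * n_params
    # Small-sample correction term
    correction = 2.0 * n_params * (n_params + 1) / (n_data - n_params - 1)
    return aic + correction

# Hypothetical minimized -ln(L) values, for illustration only
aicc_one_param = aicc(neg_ln_l_max=80.0, n_params=1, n_data=50)
aicc_two_param = aicc(neg_ln_l_max=78.5, n_params=2, n_data=50)

print(aicc_one_param, aicc_two_param)
```

In this made-up case the two-parameter model has the lower value even though its penalty is larger, because its likelihood is sufficiently higher.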