# Data Analysis Homework 3
### Due date: Friday, November 1st 2024, 11AM

In [1]:
from __future__ import division
from IPython.display import HTML
from IPython.display import display
from scipy.optimize import curve_fit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Question 1: Linear Regression, Curvature Matrix

Consider the data listed below,
\begin{equation}
\begin{array}{lcccccc}
\hline
{\rm frequency~(Hz)} &10&20&30&40&50&60\\
{\rm voltage~(mV)} &16&45&64&75&70&115\\
{\rm error~(mV)}   &5&5&5&5&30&5\\
\hline
{\rm frequency~(Hz)} &70&80&90&100&110&\\
{\rm voltage~(mV)} &142&167&183&160&221&\\
{\rm error~(mV)}   &5&5&5&30&5&\\
\hline
\end{array} 
\end{equation}

This data is also contained in the file 'linear_regression.csv'.

Required:
<br>
(i) Calculate the 4 elements of the curvature matrix.
<br>
(ii) Invert this to give the error matrix.
<br>
(iii) What are the uncertainties in the slope and intercept?
<br>
(iv) Comment on your answer.
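
For reference, a sketch of where the four elements come from, assuming the usual definition of the curvature matrix as half the second derivatives of $\chi^2$. The symbols $x_i$, $y_i$ and $\alpha_i$ (frequency, voltage and error) and $a_j\in\{c,m\}$ are notation chosen here, not taken from the question. For a straight line $y=mx+c$,

\begin{equation}
\chi^2=\sum_i\frac{(y_i-mx_i-c)^2}{\alpha_i^2},\qquad
A_{jk}=\frac{1}{2}\frac{\partial^2\chi^2}{\partial a_j\,\partial a_k},
\end{equation}

which gives

\begin{equation}
A_{cc}=\sum_i\frac{1}{\alpha_i^2},\qquad
A_{cm}=A_{mc}=\sum_i\frac{x_i}{\alpha_i^2},\qquad
A_{mm}=\sum_i\frac{x_i^2}{\alpha_i^2}.
\end{equation}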

### (i) Calculate the 4 elements of the curvature matrix.

In [3]:
data = pd.read_csv('linear_regression.csv')
frequencies = data.iloc[:,0]
voltages = data.iloc[:,1]
voltage_errors = data.iloc[:,2]

def one_i():
    '''Your function should return something of the form np.matrix([[a_cc,a_cm],[a_mc,a_mm]]),
    where m is the slope and c the intercept'''
    # YOUR CODE HERE
    weights = 1 / voltage_errors**2
    a_cc = np.sum(weights)
    a_cm = np.sum(frequencies * weights)
    a_mc = np.sum(frequencies * weights)
    a_mm = np.sum(frequencies**2 * weights)
    curvature_matrix = np.matrix([[a_cc, a_cm], [a_mc, a_mm]])
    
    return(curvature_matrix)

print(one_i())

[[3.62222222e-01 2.05666667e+01]
 [2.05666667e+01 1.53788889e+03]]


In [4]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(one_i(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is list/array of floats.'
assert len(one_i()) == 2 , 'Please make sure you return a matrix with dimensions 2x2' 
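
As a cross-check that does not need the CSV file, the same matrix can be written as $X^{T}WX$, where $X$ is a design matrix with a column of ones and a column of frequencies and $W$ holds the weights $1/\alpha_i^2$. The sketch below types the frequencies and errors in from the table above; the names `X`, `W` and `curvature` are chosen here for illustration.

```python
import numpy as np

# Values copied from the table in Question 1
frequency = np.arange(10, 111, 10, dtype=float)                       # Hz
error = np.array([5, 5, 5, 5, 30, 5, 5, 5, 5, 30, 5], dtype=float)    # mV

# Design matrix: first column multiplies the intercept c, second the slope m
X = np.column_stack([np.ones_like(frequency), frequency])
W = np.diag(1.0 / error**2)

# Curvature matrix [[a_cc, a_cm], [a_mc, a_mm]]
curvature = X.T @ W @ X
print(curvature)
```

The result should agree with the output of `one_i()` printed above.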

### (ii) Invert this to give the error matrix.

In [6]:
data = pd.read_csv('linear_regression.csv')
frequencies = data.iloc[:,0]
voltages = data.iloc[:,1]
voltage_errors = data.iloc[:,2]

def one_ii():
    '''Your function should return something of the form np.matrix([[a_cc,a_cm],[a_mc,a_mm]])'''
    # YOUR CODE HERE
    curvature_matrix = one_i()
    inverted_matrix = np.linalg.inv(curvature_matrix)
    
    return(inverted_matrix)

print(one_ii())

[[ 1.14708117e+01 -1.53402734e-01]
 [-1.53402734e-01  2.70174453e-03]]


In [7]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(one_ii(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is list/array of floats.'
assert len(one_ii()) == 2 , 'Please make sure you return a matrix with dimensions 2x2' 
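
For a 2x2 matrix the inverse can also be written out by hand, which makes a quick sanity check on `np.linalg.inv`. This is a minimal sketch that starts from the curvature matrix printed in part (i); the variable names are chosen here for illustration.

```python
import numpy as np

# Curvature matrix as printed in part (i)
curvature = np.array([[3.62222222e-01, 2.05666667e+01],
                      [2.05666667e+01, 1.53788889e+03]])

a_cc, a_cm = curvature[0]
a_mm = curvature[1, 1]

# Closed-form inverse of a symmetric 2x2 matrix
det = a_cc * a_mm - a_cm**2
error_matrix = np.array([[a_mm, -a_cm],
                         [-a_cm, a_cc]]) / det

print(error_matrix)
# The product of a matrix and its inverse should be the identity
print(np.allclose(curvature @ error_matrix, np.eye(2)))
```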

### (iii) What are the uncertainties in the slope and intercept?

In [9]:
data = pd.read_csv('linear_regression.csv')
frequencies = data.iloc[:,0]
voltages = data.iloc[:,1]
voltage_errors = data.iloc[:,2]

def one_iii():
    slope_uncertainty = 0
    intercept_uncertainty = 0
    # YOUR CODE HERE
    inverted_matrix = one_ii()
    slope_uncertainty = np.sqrt(inverted_matrix[1, 1])
    intercept_uncertainty = np.sqrt(inverted_matrix[0, 0])
    
    return(slope_uncertainty,intercept_uncertainty)

print(one_iii())

(0.05197830827721803, 3.3868586735217363)


In [10]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(one_iii(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is list/array of floats.'
assert len(one_iii()) == 2 , 'Please make sure you return a list/array of length 2' 

In [11]:
'''TEST CELL - DO NOT DELETE'''

'TEST CELL - DO NOT DELETE'

### (iv) Comment on your answer

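The uncertainty in the slope is 0.05 mV/Hz. This is small compared with the roughly 2 mV/Hz trend in the data, so the slope of the fit is well constrained.

The uncertainty in the intercept is 3.39 mV, so the intercept is less well constrained. The two numbers have different units and cannot be compared directly, but the negative off-diagonal elements of the error matrix show that the slope and intercept are anti-correlated.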
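
One way to put these comments on a common footing is to compare each uncertainty with its own best-fit value and to look at the correlation coefficient. The sketch below is an illustration only: it types the data in from the table in Question 1 and solves the weighted least-squares problem directly, which is an assumed approach and not part of the required answer.

```python
import numpy as np

# Values copied from the table in Question 1
frequency = np.arange(10, 111, 10, dtype=float)                                       # Hz
voltage = np.array([16, 45, 64, 75, 70, 115, 142, 167, 183, 160, 221], dtype=float)   # mV
error = np.array([5, 5, 5, 5, 30, 5, 5, 5, 5, 30, 5], dtype=float)                    # mV

X = np.column_stack([np.ones_like(frequency), frequency])
w = 1.0 / error**2

curvature = X.T @ (w[:, None] * X)
error_matrix = np.linalg.inv(curvature)

# Weighted least-squares solution: parameters = C X^T W y
intercept, slope = error_matrix @ (X.T @ (w * voltage))
sigma_c, sigma_m = np.sqrt(np.diag(error_matrix))
rho = error_matrix[0, 1] / (sigma_c * sigma_m)

print(f"slope     = {slope:.3f} +/- {sigma_m:.3f} mV/Hz")
print(f"intercept = {intercept:.2f} +/- {sigma_c:.2f} mV")
print(f"correlation coefficient = {rho:.3f}")
```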

## Question 2: Using a calibration curve

A frequently encountered case where the correlation of the uncertainties must be taken into account is that of a calibration curve.  Consider the following set of measurements from an optical-activity experiment, where the angle of rotation of a plane-polarized light beam, $\theta$, is measured as a function of the independent variable, the concentration, $C$, of a sucrose solution. 

\begin{equation}
\begin{array}{lcccc}
\hline
C \mbox{ (g cm$^{-3}$)} &0.025&0.05&0.075&0.100\\
\theta \mbox{ (degrees)}&10.7&21.6&32.4&43.1\\
\hline
C \mbox{ (g cm$^{-3}$)}&0.125&0.150&0.175\\
\theta \mbox{ (degrees)}&53.9&64.9&75.4\\
\hline
\end{array} 
\end{equation}

The errors in the angle measurement are all $0.1^{\circ}$, the errors in the concentration are negligible.  A straight line  fit to the data yields  a gradient of $431.7\,^{\circ}\mbox{ g$^{-1}$ cm$^{3}$}$, and intercept $-0.03^{\circ}$. This data is contained in 'optical_activity.csv'.

<br>
Required:
<br>
(i) Show that the curvature matrix, $\mathsf{A}$, is given by 

\begin{equation}
\mathsf{A}=\left[\begin{array}{cc}
700\left((^{\circ})^{-2}\right)&70\left((^{\circ})^{-2}\mbox{g cm$^{-3}$}\right)\\
70\left((^{\circ})^{-2}\mbox{g cm$^{-3}$}\right)&8.75\left((\mbox{g/$^\circ$ cm$^{3})^2$}\right)\\
\end{array}\right] ,
\end{equation}


and that the error matrix  is 

\begin{equation}
\mathsf{C}=\left[\begin{array}{cc}
0.00714\left((^{\circ})^2\right)&-0.0571\left((^{\circ})^2\mbox{g$^{-1}$cm$^{3}$}\right)\\
-0.0571\left((^{\circ})^2\mbox{g$^{-1}$cm$^{3}$}\right)&0.571\left((^{\circ})^2\mbox{g$^{-2}$ cm$^{6}$}\right)\\
\end{array}\right] .
\end{equation}

The entry for the intercept is in the top left-hand corner, that for the gradient in the bottom right-hand corner.  
<br>
(ii) Calculate the associated correlation matrix.  

Use the  entries of the error matrix to answer the following  questions: 
<br>
(iii) What are the uncertainties in the best-fit intercept and gradient? 
<br>
(iv) What optical rotation is expected for a known concentration of $C=0.080\,\mathrm{g\,cm^{-3}}$, and what is the uncertainty? 
<br>
(v) What is the concentration given a measured rotation of $\theta=70.3^{\circ}$ and what is the uncertainty?

### (i) Verify the curvature matrix and the error matrix above.

In [12]:
data = pd.read_csv('optical_activity.csv')
concentrations = data.iloc[:,0]
angles = data.iloc[:,1]
angle_errors = data.iloc[:,2]

def two_i():
    '''Your function should return something of the form np.matrix([[a_cc,a_cm],[a_mc,a_mm]]). Must return the curvature and error matrices.'''
    curvature_matrix = 0
    error_matrix = 0
    # YOUR CODE HERE
    weights = 1 / angle_errors**2
    a_cc = np.sum(weights)
    a_cm = np.sum(concentrations * weights)
    a_mc = np.sum(concentrations * weights)
    a_mm = np.sum(concentrations**2 * weights)
    curvature_matrix = np.matrix([[a_cc, a_cm], [a_mc, a_mm]])
    error_matrix = np.linalg.inv(curvature_matrix)
    
    return(curvature_matrix,error_matrix)

print(two_i())

(matrix([[700.  ,  70.  ],
        [ 70.  ,   8.75]]), matrix([[ 0.00714286, -0.05714286],
        [-0.05714286,  0.57142857]]))


In [13]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(two_i(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is list/array of floats.'
assert len(two_i()) == 2 , 'Please make sure you return a list/array of two matrices' 
assert len(two_i()[0]) == 2, 'Please make sure that the first entry is a 2x2 matrix' 
assert len(two_i()[1]) == 2, 'Please make sure that the second entry is a 2x2 matrix'  

In [14]:
'''TEST CELL - DO NOT DELETE'''

'TEST CELL - DO NOT DELETE'
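
The matrices can also be checked by hand, using only the tabulated concentrations and the stated $0.1^{\circ}$ error. Every weight is $1/\alpha^2=100\,(^{\circ})^{-2}$, and for the $N=7$ points $\sum_i C_i=0.7$ and $\sum_i C_i^2=0.0875$, so

\begin{equation}
A_{cc}=100\times7=700,\qquad
A_{cm}=100\times0.7=70,\qquad
A_{mm}=100\times0.0875=8.75 .
\end{equation}

The determinant is $700\times8.75-70^2=1225$, and inverting the $2\times2$ matrix gives

\begin{equation}
\mathsf{C}=\frac{1}{1225}\left[\begin{array}{cc}
8.75&-70\\
-70&700\\
\end{array}\right]
=\left[\begin{array}{cc}
0.00714&-0.0571\\
-0.0571&0.571\\
\end{array}\right] ,
\end{equation}

in the units quoted in the question.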

### (ii) Calculate the associated correlation matrix.  

In [16]:
data = pd.read_csv('optical_activity.csv')
concentrations = data.iloc[:,0]
angles = data.iloc[:,1]
angle_errors = data.iloc[:,2]

def two_ii():
    '''Your function should return something of the form np.matrix([[a_cc,a_cm],[a_mc,a_mm]])'''
    # YOUR CODE HERE
    _, error_matrix = two_i()
    correlation_matrix = np.ones_like(error_matrix)
    for i in range(2):
        for j in range(2):
            correlation_matrix[i, j] = error_matrix[i, j] / np.sqrt(error_matrix[i, i] * error_matrix[j, j])
    
    return correlation_matrix
    
print(two_ii())    

[[ 1.         -0.89442719]
 [-0.89442719  1.        ]]


In [17]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(two_ii(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a matrix.'
assert len(two_ii()) == 2 
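
The double loop can also be written without explicit indices. The sketch below divides the error matrix by the outer product of the standard errors, starting from the error matrix printed in part (i); the names `sigma` and `correlation` are chosen here.

```python
import numpy as np

# Error matrix as printed in part (i): intercept first, gradient second
error_matrix = np.array([[0.00714286, -0.05714286],
                         [-0.05714286, 0.57142857]])

# Standard errors are the square roots of the diagonal elements
sigma = np.sqrt(np.diag(error_matrix))

# rho_ij = C_ij / (sigma_i * sigma_j)
correlation = error_matrix / np.outer(sigma, sigma)
print(correlation)
```

The same expression works unchanged for the 4x4 error matrix in Question 3.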

### (iii) What are the uncertainties in the best-fit intercept and gradient? 

In [18]:
data = pd.read_csv('optical_activity.csv')
concentrations = data.iloc[:,0]
angles = data.iloc[:,1]
angle_errors = data.iloc[:,2]

def two_iii():
    '''Your function should return the uncertainty in the gradient and intercept'''
    gradient_uncertainty = 0
    intercept_uncertainty = 0
    # YOUR CODE HERE
    _, error_matrix = two_i()
    intercept_uncertainty = np.sqrt(error_matrix[0, 0])
    gradient_uncertainty  = np.sqrt(error_matrix[1, 1]) 
    
    return(gradient_uncertainty,intercept_uncertainty)

print(two_iii())

(0.7559289460184548, 0.08451542547285171)


In [19]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(two_iii(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a list/array of floats.'
assert len(two_iii()) == 2 

In [20]:
'''TEST CELL - DO NOT DELETE'''

'TEST CELL - DO NOT DELETE'

### (iv) What optical rotation is expected for a known concentration of $C=0.080\,\mathrm{g\,cm^{-3}}$, and what is the uncertainty? 

In [22]:
data = pd.read_csv('optical_activity.csv')
concentrations = data.iloc[:,0]
angles = data.iloc[:,1]
angle_errors = data.iloc[:,2]

def two_iv():
    '''Your function should return the angle and the uncertainty'''
    angle = 0
    uncertainty = 0
    # YOUR CODE HERE
    intercept = -0.03
    slope = 431.7
    C = 0.080
    _, error_matrix = two_i()
    
    angle = slope * C + intercept
    uncertainty = np.sqrt((C**2) * error_matrix[1, 1] + error_matrix[0, 0] + 2 * C * error_matrix[0, 1])
    
    return(angle,uncertainty)

print(two_iv())

(34.506, 0.04070801956792861)


In [23]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(two_iv(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a list/array of floats.'
assert len(two_iv()) == 2 

In [24]:
'''TEST CELL - DO NOT DELETE'''

'TEST CELL - DO NOT DELETE'
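
To see why the covariance term matters for a calibration curve, the uncertainty can be computed with and without it. This is a sketch that rebuilds the error matrix from the curvature matrix quoted in the question; `jacobian`, `with_cov` and `without_cov` are names chosen here.

```python
import numpy as np

# Curvature matrix quoted in the question; its inverse is the error matrix
curvature = np.array([[700.0, 70.0],
                      [70.0, 8.75]])
error_matrix = np.linalg.inv(curvature)   # [[var_c, cov_cm], [cov_cm, var_m]]

C = 0.080  # g cm^-3

# theta = intercept + slope * C, so d(theta)/d(intercept) = 1 and d(theta)/d(slope) = C
jacobian = np.array([1.0, C])

# Full propagation including the off-diagonal (covariance) term
with_cov = np.sqrt(jacobian @ error_matrix @ jacobian)

# Same propagation with the covariance term dropped
without_cov = np.sqrt(np.sum(jacobian**2 * np.diag(error_matrix)))

print(with_cov, without_cov)
```

Dropping the covariance roughly doubles the quoted uncertainty here (about $0.10^{\circ}$ instead of about $0.04^{\circ}$), because the intercept and gradient are strongly anti-correlated.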

### (v) What is the concentration given a measured rotation of $\theta=70.3^{\circ}$ and what is the uncertainty? You must return your answer in $\mathrm{g\,cm^{-3}}$.

In [25]:
data = pd.read_csv('optical_activity.csv')
concentrations = data.iloc[:,0]
angles = data.iloc[:,1]
angle_errors = data.iloc[:,2]

def two_v():
    '''Your function should return the concentration and uncertainty'''
    # YOUR CODE HERE
    intercept = -0.03
    slope = 431.7
    theta = 70.3
    _, error_matrix = two_i()
    
    # theta = intercept + slope * concentration
    concentration = (theta - intercept) / slope
    # Propagates the correlated fit-parameter errors only; any error on the measured theta is not included
    uncertainty = np.sqrt(concentration**2 * error_matrix[1, 1] + error_matrix[0, 0] +  2 * concentration * error_matrix[0, 1]) / slope
    
    return(concentration,uncertainty)

print(two_v())

(0.16291406069029418, 0.00014071939754941703)


In [26]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(two_v(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a list/array of floats.'
assert len(two_v()) == 2 

In [27]:
'''TEST CELL - DO NOT DELETE'''

'TEST CELL - DO NOT DELETE'
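
The expression used for the uncertainty follows from inverting the calibration line. As a sketch, writing $c$ for the intercept and $m$ for the gradient (notation chosen here):

\begin{equation}
C=\frac{\theta-c}{m},\qquad
\frac{\partial C}{\partial c}=-\frac{1}{m},\qquad
\frac{\partial C}{\partial m}=-\frac{\theta-c}{m^2}=-\frac{C}{m},
\end{equation}

so that, with the fit parameters as the only source of error,

\begin{equation}
\alpha_C^2=\frac{1}{m^2}\left(\alpha_c^2+C^2\alpha_m^2+2C\,\alpha_{cm}\right).
\end{equation}

If the measured rotation is assumed to carry its own error $\alpha_\theta=0.1^{\circ}$, which the question does not state explicitly, a further term $\alpha_\theta^2/m^2$ is added inside the sum and the uncertainty rises to roughly $2.7\times10^{-4}\,\mathrm{g\,cm^{-3}}$.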

## Question 3: Error bars from a $\chi^2$ minimisation to a non-linear function

In this question we will analyse the data shown in the figure below, which is an X-ray spectrum as a function of angle.

![title](diffraction.JPG)
 
The data is contained in the file 'LorentzianData.csv'. There are three columns: the angle, the signal (in counts per second), and the error.  The number of X-rays counted in 20 seconds was recorded.

The model to describe the data has four parameters:  the height of the Lorentzian lineshape, $S_0$; the angle at which the peak is centered, $\theta_{0}$;
 the angular width of the peak, $\Delta\theta$; and a constant background offset, $S_{\rm bgd}$. Mathematically, the signal, $S$, is of the form:
\begin{equation}
S=S_{\rm bgd}+\frac{S_{0}}{1+4\left(\frac{\theta-\theta_{0}}{\Delta\theta}\right)^2}.
\end{equation}

The function is defined in code as `lorentzian(theta, s_0, s_bgd, delta_theta, theta_0)`.

Required:
<br>
(i) Explain how the error in the count rate was calculated.
<br>
(ii) Perform a $\chi^2$ minimisation.  What are the best-fit parameters?
<br>
(iii) Evaluate the error matrix.
<br>
(iv) Calculate the correlation matrix.
<br>
(v) What are the uncertainties in the best-fit parameters?
<br>
(vi) If you can plot contour plots, show the $\chi^2$ contours for 
<br>
>(a) background vs. peak centre. 
<br>
>(b) background vs. peak width.  


These figures are shown in figure 6.11 of Hughes and Hase. Comment on the shape of the contours. Only your comment will be graded. 

### (i) Explain how the error in the count rate was calculated.

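The number of X-rays $N$ counted in 20 seconds follows a Poisson distribution, so the uncertainty in $N$ is $\sqrt{N}$. The count rate is $N/20$, so dividing by the counting time gives the error in the count rate:
$$
    \text{error} = \frac{\sqrt{N}}{20}
$$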
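
A one-line numerical illustration, assuming a hypothetical total of 100 counts (this number is not taken from the data file):

```python
import numpy as np

counting_time = 20.0   # seconds, as stated in the question
total_counts = 100     # hypothetical number of X-rays counted in that time

# Count rate and its Poisson error, both in counts per second
rate = total_counts / counting_time
rate_error = np.sqrt(total_counts) / counting_time

print(rate, rate_error)
```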

### (ii) Perform a $\chi^2$ minimisation.  What are the best-fit parameters?

In [35]:
data = pd.read_csv('LorentzianData.csv') 

def lorentzian(theta, s_0, s_bgd,delta_theta,theta_0):
    return s_bgd+(s_0/(1+4*(((theta-theta_0)/delta_theta)**2)))

def three_ii():
    s_0 = 0
    s_bgd = 0
    delta_theta = 0
    theta_0 = 0
    covariance_matrix = 0
    angles = data.iloc[:,0]
    intensity = data.iloc[:,1]
    intensity_errors = data.iloc[:,2]
    # YOUR CODE HERE
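    # Initial guess in the order of the lorentzian arguments: s_0, s_bgd, delta_theta, theta_0
    initial_guess = [7.5, 2, 4, 44]
    popt, pcov = curve_fit(lorentzian, angles, intensity, sigma=intensity_errors, p0=initial_guess)
    
    # The model depends only on delta_theta**2, so the fit can return either sign;
    # the physical width is abs(delta_theta).
    s_0, s_bgd, delta_theta, theta_0 = popt
    covariance_matrix = pcov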
    
    return(s_0,s_bgd,delta_theta,theta_0,covariance_matrix)

print(three_ii())

(5.426329289784673, 1.4044042518420166, -0.9498809059868631, 44.39012005495169, array([[ 4.48044642e-02,  4.49950459e-04,  6.89979849e-03,
         2.50865194e-05],
       [ 4.49950459e-04,  9.19630350e-04,  7.78118610e-04,
        -2.60290301e-06],
       [ 6.89979849e-03,  7.78118610e-04,  2.33235657e-03,
         8.62734693e-07],
       [ 2.50865194e-05, -2.60290301e-06,  8.62734693e-07,
         2.68733296e-04]]))


In [36]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(three_ii(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a list/array.'
assert len(three_ii()) == 5 , 'Please make sure that you return five values' 
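
Two details of `curve_fit` are worth keeping in mind when reading the covariance matrix. By default (`absolute_sigma=False`) SciPy rescales the returned covariance so that the reduced $\chi^2$ is one; passing `absolute_sigma=True` treats `sigma` as absolute errors. The sketch below shows the second option on synthetic data, since the contents of 'LorentzianData.csv' are not reproduced here. The parameter values, noise level and seed are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(theta, s_0, s_bgd, delta_theta, theta_0):
    return s_bgd + s_0 / (1 + 4 * ((theta - theta_0) / delta_theta) ** 2)

# Synthetic data set (NOT the homework data): known parameters plus Gaussian noise
rng = np.random.default_rng(0)
true_params = (5.4, 1.4, 0.95, 44.4)          # s_0, s_bgd, delta_theta, theta_0
theta = np.linspace(41.0, 48.0, 71)
sigma = np.full_like(theta, 0.1)
signal = lorentzian(theta, *true_params) + rng.normal(0.0, sigma)

# absolute_sigma=True keeps pcov in the units implied by sigma (no rescaling)
popt, pcov = curve_fit(lorentzian, theta, signal, p0=[5.0, 1.0, 1.0, 44.3],
                       sigma=sigma, absolute_sigma=True)

# The width only enters squared, so report its magnitude
popt[2] = abs(popt[2])

chi2 = np.sum(((signal - lorentzian(theta, *popt)) / sigma) ** 2)
print(popt)
print(np.sqrt(np.diag(pcov)))
print(chi2, len(theta) - len(popt))
```

The last line prints $\chi^2$ next to the number of degrees of freedom, which gives a quick check on whether the quoted errors are reasonable.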

### (iii) Evaluate the error matrix.

In [37]:
data = pd.read_csv('LorentzianData.csv') 

def three_iii():
    '''Your function should return something of the form np.matrix([4x4])'''
    # YOUR CODE HERE
    _, _, _, _, error_matrix = three_ii()

    return(error_matrix)

print(three_iii())

[[ 4.48044642e-02  4.49950459e-04  6.89979849e-03  2.50865194e-05]
 [ 4.49950459e-04  9.19630350e-04  7.78118610e-04 -2.60290301e-06]
 [ 6.89979849e-03  7.78118610e-04  2.33235657e-03  8.62734693e-07]
 [ 2.50865194e-05 -2.60290301e-06  8.62734693e-07  2.68733296e-04]]


In [38]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(three_iii(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a matrix.'
assert len(three_iii()) == 4 , 'Please make sure that you return a 4x4 matrix' 

### (iv) Calculate the correlation matrix.

In [39]:
data = pd.read_csv('LorentzianData.csv') 

def three_iv():
    '''Your function should return something of the form np.matrix([4x4])'''
    # YOUR CODE HERE
    error_matrix = three_iii()
    correlation_matrix = np.ones_like(error_matrix)
    
    for i in range(error_matrix.shape[0]):
        for j in range(error_matrix.shape[1]):
            correlation_matrix[i, j] = error_matrix[i, j] / np.sqrt(error_matrix[i, i] * error_matrix[j, j])
    
    return(correlation_matrix)

print(three_iv())

[[ 1.          0.07009667  0.67496039  0.00722968]
 [ 0.07009667  1.          0.53130228 -0.00523589]
 [ 0.67496039  0.53130228  1.          0.00108973]
 [ 0.00722968 -0.00523589  0.00108973  1.        ]]


In [40]:
'''TEST CELL - DO NOT DELETE'''
assert isinstance(three_iv(), (list, tuple, np.ndarray)), \
    'Please make sure that the return value is a matrix.'
assert len(three_iv()) == 4 , 'Please make sure that you return a 4x4 matrix' 

### (v) What are the uncertainties in the best-fit parameters?

In [41]:
data = pd.read_csv('LorentzianData.csv') 

def three_v():
    uncertainty_s_0 = 0
    uncertainty_s_bgd = 0
    uncertainty_delta_theta = 0
    uncertainty_theta_0 = 0
    # YOUR CODE HERE
    error_matrix = three_iii()
    
    uncertainty_s_0 = np.sqrt(error_matrix[0, 0])
    uncertainty_s_bgd = np.sqrt(error_matrix[1, 1])
    uncertainty_delta_theta = np.sqrt(error_matrix[2, 2])
    uncertainty_theta_0 = np.sqrt(error_matrix[3, 3])
    
    return(uncertainty_s_0,uncertainty_s_bgd,uncertainty_delta_theta,uncertainty_theta_0)

print(three_v())

(0.21167065032572505, 0.030325407664183294, 0.04829447764357627, 0.016393086821964532)


In [42]:
'''TEST CELL - DO NOT DELETE'''

'TEST CELL - DO NOT DELETE'

### (vi) These contours are shown in figure 6.11 of Hughes and Hase. Comment on the shape of the contours.

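For background vs. peak centre, the contours are nearly circular, with no tilt relative to the parameter axes. This indicates that the two parameters are almost uncorrelated, in agreement with the correlation coefficient of $-0.005$ found in (iv).

For background vs. peak width, the contours are tilted ellipses. This indicates a significant correlation between the two parameters; the corresponding correlation coefficient from (iv) has magnitude 0.53.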
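
The link between contour shape and correlation can be made quantitative under the usual quadratic approximation to $\chi^2$ near the minimum. If each parameter is measured in units of its own standard error, the contours for a pair with correlation coefficient $\rho$ are ellipses whose major-to-minor axis ratio is $\sqrt{(1+|\rho|)/(1-|\rho|)}$. The sketch below applies this to the two coefficients printed in part (iv); the function name `axis_ratio` is chosen here.

```python
import numpy as np

def axis_ratio(rho):
    """Major/minor axis ratio of a chi^2 contour in standardised parameter units."""
    return np.sqrt((1 + abs(rho)) / (1 - abs(rho)))

# Correlation coefficients taken from the matrix printed in part (iv)
rho_bgd_centre = -0.00523589   # s_bgd vs theta_0
rho_bgd_width = 0.53130228     # s_bgd vs delta_theta

print(axis_ratio(rho_bgd_centre))   # close to 1: nearly circular
print(axis_ratio(rho_bgd_width))    # clearly above 1: elongated, tilted ellipse
```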

## Question 4: Prove the following properties:

Assume in this question that the uncertainties in $A$ and $B$ are correlated.
<br>
(i) If $Z=A\pm B$, show that
${\displaystyle\alpha_{Z}^2=\alpha_{A}^2+\alpha_{B}^2\pm2\alpha_{AB}}$.
<br>
(ii) If $Z=A\times B$, show that
 ${\displaystyle\left(\frac{\alpha_Z}{Z}\right)^2=\left(\frac{\alpha_A}{A}\right)^2+\left(\frac{\alpha_B}{B}\right)^2+2\left(\frac{\alpha_{AB}}{AB}\right)}$.
<br>
(iii) If ${\displaystyle Z=\frac{A}{B}}$, show that
${\displaystyle\left(\frac{\alpha_Z}{Z}\right)^2=\left(\frac{\alpha_A}{A}\right)^2+\left(\frac{\alpha_B}{B}\right)^2-2\left(\frac{\alpha_{AB}}{AB}\right)}$.

(i)  
$\frac{\partial Z}{\partial A}=1,\quad\frac{\partial Z}{\partial B}=\pm1$  
$\alpha_Z^2=\left(\frac{\partial Z}{\partial A}\right)^2\alpha_A^2+\left(\frac{\partial Z}{\partial B}\right)^2\alpha_B^2+2\frac{\partial Z}{\partial A}\frac{\partial Z}{\partial B}\alpha_{AB}$  
so, since $(\pm1)^2=1$,  
${\displaystyle\alpha_{Z}^2=\alpha_{A}^2+\alpha_{B}^2\pm2\alpha_{AB}}$

(ii)  
$\frac{\partial Z}{\partial A}=B,\quad\frac{\partial Z}{\partial B}=A$  
$\alpha_Z^2=\left(\frac{\partial Z}{\partial A}\right)^2\alpha_A^2+\left(\frac{\partial Z}{\partial B}\right)^2\alpha_B^2+2\frac{\partial Z}{\partial A}\frac{\partial Z}{\partial B}\alpha_{AB}$  
$\alpha_Z^2=B^2\alpha_A^2+A^2\alpha_B^2+2AB\,\alpha_{AB}$  
so, dividing by $Z^2=A^2B^2$,  
${\displaystyle\left(\frac{\alpha_Z}{Z}\right)^2=\left(\frac{\alpha_A}{A}\right)^2+\left(\frac{\alpha_B}{B}\right)^2+2\left(\frac{\alpha_{AB}}{AB}\right)}$.

(iii)  
$\frac{\partial Z}{\partial A}=\frac{1}{B},\quad\frac{\partial Z}{\partial B}=-\frac{A}{B^2}$  
$\alpha_Z^2=\left(\frac{\partial Z}{\partial A}\right)^2\alpha_A^2+\left(\frac{\partial Z}{\partial B}\right)^2\alpha_B^2+2\frac{\partial Z}{\partial A}\frac{\partial Z}{\partial B}\alpha_{AB}$  
$\alpha_Z^2=\frac{\alpha_A^2}{B^2}+\frac{A^2\alpha_B^2}{B^4}-\frac{2A\,\alpha_{AB}}{B^3}$  
so, dividing by $Z^2=\frac{A^2}{B^2}$,  
${\displaystyle\left(\frac{\alpha_Z}{Z}\right)^2=\left(\frac{\alpha_A}{A}\right)^2+\left(\frac{\alpha_B}{B}\right)^2-2\left(\frac{\alpha_{AB}}{AB}\right)}$.
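
The product rule in (ii) can be checked numerically with a small Monte Carlo simulation. This is a sketch under assumed values: the means, errors, covariance, sample size and seed below are made up for illustration, and the formula is a first-order approximation, so agreement is expected only to within a percent or so.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated measurements: A = 10 +/- 0.1, B = 20 +/- 0.2, alpha_AB = 0.01
mean = np.array([10.0, 20.0])
cov = np.array([[0.1**2, 0.01],
                [0.01, 0.2**2]])

samples = rng.multivariate_normal(mean, cov, size=200_000)
A, B = samples[:, 0], samples[:, 1]

# Prediction from (alpha_Z/Z)^2 = (alpha_A/A)^2 + (alpha_B/B)^2 + 2 alpha_AB/(AB)
A0, B0 = mean
rel_var = cov[0, 0] / A0**2 + cov[1, 1] / B0**2 + 2 * cov[0, 1] / (A0 * B0)
predicted = A0 * B0 * np.sqrt(rel_var)

# Spread of Z = A * B observed in the simulation
simulated = np.std(A * B)

print(predicted, simulated)
```

Changing the sign of the off-diagonal element of `cov` tests the sensitivity to the covariance term, and replacing `A * B` with `A / B` (with the sign of the last term flipped) tests (iii) in the same way.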