Physics 474 - Spring 2023 <br>
Homework 2 - Fitting Data, Parameter Estimation and Confidence Interval

<font color='red'>Author: 

_________________________________________________________________________________
In this homework we will practice fitting a function with parameters to some data. 
In addition, we will place some emphisis also on determining the confidence interval for
the fit parameters.

skills we will excercise:
- reading in data
- plotting data
- writing user defined functions
- fitting a function to data with 'curve_fit'
- calculating $\chi^2$ and $\chi^2$ probaility
- plotting residuals
- dtermining confidence intervals using the $\Delta \chi^2$
- analyzing data and making observations

__________________________________________________________________________

We will be investgating something called the Cosmic Microwave Background or CMB for short.

Quoted from Wikipedia:

_"In Big Bang cosmology the cosmic microwave background (CMB, CMBR) is electromagnetic radiation that is a remnant from a primordial stage of the universe, also known as "relic radiation". The CMB is faint cosmic background radiation filling all space. It is an important source of data on the early universe because it is the oldest electromagnetic radiation in the universe, dating to the epoch of recombination when the first atoms were formed. With a standard optical telescope, the background space between stars and galaxies is almost completely dark. However, a sufficiently sensitive radio telescope detects a faint background glow that is almost uniform and is not associated with any star, galaxy, or other object. This glow is strongest in the microwave region of the radio spectrum. The accidental discovery of the CMB in 1965 by American radio astronomers Arno Penzias and Robert Wilson was the culmination of work initiated in the 1940s, and earned the discoverers the 1978 Nobel Prize in Physics."_


We will be using measurements of CMB microwave intensity (W/m^2/sr/Hz) vs microwave frequency (GHz)
compiled around 2000 in the source 

**Salvaterra and Burigana, 2000 arXiv:astro-ph/0206350**

I have compiled the data into an easily read NumPy binary data file

"CMB_Intensity_Data.npz"

We will use this data to:
- plot the intensity vs frequency
- fit the data to a blackbody intensity vs frequency function to estimate the best-fit Temperature
- plot the data along with the best fit
- plot the residuals (fractional residuals in this particular case)
- calcultae the best-fit $\chi^2$
- use the $\Delta \chi^2$ as a function of temperature to estimate the confidence intervals for fit temperature
_________________________________________________________________________________________

Background for the problem:

Black-Body Radiation

Every physical body (including the CMB surface) of temperature $T$ spontaneously and continuously emits electromagnetic radiation of radiance $I(f;T)$ which describes the spectral emissive power per unit area, per unit solid angle, per unit frequency for particular radiation frequencies $f$. The relationship is given by Planck's radiation law

$ \,\,\,\,\,\,\,{\Large I(f;T) = \frac{2h}{c^2}\frac{f^3}{e^{hf/kT}-1} }$

where $h$ is Plank's constant, $k$ is Boltzman's constant and $c$ is the speed of light.
___________________________________________________________________

You are given data for measurments of $I$ vs frequency $f$ and are being asked to find the temperature $T$ that gives the best-fir between Plank's law and the data. You will then compare the best-fit to the data and estimate the confidence interval on the parameter $T$.
___________________________________________________________________________


Part 1) (2 pts)

Read in the datafile 'CMB_Intensity_Data.npz' and print the "keys" in the file

e.g.

filename = 'CMB_Intensity_Data.npz'<br>
Data = np.load(filename)<br>
print(Data.files)

In [107]:
# Your code...
from math import exp
import numpy as np
import pandas as pd
import plotly.express as pex
import plotly.graph_objects as go
from scipy.optimize import curve_fit
import scipy.stats as st 


## define constants 

h=6.626e-34
k=1.38e-23
c=3.0e8

## useful functions

planck_radiation_law=lambda f,T: (2*h/c**2)*(f**3/(np.exp(h*f/k*T)-1))

def chi_squared(theory:np.array,data:np.array,sigma:np.array)->np.array:
    """
    This function calculates TOTAL chi-squared between Theory and Data using sigma 
    as errors.
    The 3 arrays must be of equal size.
    Note: This is NOT reduced chi-squared
    Usage:  
     inputs: theory = input hypothesis (or Theory)
             data = Data points
             sigma = uncertainty on data points
     output: if arrays are of equal size returns the TOTAL chi-squared
             if arrays are not of equal size returns -1.0
    """
    if np.size(theory)==np.size(data) and np.size(data)==np.size(sigma):
        return np.sum((theory-data)**2/sigma**2)
    else:
        print('error - arrays of unequal size')
        return -1.

## import data

filename='CMB_Intensity_Data.npz'
cmb_intensity_data=np.load(f'data/{filename}')
cmb_intensity_data.files


['description', 'frequency', 'intensity', 'error']

Print the 'description' 

e.g. 

print(Data['description'])

In [108]:
## print description of data
cmb_intensity_data['description']

array(['-----------------------------------------------',
       'This file contains data for Cosmic Microwave Background (CMB)',
       'Data compiled from salvaterra and burigana, 2000 arXiv:astro-ph/0206350',
       'Intensity measurements versus microwave frequency',
       'The data is given as frequency (GHz)',
       'CMB Intensity (W/m^2/sr/Hz) and error on Intensity',
       'description = this text describing data',
       'frequency = Frequency of mesurement in GHz',
       'intensity = CMB Intensity (W/m^2/sr/Hz)',
       'error = estimated experimental uncertainty of intensity in same units',
       'NOTE:', 'Removed 1 data point f= 113.6 Ghz T=2.279 K',
       'and used delta_T = 0.025K in place of 0.01K for last 40 points',
       '--------------------------------------------------'], dtype='<U71')

Part 2)  (3 pts)

Make a plot of the data with errorbars vs frequency [GHz]
- use log scale for both x and y axes
- set x-ticks at 0.1, 1, 10, 100, 1000 GHz

In [117]:
fig = go.Figure(data=go.Scatter(
        x=cmb_intensity_data['frequency'],
        y=cmb_intensity_data['intensity'],
        error_y=dict(
            type='data', # value of error bar given in data coordinates
            array=cmb_intensity_data['error'],
            color='orange')
   ,mode='markers' ))


fig.update_layout(
    title="Intensity vrs. Frequency: Spectral Density of Blackbody",
    xaxis_title="Frequency (Ghz)",
    yaxis_title="Intensity (W/m^2/sr/Hz)",
    legend_title="error",
    font=dict(
        family="Courier New, monospace",
        size=10,
        color="RebeccaPurple"
    ),
      xaxis = dict(
        tickmode = 'array',
        tickvals = [.1,1,10,100,1000],
    )
)


fig.update_xaxes(type="log")
fig.update_yaxes(type="log")


fig.show()

______________________________________________________________
Part 3a) (3 pts)

fit the data above to Plank's law to get the best fit temperature
- print the best fit temperature
- print the estimated 1-$\sigma$ error returned by curve_fit
_____________________________________________________________

In [106]:
# Your code...
T0 = 2.0
poptT, pcovT=curve_fit(f=planck_radiation_law,
                        xdata=cmb_intensity_data['frequency']*1e9,
                        ydata=cmb_intensity_data['intensity'],
                        p0=T0,
                        sigma=cmb_intensity_data['error'],
                        absolute_sigma=True)

poptT


array([0.36626359])

____________________________________________________________________
Part 3b) (2 pts)

Calculate and print the $\chi^2$, reduced-$\chi^2$, and $\chi^2$ probability

______________________________________________________________

In [None]:
# Your code...

_________________________________________________________________________
Observations:

_______________________________________________________________________

Part 4) (5 pts)

Make a single figure with two subplots
- top: data with errorbars and best fit curve vs frequency (same parameters as plot above)
- bottom: fractional residuals (i.e residual/Radiance = (data-Fit)/fit) with errorbar (only log on x-scale)


_________________________________________________________________________

In [None]:
# Your code...

_____________________________________________________________________
Observations:

____________________________________________________________________

Part 5) (5 pts)

Now calculate the $\Delta\chi^2$ vs temperature for $\pm \, 4\sigma$ (based on return of curve_fit) around the best-fit temperature $T_{fit}$. That is:

$\Delta\chi^2 (T) = \chi^2 (T) - \chi^2 (T_{fit})$

make a plot
- $\Delta\chi^2 (T)$ vs T
- vertical line at $T_{fit}$
- vertical dotted line at currrent world best estimate of $T=2.72548 \, K$
- horizontal lines at $\Delta\chi^2$ of 1, 4, 9 

______________________________________________________________________

In [None]:
# Your code...

_______________________________________________________________________
Summary and Conclusions:

(comment on)
- what do the horizontal lines at $\Delta\chi^2$ of 1, 4, 9 represent?
- what are the approximate 68% and 95% confidence intevals on the fit temperature from this data?
- what do these confidence intervals represent?
- how do the $\Delta\chi^2$ of 1, 4, 9 compare to the estimate of the 1-sigma error retuned by curve fit?


_________________________________________________________________________