## Notebook 3: Cosmological parameters
___

#### This notebook is adapted from the course textbook:
#### Machine Learning for Physics and Astronomy by Viviana Acquaviva

---



#### Please upload your completed notebook to Canvas as an .ipynb file
#### Title the file as: LastName_notebook3.ipynb

### Original work statement:

Please write your name and the names of your collaborators in this cell.

Please be sure to cite sources along the way as appropriate.

### Your name:
#### Collaborators:

You can edit this notebook directly by adding code and text cells as needed. As always, begin by importing the necessary packages.


---



# Problem 1: Visualizing data with error bars

#### 1a. Import the data from the file sn_data.txt

This data set contains three columns.

The first column is the cosmological redshift (z) of each supernova. Cosmological redshift is a unitless measure of very large distances that accounts for the motion of the object away from us (that is, the expansion of the space between objects). Redshift zero, z = 0, is our local universe at the present day. Increasing redshift corresponds to looking further back in time, at objects that are increasingly far away from us today.

The second column is the luminosity distance to the supernovae, expressed in the units of Mpc (megaparsec). Mega = 1 Million; 1 parsec (pc) = 3.26 light years.

The third column contains the error on the luminosity distance, again expressed in Mpc.
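One possible way to read a file like this is sketched below. It assumes the file is plain whitespace-separated text with no header row (or with header lines starting with `#`), which is not stated above; the helper name `load_sn_data` is made up for illustration.

```python
import numpy as np

def load_sn_data(path="sn_data.txt"):
    """Return (z, d_lum, d_lum_err) as three 1D arrays from a three-column text file.

    Assumes whitespace-separated columns; adjust `delimiter` in np.loadtxt otherwise.
    """
    # unpack=True transposes the table so that each column comes back as its own array
    z, d_lum, d_lum_err = np.loadtxt(path, unpack=True)
    return z, d_lum, d_lum_err
```

Typical use would be `z, d_lum, d_lum_err = load_sn_data()` with the file in the same folder as the notebook.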

#### 1b. Make an errorbar plot of this data set.

Place redshift (z) on the x-axis and luminosity distance on the y-axis, and use the error on the luminosity distance for the error bars. Add labels to the axes and don't forget the units where appropriate.
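A minimal sketch of such a plot with `matplotlib` follows. The function name `plot_sn_data` and the styling choices (marker size, cap size) are illustrative assumptions, not requirements of the assignment.

```python
import matplotlib.pyplot as plt

def plot_sn_data(z, d_lum, d_lum_err, ax=None):
    """Draw luminosity distance against redshift with vertical error bars."""
    if ax is None:
        _, ax = plt.subplots()
    # fmt="o" draws markers without a connecting line; yerr sets the bar half-lengths
    ax.errorbar(z, d_lum, yerr=d_lum_err, fmt="o", markersize=3, capsize=2,
                label="Supernova data")
    ax.set_xlabel("Redshift z")
    ax.set_ylabel("Luminosity distance [Mpc]")
    ax.legend()
    return ax
```

Returning the axes object makes it easy to overlay model curves on the same figure later.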

#### 1c. Describe the trend that you see.

# Problem 2: Modeling data with a function

#### A few words on our model

The plot that you just made shows only the data. Now we want to build different models for the luminosity distance of these supernovae, and then choose the model that best describes the data.

Our model for the luminosity distance is a function of several parameters:

- z (the redshift) is the independent variable; in other words, our model will return a luminosity distance for any redshift value.

- $\Omega_{\rm m}$ is a cosmological parameter that describes the fraction of total matter in the universe, and it's a number between 0 and 1;

- $\Omega_{\Lambda}$ is a cosmological parameter that describes the fraction of dark energy in the universe. We assume that $\Omega_{\rm m} + \Omega_{\Lambda} = 1$ (so in our models the universe contains only matter and dark energy). This simplifies the model, because we can use $\Omega_{\Lambda} = 1 - \Omega_{\rm m}$ in our function;

- $H_0$ is the Hubble constant, a cosmological parameter which gives the current rate of expansion of the universe; we will fix it in the beginning to be $H_0$ = 70 km/s/Mpc;

- $(1+z_f)$ is a constant for a given supernova. It is essentially the inverse scale factor $a^{-1}$. $z_f$ (which stands for "z final") is also the upper limit of the integral. In other words, we integrate from 0 to $z_f$, and use that same $z_f$ value in the $(1+z_f)$ prefactor.

- c is the speed of light in km/s; $c = 2.9979 \times 10^5$ km/s. This is a constant of nature, so we won't need to change it in our model.

Here comes our model:


\begin{align}
D_L(z,\Omega_{\rm m}) = \,(1+z_f) \frac{c}{H_0}\int_0^{z_f} \frac{dz}{\sqrt{\Omega_{\rm m}(1+z)^3 + \Omega_{\Lambda}}}
\end{align}
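As a quick analytic check on this formula (a sketch that follows from the expression above with $\Omega_{\Lambda} = 1 - \Omega_{\rm m}$), the integral can be done by hand in the two limiting cases:

\begin{align}
\Omega_{\rm m} = 0: \quad & D_L = (1+z_f)\,\frac{c}{H_0}\int_0^{z_f} dz = (1+z_f)\,\frac{c}{H_0}\,z_f \\
\Omega_{\rm m} = 1: \quad & D_L = (1+z_f)\,\frac{c}{H_0}\int_0^{z_f} (1+z)^{-3/2}\,dz = 2\,(1+z_f)\,\frac{c}{H_0}\left(1 - \frac{1}{\sqrt{1+z_f}}\right)
\end{align}

For $z_f = 1$, $H_0 = 70$ km/s/Mpc and $c = 2.9979 \times 10^5$ km/s, these give roughly 8565 Mpc and 5018 Mpc, which can serve as reference values when testing a numerical implementation.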

#### 2a. Define the function.

Notes:

i. $\Omega_{\Lambda}$ is not a parameter of the function, so inside the function express $\Omega_{\Lambda}$ in terms of $\Omega_{\rm m}$ only.

ii. First define a function for the integrand. This will be passed to another function that performs the integral. Recall the `scipy.integrate.quad` function from our class notebook.
       
https://docs.scipy.org/doc/scipy-0.18.1/reference/tutorial/integrate.html


**Hint:** you may want to start your first function with something like:

`def LumDist_integrand(z, zf, Omega_m, H0=70, ckms=2.9979e5):`
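One way the two functions could fit together is sketched below. It follows the hinted signature and folds the constant prefactor $(1+z_f)\,c/H_0$ into the integrand; the wrapper name `LumDist` is an assumption made for this example, and other layouts (for instance applying the prefactor after integrating) are equally valid.

```python
from scipy.integrate import quad

def LumDist_integrand(z, zf, Omega_m, H0=70, ckms=2.9979e5):
    """Integrand of D_L, with the constant prefactor (1 + zf) * c / H0 folded in."""
    Omega_L = 1.0 - Omega_m  # universe with matter and dark energy only
    return (1.0 + zf) * (ckms / H0) / (Omega_m * (1.0 + z) ** 3 + Omega_L) ** 0.5

def LumDist(zf, Omega_m, H0=70, ckms=2.9979e5):
    """Luminosity distance in Mpc for a single redshift zf (hypothetical wrapper name)."""
    # quad integrates over the first argument (z); the remaining ones go through `args`
    value, _abs_err = quad(LumDist_integrand, 0.0, zf, args=(zf, Omega_m, H0, ckms))
    return value
```

Note that `quad` returns a tuple of the integral and an error estimate, so only the first element is the distance.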


#### 2b. Answer the following:
What are the arguments of the LumDist_integrand function? Which ones have default values?
#### 2c. Sanity Check:
Test your integrated luminosity distance function for the following inputs:\
zf=1.0, $\Omega_{\rm m}$=0.7 gives a luminosity distance ~ 5512 Mpc\
zf=0.5, $\Omega_{\rm m}$=0.7 gives a luminosity distance ~ 2522 Mpc

#### 2d. Vectorize the function.
Vectorize the luminosity distance function so we can pass it a 1D array of redshifts as input. Many functions accept an array directly, but `scipy.integrate.quad` needs a single number for each integration limit, so we need to vectorize the function first. Check that it works by passing a numpy array of the supernova redshift data to your function.
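A minimal sketch using `numpy.vectorize` is shown below. The names `lum_dist_scalar` and `lum_dist_vec` are invented for this example, and the scalar function is written compactly here only to keep the block self-contained.

```python
import numpy as np
from scipy.integrate import quad

def lum_dist_scalar(zf, Omega_m, H0=70, ckms=2.9979e5):
    """Luminosity distance in Mpc for one redshift zf."""
    integrand = lambda z: 1.0 / np.sqrt(Omega_m * (1.0 + z) ** 3 + (1.0 - Omega_m))
    integral, _ = quad(integrand, 0.0, zf)
    return (1.0 + zf) * (ckms / H0) * integral

# np.vectorize wraps the scalar function in a loop over array elements;
# otypes fixes the output type so the result is always a float array
lum_dist_vec = np.vectorize(lum_dist_scalar, otypes=[float])

distances = lum_dist_vec(np.array([0.5, 1.0]), 0.7)  # one distance per redshift
```

`np.vectorize` is a convenience, not a speed-up: it still calls `quad` once per redshift.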

#### 2e. Calculate and plot the models

In the same figure, plot the supernovae data with error bars as before, and overlay the luminosity distance values for the supernovae predicted by these three models:
    
Model 1: $\Omega_{\rm m} = 0.0$
    
Model 2: $\Omega_{\rm m} = 0.3$
    
Model 3: $\Omega_{\rm m} = 1.0$

Make sure you add labels for the models and the data points, and include a legend!
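The overlay could be organized along the following lines. This is only a sketch: the function names are made up, and the models are evaluated at the supernova redshifts and sorted by redshift so that each one draws as a clean line.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

def lum_dist(zf, Omega_m, H0=70.0, ckms=2.9979e5):
    """Luminosity distance in Mpc at redshift zf (matter + dark energy only)."""
    integral, _ = quad(lambda z: (Omega_m * (1.0 + z) ** 3 + 1.0 - Omega_m) ** -0.5,
                       0.0, zf)
    return (1.0 + zf) * (ckms / H0) * integral

def plot_data_and_models(z, d_lum, d_lum_err, omega_m_values=(0.0, 0.3, 1.0)):
    """Plot the data with error bars and one labeled curve per Omega_m value."""
    z, d_lum, d_lum_err = (np.asarray(a, dtype=float) for a in (z, d_lum, d_lum_err))
    order = np.argsort(z)  # sort so each model is drawn left to right
    _, ax = plt.subplots()
    ax.errorbar(z, d_lum, yerr=d_lum_err, fmt="o", markersize=3, label="Supernova data")
    for om in omega_m_values:
        model = np.array([lum_dist(zi, om) for zi in z[order]])
        ax.plot(z[order], model, label=rf"$\Omega_m$ = {om:.1f}")
    ax.set_xlabel("Redshift z")
    ax.set_ylabel("Luminosity distance [Mpc]")
    ax.legend()
    return ax
```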

#### 2f. Answer the following questions:

i. What are the $\Omega_{\Lambda}$ values for the above models? Which model contains no dark energy at all?

ii. Judging by eye, which model seems to be the best fit to the data and why?

# Problem 3: Evaluating the models

Now we will define a function to evaluate how well the models fit the data. This is done with the reduced $\chi^2$ ("chi squared") statistic, which we will simply call $\chi^2$ and which we mentioned in the "Fitting a line" class notebook.

#### 3a. Write a function that computes the reduced $\chi^2$.
The function will take three arrays as input.
1. An array containing the measured distances of the supernovae, $y$
2. An array of model predictions, $\hat{y}$ ("y-hat")
3. An array containing the measured errors, $\sigma$

The formula is:

$$ \chi^2 = \sum_i \frac{(y_i-\hat{y}_i)^2}{\sigma_i^2}$$
    
   

#### 3b. Calculate the $\chi^2$ of your models from problem 2e.
What are the $\chi^2$ scores? Which model is evaluated as the best?

(Hint: your model prediction array is the vector of luminosity distances for all the supernovae)

#### 3c. Calculate and store the $\chi^2$ values for different models with values of $\Omega_{\rm m}$ between 0 and 1, spaced every 0.05.

**Hint:** Begin by creating an array that holds the $\Omega_{\rm m}$ values. Then you will want to create a variable to store all the $\chi^2$ values.
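The scan could be sketched as follows. The function names are invented for illustration, the plain (unreduced) $\chi^2$ sum is used, and `np.linspace(0, 1, 21)` is chosen over `np.arange` so that the endpoint 1.0 is included without floating-point surprises.

```python
import numpy as np
from scipy.integrate import quad

def lum_dist(zf, Omega_m, H0=70.0, ckms=2.9979e5):
    """Luminosity distance in Mpc at redshift zf (matter + dark energy only)."""
    integral, _ = quad(lambda z: (Omega_m * (1.0 + z) ** 3 + 1.0 - Omega_m) ** -0.5,
                       0.0, zf)
    return (1.0 + zf) * (ckms / H0) * integral

def scan_omega_m(z, d_lum, d_lum_err, omega_grid):
    """Return one chi-squared value per entry of omega_grid."""
    chi2_values = np.empty(len(omega_grid))
    for i, om in enumerate(omega_grid):
        model = np.array([lum_dist(zi, om) for zi in z])
        chi2_values[i] = np.sum((d_lum - model) ** 2 / d_lum_err ** 2)
    return chi2_values

omega_grid = np.linspace(0.0, 1.0, 21)  # 0, 0.05, ..., 1.0
```

The best-fitting value on the grid is then `omega_grid[np.argmin(chi2_values)]`.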

#### 3d. Create a scatter plot that has the values of $\Omega_{\rm m}$ on the x-axis, and the value of the $\chi^2$ on the y-axis.

#### 3e. Find the model with the lowest $\chi^2$ and answer the questions:

i. Which value of $\Omega_{\rm m}$ corresponds to this model?

ii. Which value of $\Omega_{\Lambda}$ corresponds to this model?

iii. Based on your answers, is a non-zero value of dark energy supported by the data?

# Problem 4 (challenge): Multi-parameter model

#### 4a. Repeat the $\chi^2$ analysis for a multi-parameter model:
Start by creating a function for a model that includes $H_0$.

\begin{align}
D_L(z,\Omega_{\rm m}, H_0) = \,(1+z_f) \frac{c}{H_0}\int_0^{z_f} \frac{dz}{\sqrt{\Omega_{\rm m}(1+z)^3 + \Omega_{\Lambda}}}
\end{align}

Then create an array for $\Omega_{\rm m}$ varying between 0 and 1 in 0.05 intervals and one for $H_0$, the Hubble constant, varying between 50 and 80 km/s/Mpc in intervals of 5 km/s/Mpc. Pass these to your multi-parameter model.
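A two-parameter grid search could look like the sketch below. The names `scan_grid` and `best_on_grid` are hypothetical, the plain $\chi^2$ sum is used, and a simple double loop is chosen for readability over speed.

```python
import numpy as np
from scipy.integrate import quad

def lum_dist(zf, Omega_m, H0, ckms=2.9979e5):
    """Luminosity distance in Mpc at redshift zf for given Omega_m and H0."""
    integral, _ = quad(lambda z: (Omega_m * (1.0 + z) ** 3 + 1.0 - Omega_m) ** -0.5,
                       0.0, zf)
    return (1.0 + zf) * (ckms / H0) * integral

def scan_grid(z, d_lum, d_lum_err, omega_grid, h0_grid):
    """Return a 2D array chi2[i, j] for omega_grid[i] and h0_grid[j]."""
    chi2 = np.empty((len(omega_grid), len(h0_grid)))
    for i, om in enumerate(omega_grid):
        for j, h0 in enumerate(h0_grid):
            model = np.array([lum_dist(zi, om, h0) for zi in z])
            chi2[i, j] = np.sum((d_lum - model) ** 2 / d_lum_err ** 2)
    return chi2

def best_on_grid(chi2, omega_grid, h0_grid):
    """Return the (Omega_m, H0) pair at the minimum of the chi2 grid."""
    # unravel_index converts the flat argmin position back into (row, column)
    i, j = np.unravel_index(np.argmin(chi2), chi2.shape)
    return omega_grid[i], h0_grid[j]

omega_grid = np.linspace(0.0, 1.0, 21)  # 0, 0.05, ..., 1.0
h0_grid = np.arange(50, 85, 5)          # 50, 55, ..., 80 km/s/Mpc
```

Because $D_L$ scales as $1/H_0$, the integral could also be computed once per $\Omega_{\rm m}$ and rescaled, which avoids repeating the integration for every $H_0$.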


#### 4b. What are the values of $\Omega_{\rm m}$ and $H_0$ that correspond to the lowest $\chi^2$ in this multi-parameter model?
How do your conclusions change in this case?

#### 4c. Plot and compare the best models
In the same figure, plot the supernovae data with error bars as before, and overlay your best model from problem 2e, your best model from problem 3e, and your best multi-parameter model. Label each model with the value it had for $\Omega_{\rm m}$ and $H_0$.

#### 4d. Visually inspect your best models.
Why do you think it's important to use a statistical evaluation method such as the $\chi^2$ score when comparing models, in addition to your own inspection?