# Assignment 1 - SED Fitting

## Error estimation

This assignment is split into 3 sections, roughly corresponding to the contents of each of the 3 weeks in the Error Estimation module.

The SETUP section is designed to be done first, to familiarize yourself with the data and the details of SED fitting. Section 1 is shorter to account for this.

Feel free to write functions in a separate module and import them here if you like.

# SETUP

- Download the data from [here](https://irfu.cea.fr/Pisp/yu-yen.chang/sw.html) (both the input and output catalogs). These come from [Chang et al. (2015)](https://ui.adsabs.harvard.edu/abs/2015ApJS..219....8C/abstract)

- Install [Prospector](https://github.com/bd-j/prospector) (we found this easiest to do using the [conda script](https://github.com/bd-j/prospector/blob/main/conda_install.sh) provided)

- Pick a galaxy in the *input* data (this can be any row with FLAG=1)

- Use prospector to fit an SED model (pick any model you like) to your chosen galaxy

    - To do this we recommend following the [quickstart guide](https://prospect.readthedocs.io/en/latest/quickstart.html) in the prospector documentation and adapting it to our data


- Now repeat for a few different galaxies (try to pick a range of magnitudes, redshifts, etc)

- Try using a model with a fixed redshift (using the spectroscopic redshift in the catalog) vs fitting the redshift from the photometry
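
When adapting the quickstart to our data, note that Prospector's `obs` dictionary expects photometry in maggies (flux relative to the AB zero point of 3631 Jy). The sketch below shows the two usual conversions. The function names and the example numbers are hypothetical, and you should check the catalog documentation for the units of its flux columns before choosing one.

```python
import numpy as np

AB_ZEROPOINT_JY = 3631.0  # flux density of a zero-magnitude AB source, in Jy


def jy_to_maggies(flux_jy, flux_err_jy):
    """Convert flux densities and their 1-sigma errors from Jy to maggies."""
    flux_jy = np.asarray(flux_jy, dtype=float)
    flux_err_jy = np.asarray(flux_err_jy, dtype=float)
    return flux_jy / AB_ZEROPOINT_JY, flux_err_jy / AB_ZEROPOINT_JY


def abmag_to_maggies(mag, mag_err):
    """Convert AB magnitudes to maggies, propagating errors to first order."""
    mag = np.asarray(mag, dtype=float)
    mag_err = np.asarray(mag_err, dtype=float)
    maggies = 10.0 ** (-0.4 * mag)
    maggies_err = 0.4 * np.log(10.0) * maggies * mag_err
    return maggies, maggies_err


# Made-up example values, NOT taken from the catalog
maggies, maggies_unc = jy_to_maggies([3631e-6, 1e-4], [3631e-7, 1e-5])
```

The resulting arrays are what would go into `obs["maggies"]` and `obs["maggies_unc"]`.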

########

###Do we want to provide a working example for our data?

########

In [None]:
## python assignment_params.py --objid=33 --optimize --emcee --outfile=test

In [11]:
import prospect.io.read_results as reader
import numpy as np
import matplotlib.pyplot as plt

In [12]:
file_name = 'test_24Jan10-12.05_result.h5'
res, obs, model = reader.results_from(file_name)
results_type = "emcee" # | "dynesty"
randint = np.random.randint

sps = reader.get_sps(res)


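if results_type == "emcee":
    # Draw one posterior sample at random. The emcee chain has shape
    # (nwalkers, niter, ndim), so read the sizes from the chain itself
    # instead of hard-coding them.
    nwalkers, niter = res['chain'].shape[:2]
    theta = res['chain'][randint(nwalkers), randint(niter)]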
else:
    theta = res["chain"][randint(len(res["chain"]))]

imax = np.argmax(res['lnprobability'])

if results_type == "emcee":
    i, j = np.unravel_index(imax, res['lnprobability'].shape)
    theta_max = res['chain'][i, j, :].copy()
    thin = 5
else:
    theta_max = res["chain"][imax, :]
    thin = 1

a = 1.0 + model.params.get('zred', 0.0) # cosmological redshifting
# photometric effective wavelengths
wphot = obs["phot_wave"]
# spectroscopic wavelengths
# *restframe* spectral wavelengths, since obs["wavelength"] is None
# multiply into a new array so that sps.wavelengths is not modified in place
wspec = sps.wavelengths * a  # redshift them
xmin, xmax = np.min(wphot)*0.8, np.max(wphot)/0.8
initial_spec, initial_phot, initial_mfrac = model.sed(theta, obs=obs, sps=sps)

temp = np.interp(np.linspace(xmin,xmax,10000), wspec, initial_spec)
ymin, ymax = temp.min()*0.8, temp.max()/0.4


In [23]:
initial_spec, initial_phot, initial_mfrac = model.sed(theta, obs=obs, sps=sps)

# generate model
prediction = model.mean_model(theta, obs=obs, sps=sps)
pspec, pphot, pfrac = prediction

# SECTION 1

- #### Goodness of fit of Prospector examples

In [None]:
### Plot the data for a chosen galaxy (with error bars) 
### Flux or magnitude vs band wavelength or index

### Add the best fit model to the plot

### Compute the goodness-of-fit (chi squared)
### (Prospector assumes the magnitudes are independent, so you can too, but we'll come back to this later)
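
As a starting point, the chi-squared for independent errors is the sum of squared, error-normalised residuals. The sketch below uses synthetic fluxes. The number of bands and of free parameters are hypothetical placeholders. For a real fit the inputs would be `obs["maggies"]`, `obs["maggies_unc"]` and the model photometry (`pphot` above).

```python
import numpy as np


def chi_squared(data, data_err, model):
    """Chi-squared assuming independent Gaussian errors on each band."""
    resid = (np.asarray(data, dtype=float) - np.asarray(model, dtype=float))
    resid = resid / np.asarray(data_err, dtype=float)
    return float(np.sum(resid ** 2))


# Synthetic example: 9 bands and 4 free model parameters (both hypothetical)
rng = np.random.default_rng(0)
n_bands, n_free = 9, 4
truth = np.linspace(1.0, 5.0, n_bands)
sigma = 0.1 * truth
obs_flux = truth + rng.normal(0.0, sigma)

chi2 = chi_squared(obs_flux, sigma, truth)
dof = n_bands - n_free          # degrees of freedom
chi2_red = chi2 / dof           # reduced chi-squared
print(f"chi2 = {chi2:.2f}, dof = {dof}, reduced chi2 = {chi2_red:.2f}")
```

A reduced chi-squared close to 1 is the usual rough benchmark, although with only a handful of bands the expected scatter around 1 is large.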

In [None]:
### Is the goodness of fit reasonable? Why?

In [None]:
### What are the best fit, mean, and 1/2/3 sigma confidence intervals for each of the constrained parameters?
### Are they consistent with the results from Chang et al (the output file linked above)?
### How similar do we expect them to be?
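
One common way to get the mean and central intervals is from percentiles of the posterior samples. This is a minimal sketch on fake Gaussian samples. The `summarize` helper is hypothetical, and for an emcee run you would first flatten the chain, e.g. `res['chain'].reshape(-1, ndim)`, and pass one column at a time.

```python
import numpy as np


def summarize(samples):
    """Mean, median and central 1/2/3-sigma intervals of 1D samples."""
    samples = np.asarray(samples, dtype=float)
    # Probability enclosed by +/- 1, 2, 3 sigma of a Gaussian, in percent
    enclosed = {1: 68.27, 2: 95.45, 3: 99.73}
    out = {"mean": samples.mean(), "median": np.median(samples)}
    for k, p in enclosed.items():
        lo, hi = np.percentile(samples, [50.0 - p / 2.0, 50.0 + p / 2.0])
        out[f"{k}sigma"] = (lo, hi)
    return out


# Fake posterior samples of log stellar mass (illustrative values only)
rng = np.random.default_rng(1)
fake_logmass = rng.normal(10.5, 0.2, size=200_000)
summary = summarize(fake_logmass)
print(summary)
```

The best-fit (maximum-probability) sample is a separate quantity, already computed above as `theta_max`. It need not coincide with the mean or median when the posterior is skewed.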

In [None]:
### How did allowing the redshift to be a free parameter change the results? Did you get the same mass? Is the redshift correct?

# SECTION 2 

- #### Covariance of constrained parameters (Gaussian assumption)

In [None]:
### First let's look at the covariance of constrained parameters
### Plot a corner plot of the prospector outputs, showing the 68% and 95% 2D contours
### (You will want to use one of the MCMC methods in the prospector fitting ...
###    we will discuss this more in the MCMC section. For now we can assume that the density of ...
###    output samples at a given location in parameter space, is proportional to the probability of ...
###    those parameters, given the data and model )
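
If sample density traces probability, the 68% and 95% contours are the density levels that enclose those fractions of the samples. The sketch below finds the levels from a 2D histogram of fake correlated samples. The `contour_levels` helper and the bin count are hypothetical choices, and dedicated corner-plot packages do this for you.

```python
import numpy as np


def contour_levels(x, y, fractions=(0.68, 0.95), bins=40):
    """Histogram levels whose enclosed sample fraction reaches each target."""
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
    flat = np.sort(hist.ravel())[::-1]        # densest bins first
    cum = np.cumsum(flat) / flat.sum()        # enclosed fraction
    levels = [float(flat[np.searchsorted(cum, f)]) for f in fractions]
    return hist, xedges, yedges, levels


# Fake samples of two correlated parameters (illustrative only)
rng = np.random.default_rng(4)
cov = [[1.0, 0.6], [0.6, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=50_000).T
hist, xedges, yedges, levels = contour_levels(x, y)
print("68% and 95% levels (counts per bin):", levels)
```

To draw the contours, something like `plt.contour(xc, yc, hist.T, levels=sorted(levels))` works, with `xc` and `yc` the bin centres. The transpose is needed because `np.histogram2d` puts x along the first axis.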

In [None]:
### Are there any degeneracies between parameters in the fit?
### What does this mean?

In [None]:
### Make a covariance matrix of the fitted parameters (describing the uncertainties and their covariance with each other)
### Plot it
### Looking at the contour plot, was this a reasonable thing to do?
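
Given an array of posterior samples with one row per sample, the sample covariance and correlation matrices can be estimated as sketched below. The samples here are fake draws from a known Gaussian, so the estimate can be checked against the input. The parameter values are purely illustrative.

```python
import numpy as np

# Fake posterior samples for two parameters, drawn from a known covariance
rng = np.random.default_rng(2)
true_cov = np.array([[0.04, 0.018],
                     [0.018, 0.09]])
samples = rng.multivariate_normal([10.5, -0.3], true_cov, size=100_000)

# rowvar=False: each column is a parameter, each row a sample
cov = np.cov(samples, rowvar=False)

# Correlation matrix: covariance normalised by the 1-sigma uncertainties
sigma = np.sqrt(np.diag(cov))
corr = cov / np.outer(sigma, sigma)
print(corr)
```

The correlation matrix is often easier to read in a plot than the covariance itself, e.g. `plt.imshow(corr, vmin=-1, vmax=1, cmap="RdBu_r")`. Keep in mind that a covariance matrix only fully describes the posterior if it is close to Gaussian.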

- #### Covariance of magnitude errors

In [None]:
### So far, prospector has assumed the uncertainties in the magnitudes/fluxes are independent of each other 
### In practice this might not be true
### For this exercise, assume the correlation between the flux in each band is X%
### Plot the covariance matrix, with and without the correlated errors
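
A covariance matrix with a single correlation coefficient between all pairs of bands can be built as C_ij = rho * sigma_i * sigma_j for i != j, with sigma_i^2 on the diagonal. In the sketch below the helper name, the per-band errors and the value rho = 0.3 are all placeholders, since the assignment leaves X unspecified.

```python
import numpy as np


def correlated_covariance(sigma, rho):
    """Covariance with the same correlation rho between every pair of bands."""
    sigma = np.asarray(sigma, dtype=float)
    corr = np.full((sigma.size, sigma.size), float(rho))
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(sigma, sigma)


sigma = np.array([0.1, 0.2, 0.4])              # made-up per-band errors
cov_indep = correlated_covariance(sigma, 0.0)  # diagonal: independent errors
cov_corr = correlated_covariance(sigma, 0.3)   # 0.3 is a placeholder for X%
print(cov_corr)
```

For n bands this matrix is only a valid (positive-definite) covariance when -1/(n-1) < rho < 1.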

In [None]:
### Re-compute the goodness of fit with the correlated errors
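
With a full covariance matrix C, the chi-squared generalises to r^T C^-1 r, where r is the vector of residuals. A minimal sketch with made-up two-band numbers follows. The helper name is hypothetical, and `np.linalg.solve` is used instead of explicitly inverting C.

```python
import numpy as np


def chi_squared_cov(data, model, cov):
    """Chi-squared r^T C^-1 r for residuals r and covariance matrix C."""
    r = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    return float(r @ np.linalg.solve(cov, r))


# Two bands with unit errors and a placeholder correlation of 0.5
sigma = np.array([1.0, 1.0])
rho = 0.5
cov = rho * np.outer(sigma, sigma)
np.fill_diagonal(cov, sigma ** 2)

model = [0.0, 0.0]
chi2_indep = chi_squared_cov([1.0, 1.0], model, np.diag(sigma ** 2))
chi2_same_sign = chi_squared_cov([1.0, 1.0], model, cov)
chi2_opposite = chi_squared_cov([1.0, -1.0], model, cov)
print(chi2_indep, chi2_same_sign, chi2_opposite)
```

In this toy case the same residual amplitudes give different chi-squared values depending on whether the residuals share a sign, which is worth keeping in mind when interpreting your own result.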

In [None]:
### Did the goodness-of-fit get worse or better, why?

# SECTION 3 

- #### Estimating error bars and uncertainty of distributions (shot noise, bootstrap)

In [None]:
### For this section we will use the output catalogs, since running prospector on all 
### 800000 galaxies would be a waste of computing for this class

### Plot a histogram of a given measured quantity (stellar mass, redshift, etc) 
### Choose the range and bin size appropriately so that we can see the full distribution

In [None]:
### There are a limited number of objects in each histogram bin
### Add shot noise (Poisson) to the histogram bars to show this
### These will be your "analytic" error bars
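
For a bin containing N objects, the Poisson (shot noise) error on the count is sqrt(N). The sketch below applies this to a histogram of fake log stellar masses. The distribution and the bin edges are illustrative choices, not values from the catalog.

```python
import numpy as np

# Fake log stellar masses (illustrative only)
rng = np.random.default_rng(3)
fake_logmass = rng.normal(10.5, 0.5, size=20_000)

edges = np.linspace(8.5, 12.5, 21)            # 20 equal-width bins
counts, _ = np.histogram(fake_logmass, bins=edges)
errors = np.sqrt(counts)                      # Poisson error on each count
centers = 0.5 * (edges[:-1] + edges[1:])

for c, n, e in zip(centers[:3], counts[:3], errors[:3]):
    print(f"bin centre {c:.2f}: N = {n}, error = {e:.1f}")
```

These can be drawn with, for example, `plt.bar(centers, counts, width=np.diff(edges), yerr=errors)`.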

In [None]:
### Now split the data into N subsets and compute the jackknife covariance of the histogram bins
### How does it compare to the analytic errors?
### Are the bins independent?
### What if you make N very large or very small?
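
A delete-one-subset jackknife recomputes the histogram N times, each time leaving out one subset, and estimates the covariance as (N-1)/N times the sum of outer products of the deviations from the mean. The sketch below assumes random, equal-sized subsets and fake data. The helper name is hypothetical, and for a real survey the subsets are often chosen as patches of sky instead.

```python
import numpy as np


def jackknife_covariance(values, edges, n_sub, rng):
    """Jackknife covariance of histogram counts from n_sub random subsets."""
    values = np.asarray(values, dtype=float)
    # Randomly assign each object to one of n_sub (near) equal-sized subsets
    labels = rng.permutation(values.size) % n_sub
    estimates = np.empty((n_sub, len(edges) - 1))
    for k in range(n_sub):
        counts, _ = np.histogram(values[labels != k], bins=edges)
        # Rescale to the full sample size so estimates are comparable
        estimates[k] = counts * n_sub / (n_sub - 1)
    diff = estimates - estimates.mean(axis=0)
    return (n_sub - 1) / n_sub * diff.T @ diff


rng = np.random.default_rng(5)
fake_logmass = rng.normal(10.5, 0.5, size=20_000)   # illustrative only
edges = np.linspace(9.0, 12.0, 7)                   # 6 bins
counts, _ = np.histogram(fake_logmass, bins=edges)

cov_jk = jackknife_covariance(fake_logmass, edges, n_sub=100, rng=rng)
print("jackknife sigma:", np.sqrt(np.diag(cov_jk)))
print("Poisson sigma:  ", np.sqrt(counts))
```

The off-diagonal elements of `cov_jk` are what tell you whether the bins are independent.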


In [None]:
### Now repeat this exercise, replacing the histogram with a calculation of the mean stellar
### mass as a function of redshift (i.e. split the data into redshift bins, and compute the mean mass in each)
### Use jackknife to get the errors
### How do these compare with the standard error of the mean?
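
The same jackknife recipe applies to any binned statistic. The sketch below computes the mean of a fake "mass" in redshift bins, then compares jackknife errors with the standard error of the mean. The helper names, the fake mass-redshift relation and the binning are all hypothetical, and the subsets are again random and equal-sized.

```python
import numpy as np


def binned_mean(z, mass, edges):
    """Mean of mass in each redshift bin defined by edges."""
    n_bins = len(edges) - 1
    idx = np.digitize(z, edges) - 1
    ok = (idx >= 0) & (idx < n_bins)          # drop objects outside the range
    sums = np.bincount(idx[ok], weights=mass[ok], minlength=n_bins)
    counts = np.bincount(idx[ok], minlength=n_bins)
    return sums / counts


def jackknife_errors(z, mass, edges, n_sub, rng):
    """Delete-one-subset jackknife errors on the binned means."""
    labels = rng.permutation(z.size) % n_sub
    est = np.array([binned_mean(z[labels != k], mass[labels != k], edges)
                    for k in range(n_sub)])
    diff = est - est.mean(axis=0)
    return np.sqrt((n_sub - 1) / n_sub * np.sum(diff ** 2, axis=0))


# Fake catalog: log mass rising with redshift plus scatter (illustrative only)
rng = np.random.default_rng(6)
z = rng.uniform(0.0, 0.5, size=20_000)
mass = 10.0 + 1.0 * z + rng.normal(0.0, 0.3, size=z.size)
edges = np.linspace(0.0, 0.5, 6)              # 5 redshift bins

means = binned_mean(z, mass, edges)
jk = jackknife_errors(z, mass, edges, n_sub=100, rng=rng)

# Standard error of the mean in each bin, for comparison
idx = np.digitize(z, edges) - 1
sem = np.array([mass[idx == i].std(ddof=1) / np.sqrt(np.sum(idx == i))
                for i in range(len(edges) - 1)])
print("jackknife:", jk)
print("SEM:      ", sem)
```

With random subsets of independent objects the two estimates should be similar. Differences become more informative when the subsets reflect real structure in the data.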