Fitting catalogues of data with Bagpipes
================================

Commonly, we wish to fit a whole catalogue of observations of different objects (e.g. the Guo et al. (2013) [CANDELS GOODS South catalogue](https://archive.stsci.edu/prepds/candels) used in the previous examples). 

One approach would be to wrap the fitting commands from the previous three examples in a for loop. However, Bagpipes provides a [catalogue fitting interface through the fit_catalogue class](https://bagpipes.readthedocs.io/en/latest/fitting_catalogues.html), which makes things easier. In addition, several options for MPI parallelisation are provided.

Setting up
------------

We'll use the setup from Example 3 to demonstrate how catalogue fitting works. First, let's copy in the load_data function (load_goodss), load the filter list and generate the fit_instructions dictionary.

In [1]:
import numpy as np 
import bagpipes as pipes

from astropy.io import fits

def load_goodss(ID):
    """ Load CANDELS GOODS South photometry from the Guo et al. (2013) catalogue. """

    # load up the relevant columns from the catalogue.
    cat = np.loadtxt("hlsp_candels_hst_wfc3_goodss-tot-multiband_f160w_v1-1photom_cat.txt",
                     usecols=(10, 13, 16, 19, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55,
                              11, 14, 17, 20, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56))
    
    # Find the correct row for the object we want.
    row = int(ID) - 1

    # Extract the object we want from the catalogue.
    fluxes = cat[row, :15]
    fluxerrs = cat[row, 15:]

    # Turn these into a 2D array.
    photometry = np.c_[fluxes, fluxerrs]

    # blow up the errors associated with any missing fluxes.
    for i in range(len(photometry)):
        if (photometry[i, 0] == 0.) or (photometry[i, 1] <= 0):
            photometry[i,:] = [0., 9.9*10**99.]
            
    # Enforce a maximum SNR of 20, or 10 in the IRAC channels.
    for i in range(len(photometry)):
        if i < 10:
            max_snr = 20.
            
        else:
            max_snr = 10.
        
        if photometry[i, 0]/photometry[i, 1] > max_snr:
            photometry[i, 1] = photometry[i, 0]/max_snr

    return photometry

goodss_filt_list = np.loadtxt("filters/goodss_filt_list.txt", dtype="str")


exp = {}                                  
exp["age"] = (0.1, 15.)
exp["tau"] = (0.3, 10.)
exp["massformed"] = (1., 15.)
exp["metallicity"] = (0., 2.5)

dust = {}
dust["type"] = "Calzetti"
dust["Av"] = (0., 2.)

fit_instructions = {}
fit_instructions["redshift"] = (0., 10.)
fit_instructions["exponential"] = exp   
fit_instructions["dust"] = dust
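
The two loops at the end of load_goodss can also be written with NumPy masks. The block below is a minimal sketch of that idea, not part of Bagpipes: the function name clean_photometry, its n_non_irac argument and the three-band input values are made up for illustration.

```python
import numpy as np


def clean_photometry(fluxes, fluxerrs, n_non_irac=10):
    """ Illustrative, vectorised version of the masking and SNR-cap steps. """

    # Stack fluxes and errors into a 2D array with one row per band.
    photometry = np.c_[fluxes, fluxerrs].astype(float)

    # Missing data (zero flux or non-positive error): zero flux, huge error.
    bad = (photometry[:, 0] == 0.) | (photometry[:, 1] <= 0.)
    photometry[bad] = [0., 9.9e99]

    # Maximum SNR of 20 in the first n_non_irac bands and 10 in the rest.
    max_snr = np.where(np.arange(len(photometry)) < n_non_irac, 20., 10.)
    too_high = photometry[:, 0]/photometry[:, 1] > max_snr
    photometry[too_high, 1] = photometry[too_high, 0]/max_snr[too_high]

    return photometry


# Made-up values: three bands, the first two treated as non-IRAC.
demo = clean_photometry([100., 0., 50.], [1., 2., 1.], n_non_irac=2)
print(demo)
```

The first and third bands have their errors inflated to meet the SNR limits, while the second band is flagged as missing.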

Basic catalogue fitting
--------------------------

In the most basic case all you need is a list of IDs. You can pass this, along with fit_instructions and load_data, to fit_catalogue. Fitting is begun by calling the fit function in the same way as you would for the ordinary fit class. Let's start by fitting the first three objects in the Guo et al. catalogue.

In [2]:
IDs = np.arange(1, 4)

fit_cat = pipes.fit_catalogue(IDs, fit_instructions, load_goodss, spectrum_exists=False,
                              cat_filt_list=goodss_filt_list, run="guo_cat")

fit_cat.fit(verbose=False)


Bagpipes: fitting object 1


Completed in 253.3 seconds.

Parameter                          Posterior percentiles
                                16th       50th       84th
----------------------------------------------------------
dust:Av                        0.846      0.999      1.205
exponential:age                1.308      1.481      1.847
exponential:massformed        10.626     10.689     10.756
exponential:metallicity        0.678      1.535      2.127
exponential:tau                0.318      0.366      0.479
redshift                       0.474      0.504      0.527


Bagpipes: 1 out of 3 objects completed.

Bagpipes: fitting object 2


Completed in 207.0 seconds.

Parameter                          Posterior percentiles
                                16th       50th       84th
----------------------------------------------------------
dust:Av                        0.148      0.319      0.506
exponential:age                2.974      3.267      3.467
exponential:massformed        10.263     10.366     10.441
exponential:metallicity        2.256      2.407      2.475
exponential:tau                1.180      1.656      3.618
redshift                       1.762      1.850      1.918


Bagpipes: 2 out of 3 objects completed.

Bagpipes: fitting object 3


Completed in 210.9 seconds.

Parameter                          Posterior percentiles
                                16th       50th       84th
----------------------------------------------------------
dust:Av                        0.519      0.616      0.685
exponential:age                2.156 

## Output catalogues

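A summary catalogue is saved automatically to pipes/cats/run_name.fits, where run_name is the value passed as the run keyword argument (pipes/cats/guo_cat.fits for the fit above).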
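
As a hedged sketch, the summary catalogue could be read back with astropy once a fit has finished. The helper names summary_path and load_summary are hypothetical, and the column names in the file are not assumed here: the code simply prints whatever is present.

```python
import os


def summary_path(run):
    """ Build the path at which the summary catalogue for a run is expected. """
    return os.path.join("pipes", "cats", run + ".fits")


def load_summary(run):
    """ Return the summary catalogue as an astropy Table, or None if absent. """
    path = summary_path(run)

    if not os.path.exists(path):
        return None

    # Imported here so that the path helper works without astropy installed.
    from astropy.table import Table
    return Table.read(path)


cat = load_summary("guo_cat")

if cat is None:
    print("No summary catalogue found at", summary_path("guo_cat"))

else:
    print(cat.colnames)
```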

More complex options
--------------------------

There are a few other options that might come in handy. For example, if you have a list of spectroscopic redshifts for the objects you're fitting you might wish to fix the redshift of each fit to a different value. You can do this by passing an array of redshift values as the redshifts keyword argument.

In [8]:
redshifts = np.ones(IDs.shape)

cat_fit = pipes.fit_catalogue(IDs, fit_instructions, load_goodss, spectrum_exists=False,
                              cat_filt_list=goodss_filt_list, run="guo_cat_redshift_1",
                              redshifts=redshifts)

If instead you want to vary the redshift within a small range around the input redshift you can additionally pass a float to the redshift_sigma keyword argument. This will cause the redshift for each object to be fitted with a Gaussian prior centred on the value passed in redshifts with the specified standard deviation. The maximum deviation allowed from the value in redshifts is three times the given redshift_sigma.
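
To make the effect of redshift_sigma concrete, the sketch below builds the redshift settings implied by the description above for a single object. The function name is hypothetical, and the key names (redshift_prior, redshift_prior_mu, redshift_prior_sigma) assume the usual Bagpipes convention for specifying priors in fit_instructions, so check them against the documentation before relying on them.

```python
def gaussian_redshift_prior(z, redshift_sigma):
    """ Illustrative redshift settings for one object: a Gaussian prior
    centred on z, limited to three standard deviations either side. """

    settings = {}
    settings["redshift"] = (z - 3.*redshift_sigma, z + 3.*redshift_sigma)

    # Assumed prior key names, following the "<parameter>_prior" pattern.
    settings["redshift_prior"] = "Gaussian"
    settings["redshift_prior_mu"] = z
    settings["redshift_prior_sigma"] = redshift_sigma

    return settings


# An input redshift of 1 with redshift_sigma=0.05.
settings = gaussian_redshift_prior(1.0, 0.05)
print(settings["redshift"])
```

With these inputs the redshift is confined to roughly 0.85 to 1.15.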

## Varying the filt_list

Finally, if you have a bunch of different objects with different photometry that you want to fit with the same model, you can pass a list of filter lists to fit_catalogue as the cat_filt_list keyword argument. If you do this, you need to set the vary_filt_list keyword argument to True. The code will then expect the first entry in cat_filt_list to be the filter list for the first object in IDs, the second entry to be the filter list for the second object, and so on. We can set this up using the same filter list for each object just to demonstrate:

In [10]:
# One filter list per object, in the same order as IDs.
list_of_filt_lists = [goodss_filt_list] * len(IDs)

cat_fit = pipes.fit_catalogue(IDs, fit_instructions, load_goodss, spectrum_exists=False,
                              cat_filt_list=list_of_filt_lists, run="guo_cat_vary_filt_list",
                              redshifts=redshifts, redshift_sigma=0.05, vary_filt_list=True)
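
When the filter lists really do differ between objects, the main thing to get right is the ordering. The following is a minimal sketch, assuming a hypothetical dictionary that maps each ID to its filter list; the filter-curve paths are placeholders, not files shipped with Bagpipes.

```python
def build_cat_filt_list(IDs, filt_lists_by_id):
    """ Return one filter list per object, in the same order as IDs. """

    missing = [ID for ID in IDs if ID not in filt_lists_by_id]

    if missing:
        raise KeyError("No filter list for IDs: " + str(missing))

    return [filt_lists_by_id[ID] for ID in IDs]


# Placeholder filter-curve paths, for illustration only.
optical = ["filters/F435W", "filters/F606W"]
optical_nir = optical + ["filters/F160W"]

filt_lists_by_id = {1: optical, 2: optical_nir, 3: optical}

demo_IDs = [1, 2, 3]
demo_cat_filt_list = build_cat_filt_list(demo_IDs, filt_lists_by_id)

print(len(demo_cat_filt_list), "filter lists for", len(demo_IDs), "objects")
```

The resulting list could then be passed as cat_filt_list with vary_filt_list=True, provided the load_data function returns photometry in the matching band order for each object.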

## MPI Parallelisation

fit_catalogue supports MPI parallelisation in the same way as fit (see Example 3). It is also possible to request that fit_catalogue assigns a different object to each of the available cores, so that multiple objects are fitted at once. This is faster overall when running on large numbers of cores or when fitting simple models to large photometric catalogues, although each individual fit will take longer. To do this, set the mpi_serial keyword argument of fit_catalogue to True. [A slightly modified version of pymultinest](https://www.github.com/ACCarnall/pymultinest) is currently required to run Bagpipes in this way.
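
The idea of giving each core its own objects can be illustrated without MPI. The sketch below uses a simple static round-robin split; this is an assumption made for illustration only, and the scheme Bagpipes actually uses to hand out objects when mpi_serial=True may differ. The function name split_round_robin is hypothetical.

```python
def split_round_robin(IDs, n_cores):
    """ Assign objects to cores in turn: core 0 takes the first object,
    core 1 the second, and so on, wrapping around after n_cores. """
    return [list(IDs[rank::n_cores]) for rank in range(n_cores)]


# Ten objects shared between four cores.
assignments = split_round_robin(list(range(1, 11)), 4)

for rank, assigned in enumerate(assignments):
    print("core", rank, "fits objects", assigned)
```

Each object is fitted by a single core, which is why individual fits take longer than when all cores work on one object, while the catalogue as a whole can finish sooner.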