Fitting catalogues of data with Bagpipes
================================

Commonly, we wish to fit a whole catalogue of observations of different objects (e.g. the Guo et al. (2013) [CANDELS GOODS South catalogue](https://archive.stsci.edu/prepds/candels) used in the previous examples). 

One approach would be to wrap the fitting commands from the previous three examples in a for loop. However, Bagpipes provides a [catalogue fitting interface through the fit_catalogue class](https://bagpipes.readthedocs.io/en/latest/fitting_catalogues.html), which makes things easier. One advantage of fitting catalogues this way is that different objects can automatically be parcelled out to identical processes running on different cores, in effect parallelising the catalogue fitting.

Setting up
------------

We'll use the setup from Example 3 to demonstrate how catalogue fitting works. First of all, let's copy in the load_data function (here called load_goodss) and generate the fit instructions dictionary.

In [1]:
import numpy as np 
import bagpipes as pipes

from astropy.io import fits

def load_goodss(ID):
    """ Load CANDELS GOODS South photometry from the Guo et al. (2013) catalogue. """

    # load up the relevant columns from the catalogue.
    cat = np.loadtxt("hlsp_candels_hst_wfc3_goodss-tot-multiband_f160w_v1-1photom_cat.txt",
                     usecols=(10, 13, 16, 19, 25, 28, 31, 34, 37, 43, 46, 49, 52, 55,
                              11, 14, 17, 20, 26, 29, 32, 35, 38, 44, 47, 50, 53, 56))
    
    # Find the correct row for the object we want.
    row = int(ID) - 1

    # Extract the object we want from the catalogue.
    fluxes = cat[row, :14]
    fluxerrs = cat[row, 14:]

    # Turn these into a 2D array.
    photometry = np.c_[fluxes, fluxerrs]

    # blow up the errors associated with any missing fluxes.
    for i in range(len(photometry)):
        if (photometry[i, 0] == 0.) or (photometry[i, 1] <= 0):
            photometry[i,:] = [0., 9.9*10**99.]
            
    # Enforce a maximum SNR of 20, or 10 in the IRAC channels.
    for i in range(len(photometry)):
        if i < 10:
            max_snr = 20.
            
        else:
            max_snr = 10.
        
        if photometry[i, 0]/photometry[i, 1] > max_snr:
            photometry[i, 1] = photometry[i, 0]/max_snr

    return photometry

goodss_filt_list = np.loadtxt("filters/goodss_filt_list.txt", dtype="str")


exp = {}                                  
exp["age"] = (0.1, 15.)
exp["tau"] = (0.3, 10.)
exp["massformed"] = (1., 15.)
exp["metallicity"] = (0., 2.5)

dust = {}
dust["type"] = "Calzetti"
dust["Av"] = (0., 2.)

fit_instructions = {}
fit_instructions["redshift"] = (0., 10.)
fit_instructions["exponential"] = exp   
fit_instructions["dust"] = dust

Basic catalogue fitting
--------------------------

In the most basic case all you need is a list of IDs. You can pass this, along with fit_instructions and load_data, to fit_catalogue and then call the fit function in the same way as you would for the ordinary fit class. Let's start by fitting the first five objects in the Guo et al. catalogue.

In [2]:
IDs = np.arange(1, 6)

fit_cat = pipes.fit_catalogue(IDs, fit_instructions, load_goodss, spectrum_exists=False,
                              cat_filt_list=goodss_filt_list, run="guo_cat")

fit_cat.fit(verbose=False)


Bagpipes: fitting object 1


Completed in 259.8 seconds.

Parameter                          Posterior percentiles
                                16th       50th       84th
----------------------------------------------------------
dust:Av                        0.850      0.997      1.201
exponential:age                1.309      1.495      1.875
exponential:massformed        10.628     10.692     10.759
exponential:metallicity        0.643      1.522      2.139
exponential:tau                0.318      0.370      0.493
redshift                       0.473      0.504      0.527



Bagpipes: fitting object 2


Completed in 159.1 seconds.

Parameter                          Posterior percentiles
                                16th       50th       84th
----------------------------------------------------------
dust:Av                        0.149      0.314      0.494
exponential:age                2.992      3.276      3.467
exponential:massformed        10.267     10.369     10.446

The real advantage here is that if you set another instance of these commands running elsewhere, Bagpipes will automatically share the objects in the catalogue out between these two (or more) processes. Processes can be started and stopped at any time and everything should carry on working.

The only exception is starting more than one process at exactly the same time, which can lead to conflicts. If you're setting a large number of parallel processes going at once, try adding a small random time delay to the beginning of the script to avoid this.
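
As a minimal sketch of the random delay suggested above, the helper below sleeps for a random time before the fit starts. The function name `stagger_start` and the ten-second default are illustrative assumptions, not part of Bagpipes.

```python
import random
import time


def stagger_start(max_delay=10.0, seed=None):
    """Sleep for a random time in [0, max_delay] seconds and return the delay.

    Hypothetical helper (not a Bagpipes function). With seed=None each
    process is seeded independently, so simultaneous launches draw
    different delays.
    """
    rng = random.Random(seed)
    delay = rng.uniform(0.0, max_delay)
    time.sleep(delay)
    return delay


# Call once at the top of the script, before creating fit_catalogue, e.g.
# stagger_start(max_delay=10.0)
```

A longer maximum delay spreads the start times further apart, which may help when launching many processes at once.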


## Merging fit_catalogue outputs

A summary output catalogue will be generated in the pipes/cats directory automatically every time a process finishes fitting a batch of ten objects. To do this manually you can run the pipes.catalogue.merge command as follows:


In [3]:
pipes.catalogue.merge("guo_cat")

Bagpipes: 5 out of 5 objects completed.



## Cleaning partially completed objects
    
If the code crashes or is interrupted in the middle of fitting an object, the code will see these objects as completed and not try to fit them again. To fix this, you can run the pipes.catalogue.clean command, specifying the run, which will kill all running processes and clear any partially completed objects so that they can be re-fitted from scratch.

In [4]:
pipes.catalogue.clean("guo_cat")

Bagpipes: 5 out of 5 objects completed.
Bagpipes: Partially completed objects reset.


More complex options
--------------------------

There are a few other options that might come in handy. For example, if you have a list of spectroscopic redshifts for the objects you're fitting you might wish to fix the redshift of each fit to a different value. You can do this by passing an array of redshift values as the redshifts keyword argument.

In [None]:
# Placeholder redshifts: fix every object at z = 1.
redshifts = np.ones(IDs.shape)

cat_fit = pipes.fit_catalogue(IDs, fit_instructions, load_goodss, spectrum_exists=False,
                              cat_filt_list=goodss_filt_list, run="guo_cat_redshift_1",
                              redshifts=redshifts)

If instead you want to vary the redshift within a small range around the input redshift you can pass a float to the redshift_sigma keyword argument. This will cause the redshift for each object to be fitted with a Gaussian prior centred on the value passed in redshifts with the specified standard deviation.
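
To make the idea of a Gaussian redshift prior concrete, here is a rough sketch of what an equivalent single-object fit_instructions dictionary might look like. It assumes the `<parameter>_prior`, `<parameter>_prior_mu` and `<parameter>_prior_sigma` key convention for priors, and the three-sigma allowed range and the clipping at zero are assumptions made for illustration. The helper name is hypothetical, and this is not a description of what fit_catalogue does internally.

```python
def gaussian_redshift_instructions(fit_instructions, z, sigma, n_sigma=3.0):
    """Return a copy of fit_instructions with a Gaussian prior on redshift.

    Hypothetical helper for illustration only. The allowed range of
    +/- n_sigma standard deviations, clipped at zero, is an assumption.
    """
    new = dict(fit_instructions)  # shallow copy, input is left untouched
    new["redshift"] = (max(z - n_sigma * sigma, 0.0), z + n_sigma * sigma)
    new["redshift_prior"] = "Gaussian"
    new["redshift_prior_mu"] = z
    new["redshift_prior_sigma"] = sigma
    return new


# Example: an object with an input redshift of 1.0 and redshift_sigma = 0.05.
example = gaussian_redshift_instructions({"redshift": (0., 10.)}, z=1.0, sigma=0.05)
```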

## Varying the filt_list

Finally, if you have a set of objects with different photometry that you want to fit with the same model, you can pass a list of filter lists to fit_catalogue as the cat_filt_list keyword argument. If you do this you need to set the vary_filt_list keyword argument to True, and the code will expect the first entry in cat_filt_list to be the filter list for the first object and so on. We can set this up using the same filter list for each object just to demonstrate:

In [None]:
# One filter list per object, in the same order as IDs.
list_of_filt_lists = [goodss_filt_list] * len(IDs)

cat_fit = pipes.fit_catalogue(IDs, fit_instructions, load_goodss, spectrum_exists=False,
                              cat_filt_list=list_of_filt_lists, run="guo_cat_vary_filt_list",
                              redshifts=redshifts, redshift_sigma=0.05, vary_filt_list=True)
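
In a real catalogue the filter lists would differ between objects. One possible way to assemble them in the same order as the IDs is sketched below. The helper `build_cat_filt_list`, the dictionary layout and the filter file paths are all hypothetical and only illustrate the ordering requirement described above.

```python
def build_cat_filt_list(IDs, filt_lists_by_id):
    """Return one filter list per object, ordered to match IDs.

    Hypothetical helper (not part of Bagpipes). Raises KeyError if any
    object has no filter list, so ordering problems are caught early.
    """
    missing = [ID for ID in IDs if ID not in filt_lists_by_id]
    if missing:
        raise KeyError(f"No filter list for IDs: {missing}")
    return [filt_lists_by_id[ID] for ID in IDs]


# Placeholder filter curve paths, for illustration only.
optical = ["filters/example_optical_1", "filters/example_optical_2"]
with_irac = optical + ["filters/example_irac_1"]

filt_lists_by_id = {1: optical, 2: with_irac, 3: optical}
cat_filt_list = build_cat_filt_list([1, 2, 3], filt_lists_by_id)
```

The resulting list could then be passed as cat_filt_list with vary_filt_list=True. The load_data function would presumably need to return photometry with one row per filter in the matching list for each object.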