In [1]:
import warnings
from matplotlib.cbook import MatplotlibDeprecationWarning
warnings.simplefilter('ignore', MatplotlibDeprecationWarning)
warnings.simplefilter('ignore', UserWarning)
warnings.simplefilter('ignore', RuntimeWarning)
warnings.simplefilter('ignore', UnicodeWarning)


![HELP logo](https://github.com/pdh21/FIR_bootcamp_2016/blob/master/Figures/Help_Logo.png?raw=true)
# XID+
### _Peter Hurley (SCUBA-2 edits: Matt Smith)_

1. Uses an MCMC-based approach to obtain the full posterior
2. Provides a natural framework for introducing additional prior information
3. Allows more accurate estimates of the flux density uncertainty for each source
4. Provides a platform for doing science with the maps (e.g. hierarchical stacking of LBGs, luminosity functions from the map, etc.)

![stan logo](https://github.com/stan-dev/logos/blob/master/pystan_logo_name.png?raw=true)


Cross-identification tends to be done with catalogues, then science with the matched catalogues.

XID+ takes a different philosophy.
* Catalogues are a form of data compression. This is fine in some cases, less so in others:
    - e.g. in confused images, catalogue compression loses correlation information
* Ideally, science should be done without compression.

XID+ provides a framework to cross-identify galaxies we know about in different maps, with the idea that it can be extended to do science with the maps!


## Probabilistic Framework


Philosophy: 
* Build a probabilistic generative model for the SPIRE maps
* Infer the model parameters from the SPIRE maps


### Bayes Theorem
$p(\mathbf{f}|\mathbf{d}) \propto p(\mathbf{d}|\mathbf{f}) \times p(\mathbf{f})$
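As a toy illustration of the proportionality, consider a single source whose flux all lands in one pixel, so the model prediction for that pixel is just $f$. The sketch below assumes Gaussian noise with known $\sigma$ and a flat prior on $0 \le f \le f_\mathrm{max}$ (both illustrative choices, and `grid_posterior` is a hypothetical helper), multiplies likelihood by prior on a grid of flux values, and normalises the result so it integrates to one.

```python
import numpy as np

def grid_posterior(d, sigma, f_grid, f_max):
    """Posterior p(f|d) on a regular grid, for a toy one-pixel model d = f + noise."""
    # Gaussian likelihood p(d|f), up to a constant factor
    like = np.exp(-0.5 * ((d - f_grid) / sigma) ** 2)
    # Flat prior on 0 <= f <= f_max (zero elsewhere)
    prior = ((f_grid >= 0) & (f_grid <= f_max)).astype(float)
    post = like * prior
    # Normalise so the posterior integrates to 1 over the grid
    df = f_grid[1] - f_grid[0]
    return post / (post.sum() * df)

f_grid = np.linspace(-5.0, 20.0, 2501)
post = grid_posterior(d=3.0, sigma=1.0, f_grid=f_grid, f_max=15.0)
print(f_grid[np.argmax(post)])  # peaks near the data value, 3.0
```

Because the prior is zero for negative fluxes, the posterior is too, even where the likelihood is not.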

### Generative Model
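In order to carry out Bayesian inference, we need a model of the data.

For the SPIRE maps, our model is quite simple, with the likelihood defined as:

$$L = p(\mathbf{d}|\mathbf{f}) \propto |\mathbf{N_d}|^{-1/2} \exp\big\{ -\frac{1}{2}(\mathbf{d}-\mathbf{Af})^T\mathbf{N_d}^{-1}(\mathbf{d}-\mathbf{Af})\big\}$$

where $\mathbf{d}$ is the vector of map pixel values, $\mathbf{f}$ the vector of source fluxes, $\mathbf{A}$ the pointing matrix (the response of each pixel to unit flux from each source), and $\mathbf{N_d}$ the diagonal noise covariance matrix, with elements

$$(\mathbf{N_d})_{ii} = \sigma_{\mathrm{inst.},ii}^2 + \sigma_{\mathrm{conf.}}^2$$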
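When $\mathbf{N_d}$ is diagonal, the quadratic form collapses to a sum over pixels and $\log|\mathbf{N_d}|$ to a sum of $\log N_{ii}$, so the likelihood is cheap to evaluate. A minimal NumPy sketch with hypothetical argument names, which also keeps the $-\tfrac{n}{2}\log 2\pi$ constant that the $\propto$ drops:

```python
import numpy as np

def log_likelihood(d, A, f, sigma_inst, sigma_conf):
    """Gaussian log-likelihood log p(d|f) for a diagonal noise covariance.

    d          : (npix,) map pixel values
    A          : (npix, nsrc) pointing matrix
    f          : (nsrc,) source fluxes
    sigma_inst : (npix,) instrumental noise per pixel
    sigma_conf : scalar confusion noise, the same for every pixel
    """
    resid = d - A @ f
    # Diagonal elements N_ii = sigma_inst_ii^2 + sigma_conf^2
    n_diag = sigma_inst ** 2 + sigma_conf ** 2
    # r^T N^{-1} r reduces to a sum over pixels when N is diagonal
    chi2 = np.sum(resid ** 2 / n_diag)
    # log|N| = sum(log N_ii); n*log(2*pi) is the usual normalising constant
    return -0.5 * (chi2 + np.sum(np.log(n_diag)) + d.size * np.log(2 * np.pi))
```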
    

The simplest XID+ model assumes the following:
* All sources are known and have positive flux ($f_i$)
* A global background ($B$) contributes to all pixels
* The point response function (PRF) is fixed and known
* Confusion noise is constant and not correlated across pixels
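Under these assumptions the model map is $\mathbf{Af} + B$, and because the model is generative we can also draw synthetic data from it. The sketch below does both for a tiny map: it builds a dense pointing matrix from a single circular Gaussian PRF (a simplifying assumption; the real PRF can be any fixed image), adds a global background, and draws independent Gaussian noise for each pixel. The function names and numbers are illustrative only.

```python
import numpy as np

def gaussian_pointing_matrix(nx, ny, src_x, src_y, fwhm_pix):
    """Dense pointing matrix A (npix x nsrc) for a peak-normalised circular Gaussian PRF."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    yy, xx = np.mgrid[0:ny, 0:nx]
    xx = xx.ravel()[:, None]  # (npix, 1); pixel index p = y * nx + x
    yy = yy.ravel()[:, None]
    # A[p, i] = PRF response of pixel p to a unit-flux source i
    r2 = (xx - np.asarray(src_x)[None, :]) ** 2 + (yy - np.asarray(src_y)[None, :]) ** 2
    return np.exp(-0.5 * r2 / sigma ** 2)

def simulate_map(A, f, bkg, sigma_inst, sigma_conf, rng):
    """Draw one realisation d = A f + B + noise, with noise uncorrelated across pixels."""
    model = A @ f + bkg
    noise_sd = np.sqrt(sigma_inst ** 2 + sigma_conf ** 2)
    return model + rng.normal(0.0, noise_sd, size=model.shape)

rng = np.random.default_rng(42)
A = gaussian_pointing_matrix(20, 20, src_x=[5.0, 12.5], src_y=[6.0, 14.0], fwhm_pix=3.0)
f = np.array([10.0, 4.0])  # positive fluxes, e.g. in mJy
d = simulate_map(A, f, bkg=-0.5, sigma_inst=np.full(400, 1.0), sigma_conf=0.8, rng=rng)
```

Going from $\mathbf{f}$ and $B$ to $\mathbf{d}$ is the "generate data" direction; XID+ runs the other way, inferring $\mathbf{f}$ and $B$ from $\mathbf{d}$.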
---
Because we obtain the joint probability distribution, our model is generative:

* Given parameters, we can generate data, and vice versa

Compare this with a discriminative model (e.g. a neural network), which only obtains a conditional probability distribution:

* A neural network is given inputs and produces outputs; it can't go the other way.

A generative model is a full probabilistic model, which allows more complex relationships between observed and target variables.


## XID+ in action
XID+ applied to a GALFORM simulation of the COSMOS field.

Let's look at part of COSMOS.

Fit to the Lacey GALFORM simulation:
* Semi-analytic model (SAM) simulation (with dust) run through the SMAP pipeline, with similar depth and size to COSMOS
* Used galaxies with an observed 100 micron flux greater than $50\,\mu\mathrm{Jy}$, giving 64,823 sources
* Used tiles of 0.2 degrees, with a buffer
* Uninformative prior: uniform in log space, from $10^{-8}$ to $10^{3}\,\mathrm{mJy}$
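A prior that is uniform in log space has density $p(f) \propto 1/f$ between its limits, so every decade of flux gets equal prior weight. A minimal sketch of sampling from it and evaluating its normalised density, assuming the $10^{-8}$ to $10^{3}$ mJy limits quoted above (the function names are illustrative):

```python
import numpy as np

F_MIN, F_MAX = 1e-8, 1e3  # mJy, limits of the log-uniform prior

def sample_log_uniform(n, rng, f_min=F_MIN, f_max=F_MAX):
    """Draw n fluxes with log10(f) uniform between log10(f_min) and log10(f_max)."""
    return 10.0 ** rng.uniform(np.log10(f_min), np.log10(f_max), size=n)

def log_uniform_pdf(f, f_min=F_MIN, f_max=F_MAX):
    """Normalised density p(f) = 1 / (f ln(f_max/f_min)) inside the limits, 0 outside."""
    f = np.asarray(f, dtype=float)
    inside = (f >= f_min) & (f <= f_max)
    safe_f = np.where(inside, f, 1.0)  # avoid dividing by zero outside the support
    return np.where(inside, 1.0 / (safe_f * np.log(f_max / f_min)), 0.0)

rng = np.random.default_rng(0)
samples = sample_log_uniform(100_000, rng)
# The limits span 11 decades, so about 1/11 of the samples lie in 1-10 mJy
print(np.mean((samples >= 1.0) & (samples < 10.0)))
```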


# RUN SCRIPT

Import required modules

In [2]:
from astropy.io import ascii, fits
import matplotlib.pyplot as plt
%matplotlib inline
from astropy import wcs

import sys
sys.path.append("/home/gandalf/spxmws/Hard-Drive/help/XIDplus/XID_plus") # temporary hack for MS machine
import numpy as np
import xidplus
from xidplus import moc_routines
import pickle

Set image and catalogue filenames

In [3]:
# set bands to run
bands = {"850": True, "450":False}

#Folder containing maps
imfolder='/home/gandalf/spxmws/Hard-Drive/help/s2cosmos/'

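s850fits=imfolder+'COSMOS-850-edit.fits' # SCUBA-2 850 micron map
s450fits=imfolder+'COSMOS-850-edit.fits' # SCUBA-2 450 micron map (NOTE: currently points at the 850 file; set the 450 map path before enabling the 450 band)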


#Folder containing prior input catalogue
catfolder=imfolder
#prior catalogue
prior_cat='dmu26_XID+SPIRE_COSMOS_20161129.fits'


#output folder
output_folder=imfolder

Load the images, noise maps, header info and WCS information (set up to handle a pipeline SCUBA-2 image).

In [4]:
#-----850-------------
if bands["850"]:
    # open fits file
    hdulist = fits.open(s850fits)
    
    # adjust data and header into 2D rather than 3D
    header850 = hdulist[0].header
    header850['NAXIS'] = 2
    header850["i_naxis"] = 2
    del(header850['NAXIS3'])
    del(header850["CRPIX3"])
    del(header850["CDELT3"])
    del(header850["CRVAL3"])
    del(header850["CTYPE3"])
    del(header850["LBOUND3"])
    del(header850["CUNIT3"])
    
    im850phdu=header850
    im850hdu=header850
    
    # convert variance to error
    nim850 = np.sqrt(hdulist[1].data[0,:,:])
    
    # convert units if needed
    if im850hdu['BUNIT'] == 'mJy/beam':
        im850 = hdulist[0].data[0,:,:]
    elif im850hdu['BUNIT'] == 'mJy/arcsec**2':
        # multiply by the beam area (arcsec^2) to convert to mJy/beam
        im850 = hdulist[0].data[0,:,:]*229.487
        nim850 = nim850*229.487
    else:
        raise Exception("Unit not Programmed")
      
    w_850 = wcs.WCS(hdulist[0].header)
    pixsizes850 = wcs.utils.proj_plane_pixel_scales(w_850)*3600.0
    if np.abs(pixsizes850[0] - pixsizes850[1]) > 0.01:
        raise Exception("Not programmed for Rectangular Pixels")
    pixsize850= pixsizes850[0]
    hdulist.close()

#-----450-------------
if bands["450"]:
    hdulist = fits.open(s450fits)
    im450phdu=hdulist[0].header
    im450hdu=hdulist[0].header
    
    # adjust data and header into 2D rather than 3D
    header450 = hdulist[0].header
    header450['NAXIS'] = 2
    header450["i_naxis"] = 2
    del(header450['NAXIS3'])
    del(header450["CRPIX3"])
    del(header450["CDELT3"])
    del(header450["CRVAL3"])
    del(header450["CTYPE3"])
    del(header450["LBOUND3"])
    del(header450["CUNIT3"])
    
    # convert variance to error
    nim450 = np.sqrt(hdulist[1].data[0,:,:])
    
    # convert units if needed
    if im450hdu['BUNIT'] == 'mJy/beam':
        im450=hdulist[0].data[0,:,:]
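    elif im450hdu['BUNIT'] == 'mJy/arcsec**2':
        # multiply by the beam area (arcsec^2) to convert to mJy/beam
        im450 = hdulist[0].data[0,:,:]*104.246
        nim450 = nim450*104.246
    else:
        raise Exception("Unit not Programmed")
    
    # use the 2D-adjusted primary header, as for 850 (hdulist[1] holds the variance)
    w_450 = wcs.WCS(header450)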
    pixsizes450 = wcs.utils.proj_plane_pixel_scales(w_450)*3600.0
    if np.abs(pixsizes450[0] - pixsizes450[1]) > 0.01:
        raise Exception("Not programmed for Rectangular Pixels")
    pixsize450= pixsizes450[0]
    hdulist.close()
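The factors 229.487 and 104.246 convert surface brightness (mJy/arcsec²) to mJy/beam, i.e. they play the role of beam areas in arcsec². For a Gaussian beam the area is $\pi\,\mathrm{FWHM}^2/(4\ln 2) \approx 1.133\,\mathrm{FWHM}^2$. The sketch below assumes Gaussian beams and uses hypothetical helper names; it converts between beam area and FWHM, and shows the effective FWHM each factor would correspond to, as a consistency check rather than a statement of how the factors were derived.

```python
import math

def gaussian_beam_area(fwhm_arcsec):
    """Solid angle (arcsec^2) of a 2D Gaussian beam with the given FWHM."""
    return math.pi * fwhm_arcsec ** 2 / (4.0 * math.log(2.0))

def fwhm_from_beam_area(area_arcsec2):
    """Inverse of gaussian_beam_area: the FWHM (arcsec) giving this beam area."""
    return math.sqrt(area_arcsec2 * 4.0 * math.log(2.0) / math.pi)

def per_arcsec2_to_per_beam(value, beam_area_arcsec2):
    """mJy/arcsec^2 -> mJy/beam; the same factor applies to the map and its error map."""
    return value * beam_area_arcsec2

for band, area in [("850", 229.487), ("450", 104.246)]:
    print(band, round(fwhm_from_beam_area(area), 2))
```

Under the Gaussian assumption the two factors correspond to effective FWHMs of roughly 14.2 arcsec (850) and 9.6 arcsec (450).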


Load the catalogue you want to fit (and make any cuts)

In [5]:
hdulist = fits.open(catfolder+prior_cat)
fcat=hdulist[1].data
hdulist.close()
inra=fcat['RA']
indec=fcat['DEC']

#sgood=fcat['S100']>0.050

#inra=inra[sgood]
#indec=indec[sgood]
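The commented-out lines outline a flux cut on the prior list. The key point is to build one boolean mask and apply it to every per-source column, so RA and Dec stay aligned. A minimal sketch with invented numbers; the `S100` name comes from the commented code, and we assume it is in mJy so that 0.050 matches the 50 μJy cut quoted earlier (`apply_flux_cut` is a hypothetical helper).

```python
import numpy as np

def apply_flux_cut(ra, dec, flux, threshold):
    """Keep sources with flux > threshold, applying the same mask to RA and Dec."""
    good = np.asarray(flux) > threshold
    return np.asarray(ra)[good], np.asarray(dec)[good]

# Illustrative values only (S100 assumed in mJy, so 0.050 mJy = 50 microJy)
ra = np.array([150.10, 150.12, 150.15, 150.20])
dec = np.array([2.20, 2.21, 2.25, 2.30])
s100 = np.array([0.020, 0.060, 0.500, 0.049])
ra_cut, dec_cut = apply_flux_cut(ra, dec, s100, threshold=0.050)
```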

Set prior classes


In [6]:
#---prior850--------
if bands["850"]:
    prior850=xidplus.prior(im850,nim850,im850phdu,im850hdu)#Initialise with map, uncertainty map, wcs info and primary header
    prior850.prior_cat(inra,indec,prior_cat)#Set input catalogue
    prior850.prior_bkg(-5.0,5)#Set prior on background (assumes Gaussian pdf with mu and sigma)
#---prior450--------
if bands["450"]:
    prior450=xidplus.prior(im450,nim450,im450phdu,im450hdu)
    prior450.prior_cat(inra,indec,prior_cat)
    prior450.prior_bkg(-5.0,5)


Set PSF

In [7]:
# use Gaussian2DKernel to create the PRF (it takes a standard deviation rather than a FWHM, hence FWHM/2.355)
from astropy.convolution import Gaussian2DKernel

#---PSF850-----------
if bands["850"]:
    prf850=0.98*Gaussian2DKernel(13.0/2.355,x_size=101,y_size=101)+0.02*Gaussian2DKernel(48.0/2.355,x_size=101,y_size=101)
    prf850.normalize(mode='peak')
    pind850=np.arange(0,101,1)*1.0/pixsize850 #get 850 scale in terms of pixel scale of map
    prior850.set_prf(prf850.array,pind850,pind850)#requires psf as 2d grid, and x and y bins for grid (in pixel scale)

#---PSF450--------------
if bands["450"]:
    prf450=0.94*Gaussian2DKernel(7.9/2.355,x_size=101,y_size=101)+0.06*Gaussian2DKernel(25.0/2.355,x_size=101,y_size=101)
    prf450.normalize(mode='peak')
    pind450=np.arange(0,101,1)*1.0/pixsize450 #get 450 scale in terms of pixel scale of map
    prior450.set_prf(prf450.array,pind450,pind450)
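The same two-component PRF can be written directly in NumPy, which makes explicit what we understand the astropy calls above to compute: a weighted sum of two circular Gaussians, each with unit integral, sampled at pixel centres on a 101 × 101 grid with 1 arcsec spacing (implied by passing FWHMs in arcsec), then rescaled so the central value is 1. This is a sketch under those assumptions, not a drop-in replacement; `two_gaussian_prf` and the 2 arcsec pixel size are hypothetical.

```python
import numpy as np

def two_gaussian_prf(fwhm1, fwhm2, w1, w2, size=101):
    """Peak-normalised PRF w1*G(fwhm1) + w2*G(fwhm2) on a size x size grid.

    FWHMs are in grid units (here 1 grid unit = 1 arcsec). Each component is a
    unit-integral 2D Gaussian, so w1 and w2 weight the components by integrated
    flux rather than by peak height.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    prf = np.zeros((size, size))
    for fwhm, w in ((fwhm1, w1), (fwhm2, w2)):
        s = fwhm / 2.355  # FWHM -> standard deviation, as in the cell above
        prf += w * np.exp(-0.5 * r2 / s ** 2) / (2.0 * np.pi * s ** 2)
    return prf / prf.max()  # peak normalisation, like normalize(mode='peak')

prf850 = two_gaussian_prf(13.0, 48.0, 0.98, 0.02)
pixsize = 2.0  # hypothetical map pixel size in arcsec
pind = np.arange(0, 101, 1) / pixsize  # PRF grid coordinates in units of map pixels
```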


In [8]:
if bands["850"] and bands["450"]:
    print('fitting ' + str(prior850.nsrc) + ' sources \n')
    print('using ' + str(prior850.snpix) + ' and ' + str(prior450.snpix) + ' pixels')
elif bands["850"]:
    print('fitting ' + str(prior850.nsrc) + ' sources \n')
    print('using ' + str(prior850.snpix))
else:
    print('fitting ' + str(prior450.nsrc) + ' sources \n')
    print('using ' + str(prior450.snpix))


fitting 44310 sources 

using 7976201


Fitting this many sources and data points at once is not practical. We suggest cutting down to a MOC (Multi-Order Coverage map) based on a HEALPix tile with an order no greater than 10 for SPIRE.
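To see what "order 10" means in practice: a HEALPix map of order $k$ has $N_\mathrm{side} = 2^k$ and $12 N_\mathrm{side}^2$ equal-area pixels covering the sphere, so each tile covers $4\pi / (12 \cdot 4^k)$ sr. A short sketch computing the approximate side length of one tile, taken as the square root of its area (a rough measure, since HEALPix pixels are not squares; the function name is illustrative):

```python
import math

FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41253 deg^2

def healpix_tile_side_arcmin(order):
    """Approximate side (square root of area) of one HEALPix pixel at a given order, in arcmin."""
    nside = 2 ** order
    npix = 12 * nside ** 2
    return math.sqrt(FULL_SKY_DEG2 / npix) * 60.0

for order in (9, 10, 11):
    print(order, round(healpix_tile_side_arcmin(order), 2))
```

An order-10 tile is roughly 3.4 arcmin across, and each step up in order halves the side length.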

In [9]:
order=10
Tile=6978160
moc=moc_routines.get_fitting_region(order,Tile)
if bands["850"]:
    prior850.set_tile(moc)
if bands["450"]:
    prior450.set_tile(moc)

In [10]:
if bands["850"] and bands["450"]:
    print('fitting ' + str(prior850.nsrc) + ' sources \n')
    print('using ' + str(prior850.snpix) + ' and ' + str(prior450.snpix) + ' pixels')
elif bands["850"]:
    print('fitting ' + str(prior850.nsrc) + ' sources \n')
    print('using ' + str(prior850.snpix))
else:
    print('fitting ' + str(prior450.nsrc) + ' sources \n')
    print('using ' + str(prior450.snpix))


fitting 149 sources 

using 23899


Calculate pointing matrix

In [11]:
if bands["850"]:
    prior850.get_pointing_matrix()
if bands["450"]:
    prior450.get_pointing_matrix()
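Conceptually, the pointing matrix $\mathbf{A}$ has one row per map pixel and one column per source, with $A_{ij}$ the PRF response of pixel $i$ to unit flux from source $j$. Most elements are negligible, so it is natural to store only the non-zero ones. The sketch below is a simplified stand-in, not XID+'s implementation: it evaluates a peak-normalised circular Gaussian PRF out to a cutoff radius and returns coordinate-format (row, column, value) triplets. All names are hypothetical.

```python
import numpy as np

def sparse_pointing_matrix(nx, ny, src_x, src_y, fwhm_pix, radius_pix):
    """COO triplets (rows, cols, vals) of A for a Gaussian PRF truncated at radius_pix."""
    sigma = fwhm_pix / 2.355
    rows, cols, vals = [], [], []
    for j, (sx, sy) in enumerate(zip(src_x, src_y)):
        # Only visit pixels inside the bounding box of the cutoff circle
        x0, x1 = max(0, int(np.floor(sx - radius_pix))), min(nx - 1, int(np.ceil(sx + radius_pix)))
        y0, y1 = max(0, int(np.floor(sy - radius_pix))), min(ny - 1, int(np.ceil(sy + radius_pix)))
        yy, xx = np.mgrid[y0:y1 + 1, x0:x1 + 1]
        r2 = (xx - sx) ** 2 + (yy - sy) ** 2
        keep = r2 <= radius_pix ** 2
        rows.append(yy[keep] * nx + xx[keep])  # flattened pixel index
        cols.append(np.full(keep.sum(), j))
        vals.append(np.exp(-0.5 * r2[keep] / sigma ** 2))
    return np.concatenate(rows), np.concatenate(cols), np.concatenate(vals)

def apply_pointing(rows, cols, vals, f, npix):
    """Model map A @ f computed from the sparse triplets."""
    model = np.zeros(npix)
    np.add.at(model, rows, vals * f[cols])
    return model
```

The cutoff radius trades accuracy in the PRF wings for a smaller matrix; with a large enough radius the result matches the dense product exactly.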


The default prior on flux is a uniform distribution, with a minimum and maximum of 0.01 and 1000.0 $\mathrm{mJy}$ respectively for each source. Running the function `upper_lim_map` resets each source's upper limit to the maximum flux value (plus a 5 sigma background value) found in the map pixels to which the source contributes.

In [12]:
if bands["850"]:
    prior850.upper_lim_map()
if bands["450"]:
    prior450.upper_lim_map()
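A sketch of the upper-limit rule described above, as we read it: for each source, take the maximum map value over the pixels it contributes to, then add five times the background width. The pointing matrix is described by coordinate-format (pixel, source) index pairs; the function name, its inputs, and the use of `n_sigma * bkg_sigma` for the background term are assumptions, not the XID+ source.

```python
import numpy as np

def upper_limits_from_map(sky, rows, cols, nsrc, bkg_sigma, n_sigma=5.0):
    """Per-source flux upper limit: max of the map over the source's pixels + n_sigma * bkg_sigma.

    sky        : (npix,) flattened map values
    rows, cols : (pixel index, source index) of the non-zero pointing-matrix elements
    Sources with no pixels are left at -inf.
    """
    upper = np.full(nsrc, -np.inf)
    # Running maximum of sky[rows], grouped by source index
    np.maximum.at(upper, cols, sky[rows])
    return upper + n_sigma * bkg_sigma

sky = np.array([0.5, 3.0, 1.2, 8.0, 2.5])
rows = np.array([0, 1, 2, 2, 3, 4])  # pixel indices
cols = np.array([0, 0, 0, 1, 1, 1])  # source 0 touches pixels 0-2, source 1 pixels 2-4
limits = upper_limits_from_map(sky, rows, cols, nsrc=2, bkg_sigma=0.5)
```

Here source 0 sees a maximum of 3.0 and source 1 a maximum of 8.0, so with `bkg_sigma=0.5` the upper limits are 5.5 and 10.5.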

Now fit using the interface to PyStan.

In [13]:
from xidplus.stan_fit import SCUBA2

if bands["850"]:
    fit850=SCUBA2.single_band(prior850,iter=1500)
if bands["450"]:
    fit450=SCUBA2.single_band(prior450,iter=1500)


./XID+SCUBA2.pkl found. Reusing
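Stan explores the posterior with Hamiltonian Monte Carlo (by default the No-U-Turn Sampler), which copes far better with many correlated parameters than simpler samplers. To illustrate only the basic idea of drawing posterior samples, here is a random-walk Metropolis sampler for a toy one-source, one-pixel model with a flat prior on $0 \le f \le f_\mathrm{max}$. This is a different, much simpler algorithm than Stan's and is not what `SCUBA2.single_band` runs; all names and settings are illustrative.

```python
import numpy as np

def metropolis_flux(d, sigma, f_max, n_samples, step, rng, f_init=1.0):
    """Random-walk Metropolis samples of f for the toy model d ~ Normal(f, sigma), 0 <= f <= f_max."""
    def log_post(f):
        if f < 0.0 or f > f_max:
            return -np.inf                    # outside the flat prior
        return -0.5 * ((d - f) / sigma) ** 2  # Gaussian log-likelihood (+ const)

    samples = np.empty(n_samples)
    f, lp = f_init, log_post(f_init)
    for i in range(n_samples):
        f_new = f + step * rng.normal()
        lp_new = log_post(f_new)
        # Accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < lp_new - lp:
            f, lp = f_new, lp_new
        samples[i] = f
    return samples

rng = np.random.default_rng(1)
chain = metropolis_flux(d=5.0, sigma=1.0, f_max=100.0, n_samples=20000, step=1.0, rng=rng)
posterior = chain[2000:]  # discard burn-in
```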


Initialise the posterior class with the fit object from PyStan, and save it alongside the prior classes.

In [14]:
outfile=output_folder+'Tile_'+str(Tile)+'_'+str(order)+'_new.pkl'
output = {}

if bands["850"]:
    posterior850=xidplus.posterior_stan(fit850,[prior850])
    output['850'] = {"prior":prior850, 'posterior':posterior850}

if bands["450"]:
    posterior450=xidplus.posterior_stan(fit450,[prior450])
    output['450'] = {"prior":prior450, 'posterior':posterior450}

with open(outfile, 'wb') as f:
    pickle.dump(output,f)
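Once posterior samples are available, a common way to report each source's flux is the median with the 16th and 84th percentiles as a 68% credible interval. The sketch below does this for a generic NumPy array of samples shaped (n_samples, n_sources), with invented draws; it does not touch the `posterior_stan` object, whose internal attribute names we don't assume here. It also shows that such a summary round-trips through `pickle`, like the output dictionary saved above.

```python
import pickle
import numpy as np

def summarise_fluxes(samples):
    """Median and 16th/84th-percentile errors per source from (n_samples, n_sources) draws."""
    lo, med, hi = np.percentile(samples, [16.0, 50.0, 84.0], axis=0)
    return {"median": med, "err_lo": med - lo, "err_hi": hi - med}

# Illustrative draws for 3 sources (not real XID+ output)
rng = np.random.default_rng(7)
samples = rng.normal(loc=[2.0, 5.0, 0.5], scale=[0.3, 1.0, 0.1], size=(4000, 3))
summary = summarise_fluxes(samples)

# The summary dictionary survives a pickle round-trip unchanged
restored = pickle.loads(pickle.dumps(summary))
```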