# DC2 Object Catalog Run1.2i GCR tutorial -- Part IV: accessing photo-z

Owners: **Yao-Yuan Mao [@yymao](https://github.com/LSSTDESC/DC2-analysis/issues/new?body=@yymao)**  
Last Verified to Run: **2019-02-22** (by @yymao)

This notebook will show you how to access the "add-on" columns that provide the photometric redshift (photo-z) information for the DC2 Object Catalog (Run 1.2i). 

__Learning objectives__: After going through this notebook, you should be able to:
  1. Load and efficiently access a DC2 object catalog (+ photo-z) with the GCR
  2. Understand how the photo-z data are stored / represented
  3. Look at an example of galaxy photo-z distributions
  
__Logistics__: This notebook is intended to be run through the JupyterHub NERSC interface available here: https://jupyter-dev.nersc.gov. To set up your NERSC environment, please follow the instructions available here: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter-dev+at+NERSC

__Other notes__: 
If you restart your kernel, or if it restarts automatically for some reason, all imports and variables will become undefined, so you will have to re-run everything.

In [None]:
import sys
sys.path.insert(0, '../../gcr-catalogs/')

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import GCRCatalogs
from GCR import GCRQuery

## Load the catalog

Here we load the object catalog with the photo-z add-on. The catalog name is `dc2_object_run1.2i_with_photoz`. 

**Note**: if you need more quantities (including those not in DPDD), then use `dc2_object_run1.2i_all_columns_with_photoz` instead.

It takes a few seconds for the catalog instance to initialize.

In [None]:
cat = GCRCatalogs.load_catalog('dc2_object_run1.2i_with_photoz')

## Photo-z access methods

The photo-z information can be accessed in two ways:

1. As a single column `photoz_mode` that has the mode value (z_peak), and
2. As a single multi-dimensional column `photoz_pdf` (i.e., when accessing this column, you get a 2D array instead of a 1D one). 

We will demonstrate both access methods in detail. Note that all the photo-z columns have the prefix `photoz_`. 

Let's first make sure that these columns are indeed available. 

In [None]:
sorted(q for q in cat.list_all_quantities() if q.startswith('photoz_'))

Let's try accessing the photo-z data! Everything you already know about GCR access to object catalogs still applies, including the use of `filters` and `native_filters` (`native_filters` is mostly used for selecting tracts). 

In [None]:
data = cat.get_quantities(['photoz_mode'], 
                          filters=['photoz_mode < 0.2', 'mag_i < 26'], 
                          native_filters=['tract==4850'])

# check if the filters work
print((data['photoz_mode'] < 0.2).all())
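
As a rough mental model, each string in `filters` acts like a boolean mask on the returned rows, and all of the masks must hold for a row to be kept. The sketch below reproduces that behaviour with plain NumPy on a small made-up table. The values are synthetic, and the code does not call GCR or read the catalog.

```python
import numpy as np

# Synthetic stand-in for a few catalog rows (values are made up).
table = {
    'photoz_mode': np.array([0.05, 0.15, 0.35, 0.10, 0.80]),
    'mag_i': np.array([24.0, 27.0, 23.0, 25.5, 22.0]),
}

# NumPy equivalent of filters=['photoz_mode < 0.2', 'mag_i < 26'].
mask = (table['photoz_mode'] < 0.2) & (table['mag_i'] < 26)
selected = {name: values[mask] for name, values in table.items()}

print(selected['photoz_mode'])
```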

Now, if you want to make a plot of the PDF, you can access the `photoz_pdf` column. Note that it is a multi-dimensional column, so use it with care!

As an example, let's just load one patch (using the `return_iterator` feature) of the full PDFs:

In [None]:
data = next(cat.get_quantities(['photoz_pdf'], return_iterator=True))

There are 72 objects in this patch and 101 bins in the photo-z PDF, so this 2D array has a shape of (72, 101). Note how the 2D array is oriented: objects run along the first axis and redshift bins along the second.

In [None]:
data['photoz_pdf'].shape
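
To make the orientation concrete, the sketch below builds a synthetic array with the same layout (one row per object, one column per redshift bin) and recovers a peak redshift per object by taking the argmax along the bin axis. The grid of 101 bin centers from 0 to 3 and the Gaussian PDFs are assumptions made for illustration; the real grid is whatever `cat.photoz_pdf_bin_centers` returns. Whether this argmax reproduces the catalog's `photoz_mode` exactly has not been checked here.

```python
import numpy as np

# Assumed grid: 101 bin centers from z=0 to z=3 (illustrative only).
z = np.linspace(0.0, 3.0, 101)
true_peaks = np.array([0.3, 0.9, 1.5])  # made-up peak redshifts

# One Gaussian PDF per object; rows are objects, columns are bins.
pdfs = np.exp(-0.5 * ((z[None, :] - true_peaks[:, None]) / 0.1) ** 2)
dz = z[1] - z[0]
pdfs /= pdfs.sum(axis=1, keepdims=True) * dz  # normalize each row to unit area

# axis=1 runs over redshift bins, so the result has one entry per object.
z_peak = z[np.argmax(pdfs, axis=1)]
print(pdfs.shape, z_peak)
```

Reducing along `axis=0` instead would collapse the objects and leave one value per redshift bin, which is what you would want for a stacked PDF.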

Now, let's plot 10 PDFs. To get an array of bin center values, you can access the `photoz_pdf_bin_centers` attribute of the catalog. 

In [None]:
for pdf in data['photoz_pdf'][:10]:
    plt.plot(cat.photoz_pdf_bin_centers, pdf);

plt.xlabel('$z$');
plt.ylabel('$p(z)$');
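
Because the full PDFs are large, it can be convenient to process them one chunk at a time rather than load everything at once. The sketch below accumulates a stacked (summed) PDF over several chunks. The `fake_chunks` generator is a hypothetical stand-in that yields dictionaries shaped like those from `cat.get_quantities(['photoz_pdf'], return_iterator=True)`; its data are random and its chunk sizes are made up.

```python
import numpy as np

def fake_chunks(n_bins, chunk_sizes, seed=0):
    """Hypothetical stand-in for the catalog iterator; yields random PDFs."""
    rng = np.random.default_rng(seed)
    for n_obj in chunk_sizes:
        yield {'photoz_pdf': rng.random((n_obj, n_bins))}

n_bins = 101
stacked = np.zeros(n_bins)
n_objects = 0

# Only one chunk is held in memory at a time.
for chunk in fake_chunks(n_bins, chunk_sizes=[72, 40, 15]):
    pdf = chunk['photoz_pdf']
    stacked += pdf.sum(axis=0)  # sum over objects, keep the bin axis
    n_objects += pdf.shape[0]

mean_pdf = stacked / n_objects
print(n_objects, mean_pdf.shape)
```

With the real catalog, the same loop should work by replacing the `fake_chunks(...)` call with the iterator returned by `get_quantities`, assuming each chunk has the (objects, bins) layout shown earlier.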

## Example

Now that we have learned all the access methods, let's try to work out an example!

First of all, let's define a set of reasonable cuts to select galaxies.

In [None]:
cuts = [
    GCRQuery('extendedness > 0'),     # Extended objects
    GCRQuery((np.isfinite, 'mag_i')), # Select objects that have i-band magnitudes
    GCRQuery('clean'), # The source has no flagged pixels (interpolated, saturated, edge, clipped...) 
                       # and was not skipped by the deblender
    GCRQuery('snr_i_cModel > 10'),    # SNR > 10
    GCRQuery('mag_i_cModel < 22'),    # cModel i-band magnitude brighter than 22
    GCRQuery('mag_i_cModel > 20'),    # cModel i-band magnitude fainter than 20 (exclude very bright objects)
]
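
The tuple form `(np.isfinite, 'mag_i')` applies a callable to a column and keeps the rows where it returns `True`. The NumPy-only sketch below mimics that cut and the magnitude window on made-up values, without using GCR. `apply_cut` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def apply_cut(table, func, column):
    """Hypothetical helper: boolean mask from a callable applied to one column."""
    return np.asarray(func(table[column]), dtype=bool)

# Made-up rows, including a missing (NaN) i-band magnitude.
table = {
    'mag_i': np.array([21.0, np.nan, 20.5, 23.0]),
    'mag_i_cModel': np.array([21.1, 21.5, 19.0, 21.9]),
}

mask = (
    apply_cut(table, np.isfinite, 'mag_i')  # drop rows with no i-band magnitude
    & (table['mag_i_cModel'] < 22)          # brighter than 22
    & (table['mag_i_cModel'] > 20)          # fainter than 20
)
print(mask)
```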

Now let's make some plots!

In [None]:
data = cat.get_quantities(['photoz_mode', 'mag_g_cModel', 'mag_r_cModel', 'mag_i_cModel'],
                          filters=cuts,
                          native_filters=['tract==4850'])

In [None]:
plt.hist(data['photoz_mode'], 50);

In [None]:
plt.scatter(data['mag_g_cModel'] - data['mag_r_cModel'],
            data['mag_r_cModel'] - data['mag_i_cModel'],
            c=data['photoz_mode'], s=4, vmin=0, vmax=1);

plt.xlim(-1, 3);
plt.ylim(-0.5, 2);
plt.xlabel('$g-r$');
plt.ylabel('$r-i$');
plt.colorbar(label='$z$');