# Refcat Loader Demo

<br>Owner: **Keith Bechtol** ([@bechtol](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@bechtol))
<br>Last Verified to Run: **2021-07-09**
<br>Verified Stack Release: **w_2021_25**

This notebook demonstrates how to load a reference catalog with color terms applied. Thanks to Eli Rykoff for adding this functionality.

This notebook uses the HSC RC2 dataset (a few tracts of HSC data that are reprocessed approximately monthly for routine evaluation of the science pipelines' performance).

### Learning Objectives
After working through and studying this notebook you should be able to:
   1. Access the schema of the `sourceTable_visit` catalog.
   2. Load the `sourceTable_visit` catalog into memory, including only a subset of columns (reading from a Parquet file).
   3. Load the subset of a reference catalog that overlaps the same region of the sky.

### Logistics
This notebook is intended to be runnable on `lsst-lsp-stable.ncsa.illinois.edu` from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.

## Setup
You can find the Stack version by using `eups list -s` on the terminal command line.

In [None]:
# What version of the Stack am I using?
! echo $HOSTNAME
! eups list -s | grep lsst_distrib

In [None]:
import os.path
import numpy as np
from astropy.time import Time

import lsst.geom
from lsst.pipe.tasks.loadReferenceCatalog import LoadReferenceCatalogConfig, LoadReferenceCatalogTask
from lsst.meas.algorithms import ReferenceObjectLoader
import lsst.daf.butler as dafButler
from lsst.utils import getPackageDir

import matplotlib.pyplot as plt
%matplotlib widget

## Explore the `sourceTable_visit` and `objectTable_tract` Tables

The `sourceTable_visit` and `objectTable_tract` tables are stored as Parquet files, so it is possible to load just a subset of columns for rapid data access.
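Parquet is a columnar format: each column is stored contiguously, so a reader can skip the columns it does not need. The toy sketch below illustrates the idea in plain Python, with a dict of lists standing in for a Parquet file. The rows and values are made up, and real Parquet readers (row groups, compression, metadata) are considerably more involved.

```python
# Toy illustration only: a "column store" is a dict mapping column name -> list of values.


def make_column_store(rows):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def read_columns(store, names):
    """Return only the requested columns; the other columns are never touched."""
    return {name: store[name] for name in names}


# Made-up rows for illustration
rows = [
    {"sourceId": 1, "coord_ra": 150.10, "coord_dec": 2.20, "psfFlux": 1200.0},
    {"sourceId": 2, "coord_ra": 150.11, "coord_dec": 2.21, "psfFlux": 850.0},
]
store = make_column_store(rows)
subset = read_columns(store, ["sourceId", "coord_ra", "coord_dec"])
print(sorted(subset))  # ['coord_dec', 'coord_ra', 'sourceId']
```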

First we need to set up the butler, in this case pointing to an HSC RC2 dataset.

In [None]:
repo = '/repo/main/'
config = os.path.join(repo, 'butler.yaml')
butler = dafButler.Butler(config=config)
registry = butler.registry
collections = ['HSC/runs/RC2/w_2021_18/DM-29973']

Access the column names for `sourceTable_visit`. The cell below uses [getDeferred](https://pipelines.lsst.io/py-api/lsst.daf.butler.Butler.html#lsst.daf.butler.Butler.getDeferred) to return a `DeferredDatasetHandle`: the registry lookup happens immediately, but the dataset is only read when the handle's `get` method is called. Here we do not need the catalog itself, only its column names, so we request the `columns` component. We will use `getDeferred` again when accessing the reference catalogs.
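The split between an immediate lookup and a lazy read can be illustrated with a toy stand-in. The class and "registry" below are hypothetical and are not the `lsst.daf.butler` implementation; in particular, this toy reads the whole dataset and then picks out the component, whereas a real formatter may read only the requested component.

```python
class ToyDeferredHandle:
    """Hypothetical stand-in for a deferred dataset handle (not the lsst.daf.butler class)."""

    def __init__(self, registry, data_id):
        # Immediate lookup: raises KeyError right away if the data ID is unknown.
        self._reader = registry[data_id]
        self.reads = 0

    def get(self, component=None):
        # Deferred read: the (possibly expensive) reader runs only now.
        self.reads += 1
        dataset = self._reader()
        return dataset if component is None else dataset[component]


# Hypothetical registry mapping a data ID to a function that reads the dataset.
toy_registry = {
    ("HSC", 35892): lambda: {"columns": ["sourceId", "coord_ra", "coord_dec"],
                             "rows": [[1, 150.1, 2.2]]},
}

handle = ToyDeferredHandle(toy_registry, ("HSC", 35892))
print(handle.reads)                     # 0: nothing has been read yet
print(handle.get(component="columns"))  # ['sourceId', 'coord_ra', 'coord_dec']
print(handle.reads)                     # 1
```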

In [None]:
dat_refs_source_table = sorted(registry.queryDatasets('sourceTable_visit', collections=collections))
butler.getDeferred('sourceTable_visit', dat_refs_source_table[0].dataId, collections=collections).get(component='columns').values

Similarly, we can access the column names for `objectTable_tract`.

In [None]:
dat_refs_object_table = sorted(registry.queryDatasets('objectTable_tract', collections=collections))
butler.getDeferred('objectTable_tract', dat_refs_object_table[0].dataId, collections=collections).get(component='columns').values

Load all columns for all sources in a visit...

In [None]:
catalog = butler.getDirect(dat_refs_source_table[0])
catalog

Or load just a few columns of interest:

In [None]:
catalog = butler.getDirect(dat_refs_source_table[0], parameters={"columns": ['sourceId', 'coord_ra', 'coord_dec']})
catalog

Note that the `objectTable_tract` for HSC RC2 is too large to load into memory on the RSP with all of its columns, so you must specify a subset of columns. If you forget, the kernel may run out of memory and need to be restarted.
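A back-of-the-envelope estimate shows why column selection matters. The sketch assumes every column is an 8-byte number (float64 or int64) and ignores index and object overhead; the row and column counts are hypothetical, not measured properties of the HSC RC2 `objectTable_tract`.

```python
def table_size_gib(n_rows, n_columns, bytes_per_value=8):
    """Rough in-memory size, in GiB, of a table of fixed-width numeric columns."""
    return n_rows * n_columns * bytes_per_value / 2**30


# Hypothetical numbers for illustration only
n_rows = 10_000_000
print(f"all 1000 columns: {table_size_gib(n_rows, 1000):.1f} GiB")  # 74.5 GiB
print(f"3 columns:        {table_size_gib(n_rows, 3):.2f} GiB")     # 0.22 GiB
```

The memory cost scales linearly with the number of columns, so requesting three columns instead of a thousand shrinks the footprint by the same factor.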

## Load Reference Catalog

Next we demonstrate how to load a reference catalog, in this case either Gaia or PS1, depending on whether you are more interested in astrometry (with proper motions) or photometry (with color terms).
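In the loader configuration further down, the Gaia branch requires proper motions and an `epoch` is passed to the loader so that positions can be moved to that epoch. To show what such a correction does, here is a minimal linear (small-angle) sketch in plain Python. It assumes the Gaia convention that the RA proper motion already includes the cos(Dec) factor and that Gaia DR2 positions refer to epoch J2015.5; the star's position and motion are made up. For rigorous work, astropy's `SkyCoord.apply_space_motion` is a better choice.

```python
import math

MAS_PER_DEG = 3.6e6  # milliarcseconds per degree


def propagate_linear(ra_deg, dec_deg, pmra_cosdec_mas_yr, pmdec_mas_yr, dt_years):
    """Shift a position by its proper motion over dt_years (small-angle approximation).

    pmra_cosdec_mas_yr follows the Gaia convention (it already includes cos(Dec)).
    Not valid near the poles or for very large motions or time baselines.
    """
    dec_new = dec_deg + pmdec_mas_yr * dt_years / MAS_PER_DEG
    # Divide by cos(Dec) to turn an on-sky offset into a change in the RA coordinate
    ra_new = ra_deg + pmra_cosdec_mas_yr * dt_years / MAS_PER_DEG / math.cos(math.radians(dec_deg))
    return ra_new % 360.0, dec_new


# Hypothetical star; J2015.5 -> 2021-06-10 (about 2021.44) is roughly 5.94 years
ra, dec = propagate_linear(150.0, 2.0, pmra_cosdec_mas_yr=100.0, pmdec_mas_yr=-50.0, dt_years=5.94)
print(f"RA = {ra:.7f} deg, Dec = {dec:.7f} deg")
```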

In [None]:
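# Choose one reference catalog by commenting/uncommenting:
# Gaia DR2 for astrometry (with proper motions), PS1 for photometry (with color terms)
# refDataset = 'gaia_dr2_20200414'
refDataset = 'ps1_pv3_3pi_20170110'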

Set up the butler, this time using the `refcats` collection:

In [None]:
repo = '/repo/main/'
config = os.path.join(repo, 'butler.yaml')
butler = dafButler.Butler(config=config)
registry = butler.registry
collection = 'refcats'

Let's see what reference catalogs are available:

In [None]:
registry.getCollectionSummary(collection).datasetTypes.names

The first step in loading a reference catalog is to select a specific region of the sky, because we do not want to load the entire catalog into memory at once. The reference catalogs are spatially sharded, so we only need to load the shards that cover the region of interest. In this example, we first query the dataset references for the shards that spatially overlap one of the HSC visits.
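The LSST reference catalogs are sharded on HTM (Hierarchical Triangular Mesh) pixels, and the registry works out which shards overlap the visit. As a much simpler stand-in, the sketch below uses an equal-angle RA/Dec grid (not HTM) to find the cells that a circular region may touch. The function name and the 1-degree cell size are hypothetical choices for illustration.

```python
import math


def overlapping_cells(ra0, dec0, radius, cell_deg=1.0):
    """Grid cells (i_ra, i_dec) whose extent intersects the RA/Dec bounding box of a circle.

    All angles are in degrees; cell_deg is assumed to divide 360 and 180 evenly.
    The bounding box is exact for the circle, but box-vs-cell overlap is
    conservative, so a few corner cells that the circle misses may be returned.
    """
    n_ra = round(360 / cell_deg)
    n_dec = round(180 / cell_deg)
    dec_lo = max(dec0 - radius, -90.0)
    dec_hi = min(dec0 + radius, 90.0)
    if abs(dec0) + radius >= 90.0:
        ra_indices = range(n_ra)  # the circle contains a pole, so every RA is touched
    else:
        # Exact RA half-width of a spherical cap: sin(dRA) = sin(r) / cos(dec0)
        half = math.degrees(math.asin(math.sin(math.radians(radius)) /
                                      math.cos(math.radians(dec0))))
        i_lo = math.floor((ra0 - half) / cell_deg)
        i_hi = math.floor((ra0 + half) / cell_deg)
        ra_indices = [i % n_ra for i in range(i_lo, i_hi + 1)]  # modulo handles RA wrap
    j_lo = min(math.floor((dec_lo + 90.0) / cell_deg), n_dec - 1)
    j_hi = min(math.floor((dec_hi + 90.0) / cell_deg), n_dec - 1)
    return sorted({(i, j) for i in ra_indices for j in range(j_lo, j_hi + 1)})


print(len(overlapping_cells(150.0, 2.2, 0.75)))  # 4
```

Only the returned cells would need to be read, which is the same reason the notebook queries shard references before loading anything.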

In [None]:
visit = 35892
# Query the reference catalog shards that overlap this HSC visit
datasetRefs = list(registry.queryDatasets(datasetType=refDataset,
                                          collections=collection,
                                          instrument='HSC',
                                          skymap='hsc_rings_v1',
                                          where=f'visit={visit}').expanded())

dataIds = [ref.dataId for ref in datasetRefs]

# Get DeferredDatasetHandles for the reference catalog shards (nothing is read yet)
refCats = [butler.getDeferred(refDataset, dataId, collections=[collection])
           for dataId in dataIds]

# Read one shard in full to inspect its contents
cat_ref_example = butler.getDirect(datasetRefs[0])

In [None]:
cat_ref_example.asAstropy()

Next we load the HSC source catalog for that visit.

In [None]:
# Get the HSC catalog for comparison
refs = list(registry.queryDatasets(datasetType='sourceTable_visit',
                                   collections=['HSC/runs/RC2/w_2021_18/DM-29973'],
                                   instrument='HSC',
                                   skymap='hsc_rings_v1',
                                   where=f'visit={visit}'))
cat_hsc = butler.getDirect(refs[0])

In [None]:
cat_hsc

In [None]:
config = LoadReferenceCatalogConfig()
config.refObjLoader.ref_dataset_name = refDataset

if refDataset == 'gaia_dr2_20200414':
    # Apply proper motions for Gaia catalog
    config.refObjLoader.requireProperMotion = True
    config.refObjLoader.anyFilterMapsToThis = 'phot_g_mean'
    config.doApplyColorTerms = False
else:
    # Apply color terms for PS1 catalog
    config.refObjLoader.load(os.path.join(getPackageDir('obs_subaru'),
                                          'config',
                                          'filterMap.py'))
    config.colorterms.load(os.path.join(getPackageDir('obs_subaru'),
                                        'config',
                                        'colorterms.py'))

# Set the epoch to which proper motions are applied. Here we pick an arbitrary date:
epoch = Time('2021-06-10')

loaderTask = LoadReferenceCatalogTask(config=config,
                                      dataIds=dataIds,
                                      refCats=refCats)

# Define center relative to HSC catalog
center = lsst.geom.SpherePoint(np.median(cat_hsc['coord_ra']),
                               np.median(cat_hsc['coord_dec']),
                               lsst.geom.degrees)
# Alternatively, define center relative to reference catalog
# center = lsst.geom.SpherePoint(cat_ref_example['coord_ra'][0],
#                                cat_ref_example['coord_dec'][0],
#                                lsst.geom.radians)
print('Using center (RA, DEC) =', center)

cat_ref = loaderTask.getSkyCircleCatalog(center,
                                         1.0*lsst.geom.degrees,
                                         ['HSC-G', 'HSC-R'],
                                         epoch=epoch)
print(f'Found {len(cat_ref)} reference catalog objects')

In [None]:
cat_ref

Note that the reference catalog fluxes have been converted to magnitudes in the HSC system if color terms have been applied.
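As a sketch of what a color term does, assume the quadratic form used by `lsst.pipe.tasks.colorterms.Colorterm` (fields `primary`, `secondary`, `c0`, `c1`, `c2`): the corrected magnitude is the primary-band magnitude plus a polynomial in the primary-minus-secondary color. Reference catalog fluxes are stored in nanojansky, which convert to AB magnitudes with a zero point of about 31.4. The fluxes and coefficients below are made up and are not the `obs_subaru` values.

```python
import math

# AB magnitude from a flux in nJy: m = -2.5*log10(f / 3631 Jy) ~= -2.5*log10(f[nJy]) + 31.4
AB_ZEROPOINT_NJY = 31.4


def njy_to_abmag(flux_njy):
    """Convert a flux in nJy to an AB magnitude."""
    return -2.5 * math.log10(flux_njy) + AB_ZEROPOINT_NJY


def apply_color_term(primary_mag, secondary_mag, c0, c1, c2=0.0):
    """Quadratic color term: m_corrected = m_primary + c0 + c1*color + c2*color**2,
    where color = m_primary - m_secondary."""
    color = primary_mag - secondary_mag
    return primary_mag + c0 + color * (c1 + color * c2)


# Made-up PS1 fluxes and coefficients (NOT the obs_subaru values)
g_ps1 = njy_to_abmag(1000.0)  # ~23.9
r_ps1 = njy_to_abmag(2000.0)
g_hsc = apply_color_term(g_ps1, r_ps1, c0=0.005, c1=-0.08, c2=-0.01)
print(f"PS1 g = {g_ps1:.3f}, g - r = {g_ps1 - r_ps1:.3f}, HSC-like g = {g_hsc:.3f}")
```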

In [None]:
plt.figure()
plt.scatter(cat_ref['ra'], cat_ref['dec'], marker='.', s=10, edgecolor='none', label='Reference')
plt.scatter(cat_hsc['coord_ra'], cat_hsc['coord_dec'], marker='.', s=1, edgecolor='none', label='HSC')
plt.xlabel('RA (deg)')
plt.ylabel('Dec (deg)')
plt.legend(markerscale=4)

## Exercise

Perform a spatial match between the HSC and reference catalogs, and compare the astrometry and photometry of the matched objects.
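One way to start is a nearest-neighbour match within a fixed radius. The brute-force sketch below uses plain Python and the haversine formula, with made-up coordinates. It assumes both catalogs give RA and Dec in degrees (the notebook plots `cat_hsc['coord_ra']` and `cat_ref['ra']` on the same degree axes). For full catalogs, astropy's `SkyCoord.match_to_catalog_sky` or a k-d tree scales far better. A one-directional nearest match can also pair several HSC sources with the same reference object; requiring mutual nearest neighbours is a common refinement.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_nearest(cat1, cat2, radius_arcsec=1.0):
    """For each (ra, dec) in cat1, find the nearest cat2 entry within the radius.

    Returns a list of (index1, index2, sep_arcsec). Brute force, O(N*M):
    fine for small tests, far too slow for a full visit.
    """
    matches = []
    for i, (ra1, dec1) in enumerate(cat1):
        best_j, best_sep = None, None
        for j, (ra2, dec2) in enumerate(cat2):
            sep = angular_sep_deg(ra1, dec1, ra2, dec2) * 3600.0
            if best_sep is None or sep < best_sep:
                best_j, best_sep = j, sep
        if best_sep is not None and best_sep <= radius_arcsec:
            matches.append((i, best_j, best_sep))
    return matches


# Hypothetical coordinates in degrees
hsc = [(150.0, 2.0), (150.01, 2.01), (151.0, 3.0)]
ref = [(150.0001, 2.0), (150.01, 2.0101)]
print(match_nearest(hsc, ref, radius_arcsec=1.0))
```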