# Ingest and load local refcat demo using DELVE_DR1

<br>Owner: **Peter Ferguson** ([@psferguson](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@psferguson))
<br>Last Verified to Run: **2022-04-01**
<br>Verified Stack Release: **w_2021_49**

### Learning Objectives

This notebook demonstrates how to: <br>
1. Create an LSST-format reference catalog from an existing ASCII or FITS reference catalog
2. Create an empty Gen3 butler repo
3. Ingest the LSST-format refcat into the newly created repo
4. Load the new reference catalog with the butler

### Set Up 
You can find the Stack version by using `eups list -s` on the terminal command line.

In [None]:
# Site, host, and stack version
! echo $EXTERNAL_INSTANCE_URL
! echo $HOSTNAME
! eups list -s | grep lsst_distrib

In [None]:
import os
import subprocess
import numpy as np
import pylab as plt
import lsst.geom
import lsst.daf.butler as dafButler

### Create a gen3 reference catalog

For this example we will create a refcat from a single healpixel of DELVE DR1 (the [DECam Local Volume Exploration survey](https://delve-survey.github.io/)) stored at NCSA, and import it into a Gen3 repo.

We follow the developer instructions in the [pipelines refcat documentation](https://pipelines.lsst.io/v/daily/modules/lsst.meas.algorithms/creating-a-reference-catalog.html).

The first step is to use the `ConvertReferenceCatalogTask` along with a config we create for the conversion to create a catalog in LSST format. 

In [None]:
# setting names
refcatDir='custom_refcat_demo'
configFile="ingestConfigOverride.cfg"
inputFile="/project/shared/data/delve_dr1/cat/cat_hpx_07798.fits"

In [None]:
# This notebook will only run if refcatDir doesn't exist
if os.path.exists(refcatDir):
    msg = f"Please remove directory '{refcatDir}' to continue:\n! rm -r {refcatDir}"
    raise Exception(msg)

In [None]:
! mkdir {refcatDir}

Below is the set of configs used in creating this refcat:
 1. Since the refcat is in FITS format we retarget the file reader
 2. It is required to give a name to this refcat, in this case 'delve_dr1'
 3. We also need to specify ra, dec, mag, and mag_error columns
 4. Finally we can give the config a list of extra columns to include in the refcat (e.g., a star/galaxy classifier)
 
For this tutorial we change the HTM depth to 4 to improve runtime; the default HTM depth is 7.
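To get a feel for what the depth setting controls, here is a minimal sketch of the trixel arithmetic. It assumes the usual HTM convention of 8 root triangles, each split into 4 at every level; the function names are hypothetical and not part of the LSST stack.

```python
import math

# Area of the full sky in square degrees (about 41253).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def htm_trixel_count(depth: int) -> int:
    """Number of trixels at a given HTM depth (assumes 8 roots, 4 children per level)."""
    return 8 * 4 ** depth


def mean_trixel_area_sq_deg(depth: int) -> float:
    """Average area of one trixel at the given depth, in square degrees."""
    return FULL_SKY_SQ_DEG / htm_trixel_count(depth)


for depth in (4, 7):
    count = htm_trixel_count(depth)
    area = mean_trixel_area_sq_deg(depth)
    print(f"depth={depth}: {count} trixels, about {area:.2f} sq. deg. each")
```

Under these assumptions a shallower tree means far fewer, much larger shards, which is why the conversion runs faster at depth 4.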

The `%%writefile` cell magic writes the contents of the following cell to `configFile`.

In [None]:
%%writefile {configFile}
from lsst.meas.algorithms.readFitsCatalogTask import ReadFitsCatalogTask

# Default is ReadTextCatalogTask
config.file_reader.retarget(ReadFitsCatalogTask)

# String to pass to the butler to retrieve persisted files.
config.dataset_config.ref_dataset_name='delve_dr1'


config.dataset_config.indexer.name='HTM'

# Depth of the HTM tree to make.  Default is depth=7 which gives ~ 0.3 sq. deg. per trixel.
# for this tutorial we will change the HTM depth to 4 to improve runtime 
config.dataset_config.indexer['HTM'].depth=4

# Number of python processes to use when ingesting.
config.n_processes=5

# Name of RA column
config.ra_name='RA'

# Name of Dec column
config.dec_name='DEC'

# Name of column to use as an identifier (optional).
config.id_name='QUICK_OBJECT_ID'

# The values in the reference catalog are assumed to be in AB magnitudes. List of column names to use for
# photometric information.  At least one entry is required.
config.mag_column_list=['MAG_PSF_G', 'MAG_PSF_R','MAG_PSF_I', 'MAG_PSF_Z']

# A map of magnitude column name (key) to magnitude error column (value).
config.mag_err_column_map={'MAG_PSF_G':'MAGERR_PSF_G', 'MAG_PSF_R':'MAGERR_PSF_R','MAG_PSF_I':'MAGERR_PSF_I', 'MAG_PSF_Z':'MAGERR_PSF_Z'}

# Names of extra columns to include 
config.extra_col_names=['SPREAD_MODEL_G','SPREAD_MODEL_R','SPREAD_MODEL_I','SPREAD_MODEL_Z',
                        'SPREADERR_MODEL_G', 'SPREADERR_MODEL_R', 'SPREADERR_MODEL_I', 'SPREADERR_MODEL_Z',
                        'EXTINCTION_G', 'EXTINCTION_R', 'EXTINCTION_I', 'EXTINCTION_Z']


### Convert Files to LSST format
We then use the `convertReferenceCatalog` command-line tool to convert the catalog; this takes a bit of time to run.

In [None]:
! convertReferenceCatalog {refcatDir} {configFile} {inputFile}

Example output:

    lsst.ConvertReferenceCatalogTask INFO: Creating 2048 file locks.
    lsst.ConvertReferenceCatalogTask INFO: File locks created.
    lsst.ConvertReferenceCatalogTask INFO: Completed 1 / 1 files: 100 % complete 
    Completed refcat conversion. Ingest the resulting files with the following commands, substituting the path to your butler repo for REPO:
        butler register-dataset-type REPO delve_dr1 SimpleCatalog htm7
        butler ingest-files -t direct REPO gaia_dr2 refcats custom_refcat_demo/filename_to_htm.ecsv

### Create a gen3 repo and load catalog into it
We now have an LSST-format refcat. For this demo we will create a new Gen3 repo to ingest the refcat into.

Creating an empty repo requires a seed `butler.yaml` file (e.g., `./custom_refcat_demo/butler.yaml`).
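The seed configuration used in this demo can also be written from plain Python rather than through shell redirection. The following is a minimal sketch: the helper name `write_seed_config` is hypothetical, and a temporary directory stands in for `custom_refcat_demo` so the example is self-contained.

```python
import tempfile
from pathlib import Path

# Same seed configuration as used in this demo; <butlerRoot> is left as a
# literal placeholder in the file.
SEED_CONFIG = """\
datastore:
  cls: lsst.daf.butler.datastores.fileDatastore.FileDatastore
  root: <butlerRoot>
registry:
  db: sqlite:///<butlerRoot>/test.sqlite3
"""


def write_seed_config(directory: Path) -> Path:
    """Write the seed butler.yaml into `directory` and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "butler.yaml"
    path.write_text(SEED_CONFIG)
    return path


# A temporary directory stands in for custom_refcat_demo (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    seed_path = write_seed_config(Path(tmp) / "custom_refcat_demo")
    written = seed_path.read_text()
    print(written)
```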

In [None]:
# Create this file using notebook shell commands
filepath = "./custom_refcat_demo/butler.yaml"
filecontent = """
datastore:
  cls: lsst.daf.butler.datastores.fileDatastore.FileDatastore 
  root: <butlerRoot>
registry:
  db: sqlite:///<butlerRoot>/test.sqlite3 
""" 
! echo "$filecontent" > $filepath
! cat $filepath

Now we can run the `butler create` command to create a new repo.

In [None]:
repoName="test_repo_gen3"
! mkdir {refcatDir}/{repoName}
! touch {refcatDir}/{repoName}/test.sqlite3

In [None]:
!butler create {refcatDir}/{repoName} --seed-config {refcatDir}/butler.yaml --override

Now that we have an empty gen3 repo we can ingest the catalog into it with the following commands. 

    butler register-dataset-type REPO RefcatName SimpleCatalog htm4
    butler ingest-files -t direct REPO RefcatName collectionName filename_to_htm.ecsv

Note that the collection name must refer to a RUN-type collection, not a CHAINED one.

Also note that the HTM dimension (here `htm4`) must match the HTM depth set in the conversion config.
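One way to keep the dimension name and the conversion depth in sync is to build both command strings from a single depth value. This is only a sketch: `refcat_ingest_commands` is a hypothetical helper, not part of the `butler` tooling, and it just formats the two commands shown above.

```python
def refcat_ingest_commands(repo, refcat_name, htm_depth, collection, table):
    """Return the register and ingest command strings for one refcat.

    The HTM dimension name is derived from the depth so the two cannot drift apart.
    """
    dimension = f"htm{htm_depth}"
    return [
        f"butler register-dataset-type {repo} {refcat_name} SimpleCatalog {dimension}",
        f"butler ingest-files -t direct {repo} {refcat_name} {collection} {table}",
    ]


# Values taken from this demo; the depth matches the conversion config.
commands = refcat_ingest_commands(
    repo="custom_refcat_demo/test_repo_gen3",
    refcat_name="delve_dr1",
    htm_depth=4,
    collection="refcats",
    table="custom_refcat_demo/filename_to_htm.ecsv",
)
for command in commands:
    print(command)
```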

In [None]:
!butler register-dataset-type {refcatDir}/{repoName} delve_dr1 SimpleCatalog htm4
!butler ingest-files -t direct {refcatDir}/{repoName} delve_dr1 refcats {refcatDir}/filename_to_htm.ecsv

### Loading the new refcat
We can now open this new repo with the butler and check the "refcats" collection to see what it contains.

In [None]:
butler = dafButler.Butler(refcatDir+"/"+repoName, writeable=True)
registry = butler.registry

In [None]:
list(registry.queryCollections())

In [None]:
registry.getCollectionSummary('refcats').datasetTypes.names

We can set a documentation string for this refcat collection.

In [None]:
registry.setCollectionDocumentation('refcats', "doc for delve dr1 refcat")

We no longer need this butler repo to be writeable, so we reopen it read-only.

In [None]:
butler = dafButler.Butler(refcatDir+"/"+repoName, writeable=False)
registry = butler.registry

In [None]:
registry.getCollectionDocumentation('refcats')

In [None]:
refDataset = "delve_dr1"
refcatRefs = list(registry.queryDatasets(datasetType=refDataset,
                                         collections=["refcats"]).expanded())
refDataIds = [ref.dataId for ref in refcatRefs]
# Deferred handles: each catalog is only read when .get() is called on its handle
refCatsDef = [butler.getDeferred(refDataset, dataId, collections=["refcats"]) for dataId in refDataIds]

In [None]:
# Load each refcat shard directly from its resolved dataset reference
refCats = [butler.getDirect(ref) for ref in refcatRefs]

Finally we can plot the loaded refcat. The two different colors arise because this DELVE DR1 healpixel has been sharded into two htm4 pixels.

In [None]:
fig, ax = plt.subplots()
for refCat in refCats:
    # Each shard (one per htm4 pixel) is drawn in its own color
    ax.scatter(refCat["coord_ra"], refCat["coord_dec"], label="refcat", s=0.01)
ax.set_xlabel("RA")
ax.set_ylabel("DEC")