# Exploring a Data Repository

<br>Owner: **Phil Marshall** ([@drphilmarshall](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@drphilmarshall))
<br>Last Verified to Run: **2018-09-07**
<br>Verified Stack Release: **16.0**

This notebook shows how to find out what a data repository contains, and which inputs went into each of its components.

### Learning Objectives:
After working through and studying this notebook, you should be able to use the Butler to figure out:
   1. Which data types are present in a data repository
   2. If coadds have been made, which tracts and patches are available
   3. TBD
   
### Logistics
This notebook is intended to be runnable on `lsst-lspdev.ncsa.illinois.edu` from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.


## Set Up

In [None]:
import os
import sys
import warnings
import matplotlib.pyplot as plt
%matplotlib inline

# Filter some warnings printed by v16.0 of the stack
warnings.simplefilter("ignore", category=FutureWarning)
warnings.simplefilter("ignore", category=UserWarning)

## The HSC Dataset: What's in there?
We'll use the `hsc` dataset as our testing ground, and start by figuring out what's there.

We'll need a butler to interrogate the `hsc` data repository.

In [None]:
from lsst.daf.persistence import Butler

# Instantiate the butler
depth = 'WIDE' # WIDE, DEEP, UDEEP
field = 'SSP_WIDE' # SSP_WIDE, SSP_DEEP, SSP_UDEEP
repo = '/datasets/hsc/repo/rerun/DM-13666/%s/'%(depth)
butler = Butler(repo)
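For a rough first look at what a repository holds, independent of the butler, you can list the directories at its top level. This is a minimal sketch, assuming a Gen2 repository on a POSIX filesystem; the helper name `list_top_level_entries` is hypothetical. Directory names are not the same thing as butler dataset types (for example, `deepCoadd-results` holds several coadd-level products), and a rerun directory may only contain what that rerun itself wrote.

```python
import os


def list_top_level_entries(repo_root):
    """Return the sorted names of the subdirectories directly under repo_root.

    Plain files (such as a repository config file) are skipped.
    """
    return sorted(
        name for name in os.listdir(repo_root)
        if os.path.isdir(os.path.join(repo_root, name))
    )


# Possible use on lsst-lspdev, with the repo path defined above:
# print(list_top_level_entries(repo))
```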

In [None]:
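# Dataset types (the datasetRefOrType argument to butler.get), such as forced_src,
# are set up by the HSC mapper, which lives in the obs_subaru package. See all options at
# /opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_subaru/16.0+1/python/lsst/obs/hsc

# Better still, point to a resource online: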

In [None]:
from stackclub import where_is

In [None]:
where_is('obs_subaru', in_the='source')

In [None]:
help(where_is)

## Tracts and Patches, Coadd Images
Let's try exploring the `hsc` dataset's coadd images, and the visits that went into them. Jim Bosch shows how to do this in [this community.lsst.org post](https://community.lsst.org/t/visualizing-source-images-in-a-coadd/441/2).

We'll need a single coadd image to work from.

In [None]:
from lsst.daf.persistence import Butler

# Instantiate the butler
depth = 'WIDE' # WIDE, DEEP, UDEEP
field = 'SSP_WIDE' # SSP_WIDE, SSP_DEEP, SSP_UDEEP
repo = '/datasets/hsc/repo/rerun/DM-13666/%s/'%(depth)
butler = Butler(repo)

# The following does not work, because ci_hsc has not been ingested or reduced!
# butler = Butler('/project/shared/data/ci_hsc/')

In [None]:
import lsst.afw.image as afwImage

# Jim's function for printing the visits that went into a coadd, with their exposure times:
def showInputs(butler, dataId, coaddType="deepCoadd"):
    coadd = butler.get(coaddType, **dataId)
    coaddInputs = coadd.getInfo().getCoaddInputs()
    visitInputs = coaddInputs.visits
    ccdInputs = coaddInputs.ccds
    # Map each visit to one of its contributing CCDs, so we can read that CCD's calexp metadata
    ccdDict = dict((int(v), int(ccd)) for v, ccd in zip(ccdInputs.get("visit"), ccdInputs.get("ccd")))
    for v in visitInputs.get("id"):
        md = butler.get("calexp_md", visit=int(v), ccd=ccdDict[int(v)])
        # Print the visit ID and its exposure time in seconds
        print("%d %4.0f" % (v, afwImage.Calib(md).getExptime()))
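`showInputs` mixes butler I/O with a small piece of bookkeeping: pairing each input visit with one of its CCDs so that per-visit metadata can be read. The sketch below separates that bookkeeping into plain Python, with the exposure-time lookup passed in as a function. The names `visit_ccd_lookup`, `collect_inputs` and `get_exptime` are hypothetical, not part of the stack, and the stack usage in the trailing comment reuses the calls from `showInputs` without having been run.

```python
def visit_ccd_lookup(ccd_visits, ccd_ids):
    """Map each visit ID to one CCD that contributed from that visit.

    If a visit contributed several CCDs, the last one seen wins; any of
    them is good enough for reading per-visit metadata.
    """
    lookup = {}
    for visit, ccd in zip(ccd_visits, ccd_ids):
        lookup[int(visit)] = int(ccd)
    return lookup


def collect_inputs(visit_ids, ccd_visits, ccd_ids, get_exptime):
    """Return a list of (visit, exposure_time) pairs, one per input visit.

    get_exptime(visit, ccd) is supplied by the caller, so this function
    does not depend on the butler.
    """
    lookup = visit_ccd_lookup(ccd_visits, ccd_ids)
    return [(int(v), get_exptime(int(v), lookup[int(v)])) for v in visit_ids]


# Possible use with the stack (untested sketch, same calls as showInputs):
# coaddInputs = coadd.getInfo().getCoaddInputs()
# inputs = collect_inputs(
#     coaddInputs.visits.get("id"), coaddInputs.ccds.get("visit"), coaddInputs.ccds.get("ccd"),
#     lambda visit, ccd: afwImage.Calib(butler.get("calexp_md", visit=visit, ccd=ccd)).getExptime())
```

Returning a list instead of printing makes the result easy to sort, count, or tabulate afterwards.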

In [None]:
# Find all tracts that have merged coadd results, and pick one of them (here, the 16th in sorted order):
import os, glob
tracts = sorted([int(os.path.basename(x)) for x in
                 glob.glob(os.path.join(repo, 'deepCoadd-results', 'merged', '*'))])
tract = tracts[15]
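The glob above assumes every entry under `deepCoadd-results/merged` is named after an integer tract ID; any other entry would make `int()` raise a `ValueError`, and `tracts[15]` needs at least 16 tracts. A slightly more defensive sketch, assuming the same directory layout (the helper name `find_tracts` is hypothetical):

```python
import os


def find_tracts(repo_root, subdir=("deepCoadd-results", "merged")):
    """Return the sorted integer tract IDs found as entry names under repo_root/subdir.

    Entries whose names are not plain integers are skipped instead of
    raising ValueError; a missing directory gives an empty list.
    """
    base = os.path.join(repo_root, *subdir)
    if not os.path.isdir(base):
        return []
    return sorted(int(name) for name in os.listdir(base) if name.isdigit())


# Possible use:
# tracts = find_tracts(repo)
# tract = tracts[15] if len(tracts) > 15 else tracts[0]
```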

In [None]:
# Get the skymap, then look up our tract and its first patch, (0, 0):
skyMap = butler.get('deepCoadd_skyMap')
tractInfo = skyMap[tract]
patchInfo = tractInfo.getPatchInfo([0, 0])
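Each tract is divided into a rectangular grid of patches, so once the grid size is known, all patch indices can be enumerated. A minimal sketch; the helper `patch_indices` is hypothetical, and the commented usage assumes `tractInfo.getNumPatches()` returns an extent with `getX()` and `getY()` (check `help(tractInfo)` on your stack version).

```python
def patch_indices(num_x, num_y):
    """Return all (x, y) patch index pairs for a tract with num_x by num_y patches.

    Pairs are ordered row by row: x varies fastest, then y.
    """
    return [(x, y) for y in range(num_y) for x in range(num_x)]


# Possible use (assumes getNumPatches() returns an Extent2I-like object):
# numPatches = tractInfo.getNumPatches()
# allPatches = patch_indices(numPatches.getX(), numPatches.getY())
```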

In [None]:
tractInfo

In [None]:
patchInfo

In [None]:
# Get the dataId of the i-band coadd image for patch (0,0) of our tract.
# Note that in the dataId, the patch is given as an "x,y" string, not a tuple:
band = 'HSC-I'
patch = '0,0'

subset = butler.subset('deepCoadd', dataId={'filter':band, 'tract':tract, 'patch':patch})
dataid = subset.cache[0]
print(dataid)
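The skymap works with patch index pairs like `(0, 0)`, while coadd dataIds in HSC Gen2 repositories use a string of the form `'x,y'`. A pair of small conversion helpers keeps the two consistent; this is a minimal sketch assuming that string format, and `patch_to_str` and `patch_from_str` are hypothetical names.

```python
def patch_to_str(index):
    """Convert a patch index pair such as (0, 0) to the 'x,y' dataId string."""
    x, y = index
    return "%d,%d" % (x, y)


def patch_from_str(text):
    """Convert an 'x,y' patch string back into an (x, y) tuple of ints."""
    x, y = (int(part) for part in text.split(","))
    return (x, y)


# Possible use when building a dataId:
# dataId = {'filter': band, 'tract': tract, 'patch': patch_to_str((0, 0))}
```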

In [None]:
coadd = butler.get('deepCoadd', **dataid)

In [None]:
showInputs(butler, dataid)