# Exploring the Gen-3 Butler

<br>Owners: **Alex Drlica-Wagner** ([@kadrlica](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@kadrlica)), **Douglas Tucker** ([@douglasleetucker](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@douglasleetucker))
<br>Last Verified to Run: **2019-08-08**
<br>Verified Stack Release: **v18.1.0**

## Core Concepts

This notebook provides a first look at the structure and organization of a repo created with the Gen-3 Butler. The Gen-3 Butler is still under development, so this notebook is expected to be updated after the Gen-3 release.

1. Create a Gen-3 butler
2. Use the Gen-3 butler to explore the ci_hsc_gen3 data repo

## Learning Objectives:

This notebook demonstrates how the Gen-3 butler works. After working through it, you should be able to:

1. Explore a Gen-3 data repo

In [None]:
# Generic imports
import os
import matplotlib.pyplot as plt

In [None]:
# Stack imports
import lsst.daf.butler as dafButler
import lsst.afw.display as afwDisplay

In [None]:
# Directory where the repo lives
repo='/project/shared/data/ci_hsc_gen3'

You can poke around this directory a bit to see what outputs have been created.

In [None]:
# The base directory for the repo
!ls $repo

In [None]:
# The outputs are stored in `shared/ci_hsc_output`
outdir=f'{repo}/shared/ci_hsc_output'
!ls $outdir

To create a butler, you need to pass it a configuration file and a run name. The run name tells the butler where to place output files. More on Butler configuration can be found [here](https://pipelines.lsst.io/modules/lsst.daf.butler/configuring.html). By investigating the directory structure, we find that the 'collection' is `shared/ci_hsc_output`.

In [None]:
config = os.path.join(repo,'butler.yaml')
butler = dafButler.Butler(config=config,collection="shared/ci_hsc_output")
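Before constructing the butler, it can help to confirm that the repo actually contains a configuration file. The following is a minimal sketch: the helper name `find_butler_config` is hypothetical and not part of the LSST stack, and it assumes only that the configuration lives in `butler.yaml` at the top of the repo, as above.

```python
import os


def find_butler_config(repo, name='butler.yaml'):
    """Return the path to the butler config in `repo`, or raise if missing.

    Hypothetical helper (not part of the LSST stack).
    """
    config = os.path.join(repo, name)
    if not os.path.isfile(config):
        raise FileNotFoundError(f'No {name} found in {repo}')
    return config
```

Calling `find_butler_config(repo)` in place of the bare `os.path.join` would fail early with a clear message if the repo path is wrong.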

With the Gen-2 butler, there was no good way to investigate what data exist in a repo. To get around this, we all developed a habit of investigating the directory structure and file names to figure out what data existed.

In [None]:
!ls $outdir/calexp

In [None]:
!ls $outdir/calexp/903338
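The same inspection can be scripted. The sketch below assumes the layout seen in the listings above, where a dataset type such as `calexp` has one subdirectory per visit (e.g. `calexp/903338`). The helper name `list_visits` is hypothetical, and the layout is an assumption about this particular repo, not something the butler guarantees.

```python
import os


def list_visits(outdir, dataset_type='calexp'):
    """List visit numbers found as subdirectories of `outdir/dataset_type`.

    Hypothetical helper: it relies on the on-disk layout of this repo,
    which is not a stable interface.
    """
    root = os.path.join(outdir, dataset_type)
    visits = []
    for name in os.listdir(root):
        # Keep only directories whose names are purely numeric
        if name.isdecimal() and os.path.isdir(os.path.join(root, name)):
            visits.append(int(name))
    return sorted(visits)
```

For this repo, `list_visits(outdir)` would return the visit numbers that have a `calexp` directory.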

Based on these filenames, we have enough to specify the dataId to pass to the butler...

In [None]:
dataId = {'visit':903338,'detector':25,'instrument':'HSC'}
calexp = butler.get('calexp', dataId=dataId)

In [None]:
afwDisplay.setDefaultBackend('matplotlib') 
fig = plt.figure(figsize=(10,8))
afw_display = afwDisplay.Display(1)
afw_display.scale('asinh', 'zscale')
afw_display.mtv(calexp)
plt.gca().axis('off')
# And if it wasn't sacrilege I would rotate this image...

## Gen-3 Butler

OK, so how do we do this in Gen-3 land? It looks like the butler has a `registry`, which seems promising!

In [None]:
registry = butler.registry
# help(registry)

So the first step is to figure out what collections exist. The `registry` seems like a good tool for this (more on the registry schema can be found [here](https://dmtn-073.lsst.io/)).

In [None]:
butler = dafButler.Butler(config=config,collection="")
registry = butler.registry

In [None]:
registry.getAllCollections()

In [None]:
# Similarly, this should work, but appears to be broken in this release.
# registry.getAllDatasetTypes()

## Some Exploration

We are looking for a way to "get all dataIds" so we can figure out what data exist. So far, this has been unsuccessful.

In [None]:
registry = butler.registry
#help(registry)

The `registry.find` method looks useful, but is it? Not really, because we need the dataId, which we don't know a priori. Once we have the dataId, we can get a `DatasetRef` that we can pass to the Butler.

In [None]:
ref = registry.find(collection='shared/ci_hsc_output',datasetType='calexp',dataId=dataId)
calexp = butler.get(ref)
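Without a way to list dataIds, one workaround is to probe a set of candidate dataIds and keep those that resolve. This is a sketch only: `probe_data_ids` is a hypothetical helper, and it assumes the lookup either returns `None` or raises a `LookupError` when no dataset matches, which should be checked against the behavior of `registry.find` in your release.

```python
def probe_data_ids(lookup, candidates):
    """Return the candidate dataIds for which `lookup` finds a dataset.

    Hypothetical helper. `lookup` is any callable taking a dataId dict;
    a miss is assumed to be signalled by `None` or a `LookupError`.
    """
    found = []
    for data_id in candidates:
        try:
            ref = lookup(data_id)
        except LookupError:
            # Treat a failed lookup as "no such dataset"
            continue
        if ref is not None:
            found.append(data_id)
    return found
```

Here `lookup` could be `lambda d: registry.find(collection='shared/ci_hsc_output', datasetType='calexp', dataId=d)`, with candidates built by looping over plausible visit and detector numbers.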