# Exploring the Shared Datasets in the LSST Science Platform
<br>Owner(s): **Phil Marshall** ([@drphilmarshall](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@drphilmarshall)), 
<br>Last Verified to Run: **2018-08-05**
<br>Verified Stack Release: **16.0**

In this notebook we'll take a look at some of the datasets available on the LSST Science Platform. 

### Learning Objectives:

After working through this tutorial you should be able to: 
1. Start figuring out which of the available datasets is going to be of most use to you in any given project.

When this notebook is finished, you should also be able to:
2. Plot the patches and tracts in a given dataset on the sky;
3. List the available catalogs in a given dataset.

### Logistics
This notebook is intended to be runnable on `lsst-lspdev.ncsa.illinois.edu` from a local git clone of https://github.com/LSSTScienceCollaborations/StackClub.

## Set-up

We'll need the `stackclub` package to be installed. If you are not developing this package, you can install it using `pip`, like this:
```
pip install git+https://github.com/LSSTScienceCollaborations/StackClub.git#egg=stackclub
```
If you are developing the `stackclub` package (e.g. by adding modules to it to support the Stack Club tutorial that you are writing), you'll need to make a local, editable installation. In the top level folder of the `StackClub` repo, do:

In [None]:
! cd .. && python setup.py -q develop --user && cd -

When editing the `stackclub` package files, we want the latest version to be imported when we re-run the import command. To enable this, we need the `%autoreload` magic command.

In [None]:
%load_ext autoreload
%autoreload 2

For accessing the datasets using the Butler, and then visualizing the results, we'll need the following modules:

In [None]:
%matplotlib inline
# %matplotlib ipympl

import os, glob
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from IPython.display import IFrame, display, Markdown

import stackclub

In [None]:
import lsst.daf.persistence as dafPersist
import lsst.daf.base as dafBase

import lsst.afw.math as afwMath
import lsst.afw.geom as afwGeom

import lsst.afw.detection as afwDetect
import lsst.afw.image as afwImage
import lsst.afw.table as afwTable

import lsst.afw.display as afwDisplay

You can find the Stack version that this notebook is running by using `eups list -s` on the terminal command line:

In [None]:
# What version of the Stack am I using?
! echo $HOSTNAME
! eups list -s | grep lsst_distrib

## Listing the Available Datasets
First, let's look at what is currently available. There are two primary shared dataset folders in the LSP: the read-only `/datasets` folder and the group-writeable folder `/project/shared/data`. Let's see what's in them.

**`/project/shared/data`:** These datasets are designed to be small test sets, ideal for tutorials.

In [None]:
shared_datasets = ! ls -d /project/shared/data/* | grep -v README
shared_datasets

In [None]:
%%bash
shared_datasets=$( ls -d /project/shared/data/* | grep -v README )
for dataset in $shared_datasets; do
    du -sh $dataset
done
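
The same measurement can be sketched in pure Python, which is handy when the sizes are needed as numbers rather than as `du` output. The two helper names below are hypothetical (they are not part of `stackclub`), and the sketch sums apparent file sizes, so its totals will generally differ a little from `du`, which reports disk blocks.

```python
import os


def directory_size_bytes(root):
    """Total apparent size, in bytes, of the regular files under `root`.

    Hypothetical helper for illustration; symbolic links are skipped.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so that linked data is not counted
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # Ignore files that vanish or cannot be read
                continue
    return total


def human_readable(num_bytes):
    """Format a byte count with binary prefixes, similar in spirit to `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return "{:.1f}{}".format(size, unit)
        size /= 1024
```

Calling `human_readable(directory_size_bytes(path))` for each entry of `shared_datasets` should give a listing comparable to the bash cell above. A path that does not exist simply yields a size of zero, because `os.walk` returns nothing for it.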

**`/datasets`:**
These are typically much bigger: to measure the size, uncomment the second cell below and edit it to target the dataset you are interested in. Running `du` on all folders takes several minutes.

In [None]:
datasets = ! ls -d /datasets/* | grep -v USAGE | grep -v html
datasets

In [None]:
# %%bash
# datasets=$( ls -d /datasets/* | grep -v USAGE | grep -v html )
# for dataset in $datasets; do
#     du -sh $dataset
# done

## Visualizing Sky Coverage
In this section, we'll plot the available patches and tracts in a given dataset on the sky, following the LSST DESC tutorial [dm_butler_skymap.ipynb](https://github.com/LSSTDESC/DC2-analysis/blob/master/tutorials/dm_butler_skymap.ipynb). In fact, we will _import_ this notebook, so that we can re-use its functions. This operation is handled by the `stackclub.wimport` function.

In [None]:
dm_butler_skymap_notebook = "https://github.com/LSSTDESC/DC2-analysis/raw/master/tutorials/dm_butler_skymap.ipynb"

skymapper = stackclub.wimport(dm_butler_skymap_notebook, vb=True)

> BUG: remote notebooks are not yet `wimport`-able. A workaround could be to import the downloaded file explicitly. This is not yet working, hence the commented out failed attempt below.

In [None]:
# import sys, os
# import stackclub
# sys.path.append(os.getcwd() + '/.downloads')

In [None]:
# import dm_butler_skymap
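
One possible way around the bug, offered only as a sketch: treat the downloaded `.ipynb` file as JSON, pull out its code cells, and execute them into a fresh module object. The function name `load_notebook_definitions` is hypothetical and is not part of `stackclub`. The sketch assumes the notebook uses the nbformat 4 layout (a top-level `cells` list) and has already been saved locally. Note that it runs every code cell, not just the function definitions, so it should only be used on notebooks you trust.

```python
import json
import types


def load_notebook_definitions(notebook_path, module_name="downloaded_notebook"):
    """Execute the code cells of a local .ipynb file and return them as a module.

    Hypothetical helper for illustration; assumes nbformat 4.
    """
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)

    module = types.ModuleType(module_name)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat 4 stores the source as a list of lines or as a single string
        if isinstance(source, list):
            source = "".join(source)
        # Cell magics such as %%bash mean the whole cell is not Python: skip it
        if source.lstrip().startswith("%%"):
            continue
        # Drop line magics and shell escapes, which are not valid Python
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        exec("\n".join(lines), module.__dict__)
    return module
```

If this works for the downloaded skymap notebook, the returned module would expose that notebook's functions as attributes. Filtering magics line by line is crude and can break a cell whose only indented statement is a magic, so treat this as a starting point rather than a fix.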

Now we can attempt to plot the available tracts, using the `plot_skymap_tract()` function.

In [None]:
# repo = "/project/shared/data/Twinkles_subset/output_data_v2"
repo = "/datasets/hsc/repo/rerun/DM-13666/WIDE"
butler = dafPersist.Butler(repo)

# Glob the merged coadd folder for the tracts that have data.  Unfortunately, this information is not
# directly accessible from the data butler.
tracts = sorted([int(os.path.basename(x)) for x in
                 glob.glob(os.path.join(repo, 'deepCoadd-results', 'merged', '*'))])

# How many tracts do we have?
print("Found {} tracts".format(len(tracts)))
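
The list comprehension above raises a `ValueError` if the merged coadd folder ever contains an entry whose name is not an integer. A slightly more defensive version of the same globbing step might look like the sketch below. The function name `find_tracts` is hypothetical, and the default sub-folder is simply the one used in the cell above.

```python
import glob
import os


def find_tracts(repo, subdir=os.path.join("deepCoadd-results", "merged")):
    """Return the sorted integer tract numbers that have a folder under repo/subdir.

    Hypothetical helper for illustration; entries that are not directories,
    or whose names are not plain integers, are ignored.
    """
    tracts = []
    for path in glob.glob(os.path.join(repo, subdir, "*")):
        name = os.path.basename(path)
        # Keep only directories named with digits, e.g. one folder per tract
        if os.path.isdir(path) and name.isdigit():
            tracts.append(int(name))
    return sorted(tracts)
```

With the `repo` defined above, `tracts = find_tracts(repo)` should be a drop-in replacement for the list comprehension, returning an empty list if the folder is missing.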

In [None]:
"""
Uncomment this cell when the `wimport` bug is fixed (or avoided).

# Now, loop over all the tracts, plotting them as gray, numbered, rectangles:
ax = None
for tract in tracts:
    skyMap = butler.get('deepCoadd_skyMap')
    ax = skymapper.plot_skymap_tract(skyMap, tract=tract, title='', ax=ax)
""";

## Summary

In this notebook we took a first look at the datasets available to us in two shared directories in the LSST Science Platform filesystem.