# Introduction to Nightly Validation Data Products

Contact author: Alex Broughton


## Introduction

In this tutorial, we will:

- Learn what dataset types are available after every night
- Look at some catalog data
- Validate nightly performance

## 1.0 Set Up

The primary access point for data is the Rubin Science Platform (RSP). For most commissioning integration, testing, and verification and validation (V&V) tests of the telescope system and image quality, we will be using the US Data Facility (USDF). Ultimately, most major science pipeline processing will happen at NERSC.

Luckily for you, all LSST-distributed software comes pre-installed!

Environments with different versions of the DM Stack are in:

        /opt/lsst/software/stack                    (on RSP)
        /sdf/group/rubin/sw/                        (on USDF terminal)

If you want to add your own configuration to your LSST environment startup on the RSP in the
LSST iPython kernel, create a sourceable shell fragment in:

        ${HOME}/notebooks/.user_setups

and it will be sourced during kernel startup.

Find useful documentation for the software and Notebook Aspect at:
- https://pipelines.lsst.io
- https://rsp.lsst.io

**Note: if you want to do this yourself (e.g. you want to run the DM Stack on your own terminal shell), you need to install the version you want and then set it up, like so:**
```
source /sdf/group/rubin/sw/${VERSION}/loadLSST.bash
setup lsst_sitcom -t ${VERSION}
```

When you start up the RSP, selecting the "Recommended" release loads an environment with the most recent stable release of the DM Stack into your kernel. To check that it is loaded, you can run:

In [None]:
! eups list -s | grep lsst_distrib
! eups list -s | grep ip_isr
! eups list -s | grep cp_pipe
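
The same check can be sketched from inside Python. This is a minimal sketch, not an official recipe: the helper name `package_version` is hypothetical, and it assumes each package exposes a `__version__` attribute, which may not hold for every package.

```python
import importlib


def package_version(module_name):
    """Return module.__version__ if the module imports, otherwise None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not set up in this environment.
        return None
    # Assumption: the package defines __version__; fall back to None if not.
    return getattr(module, "__version__", None)


# Example module names; adjust to the packages you care about.
for name in ("lsst.daf.butler", "lsst.ip.isr", "lsst.cp.pipe"):
    version = package_version(name)
    print(f"{name}: {version if version is not None else 'not available'}")
```

Outside an LSST environment, every line simply reports that the package is not available, which is itself a quick sign that the stack is not loaded.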

In [None]:
# For this tutorial, we will need:
import numpy as np
import matplotlib.pyplot as plt

from lsst.daf.butler import Butler
from lsst.obs.lsst import LsstComCam

# Camera object
camera = LsstComCam.getCamera()


## 2.0 Data Access Using the Butler

**Data** are stored in **repositories** as **collections** of **dataset types**, and are found by their associated **dimensions**.
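
To make these terms concrete, here is a toy model built from plain dictionaries. It is only an illustration of how the pieces nest: a real repository is managed by `lsst.daf.butler`, and every collection name, data ID, and value below is hypothetical.

```python
# Toy model only: collection -> dataset type -> data ID -> data.
repository = {
    "LSSTComCam/raw/all": {                      # collection
        "raw": {                                 # dataset type
            (("detector", 4), ("exposure", 1001)): "raw pixels",
        },
    },
    "LSSTComCam/calib": {
        "bias": {
            (("detector", 4),): "combined bias",
        },
    },
}


def make_key(**dimensions):
    """Turn keyword dimensions into a sorted, hashable data ID key."""
    return tuple(sorted(dimensions.items()))


def find(repo, collection, dataset_type, **dimensions):
    """Look up one dataset by collection, dataset type, and dimensions."""
    return repo[collection][dataset_type].get(make_key(**dimensions))


print(find(repository, "LSSTComCam/raw/all", "raw", detector=4, exposure=1001))
print(find(repository, "LSSTComCam/calib", "bias", detector=4))
```

Note that different dataset types need different dimensions: the toy `raw` is identified by detector and exposure, while the toy `bias` only needs a detector.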

The repositories can be found here: `/sdf/group/rubin/repo`

In [None]:
# Each repository has a butler
butler = Butler("/repo/embargo")
registry = butler.registry # Contains list of available data (basically a precompiled catalog of dataset type: dimensions)


What collection should we look in?

<br>Any time someone runs a pipeline on some data, it produces new dataset types (e.g. `postISRCCD`, `calexp`, `finalized_src_table`, `diaObject`, etc.) in a new collection (under `u/{username}/your/collection/name/{timestamp}`) that is CHAINED to all of the input collections.
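
A chained collection is searched in order, and the first child that holds the requested dataset type wins. The sketch below mimics that behavior with plain Python; the collection names and contents are hypothetical, and the real lookup happens inside the Butler registry.

```python
# Toy contents of three collections (all names and values are made up).
toy_collections = {
    "u/someuser/myrun/20241101T000000Z": {"postISRCCD": "ISR output from my run"},
    "LSSTComCam/calib": {"bias": "combined bias", "flat": "combined flat"},
    "LSSTComCam/raw/all": {"raw": "raw pixels"},
}

# A chain is an ordered list of child collections: outputs first, inputs after.
chain = [
    "u/someuser/myrun/20241101T000000Z",
    "LSSTComCam/calib",
    "LSSTComCam/raw/all",
]


def find_in_chain(dataset_type):
    """Return (collection, dataset) for the first child holding dataset_type."""
    for name in chain:
        if dataset_type in toy_collections[name]:
            return name, toy_collections[name][dataset_type]
    return None


print(find_in_chain("postISRCCD"))
print(find_in_chain("bias"))
print(find_in_chain("calexp"))
```

This is why a single output collection is enough to reach both the new products and the inputs they were made from.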


In [None]:
# Users have generated many collections in this repository over the past month or so:
len(registry.queryCollections())

In fact, the `LSSTComCam/calib` collection is a chain of collections containing different dataset types for the different types of calibrations:

In [None]:
collections = registry.queryCollections("LSSTComCam/calib/*")

for col in collections:
    print(col)

What data products (data types) are contained across these collections?

In [None]:
summary = butler.registry.getCollectionSummary('LSSTComCam/calib')
datasetTypes = list(summary.dataset_types)
for dt in datasetTypes:
    print(dt)

<br>We generated our own collection by running our own pipeline. Let's find our collection and see what data products (data types) we produced!

In [None]:
summary = butler.registry.getCollectionSummary('LSSTComCam/nightlyValidation')
datasetTypes = list(summary.dataset_types)
for dt in datasetTypes:
    print(dt)

## 2.1 WOW! Look at all these cool data products! What should we look at first?

Let's look at an image.

In [None]:
collections = ['LSSTComCam/nightlyValidation'] # Note: we can also search multiple collections if we wanted to!

In [None]:
# We generated a collection ourselves!
# Let's find the postISRCCD datasets for science exposures on detector 4.
refs = list(registry.queryDatasets(
    "postISRCCD",
    instrument="LSSTComCam",
    detector=4,
    where="exposure.observation_reason='science'",
    collections=collections,
))

print(f"Found {len(refs)} references of datasetType 'postISRCCD' !")
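
The `where` argument above is a plain string, so several constraints can be combined with `AND`. The helper below is a hypothetical convenience, not part of the Butler API, and the `exposure.day_obs` value is a made-up example. It does no escaping, so it is only suitable for simple, trusted values.

```python
def build_where(constraints):
    """Join {field: value} pairs into one AND-ed query expression string."""
    clauses = []
    for field, value in constraints.items():
        if isinstance(value, str):
            # String literals are wrapped in single quotes.
            clauses.append(f"{field}='{value}'")
        else:
            clauses.append(f"{field}={value}")
    return " AND ".join(clauses)


where = build_where({
    "exposure.observation_reason": "science",
    "exposure.day_obs": 20241101,  # hypothetical observing day
})
print(where)
```

The resulting string could then be passed as `where=where` in a `registry.queryDatasets` call like the one above.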

In [None]:
# Let's look at one of these references:
ref = refs[1]
ref

This reference and its dataId correspond to one particular image, and we can use them to get any data product associated with this specific exposure (from `raw` $\rightarrow$ `finalized_src_table`).

In [None]:
# Now that we have a reference for this image, 
# we can ask the butler to go and get it for us:
postISRCCD = butler.get('postISRCCD', dataId=ref.dataId, collections=collections)

In [None]:
plt.hist(postISRCCD.image.array.ravel(), bins=100)
plt.yscale('log')

In [None]:
from matplotlib.colors import LogNorm
# Logarithmic stretch between pixel values of 600 and 5000
norm = LogNorm(vmin=600, vmax=5000)

plt.figure(figsize=(10,10))
plt.title(ref.dataId)
plt.imshow(postISRCCD.image.array, origin='lower', norm=norm, cmap='binary_r')
plt.colorbar(shrink=0.75)
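
The limits `vmin=600, vmax=5000` are hard-coded for this particular image. One common alternative is to derive the limits from percentiles of the pixel values, guided by the histogram. The sketch below uses a synthetic array as a stand-in for `postISRCCD.image.array`; the helper name `display_limits`, the percentile choices, and all numbers are assumptions for illustration.

```python
import numpy as np

# Synthetic stand-in for an image: a flat sky level plus noise and one
# very bright pixel. All numbers are made up for illustration.
rng = np.random.default_rng(seed=0)
image = rng.normal(loc=1000.0, scale=30.0, size=(200, 200))
image[100, 100] = 50000.0


def display_limits(pixels, low=1.0, high=99.0):
    """Return (vmin, vmax) at the given percentiles, ignoring NaN pixels."""
    vmin, vmax = np.nanpercentile(pixels, [low, high])
    return float(vmin), float(vmax)


vmin, vmax = display_limits(image)
print(f"vmin={vmin:.1f}, vmax={vmax:.1f}")
```

Because the limits come from percentiles, a single saturated pixel does not stretch the color scale. Keep in mind that `LogNorm` needs `vmin` to be positive, so this approach may need adjusting for background-subtracted images.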

## 3.0 Data Tables

In [None]:
# We generated a collection ourselves!
# Let's find the src catalogs for detector 4.
refs = list(registry.queryDatasets(
    "src",
    instrument="LSSTComCam",
    detector=4,
    collections=collections,
))

print(f"Found {len(refs)} references of datasetType 'src' !")

In [None]:
refs[0].dataId

In [None]:
# Now that we have a reference for this catalog,
# we can ask the butler to go and get it for us.
src = butler.get('src', dataId=refs[0].dataId, collections=collections)
# 'postISRCCD' is keyed by exposure rather than visit, so pass the visit ID
# as the exposure ID (this assumes one exposure per visit).
postISRCCD = butler.get('postISRCCD', dataId=refs[0].dataId,
                        exposure=refs[0].dataId['visit'], collections=collections)

In [None]:
obj_table = src.asAstropy()
obj_table

In [None]:
for col in obj_table.columns:
    print(col)
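
Source catalogs have many columns, so it can help to filter the names rather than print them all. This is a small sketch with a hypothetical helper, `columns_matching`, run here on a short hand-written list of column names taken from this tutorial.

```python
def columns_matching(column_names, substring):
    """Return the column names that contain the given substring."""
    return [name for name in column_names if substring in name]


# A few column names used in this tutorial, standing in for the full list.
example_columns = [
    "slot_Centroid_x",
    "slot_Centroid_y",
    "slot_PsfShape_xx",
    "base_GaussianFlux_instFlux",
    "calib_psf_used",
]

print(columns_matching(example_columns, "Centroid"))
print(columns_matching(example_columns, "calib_"))
```

With the real table, the same helper could be called as `columns_matching(obj_table.colnames, "Centroid")`.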

In [None]:
plt.figure(figsize=(10,10))
plt.title(refs[0].dataId)
plt.imshow(postISRCCD.image.array, origin='lower', norm=norm, cmap='binary_r')
plt.colorbar(shrink=0.75)
# facecolors="none" (the string) draws hollow markers; None would use the default fill
plt.scatter(obj_table['slot_Centroid_x'], obj_table['slot_Centroid_y'], marker="o", facecolors="none", edgecolor="blue", linewidth=1)



In [None]:
# Boolean mask selecting the sources that were used to build the PSF model
good = obj_table['calib_psf_used']

In [None]:
plt.figure(figsize=(10,10))
plt.title(refs[0].dataId)
plt.imshow(postISRCCD.image.array, origin='lower', norm=norm, cmap='binary_r')
plt.colorbar(shrink=0.75)
# Overplot only the PSF stars, again as hollow markers
plt.scatter(obj_table['slot_Centroid_x'][good], obj_table['slot_Centroid_y'][good], marker="o", facecolors="none", edgecolor="blue", linewidth=1)



In [None]:
plt.scatter(obj_table['base_GaussianFlux_instFlux'][good], obj_table['slot_PsfShape_xx'][good])
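
Second moments such as `slot_PsfShape_xx` are in units of pixels squared, which is not the most intuitive measure of image quality. One way to summarize them is to convert to a FWHM in arcseconds. The sketch below is a rough estimate under stated assumptions: a Gaussian PSF, a companion `slot_PsfShape_yy` column (not shown in this tutorial), and a nominal pixel scale of 0.2 arcsec/pixel. The helper name `psf_fwhm_arcsec` is hypothetical.

```python
import math

# Assumed nominal pixel scale in arcsec/pixel; check your camera geometry.
PIXEL_SCALE = 0.2


def psf_fwhm_arcsec(ixx, iyy, pixel_scale=PIXEL_SCALE):
    """Estimate the PSF FWHM in arcsec from second moments in pixels^2.

    Assumes a Gaussian profile, for which sigma^2 = (Ixx + Iyy) / 2
    and FWHM = 2 * sqrt(2 * ln 2) * sigma.
    """
    sigma_pixels = math.sqrt(0.5 * (ixx + iyy))
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_pixels * pixel_scale


# Example: Ixx = Iyy = 4 pixels^2 gives sigma = 2 pixels.
print(round(psf_fwhm_arcsec(4.0, 4.0), 3))
```

Applied to the median moments of the `good` sources, this gives a single number per detector that can be tracked from night to night.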