# Redshift Database Tutorial

## Abstract

This tutorial will cover the basics of using the redshift database, which is loaded from the outputs of the DESI pipeline.  Currently, this is based on software release 22.1b, and uses a [PostgreSQL](https://www.postgresql.org/) database. We use [SQLAlchemy](http://www.sqlalchemy.org/) to abstract away the details of the database.

## Requirements

This tutorial uses data from the `fuji` production (`/global/cfs/cdirs/desi/spectro/redux/fuji`), and the **DESI 22.1b** kernel.

## Initial Setup

This just imports everything we need and sets up paths and environment variables so we can find things.

In [1]:
#
# Imports
#
import os
from argparse import Namespace
from types import MethodType
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.font_manager import fontManager, FontProperties
from sqlalchemy import __version__ as sqlalchemy_version
from sqlalchemy import inspect
from sqlalchemy.sql import func
import astropy.units as u
from astropy.constants import c as lightspeed
from astropy.table import Table, MaskedColumn
#
# DESI software
#
from desiutil.log import get_logger, DEBUG
from desitarget.targetmask import (desi_mask, mws_mask, bgs_mask)
# from desisim.spec_qa import redshifts as dsq_z
from desisurvey import __version__ as desisurvey_version
from desisurvey.ephem import get_ephem, get_object_interpolator
from desisurvey.utils import get_observer
from desispec import __version__ as desispec_version
import desispec.database.redshift as db
#
# Paths to files, etc.
#
specprod = os.environ['SPECPROD'] = 'fuji'
basedir = os.path.join(os.environ['DESI_SPECTRO_REDUX'], specprod)
# surveydir = os.environ['DESISURVEY_OUTPUT'] = os.path.join(basedir, 'survey')
# targetdir = os.path.join(basedir, 'targets')
# fibassigndir = os.path.join(basedir, 'fiberassign')
# os.environ['DESI_SPECTRO_REDUX'] = os.path.join(basedir, 'spectro', 'redux')
# os.environ['DESI_SPECTRO_SIM'] = os.path.join(basedir, 'spectro', 'sim')
# os.environ['PIXPROD'] = 'mini'
# os.environ['SPECPROD'] = 'mini'
# reduxdir = os.path.join(os.environ['DESI_SPECTRO_REDUX'], os.environ['SPECPROD'])
# simdatadir = os.path.join(os.environ['DESI_SPECTRO_SIM'], os.environ['PIXPROD'])
# os.environ['DESI_SPECTRO_DATA'] = simdatadir
#
# Initialize ephemerides, to find Moon, etc.
#
os.environ['DESISURVEY_OUTPUT'] = os.environ['CSCRATCH']
ephem = get_ephem()
#
# get_ephem() will run freeze_iers(), so we import these after that.
#
from astropy.time import Time
from astropy.coordinates import ICRS
#
# Working directory.
#
workingdir = os.getcwd()
print(f'sqlalchemy=={sqlalchemy_version}')
print(f'desispec=={desispec_version}')
print(f'desisurvey=={desisurvey_version}')

INFO:iers.py:82:freeze_iers: Freezing IERS table used by astropy time, coordinates.
INFO:ephem.py:80:get_ephem: Restored ephemerides for (2019-01-01,2027-12-31) from /global/cscratch1/sd/bweaver/ephem_2019-01-01_2027-12-31.fits.
sqlalchemy==1.4.28
desispec==0.51.13.dev6715
desisurvey==0.18.0.dev1079


In [2]:
def moon(self, mjd, ra, dec):
    """Compute relative location of the Moon.
    
    Parameters
    ----------
    mjd : float
        Time of observation, as a Modified Julian Date.
    ra : float
        Right Ascension in degrees.
    dec : float
        Declination in degrees.
    
    Returns
    -------
    tuple
        Moon separation (degrees), Moon altitude (degrees), Moon illumination fraction.
    """
    observation_time = Time(mjd, format='mjd')
    position = ICRS(ra=ra*u.deg, dec=dec*u.deg)
    zenith = get_observer(observation_time, alt=90 * u.deg, az=0 * u.deg).transform_to(ICRS)
    alt = 90 * u.deg - position.separation(zenith)
    moon_dec, moon_ra = get_object_interpolator(self.get_night(observation_time), 'moon', altaz=False)(observation_time.mjd)
    moon_position = ICRS(ra=moon_ra*u.deg, dec=moon_dec*u.deg)
    moon_sep = position.separation(moon_position).to(u.deg).value
    moon_alt = (90 * u.deg - moon_position.separation(zenith)).to(u.deg).value
    moon_frac = ephem.get_moon_illuminated_fraction(observation_time.mjd).tolist()
    return (moon_sep, moon_alt, moon_frac)

ephem.moon = MethodType(moon, ephem)
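The `moon()` method above finds altitudes as 90° minus the angular separation from the zenith. Here is a minimal, dependency-free sketch of that idea. It uses the haversine formula for the great-circle separation; astropy's `separation()` computes the same quantity with a more numerically robust formula. The zenith coordinates in the usage lines are illustrative assumptions (a declination of 32° is roughly the latitude of Kitt Peak).

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def altitude(ra, dec, zenith_ra, zenith_dec):
    """Altitude above the horizon: 90 degrees minus the separation from the zenith."""
    return 90.0 - angular_separation(ra, dec, zenith_ra, zenith_dec)

# Illustrative zenith position (RA, Dec) in degrees.
zenith_ra, zenith_dec = 150.0, 32.0
at_zenith = altitude(150.0, 32.0, zenith_ra, zenith_dec)     # overhead
on_horizon = altitude(150.0, -58.0, zenith_ra, zenith_dec)   # 90 degrees from zenith
below = altitude(330.0, -32.0, zenith_ra, zenith_dec)        # directly opposite the zenith
```

An object counts as "up" when its altitude is positive, which is exactly the `> 0` test applied to the Moon altitude later in this notebook.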

## Contents of the Database

All tables are grouped into a database "schema", and that schema is named for the production run (*e.g.* `fuji`).  When writing "raw" SQL, table names need to be schema-qualified, for example `fuji.target`.  However, the SQLAlchemy abstraction layer is designed to take care of this for you.
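To see what a schema-qualified name looks like in practice, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. SQLite has no `CREATE SCHEMA`, so `ATTACH DATABASE ... AS fuji` is used to create a named namespace instead; the `tile` table and its single row are made up for illustration.

```python
import sqlite3

# An in-memory database with a second in-memory database attached under the name
# "fuji", which plays the role of the PostgreSQL schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS fuji")
conn.execute("CREATE TABLE fuji.tile (tileid INTEGER PRIMARY KEY, survey TEXT)")
conn.execute("INSERT INTO fuji.tile VALUES (100, 'sv1')")  # made-up row

# Raw SQL refers to the table with its schema prefix, as in fuji.target.
rows = conn.execute("SELECT tileid, survey FROM fuji.tile").fetchall()
print(rows)
```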

The tables are:

* `target`. This contains the photometric and targeting bits.
  - Loaded from `targetphot` files.
  - SQLAlchemy object: `db.Target`.
  - Primary key: `targetid`.
* `tile`. This contains information about observations grouped by tile.
  - Loaded from top-level `tiles-${SPECPROD}.fits`.
  - SQLAlchemy object: `db.Tile`.
  - Primary key: `tileid`.
* `exposure`. This contains information about individual exposures.
  - Loaded from top-level `exposures-${SPECPROD}.fits`, `EXPOSURES` HDU.
  - SQLAlchemy object: `db.Exposure`.
  - Primary key: `expid`.
* `frame`. This contains information about individual exposures, but broken down by camera.  There will usually, but not always, be 30 frames per exposure.
  - Loaded from top-level `exposures-${SPECPROD}.fits`, `FRAMES` HDU.
  - SQLAlchemy object: `db.Frame`.
  - Primary key: `frameid`, composed from `expid` and a mapping of `camera` to an arbitrary integer.
* `fiberassign`. This contains information about fiber positions.
  - Loaded from fiberassign files in the tiles product.  All fiberassign files corresponding to tiles in the `tile` table are loaded.
  - SQLAlchemy object: `db.Fiberassign`.
  - Primary key: (`tileid`, `targetid`, `location`)
* `potential`. This contains a list of `targetid`s that *could* have been targeted on a given tile.
  - Loaded from the `POTENTIAL_ASSIGNMENTS` HDU in the same fiberassign files mentioned above.
  - SQLAlchemy object: `db.Potential`.
  - Primary key: (`tileid`, `targetid`, `location`)
* `zpix`. This contains the pipeline redshifts grouped by HEALPixel.
  - Loaded from the `zpix-*.fits` files in the `zcatalog/` directory.
  - SQLAlchemy object: `db.Zpix`.
  - Primary key: (`targetid`, `survey`, `program`)
* `ztile`. This contains the pipeline redshifts grouped by tile in a variety of ways.
  - Loaded from the `ztile-*.fits` files in the `zcatalog/` directory.
  - SQLAlchemy object: `db.Ztile`.
  - Primary key: (`targetid`, `spgrp`, `spgrpval`, `tileid`)
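The "usually 30 frames per exposure" in the `frame` entry comes from ten spectrographs, each with three cameras (b, r, z). The sketch below enumerates those camera names and packs an exposure ID and a camera into one integer key. The camera-to-integer mapping and the `100 * expid + index` packing are **hypothetical**; they only illustrate the idea of a composite `frameid` and are not necessarily what `desispec` uses.

```python
# All 30 camera names: three arms (b, r, z) on each of ten spectrographs (0-9).
CAMERAS = [f"{arm}{spectrograph}" for spectrograph in range(10) for arm in "brz"]

# Hypothetical camera -> integer mapping; the real mapping may differ.
CAMERA_INDEX = {camera: i for i, camera in enumerate(CAMERAS)}

def hypothetical_frameid(expid, camera):
    """Pack an exposure ID and a camera name into one integer (illustration only)."""
    return 100 * expid + CAMERA_INDEX[camera]

def split_frameid(frameid):
    """Invert hypothetical_frameid(), returning (expid, camera)."""
    expid, index = divmod(frameid, 100)
    return expid, CAMERAS[index]

print(len(CAMERAS))
print(hypothetical_frameid(86507, "r3"), split_frameid(8650710))
```

Any reversible packing works as long as the camera index stays below the multiplier.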

## Initial Database Connection

This connection uses a `~/.pgpass` file to set up connection credentials.  [Be sure you have set that up](https://desi.lbl.gov/trac/wiki/DESIProductionDatabase#Setuppgpass).
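Each line of a PostgreSQL password file has the form `hostname:port:database:username:password`; `*` in any of the first four fields matches anything, and a literal `:` or `\` inside a field is escaped with a backslash. libpq also ignores the file unless it is readable only by its owner (mode 0600). The sketch below formats such a line and sets that mode on a temporary file. The hostname and username come from the `setup_db()` call in this notebook; the wildcard port and database and the password are placeholders, so follow the linked wiki page for the real values.

```python
import os
import stat
import tempfile

def pgpass_line(hostname, port, database, username, password):
    """Format one ~/.pgpass entry, escaping backslashes and colons in each field."""
    def escape(field):
        # Escape backslashes first so the backslashes added for colons are not doubled.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(escape(f) for f in (hostname, port, database, username, password))

# Placeholder password: never put a real one in a notebook.
line = pgpass_line("nerscdb03.nersc.gov", "*", "*", "desi", "not:a-real\\password")
print(line)

# Write to a temporary file (not the real ~/.pgpass) and restrict it to the owner.
with tempfile.NamedTemporaryFile("w", suffix=".pgpass", delete=False) as f:
    f.write(line + "\n")
os.chmod(f.name, stat.S_IRUSR | stat.S_IWUSR)
mode = stat.S_IMODE(os.stat(f.name).st_mode)
os.remove(f.name)
```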

In [3]:
db.log = get_logger(DEBUG)
postgresql = db.setup_db(schema=specprod, hostname='nerscdb03.nersc.gov', username='desi', verbose=True)

INFO:redshift.py:1337:setup_db: Begin creating tables.
INFO:redshift.py:1343:setup_db: Finished creating tables.


## Learning About the Tables

The tables in the database are listed above.  To inspect an individual table, you can use the `__table__` attribute.

In [None]:
#
# Print the table columns and their types.
#
[(c.name, c.type) for c in db.Zpix.__table__.columns]

We can also `inspect()` the database.  For details, see the [SQLAlchemy inspection documentation](http://docs.sqlalchemy.org/en/latest/core/inspection.html?highlight=inspect#module-sqlalchemy.inspection).

In [None]:
inspector = inspect(db.engine)
for table_name in inspector.get_table_names(schema=specprod):
    print(table_name)
    for column in inspector.get_columns(table_name, schema=specprod):
        print("Column: {name} {type}".format(**column))

### Exercises

* What is the type of the `night` column of the `exposure` table?
* What is the primary key of the `ztile` table?

## Simple Queries

Queries are set up with the `.query()` method on Session objects.  In this case, there's a prepared Session object called `db.dbSession`.  `.filter()` corresponds to a `WHERE` clause in SQL.

### Exposures, Nights, Tiles

Here are some queries that demonstrate the basic connections between nights, exposures and tiles.

In [None]:
# Number of tiles.
q = db.dbSession.query(db.Tile).count()
q

In [None]:
# Tiles observed on a particular night.
q = db.dbSession.query(db.Exposure.tileid, db.Exposure.survey, db.Exposure.program).filter(db.Exposure.night==20210115).all()
q

In [None]:
# On what nights was tile 100 observed?
q = db.dbSession.query(db.Exposure.night, db.Exposure.expid).filter(db.Exposure.tileid==100).all()
q

### Select ELG Targets

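Note the special way we obtain the bitwise AND operator.  In SQLAlchemy, Python's `&` applied to column expressions means a logical `AND`, so we use `.op('&')` to emit SQL's bitwise `&` operator instead.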

In [None]:
q = db.dbSession.query(db.Target).filter(db.Target.desi_target.op('&')(desi_mask.ELG) != 0).all()

In [None]:
[(row.targetid, row.desi_target, row.ra, row.dec) for row in q[:10]]
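The bitwise filter above reduces to a `WHERE (desi_target & mask) != 0` clause. This sketch runs that clause against a made-up `target` table in Python's built-in `sqlite3` and checks it against the same bit test in plain Python. The bit numbers come from the `desi_mask` printout in this notebook (LRG is bit 0, ELG bit 1, QSO bit 2), so the ELG mask value is `1 << 1`; the target rows are invented.

```python
import sqlite3

# Mask values for bits 0, 1 and 2, as listed in the desi_mask printout.
LRG, ELG, QSO = 1 << 0, 1 << 1, 1 << 2

# Made-up targets: an LRG, an ELG, an object flagged as both ELG and QSO, and a QSO.
targets = [(1, LRG), (2, ELG), (3, ELG | QSO), (4, QSO)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (targetid INTEGER PRIMARY KEY, desi_target INTEGER)")
conn.executemany("INSERT INTO target VALUES (?, ?)", targets)

# The same kind of clause that .op('&') produces: (desi_target & mask) != 0.
elg_sql = [tid for (tid,) in conn.execute(
    "SELECT targetid FROM target WHERE (desi_target & ?) != 0 ORDER BY targetid", (ELG,))]

# The equivalent bit test in plain Python.
elg_python = [tid for tid, bits in targets if (bits & ELG) != 0]
print(elg_sql, elg_python)
```

Object 3 is selected because a target can carry several bits at once, which is why the test is "any overlap with the mask" rather than equality.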

#### Exercise

* How many objects in the `zpix` table have `spectype` 'GALAXY'?

### Redshift and Classification

Simple query filtering on string values. Note that the slice notation `[:20]` is equivalent to `LIMIT 20` in SQL.

In [None]:
q = db.dbSession.query(db.Zpix.spectype, db.Zpix.subtype, db.Zpix.z).filter(db.Zpix.spectype=='STAR').filter(db.Zpix.subtype!='')[:20]
q
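Slicing a query turns into `LIMIT`/`OFFSET`: `[:20]` is `LIMIT 20`, and `[20:40]` is `LIMIT 20 OFFSET 20`. The sketch below reproduces both with Python's built-in `sqlite3` on a made-up `zpix` table. It adds an `ORDER BY`, because without one the database is free to return *any* 20 matching rows, so "the first 20" is only well defined once an ordering is specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zpix (targetid INTEGER PRIMARY KEY, spectype TEXT)")
# 50 made-up rows: odd targetids are stars, even ones are galaxies.
conn.executemany("INSERT INTO zpix VALUES (?, ?)",
                 [(i, "STAR" if i % 2 else "GALAXY") for i in range(50)])

def sliced(start, stop):
    """Rows [start:stop] of the STAR query, translated to LIMIT/OFFSET."""
    sql = ("SELECT targetid FROM zpix WHERE spectype = 'STAR' ORDER BY targetid "
           "LIMIT ? OFFSET ?")
    return [tid for (tid,) in conn.execute(sql, (stop - start, start))]

first_page = sliced(0, 20)    # like query[:20]
second_page = sliced(20, 40)  # like query[20:40]; only 5 stars remain
```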

## Joining Tables

### A Simple Join

Let's look at the nights and exposures on which a particular `TARGETID` was observed.

In [None]:
q = db.dbSession.query(db.Fiberassign.tileid, db.Exposure.expid, db.Exposure.night).filter(db.Fiberassign.tileid == db.Exposure.tileid).filter(db.Fiberassign.targetid==933811403620352).all()
q
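The query above never calls `.join()`: it lists two tables and relates them with `.filter(db.Fiberassign.tileid == db.Exposure.tileid)`, which becomes an implicit join in the `WHERE` clause. This sketch, using Python's built-in `sqlite3` and made-up rows, shows that the implicit form and an explicit `JOIN ... ON` return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fiberassign (tileid INTEGER, targetid INTEGER, location INTEGER);
CREATE TABLE exposure (expid INTEGER PRIMARY KEY, tileid INTEGER, night INTEGER);
-- Made-up rows: target 42 is on tiles 100 and 200; tile 100 has two exposures.
INSERT INTO fiberassign VALUES (100, 42, 1001), (200, 42, 2002), (100, 7, 1003);
INSERT INTO exposure VALUES (1, 100, 20210115), (2, 100, 20210116),
                            (3, 200, 20210201), (4, 300, 20210202);
""")

# Filter-style (implicit) join: both tables in FROM, the relation in WHERE.
implicit = conn.execute("""
    SELECT f.tileid, e.expid, e.night FROM fiberassign AS f, exposure AS e
    WHERE f.tileid = e.tileid AND f.targetid = 42 ORDER BY e.expid""").fetchall()

# Explicit JOIN ... ON, the form that .join() produces.
explicit = conn.execute("""
    SELECT f.tileid, e.expid, e.night FROM fiberassign AS f
    JOIN exposure AS e ON f.tileid = e.tileid
    WHERE f.targetid = 42 ORDER BY e.expid""").fetchall()
```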

### Another Simple Join

In this case, we'll look at photometric flux and measured redshift. We'll `LIMIT` the query with slice notation.

In [None]:
q = db.dbSession.query(db.Target, db.Zpix).filter(db.Target.targetid == db.Zpix.targetid)[:50]

In [None]:
[(row.Target.flux_g, row.Target.flux_r, row.Target.flux_z, row.Zpix.z) for row in q]

In [None]:
#
# A very similar plot appears in the tutorial notebook dc17a-truth.
# It stays commented out because it needs the ZCat and Truth tables from the
# older simulated-data database, which are not part of this schema.
#
# dv = lightspeed.to('km / s') * np.array([(row.ZCat.z - row.Truth.truez) / (1.0 + row.Truth.truez) for row in q])
# ttype = [row.Truth.templatetype for row in q]
# fig, axes = plt.subplots(2, 3, figsize=(9,6), dpi=100)
# for k, objtype in enumerate(set(ttype)):
#     i, j = divmod(k, 3)
#     ii = np.array(ttype) == objtype
#     axes[i][j].hist(dv[ii], 50, (-100, 100))
#     axes[i][j].set_xlabel('{} dv [km/s]'.format(objtype))
# fig.tight_layout()

#### Exercise

* Create a color-color plot for objects targeted as QSOs, and spectroscopically confirmed as such.

### A More Complicated Join

Let's look at objects that appear on more than one tile. For each of those tiles, how many exposures were there?

In this example, we're using `sqlalchemy.sql.func` to get the equivalent of `COUNT(*)` and a subquery that itself is a multi-table join.

In [None]:
# db.dbSession.rollback()
q1 = db.dbSession.query(db.Fiberassign.targetid, func.count('*').label('n_assign')).group_by(db.Fiberassign.targetid).subquery()
q2 = db.dbSession.query(db.Tile.nexp, db.Fiberassign.tileid, q1.c.targetid, q1.c.n_assign).filter(q1.c.n_assign>1).filter(db.Fiberassign.targetid == q1.c.targetid).filter(db.Tile.tileid == db.Fiberassign.tileid)[:100]

In [None]:
q2

In [None]:
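#
# If everything matches up, this should return True.
# (Legacy check: db.ZCat comes from an older version of this notebook and is
# not one of the tables in this schema, so the line stays commented out.)
#
# all([row.ZCat.numexp == row.n_assign for row in q2])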
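The `q1`/`q2` pair above counts assignments per target in a grouped subquery, keeps the targets above a threshold, and joins back to `fiberassign` and `tile`. Here is the same shape in plain SQL, run with Python's built-in `sqlite3` on made-up rows, plus the equivalent `HAVING` form for filtering on an aggregate directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tile (tileid INTEGER PRIMARY KEY, nexp INTEGER);
CREATE TABLE fiberassign (tileid INTEGER, targetid INTEGER, location INTEGER);
-- Made-up data: target 42 is on three tiles, target 7 on one.
INSERT INTO tile VALUES (100, 2), (200, 1), (300, 4);
INSERT INTO fiberassign VALUES (100, 42, 1), (200, 42, 2), (300, 42, 3), (100, 7, 4);
""")

# Subquery q1: COUNT(*) per target; outer query: join back and keep n_assign > 1.
rows = conn.execute("""
    SELECT t.nexp, f.tileid, q1.targetid, q1.n_assign
    FROM (SELECT targetid, COUNT(*) AS n_assign
          FROM fiberassign GROUP BY targetid) AS q1
    JOIN fiberassign AS f ON f.targetid = q1.targetid
    JOIN tile AS t ON t.tileid = f.tileid
    WHERE q1.n_assign > 1
    ORDER BY f.tileid""").fetchall()

# When only the qualifying targets are needed, HAVING filters the groups directly.
multi = conn.execute(
    "SELECT targetid FROM fiberassign GROUP BY targetid HAVING COUNT(*) > 1").fetchall()
```

The subquery form is needed when, as in `q2`, you want per-tile rows *and* the per-target count side by side.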

#### Exercise

* What is the distribution of number of exposures?

## Fly me to the Moon

How does the Moon affect redshifts?  First, let's find exposures that had the Moon above the horizon.

In [24]:
moon_up = [e.expid for e in db.dbSession.query(db.Exposure).all() if ephem.moon(e.mjd, e.tilera, e.tiledec)[1] > 0]
len(moon_up)


902

In [28]:
moon_up

[90250,
 87505,
 87382,
 87126,
 87128,
 87263,
 87381,
 87259,
 87618,
 79308,
 79309,
 79310,
 79311,
 77579,
 83748,
 83448,
 83166,
 83010,
 83011,
 86514,
 85637,
 86756,
 90247,
 86627,
 86508,
 87124,
 86757,
 87122,
 86987,
 86988,
 87125,
 85636,
 86511,
 87617,
 86509,
 86626,
 83891,
 86993,
 87261,
 87127,
 87262,
 90249,
 86513,
 86755,
 86753,
 86015,
 86013,
 86269,
 86260,
 86264,
 86252,
 86019,
 86256,
 86378,
 86384,
 86388,
 86392,
 86258,
 86254,
 86251,
 86011,
 86017,
 86382,
 86504,
 85635,
 90246,
 86758,
 86625,
 87123,
 87380,
 90245,
 87257,
 87506,
 87129,
 87121,
 90239,
 86986,
 86515,
 85082,
 85502,
 85503,
 90240,
 85628,
 85340,
 87385,
 86619,
 86618,
 87264,
 87384,
 86620,
 86518,
 79576,
 79577,
 74463,
 74462,
 85509,
 85508,
 85507,
 86622,
 86621,
 74829,
 74830,
 74831,
 74832,
 85078,
 86741,
 85189,
 86617,
 86616,
 85196,
 86495,
 85626,
 85075,
 85343,
 85627,
 85624,
 85623,
 85622,
 85621,
 85620,
 82359,
 82495,
 82360,
 82354,
 81859,


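So a substantial number of exposures (902) were taken with the Moon up.  But there is a subtle issue: coadded redshifts are based on *all* exposures of an object, so the effect of a Moon-up exposure can be diluted by Moon-down ones.  Ideally we would compare objects observed *only* with the Moon up to similar objects observed *only* with the Moon down.  Here we approximate that with per-exposure redshifts (`spgrp == 'perexp'` in the `ztile` table), each of which comes from a single exposure, restricted to LRG targets.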

In [30]:
#
# For per-exposure redshifts, spgrpval is the exposure ID, so each redshift
# can be joined directly to the exposure it came from.
#
q_up = db.dbSession.query(db.Ztile.targetid, db.Ztile.z, db.Ztile.zerr, db.Ztile.zwarn)\
                   .join(db.Target, db.Target.targetid==db.Ztile.targetid)\
                   .join(db.Exposure, db.Ztile.spgrpval==db.Exposure.expid)\
                   .filter(db.Ztile.spgrp=='perexp')\
                   .filter(db.Target.desi_target.op('&')(desi_mask.LRG) != 0)\
                   .filter(db.Exposure.expid.in_(moon_up)).all()
q_dn = db.dbSession.query(db.Ztile.targetid, db.Ztile.z, db.Ztile.zerr, db.Ztile.zwarn)\
                   .join(db.Target, db.Target.targetid==db.Ztile.targetid)\
                   .join(db.Exposure, db.Ztile.spgrpval==db.Exposure.expid)\
                   .filter(db.Ztile.spgrp=='perexp')\
                   .filter(db.Target.desi_target.op('&')(desi_mask.LRG) != 0)\
                   .filter(~db.Exposure.expid.in_(moon_up)).all()

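The "observed *only* with the Moon up" idea from the text above is a set test: an object qualifies if all of its exposures are Moon-up, and the comparison sample contains objects with no Moon-up exposures at all. This sketch uses made-up exposure IDs and a tiny stand-in for the `moon_up` list; converting that list to a `set` also makes each membership test constant-time.

```python
# Made-up inputs: exposure IDs for three hypothetical targets, and a small
# stand-in for the moon_up list computed earlier.
exposures_by_target = {
    101: [1, 2],  # observed only with the Moon up
    102: [1, 3],  # mixed
    103: [3, 4],  # observed only with the Moon down
}
moon_up_set = set([1, 2])

# Every exposure is Moon-up: the target's exposures are a subset of moon_up_set.
only_up = sorted(t for t, exps in exposures_by_target.items() if set(exps) <= moon_up_set)

# No exposure is Moon-up: the two sets have nothing in common.
only_down = sorted(t for t, exps in exposures_by_target.items() if moon_up_set.isdisjoint(exps))
```

Mixed targets such as 102 fall into neither sample, which is the point: they cannot tell us which condition drove the redshift.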

In [33]:
desi_mask

desi_mask:
  - [LRG,              0, "LRG", {'obsconditions': 'DARK', 'priorities': {'UNOBS': 3200, 'MORE_ZGOOD': 2, 'MORE_ZWARN': 2, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [ELG,              1, "ELG", {'obsconditions': 'DARK', 'priorities': {'UNOBS': 3000, 'MORE_ZGOOD': 2, 'MORE_ZWARN': 2, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [QSO,              2, "QSO", {'obsconditions': 'DARK', 'priorities': {'UNOBS': 3400, 'MORE_ZGOOD': 3350, 'MORE_ZWARN': 3300, 'MORE_MIDZQSO': 100, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0}, 'numobs': 4}]
  - [QSO_HIZ,          4, "QSO selected using high-redshift Random Forest (informational bit)", {'obsconditions': 'DARK', 'priorities': {'UNOBS': 0, 'DONE': 0, 'OBS': 0, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0, 'MORE_ZWARN': 0, 'MORE_ZGOOD': 0}, 'numobs': -1}]
  - [ELG_LOP,          5, "ELG at standard (ELG) priority", {'obsconditions': 'DARK', 'priorities': {'UNOBS': 3100, 'MORE_ZGOOD': 2,

In [34]:
bgs_mask

bgs_mask:
  - [BGS_FAINT,        0, "BGS faint targets", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 2000, 'MORE_ZGOOD': 2, 'MORE_ZWARN': 2, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [BGS_BRIGHT,       1, "BGS bright targets", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 2100, 'MORE_ZGOOD': 2, 'MORE_ZWARN': 2, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [BGS_WISE,         2, "BGS wise targets", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 2000, 'MORE_ZGOOD': 2, 'MORE_ZWARN': 2, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [BGS_FAINT_HIP,    3, "BGS faint targets at bright priority", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 2100, 'MORE_ZGOOD': 2, 'MORE_ZWARN': 2, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [BGS_FAINT_NORTH,  8, "BGS faint cuts tuned for Bok/Mosaic", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 0,

In [35]:
mws_mask

mws_mask:
  - [MWS_BROAD,        0, "Milky Way Survey magnitude limited bulk sample", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 1400, 'MORE_ZGOOD': 50, 'MORE_ZWARN': 50, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [MWS_WD,           1, "Milky Way Survey White Dwarf", {'obsconditions': 'BRIGHT|DARK', 'priorities': {'UNOBS': 2998, 'MORE_ZGOOD': 50, 'MORE_ZWARN': 50, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [MWS_NEARBY,       2, "Milky Way Survey volume-complete ~100pc sample", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 1600, 'MORE_ZGOOD': 50, 'MORE_ZWARN': 50, 'DONE': 2, 'OBS': 1, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0}, 'numobs': 2}]
  - [MWS_BROAD_NORTH,  4, "Milky Way Survey cuts tuned for Bok/Mosaic", {'obsconditions': 'BRIGHT', 'priorities': {'UNOBS': 0, 'DONE': 0, 'OBS': 0, 'DONOTOBSERVE': 0, 'MORE_MIDZQSO': 0, 'MORE_ZWARN': 0, 'MORE_ZGOOD': 0}, 'numobs': 0}]
  - [MWS_BROAD_SOUTH,  5, "Milky Way

In [26]:
targetid_up, z_up, zerr_up, zwarn_up = zip(*q_up)
targetid_dn, z_dn, zerr_dn, zwarn_dn = zip(*q_dn)
targetid_up = np.array(targetid_up)
z_up = np.array(z_up)
zerr_up = np.array(zerr_up)
zwarn_up = np.array(zwarn_up)
targetid_dn = np.array(targetid_dn)
z_dn = np.array(z_dn)
zerr_dn = np.array(zerr_dn)
zwarn_dn = np.array(zwarn_dn)
ok_up = zwarn_up == 0
ok_dn = zwarn_dn == 0

ValueError: not enough values to unpack (expected 4, got 0)

In [None]:
#
# Pipeline redshift uncertainty versus pipeline redshift.
# Real data have no true redshift, so we compare the two samples directly.
#
fig, axes = plt.subplots(1, 1, figsize=(8, 8), dpi=100)
p1 = axes.semilogy(z_up[ok_up], zerr_up[ok_up], 'r.', label='Up')
p2 = axes.semilogy(z_dn[ok_dn], zerr_dn[ok_dn], 'b.', label='Down')
foo = axes.set_xlabel('Pipeline redshift')
foo = axes.set_ylabel('Redshift uncertainty')
foo = axes.legend(loc=4)

In [None]:
#
# Redshift distributions, normalized so that samples of different sizes can be compared.
#
bins = np.histogram_bin_edges(np.concatenate([z_up[ok_up], z_dn[ok_dn]]), bins=50)
fig, axes = plt.subplots(1, 1, figsize=(8, 8), dpi=100)
h1 = axes.hist(z_up[ok_up], bins=bins, density=True, histtype='step', color='r', label='Up')
h2 = axes.hist(z_dn[ok_dn], bins=bins, density=True, histtype='step', color='b', label='Down')
foo = axes.set_xlabel('Pipeline redshift')
foo = axes.set_ylabel('Normalized number of objects')
foo = axes.legend(loc=1)

If the Moon-up and Moon-down samples look much the same, that's not necessarily a bad thing!

### Exercise

* Try a different target class!

## Survey Progress

Let's see which nights have data, and count the number of exposures per night.

In [None]:
q = db.dbSession.query(db.Exposure.night, func.count(db.Exposure.expid).label('n_exp')).group_by(db.Exposure.night).order_by(db.Exposure.night).all()
q

Observation timestamps for a given night.  Note how we have both MJD and a corresponding `datetime.datetime` object in the database.

In [None]:
q = db.dbSession.query(db.Exposure.expid, db.Exposure.mjd, db.Exposure.date_obs).filter(db.Exposure.night == 20210428).order_by(db.Exposure.expid).all()
q
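MJD and `date_obs` describe the same instant: MJD counts days since 1858-11-17 00:00. This stdlib-only sketch converts between the two. It ignores leap seconds and the distinction between time scales such as UTC and TAI, which `astropy.time.Time` (imported at the top of this notebook) handles properly, so treat it as an approximation.

```python
from datetime import datetime, timedelta, timezone

# MJD 0 corresponds to 1858-11-17 00:00.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)

def mjd_to_datetime(mjd):
    """Approximate conversion of an MJD to a timezone-aware datetime."""
    return MJD_EPOCH + timedelta(days=mjd)

def datetime_to_mjd(dt):
    """Inverse of mjd_to_datetime(): days since the MJD epoch, as a float."""
    return (dt - MJD_EPOCH) / timedelta(days=1)

print(mjd_to_datetime(59332.25))
```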

So, for a given target in the `target` table, when was the observation completed?  In other words, if a target has multiple observations, we want the date of the *last* observation.

In [None]:
#
# How many targets are there?
#
N_targets = db.dbSession.query(db.Target).count()
N_targets

In [None]:
#
# Find all targetids that have observations.
#
q1 = db.dbSession.query(db.Target.targetid).filter(db.Target.targetid == db.Fiberassign.targetid).filter(db.Fiberassign.tileid == db.Exposure.tileid).group_by(db.Target.targetid).subquery()
#
# Find the exposure times for the targetids that have been observed
#
q2 = db.dbSession.query(db.Fiberassign.targetid, db.Exposure.expid, db.Exposure.mjd).filter(db.Fiberassign.targetid == q1.c.targetid).filter(db.Fiberassign.tileid == db.Exposure.tileid).order_by(q1.c.targetid, db.Exposure.expid).all()
targetid, expid, mjd = zip(*q2)
targetid = np.array(targetid)
expid = np.array(expid)
mjd = np.array(mjd)
#
# Use the counts to give the *last* observation.
#
unique_targetid, i, j, c = np.unique(targetid, return_index=True, return_inverse=True, return_counts=True)
unique_expid = expid[i + (c-1)]
unique_mjd = mjd[i + (c-1)]
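The `np.unique` trick above works because `q2` is ordered by `targetid` and then `expid`: for each target, `i` is the index of its first row and `c` its number of rows, so `i + (c-1)` is its last (latest) row. The same idea written with the standard library, on made-up rows and assuming, as the cell above does, that a larger `expid` means a later exposure:

```python
from itertools import groupby
from operator import itemgetter

# Made-up (targetid, expid, mjd) rows, ordered by targetid then expid like q2.
rows = [
    (11, 100, 59300.1),
    (11, 105, 59310.2),
    (12, 101, 59301.3),
    (13, 100, 59300.1),
    (13, 103, 59305.0),
    (13, 107, 59320.4),
]

# For each targetid, keep the final row of its group: the last observation.
last_obs = {targetid: list(group)[-1] for targetid, group in groupby(rows, key=itemgetter(0))}

# Cumulative fraction completed, assuming (hypothetically) 4 targets in total.
n_targets = 4
completion_mjd = sorted(row[2] for row in last_obs.values())
fraction = [(k + 1) / n_targets for k in range(len(completion_mjd))]
```

Note that `groupby` only merges *adjacent* rows, so the input must already be sorted by `targetid`, just as the `np.unique` version relies on the `order_by` in `q2`.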

In [None]:
#
# Now we have the targets and the date of last observation, but sorted by targetid.
# Re-sort by exposure and count how many targets were completed by each exposure.
#
ii = unique_expid.argsort()
completed_expid, i3, j3, c3 = np.unique(unique_expid[ii], return_index=True, return_inverse=True, return_counts=True)
N_completed = np.cumsum(c3)

In [None]:
min_mjd = 10*(int(mjd.min())//10)
fig, axes = plt.subplots(1, 1, figsize=(8, 8), dpi=100)
p1 = axes.plot(unique_mjd[ii][i3] - min_mjd, N_completed/N_targets, 'k-')
foo = axes.set_xlabel(f'MJD - {min_mjd:d}')
foo = axes.set_ylabel('Fraction completed')
foo = axes.grid(True)
# foo = axes.legend(loc=1)

### Exercise

* Break down the progress by target class, target bit, etc.

## Using Relationships in SQLAlchemy

Here we demonstrate how table relationships can simplify certain queries. First we grab a single `db.Exposure` object.

In [None]:
exposure = db.dbSession.query(db.Exposure).filter(db.Exposure.expid==86507).one()
exposure

How do we get the `db.Frame` objects associated with this exposure?

In [None]:
exposure.frames

What tile is associated with this exposure?

In [None]:
exposure.tile

What fiber assignments were made on this tile?

In [None]:
exposure.tile.fiberassign[:20]

What redshifts were measured on this tile?

In [None]:
exposure.tile.ztile_redshifts[:20]
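Under the hood, each relationship attribute above corresponds to a query along a foreign key. With SQLAlchemy's default lazy loading, the query is issued the first time the attribute is accessed; how `desispec` configures loading for these particular relationships is not shown here. This sketch, using Python's built-in `sqlite3` and made-up rows, shows roughly what `exposure.frames` (one-to-many) and `exposure.tile` (many-to-one) retrieve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tile (tileid INTEGER PRIMARY KEY);
CREATE TABLE exposure (expid INTEGER PRIMARY KEY, tileid INTEGER REFERENCES tile (tileid));
CREATE TABLE frame (frameid INTEGER PRIMARY KEY,
                    expid INTEGER REFERENCES exposure (expid), camera TEXT);
-- Made-up rows for a single exposure on a single tile.
INSERT INTO tile VALUES (100);
INSERT INTO exposure VALUES (86507, 100);
INSERT INTO frame VALUES (1, 86507, 'b0'), (2, 86507, 'r0'), (3, 86507, 'z0');
""")

expid = 86507
# Roughly what exposure.frames loads: rows whose foreign key points at this exposure.
frames = conn.execute(
    "SELECT camera FROM frame WHERE expid = ? ORDER BY frameid", (expid,)).fetchall()

# Roughly what exposure.tile loads: follow the exposure's foreign key to its tile.
tile = conn.execute("""
    SELECT t.tileid FROM tile AS t JOIN exposure AS e ON e.tileid = t.tileid
    WHERE e.expid = ?""", (expid,)).fetchone()
```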

### Exercise

* Rewrite example queries above using relationships wherever possible.