In [None]:
__author__ = 'Benjamin Weaver <benjamin.weaver@noirlab.edu>, Alice Jacques <alice.jacques@noirlab.edu>, Astro Data Lab Team <datalab@noirlab.edu>'
__version__ = '20230501'
__datasets__ = ['desi_edr', 'ls_dr9']
__keywords__ = ['query', 'DESI']

# How to query DESI EDR Data

## Table of Contents

* [Abstract](#abstract)

## Abstract

This tutorial will cover the basics of using the spectroscopic production database, which is loaded from the outputs of the DESI pipeline.  Currently, this is based on software "release" `23.1`, and uses a [PostgreSQL](https://www.postgresql.org/) database. We use [SQLAlchemy](http://www.sqlalchemy.org/) to abstract away the details of the database.

## Requirements

This tutorial uses data from the `fuji` production (`/global/cfs/cdirs/desi/public/edr/spectro/redux/fuji`), and the **DESI 23.1** kernel.

## Getting Help

If you find a problem using this notebook, please [open a new issue on GitHub describing your situation](https://github.com/desihub/tutorials/issues/new/).

## Initial Setup

This just imports everything we need and sets up paths and environment variables so we can find things.

In [5]:
#
# Imports
#
import os
from types import MethodType
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.font_manager import fontManager, FontProperties
# from sqlalchemy import __version__ as sqlalchemy_version
# from sqlalchemy import inspect
# from sqlalchemy.sql import func
import astropy.units as u
from dl import queryClient as qc, storeClient as sc, authClient as ac
#
# DESI software
#
# from desiutil.log import get_logger, DEBUG
from desitarget.targetmask import (desi_mask, mws_mask, bgs_mask)
# from desisim.spec_qa import redshifts as dsq_z
from desisurvey import __version__ as desisurvey_version
# from desisurvey.ephem import get_ephem, get_object_interpolator
from desisurvey.utils import get_observer
# from specprodDB import __version__ as specprodDB_version
# import specprodDB.load as db
#
# Set the spectroscopic production run.
#
specprod = os.environ['SPECPROD'] = 'fuji'  # Change this to 'guadalupe' if needed.
#
# Initialize ephemerides, to find Moon, etc.
#
os.environ['DESISURVEY_OUTPUT'] = os.environ['SCRATCH']
# ephem = get_ephem()
#
# get_ephem() will run freeze_iers(), so we import these after that.
#
from astropy.time import Time
from astropy.coordinates import ICRS
#
# Working directory.
#
workingdir = os.getcwd()
# print(f'sqlalchemy=={sqlalchemy_version}')
# print(f'specprodDB=={specprodDB_version}')
print(f'desisurvey=={desisurvey_version}')
qc.set_profile('db01')

desisurvey==0.18.0


This function will compute various Moon parameters needed below.

In [None]:
def moon(self, mjd, ra, dec):
    """Compute relative location of the Moon.
    
    Parameters
    ----------
    mjd : float
        Time of observation
    ra : float
        Right Ascension
    dec : float
        Declination
    
    Returns
    -------
    tuple
        Moon separation, Moon altitude, Moon illumination fraction
    """
    observation_time = Time(mjd, format='mjd')
    position = ICRS(ra=ra*u.deg, dec=dec*u.deg)
    zenith = get_observer(observation_time, alt=90 * u.deg, az=0 * u.deg).transform_to(ICRS)
    alt = 90 * u.deg - position.separation(zenith)
    moon_dec, moon_ra = get_object_interpolator(self.get_night(observation_time), 'moon', altaz=False)(observation_time.mjd)
    moon_position = ICRS(ra=moon_ra*u.deg, dec=moon_dec*u.deg)
    moon_sep = position.separation(moon_position).to(u.deg).value
    moon_alt = (90 * u.deg - moon_position.separation(zenith)).to(u.deg).value
    moon_frac = self.get_moon_illuminated_fraction(observation_time.mjd).tolist()
    return (moon_sep, moon_alt, moon_frac)

ephem.moon = MethodType(moon, ephem)

## Contents of the Database

### Schema

All tables are grouped into a database *schema*, and that schema is named for the production run (*e.g.* `fuji`).  When writing "raw" SQL, table names need to be schema-qualified, for example, `fuji.target`.  However, the SQLAlchemy abstraction layer is designed to take care of this for you.

### Important notes

* This database does not contain any sky spectra. Both deliberately-targeted (`targetid & 2**59 != 0`) and negative targetid (`targetid < 0`) sky spectra are excluded.
* Only quantities derived from *cumulative* tile-based spectra are included at the present time.
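
The two exclusion rules for sky spectra can be written as a small predicate. This is only a sketch of the conditions quoted in the note above; `is_sky_targetid` is a hypothetical helper name, not part of any DESI package, and the last two example values are made up.

```python
SKY_BIT = 2**59  # bit flag quoted in the note above


def is_sky_targetid(targetid: int) -> bool:
    """Return True if targetid matches either exclusion rule described above."""
    return targetid < 0 or (targetid & SKY_BIT) != 0


# The first value is a real targetid from this tutorial; the other two are made up.
examples = [39628267690397802, -12345, 2**59 + 42]
flags = [is_sky_targetid(t) for t in examples]
print(flags)
```

The explicit parentheses around the bitwise AND are not required in Python or PostgreSQL, but they make the intent unambiguous.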

### The tables

* `photometry`. This contains the pure photometric data. Usually this is derived from the LS DR9 Tractor data, but not every *targeted* object has Tractor photometry.
  - Loaded from `tractorphot` files: `/global/cfs/cdirs/desi/public/edr/vac/lsdr9-photometry/v2.0/potential-targets/tractorphot/tractorphot-potential-*-fuji.fits`.
  - SQLAlchemy object: `db.Photometry`.
  - Primary key: `targetid`.
* `target`. This contains the targeting bits and other data generated by `desitarget`.
  - Loaded from `targetphot` file: `/global/cfs/cdirs/desi/public/edr/vac/lsdr9-photometry/v2.0/potential-targets/targetphot-potential-fuji.fits`.
  - SQLAlchemy object: `db.Target`.
  - Unique identifier: (`targetid`, `survey`, `tileid`).
  - Primary key: `id`, a unique, arbitrary integer composed from (`targetid`, `survey`, `tileid`).
* `tile`. This contains information about observations grouped by tile.
  - Loaded from top-level `tiles-fuji.fits`.
  - SQLAlchemy object: `db.Tile`.
  - Primary key: `tileid`.
* `exposure`. This contains information about individual exposures.
  - Loaded from top-level `exposures-fuji.fits`, `EXPOSURES` HDU.
  - SQLAlchemy object: `db.Exposure`.
  - Primary key: `expid`.
* `frame`. This contains information about individual exposures, but broken down by camera.  There will usually, but not always, be 30 frames per exposure.
  - Loaded from top-level `exposures-fuji.fits`, `FRAMES` HDU.
  - SQLAlchemy object: `db.Frame`.
  - Unique identifier: (`expid`, `camera`).
  - Primary key: `frameid`, composed from `expid` and a mapping of `camera` to an arbitrary integer.
* `fiberassign`. This contains information about fiber positions.
  - Loaded from fiberassign files in the tiles product.  All fiberassign files corresponding to tiles in the `tile` table are loaded.
  - SQLAlchemy object: `db.Fiberassign`.
  - Unique identifier: (`tileid`, `targetid`, `location`).
  - Primary key: `id`, a unique, arbitrary integer composed from (`tileid`, `targetid`, `location`).
* `potential`. This contains a list of `targetid`s that *could* have been targeted on a given tile.
  - Loaded from the `POTENTIAL_ASSIGNMENTS` HDU in the same fiberassign files mentioned above.
  - SQLAlchemy object: `db.Potential`.
  - Unique identifier: (`tileid`, `targetid`, `location`).
  - Primary key: `id`, a unique, arbitrary integer composed from (`tileid`, `targetid`, `location`).
* `zpix`. This contains the pipeline redshifts grouped by HEALPixel.
  - Loaded from the `zall-pix-fuji.fits` file in the `zcatalog/` directory.
  - SQLAlchemy object: `db.Zpix`.
  - Unique identifier: (`targetid`, `survey`, `program`).
  - Primary key: `id`, a unique, arbitrary integer composed from (`targetid`, `survey`, `program`).
* `ztile`. This contains the pipeline redshifts grouped by tile in a variety of ways.
  - Loaded from the `zall-tilecumulative-fuji.fits` file in the `zcatalog/` directory.
  - SQLAlchemy object: `db.Ztile`.
  - Unique identifier: (`targetid`, `spgrp`, `spgrpval`, `tileid`).
  - Primary key: `id`, a unique, arbitrary integer composed from (`targetid`, `spgrp`, `spgrpval`, `tileid`).
* `version`. This contains metadata related to loading the database schema associated with the production run. It has no relationships to other tables.
  - SQLAlchemy object: `db.Version`.
  - Primary key: `id`, a sequential integer.
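
The "30 frames per exposure" figure quoted for the `frame` table follows from DESI's ten spectrographs, each with three cameras (`b`, `r`, `z`). The sketch below simply enumerates them; the exact string format stored in the `camera` column (`b0` ... `z9`) is an assumption here, so check the table before relying on it.

```python
bands = ['b', 'r', 'z']      # blue, red, and near-infrared cameras
spectrographs = range(10)    # spectrographs 0 through 9

# Assumed naming convention: band letter followed by spectrograph number.
cameras = [f'{band}{sp}' for sp in spectrographs for band in bands]
print(len(cameras), cameras[:3], cameras[-3:])
```

An exposure with fewer than 30 frames would be missing one or more of these cameras.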

### Foreign key relationships

* `fuji.target.targetid` -> `fuji.photometry.targetid`.
* `fuji.target.tileid` -> `fuji.tile.tileid`.
* `fuji.exposure.tileid` -> `fuji.tile.tileid`.
* `fuji.frame.expid` -> `fuji.exposure.expid`.
* `fuji.fiberassign.targetid` -> `fuji.photometry.targetid`.
* `fuji.fiberassign.tileid` -> `fuji.tile.tileid`.
* `fuji.potential.targetid` -> `fuji.photometry.targetid`.
* `fuji.potential.tileid` -> `fuji.tile.tileid`.
* `fuji.zpix.targetid` -> `fuji.photometry.targetid`.
* `fuji.ztile.targetphotid` -> `fuji.target.id`.
* `fuji.ztile.targetid` -> `fuji.photometry.targetid`.
* `fuji.ztile.tileid` -> `fuji.tile.tileid`.
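
One way to keep these relationships at hand when writing raw SQL is to store them in a dictionary and generate the `JOIN` clause from it. This is a minimal sketch: the dictionary copies the list above, while `FOREIGN_KEYS` and `join_clause` are hypothetical names introduced for illustration.

```python
# (child table, child column) -> (parent table, parent column), copied from the list above.
FOREIGN_KEYS = {
    ('target', 'targetid'): ('photometry', 'targetid'),
    ('target', 'tileid'): ('tile', 'tileid'),
    ('exposure', 'tileid'): ('tile', 'tileid'),
    ('frame', 'expid'): ('exposure', 'expid'),
    ('fiberassign', 'targetid'): ('photometry', 'targetid'),
    ('fiberassign', 'tileid'): ('tile', 'tileid'),
    ('potential', 'targetid'): ('photometry', 'targetid'),
    ('potential', 'tileid'): ('tile', 'tileid'),
    ('zpix', 'targetid'): ('photometry', 'targetid'),
    ('ztile', 'targetphotid'): ('target', 'id'),
    ('ztile', 'targetid'): ('photometry', 'targetid'),
    ('ztile', 'tileid'): ('tile', 'tileid'),
}


def join_clause(child, parent, schema='fuji'):
    """Build a schema-qualified JOIN clause from the foreign keys linking child to parent."""
    conditions = [f'{schema}.{child}.{ccol} = {schema}.{parent}.{pcol}'
                  for (ctab, ccol), (ptab, pcol) in FOREIGN_KEYS.items()
                  if ctab == child and ptab == parent]
    if not conditions:
        raise ValueError(f'no foreign key from {child} to {parent}')
    return f'JOIN {schema}.{parent} ON ' + ' AND '.join(conditions)


print(join_clause('frame', 'exposure'))
```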

## Initial Database Connection

### Set up .pgpass

This connection uses a `~/.pgpass` file to store connection credentials. If you already have this set up, you can skip this subsection.

If you already have a `~/.pgpass` file, append the DESI credentials to it:

In [None]:
!cat /global/common/software/desi/desi_public.pgpass >> ~/.pgpass

If you need to create a `~/.pgpass` file:

In [None]:
!cp /global/common/software/desi/desi_public.pgpass ~/.pgpass; chmod 600 ~/.pgpass

In [6]:
ac.whoAmI()

'baweaver'

### Establish connection

In [None]:
#
# For much more output, use DEBUG/verbose mode.
#
# db.log = get_logger(DEBUG)
# postgresql = db.setup_db(schema=specprod, hostname='specprod-db.desi.lbl.gov', username='desi_public', verbose=True)
db.log = get_logger()
postgresql = db.setup_db(schema=specprod, hostname='specprod-db.desi.lbl.gov', username='desi_public')

## Learning About the Tables

The tables in the database are listed above.  To inspect an individual table, you can use the `__table__` attribute.

In [None]:
#
# Print the table columns and their types.
#
[(c.name, c.type) for c in db.Zpix.__table__.columns]

We can also `inspect()` the database.  For details see [here](http://docs.sqlalchemy.org/en/latest/core/inspection.html?highlight=inspect#module-sqlalchemy.inspection).

In [None]:
inspector = inspect(db.engine)
for table_name in inspector.get_table_names(schema=specprod):
    print(table_name)
    for column in inspector.get_columns(table_name, schema=specprod):
        print("Column: {name} {type}".format(**column))

### Exercises

* What is the type of the `night` column of the `exposure` table?
* What is the primary key of the `ztile` table?

## Simple Queries

Queries are set up with the `.query()` method on Session objects.  In this case, there's a prepared Session object called `db.dbSession`.  `.filter()` corresponds to a `WHERE` clause in SQL.

In most of the examples below, we include the equivalent raw SQL command that corresponds to the query.

### Versions

Each schema contains a `version` table that describes the software and other tagged data sets used to load that schema. This is mainly intended for internal metadata tracking purposes, but it is still visible.

```SQL
SELECT * FROM desi_edr.version;
```

In [14]:
# q = db.dbSession.query(db.Version).all()
response = qc.query(sql='SELECT * FROM desi_edr.version;', fmt='csv', timeout=600)
print(response)

queryClientError: Error: relation "desi_edr.version" does not exist
LINE 1: copy (SELECT * FROM desi_edr.version) to stdout with csv hea...
                            ^


### Exposures, Nights, Tiles

Here are some simple queries that demonstrate the connections between nights, exposures and tiles.

#### How many tiles are there?

```SQL
SELECT COUNT(tileid) FROM desi_edr.tile;
```

In [15]:
response = qc.query(sql='SELECT COUNT(tileid) FROM desi_edr.tile;', fmt='csv', timeout=600)
print(response)

count
732



#### On which nights was a particular tile observed?

```SQL
SELECT night, expid FROM desi_edr.exposure WHERE tileid = 100;
```

In [16]:
response = qc.query(sql='SELECT night, expid FROM desi_edr.exposure WHERE tileid = 100;', fmt='csv', timeout=600)
print(response)

night,expid
20210504,87236
20210505,87361
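
With `fmt='csv'`, the response printed above is a single CSV string. A minimal sketch for turning such a string into Python rows with the standard library follows; `parse_csv_response` is a hypothetical helper, and `sample` is the output shown above pasted in by hand rather than fetched from the server.

```python
import csv
import io


def parse_csv_response(response):
    """Convert a CSV-formatted query response into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(response)))


# Output of the query above, copied as a literal string.
sample = "night,expid\n20210504,87236\n20210505,87361\n"
rows = parse_csv_response(sample)

# csv returns strings, so convert numeric columns explicitly.
expids = [int(row['expid']) for row in rows]
print(rows[0]['night'], expids)
```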



#### Which tiles were observed on a night?

```SQL
SELECT tileid, survey, program FROM desi_edr.exposure WHERE night = 20210115;
```

In [12]:
response = qc.query(sql='SELECT tileid, survey, program FROM desi_edr.exposure WHERE night = 20210115;', fmt='csv', timeout=600)
print(response)

tileid,survey,program
80715,sv1,other
80715,sv1,other
80715,sv1,other
80674,sv1,dark
80674,sv1,dark
80674,sv1,dark
80678,sv1,dark
80678,sv1,dark
80680,sv1,dark
80680,sv1,dark
80683,sv1,dark
80683,sv1,dark
80685,sv1,dark
80685,sv1,dark
80688,sv1,dark
80688,sv1,dark
80690,sv1,dark
80690,sv1,dark
80653,sv1,bright
80653,sv1,bright
80699,sv1,dark
80699,sv1,dark
80655,sv1,bright
80700,sv1,dark
80700,sv1,dark
80660,sv1,bright
80660,sv1,bright
80662,sv1,bright
80662,sv1,bright
80663,sv1,bright
80663,sv1,bright
80707,sv1,dark
80707,sv1,dark
80665,sv1,bright
80665,sv1,bright
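
Each tile appears once per exposure in the output above. If only the distinct tiles are wanted, `SELECT DISTINCT` in the SQL is one option; another is to count the repeats on the client side. The sketch below uses only the first eight rows of the output above, typed in by hand.

```python
from collections import Counter

# First eight rows of the output above: (tileid, survey, program).
rows = [
    (80715, 'sv1', 'other'), (80715, 'sv1', 'other'), (80715, 'sv1', 'other'),
    (80674, 'sv1', 'dark'), (80674, 'sv1', 'dark'), (80674, 'sv1', 'dark'),
    (80678, 'sv1', 'dark'), (80678, 'sv1', 'dark'),
]

# Counter keeps first-seen order, so the keys are the distinct tiles in query order.
exposures_per_tile = Counter(rows)
unique_tiles = list(exposures_per_tile)

for (tileid, survey, program), n_exp in exposures_per_tile.items():
    print(tileid, survey, program, n_exp)
```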



### Select ELG Targets

Note the special way we obtain the bitwise AND operator (`desi_mask.ELG == 2**1`).

```SQL
SELECT * from desi_edr.target WHERE (desi_target & 2) != 0;
```
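
The bitwise test can be checked by hand on a few `desi_target` values. The three values below are taken from the query output in this section; `set_bits` is a hypothetical helper that lists which bit positions are set, and only the ELG bit (`2**1`, as stated above) is named.

```python
ELG = 2**1  # desi_mask.ELG, as stated above


def set_bits(value):
    """Return the positions of all bits set in a non-negative integer."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


desi_target_values = [655458, 655394, 720931]
for value in desi_target_values:
    # Same condition as the WHERE clause: (desi_target & 2) != 0
    print(value, set_bits(value), (value & ELG) != 0)
```

With `desitarget` installed, `desi_mask.names(value)` gives the bit names directly.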

In [17]:
response = qc.query(sql='SELECT * FROM desi_edr.target WHERE (desi_target & 2) != 0 LIMIT 10;', fmt='csv', timeout=600)
print(response)

id,subpriority,targetid,obsconditions,priority_init,numobs_init,hpxpixel,cmx_target,desi_target,bgs_target,mws_target,sv1_desi_target,sv1_bgs_target,sv1_mws_target,sv2_desi_target,sv2_bgs_target,sv2_mws_target,sv3_desi_target,sv3_bgs_target,sv3_mws_target,scnd_target,sv1_scnd_target,sv2_scnd_target,sv3_scnd_target,random_id,tileid,photsys,program,survey
158457821059512681299414356074,0.114412116058406,39628267690397802,1,3200,2,27991,0,655458,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,98.2995,81100,S,dark,special
158457821059512681299414355923,0.528282631591801,39628267690397651,1,3100,2,27991,0,655394,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56.4003,81100,S,dark,special
158457821059512681305047305049,0.830456333421932,39628273323346777,1,3200,2,27998,0,720931,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,64.134,81100,S,dark,special
158457821059512681305047305350,0.105351160185103,39628273323347078,1,3100,2,27998,0,655394,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,49.5637,81100,S,dark,special
158457821059512681299422744039,0.834180670680

In [None]:
[(row.targetid, row.desi_target, row.survey, row.program, row.tileid) for row in q[:10]]

#### Exercise

* How many objects in the `zpix` table have `spectype` 'GALAXY'?

### Redshift and Classification

Simple query filtering on string values. Note that the slice notation `[:20]` is equivalent to `LIMIT 20` in SQL.

```SQL
SELECT spectype, subtype, z FROM desi_edr.zpix WHERE spectype = 'STAR' AND subtype != '' LIMIT 20;
```

In [18]:
response = qc.query(sql="SELECT spectype, subtype, z FROM desi_edr.zpix WHERE spectype = 'STAR' AND subtype != '' LIMIT 20;", fmt='csv', timeout=600)
print(response)

spectype,subtype,z
STAR,K,2.50139879461232e-05
STAR,K,-8.70415119232334e-05
STAR,K,-0.000401614832495882
STAR,K,-9.3484725432616e-05
STAR,K,-0.000186847570876283
STAR,G,-0.000131834190257121
STAR,K,-7.36032417051396e-05
STAR,G,-0.000237380789420343
STAR,G,-0.000163911498938115
STAR,M,-6.71986958098689e-05
STAR,K,-0.000310954262469902
STAR,F,0.000166544303106688
STAR,WD,0.000224241838834416
STAR,G,-4.62911822587923e-05
STAR,K,-0.000221235120053661
STAR,F,-0.000521280126039622
STAR,K,-7.24315494706512e-05
STAR,K,0.000248578873712394
STAR,G,-0.000372678693179787
STAR,M,-8.89373659473731e-05
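
For stars, the tiny redshifts above are easier to read as radial velocities. A rough sketch using the low-redshift approximation $v \approx cz$ follows; it ignores any reference-frame corrections, and the three redshifts are copied from the output above.

```python
C_KM_S = 299792.458  # speed of light in km/s


def radial_velocity(z):
    """Approximate radial velocity in km/s, valid only for |z| << 1."""
    return C_KM_S * z


# Three redshifts copied from the output above (K star, K star, white dwarf).
redshifts = [2.50139879461232e-05, -8.70415119232334e-05, 0.000224241838834416]
velocities = [radial_velocity(z) for z in redshifts]
for z, v in zip(redshifts, velocities):
    print(f'z = {z:+.3e}  ->  v = {v:+.1f} km/s')
```

Negative values correspond to stars moving toward the observer.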



## Joining Tables

### A Simple Join

Let's look at the nights and exposures on which a particular `TARGETID` was observed.

```SQL
SELECT f.tileid, e.expid, e.night FROM desi_edr.fiberassign AS f JOIN desi_edr.exposure AS e ON f.tileid = e.tileid WHERE f.targetid = 933811403620352;
```

In [19]:
q = """SELECT f.tileid, e.expid, e.night
FROM desi_edr.fiberassign AS f
JOIN desi_edr.exposure AS e ON f.tileid = e.tileid
WHERE f.targetid = 933811403620352;"""
response = qc.query(sql=q, fmt='csv', timeout=600)
print(response)

tileid,expid,night
278,85086,20210416
279,85206,20210417
279,85207,20210417
279,85208,20210417
279,85209,20210417
280,86980,20210502
280,86981,20210502
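
To experiment with this join without a database connection, the same SQL can be run against an in-memory SQLite database. This is only a toy reproduction: the tables hold just the columns the query needs, the rows are taken from the output above plus one made-up `targetid`, and table names are left unqualified because SQLite has no `desi_edr` schema here.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE fiberassign (targetid INTEGER, tileid INTEGER);
CREATE TABLE exposure (expid INTEGER, tileid INTEGER, night INTEGER);
""")
conn.executemany('INSERT INTO fiberassign VALUES (?, ?)', [
    (933811403620352, 278), (933811403620352, 279), (933811403620352, 280),
    (1, 279),  # made-up target that must not appear in the result
])
conn.executemany('INSERT INTO exposure VALUES (?, ?, ?)', [
    (85086, 278, 20210416),
    (85206, 279, 20210417), (85207, 279, 20210417),
    (85208, 279, 20210417), (85209, 279, 20210417),
    (86980, 280, 20210502), (86981, 280, 20210502),
])

rows = conn.execute("""
SELECT f.tileid, e.expid, e.night
FROM fiberassign AS f
JOIN exposure AS e ON f.tileid = e.tileid
WHERE f.targetid = 933811403620352
ORDER BY e.expid;
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

Each fiber assignment is matched with every exposure of its tile, which is why tile 279 contributes four rows.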



### Another Simple Join

In this case, we'll look at photometric flux and measured redshift. We'll `ORDER`(`BY`) the results and `LIMIT` the query with slice notation.

```SQL
SELECT p.*, z.* FROM desi_edr.photometry AS p JOIN desi_edr.zpix AS z ON p.targetid = z.targetid ORDER BY z.z, p.flux_g LIMIT 50;
```

In [20]:
q = """SELECT p.*, z.*
FROM desi_edr.photometry AS p
JOIN desi_edr.zpix AS z ON p.targetid = z.targetid
ORDER BY z.z, p.flux_g LIMIT 50;"""
response = qc.query(sql=q, fmt='csv', timeout=600)
print(response)

ra,dec,elon,elat,glon,glat,ls_id,ref_id,targetid,ra_ivar,dec_ivar,dchisq_psf,dchisq_rex,dchisq_dev,dchisq_exp,dchisq_ser,ebv,flux_g,flux_r,flux_z,flux_ivar_g,flux_ivar_r,flux_ivar_z,mw_transmission_g,mw_transmission_r,mw_transmission_z,fracflux_g,fracflux_r,fracflux_z,fracmasked_g,fracmasked_r,fracmasked_z,fracin_g,fracin_r,fracin_z,psfdepth_g,psfdepth_r,psfdepth_z,galdepth_g,galdepth_r,galdepth_z,flux_w1,flux_w2,flux_w3,flux_w4,flux_ivar_w1,flux_ivar_w2,flux_ivar_w3,flux_ivar_w4,mw_transmission_w1,mw_transmission_w2,mw_transmission_w3,mw_transmission_w4,fiberflux_g,fiberflux_r,fiberflux_z,fibertotflux_g,fibertotflux_r,fibertotflux_z,ref_epoch,shape_r,shape_e1,shape_e2,shape_r_ivar,shape_e1_ivar,shape_e2_ivar,sersic,sersic_ivar,gaia_phot_g_mean_mag,gaia_phot_g_mean_flux_over_error,gaia_phot_bp_mean_mag,gaia_phot_bp_mean_flux_over_error,gaia_phot_rp_mean_mag,gaia_phot_rp_mean_flux_over_error,gaia_phot_bp_rp_excess_factor,gaia_astrometric_sigma5d_max,parallax,parallax_ivar,pmra,pmra_ivar

NameError: name 'db' is not defined

In [None]:
[(row.Photometry.flux_g, row.Photometry.flux_r, row.Photometry.flux_z, row.Zpix.z) for row in q]

In [None]:
flux_g = np.array([row.Photometry.flux_g for row in q])
flux_r = np.array([row.Photometry.flux_r for row in q])
flux_z = np.array([row.Photometry.flux_z for row in q])
g_minus_r = np.log10(flux_r/flux_g)
r_minus_z = np.log10(flux_z/flux_r)
redshift = np.array([row.Zpix.z for row in q])
fig, axes = plt.subplots(1, 1, figsize=(5, 5), dpi=100)
p = axes.plot(g_minus_r, r_minus_z, 'k.')
foo = axes.set_xlim([-0.2, 1.0])
foo = axes.set_ylim([-0.2, 1.0])
foo = axes.set_aspect('equal')
foo = axes.set_xlabel('$g - r$')
foo = axes.set_ylabel('$r - z$')
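
The plotting cell above uses the bare `log10` flux ratio, which equals the magnitude color divided by 2.5. If true magnitude colors are wanted, a minimal sketch is given below, assuming the fluxes are in nanomaggies (as in the Legacy Surveys catalogs) and ignoring Milky Way extinction. The helper names are hypothetical.

```python
import math


def magnitude(flux_nmgy):
    """AB magnitude from a flux in nanomaggies (flux must be positive)."""
    return 22.5 - 2.5 * math.log10(flux_nmgy)


def color(flux_blue, flux_red):
    """Magnitude difference m_blue - m_red, equal to 2.5*log10(flux_red/flux_blue)."""
    return magnitude(flux_blue) - magnitude(flux_red)


# Made-up fluxes: the redder band is ten times brighter.
print(color(1.0, 10.0))
```

Rows with zero or negative flux need to be filtered out before taking the logarithm.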

#### Exercise

* Create a color-color plot for objects targeted as QSOs, and spectroscopically confirmed as such.

### A More Complicated Join

Let's look at objects that appear on more than one tile. For each of those tiles, how many exposures were there?

In this example, we're using `sqlalchemy.sql.func` to get the equivalent of `COUNT(*)` and a subquery that itself is a multi-table join.

```SQL
SELECT t.nexp, f.tileid, q1.targetid, q1.n_assign FROM desi_edr.fiberassign AS f
    JOIN (SELECT ff.targetid, COUNT(*) AS n_assign FROM desi_edr.fiberassign AS ff GROUP BY ff.targetid) AS q1 ON f.targetid = q1.targetid
    JOIN desi_edr.tile AS t ON f.tileid = t.tileid LIMIT 100;
```

In [21]:
q = """SELECT t.nexp, f.tileid, q1.targetid, q1.n_assign
FROM desi_edr.fiberassign AS f
JOIN (SELECT ff.targetid, COUNT(*) AS n_assign FROM desi_edr.fiberassign AS ff GROUP BY ff.targetid) AS q1 ON f.targetid = q1.targetid
JOIN desi_edr.tile AS t ON f.tileid = t.tileid LIMIT 100;
"""
response = qc.query(sql=q, fmt='csv', timeout=600)
print(response)

nexp,tileid,targetid,n_assign
7,80856,6432023904256,1
7,80856,6448025174016,1
2,80875,6515536691200,2
3,80876,6515536691200,2
4,80889,6521555517440,2
2,80890,6521555517440,2
8,80885,6536638234624,2
5,80886,6536638234624,2
8,80689,6546033475584,1
3,80715,28661214347265,1
3,80715,28665861636097,1
3,80715,28665861636098,1
3,80715,28665861636099,1
3,80715,28665865830400,1
3,80715,28665865830401,1
3,80715,28665874219008,1
3,80715,28665874219009,1
3,80715,28665878413313,1
3,80715,28670500536321,1
3,80715,28670504730624,1
3,80715,28670504730626,1
3,80715,28670508924930,1
3,80715,28670508924932,1
3,80715,28670508924934,1
3,80715,28670508924935,1
3,80715,28670508924936,1
3,80715,28670508924938,1
3,80715,28670508924940,1
3,80715,28670508924941,1
3,80715,28670513119233,1
3,80715,28670513119234,1
3,80715,28670521507841,1
3,80715,28670525702148,1
3,80715,28670529896451,1
3,80715,28675126853639,1
3,80715,28675131047936,1
3,80715,28675131047938,1
3,80715,28675131047940,1
3,80715,28675131047941,1
3,80
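
The output above still includes targets with `n_assign` equal to 1. One possible way to keep only objects assigned more than once is to add a `WHERE q1.n_assign > 1` condition. The sketch below tries this on an in-memory SQLite database with a handful of rows copied from the output above; table names are unqualified because this toy database has no `desi_edr` schema.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE fiberassign (targetid INTEGER, tileid INTEGER);
CREATE TABLE tile (tileid INTEGER, nexp INTEGER);
""")
conn.executemany('INSERT INTO fiberassign VALUES (?, ?)', [
    (6432023904256, 80856),
    (6515536691200, 80875), (6515536691200, 80876),
])
conn.executemany('INSERT INTO tile VALUES (?, ?)', [
    (80856, 7), (80875, 2), (80876, 3),
])

rows = conn.execute("""
SELECT t.nexp, f.tileid, q1.targetid, q1.n_assign
FROM fiberassign AS f
JOIN (SELECT ff.targetid, COUNT(*) AS n_assign
      FROM fiberassign AS ff GROUP BY ff.targetid) AS q1
  ON f.targetid = q1.targetid
JOIN tile AS t ON f.tileid = t.tileid
WHERE q1.n_assign > 1
ORDER BY f.tileid;
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

Note that `n_assign` counts rows in `fiberassign`, so it equals the number of tiles only when a target has one row per tile.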

Now let's see what the redshift table thinks are the number of exposures for these objects.

```SQL
SELECT z.* FROM fuji.zpix AS z WHERE z.targetid IN (6432023904256, 6448025174016[, ...]);
```

In [None]:
q3 = db.dbSession.query(db.Zpix).filter(db.Zpix.targetid.in_([row[2] for row in q2])).all()

In [None]:
[(row.coadd_numexp, row.zcat_nspec) for row in q3]

#### Exercise

* What is the distribution of number of exposures?

## Fly me to the Moon

How does the Moon affect redshifts?  First, let's find exposures that had the Moon above the horizon.

In [None]:
moon_up = [e.expid for e in db.dbSession.query(db.Exposure).all() if ephem.moon(e.mjd, e.tilera, e.tiledec)[1] > 0]
len(moon_up)

In [None]:
moon_up

So there are a few.  But there is a subtle issue: redshifts are based on *all* exposures, and the exposures are deliberately designed to enforce the bright/dark dichotomy in targeting. There are exceptions, though: certain LRGs are also targeted in the BGS and MWS, and those are not hard to capture.

In [None]:
various_lrgs = (desi_mask.LRG | desi_mask.BGS_ANY | desi_mask.MWS_ANY)
various_lrgs
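
The combined mask is just the bitwise OR of three single-bit flags. The sketch below reproduces the integer that appears in the SQL in this section without importing `desitarget`; the bit positions are inferred from that integer and should be treated as assumptions, since the authoritative values come from `desi_mask`.

```python
# Assumed bit positions, inferred from the constant 3458764513820540929.
LRG = 2**0
BGS_ANY = 2**60
MWS_ANY = 2**61

various_lrgs = LRG | BGS_ANY | MWS_ANY
print(various_lrgs)

# A target passes the filter if it shares at least one bit with the mask.
desi_target = 720931  # value taken from the ELG query output earlier
print((desi_target & various_lrgs) != 0)
```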

```SQL
SELECT z.targetid, z.z, z.zerr, z.zwarn
    FROM fuji.ztile AS z
    JOIN fuji.target AS t ON z.targetphotid = t.id
    JOIN fuji.fiberassign AS f ON t.targetid = f.targetid
    JOIN fuji.exposure AS e ON f.tileid = e.tileid
    WHERE z.spgrp = 'cumulative'
    AND t.desi_target & 3458764513820540929 != 0
    AND e.expid IN (90250, 87505, 87382[...]);
```

In [None]:
q_up = db.dbSession.query(db.Ztile.targetid, db.Ztile.z, db.Ztile.zerr, db.Ztile.zwarn)\
                   .join(db.Target, db.Target.id==db.Ztile.targetphotid)\
                   .join(db.Fiberassign, db.Target.targetid==db.Fiberassign.targetid)\
                   .join(db.Exposure, db.Fiberassign.tileid==db.Exposure.tileid)\
                   .filter(db.Ztile.spgrp=='cumulative')\
                   .filter(db.Target.desi_target.op('&')(various_lrgs) != 0)\
                   .filter(db.Exposure.expid.in_(moon_up)).all()

```SQL
SELECT z.targetid, z.z, z.zerr, z.zwarn
    FROM fuji.ztile AS z
    JOIN fuji.target AS t ON z.targetphotid = t.id
    JOIN fuji.fiberassign AS f ON t.targetid = f.targetid
    JOIN fuji.exposure AS e ON f.tileid = e.tileid
    WHERE z.spgrp = 'cumulative'
    AND t.desi_target & 3458764513820540929 != 0
    AND e.expid NOT IN (90250, 87505, 87382[...]);
```

In [None]:
q_dn = db.dbSession.query(db.Ztile.targetid, db.Ztile.z, db.Ztile.zerr, db.Ztile.zwarn)\
                   .join(db.Target, db.Target.id==db.Ztile.targetphotid)\
                   .join(db.Fiberassign, db.Target.targetid==db.Fiberassign.targetid)\
                   .join(db.Exposure, db.Fiberassign.tileid==db.Exposure.tileid)\
                   .filter(db.Ztile.spgrp=='cumulative')\
                   .filter(db.Target.desi_target.op('&')(various_lrgs) != 0)\
                   .filter(~db.Exposure.expid.in_(moon_up)).all()

Unfortunately, the database currently only contains cumulative tile redshifts, not per-exposure redshifts, so it's not really meaningful to say whether the Moon was up or not. We'll just call this a work in progress.

In [None]:
q_up

In [None]:
q_dn

## Survey Progress

Let's see which nights have data, and count the number of exposures per night.

```SQL
SELECT e.night, COUNT(e.expid) AS n_exp FROM fuji.exposure AS e GROUP BY e.night ORDER BY e.night;
```

In [None]:
q = db.dbSession.query(db.Exposure.night, func.count(db.Exposure.expid).label('n_exp')).group_by(db.Exposure.night).order_by(db.Exposure.night).all()
q

Next, let's get the observation timestamps for a given night.  Note how we have both MJD and a corresponding `datetime.datetime` object in the database.

```SQL
SELECT e.expid, e.mjd, e.date_obs FROM fuji.exposure AS e WHERE e.night = 20210428 ORDER BY e.expid;
```

In [None]:
q = db.dbSession.query(db.Exposure.expid, db.Exposure.mjd, db.Exposure.date_obs).filter(db.Exposure.night == 20210428).order_by(db.Exposure.expid).all()
q
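
The two time columns carry the same information. A minimal sketch of the MJD-to-`datetime` conversion with the standard library is shown below; it treats MJD as a plain day count from 1858-11-17 00:00 UTC and ignores leap-second subtleties. `mjd_to_datetime` is a hypothetical helper, and `astropy.time.Time` is the more rigorous tool.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to this UTC instant


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive UTC datetime."""
    return MJD_EPOCH + timedelta(days=mjd)


print(mjd_to_datetime(59332.0))
print(mjd_to_datetime(51544.5))
```

Keep in mind that a `night` value labels the local evening on which observing started, so the UTC date of an exposure can be one day later than its `night`.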

So, for a given target in the `target` table, when was the observation completed?  In other words, if a target has multiple observations, we want the date of the *last* observation.  First, how many targets are there?

```SQL
SELECT COUNT(*) FROM fuji.target;
```

In [None]:
N_targets = db.dbSession.query(db.Target).count()
N_targets

Now we look for targets that have observations and find the MJD of the observation.

```SQL
SELECT f.targetid, e.expid, e.mjd FROM fuji.fiberassign AS f
    JOIN (SELECT tt.targetid FROM fuji.target AS tt JOIN fuji.fiberassign AS ff ON tt.targetid = ff.targetid
              JOIN fuji.exposure AS ee ON ff.tileid = ee.tileid GROUP BY tt.targetid) AS q1 ON f.targetid = q1.targetid
    JOIN fuji.exposure AS e ON f.tileid = e.tileid ORDER BY q1.targetid, e.expid;
```

In [None]:
#
# Find all targetids that have observations.
#
q1 = db.dbSession.query(db.Target.targetid).filter(db.Target.targetid == db.Fiberassign.targetid).filter(db.Fiberassign.tileid == db.Exposure.tileid).group_by(db.Target.targetid).subquery()
#
# Find the exposure times for the targetids that have been observed
#
q2 = db.dbSession.query(db.Fiberassign.targetid, db.Exposure.expid, db.Exposure.mjd).filter(db.Fiberassign.targetid == q1.c.targetid).filter(db.Fiberassign.tileid == db.Exposure.tileid).order_by(q1.c.targetid, db.Exposure.expid).all()
targetid, expid, mjd = zip(*q2)
targetid = np.array(targetid)
expid = np.array(expid)
mjd = np.array(mjd)
#
# Use the counts to give the *last* observation.
#
unique_targetid, i, j, c = np.unique(targetid, return_index=True, return_inverse=True, return_counts=True)
unique_expid = expid[i + (c-1)]
unique_mjd = mjd[i + (c-1)]
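
The indexing trick above relies on the rows being sorted by `targetid` and then `expid`: `return_index` gives the first row of each target, so adding `count - 1` lands on its last row. A plain-Python sketch of the same "last observation per target" logic is given below with made-up rows; it is an illustrative equivalent, not the notebook's NumPy method.

```python
from itertools import groupby
from operator import itemgetter

# Made-up (targetid, expid, mjd) rows, deliberately out of order.
rows = [
    (20, 87001, 59330.2),
    (10, 85001, 59320.1),
    (10, 86001, 59325.3),
    (20, 85001, 59320.1),
    (10, 85002, 59320.2),
]

# Sort by targetid, then expid, so the last row in each group is the latest exposure.
rows_sorted = sorted(rows, key=itemgetter(0, 1))
last_observation = {targetid: list(group)[-1]
                    for targetid, group in groupby(rows_sorted, key=itemgetter(0))}

for targetid, (_, expid, mjd) in last_observation.items():
    print(targetid, expid, mjd)
```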

Now we have the targets and the date of last observation.  But it's sorted by `targetid`.

In [None]:
ii = unique_expid.argsort()
unique_expid_sorted, i3, j3, c3 = np.unique(unique_expid[ii], return_index=True, return_inverse=True, return_counts=True)
N_completed = np.cumsum(c3)

In [None]:
min_mjd = 10*(int(mjd.min())//10)
fig, axes = plt.subplots(1, 1, figsize=(8, 8), dpi=100)
p1 = axes.plot(unique_mjd[ii][i3] - min_mjd, N_completed/N_targets, 'k-')
foo = axes.set_xlabel(f'MJD - {min_mjd:d}')
foo = axes.set_ylabel('Fraction completed')
foo = axes.grid(True)
# foo = axes.legend(loc=1)

### Exercise

* Break down the progress by target class, target bit, etc.

## Using Relationships in SQLAlchemy

Here we demonstrate how table relationships can simplify certain queries. First we grab a single `db.Exposure` object.

```SQL
SELECT * FROM fuji.exposure where expid = 86507;
```

In [None]:
exposure = db.dbSession.query(db.Exposure).filter(db.Exposure.expid == 86507).one()
exposure

How do we get the `db.Frame` objects associated with this exposure?

In [None]:
exposure.frames

What tile is associated with this exposure?

In [None]:
exposure.tile

What fiberassignments were made on this tile?

In [None]:
exposure.tile.fiberassign[:20]

What redshifts were measured on this tile?

In [None]:
exposure.tile.ztile_redshifts[:20]

### Exercise

* Rewrite example queries above using relationships wherever possible.

## Using q3c in SQLAlchemy

[q3c](https://github.com/segasai/q3c) ([Koposov & Bartunov 2006](https://ui.adsabs.harvard.edu/abs/2006ASPC..351..735K/abstract)) is a popular library that provides spatial indexing and searching in astronomical databases. Here we'll demonstrate how to access this functionality in SQLAlchemy.  Any database function is accessible with `sqlalchemy.sql.func`.  This is a radial ("cone") search on an arbitrary point in the DESI footprint:

```SQL
SELECT p.*, z.*, q3c_dist(p.ra, p.dec, 180.0, 0.0) AS radial_distance
    FROM fuji.photometry AS p JOIN fuji.zpix AS z ON p.targetid = z.targetid
    WHERE q3c_radial_query(p.ra, p.dec, 180.0, 0.0, 1.0/60.0); -- 1 arcmin
```

In [None]:
q = db.dbSession.query(db.Photometry, db.Zpix, func.q3c_dist(db.Photometry.ra, db.Photometry.dec, 180.0, 0.0).label("radial_distance")).join(db.Zpix).filter(func.q3c_radial_query(db.Photometry.ra, db.Photometry.dec, 180.0, 0.0, 1.0/60.0)).all()  # 1 arcmin
q
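
For intuition about what the cone search selects, here is a brute-force sketch in plain Python. It computes the great-circle separation with the haversine formula and keeps points within the radius; it does not use q3c's spatial index and is far slower on real tables. The function names and test points are made up for illustration.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (ra, dec) points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def cone_search(points, ra0, dec0, radius):
    """Return the points lying within radius degrees of (ra0, dec0)."""
    return [(ra, dec) for ra, dec in points
            if angular_separation(ra, dec, ra0, dec0) <= radius]


# Made-up positions near the search center used above.
points = [(180.0, 0.0), (180.01, 0.0), (180.0, 0.02), (181.0, 0.0)]
inside = cone_search(points, 180.0, 0.0, 1.0 / 60.0)  # 1 arcmin
print(inside)
```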

### Exercise

* What spectra are near your favourite object?