# Introduction to DESI Spectra

The goal of this notebook is to demonstrate how to read in and manipulate DESI spectra using simulated spectra created as part of a DESI Data Challenge.

If you identify any errors or have requests for additional functionality please create a new issue on https://github.com/desihub/tutorials/issues or send a note to desi-data@desi.lbl.gov.

Last updated October 2019 using DESI software release 19.9.

## Getting started

### Using NERSC

The easiest way to get started is to use the Jupyter server at NERSC so that you don't need to
install any code or download any data locally.

If you need a NERSC account, see https://desi.lbl.gov/trac/wiki/Computing/AccessNersc.

Then do the one-time jupyter-dev configuration described at https://desi.lbl.gov/trac/wiki/Computing/JupyterAtNERSC

From a NERSC command line, check out a copy of the tutorial code, *e.g.* from cori.nersc.gov:
```console
mkdir -p $HOME/desi/
cd $HOME/desi/
git clone https://github.com/desihub/tutorials
```
Then go to https://jupyter-dev.nersc.gov, log in, navigate to where you checked out this package (*e.g.* `$HOME/desi/tutorials`), and double-click on `Intro_to_DESI_spectra.ipynb`.

This tutorial has been tested using the "DESI 19.9" kernel installed at NERSC.  To get an equivalent environment from a cori command line:
```console
source /global/common/software/desi/desi_environment.sh 19.9
```

## Import required modules

In [None]:
import os
import numpy as np
import healpy as hp
from glob import glob
import fitsio
from collections import defaultdict

from desitarget.targetmask import desi_mask
import desispec.io

import matplotlib.pyplot as plt
%pylab inline

If you are running locally and any of these fail, you should go back through the [installation instructions](https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/Laptop) and/or email desi-data@desi.lbl.gov if you get stuck.
If you are running from jupyter-dev and have problems, double check that your kernel is "DESI 19.9".

## Environment variables and data

Like BOSS, DESI uses environment variables to define the base directories for where to find data.  The below paths are for NERSC, but if you are running locally or want to access a different dataset, change these as needed to wherever your dataset is.

Spectro production runs are grouped under `$DESI_SPECTRO_REDUX`, with `$SPECPROD` indicating which run to use, such that the data are under `$DESI_SPECTRO_REDUX/$SPECPROD`.  For example, during operations, official productions will be in `$DESI_SPECTRO_REDUX=/project/projectdirs/desi/spectro/redux` and `$SPECPROD` would be the name of an individual data assembly, *e.g.* `$SPECPROD=DA1`.  In this case, we'll use reference run 19.9 data.

In [None]:
%set_env DESI_SPECTRO_REDUX=/project/projectdirs/desi/datachallenge/reference_runs/19.9/spectro/redux/
%set_env SPECPROD=mini

`desispec.io.specprod_root` can handle the environment variable path wrangling for you:

In [None]:
reduxdir = desispec.io.specprod_root()
print(reduxdir)

In [None]:
#- Do check that these are set correctly before proceeding
def check_env():
    for env in ('DESI_SPECTRO_REDUX', 'SPECPROD'):
        if env in os.environ:
            print('${}={}'.format(env, os.getenv(env)))
        else:
            print('Required environment variable {} not set!'.format(env))

    reduxdir = desispec.io.specprod_root()
    if not os.path.exists(reduxdir):
        print("ERROR: {} doesn't exist; check $DESI_SPECTRO_REDUX/$SPECPROD".format(reduxdir))
    else:
        print('OK: {} exists'.format(reduxdir))

check_env()

## Data Model for the spectra

### Directory structure

Spectra from individual exposures are in the `exposures` directory.  But since DESI will take multiple exposures of overlapping tiles, the data from any given target or patch of sky could be spread across multiple files in multiple directories.  To simplify this, the calibrated spectra are repackaged into a `spectra-64` directory, where all spectra for a given healpix on the sky are grouped together.  See an appendix to this tutorial for a quick overview of healpix.

The directory structure is: 

```
$DESI_SPECTRO_REDUX/$SPECPROD/spectra-{nside}/{group}/{pix}/*-{nside}-{pix}.fits
```

where
  * `nside` is the healpix resolution parameter; the default is
    nside=64, corresponding to pixels of 0.84 deg$^2$ with a few thousand targets each.
  * `group = pix//100` to avoid having thousands of directories at the same level
  * `pix` is the healpixel number using the *nested* scheme.

For example, for `nside=64` and `pix=16879`:

```
$DESI_SPECTRO_REDUX/$SPECPROD/spectra-64/168/16879/spectra-64-16879.fits
$DESI_SPECTRO_REDUX/$SPECPROD/spectra-64/168/16879/zbest-64-16879.fits
```

where the first file contains the spectra and the second file contains information on the best-fit redshifts from the [redrock](https://github.com/desihub/redrock) code.
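
To make the naming rule concrete, here is a minimal sketch that assembles such a path by hand. The helper `healpix_spectra_path` and the base directory `/redux/mini` are hypothetical and only illustrate the pattern above; in practice `desispec.io.findfile` should be preferred.

```python
import posixpath

def healpix_spectra_path(reduxdir, pix, nside=64, prefix="spectra"):
    """Hypothetical helper: build the path of a healpix-grouped file."""
    group = pix // 100  # groups pixels by hundreds to limit directory size
    filename = "{}-{}-{}.fits".format(prefix, nside, pix)
    return posixpath.join(reduxdir, "spectra-{}".format(nside),
                          str(group), str(pix), filename)

# "/redux/mini" stands in for $DESI_SPECTRO_REDUX/$SPECPROD (made-up value)
spectra_path = healpix_spectra_path("/redux/mini", 16879)
zbest_path = healpix_spectra_path("/redux/mini", 16879, prefix="zbest")
print(spectra_path)
print(zbest_path)
```

The spectra and redshift files for a pixel differ only in their prefix, which is why the two paths share a directory.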

Let's poke around in these directories.

In [None]:
basedir = os.path.join(os.getenv("DESI_SPECTRO_REDUX"),os.getenv("SPECPROD"),"spectra-64")
subdir = os.listdir(basedir)
print(basedir)
print(subdir)

In [None]:
basedir = os.path.join(basedir,subdir[0])
subdir = os.listdir(basedir)
pixnums = np.array([int(pixnum) for pixnum in subdir])
print(basedir)
print(subdir)

In [None]:
basedir = os.path.join(basedir,subdir[0])
subdir = os.listdir(basedir)
print(basedir)
print(subdir)

`desispec.io.findfile` provides utility functions for the path wrangling, *e.g.*:

In [None]:
desispec.io.findfile('spectra', groupname=5302)

### spectra file format

What about the Data Model for the spectra themselves?

In [None]:
specfiles = sorted(glob(reduxdir+'/spectra-64/*/*/spectra*.fits'))
specfilename = specfiles[2]
DM = fitsio.FITS(specfilename)
DM

HDU 0 is blank.  The others should be used by name, not by number since the order could vary.

`FIBERMAP` stores the mapping of the imaging information used to target and place a fiber on the source.

The other HDUs contain the wavelength arrays, flux, inverse variance (ivar), mask (0 is good), and spectral resolution data for each of the "B", "R", and "Z" cameras.

Let's start by looking at the fibermap.

In [None]:
fm = fitsio.read(specfilename,'FIBERMAP')
fm.dtype.descr

`TARGETID` is the unique mapping from target information to a fiber. So, if you wanted to look up full imaging information for a spectrum, you can map back to target files using `TARGETID`.

Just out of interest, are the RAs and Decs of these objects in the expected HEALPix pixel?

In [None]:
#- use a new name so that pixnums (from the directory listing) is kept for the appendix
targetpix = hp.ang2pix(64, fm["TARGET_RA"], fm["TARGET_DEC"], nest=True, lonlat=True)
print(np.min(targetpix),np.max(targetpix))
print(specfilename)

I wonder what (roughly) the entirety of this pixel looks like, as mapped out by sources with spectra:

In [None]:
plt.plot(fm["TARGET_RA"],fm["TARGET_DEC"],'b.')

You can see a different density in different parts of the tiles, due to different overlapping exposures.  Let's repeat, color coding by exposure number.

In [None]:
for expid in set(fm['EXPID']):
    ii = (fm['EXPID'] == expid)
    print('expid {} includes {} targets'.format(expid, np.count_nonzero(ii)))
    plot(fm['TARGET_RA'][ii], fm['TARGET_DEC'][ii], '.')

Note that in addition to having multiple tiles, we also have multiple exposures of the same tile resulting in multiple spectra of the same targets.
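
One quick way to see how many spectra each target has is to count repeated `TARGETID` values. The sketch below uses small made-up arrays in place of `fm['TARGETID']` and `fm['EXPID']`, so the numbers are purely illustrative.

```python
import numpy as np

# Made-up stand-ins for fm['TARGETID'] and fm['EXPID']
targetid = np.array([101, 102, 101, 103, 102, 101])
expid = np.array([7, 7, 8, 8, 9, 9])

# Count how many rows (spectra) each unique target has
unique_ids, n_spectra = np.unique(targetid, return_counts=True)
for tid, n in zip(unique_ids, n_spectra):
    print("TARGETID {} has {} spectra".format(tid, n))

# Targets that were observed more than once
repeated = unique_ids[n_spectra > 1]
print("repeated targets:", repeated)
```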

In [None]:
DM

The remaining extensions store the wavelength, flux, inverse variance on the flux, mask and resolution matrix for the B, R and Z arms of the spectrograph. Let's determine the wavelength coverage of each arm:

In [None]:
bwave = fitsio.read(specfilename, 'B_WAVELENGTH')
rwave = fitsio.read(specfilename, 'R_WAVELENGTH')
zwave = fitsio.read(specfilename, 'Z_WAVELENGTH')
print("B coverage: {:.1f} to {:.1f} Angstroms".format(np.min(bwave),np.max(bwave)))
print("R coverage: {:.1f} to {:.1f} Angstroms".format(np.min(rwave),np.max(rwave)))
print("Z coverage: {:.1f} to {:.1f} Angstroms".format(np.min(zwave),np.max(zwave)))

## Reading in and Displaying spectra

Now that we understand the Data Model, let's plot some spectra. To start, let's use the file we've already been manipulating and read in the flux to go with the wavelengths we already have.

In [None]:
bflux = fitsio.read(specfilename,'B_FLUX')
rflux = fitsio.read(specfilename,'R_FLUX')
zflux = fitsio.read(specfilename,'Z_FLUX')

Note that the wavelength arrays are 1-D (every spectrum in the spectral file is mapped to the same binning in wavelength) but the flux array (and flux_ivar, mask etc. arrays) are 2-D, because they contain multiple spectra:

In [None]:
print(bwave.shape)
print(bflux.shape)
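
Because the wavelength grid is shared, a single 1-D selection in wavelength applies to every row of the 2-D arrays. The following sketch uses synthetic arrays (not DESI data) to show this, together with a rough per-spectrum signal-to-noise estimate from the flux and inverse variance; the array sizes and values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nspec, nwave = 4, 50
wave = np.linspace(3600.0, 5800.0, nwave)         # 1-D: shared by all spectra
flux = rng.normal(1.0, 0.1, size=(nspec, nwave))  # 2-D: one row per spectrum
ivar = np.full((nspec, nwave), 100.0)             # 1/sigma^2 for sigma = 0.1
mask = np.zeros((nspec, nwave), dtype=int)        # 0 is good
mask[0, :5] = 1                                   # flag a few pixels as bad

# Per-pixel S/N, ignoring masked pixels and pixels with no information
good = (mask == 0) & (ivar > 0)
snr = np.where(good, flux * np.sqrt(ivar), np.nan)
median_snr = np.nanmedian(snr, axis=1)            # one value per spectrum

# A 1-D wavelength cut selects the same columns of every spectrum
blue = wave < 4000.0
blue_flux = flux[:, blue]
print(median_snr.shape, blue_flux.shape)
```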

Let's plot the zeroth spectrum in this file (*i.e.* in this HEALPix grouping):

In [None]:
spectrum = 0
plt.plot(bwave,bflux[spectrum], 'b', alpha=0.5)
plt.plot(rwave,rflux[spectrum], 'r', alpha=0.5)
plt.plot(zwave,zflux[spectrum], 'k', alpha=0.5)

## A DESI-specific spectrum reader

Note that, for illustrative purposes, we discussed the Data Model in detail and read in the required files individually from that Data Model. But, the DESI data team has also developed standalone functions in `desispec.io` to facilitate reading in the plethora of information in the spectral files. For example:

In [None]:
specobj = desispec.io.read_spectra(specfilename)

The wavelengths and flux in each band are then available as dictionaries in the `wave` and `flux` attributes:

In [None]:
specobj.wave

In [None]:
specobj.flux

So, to plot the "zeroth" spectrum:

In [None]:
spectrum = 0
plt.plot(specobj.wave["b"],specobj.flux["b"][spectrum],color='b', alpha=0.5)
plt.plot(specobj.wave["r"],specobj.flux["r"][spectrum],color='r', alpha=0.5)
plt.plot(specobj.wave["z"],specobj.flux["z"][spectrum],color='k', alpha=0.5)

which should look very similar to one of the first plots we made earlier in the tutorial. 

The fibermap information is available as a table in the `fibermap` attribute:

In [None]:
specobj.fibermap

The entries with `TARGETID`=-1 are spectra for which the fiber was not assigned to a target, e.g. because that fiber was broken or because it randomly didn't cover any input targets (this latter case should never happen in the real survey).  There can also be multiple spectra for a single `TARGETID` from multiple exposures so there is a utility function for getting the `TARGETID`s in this file:

In [None]:
specobj.target_ids()

There are also functions for getting the number of spectra and selecting a subset of spectra.  All of the information that could be read in from the different extensions of the spectral file can be retrieved from the `specobj` object. Here's what's available:

In [None]:
dir(specobj)

## Target classes

What if we only want to plot spectra of certain target classes? The targeting information is stored in the `DESI_TARGET`, `BGS_TARGET` and `MWS_TARGET` entries of the fibermap array:

In [None]:
specobj.fibermap.info

and which target corresponds to which targeting bit is stored in the desitarget mask (we imported this near the beginning of the notebook).

In [None]:
desi_mask

Let's find the indexes of all standard F-stars in the spectral file:

In [None]:
stds = np.where(specobj.fibermap["DESI_TARGET"] & desi_mask.mask("STD_FAINT|STD_BRIGHT"))[0]
print(stds)
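
If the bitwise selection above looks opaque, this sketch shows the same idea with plain integers. The bit positions are hypothetical and chosen only for illustration; the real values should always be taken from `desi_mask`.

```python
import numpy as np

# Hypothetical bit assignments (illustration only; use desi_mask for real work)
QSO_BIT = 1 << 2
STD_FAINT_BIT = 1 << 33
STD_BRIGHT_BIT = 1 << 35

# Each entry can have several bits set, since a source can pass several selections
desi_target = np.array(
    [QSO_BIT, STD_FAINT_BIT, QSO_BIT | STD_BRIGHT_BIT, 0], dtype=np.int64
)

# OR the bits of interest together, then AND with the column:
# a nonzero result means at least one of those bits is set
std_mask = STD_FAINT_BIT | STD_BRIGHT_BIT
is_std = (desi_target & std_mask) != 0
std_index = np.where(is_std)[0]
is_qso = (desi_target & QSO_BIT) != 0
print(std_index, is_qso)
```

Comparing explicitly with `!= 0` gives a boolean array, which is handy when you want to combine several selections.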

Where were these located on the original plate-fiber mapping?

In [None]:
fm = specobj.fibermap   #- shorthand
plt.plot(fm["TARGET_RA"],fm["TARGET_DEC"],'b.', alpha=0.1)
plt.plot(fm["TARGET_RA"][stds],fm["TARGET_DEC"][stds],'kx')

Recall that there can be (will be!) more than one spectrum per object:

In [None]:
num_standard_stars = len(set(specobj.fibermap['TARGETID'][stds]))
num_stdstar_exposures = len(stds)   #- stds holds row indices, so count its elements
print('{} exposures of {} standards'.format(num_stdstar_exposures, num_standard_stars))

Let's take a look at the spectra of these standard stars, plotting just the first spectrum from each camera for the first 9 standards.

In [None]:
targetids = list(set(specobj.fibermap['TARGETID'][stds]))
figure(figsize=(12,9))
for i, tx in enumerate(targetids[0:9]):
    subplot(3,3,i+1)
    sp = specobj.select(targets=[tx,])
    plt.plot(sp.wave['b'], sp.flux['b'][0], 'b-', alpha=0.5)
    plt.plot(sp.wave['r'], sp.flux['r'][0], 'r-', alpha=0.5)
    plt.plot(sp.wave['z'], sp.flux['z'][0], 'k-', alpha=0.5)
    # plt.show()

These seem realistic. Let's zoom in on some of the Balmer series for the zeroth standard:

In [None]:
Balmer = [4102,4341,4861]
halfwindow = 50
figure(figsize=(4*len(Balmer), 3))
for i in range(len(Balmer)):
    subplot(1,len(Balmer),i+1)
    plt.axis([Balmer[i]-halfwindow,Balmer[i]+halfwindow,0,np.max(bflux[stds[0]])])
    plt.plot(bwave,bflux[stds[0]])
    # plt.show()

## Redshifts

The directory from which we took these spectra also contains information on the best-fit redshifts for the spectra from the [redrock](https://github.com/desihub/redrock) code. The redshift file for a pixel has the same name as its spectra file, with `spectra` replaced by `zbest`, so we can read the redshifts for the pixel we have been studying:

In [None]:
zfilename = specfilename.replace('spectra-64-', 'zbest-64-')
zs = fitsio.read(zfilename)
zs.dtype.descr

Note that due to repeated observations, there could be a different number of spectra than final redshifts, meaning that there isn't a row-by-row mapping between spectra and redshifts...

In [None]:
print(zs.shape[0], 'redshifts')
print(specobj.num_targets(), 'targets')
print(specobj.num_spectra(), 'spectra')
print(specobj.flux['b'].shape, 'shape of flux["b"]')

...but the `TARGETID` (which *is* intended to be unique) is in this file, too, allowing sources to be uniquely mapped from targeting, to spectra, to redshift. Let's extract all sources that were targeted as quasars using the fibermap information from the spectral file, and plot the first 9:

In [None]:
qsos = np.where(specobj.fibermap["DESI_TARGET"] & desi_mask["QSO"])[0]
print(len(qsos), 'QSOs')
plt.figure(figsize=(12,9))
for i in range(len(qsos))[0:9]:
    plt.subplot(3,3,i+1)
    plt.plot(bwave,bflux[qsos[i]],'b', alpha=0.5)
    plt.plot(rwave,rflux[qsos[i]],'r', alpha=0.5)
    plt.plot(zwave,zflux[qsos[i]],'k', alpha=0.5)
    # plt.show()

Let's match these quasar targets to the redshift file on `TARGETID` to extract their best-fit redshifts from `redrock`:

In [None]:
dd = defaultdict(list)
for index, item in enumerate(zs["TARGETID"]):
    dd[item].append(index)
zqsos = [index for item in fm[qsos]["TARGETID"] for index in dd[item] if item in dd]

That might be hard to follow at first glance, but all I did was use some "standard" python syntax to match the indices in `zs` (the ordering of objects in the `redrock` redshift file) to those for quasars in `fm` (the ordering of quasars in the fibermap file), on the unique `TARGETID`, such that the indices stored in `qsos` for `fm` point to the corresponding indices in `zqsos` for `zs`. This might help illustrate the result:

In [None]:
zs[zqsos]["TARGETID"][0:7], fm[qsos]["TARGETID"][0:7]
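
For larger files, the same match can be done without Python loops. This is a sketch with made-up arrays, assuming each `TARGETID` appears only once in the redshift table; it is an alternative to the `defaultdict` approach above, not what the notebook itself runs.

```python
import numpy as np

# Made-up stand-ins: one row per target in the redshift table...
z_targetid = np.array([30, 10, 20, 40])
z_value = np.array([2.1, 0.5, 1.3, 0.9])
# ...and one row per spectrum in the fibermap (targets can repeat)
fm_targetid = np.array([10, 20, 10, 30, 20])

# Sort the redshift IDs once, then binary-search each fibermap ID
order = np.argsort(z_targetid)
pos = np.searchsorted(z_targetid[order], fm_targetid)
pos = np.clip(pos, 0, len(order) - 1)   # guard against IDs beyond the last entry
z_index = order[pos]                    # row in the redshift table per spectrum

# Keep only true matches; unmatched spectra get NaN
matched = z_targetid[z_index] == fm_targetid
z_per_spectrum = np.where(matched, z_value[z_index], np.nan)
print(z_index, z_per_spectrum)
```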

Let's see what best-fit template `redrock` assigned to each quasar. This information is stored in the `SPECTYPE` column.

In [None]:
zs[zqsos]["SPECTYPE"]

Or for standard stars:

In [None]:
dd = defaultdict(list)
for index, item in enumerate(zs["TARGETID"]):
    dd[item].append(index)
zstds = [index for item in fm[stds]["TARGETID"] for index in dd[item] if item in dd]

For stars, we can also display the type of star that `redrock` fit (this is stored in the `SUBTYPE` column):

In [None]:
zipper = zip(zs[zstds]["SUBTYPE"],zs[zstds]["SPECTYPE"])
for sub, spec in zipper:
    print("{}-{}".format(sub.decode('utf-8'),spec.decode('utf-8')))

(here the conversion to `utf-8` is simply for display purposes because the strings in `SUBTYPE` and `SPECTYPE` are stored as bytes instead of unicode).

OK, back to our quasars. Let's plot the quasar targets that *are identified as quasars*, and add a label for the `SPECTYPE` and the redshift fit by `redrock`. I'll also add some median filtering and over-plot some (approximate) typical quasar emission lines at the redrock redshift (if those lines would fall in the DESI wavelength coverage):

In [None]:
from scipy.signal import medfilt

qsoid = np.where(zs[zqsos]["SPECTYPE"] == b'QSO')[0]
qsolines = np.array([1216,1546,1906,2800,4853,4960,5008])

wave = specobj.wave
flux = specobj.flux

plt.figure(figsize=(12,9))
for i in range(len(qsoid))[0:9]:
    plt.subplot(3,3,1+i)
    spectype = zs[zqsos[qsoid[i]]]["SPECTYPE"].decode('utf-8')
    z = zs[zqsos[qsoid[i]]]["Z"]
    plt.plot(wave['b'], medfilt(flux['b'][qsos[qsoid[i]]], 15), 'b', alpha=0.5)
    plt.plot(wave['r'], medfilt(flux['r'][qsos[qsoid[i]]], 15), 'r', alpha=0.5)
    plt.plot(wave['z'], medfilt(flux['z'][qsos[qsoid[i]]], 15), 'k', alpha=0.5)
    plt.title("{}, z={:.3f}".format(spectype,z))
    for line in qsolines:
        if ((1+z)*line > np.min(bwave)) & ((1+z)*line < np.max(zwave)):
            axvline((1+z)*line, color='y', alpha=0.5)
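
The line-marking logic can be checked in isolation: a line at rest wavelength $\lambda_0$ is observed at $(1+z)\lambda_0$, and is only drawn if that falls inside the covered range. In this sketch the helper `observed_lines` is hypothetical, and the 3600 to 9800 Angstrom range is an assumed approximate coverage rather than a value read from the file.

```python
import numpy as np

def observed_lines(rest_wavelengths, z, wave_min, wave_max):
    """Hypothetical helper: observed wavelengths of lines inside the coverage."""
    observed = (1.0 + z) * np.asarray(rest_wavelengths, dtype=float)
    visible = (observed > wave_min) & (observed < wave_max)
    return observed[visible]

qsolines = [1216, 1546, 1906, 2800, 4853, 4960, 5008]  # same list as above
# Assumed coverage of roughly 3600-9800 Angstroms
visible = observed_lines(qsolines, z=2.0, wave_min=3600.0, wave_max=9800.0)
print(visible)
```

At z=2 the four bluest rest-frame lines land inside the assumed range, while the three reddest are shifted beyond it.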

## Appendix: code versions used

In [None]:
from desitutorials import print_code_versions as pcv
print("This tutorial last ran successfully to completion using the following versions of the following modules:") 
pcv()

## Appendix: Healpix overview

DESI uses healpix for grouping spectra on the sky.  It is somewhat overkill for our needs, but it has a nice nested pixel structure and has good fast libraries for common operations like determining which pixels cover which points on the sky.

If you aren't familiar with HEALPix, it is an equal-area splitting of the sphere, where the sphere is initially divided into 12 equal-area pixels, and then each of those pixels is divided into 4 new equal-area pixels each time `nside` doubles (a quad tree). Schematically, here's how `nside` corresponds to pixel *area* in square degrees:

In [None]:
sphere_area = 4*180.*180./np.pi
for i in range(10):
    nside = 2**i
    npix = 12*nside**2
    hpx_area = sphere_area / npix
    print(nside, npix, hpx_area)

The `nside` at which the example spectra are grouped therefore corresponds to ~0.84 sq. deg. Note that I could have checked this more easily (but less pedagogically) using the useful python [healpy](https://healpy.readthedocs.io/en/latest/) library:

In [None]:
hp.nside2pixarea(64,degrees=True)

The spectra are stored in this fashion so that they are grouped (roughly) contiguously on the sky, with a reasonable number of spectra in each directory. It's easy to derive the approximate RA/Dec near each pixel number (note that we sneakily stored the pixel numbers as `pixnums` when we were examining the directory structure):

In [None]:
ras, decs = hp.pix2ang(64, pixnums, nest=True, lonlat=True)

Note that **the DESI Data Model will always use the _NESTED_ scheme for HEALPix**.

In [None]:
zipper = list(zip(pixnums,ras,decs))
for pix,ra,dec in zipper[0:10]:
    print("Pixel(nside=64): {} RA: {} DEC: {}".format(pix,ra,dec))
if len(zipper) > 10:
    print('...')
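
The "nice nested pixel structure" mentioned at the start of this appendix can be seen directly in the pixel numbers: in the NESTED scheme, the four children of pixel `p` at the next resolution are `4p` to `4p+3`, so moving between resolutions is integer arithmetic. The helper names below are hypothetical and the sketch does not need `healpy`.

```python
def nested_parent(pix, levels=1):
    """Hypothetical helper: parent pixel after halving nside `levels` times."""
    return pix >> (2 * levels)   # each level drops two bits (divide by 4)

def nested_children(pix):
    """Hypothetical helper: the four child pixels after doubling nside."""
    return [4 * pix + k for k in range(4)]

pix64 = 16879
parent32 = nested_parent(pix64)        # pixel containing it at nside=32
children128 = nested_children(pix64)   # its four sub-pixels at nside=128
print(parent32, children128)
```

This only holds for the NESTED ordering; in the RING ordering the pixel numbers at different resolutions are not related this simply.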