# Introduction to DESI SV Spectra

The goal of this notebook is to demonstrate how to read in and manipulate DESI SV spectra using on-sky data. Specifically, we will use the February/March 2020 mini-SV-2 runs taken as part of DESI _commissioning_.

If you identify any errors or have requests for additional functionality please create a new issue at https://github.com/desihub/tutorials/issues or send a note to desi-data@desi.lbl.gov.

Note that this tutorial specifically deals with on-sky data from SV (or, currently, mini-SV). To learn how to work with Main Survey data, see the _Introduction to DESI Spectra_ tutorial instead (https://github.com/desihub/tutorials/blob/master/Intro_to_DESI_spectra.ipynb).

Last updated March 2020 using DESI software release 19.12.

## Getting started

### Using NERSC

The easiest way to get started is to use the jupyter server at NERSC so that you don't need to
install any code or download any data locally.

If you need a NERSC account, see https://desi.lbl.gov/trac/wiki/Computing/AccessNersc

Then do the one-time jupyter configuration described at https://desi.lbl.gov/trac/wiki/Computing/JupyterAtNERSC

From a NERSC command line, check out a copy of the tutorial code, *e.g.* from cori.nersc.gov:
```console
mkdir -p $HOME/desi/
cd $HOME/desi/
git clone https://github.com/desihub/tutorials
```
Then go to https://jupyter.nersc.gov, log in, navigate to where you checked out this package (*e.g.* `$HOME/desi/tutorials`), and double-click on `Intro_to_DESI_spectra.ipynb`.

This tutorial has been tested using the "DESI 19.12" kernel installed at NERSC.  To get an equivalent environment from a cori command line:
```console
source /global/common/software/desi/desi_environment.sh 19.12
```

## Import required modules

In [None]:
import os
import numpy as np
import healpy as hp
from glob import glob
import fitsio
from collections import defaultdict

# ADM Note that we use the commissioning targeting mask, as we're working with mini-SV data from commissioning.
from desitarget.cmx.cmx_targetmask import cmx_mask  
import desispec.io

import matplotlib.pyplot as plt
%pylab inline

If you are running locally and any of these fail, 
you should go back through the [installation instructions](https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/Laptop) and/or email `desi-data@desi.lbl.gov` if you get stuck.
If you are running from jupyter.nersc.gov and have problems, double check that your kernel is "DESI 19.12".

## Environment variables and data

DESI uses environment variables to define the base directories for where to find data.  The below paths are for NERSC, but if you are running locally or want to access a different dataset, change these as needed to wherever your dataset is.

Spectro production runs are grouped under `$DESI_SPECTRO_REDUX`, with `$SPECPROD` indicating which run to use, such that the data are under `$DESI_SPECTRO_REDUX/$SPECPROD`.  *e.g.* during operations, official productions will be in `$DESI_SPECTRO_REDUX=/global/cfs/cdirs/desi/spectro/redux` and `$SPECPROD` would be the name for individual data assemblies, *e.g.* `$SPECPROD=DA1`.  In this case, we'll use `$SPECPROD=daily`, which corresponds to the daily reductions for mini-SV-2.

In [None]:
%set_env DESI_SPECTRO_REDUX=/global/cfs/cdirs/desi/spectro/redux
%set_env SPECPROD=daily

`desispec.io.specprod_root` can handle the environment variable path wrangling for you:

In [None]:
reduxdir = desispec.io.specprod_root()
print(reduxdir)
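
Conceptually, the path described above is just the two environment variables joined together. The following is a minimal sketch under that assumption; the helper name `specprod_root_sketch` is hypothetical and is not part of `desispec`, and the real `desispec.io.specprod_root` may do more than this.

```python
import os

def specprod_root_sketch(environ=None):
    """Hypothetical stand-in that joins $DESI_SPECTRO_REDUX and $SPECPROD."""
    environ = os.environ if environ is None else environ
    missing = [key for key in ("DESI_SPECTRO_REDUX", "SPECPROD") if key not in environ]
    if missing:
        raise KeyError("Unset environment variable(s): {}".format(", ".join(missing)))
    return os.path.join(environ["DESI_SPECTRO_REDUX"], environ["SPECPROD"])

# Use an explicit mapping here so that the real environment is left untouched.
example_env = {
    "DESI_SPECTRO_REDUX": "/global/cfs/cdirs/desi/spectro/redux",
    "SPECPROD": "daily",
}
example_root = specprod_root_sketch(example_env)
print(example_root)
```

Passing a plain dictionary makes the sketch easy to try without changing any settings in the running kernel.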

In [None]:
#- Do check that these are set correctly before proceeding
def check_env():
    for env in ('DESI_SPECTRO_REDUX', 'SPECPROD'):
        if env in os.environ:
            print('${}={}'.format(env, os.getenv(env)))
        else:
            print('Required environment variable {} not set!'.format(env))

    reduxdir = desispec.io.specprod_root()
    if not os.path.exists(reduxdir):
        print("ERROR: {} doesn't exist; check $DESI_SPECTRO_REDUX/$SPECPROD".format(reduxdir))
    else:
        print('OK: {} exists'.format(reduxdir))

check_env()

## Data Model for the spectra

### Directory structure

Spectra from individual exposures are in the `exposures` directory.  But since SV will focus on targeting individual _tiles_, the relevant directory and file structure is: 

```
$DESI_SPECTRO_REDUX/$SPECPROD/tiles/$TILE/$DATE/*-$SPECTROGRAPH-$TILE-$DATE.fits
```

where:

* `$TILE` is the number of the relevant SV (or mini-SV) tile. For example, for mini-SV-2, see the list of tiles on the mini-SV-2 [wiki page](https://desi.lbl.gov/trac/wiki/TargetSelectionWG/miniSV2#Fieldcenters).
* `$DATE` is the date expressed as YYYYMMDD, for example 20200229 for year=2020, month=February, day=29.
* `$SPECTROGRAPH` corresponds to the DESI spectrograph used to observe the targets (0-9).

The files we will focus on for this tutorial correspond to `$TILE=70003` and `$DATE=20200226` and `$SPECTROGRAPH=0`. For example:

```
$DESI_SPECTRO_REDUX/$SPECPROD/tiles/70003/20200226/coadd-0-70003-20200226.fits
$DESI_SPECTRO_REDUX/$SPECPROD/tiles/70003/20200226/zbest-0-70003-20200226.fits
```
where the first file contains the (coadded) spectra and the second file contains information on the best-fit redshifts from the [redrock](https://github.com/desihub/redrock) code.

Let's poke around in these directories.

In [None]:
basedir = os.path.join(os.getenv("DESI_SPECTRO_REDUX"), os.getenv("SPECPROD"), "tiles")
subdir = sorted(os.listdir(basedir))
print(basedir)
print(subdir)

In [None]:
basedir = os.path.join(basedir, subdir[0])
subdir = sorted(os.listdir(basedir))
print(basedir)
print(subdir)

In [None]:
basedir = os.path.join(basedir, subdir[2])
coaddfiles = glob(os.path.join(basedir, "*coadd*"))
zbestfiles = glob(os.path.join(basedir, "*zbest*"))
print(basedir)
print(coaddfiles)
print(zbestfiles)
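
When working with lists of globbed files like these, it can help to split each file name back into the components of the naming scheme described above. The following is a rough sketch; the helper `parse_tile_filename` and the regular expression are hypothetical conveniences written for this example, not part of the DESI software.

```python
import os
import re

# Pattern for <prefix>-<spectrograph>-<tile>-<date>.fits, following the scheme above.
TILE_FILE_PATTERN = re.compile(
    r"^(?P<prefix>[a-z]+)-(?P<spectrograph>\d)-(?P<tile>\d+)-(?P<date>\d{8})\.fits$"
)

def parse_tile_filename(path):
    """Split a tile-based file name into its components (hypothetical helper)."""
    name = os.path.basename(path)
    match = TILE_FILE_PATTERN.match(name)
    if match is None:
        raise ValueError("Unrecognized file name: {}".format(name))
    parts = match.groupdict()
    # The spectrograph is a single digit (0-9), so store it as an integer.
    parts["spectrograph"] = int(parts["spectrograph"])
    return parts

parsed = parse_tile_filename("tiles/70003/20200226/coadd-0-70003-20200226.fits")
print(parsed)
```

A parsed dictionary like this could be used, for instance, to sort the files by spectrograph number before looping over them.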

### Spectra file format

What about the Data Model for the coadded spectra themselves?

In [None]:
tile, date, spectrograph = "70003", "20200226", "0"
dirname = os.path.join(os.getenv("DESI_SPECTRO_REDUX"), os.getenv("SPECPROD"), "tiles", tile, date)
filename = "coadd-{}-{}-{}.fits".format(spectrograph, tile, date)
specfilename = os.path.join(dirname, filename)
DM = fitsio.FITS(specfilename)
DM

HDU 0 is blank.  The others should be used by name, not by number since the order could vary.

`FIBERMAP` stores the mapping of the imaging information used to target and place a fiber on the source.

The other HDUs contain the wavelength arrays, flux, inverse variance (ivar), mask (0 is good), and spectral resolution data coadded across each of the "B", "R", and "Z" cameras.

Let's start by looking at the fibermap.

In [None]:
fm = fitsio.read(specfilename, 'FIBERMAP')
fm.dtype.descr

`TARGETID` is the unique mapping from target information to a fiber. So, if you wanted to look up full imaging information for a spectrum, you can map back to target files using `TARGETID`.

As we are only looking at a single spectrograph this should correspond to a single petal in the DESI focal plane. I wonder if that's true?

In [None]:
plt.plot(fm["TARGET_RA"],fm["TARGET_DEC"],'b.')

This certainly looks like one petal to me.  Let's repeat, color coding by spectrograph number.

In [None]:
# ADM as of mini-SV-2 we only have spectrographs 0, 3, 6, 7, 9.
for spectrograph in "0", "3", "6", "7", "9":
    filename = "coadd-{}-{}-{}.fits".format(spectrograph, tile, date)
    specfilename = os.path.join(dirname, filename)
    DM = fitsio.FITS(specfilename)
    fm = fitsio.read(specfilename, 'FIBERMAP')
    plt.plot(fm["TARGET_RA"],fm["TARGET_DEC"], '.')

Note that in addition to having multiple tiles, we also have multiple exposures of the same tile resulting in multiple spectra of the same targets.

In [None]:
DM

The remaining extensions store the wavelength, flux, inverse variance on the flux, mask and resolution matrix coadded for the B, R and Z arms of the spectrograph. Let's check that the full wavelength coverage across all 3 arms of each of the DESI spectrographs is the same:

In [None]:
for spectrograph in "9", "7", "6", "3", "0":
    filename = "coadd-{}-{}-{}.fits".format(spectrograph, tile, date)
    specfilename = os.path.join(dirname, filename)
    wave = fitsio.read(specfilename, 'BRZ_WAVELENGTH')
    print("wavelength coverage of spectrograph {}: {:.1f} to {:.1f} Angstroms".format(spectrograph, np.min(wave), np.max(wave)))

## Reading in and Displaying spectra

Now that we understand the Data Model, let's plot some spectra. To start, let's use the file we've already been manipulating (for spectrograph 0) and read in the flux to go with the wavelengths we already have.

In [None]:
flux = fitsio.read(specfilename,'BRZ_FLUX')

Note that the wavelength arrays are 1-D (every spectrum in the spectral file is mapped to the same binning in wavelength) but the flux array (and flux_ivar, mask etc. arrays) are 2-D, because they contain multiple spectra:

In [None]:
print(wave.shape)
print(flux.shape)
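
The inverse variance and mask arrays have the same 2-D shape as the flux, so they can be combined with it element by element. Below is a minimal sketch of how a per-pixel signal-to-noise might be estimated, using small synthetic arrays rather than real DESI data. The array names are made up for this example, and it assumes only what is stated above: that a mask value of 0 means "good".

```python
import numpy as np

# Synthetic stand-ins: 3 spectra with 5 wavelength bins each (not real DESI data).
flux_demo = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                      [2.0, 2.0, 2.0, 2.0, 2.0],
                      [0.5, 0.0, 1.5, 1.0, 2.5]])
ivar_demo = np.array([[4.0, 4.0, 0.0, 4.0, 4.0],
                      [1.0, 1.0, 1.0, 1.0, 1.0],
                      [16.0, 16.0, 16.0, 16.0, 16.0]])
mask_demo = np.array([[0, 0, 0, 1, 0],
                      [0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0]])

# A pixel is treated as usable when it is unmasked and has positive inverse variance.
good = (mask_demo == 0) & (ivar_demo > 0)

# Per-pixel signal-to-noise is flux * sqrt(ivar); unusable pixels become NaN.
snr = np.where(good, flux_demo * np.sqrt(ivar_demo), np.nan)

# Median signal-to-noise per spectrum, ignoring the unusable pixels.
median_snr = np.nanmedian(snr, axis=1)
print(median_snr)
```

Taking the median along `axis=1` collapses the wavelength dimension, leaving one number per spectrum.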

Let's plot one of the spectra from this file:

In [None]:
spectrum = 23
# ADM make the figure 20-by-5 in size.
plt.figure(figsize=(20, 5))
# ADM some reasonable plot limits.
xmin, xmax, ymin, ymax = np.min(wave), np.max(wave), np.min(flux[spectrum][0:100]), np.max(flux[spectrum][0:100])
plt.axis([xmin, xmax, ymin, ymax])
plt.plot(wave, flux[spectrum], 'b-', alpha=0.5)
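
The plot limits above come from the first 100 pixels of the spectrum. As an alternative (not what this tutorial uses), limits could be taken from percentiles of the whole spectrum so that a single extreme pixel does not dominate the y-axis. The following is a sketch with synthetic data; `robust_ylim` is a hypothetical helper written for this example.

```python
import numpy as np

def robust_ylim(flux_1d, lower=1.0, upper=99.0, pad=0.1):
    """Return (ymin, ymax) from percentiles of the finite flux values, with padding."""
    finite = flux_1d[np.isfinite(flux_1d)]
    lo, hi = np.percentile(finite, [lower, upper])
    margin = pad * (hi - lo)
    return lo - margin, hi + margin

# Synthetic spectrum: a smooth ramp with one extreme outlier (e.g. a cosmic ray).
demo_flux = np.linspace(1.0, 2.0, 1000)
demo_flux[500] = 1.0e4

ymin_demo, ymax_demo = robust_ylim(demo_flux)
print(ymin_demo, ymax_demo)
```

The resulting pair could be passed to `plt.axis` in place of the minimum and maximum used above.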

## A DESI-specific spectrum reader

Note that, for illustrative purposes, we discussed the Data Model in detail and read in the required files individually from that Data Model. But, the DESI data team has also developed standalone functions in `desispec.io` to facilitate reading in the plethora of information in the spectral files. For example:

In [None]:
specobj = desispec.io.read_spectra(specfilename)

The wavelengths and flux in each band are then available as dictionaries in the `wave` and `flux` attributes:

In [None]:
specobj.wave

In [None]:
specobj.flux

So, to plot the 24th spectrum (index 23, counting from zero):

In [None]:
spectrum = 23
plt.figure(figsize=(20, 5))
plt.axis([xmin, xmax, ymin, ymax])
plt.plot(specobj.wave["brz"], specobj.flux["brz"][spectrum], 'b-', alpha=0.5)

which should look very similar to one of the first plots we made earlier in the tutorial. 

The fibermap information is available as a table in the `fibermap` attribute:

In [None]:
specobj.fibermap

In [None]:
specobj.target_ids()

There are also functions for getting the number of spectra and selecting a subset of spectra.  All of the information that could be read in from the different extensions of the spectral file can be retrieved from the `specobj` object. Here's what's available:

In [None]:
dir(specobj)

## Target classes

What about if we only want to plot spectra of certain target classes? For mini-SV-2 (which is part of DESI _commissioning_) the targeting information is stored in the `CMX_TARGET` entries of the fibermap array:

In [None]:
specobj.fibermap["CMX_TARGET"].info

and which target corresponds to which targeting bit is stored in the commissioning (cmx) mask (we imported this near the beginning of the notebook).

In [None]:
cmx_mask

Let's find the indexes of all standard stars in the spectral file:

In [None]:
stds = np.where(specobj.fibermap["CMX_TARGET"] & cmx_mask.mask("STD_FAINT|STD_BRIGHT|SV0_STD_FAINT|SV0_STD_BRIGHT"))[0]
print(stds)
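
The selection above relies on a bitwise AND between each target's `CMX_TARGET` value and a combined mask. The sketch below illustrates the idea with plain integers. The bit assignments in `DEMO_BITS` are hypothetical and chosen only for illustration (the real values live in `cmx_mask`), and `demo_mask` is a made-up helper that imitates combining names separated by `|`.

```python
import numpy as np

# Hypothetical bit assignments, for illustration only; the real ones are in cmx_mask.
DEMO_BITS = {"STD_BRIGHT": 2**1, "STD_FAINT": 2**2, "SV0_QSO": 2**3}

def demo_mask(names):
    """Combine '|'-separated bit names into a single integer (hypothetical helper)."""
    value = 0
    for name in names.split("|"):
        value |= DEMO_BITS[name]
    return value

# Four fake targets: a QSO, a bright standard, a target with no bits set,
# and a target that is both a faint standard and a QSO.
demo_targets = np.array([8, 2, 0, 12], dtype=np.int64)

# A target is selected if it shares at least one bit with the combined mask.
std_rows = np.where((demo_targets & demo_mask("STD_FAINT|STD_BRIGHT")) != 0)[0]
print(std_rows)
```

Because the test is "any shared bit", a target that belongs to several classes is selected as long as one of them is in the mask.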

Where were these located on the original plate-fiber mapping?

In [None]:
fm = specobj.fibermap   #- shorthand
plt.plot(fm["TARGET_RA"],fm["TARGET_DEC"],'b.', alpha=0.1)
plt.plot(fm["TARGET_RA"][stds],fm["TARGET_DEC"][stds],'kx')

Let's take a look at the spectra of the first 9 of these standard stars.

In [None]:
plt.figure(figsize=(12, 9))
for panel, std in enumerate(stds[:9]):
    plt.subplot(3, 3, panel+1)
    plt.plot(specobj.wave['brz'], specobj.flux["brz"][std], 'b-', alpha=0.5)

These seem star-like. Let's zoom in on some of the Balmer series for the zeroth standard:

In [None]:
Balmer = [4102, 4341, 4861]
halfwindow = 50
plt.figure(figsize=(4*len(Balmer), 3))
for i in range(len(Balmer)):
    plt.subplot(1, len(Balmer), i+1)
    plt.axis([Balmer[i]-halfwindow, Balmer[i]+halfwindow, 0, np.max(flux[stds[0]])])
    plt.plot(wave, flux[stds[0]])
    # plt.show()

## Redshifts

The directory from which we took these spectra also contains information on the best-fit redshifts for the spectra from the [redrock](https://github.com/desihub/redrock) code.

In [None]:
zfilename = specfilename.replace('coadd', 'zbest')
zs = fitsio.read(zfilename)
zs.dtype.descr

As a sanity check, let's ensure that there are the same number of redshifts, targets, and spectra in the files. This may not be so in the DESI _Main Survey_, where there might be repeat observations.

In [None]:
print(zs.shape[0], 'redshifts')
print(specobj.num_targets(), 'targets')
print(specobj.num_spectra(), 'spectra')
print(specobj.flux['brz'].shape, 'shape of flux["brz"]')

Seems logical: 5000 DESI fibers, 10 petals, so 500 entries per petal.

The `TARGETID` (which is intended to be unique for each source) is useful for mapping source spectra to redshift. Let's extract all sources that were targeted as SV-like quasars in mini-SV-2 (the bit-name `SV0_QSO`; not to be confused with the Main-Survey-like quasars that were targeted as `MINI_SV_QSO`) using the fibermap information from the spectral file, and plot the first 9.

In [None]:
qsos = np.where(specobj.fibermap["CMX_TARGET"] & cmx_mask["SV0_QSO"])[0]
print(len(qsos), 'QSOs')
plt.figure(figsize=(25,15))
xmin, xmax = np.min(wave), np.max(wave)
for i in range(len(qsos))[0:9]:
    plt.subplot(3,3,i+1)
    ymin, ymax = np.min(flux[qsos[i]][30:50]), np.max(flux[qsos[i]][0:50])
    plt.axis([xmin, xmax, ymin, ymax])
    plt.plot(wave, flux[qsos[i]],'b', alpha=0.5)
    # plt.show()

I definitely see some broad emission lines! Let's match these quasar targets to the redshift file on `TARGETID` to extract their best-fit redshifts from `redrock`:

In [None]:
dd = defaultdict(list)
for index, item in enumerate(zs["TARGETID"]):
    dd[item].append(index)
# ADM test membership before indexing, so the defaultdict isn't populated with empty entries.
zqsos = [index for item in fm[qsos]["TARGETID"] if item in dd for index in dd[item]]

That might be hard to follow at first glance, but all I did was use some "standard" Python syntax to match the indices in `zs` (the ordering of objects in the `redrock` redshift file) to those for quasars in `fm` (the ordering of quasars in the fibermap), on the unique `TARGETID`, such that the indices stored in `qsos` for `fm` point to the corresponding indices in `zqsos` for `zs`. This might help illustrate the result:

In [None]:
zs[zqsos]["TARGETID"][0:7], np.array(fm[qsos]["TARGETID"][0:7])
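
If each `TARGETID` appears only once in each table, the same match can also be sketched with NumPy's `intersect1d`. This is an alternative to the dictionary approach above, shown here on synthetic IDs rather than real data; with repeated IDs, the dictionary approach would be the safer choice.

```python
import numpy as np

# Synthetic TARGETIDs (not real): the redshift table and the quasar rows of the fibermap.
z_targetids = np.array([105, 101, 104, 102, 103])
qso_targetids = np.array([104, 101, 999])  # 999 has no entry in the redshift table

# return_indices=True also returns the position of each common ID in both inputs.
common, z_index, qso_index = np.intersect1d(
    z_targetids, qso_targetids, return_indices=True)

# intersect1d orders its output by ID, so reorder to follow the fibermap ordering.
order = np.argsort(qso_index)
z_index, qso_index = z_index[order], qso_index[order]

print(z_targetids[z_index], qso_targetids[qso_index])
```

After the reordering, row `qso_index[i]` of the quasar list and row `z_index[i]` of the redshift table refer to the same object, and IDs with no counterpart are dropped.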

Let's see what best-fit template `redrock` assigned to each quasar target. This information is stored in the `SPECTYPE` column.

In [None]:
zs[zqsos]["SPECTYPE"]

Or for standard stars:

In [None]:
dd = defaultdict(list)
for index, item in enumerate(zs["TARGETID"]):
    dd[item].append(index)
# ADM test membership before indexing, so the defaultdict isn't populated with empty entries.
zstds = [index for item in fm[stds]["TARGETID"] if item in dd for index in dd[item]]

For stars, we can also display the type of star that `redrock` fit (this is stored in the `SUBTYPE` column):

In [None]:
zipper = zip(zs[zstds]["SUBTYPE"][10:15], zs[zstds]["SPECTYPE"][10:15])
for sub, spec in zipper:
    print("{}-{}".format(sub.decode('utf-8'),spec.decode('utf-8')))

Here, I just picked 5 correctly identified stars as an example. Note that the conversion to `utf-8` is simply for display purposes because the strings in `SUBTYPE` and `SPECTYPE` are stored as bytes instead of unicode.
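
As a small illustration of the bytes-versus-unicode point, the sketch below uses a synthetic array of byte strings standing in for a column such as `SPECTYPE` (the values are made up for this example).

```python
import numpy as np

# Synthetic byte strings standing in for the SPECTYPE column (not real data).
spectype_demo = np.array([b"STAR", b"QSO", b"GALAXY", b"QSO"])

# Compare against a bytes literal so that the types on both sides match.
is_qso = spectype_demo == b"QSO"

# np.char.decode converts the whole array to unicode strings in one call.
spectype_text = np.char.decode(spectype_demo, "utf-8")

print(is_qso)
print(spectype_text)
```

Decoding the whole column once can be more convenient than calling `.decode` on each element inside a loop.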

OK, back to our quasars. Let's plot the quasar targets that *are identified as quasars*, but add a label for the `SPECTYPE` and the redshift fit by `redrock`. I'll also add some median filtering and over-plot some (approximate) typical quasar emission lines at the redrock redshift (if those lines would fall in the DESI wavelength coverage):

In [None]:
from scipy.signal import medfilt

# ADM we'll clip to z < 5, as redrock can misidentify low S/N sources as very-high-z quasars.
qsoid = np.where( (zs[zqsos]["SPECTYPE"] == b'QSO') & (zs[zqsos]["Z"] < 5) )[0]
qsolines = np.array([1216, 1546, 1906, 2800, 4853, 4960, 5008])

wave = specobj.wave["brz"]
flux = specobj.flux["brz"]
plt.figure(figsize=(25, 15))
# ADM plot at most 9 quasars, in case fewer than 9 pass the cuts above.
for i in range(min(9, len(qsoid))):
    plt.subplot(3,3,1+i)
    spectype = zs[zqsos[qsoid[i]]]["SPECTYPE"].decode('utf-8')
    z = zs[zqsos[qsoid[i]]]["Z"]
    plt.plot(wave, medfilt(flux[qsos[qsoid[i]]], 15), 'b', alpha=0.5)
    plt.title("{}, z={:.3f}".format(spectype,z))
    for line in qsolines:
        if np.min(wave) < (1+z)*line < np.max(wave):
            plt.axvline((1+z)*line, color='y', alpha=0.5)
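
The line positions in the loop above follow from the relation between rest-frame and observed wavelength, observed = (1 + z) × rest. The sketch below isolates that step. The helper `observed_lines` is hypothetical, and the coverage limits of 3600 and 9800 Angstroms are illustrative assumptions rather than values read from the data.

```python
import numpy as np

def observed_lines(rest_wavelengths, z, wave_min, wave_max):
    """Return observed wavelengths (1+z)*rest that lie strictly inside the coverage."""
    observed = (1.0 + z) * np.asarray(rest_wavelengths, dtype=float)
    inside = (observed > wave_min) & (observed < wave_max)
    return observed[inside]

# Rest-frame lines as in the cell above; the coverage limits are assumed for illustration.
rest_lines = [1216, 1546, 1906, 2800, 4853, 4960, 5008]
visible = observed_lines(rest_lines, z=2.0, wave_min=3600.0, wave_max=9800.0)
print(visible)
```

For a redshift of 2 only the four bluest lines fall inside the assumed coverage, which is why the plotting loop checks each line before drawing it.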

## Appendix: code versions used

In [None]:
from desitutorials import print_code_versions as pcv
print("This tutorial last ran successfully to completion using the following versions of the following modules:") 
pcv()