# Introduction to Survey Simulations

The goal of this notebook is to introduce you to the outputs available from DESI "survey simulations". These are the fastest type of DESI simulation to run and only involve the following components:
 - Simulated stochastic weather (seeing, transparency, ...). See [DESI-3087](https://desi.lbl.gov/DocDB/cgi-bin/private/ShowDocument?docid=3087) for details.
 - Afternoon planning, which updates tile observing priorities and schedules fiber assignment.
 - Next tile selector, which determines which tile to observe next, based on recent progress and current weather.
 
The software for these components is mostly in the [desisurvey](https://desisurvey.readthedocs.io) and [surveysim](https://surveysim.readthedocs.io) packages.  Note that survey simulations operate at the level of tiles, not targets: they never generate spectra or redshifts and do not refer to any input catalog.  For a recent overview of the different DESI simulation types, see [DESI-3377](https://desi.lbl.gov/DocDB/cgi-bin/private/ShowDocument?docid=3377).

This tutorial focuses on using the outputs of a survey simulation. After working with the outputs from some existing simulations, you might want to run your own survey simulations: that tutorial is [here](http://surveysim.readthedocs.io/en/latest/tutorial.html).  For other tutorials, covering topics such as simulating your own DESI spectra, see [this list](https://github.com/desihub/tutorials/blob/master/README.md).

For general questions and suggestions on this tutorial, email desi-data@desi.lbl.gov. For more specific suggestions or bug reports, please [create a github issue](https://github.com/desihub/tutorials/issues).

## Getting Started

This notebook is optimized for use with the jupyter-dev service at NERSC, which provides pre-installed DESI software running in a jupyter notebook. If this is your first time using jupyter-dev at NERSC, follow [these instructions](https://desi.lbl.gov/trac/wiki/Computing/JupyterAtNERSC) to get it configured.

If you prefer to work on your laptop, you will need to [install the necessary DESI software locally](https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/Laptop).

**If you are working through this notebook in a live jupyter session, I recommend removing all the output below for a more interactive experience.** Use the "Cell > Current Outputs > Clear" menu item.

**There are several exercises below for you to work on once you master the basics.**

#### DESI Version Compatibility

- 2017-12-04 : tested using the `DESI master` kernel on jupyter-dev with the `surveysim2017/depth_0m/` outputs.
- 2018-03-30 : tested using the `DESI 18.3` kernel on jupyter-dev with the `surveysim2017/depth_0m/` outputs (which were generated with an earlier version of the code).
- 2018-07-20 : tested using the `DESI 18.6` kernel on jupyter-dev with the `surveysim2017/depth_0m/` outputs (which were generated with an earlier version of the code).
- 2018-10-15 : tested using the `DESI 18.7` kernel on jupyter-dev with the `surveysim2017/depth_0m/` outputs (which were generated with an earlier version of the code).
- 2019-03-22 : tested using the `DESI 18.11` kernel.  It currently does *not* work with the `18.12` or `19.2` kernels.
- 2019-07-01 : updated to use the `DESI 19.2` kernel.
- 2019-10-30 : updated to use the `DESI 19.9` kernel.

### Load Modules

Import numpy and matplotlib and draw plots directly to the notebook:

In [None]:
%pylab inline

Import the `desisurvey` modules we need below:

In [None]:
import desisurvey.utils
import desisurvey.plots
import desisurvey.tiles
from astropy.io import fits
import numpy
import desimodel
import surveysim.stats

Ignore expected harmless warnings (or don't run these lines if you prefer to see them):

In [None]:
import warnings, matplotlib.cbook, astropy._erfa.core
warnings.filterwarnings('ignore', category=matplotlib.cbook.mplDeprecation)
warnings.filterwarnings('ignore', category=astropy._erfa.core.ErfaWarning)

### Find Simulation Outputs

Identify which survey simulation you want to study by setting the `$DESISURVEY_OUTPUT` environment variable.

Here we look at the first of one hundred weather realizations of the baseline survey, generated for the surveysim 2018 data challenge.

Note that `$DESISURVEY_OUTPUT` is only read the first time you use a `desisurvey` function, so the easiest way to make a change below take effect is to restart the jupyter kernel and re-run the initial cells.

In [None]:
import os
os.environ['DESISURVEY_OUTPUT'] = '/global/projecta/projectdirs/desi/datachallenge/surveysim2018/weather/000'

## Survey Simulation Outputs

The outputs from a survey simulation are two FITS files, one organized by **exposure** and **tile** (exposures.fits, exposures & tiledata HDUs), and the other organized by **night** (stats.fits).  Tiles are predefined ([DESI-717](https://desi.lbl.gov/DocDB/cgi-bin/private/ShowDocument?docid=717)) to cover the whole survey footprint in 8 dithered passes. Each tile is observed with one or more exposures.  Multiple exposures of a tile are sometimes required to:
 - Split a long exposure to minimize the impact of cosmic rays.
 - Continue an exposure that is terminated early due to a program change (or dawn).
 - Continue an exposure that is found to have insufficient signal to noise after pipeline processing.
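
The relationship between exposures and tiles can be illustrated with a small synthetic table. The sketch below is not part of `desisurvey`: the arrays `tileid` and `exptime` are made-up stand-ins for the corresponding exposure columns. It only counts the exposures and sums the exposure times for each tile.

```python
import numpy as np

# Hypothetical per-exposure records: tile 1001 needed three exposures,
# tile 1002 two, and tile 1003 only one.
tileid = np.array([1001, 1001, 1002, 1001, 1003, 1002])
exptime = np.array([600.0, 450.0, 900.0, 300.0, 1200.0, 250.0])  # seconds

# Map each exposure to the row of its tile in a sorted list of unique IDs.
unique_ids, row = np.unique(tileid, return_inverse=True)

# Accumulate per-tile totals from the per-exposure values.
nexp_per_tile = np.bincount(row)
exptime_per_tile = np.bincount(row, weights=exptime)

for tid, n, t in zip(unique_ids, nexp_per_tile, exptime_per_tile):
    print('tile {}: {} exposure(s), {:.0f} s total'.format(tid, n, t))
```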

After setting `$DESISURVEY_OUTPUT`, look at the corresponding files using:

In [None]:
os.listdir(os.environ['DESISURVEY_OUTPUT'])

The `ephem` and `surveyinit` files contain the ephemerides for the DESI survey duration and the initial LST assignments, and will not be considered further here.

In [None]:
exposures = fits.getdata(os.path.join(os.environ['DESISURVEY_OUTPUT'], 'exposures.fits'), 'exposures')
tilestats = fits.getdata(os.path.join(os.environ['DESISURVEY_OUTPUT'], 'exposures.fits'), 'tiledata')

In [None]:
print('Survey runs {} to {} and observes {} tiles with {} exposures.'
      .format(
          desisurvey.utils.get_date(numpy.min(exposures['mjd'])),
          desisurvey.utils.get_date(numpy.max(exposures['mjd'])), numpy.sum(tilestats['snr2frac'] >= 1), len(exposures)))

Note that the exposures table stores MJD timestamps, which can be converted to dates using [`desisurvey.utils.get_date()`](http://desisurvey.readthedocs.io/en/latest/api.html?highlight=get_date#desisurvey.utils.get_date).
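
For intuition about what an MJD value means, here is a minimal standard-library sketch of a plain UTC conversion. The helper `mjd_to_utc_date` is hypothetical and is not a replacement for `desisurvey.utils.get_date()`, which may apply its own convention for assigning a timestamp to an observing night.

```python
import datetime

# MJD counts days from midnight UTC on 1858-11-17.
MJD_EPOCH = datetime.datetime(1858, 11, 17)

def mjd_to_utc_date(mjd):
    """Return the UTC calendar date that contains the given MJD."""
    return (MJD_EPOCH + datetime.timedelta(days=float(mjd))).date()

print(mjd_to_utc_date(51544.5))  # noon UTC on 2000-01-01
print(mjd_to_utc_date(58821.0))
```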

The `exposures` HDU has one record per exposure, and the `tiledata` HDU has one record per tile.

In [None]:
print(repr(exposures[:3]))
print(repr(tilestats[:3]))

The exposures HDU tracks quantities like the MJDs on which the exposures were observed, the tiles which they observed, the conditions of those observations, and the accumulated SNR2 fraction.

The `tiledata` HDU does not replicate useful information already in the tile file.  Let's link in that information...

In [None]:
tiles = desisurvey.tiles.get_tiles()

The tiles object is row-matched to the `tiledata` HDU (loaded above as `tilestats`), so row `i` of each describes the same tile.

In [None]:
print(tiles.tileRA.shape, tilestats.shape)

The `desisurvey.plots` module contains routines to visualize survey parameters, such as tile completeness in each of the 8 passes (4 dark, one gray, three bright).  In this simulation, all passes except the final bright pass were completed.  There is a slight tendency for tiles at low and high decs and at the edge of the footprint to take longer than other tiles, due to airmass and Galactic extinction, but weather effects dominate.

## Tiles Summary

The tiles table has one row per tile containing summary statistics of all exposures (if any) of that tile:

In [None]:
tilestats[:3]

The primary metric used to set the goal total exposure time for each tile is signal-to-noise ratio (SNR) for a set of predefined "threshold targets":
 - DARK & GRAY programs: ELGs with integrated \[OII\] flux of 8e-17 erg/(s cm^2)
 - BRIGHT program: BGS targets with r=19.5 and no emission lines
 
Plot the ratio of actual / goal SNR for each tile:

In [None]:
plt.hist(tilestats['snr2frac'], range=(0.75, 1.25), bins=25)
plt.xlabel('Tile SNR(actual) / SNR (goal)')
plt.axvline(np.median(tilestats['snr2frac']), c='r');

Plot the corresponding total exposure times, which shows two peaks for the BRIGHT and DARK+GRAY programs:

In [None]:
plt.hist(tilestats['exptime'] / 60, range=(0, 60), bins=30)
plt.xlabel('Tile Total Exposure Time [min]')
plt.axvline(np.median(tilestats['exptime'] / 60), c='r');

To plot the distribution of any column's values over the sky, separately for each of the 8 passes, use `plot_sky_passes`:

In [None]:
help(desisurvey.plots.plot_sky_passes)

For example, this function can show the distribution of SNR(actual) / SNR(goal) over the sky after year 1 (it takes ~30s to run).

The following columns summarize the afternoon planning and scheduling of fiber assignment (FA):
 - covered: Date the tile is first covered by previous layers and thus eligible for FA.
 - available: Date the tile first has fibers assigned.
 - planned: Date the tile is first included in the observing plan.
 
All dates are specified as an integer number of days from the survey start date (defined by [this utility function](http://desisurvey.readthedocs.io/en/latest/api.html#desisurvey.utils.day_number)).  As an example, plot the number of days into the survey that each tile became available for fiber assignment:

In [None]:
desisurvey.plots.plot_sky_passes(tiles.tileRA, tiles.tileDEC, tiles.passnum, tilestats['avail'], label='Day when tile became available');

Note that the depth-first strategy has all tiles planned (=0) at the start of the survey, but other strategies have more complex dependencies between different regions of the sky in each pass.

Alternatively, let's look at what tiles in the survey were completed...

In [None]:
desisurvey.plots.plot_sky_passes(tiles.tileRA, tiles.tileDEC, tiles.passnum, tilestats['snr2frac'], label='snr2frac');

All but a small area of the survey in the last bright pass was completed.

### Exercises

In [None]:
# Plot a histogram of the number of exposures of each tile in the full survey.

In [None]:
# Plot histograms of snr2frac after year-1 separately for the DARK, GRAY, BRIGHT programs.

In [None]:
# Create all-sky plots of the mean airmass that each tile was observed at in the full survey.

In [None]:
# Study the tile "overhead", defined as 86400 * (mjd_max - mjd_min) - exptime.

## Exposures List

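The exposures list is a table with rows corresponding to each simulated exposure, in increasing time order, with columns for their simulated observing conditions. Note that column names are all UPPER CASE, although lookups by name are case-insensitive, which is why lower-case names such as `exposures['mjd']` also work.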

In [None]:
exposures[:3]

To see the distribution of individual exposure times (and compare with the total exposure time plot above), use:

In [None]:
plt.hist(exposures['EXPTIME'] / 60, range=(0, 25), bins=25)
plt.xlabel('Individual Exposure Time [min]')
plt.axvline(np.median(exposures['EXPTIME'] / 60), c='r');

To see the distribution of atmospheric seeing during the simulated survey, use:

In [None]:
plt.hist(exposures['SEEING'], bins=25)
plt.xlabel('Per-Exposure FWHM Seeing [arcsec]')
plt.axvline(np.median(exposures['SEEING']), c='r');

To study the correlation between exposure time and seeing in the first DARK pass, use:

In [None]:
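# Select exposures of tiles in the first DARK pass (passnum == 0).
# This cut assumes that pass-0 tiles have the lowest tile IDs.
maxpass0 = numpy.max(tiles.tileID[tiles.passnum == 0])
pass1 = exposures[exposures['tileid'] <= maxpass0]
plt.scatter(pass1['EXPTIME'] / 60, pass1['SEEING'], c=pass1['AIRMASS'], lw=0, s=5);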
plt.colorbar().set_label('Airmass')
plt.xlabel('Exposure Time [min]')
plt.ylabel('Atmospheric FWHM Seeing [arcsec]');
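
A scatter plot shows the trend qualitatively; a single number such as the Pearson correlation coefficient can summarize it. The sketch below uses synthetic arrays in place of the `SEEING` and `EXPTIME` columns, with an arbitrary made-up dependence of exposure time on seeing, so the resulting value says nothing about the actual simulation.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic stand-ins for the SEEING and EXPTIME columns: exposure time
# grows with seeing, plus scatter. The coefficients are arbitrary.
seeing = rng.uniform(0.6, 2.0, size=500)                          # arcsec
exptime = 600.0 * seeing ** 2 + rng.normal(0.0, 100.0, size=500)  # seconds

# Pearson correlation coefficient between the two quantities.
r = np.corrcoef(exptime, seeing)[0, 1]
print('Pearson r = {:.2f}'.format(r))
```

With the real table, the same call could be applied to `pass1['EXPTIME']` and `pass1['SEEING']`.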

## Nightly survey statistics

Now let's look at the efficiency of the survey over time.  The statistics file tracks survey statistics on each **night** of observations.

In [None]:
stats = surveysim.stats.SurveyStatistics(restore=os.path.join(os.environ['DESISURVEY_OUTPUT'], 'stats.fits'))

Under the hood, the stats object has information on each of 1826 nights contributing to the survey---for instance, the amount of time the dome was open, the amount of time used for science, the number of completed tiles, etc.

In [None]:
print(stats._data.dtype)
print('Number of nights: {}'.format(len(stats._data)))

The `SurveyStatistics` class makes it easy to visualize survey completion with time and see survey completion statistics.

In [None]:
stats.plot();

The `summarize` method gives a text summary of the survey completeness and efficiency.

In [None]:
stats.summarize()

In this simulation, the survey was completed in all passes except for the last bright pass, in which 1903 of 2010 tiles were completed.  Only a small number of exposures were aborted.  An average of 10 minutes a night was lost to dead time in dark time, mostly at the end of the survey when tiles are not available at all LSTs.

## Using many weather realizations

The above tutorial has focused entirely on a single realization of the weather for the survey.  Another item of interest is how sensitive we expect survey completion to be to the weather.  Let's try to figure out how much the weather affects completion statistics in the first year...

In [None]:
def completed_in_timerange(exposures, startmjd, stopmjd):
    m = (exposures['mjd'] > startmjd) & (exposures['mjd'] < stopmjd)
    tilepass = tiles.passnum[tiles.index(exposures['tileid'])]
    return [numpy.sum(exposures['snr2frac'][m & (tilepass == pass0)] >= 1)
            for pass0 in tiles.passes]
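
The counting logic can be checked on a toy example. The following sketch uses made-up arrays instead of the `tiles` object and the exposures table, and `np.searchsorted` as a stand-in for the tile-ID lookup. Like the function above, it counts exposures whose accumulated `snr2frac` has reached 1, which equals the number of completed tiles only if a tile is not observed again after it is complete.

```python
import numpy as np

# Hypothetical tile table: IDs (sorted) and the pass each tile belongs to.
tile_ids = np.array([10, 11, 12, 20, 21])
tile_pass = np.array([0, 0, 0, 1, 1])

# Hypothetical exposures: snr2frac is the value accumulated on the tile
# after each exposure, so it reaches 1 only on the completing exposure.
exp_tileid = np.array([10, 11, 11, 20, 12, 21])
exp_mjd = np.array([100.2, 100.3, 101.2, 150.4, 480.1, 490.3])
exp_snr2frac = np.array([1.02, 0.55, 1.01, 1.00, 1.03, 0.70])

def completed_per_pass(start_mjd, stop_mjd):
    """Count completing exposures in each pass within (start_mjd, stop_mjd)."""
    in_range = (exp_mjd > start_mjd) & (exp_mjd < stop_mjd)
    # Look up the tile-table row of each exposure (requires sorted tile_ids).
    row = np.searchsorted(tile_ids, exp_tileid)
    exp_pass = tile_pass[row]
    done = in_range & (exp_snr2frac >= 1)
    return [int(np.sum(done & (exp_pass == p))) for p in np.unique(tile_pass)]

print(completed_per_pass(100, 100 + 365))
```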

In [None]:
print('First day: {}'.format(int(numpy.min(exposures['mjd']))))

In [None]:
completed = []

parentdir = '/global/projecta/projectdirs/desi/datachallenge/surveysim2018/weather'

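for direc in range(100):
    # Read the exposures HDU of each weather realization (directories 000 to 099).
    exposures0 = fits.getdata(os.path.join(parentdir, '{:03}'.format(direc), 'exposures.fits'), 'exposures')
    # Count tiles completed in the first year, taking MJD 58821 as the first day
    # (compare with the value printed above).
    completed.append(completed_in_timerange(exposures0, 58821, 58821+365))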

In [None]:
print('pass, fraction complete, standard deviation')
for tpass, ttilecomplete, ttilestd in zip(tiles.passes, numpy.mean(completed, axis=0), numpy.std(completed, axis=0)):
    ntile = tiles.pass_ntiles[tpass]
    print('{} {:5.1%} {:5.1%}'.format(tpass, ttilecomplete/ntile, ttilestd/ntile))

There is not a big difference in first-year survey completion among the different realizations of the weather.  The first bright pass is most affected, finishing 53% plus or minus 4%.  In this strategy, passes 2 & 3, the last two dark passes, do not finish much area: 5% plus or minus 1%.
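
The mean and scatter quoted here come from reducing the `completed` list over its first axis (one row per realization, one column per pass). Here is a minimal sketch of that reduction with made-up counts and pass sizes. Note that `numpy.std` defaults to the population standard deviation (`ddof=0`); passing `ddof=1` would give the sample estimate instead.

```python
import numpy as np

# Hypothetical completed-tile counts: one row per weather realization,
# one column per pass (three realizations and two passes here).
completed = np.array([[200, 90],
                      [210, 100],
                      [190, 110]])
ntiles_per_pass = np.array([400, 200])  # made-up pass sizes

# Average and scatter over realizations (axis 0), as fractions of each pass.
mean_frac = completed.mean(axis=0) / ntiles_per_pass
std_frac = completed.std(axis=0) / ntiles_per_pass

for p, (m, s) in enumerate(zip(mean_frac, std_frac)):
    print('{} {:5.1%} {:5.1%}'.format(p, m, s))
```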

### Exercises

In [None]:
# Study the correlation between exposure time and moon altitude (which is underestimated in these simulations)

In [None]:
# Plot histograms of the number of exposures per night in each program.

In [None]:
# Study how often GRAY and BRIGHT exposures are taken with no moon in the sky.

In [None]:
# Study which of the 3 moon parameters correlates most strongly with exposure time.