# Redshift Database Tutorial

## Abstract

This tutorial covers the basics of using the redshift database, which is loaded from the outputs of the DESI pipeline.  Currently, it is based on reference run 18.3 and uses a SQLite database.  However, because the database code is built on [SQLAlchemy](http://www.sqlalchemy.org/), the details of the underlying database are abstracted away.  In other words, only small changes to the initial configuration are needed to run the same code with a [PostgreSQL](https://www.postgresql.org/) database.
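
As a rough illustration of how small that configuration change can be: with SQLAlchemy, the backend is selected by the connection URL passed to `sqlalchemy.create_engine()`. The sketch below only builds such URLs. The helper name `database_url` and the user, host, and database names are hypothetical placeholders and are not part of desispec.

```python
import os


def database_url(backend, dbfile=None, user=None, host=None, dbname=None):
    """Build a SQLAlchemy-style connection URL (hypothetical helper)."""
    if backend == 'sqlite':
        # On POSIX systems an absolute path gives 'sqlite:////abs/path'.
        return 'sqlite:///' + os.path.abspath(dbfile)
    if backend == 'postgresql':
        return 'postgresql://{0}@{1}/{2}'.format(user, host, dbname)
    raise ValueError('Unknown backend: {0}'.format(backend))


# Placeholder values, for illustration only.
sqlite_url = database_url('sqlite', dbfile='minitest-18.3.db')
postgres_url = database_url('postgresql', user='some_user',
                            host='db.example.org', dbname='some_db')
print(sqlite_url)
print(postgres_url)
```

Either string could be handed to `sqlalchemy.create_engine()`, and the query code would stay the same. In this tutorial, `db.setup_db()` takes care of the database setup.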

## Initial Setup

This cell imports everything we need and sets up the paths and environment variables needed to locate the input files.  The paths are based on the [minitest notebook](https://github.com/desihub/desitest/blob/master/mini/minitest.ipynb).

In [1]:
#
# Imports
#
%matplotlib inline
import os
from argparse import Namespace

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.font_manager import fontManager, FontProperties

import desispec.database.redshift as db
#
# Paths to files, etc.
#
reference_run = '18.3'
basedir = os.path.join('/global/project/projectdirs/desi/datachallenge/reference_runs', reference_run)
surveydir = os.environ['DESISURVEY_OUTPUT'] = os.path.join(basedir, 'survey')
targetdir = os.path.join(basedir, 'targets')
fibassigndir = os.path.join(basedir, 'fiberassign')
os.environ['DESI_SPECTRO_REDUX'] = os.path.join(basedir, 'spectro', 'redux')
os.environ['DESI_SPECTRO_SIM'] = os.path.join(basedir, 'spectro', 'sim')
os.environ['PIXPROD'] = 'mini'
os.environ['SPECPROD'] = 'mini'
reduxdir = os.path.join(os.environ['DESI_SPECTRO_REDUX'], os.environ['SPECPROD'])
simdatadir = os.path.join(os.environ['DESI_SPECTRO_SIM'], os.environ['PIXPROD'])
os.environ['DESI_SPECTRO_DATA'] = simdatadir
#
# Working directory.
#
workingdir = os.getcwd()

## Loading the Database

Although a database loaded from the 18.3 results already exists, its schema is out of date, so we'll load a new database directly from the 18.3 files.  Loading should take less than one minute.

In [2]:
#
# We'll be using a SQLite database, so we ignore the return value of db.setup_db().
#
postgresql = db.setup_db(dbfile=os.path.join(workingdir, 'minitest-{0}.db'.format(reference_run)),
                         overwrite=True)
#
# The list of exposures.
# The expand option renames the column 'PASS' to 'passnum' in the database.
# This is to prevent any collisions with the Python statement 'pass'.
#
db.load_file(os.path.join(surveydir, 'exposures.fits'), db.ObsList, hdu='EXPOSURES', expand={'PASS': 'passnum'})
#
# The truth and target tables.
#
db.load_file(os.path.join(targetdir, 'truth.fits'), db.Truth, hdu='TRUTH')
db.load_file(os.path.join(targetdir, 'targets.fits'), db.Target, hdu='TARGETS')
#
# The redshift catalog.
# In this case the expand option expands an array-valued column into corresponding scalar database columns.
#
db.load_file(os.path.join(reduxdir, 'zcatalog-mini.fits'), db.ZCat, hdu="ZCATALOG",
             expand={'COEFF': ('coeff_0', 'coeff_1', 'coeff_2', 'coeff_3', 'coeff_4',
                               'coeff_5', 'coeff_6', 'coeff_7', 'coeff_8', 'coeff_9',)})
#
# The fiberassign outputs are not contained in a single file, so a special loading function is needed.
#
db.load_fiberassign(fibassigndir)

INFO:redshift.py:724:setup_db: Removing file: /global/u2/b/bweaver/Documents/Code/git/desihub/tutorials/minitest-18.3.db.
INFO:redshift.py:733:setup_db: Begin creating tables.
INFO:redshift.py:737:setup_db: Finished creating tables.
INFO:redshift.py:406:load_file: Read data from /global/project/projectdirs/desi/datachallenge/reference_runs/18.3/survey/exposures.fits.
INFO:redshift.py:418:load_file: Integrity check complete on obslist.
INFO:redshift.py:421:load_file: Initial column conversion complete on obslist.
INFO:redshift.py:443:load_file: Column expansion complete on obslist.
INFO:redshift.py:449:load_file: Column conversion complete on obslist.
INFO:redshift.py:456:load_file: Converted columns into rows on obslist.
INFO:redshift.py:463:load_file: Inserted 42 rows in obslist.
INFO:redshift.py:406:load_file: Read data from /global/project/projectdirs/desi/datachallenge/reference_runs/18.3/targets/truth.fits.
INFO:redshift.py:418:load_file: Integrity check complete on truth.
INFO:re
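
The `expand` option used in the loading cell above can be pictured with a small, pure-Python sketch. This is not the desispec implementation; the function `expand_columns` and the toy data are hypothetical. The assumption here is that a string value renames a column, while a tuple value splits an array-valued column into one scalar column per element.

```python
def expand_columns(columns, expand):
    """Rename or split columns according to an expand mapping.

    columns : dict mapping a column name to a list of row values.
    expand  : dict mapping a column name to either a new name (str)
              or a tuple of names for the elements of an array column.
    """
    result = {}
    for name, values in columns.items():
        if name not in expand:
            result[name] = values
        elif isinstance(expand[name], str):
            # Simple rename, e.g. 'PASS' -> 'passnum'.
            result[expand[name]] = values
        else:
            # Split each row's array into one scalar column per element.
            for i, new_name in enumerate(expand[name]):
                result[new_name] = [row[i] for row in values]
    return result


# Toy data, for illustration only.
toy = {'PASS': [0, 1], 'COEFF': [[1.0, 2.0], [3.0, 4.0]]}
expanded = expand_columns(toy, {'PASS': 'passnum',
                                'COEFF': ('coeff_0', 'coeff_1')})
print(expanded)
```

Scalar columns map directly onto database columns, which is why the array-valued `COEFF` column is split before loading.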

In [3]:
q = db.dbSession.query(db.Truth, db.ZCat).filter(db.Truth.targetid == db.ZCat.targetid).all()

In [4]:
q[0][0].truez, q[0][1].z

(0.5218037366867065, 0.5214079152662674)
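
The query in cell 3 pairs each truth row with the redshift catalog row that has the same `targetid`, which in SQL terms is an inner join. The sketch below reproduces the idea with Python's built-in `sqlite3` module on a toy in-memory database. The table layouts, the table name `zcat`, and the row values are simplified placeholders, not the real schema or real data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE truth (targetid INTEGER PRIMARY KEY, truez REAL)')
conn.execute('CREATE TABLE zcat (targetid INTEGER PRIMARY KEY, z REAL)')
# Toy rows, for illustration only.
conn.executemany('INSERT INTO truth VALUES (?, ?)',
                 [(1, 0.52), (2, 1.10), (3, 0.30)])
conn.executemany('INSERT INTO zcat VALUES (?, ?)',
                 [(1, 0.53), (3, 0.29)])
# Only targets present in both tables are returned.
rows = conn.execute('SELECT t.targetid, t.truez, zc.z '
                    'FROM truth AS t JOIN zcat AS zc '
                    'ON t.targetid = zc.targetid '
                    'ORDER BY t.targetid').fetchall()
conn.close()
for targetid, truez, z in rows:
    print(targetid, truez, z)
```

Target 2 has no entry in the toy `zcat` table, so it drops out of the result. In the same way, `q` contains only targets that appear in both the truth table and the redshift catalog.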