Detections Database and API
===========================

This notebook demonstrates how to access the HDF5 container for the HETDEX line detections database. This database is a catalog of line emission detections and their associated 1D, aperture-summed, PSF-weighted spectra. The HDF5 file contains three tables:

1. Detections - this is the main database of line detection sources. It provides the position and central wavelength of each detection and the corresponding line fluxes. Each detection corresponds to a single emission line, so a single source can have multiple line detections at different wavelengths. The same line can also appear more than once if it was observed in multiple shots or if it is associated with a large source.


2. Fibers - for each source detection, this table lists information about each fiber used to extract the flux measurement and weighted spectrum. This allows a user to return to the processed data products (i.e. the shot HDF5 files) to investigate the source further.


3. Spectra - for each source, this table contains arrays of wavelength and 1D flux-weighted, aperture-summed spectral data with the corresponding errors. Uncalibrated spectra are also provided in counts.


In [1]:
%matplotlib inline

import numpy as np
import tables as tb
import matplotlib.pyplot as plt

from astropy.table import Table, Column, join
from astropy.coordinates import SkyCoord
import astropy.units as u

from hetdex_api.config import HDRconfig
from hetdex_api.detections import Detections
from hetdex_api.elixer_widget_cls import ElixerWidget

In [2]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>

### Use the latest curated catalog: 

In [3]:
D = Detections(curated_version='3.0.1')

In [4]:
# This is a suggested query to find LAEs.
# np.logical_not(D.gmag < 20) also keeps rows where gmag is NaN, unlike D.gmag >= 20.
sel_lae = (D.sn > 5.5) & (D.plya_classification > 0.75) & np.logical_not(D.gmag < 20)
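
The `np.logical_not(D.gmag<20)` term is not equivalent to `D.gmag >= 20` if a magnitude is NaN: any comparison with NaN evaluates to False, so the negated form keeps such rows while the direct form drops them. A minimal sketch of the difference, using made-up magnitudes rather than catalog values:

```python
import numpy as np

# Toy g-band magnitudes; NaN stands in for a missing estimate (illustrative values only)
gmag = np.array([18.5, 21.0, np.nan, 24.3])

keep_negated = np.logical_not(gmag < 20)  # NaN < 20 is False, so the NaN row is kept
keep_direct = gmag >= 20                  # NaN >= 20 is False, so the NaN row is dropped

print(keep_negated.tolist())  # [False, True, True, True]
print(keep_direct.tolist())   # [False, True, False, True]
```

Which behaviour is wanted depends on whether detections without a magnitude estimate should stay in the sample.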

In [5]:
#spectra can be accessed using get_spectrum()
spec = D.get_spectrum(D.detectid[sel_lae][10000])

In [6]:
spec

wave1d,spec1d,spec1d_err
Angstrom,1e-17 erg / (Angstrom cm2 s),1e-17 erg / (Angstrom cm2 s)
float32,float32,float32
3470.0,-0.14205462,3.122314
3472.0,-0.14190048,3.1189263
3474.0,-0.14174606,3.1155322
3476.0,-0.1415909,3.1121216
3478.0,-0.14143282,3.1086473
3480.0,-0.14126955,3.1050587
3482.0,-0.14109667,3.1012588
3484.0,-0.14090763,3.0971038
3486.0,-0.14069328,3.0923922
...,...,...


### Explore Using ElixerWidget:

In [7]:
elix_widget = ElixerWidget(detectlist = D.detectid[sel_lae])

interactive(children=(Text(value='3000000485', description='DetectID:', placeholder='3000000485'), Output()), …

### Access the Detections H5 file without loading any data

This is the most memory-efficient way to access detection info, spectral data and fiber information.

It is also important to use this access mode for HDR3, because a few variables are updated based on the calibration corrections outlined at https://op1srv.mpe.mpg.de/wikihetdex/index.php/HDR3.0.1_Catalog

Access basic detection info. You will need to know which survey and catalog type you want to access: 301XXXXXXX detectids are for line emission sources, and 309XXXXXXX detectids are for continuum sources.

In [8]:
D = Detections(survey='hdr3', catalog_type='lines', searchable=False, loadtable=False)

Get updated Detection Info

In [9]:
det_info = D.get_detection_info(3001001637)

In [10]:
det_info

array([(3001001637, 20180511015, 189.06462, 62.119335, 20180511, 15, 4557.63, 0.46, 72.84, 5.26, 6.82, 0.58, 0.12, 0.16, 17.86, 0.66, 1.19, 0.22, b'multi_301_015_038_RL', 2, 529, 25, b'RL', 0.919, 1.65, 3, b'20180511015_3_multi_301_015_038_RL_002', 2.759, b'038', b'015', b'20180511v015_301_015_038_001', 0.95, 14.48, 14.42, 17.99, b'301', 0.2256, -16.53, 13.22)],
      dtype=[('detectid', '<i8'), ('shotid', '<i8'), ('ra', '<f4'), ('dec', '<f4'), ('date', '<i4'), ('obsid', '<i4'), ('wave', '<f4'), ('wave_err', '<f4'), ('flux', '<f4'), ('flux_err', '<f4'), ('linewidth', '<f4'), ('linewidth_err', '<f4'), ('continuum', '<f4'), ('continuum_err', '<f4'), ('sn', '<f4'), ('sn_err', '<f4'), ('chi2', '<f4'), ('chi2_err', '<f4'), ('multiframe', 'S20'), ('fibnum', '<i4'), ('x_raw', '<i4'), ('y_raw', '<i4'), ('amp', 'S2'), ('apcor', '<f4'), ('chi2fib', '<f4'), ('expnum', '<i4'), ('fiber_id', 'S38'), ('flux_noise_1sigma', '<f4'), ('ifuid', 'S3'), ('ifuslot', 'S3'), ('inputid', 'S40'), ('noise_ratio',
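
The value printed above is a NumPy structured array, so individual columns can be read by field name. The sketch below rebuilds a small subset of that record by hand (values copied from the output above) so that it runs without the catalog; with the real `det_info`, only the indexing lines are needed.

```python
import numpy as np

# Rebuild a few fields of the record shown above (values copied from that output)
det_info = np.array(
    [(3001001637, 189.06462, 62.119335, 4557.63, 17.86)],
    dtype=[("detectid", "<i8"), ("ra", "<f4"), ("dec", "<f4"),
           ("wave", "<f4"), ("sn", "<f4")],
)

print(det_info.dtype.names)  # available column names
row = det_info[0]            # first (here, only) record
print(int(row["detectid"]), float(row["wave"]), float(row["sn"]))
```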

Get Fiber Info:

In [11]:
fib_info = D.get_fiber_info(3001001637)

In [12]:
fib_info

array([(3001001637, 189.06473, 62.120068, b'multi_301_015_038_RU', b'20180511015_1_multi_301_015_038_RU_093', -12.71, 15.42, 20180511, 15, 1, 2.643, b'20180511T053817.5', 4558.8, 0, 0.0116, [0., 0., 0., 0., 0.], b'RU',  93, b'038', b'015', b'301', 529,  848),
       (3001001637, 189.06561, 62.11949 , b'multi_301_015_038_RU', b'20180511015_1_multi_301_015_038_RU_094', -15.25, 15.42, 20180511, 15, 1, 1.762, b'20180511T053817.5', 4558.8, 0, 0.0566, [0., 0., 0., 0., 0.], b'RU',  94, b'038', b'015', b'301', 529,  858),
       (3001001637, 189.06323, 62.12    , b'multi_301_015_038_RU', b'20180511015_1_multi_301_015_038_RU_112', -11.44, 13.22, 20180511, 15, 1, 3.343, b'20180511T053817.5', 4558.8, 0, 0.0036, [0., 0., 0., 0., 0.], b'RU', 112, b'038', b'015', b'301', 529, 1014),
       (3001001637, 189.06412, 62.119423, b'multi_301_015_038_RL', b'20180511015_1_multi_301_015_038_RL_001', -13.98, 13.22, 20180511, 15, 1, 0.905, b'20180511T053817.5', 4558.8, 0, 0.1597, [0., 0., 0., 0., 0.], b'RL',  

### Initiate the API and Access the full database

When you call `Detections()` you initiate the Detections class object, which takes columns from the Detections table in the HDF5 file and adds them as array attributes to the object. It also converts ra/dec into astropy sky coordinates in the `coords` attribute, calculates an approximate g-band magnitude using the 1D spectra, and adds ELiXer probabilities for each detection. If you append `refine()` to the call, a number of downselections are applied to the database to return a more robust list of line emitters. `refine()` removes spurious detections found in bad amps, at the edges of the CCD, or in shots that are not deemed appropriate for HETDEX analysis. It can also remove all bright objects above a specific g-band magnitude if desired (defaults to None if no option is given).

In [None]:
# To access the latest HDRX.X lines database (i.e. the full H5 file):

detects = Detections()

# to remove the latest bad amps and pixels (this isn't needed if you are using a curated catalog)

# detects = Detections(survey='hdr2.1', catalog_type='lines').refine()

# or if you want to open the continuum source catalog:
# detects = Detections(survey='hdr2.1', catalog_type='continuum')

### Note: if you do not want to load the whole table but only need spectra for a specific detectid:

In [None]:
det_object = Detections('hdr2.1', loadtable=False)

In [None]:
spec = det_object.get_spectrum(2100191119)

In [None]:
spec

Here is a list of attributes built into the Detections class:

In [None]:
detects.__dict__.keys()

If you prefer working in astropy tables, you can grab it this way:

In [None]:
detect_table = detects.return_astropy_table()

In [None]:
detect_table

## How we made the subset catalog for the team:

In [None]:
sel_field = (
    (detects.field == 'cosmos')
    | (detects.field == 'dex-fall')
    | (detects.field == 'dex-spring')
    | (detects.field == 'egs')
    | (detects.field == 'goods-n')
)
sel_chi2 = detects.chi2 < 1.2
sel_wave = (detects.wave >= 3510) & (detects.wave <= 5490)
sel_lw = detects.linewidth <= 6
sel_cont = detects.continuum > -3
sel_sn = detects.sn >= 4.8
sel_chi2fib = detects.chi2fib < 4.5

# combine all cuts into one boolean mask
sel_cat = sel_field & sel_chi2 & sel_wave & sel_lw & sel_cont & sel_sn & sel_chi2fib

det_table = detects.return_astropy_table()

In [None]:
team_table = det_table[sel_cat]
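
When several cuts are combined like this, it can help to see how many rows survive each one. The following is a minimal sketch of that bookkeeping with made-up arrays standing in for catalog columns; the thresholds are taken from the cell above, but the values and the `cuts` and `survivors` names are purely illustrative.

```python
import numpy as np

# Toy stand-ins for catalog columns (illustrative values, not HETDEX data)
sn = np.array([4.5, 5.0, 6.2, 7.1, 9.0])
chi2 = np.array([1.0, 1.5, 1.1, 0.9, 1.3])
linewidth = np.array([3.0, 2.5, 7.0, 4.0, 5.0])

cuts = {
    "sn >= 4.8": sn >= 4.8,
    "chi2 < 1.2": chi2 < 1.2,
    "linewidth <= 6": linewidth <= 6,
}

# Apply the cuts one at a time and record how many rows survive each step
mask = np.ones(sn.size, dtype=bool)
survivors = {}
for name, cut in cuts.items():
    mask &= cut
    survivors[name] = int(mask.sum())
    print(f"{name:>15}: {survivors[name]} remaining")
```

The final `mask` is the same boolean array that chaining the cuts with `&` would produce.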

## Querying by sky coordinates

Upon initialization of the Detections Class, sky coordinates are converted to an Astropy sky coordinates array to allow for easy querying:

In [None]:
detects.coords

To query a region of the sky, you can use the Detections function `query_by_coords`, which takes an astropy coordinates object and a radius given as an astropy quantity. It returns a boolean mask that can be used to index the Detections class object.

In [None]:
obj_coords = SkyCoord(199.35704 * u.deg, 51.06718 * u.deg, frame='icrs')

In [None]:
maskregion = detects.query_by_coords(obj_coords, 10. * u.arcsec)
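
Conceptually, a cone query computes the angular separation between the query position and every catalog position and keeps those within the radius. The sketch below shows that idea with plain NumPy and the haversine formula. It is not the `hetdex_api` implementation (which works on astropy coordinate objects); `separation_arcsec` is a hypothetical helper and the catalog positions are made-up offsets from the query position.

```python
import numpy as np

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec between points given in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a))) * 3600.0

# Query position from the cell above; catalog positions are made-up offsets from it
ra0, dec0 = 199.35704, 51.06718
ra = np.array([199.35704, 199.35704, 199.36704])
dec = np.array([51.06718 + 5 / 3600, 51.06718 + 20 / 3600, 51.06718])

mask = separation_arcsec(ra0, dec0, ra, dec) <= 10.0  # 10 arcsec radius
print(mask.tolist())
```

Only the first toy position, offset by 5 arcsec in declination, falls inside the 10 arcsec radius.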

The Detections class allows slicing so that a boolean mask applied to the class will slice each array attribute accordingly:

In [None]:
detects_in_region = detects[maskregion]
print(np.size(detects_in_region.detectid))
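
The slicing behaviour described above can be pictured with a small stand-in class whose `__getitem__` applies the same index to every array attribute. This is only a sketch of the pattern: `Catalog` is a hypothetical class with toy values, not the actual Detections implementation.

```python
import numpy as np

class Catalog:
    """Hypothetical stand-in for a class whose attributes are equal-length arrays."""

    def __init__(self, **columns):
        for name, values in columns.items():
            setattr(self, name, np.asarray(values))

    def __getitem__(self, mask):
        # Apply the same index to every array attribute and return a new object
        return Catalog(**{name: values[mask] for name, values in vars(self).items()})

cat = Catalog(detectid=[101, 102, 103], sn=[5.1, 8.4, 6.0])  # toy values
subset = cat[cat.sn > 5.5]
print(subset.detectid.tolist(), subset.sn.tolist())
```

Because a new object is returned, the original `cat` keeps all of its rows.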

## Find a direct line match

If you want to find an exact line match, you can use the function `find_match()`, which takes a sky position, a wavelength, and search tolerances in radius and wavelength:

In [None]:
obj_coords = SkyCoord(199.35704 * u.deg, 51.06718 * u.deg, frame='icrs')

In [None]:
wave_obj = 3836.

In [None]:
idx = detects.find_match(obj_coords, wave=wave_obj, radius=5. * u.arcsec, dwave=5)

In [None]:
detects.detectid[idx]

In [None]:
detect_table[idx]

## Check out matched sources in the ElixerWidget

For this example, we have found 12 detections in this region. We can examine these via the ELiXer reports using the `ElixerWidget()` class from `hetdex_api.elixer_widget_cls`. To do so, we need to save the list of detectids to examine in the widget.

In [None]:
#np.savetxt('detects_obj.txt', detects_in_region.detectid)

You can then run the elixer_widget to scan through the ELiXer reports for this object. Use the "Next DetectID" button to step through the list. The "DetectID" text widget gives interactive access to all reports and steps through detectids in increments of one, whereas the green "Next DetectID" button follows the order of the ingested list from 'detects_obj.txt'.

In [None]:
elix_widget = ElixerWidget(detectlist = detects_in_region.detectid)
#elix_widget = ElixerWidget(detectfile='detects_obj.txt')

For more information on using the ElixerWidget GUI, go to Notebook 12, where we discuss team classification efforts. For a quick investigation, though, it's helpful to pull up the GUI just to scan through a detection list.

## Accessing 1D Spectra

Spectra in counts and in flux-calibrated units are stored in the Spectra table of the Detections HDF5 file. The table can be accessed directly through the Detections class object, which stores the open HDF5 file as the `hdfile` attribute:

In [None]:
print(detects.hdfile)

In [None]:
spectra = detects.hdfile.root.Spectra

This is a very large table, so it's not advised to read it in all at once. The columns are:

In [None]:
spectra.cols

Flux-calibrated, PSF-weighted 1D spectra can be retrieved via the API for a single detectid through the function `get_spectrum`:

In [None]:
detectid_nice_lae = 2100744791
spec_table = detects.get_spectrum(detectid_nice_lae) 

In [None]:
detects.plot_spectrum(detectid_nice_lae)

or if we want to zoom in on the emission line:

In [None]:
# take the matching central wavelength as a scalar rather than a length-1 array
cw = detects.wave[detects.detectid == detectid_nice_lae][0]
detects.plot_spectrum(detectid_nice_lae, xlim=(cw - 50, cw + 50))

You can also save the spectrum to a text file. By default it is saved as spec_##detectid##.dat, but you can choose a different name with the argument `outfile`.

In [None]:
detects.save_spectrum(detectid_nice_lae)
# or
# detects.save_spectrum(detectid_nice_lae, outfile='tmp.txt')

## Getting Fiber information for a detection

You can find a list of all fibers used in the measurement in the Fibers table. The Fibers table and its associated columns can be accessed in the same way as the Spectra table, by searching for a match in the detectid column.

In [None]:
fibers = detects.hdfile.root.Fibers
fibers.cols

Access the fiber table for the above source:

In [None]:
# PyTables looks up detectid_nice_lae in the calling namespace when evaluating the condition
fiber_table = fibers.read_where("detectid == detectid_nice_lae")

In [None]:
Table(fiber_table)

When you are done with the HDF5 file, close it. The data that you extracted into tables and arrays will remain.

In [None]:
detects.hdfile.close()

## Accessing the ELiXer Classifications

In [None]:
config = HDRconfig(survey='hdr2.1')
file_elix = tb.open_file(config.elixerh5)

In [None]:
file_elix.root.Detections

Note: these are also appended to the Detections() class object. Each column in the above table can be accessed as an attribute of the Detections() class object. For example, the probability of LAE to OII measured from the HETDEX continuum is:

In [None]:
#detects.plae_poii_hetdex

or the nearest neighbour magnitude in an ancillary photometric catalog is:

In [None]:
#detects.mag_match

and this comes from the filter:

In [None]:
#detects.cat_filter