### AGNQSO Summary catalogue generator

This notebook demonstrates the catalog generation.

All functions and the final wrapper script are to live in py/*.py.

If you are on NERSC please select 'DESI main' as your kernel.

Direct notebook contributions:

* Alexander, D (Univ. of Durham, Durham, UK) (VI merging done by DA)
* Alfarsy, R (Univ. of Portsmouth, Portsmouth, UK)
* Canning, B (Univ. of Portsmouth, Portsmouth, UK)
* Chaussidon, E (CEA Saclay, Paris, France) (QSO catalogs generated by EC et al.)
* Juneau, S (NOIRLab, Arizona, USA)
* Mezcua, M (Institut de Ciències de l'Espai, Barcelona, Spain)
* Moustakas, J (Siena College, New York, USA) (FastSpecFit catalogues by JM)
* Pucha, R (Univ. of Arizona, Arizona, USA) 

## Docs:
Readme:
Wiki: 
Github: 

Directions for VACs: 
- EDR: https://desi.lbl.gov/trac/wiki/Pipeline/Releases/EDR/Planning/ValueAdded
- DR1: https://desi.lbl.gov/trac/wiki/Pipeline/Releases/DR1/Planning/ValueAdded

Our VAC directory: /global/cfs/cdirs/desi/science/gqp/agncatalog/  

## Imports

In [1]:
# General imports
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import os.path
#import yaml

# Import Astropy libraries - useful for many astronomy-related functions
from astropy.table import Table, join, Column, hstack
from astropy.io import fits
# Fast FITS file I/O access
import fitsio

# DESI modules
from desispec.zcatalog import find_primary_spectra, create_summary_catalog  # at NERSC needs DESI master
from desitarget.targetmask import desi_mask, bgs_mask, scnd_mask      # For the main survey
#from desiutil.bitmask import BitMask
from desiutil.annotate import annotate_table, annotate_fits

# GQP_CODE
import sys
sys.path.append("../py/")
import set_agn_masksDESI

#https://www.legacysurvey.org/viewer?ra=10.1572&dec=-0.3316&layer=ls-dr9&zoom=16

In [2]:
## Making the matplotlib plots look nicer - from SJ
settings = {
    'font.size':16,
    'axes.linewidth':2.0,
    'xtick.major.size':6.0,
    'xtick.minor.size':4.0,
    'xtick.major.width':2.0,
    'xtick.minor.width':1.,
    'xtick.direction':'in', 
    'xtick.minor.visible':True,
    'xtick.top':True,
    'ytick.major.size':6.0,
    'ytick.minor.size':4.0,
    'ytick.major.width':2.0,
    'ytick.minor.width':1.,
    'ytick.direction':'in', 
    'ytick.minor.visible':True,
    'ytick.right':True
}

plt.rcParams.update(**settings)

# Workflow

Development code in: /global/homes/b/bcanning/AGNQSO_summary_catalog/

1. Read Edmond threshold catalog
   * OBJ_TYPE = TGT
   * low-z star cut   
   
   * Note: Fuji has a default c_thresh of 0.95 for all targets
     - If BGS target (not QSO, not ELG, not LRG): RR SPECTYPE=QSO and (QN C_LINE_BEST>0.6 or MgII)
     - If ELG (not QSO): RR SPECTYPE=QSO and QN C_LINE_BEST>0.6
     - If QSO threshold 'QN_C_LINE_BEST' > 0.95 (Default)

2. Generate (or read) summary catalog
   * Join to add 'TSNR2_LRG','SV_NSPEC','SV_PRIMARY','ZCAT_NSPEC','ZCAT_PRIMARY'
   * Generate 'QN_C_LINE_BEST' and 'QN_C_LINE_SECOND_BEST'

3. Join FastSpecFit v2 - specific columns

4. Read yaml file

5. Set QSO_MASKBITS part of AGN_MASKBITS

6. Set BPT bits 
   * Update AGN_MASKBITS 

7. Join multiwave survey

8. Write catalog
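
The per-target-class thresholds listed under step 1 can be sketched with NumPy boolean masks. This is only an illustration of the logic as written in the notes above, not the QSO-maker implementation: the function name `select_qso`, the input array names, and the representation of the MgII detection as a boolean array are all assumptions.

```python
import numpy as np

def select_qso(is_qso_tgt, is_elg_tgt, is_lrg_tgt, is_bgs_tgt,
               rr_is_qso, qn_c_line_best, has_mgii):
    """Hypothetical helper applying the thresholds from the workflow notes."""
    is_qso_tgt, is_elg_tgt, is_lrg_tgt, is_bgs_tgt, rr_is_qso, has_mgii = (
        np.asarray(a, dtype=bool)
        for a in (is_qso_tgt, is_elg_tgt, is_lrg_tgt, is_bgs_tgt, rr_is_qso, has_mgii)
    )
    conf = np.asarray(qn_c_line_best, dtype=float)

    # QSO targets: default confidence threshold of 0.95
    sel_qso = is_qso_tgt & (conf > 0.95)
    # ELG targets (not QSO): RR SPECTYPE=QSO and QN confidence > 0.6
    sel_elg = is_elg_tgt & ~is_qso_tgt & rr_is_qso & (conf > 0.6)
    # BGS targets (not QSO, not ELG, not LRG): RR SPECTYPE=QSO and (QN > 0.6 or MgII)
    bgs_only = is_bgs_tgt & ~is_qso_tgt & ~is_elg_tgt & ~is_lrg_tgt
    sel_bgs = bgs_only & rr_is_qso & ((conf > 0.6) | has_mgii)
    return sel_qso | sel_elg | sel_bgs

# Toy inputs: two QSO targets, two ELG targets, two BGS targets
selected = select_qso(
    is_qso_tgt=[1, 1, 0, 0, 0, 0],
    is_elg_tgt=[0, 0, 1, 1, 0, 0],
    is_lrg_tgt=[0, 0, 0, 0, 0, 0],
    is_bgs_tgt=[0, 0, 0, 0, 1, 1],
    rr_is_qso=[1, 1, 1, 0, 1, 1],
    qn_c_line_best=[0.97, 0.80, 0.70, 0.99, 0.10, 0.10],
    has_mgii=[0, 0, 0, 0, 1, 0],
)
print(selected)
```

In the toy data the second QSO target fails the 0.95 threshold, the second ELG fails because Redrock does not classify it as a QSO, and the second BGS target has neither sufficient QN confidence nor a MgII detection.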

## Basic Info (edit for choosing data release)

In [3]:
GQP_AGNcat_dir='/global/cfs/cdirs/desi/science/gqp/agncatalog/'

# Which spectroscopic release
#specprod = 'fuji'
#specprod = 'guadalupe'
specprod = 'iron'

## EDR version
if specprod=='fuji':
    
    #### QSO-maker file
    # Edmond's catalogue from QSO maker keeping all columns
    path_qsom = f'/global/cfs/cdirs/desi/users/edmondc/QSO_catalog/{specprod}/'  #NERSC
    file_qsom = path_qsom+f'QSO_cat_{specprod}_healpix_all_targets_v2.fits'
    
    # FastSpecFit file
    fast_dir = f'/global/cfs/cdirs/desi/spectro/fastspecfit/{specprod}/v3.2/catalogs/'
    fastspec_file = fast_dir+f'fastspec-{specprod}.fits'
    #fastphot_file = fast_dir+f'fastphot-{specprod}.fits'
    
    # Redshift catalog
    #file_zpix_sum_cat=dir_for_tmp+'zpix-'+specprod+'-summary.fits'
    # Using the public EDR version of the zcat VAC
    file_zpix_sum_cat = '/global/cfs/cdirs/desi/public/edr/vac/edr/zcat/fuji/v1.0/zall-pix-edr-vac.fits'

## DR1 version
if specprod=='iron':

    ## QSO-maker path (NOTE: from merge_QSOmaker.ipynb)
    path_qsom = f'/global/cfs/cdirs/desi/science/gqp/agncatalog/qsomaker/iron/'

    # Needed to make DR1 version after asking Edmond to run on all targets / all surveys
    file_qsom = path_qsom+'QSO_cat_iron_healpix_all_targets_v1.fits'

    # FastSpecFit file
    fast_dir = f'/global/cfs/cdirs/desi/spectro/fastspecfit/{specprod}/v2.1/catalogs/'
    # File with all DR1
    fastspec_file = fast_dir+f'fastspec-{specprod}.fits'

    # zcat
    file_zpix_sum_cat = '/global/cfs/cdirs/desi/spectro/redux/iron/zcatalog/v1/zall-pix-iron.fits'

    
## Put catalog in this temporary location then copy over to correct DR and 
## version number when happy with a new version
dir_for_cat=GQP_AGNcat_dir+'catalog/'
dir_for_tmp=GQP_AGNcat_dir+'tmp/'

## Choice to use healpix-based catalogs
filetype = 'healpix'
specgroup_type = 'zpix'
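
The cell above only defines input paths for `fuji` and `iron`, so selecting `guadalupe` would leave `file_qsom` and the other variables undefined. One possible way to make that failure explicit is a lookup table. The sketch below copies the paths from the cell above; the names `RELEASE_FILES` and `get_release_files` are hypothetical and not part of the notebook or of `py/`.

```python
# Paths copied from the configuration cell; only 'fuji' and 'iron' are set there.
RELEASE_FILES = {
    'fuji': {
        'file_qsom': '/global/cfs/cdirs/desi/users/edmondc/QSO_catalog/fuji/'
                     'QSO_cat_fuji_healpix_all_targets_v2.fits',
        'fastspec_file': '/global/cfs/cdirs/desi/spectro/fastspecfit/fuji/v3.2/catalogs/'
                         'fastspec-fuji.fits',
        'file_zpix_sum_cat': '/global/cfs/cdirs/desi/public/edr/vac/edr/zcat/fuji/v1.0/'
                             'zall-pix-edr-vac.fits',
    },
    'iron': {
        'file_qsom': '/global/cfs/cdirs/desi/science/gqp/agncatalog/qsomaker/iron/'
                     'QSO_cat_iron_healpix_all_targets_v1.fits',
        'fastspec_file': '/global/cfs/cdirs/desi/spectro/fastspecfit/iron/v2.1/catalogs/'
                         'fastspec-iron.fits',
        'file_zpix_sum_cat': '/global/cfs/cdirs/desi/spectro/redux/iron/zcatalog/v1/'
                             'zall-pix-iron.fits',
    },
}

def get_release_files(specprod):
    """Hypothetical helper: return the input files for a release or fail loudly."""
    try:
        return RELEASE_FILES[specprod]
    except KeyError:
        known = ', '.join(sorted(RELEASE_FILES))
        raise ValueError(f"No input files configured for '{specprod}' (known: {known})") from None

files = get_release_files('iron')
print(files['fastspec_file'])
```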

In [4]:
# Main identifiers for Joins
keys_for_join=['TARGETID','SURVEY','PROGRAM']

In [5]:
# Data model
#
# Question about whether to keep: 
#  - LS_ID - this one from FSF
#  - other Tractor cols (MORPHTYPE, MASKBITS, PHOTSYS)
#  - EBV_1 (where does this one come from? from joining two tables with EBV??)
#  - FIBERFLUX* and FIBERTOTFLUX*
#  + replaced HPXPIXEL with HEALPIX (updated name); added BGS targeting cols to find BGS_WISE
#  + mjd information min, mean, max, zcat and primary stuff: https://github.com/desihub/desispec/blob/master/py/desispec/zcatalog.py
#  + removed QSO_MASKBITS as repeated in AGN_MASKBITS
#  + where does the Z come from - FSF or QSO maker - make it FSF!
#
#final_cols=['TARGETID','SURVEY','PROGRAM','HEALPIX','Z','ZERR','ZWARN','SPECTYPE','COADD_FIBERSTATUS','TARGET_RA','TARGET_DEC',\
#            'MORPHTYPE','EBV_1','MASKBITS',\
#            'DESI_TARGET','SCND_TARGET','BGS_TARGET','COADD_NUMEXP','COADD_EXPTIME','CMX_TARGET',\
#            'SV1_DESI_TARGET','SV2_DESI_TARGET','SV3_DESI_TARGET','SV1_BGS_TARGET','SV2_BGS_TARGET','SV3_BGS_TARGET',\
#            'SV1_SCND_TARGET','SV2_SCND_TARGET','SV3_SCND_TARGET',\
#            'TSNR2_LYA','TSNR2_QSO','TSNR2_LRG',\
#            'DELTA_CHI2_MGII','A_MGII','SIGMA_MGII','B_MGII','VAR_A_MGII','VAR_SIGMA_MGII','VAR_B_MGII',\
#            'Z_RR','Z_QN','C_LYA','C_CIV','C_CIII','C_MgII','C_Hbeta','C_Halpha','Z_LYA','Z_CIV','Z_CIII','Z_MgII','Z_Hbeta','Z_Halpha',\
#            'SV_NSPEC','SV_PRIMARY','ZCAT_NSPEC','ZCAT_PRIMARY',\
#            'QN_C_LINE_BEST','QN_C_LINE_SECOND_BEST','QSO_MASKBITS','AGN_MASKBITS','AGN_TYPE',\
#            'PHOTSYS','LS_ID','FIBERFLUX_G','FIBERFLUX_R','FIBERFLUX_Z','FIBERTOTFLUX_G','FIBERTOTFLUX_R','FIBERTOTFLUX_Z',\
#            'MIN_MJD','MEAN_MJD','MAX_MJD']

## SJ: should we keep QN_C_LINE_BEST?
## SJ: Do we need 'MEAN_MJD'?
final_cols=['TARGETID', 'SURVEY', 'PROGRAM', 'HEALPIX', 'Z', 'ZERR', 'ZWARN', 'SPECTYPE', \
            'AGN_MASKBITS', 'OPT_UV_TYPE', 'IR_TYPE', 'COADD_FIBERSTATUS', \
            'TARGET_RA', 'TARGET_DEC', 'LS_ID', 'MIN_MJD','MEAN_MJD','MAX_MJD','COADD_NUMEXP', 'COADD_EXPTIME', \
            'SV_PRIMARY','ZCAT_PRIMARY','DESI_TARGET', 'SCND_TARGET', 'BGS_TARGET', 'CMX_TARGET', \
            'SV1_DESI_TARGET', 'SV2_DESI_TARGET', 'SV3_DESI_TARGET', 'SV1_BGS_TARGET', 'SV2_BGS_TARGET', 'SV3_BGS_TARGET', \
            'SV1_SCND_TARGET', 'SV2_SCND_TARGET', 'SV3_SCND_TARGET']

## Column for "convenience" extension with values for plotting
ext2_cols=['TARGETID','SURVEY','PROGRAM','LOGMSTAR',\
           'FLUX_W1','FLUX_W2','FLUX_W3',\
           'FLUX_IVAR_W1','FLUX_IVAR_W2','FLUX_IVAR_W3',\
           'CIV_1549_FLUX','CIV_1549_FLUX_IVAR', 'CIV_1549_SIGMA',\
           'MGII_2796_FLUX','MGII_2796_FLUX_IVAR','MGII_2796_SIGMA',\
           'MGII_2803_FLUX','MGII_2803_FLUX_IVAR', 'MGII_2803_SIGMA',\
           'OII_3726_FLUX','OII_3726_FLUX_IVAR','OII_3726_EW','OII_3726_EW_IVAR',\
           'OII_3729_FLUX','OII_3729_FLUX_IVAR','OII_3729_EW','OII_3729_EW_IVAR',\
           'NEV_3426_FLUX','NEV_3426_FLUX_IVAR',\
           'HEII_4686_FLUX','HEII_4686_FLUX_IVAR',\
           'HBETA_EW','HBETA_EW_IVAR','HBETA_FLUX','HBETA_FLUX_IVAR',\
           'HBETA_BROAD_FLUX', 'HBETA_BROAD_FLUX_IVAR', 'HBETA_BROAD_SIGMA','HBETA_BROAD_CHI2',\
           'OIII_5007_FLUX','OIII_5007_FLUX_IVAR','OIII_5007_SIGMA',\
           'OI_6300_FLUX','OI_6300_FLUX_IVAR',\
           'HALPHA_EW', 'HALPHA_EW_IVAR', 'HALPHA_FLUX','HALPHA_FLUX_IVAR', \
           'HALPHA_BROAD_FLUX','HALPHA_BROAD_FLUX_IVAR','HALPHA_BROAD_VSHIFT','HALPHA_BROAD_SIGMA',\
           'NII_6584_FLUX','NII_6584_FLUX_IVAR',\
           'SII_6716_FLUX','SII_6716_FLUX_IVAR',\
           'SII_6731_FLUX','SII_6731_FLUX_IVAR']

# SJ: removed: 'HALPHA_SIGMA'
#     added: 'HALPHA_EW_IVAR'

# Longer list (all that we use?)

# ext2_cols=['TARGETID','SURVEY','PROGRAM','LOGMSTAR',\
#            'CIV_1549_FLUX','CIV_1549_FLUX_IVAR', 'CIV_1549_SIGMA',\
#            'MGII_2796_FLUX','MGII_2796_FLUX_IVAR','MGII_2796_SIGMA',\
#            'MGII_2803_FLUX','MGII_2803_FLUX_IVAR', 'MGII_2803_SIGMA',\
#            'OII_3726_FLUX','OII_3726_FLUX_IVAR','OII_3726_EW',\
#            'NEV_3426_FLUX','NEV_3426_FLUX_IVAR',\
#            'HEII_4686_FLUX','HEII_4686_FLUX_IVAR',\
#            'HBETA_EW','HBETA_FLUX','HBETA_FLUX_IVAR',\
#            'HBETA_BROAD_FLUX', 'HBETA_BROAD_FLUX_IVAR', 'HBETA_BROAD_VSHIFT','HBETA_BROAD_SIGMA','HBETA_BROAD_CHI2',\
#            'OIII_5007_FLUX','OIII_5007_FLUX_IVAR','OIII_5007_SIGMA',\
#            'OI_6300_FLUX','OI_6300_FLUX_IVAR',\
#            'HALPHA_EW', 'HALPHA_FLUX','HALPHA_FLUX_IVAR','HALPHA_SIGMA', \
#            'HALPHA_BROAD_FLUX','HALPHA_BROAD_FLUX_IVAR','HALPHA_BROAD_VSHIFT','HALPHA_BROAD_SIGMA',\
#            'NII_6584_FLUX','NII_6584_FLUX_IVAR',\
#            'SII_6716_FLUX','SII_6716_FLUX_IVAR',\
#            'SII_6731_FLUX','SII_6731_FLUX_IVAR']

print(final_cols)

['TARGETID', 'SURVEY', 'PROGRAM', 'HEALPIX', 'Z', 'ZERR', 'ZWARN', 'SPECTYPE', 'AGN_MASKBITS', 'OPT_UV_TYPE', 'IR_TYPE', 'COADD_FIBERSTATUS', 'TARGET_RA', 'TARGET_DEC', 'LS_ID', 'MIN_MJD', 'MEAN_MJD', 'MAX_MJD', 'COADD_NUMEXP', 'COADD_EXPTIME', 'SV_PRIMARY', 'ZCAT_PRIMARY', 'DESI_TARGET', 'SCND_TARGET', 'BGS_TARGET', 'CMX_TARGET', 'SV1_DESI_TARGET', 'SV2_DESI_TARGET', 'SV3_DESI_TARGET', 'SV1_BGS_TARGET', 'SV2_BGS_TARGET', 'SV3_BGS_TARGET', 'SV1_SCND_TARGET', 'SV2_SCND_TARGET', 'SV3_SCND_TARGET']


# Define cuts that might be wanted

The functions below should be run on the final joined catalogs, because not all keywords exist otherwise.

We are not making any cuts currently. 

In [6]:
# def cut_fiberstatus(T):
#     ''' 
#     keep only objects with 'COADD_FIBERSTATUS' == 0
#     '''
#     keep = (T['COADD_FIBERSTATUS']==0)
#     return T[keep]

# def cut_npixels(T):
#     ''' 
#     keep only objects with 'NPIXELS' > 0 (signifying they have a coadded spectrum)
#     '''
#     keep = (T['NPIXELS']>0)
#     return T[keep]

# def cut_zwarn(T):
#     ''' 
#     keep only objects with                  (zb['ZWARN'] & ZWarningMask.NODATA == 0))[0]
# https://fastspecfit.readthedocs.io/en/latest/vacs.html#sample-selection
#     '''
#     keep = (T['ZWARN'] & ZWarningMask.NODATA ==0)  # might not be written correctly...
#     return T[keep]

def cut_objtype(T):
    ''' 
    keep only objects with 'OBJTYPE' == 'TGT'
    '''
    keep = (T['OBJTYPE']=='TGT')
    return T[keep]

def cut_lowz_star(T):
    ''' 
    keep only objects with redshift greater than 0.001
    '''
    keep = (T['Z']>0.001)
    return T[keep]

# def cut_lowz_galfragments(T):
#     ''' 
#     keep only objects with 
#     '''
#     keep = (T['']==)
#     return T[keep]    

### cuts added by EC but leaving in here for discussion with SJ

# #### Notes/Questions
# - How to treat objects that might have more than one target type?
# - Correct bump at z~3.7:
# ```
#     sel_pb_redshift = (QSO_cat['Z'] > 3.65) & ((QSO_cat['C_LYA']<0.95) | (QSO_cat['C_CIV']<0.95))
# ```

In [7]:
#test1=cut_objtype(T_qsom)
#test2=cut_lowz_star(T_qsom)
#print(len(test2))
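
Because the cuts depend on columns that only exist after the joins, it may help to skip a cut when its column is missing instead of raising a `KeyError`. The sketch below is one way to do that. `apply_cuts` is a hypothetical helper, and a NumPy structured array stands in for the astropy `Table` so the example runs on its own (for a `Table`, `T.colnames` would be the usual way to list columns).

```python
import numpy as np

def cut_objtype(T):
    """Keep only objects with 'OBJTYPE' == 'TGT' (same logic as the cell above)."""
    return T[T['OBJTYPE'] == 'TGT']

def cut_lowz_star(T):
    """Keep only objects with redshift greater than 0.001 (same logic as the cell above)."""
    return T[T['Z'] > 0.001]

def apply_cuts(T, cuts):
    """Hypothetical helper: apply (required_column, function) pairs in order,
    skipping any cut whose required column is not present."""
    for colname, func in cuts:
        if colname not in T.dtype.names:
            print(f"Skipping {func.__name__}: column '{colname}' not present")
            continue
        n_before = len(T)
        T = func(T)
        print(f"{func.__name__}: kept {len(T)} of {n_before} rows")
    return T

# Toy catalog without an OBJTYPE column, as before the join with the zcat VAC
toy = np.array([(1, 0.0002), (2, 0.35), (3, 2.1)],
               dtype=[('TARGETID', 'i8'), ('Z', 'f8')])
result = apply_cuts(toy, [('OBJTYPE', cut_objtype), ('Z', cut_lowz_star)])
```

Here the `OBJTYPE` cut is skipped with a message and only the low-redshift cut is applied.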

## 1. QSO-maker Cat

In [8]:
## SJ: will exclude the targeting cols because we'll add them from the zcat VAC instead 
#qsom_cols=['TARGETID','Z','ZERR','ZWARN','SPECTYPE','COADD_FIBERSTATUS','TARGET_RA','TARGET_DEC','MORPHTYPE','EBV','MASKBITS','DESI_TARGET','SCND_TARGET','COADD_NUMEXP','COADD_EXPTIME','CMX_TARGET','SV1_DESI_TARGET','SV2_DESI_TARGET','SV3_DESI_TARGET','SV1_SCND_TARGET','SV2_SCND_TARGET','SV3_SCND_TARGET','TSNR2_LYA','TSNR2_QSO','DELTA_CHI2_MGII','A_MGII','SIGMA_MGII','B_MGII','VAR_A_MGII','VAR_SIGMA_MGII','VAR_B_MGII','Z_RR','Z_QN','C_LYA','C_CIV','C_CIII','C_MgII','C_Hbeta','C_Halpha','Z_LYA','Z_CIV','Z_CIII','Z_MgII','Z_Hbeta','Z_Halpha','QSO_MASKBITS','HPXPIXEL','SURVEY','PROGRAM']

# Current choice for Iron
qsom_cols=['TARGETID','Z','ZERR','ZWARN','SPECTYPE','COADD_FIBERSTATUS','TARGET_RA','TARGET_DEC',\
           'MORPHTYPE','MASKBITS','COADD_NUMEXP','COADD_EXPTIME','TSNR2_LYA','TSNR2_QSO',\
           'Z_RR','Z_QN','C_LYA','C_CIV','C_CIII','C_MgII','C_Hbeta','C_Halpha',\
           'QSO_MASKBITS','SURVEY','PROGRAM']
# Trying without these (they're not used):  
#   'DELTA_CHI2_MGII','A_MGII','SIGMA_MGII','B_MGII','VAR_A_MGII','VAR_SIGMA_MGII','VAR_B_MGII',\
#   'Z_LYA','Z_CIV','Z_CIII','Z_MgII','Z_Hbeta','Z_Halpha'

print(qsom_cols)

['TARGETID', 'Z', 'ZERR', 'ZWARN', 'SPECTYPE', 'COADD_FIBERSTATUS', 'TARGET_RA', 'TARGET_DEC', 'MORPHTYPE', 'MASKBITS', 'COADD_NUMEXP', 'COADD_EXPTIME', 'TSNR2_LYA', 'TSNR2_QSO', 'Z_RR', 'Z_QN', 'C_LYA', 'C_CIV', 'C_CIII', 'C_MgII', 'C_Hbeta', 'C_Halpha', 'QSO_MASKBITS', 'SURVEY', 'PROGRAM']


In [9]:
%%time
T_qsom = Table(fitsio.read(file_qsom, columns=qsom_cols, ext=1)) 

CPU times: user 18.7 s, sys: 7.59 s, total: 26.3 s
Wall time: 26.3 s


In [10]:
print(len(T_qsom))

## Remove stars with a low redshift cut
T_qsom = cut_lowz_star(T_qsom)

print(len(T_qsom))
print(T_qsom.columns)

18260646
18165695
<TableColumns names=('TARGETID','Z','ZERR','ZWARN','SPECTYPE','COADD_FIBERSTATUS','TARGET_RA','TARGET_DEC','MORPHTYPE','MASKBITS','COADD_NUMEXP','COADD_EXPTIME','TSNR2_LYA','TSNR2_QSO','Z_RR','Z_QN','C_LYA','C_CIV','C_CIII','C_MgII','C_Hbeta','C_Halpha','QSO_MASKBITS','SURVEY','PROGRAM')>


## 2. Use the Redshift Summary (zcat) VAC

The Redshift Summary Catalog VAC supersedes the original redshift catalogs for Fuji (EDR) as described [here](https://data.desi.lbl.gov/doc/releases/edr/vac/zcat/).

In [11]:
# from https://github.com/desihub/desispec/blob/master/py/desispec/zcatalog.py
# for fuji 2Gb file (approx 5 mins)
if os.path.isfile(file_zpix_sum_cat):
   print('zpix summary file exists - using existing copy')
else:
   print('ERROR: zpix summary file NOT FOUND')
#   print('Creating zpix summary file - this is a couple of Gb for fuji and may take 5 mins')
#   create_summary_catalog(specprod, specgroup = specgroup_type, all_columns = True, columns_list = None, output_filename = file_zpix_sum_cat)

zpix summary file exists - using existing copy


In [12]:
%%time
if specprod == 'fuji':
    need_cols = ['TARGETID','SURVEY','PROGRAM','HEALPIX','TSNR2_LRG','SV_NSPEC','SV_PRIMARY',\
                 'ZCAT_NSPEC','ZCAT_PRIMARY','MIN_MJD','MEAN_MJD','MAX_MJD', 'OBJTYPE'] # fuji
if specprod == 'guadalupe':
    need_cols = ['TARGETID','SURVEY','PROGRAM','HEALPIX','TSNR2_LRG','ZCAT_NSPEC','ZCAT_PRIMARY',\
                 'MIN_MJD','MEAN_MJD','MAX_MJD','OBJTYPE'] # guadalupe
    # for guadalupe, iron, etc.: also add MAIN_PRIMARY and MAIN_NSPEC
if specprod == 'iron':
    need_cols = ['TARGETID','SURVEY','PROGRAM','HEALPIX','TSNR2_LRG','ZCAT_NSPEC','ZCAT_PRIMARY',\
                 'SV_NSPEC','SV_PRIMARY','MAIN_PRIMARY','MAIN_NSPEC','MIN_MJD','MEAN_MJD','MAX_MJD','OBJTYPE']    
    
# Replace targeting columns with updated version from zcat VAC (for Fuji), keeping BGS to find BGS_WISE targets
target_cols = ['DESI_TARGET','BGS_TARGET','SCND_TARGET','CMX_TARGET','SV1_DESI_TARGET','SV1_BGS_TARGET','SV1_SCND_TARGET',\
              'SV2_DESI_TARGET','SV2_BGS_TARGET','SV2_SCND_TARGET','SV3_DESI_TARGET','SV3_BGS_TARGET','SV3_SCND_TARGET']

## SJ: for faster performance, only read the desired columns with fitsio() from the zpix_sum file
T_zpixsum_cut = Table(fitsio.read(file_zpix_sum_cat, columns=need_cols+target_cols, ext=1))

CPU times: user 30.4 s, sys: 21 s, total: 51.4 s
Wall time: 51.5 s


In [13]:
%%time
## This is slow! Would it be faster to work with Pandas DataFrames?
## Takes >2min for Iron/DR1 without indexing tables
T_qsom_zpixsum = join(T_qsom, T_zpixsum_cut, keys=keys_for_join, join_type='left')

CPU times: user 1min 59s, sys: 13.5 s, total: 2min 12s
Wall time: 2min 12s


In [14]:
# making sure this is the same size as before
print(len(T_qsom_zpixsum))
if len(T_qsom) == len(T_qsom_zpixsum):
    print('Same length - all good')
else:
    print('The joined QSO maker and summary catalog df is not the same size as the QSO maker catalog')
    print('Something went wrong!')

18165695
Same length - all good
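
A left join keeps every row of the left table, so its length can only change if a key combination appears more than once in the right table. If the length check above ever fails, looking for duplicated (TARGETID, SURVEY, PROGRAM) combinations is a natural first step. The sketch below uses a plain `Counter` on toy arrays. `duplicated_keys` is a hypothetical helper, and for a catalog of this size a vectorized approach would probably be preferable.

```python
from collections import Counter

import numpy as np

def duplicated_keys(targetid, survey, program):
    """Hypothetical helper: return {key: count} for key combinations seen more than once."""
    keys = zip(np.asarray(targetid).tolist(),
               np.asarray(survey).tolist(),
               np.asarray(program).tolist())
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}

# Toy right-hand table in which one key combination is repeated
targetid = np.array([1, 2, 2, 3])
survey = np.array(['sv1', 'sv1', 'sv1', 'main'])
program = np.array(['dark', 'dark', 'dark', 'bright'])

dups = duplicated_keys(targetid, survey, program)
print(dups)
```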


In [15]:
# free memory
del T_qsom
del T_zpixsum_cut

In [18]:
## Already applied for Iron/DR1 but will run anyway for now
T_qsom_zpixsum = cut_objtype(T_qsom_zpixsum)
print(len(T_qsom_zpixsum))

18165695


In [19]:
## SJ: Where is this used??
# it's now added to QSO-maker for Iron (DR1) so could read it and save it

## Adding two columns we need for the cuts
#a = np.array([T_qsom_zpixsum['C_LYA'], T_qsom_zpixsum['C_CIV'], T_qsom_zpixsum['C_CIII'], \
#              T_qsom_zpixsum['C_MgII'], T_qsom_zpixsum['C_Hbeta'], T_qsom_zpixsum['C_Halpha']])
#T_qsom_zpixsum['QN_C_LINE_BEST'] = [max(l) for l in (a.T).tolist()]
#T_qsom_zpixsum['QN_C_LINE_SECOND_BEST'] = [sorted(l)[-2] for l in (a.T).tolist()]
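
If these two columns ever need to be regenerated, the commented-out list comprehensions above loop over every row in Python, which is slow for millions of objects. A vectorized alternative is to sort the line confidences along the line axis and take the last two values. The sketch below uses toy numbers; with the real table, `conf` could be assembled with `np.column_stack` from the six `C_*` columns (an assumption about how one would wire it in, not something the notebook does).

```python
import numpy as np

# Toy confidences: one row per object, one column per line
# (C_LYA, C_CIV, C_CIII, C_MgII, C_Hbeta, C_Halpha)
conf = np.array([[0.10, 0.99, 0.97, 0.20, 0.00, 0.00],
                 [0.01, 0.02, 0.03, 0.65, 0.40, 0.05]])

# Sort each row in ascending order; the best value is last, the runner-up second to last
conf_sorted = np.sort(conf, axis=1)
qn_c_line_best = conf_sorted[:, -1]
qn_c_line_second_best = conf_sorted[:, -2]

print(qn_c_line_best, qn_c_line_second_best)
```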

## 3. Join FastSpecFit

Original list but let's remove anything unnecessary:
```
fsf_data_cols=['TARGETID','SURVEY','PROGRAM','LOGMSTAR',\
               'CIV_1549_FLUX','CIV_1549_FLUX_IVAR', 'CIV_1549_VSHIFT','CIV_1549_SIGMA',\
               'MGII_2796_FLUX','MGII_2796_FLUX_IVAR', 'MGII_2796_VSHIFT','MGII_2796_SIGMA',\
               'MGII_2803_FLUX','MGII_2803_FLUX_IVAR', 'MGII_2803_SIGMA',\
               'OII_3726_FLUX','OII_3726_FLUX_IVAR','OII_3726_EW','OII_3726_EW_IVAR',\
               'OII_3729_FLUX','OII_3729_FLUX_IVAR','OII_3729_EW','OII_3729_EW_IVAR',\
               'NEV_3426_FLUX','NEV_3426_FLUX_IVAR','NEV_3426_VSHIFT','NEV_3426_SIGMA',\
               'HEII_4686_FLUX','HEII_4686_FLUX_IVAR',\
               'HBETA_EW','HBETA_EW_IVAR','HBETA_FLUX','HBETA_FLUX_IVAR','HBETA_VSHIFT','HBETA_SIGMA',\
               'HBETA_BROAD_FLUX', 'HBETA_BROAD_FLUX_IVAR', 'HBETA_BROAD_VSHIFT','HBETA_BROAD_SIGMA','HBETA_BROAD_CHI2',\
               'OIII_5007_FLUX','OIII_5007_FLUX_IVAR','OIII_5007_VSHIFT','OIII_5007_SIGMA',\
               'OI_6300_FLUX','OI_6300_FLUX_IVAR','OI_6300_VSHIFT','OI_6300_SIGMA',\
               'HALPHA_EW', 'HALPHA_FLUX','HALPHA_FLUX_IVAR','HALPHA_VSHIFT','HALPHA_SIGMA', \
               'HALPHA_BROAD_FLUX','HALPHA_BROAD_FLUX_IVAR','HALPHA_BROAD_VSHIFT','HALPHA_BROAD_SIGMA',\
               'NII_6584_FLUX','NII_6584_FLUX_IVAR','NII_6584_VSHIFT','NII_6584_SIGMA',\
               'SII_6716_FLUX','SII_6716_FLUX_IVAR','SII_6716_VSHIFT','SII_6716_SIGMA',\
               'SII_6731_FLUX','SII_6731_FLUX_IVAR','SII_6731_VSHIFT','SII_6731_SIGMA']
```

In [20]:
## SJ: downselected closer to the list for Extension 2 (e.g., removed the VSHIFT and unnecessary cols)
fsf_data_cols=['TARGETID','SURVEY','PROGRAM','LOGMSTAR',\
               'CIV_1549_FLUX','CIV_1549_FLUX_IVAR', 'CIV_1549_SIGMA',\
               'MGII_2796_FLUX','MGII_2796_FLUX_IVAR', 'MGII_2796_SIGMA',\
               'MGII_2803_FLUX','MGII_2803_FLUX_IVAR', 'MGII_2803_SIGMA',\
               'OII_3726_FLUX','OII_3726_FLUX_IVAR','OII_3726_EW','OII_3726_EW_IVAR',\
               'OII_3729_FLUX','OII_3729_FLUX_IVAR','OII_3729_EW','OII_3729_EW_IVAR',\
               'NEV_3426_FLUX','NEV_3426_FLUX_IVAR',\
               'HEII_4686_FLUX','HEII_4686_FLUX_IVAR',\
               'HBETA_EW','HBETA_EW_IVAR','HBETA_FLUX','HBETA_FLUX_IVAR',\
               'HBETA_BROAD_FLUX', 'HBETA_BROAD_FLUX_IVAR', 'HBETA_BROAD_SIGMA','HBETA_BROAD_CHI2',\
               'OIII_5007_FLUX','OIII_5007_FLUX_IVAR','OIII_5007_SIGMA',\
               'OI_6300_FLUX','OI_6300_FLUX_IVAR',\
               'HALPHA_EW','HALPHA_EW_IVAR', 'HALPHA_FLUX','HALPHA_FLUX_IVAR', \
               'HALPHA_BROAD_FLUX','HALPHA_BROAD_FLUX_IVAR','HALPHA_BROAD_VSHIFT','HALPHA_BROAD_SIGMA',\
               'NII_6584_FLUX','NII_6584_FLUX_IVAR',\
               'SII_6716_FLUX','SII_6716_FLUX_IVAR',\
               'SII_6731_FLUX','SII_6731_FLUX_IVAR']

fsf_meta_cols=['TARGETID','SURVEY','PROGRAM','PHOTSYS','LS_ID',\
               'FIBERFLUX_G','FIBERFLUX_R','FIBERFLUX_Z','FIBERTOTFLUX_G','FIBERTOTFLUX_R','FIBERTOTFLUX_Z',\
               'FLUX_G','FLUX_R','FLUX_Z','FLUX_W1','FLUX_W2','FLUX_W3','FLUX_W4',\
               'FLUX_IVAR_G','FLUX_IVAR_R','FLUX_IVAR_Z','FLUX_IVAR_W1','FLUX_IVAR_W2','FLUX_IVAR_W3','FLUX_IVAR_W4',\
               'EBV','MW_TRANSMISSION_G','MW_TRANSMISSION_R','MW_TRANSMISSION_Z',\
               'MW_TRANSMISSION_W1','MW_TRANSMISSION_W2','MW_TRANSMISSION_W3','MW_TRANSMISSION_W4']

In [21]:
%%time
## Directly only read the columns of interest (faster)
fsf_t_cut = Table(fitsio.read(fastspec_file, columns=fsf_data_cols, ext=1))
fsf_m_cut = Table(fitsio.read(fastspec_file, columns=fsf_meta_cols, ext=2))

CPU times: user 52.6 s, sys: 1min 1s, total: 1min 53s
Wall time: 5min 20s


In [22]:
%%time
## Really slow with join; is it faster to use hstack? Yes, much faster!
T_fsf_cut = hstack([fsf_m_cut, fsf_t_cut], join_type='inner')
#T_fsf_cut = join(fsf_m_cut, fsf_t_cut, join_type='left', keys=keys_for_join)


CPU times: user 907 ms, sys: 4.44 s, total: 5.34 s
Wall time: 5.45 s
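
`hstack` pairs rows by position rather than by key, so it is only equivalent to the join if both FastSpecFit extensions list the same objects in the same order. A cheap check before stacking could look like the sketch below. `rows_aligned` is a hypothetical helper, shown on NumPy structured arrays so that it runs without the catalog files.

```python
import numpy as np

def rows_aligned(left, right, keys=('TARGETID', 'SURVEY', 'PROGRAM')):
    """Hypothetical helper: True if both tables have identical key values row by row."""
    if len(left) != len(right):
        return False
    return all(bool(np.array_equal(left[k], right[k])) for k in keys)

dtype = [('TARGETID', 'i8'), ('SURVEY', 'U7'), ('PROGRAM', 'U6')]
meta = np.array([(1, 'sv1', 'dark'), (2, 'sv1', 'dark')], dtype=dtype)
data_same = meta.copy()
data_swapped = meta[::-1].copy()   # same objects, different row order

print(rows_aligned(meta, data_same))     # row order matches
print(rows_aligned(meta, data_swapped))  # row order differs
```

If the check fails, falling back to the commented-out key-based `join` would be the safer option, at the cost of run time.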


In [23]:
T_fsf_cut.remove_columns(['TARGETID_2','SURVEY_2','PROGRAM_2'])

In [24]:
T_fsf_cut.rename_columns(['TARGETID_1','SURVEY_1','PROGRAM_1'],\
                         ['TARGETID','SURVEY','PROGRAM'])

In [25]:
# free memory
del fsf_m_cut
del fsf_t_cut

In [26]:
print(len(T_fsf_cut))
T_fsf_cut[:4]

17995820


TARGETID,SURVEY,PROGRAM,PHOTSYS,LS_ID,FIBERFLUX_G,FIBERFLUX_R,FIBERFLUX_Z,FIBERTOTFLUX_G,FIBERTOTFLUX_R,FIBERTOTFLUX_Z,FLUX_G,FLUX_R,FLUX_Z,FLUX_W1,FLUX_W2,FLUX_W3,FLUX_W4,FLUX_IVAR_G,FLUX_IVAR_R,FLUX_IVAR_Z,FLUX_IVAR_W1,FLUX_IVAR_W2,FLUX_IVAR_W3,FLUX_IVAR_W4,EBV,MW_TRANSMISSION_G,MW_TRANSMISSION_R,MW_TRANSMISSION_Z,MW_TRANSMISSION_W1,MW_TRANSMISSION_W2,MW_TRANSMISSION_W3,MW_TRANSMISSION_W4,LOGMSTAR,CIV_1549_FLUX,CIV_1549_FLUX_IVAR,CIV_1549_SIGMA,MGII_2796_FLUX,MGII_2796_FLUX_IVAR,MGII_2796_SIGMA,MGII_2803_FLUX,MGII_2803_FLUX_IVAR,MGII_2803_SIGMA,NEV_3426_FLUX,NEV_3426_FLUX_IVAR,OII_3726_FLUX,OII_3726_FLUX_IVAR,OII_3726_EW,OII_3726_EW_IVAR,OII_3729_FLUX,OII_3729_FLUX_IVAR,OII_3729_EW,OII_3729_EW_IVAR,HEII_4686_FLUX,HEII_4686_FLUX_IVAR,HBETA_FLUX,HBETA_FLUX_IVAR,HBETA_EW,HBETA_EW_IVAR,HBETA_BROAD_FLUX,HBETA_BROAD_FLUX_IVAR,HBETA_BROAD_SIGMA,HBETA_BROAD_CHI2,OIII_5007_FLUX,OIII_5007_FLUX_IVAR,OIII_5007_SIGMA,OI_6300_FLUX,OI_6300_FLUX_IVAR,HALPHA_FLUX,HALPHA_FLUX_IVAR,HALPHA_EW,HALPHA_EW_IVAR,HALPHA_BROAD_FLUX,HALPHA_BROAD_FLUX_IVAR,HALPHA_BROAD_VSHIFT,HALPHA_BROAD_SIGMA,NII_6584_FLUX,NII_6584_FLUX_IVAR,SII_6716_FLUX,SII_6716_FLUX_IVAR,SII_6731_FLUX,SII_6731_FLUX_IVAR
int64,str7,str6,str1,int64,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32
6448025174016,sv1,dark,S,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02702388,0.9231658,0.94756305,0.97030807,0.9954307,0.99719137,0.9994003,0.9997735,5.7503724,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.6238377,0.85704327,17.111118,0.007470975,1.807744,0.8405532,8.535847,0.017309776,0.309453,7.294379,1.4292774,9.880638,5.5711555,0.46141025,0.0,0.0,0.0,0.0,1.3321167,10.856768,19.122723,0.18862273,27.341253,4.604713,28.085308,28.084356,0.046584874,0.0,0.0,0.0,0.0,0.24049304,31.402851,0.8449386,19.053684,0.5540397,25.671543
6515536691200,sv1,dark,S,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01879585,0.9459128,0.96323067,0.9792538,0.99681973,0.9980457,0.9995829,0.99984246,10.368333,7.8016224,0.024368886,1541.6433,0.72361237,0.20298417,1725.48,0.29113925,0.20255274,1725.48,0.09566734,4.7798166,0.19582722,1.3057101,1.062753,0.043683283,3.706198,0.30292132,17.849749,0.008760508,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6521555517440,sv1,dark,S,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010616658,0.9690802,0.9790621,0.9882283,0.9982024,0.99889565,0.9997644,0.999911,5.365223,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.2538439,0.5783931,-18.110195,0.00035003977,1.0811018,0.6222537,3.2539685,0.057409715,0.82116187,4.4459853,0.5218625,6.4645667,4.451909,0.06359003,0.0,0.0,0.0,0.0,0.03933927,6.509375,22.468706,0.7771606,8.315614,1.8644084,25.459866,11.663673,0.16935621,0.0,0.0,0.0,0.0,0.2906037,27.160423,0.5697137,27.087254,0.25842628,30.182331
6536638234624,sv1,dark,S,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019745098,0.9432602,0.96141005,0.9782176,0.9966594,0.9979471,0.9995618,0.99983454,7.7580156,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.3897076,0.62566507,28.633286,0.00029584387,4.8107038,0.6180273,21.842768,0.007200804,0.0,0.0,2.0843449,4.329796,5.552319,0.36292842,0.0,0.0,0.0,0.0,3.316977,2.3238227,67.17602,0.36053726,8.516312,5.595137,12.43606,15.255598,0.51686275,0.0,0.0,0.0,0.0,0.3940954,7.620519,1.802883,8.351368,3.8377724e-08,2.1409955


In [27]:
%%time
T_qsom_zpixsum.add_index(['TARGETID','SURVEY','PROGRAM'])
T_fsf_cut.add_index(['TARGETID','SURVEY','PROGRAM'])

CPU times: user 4.15 s, sys: 3.96 s, total: 8.11 s
Wall time: 8.2 s


In [28]:
%%time
T = join(T_qsom_zpixsum, T_fsf_cut, join_type='left', keys=keys_for_join)
print(len(T))

18165695
CPU times: user 2min 43s, sys: 20.4 s, total: 3min 4s
Wall time: 3min 3s


In [29]:
print(len(T_fsf_cut))

17995820


In [31]:
# free memory (guarded so that re-running this cell does not raise NameError)
for _name in ('T_fsf_cut', 'T_qsom_zpixsum'):
    globals().pop(_name, None)

## 5. Read Yaml

In [32]:
import importlib
importlib.reload(set_agn_masksDESI)
from set_agn_masksDESI import get_agn_maskbits
from set_agn_masksDESI import update_AGN_MASKBITS

AGN_MASKBITS, OPT_UV_TYPE, IR_TYPE = get_agn_maskbits('../py/agnmask.yaml')

In [33]:
AGN_MASKBITS

AGN_MASKBITS:
  - [AGN_ANY,          0, "any AGN classification is set"]
  - [RR,               1, "RR determines this to be a QSO from template fitting"]
  - [MGII,             2, "MgII afterburner detects broad line"]
  - [QN,               3, "Quasar Net reclassifies as a QSO"]
  - [QN_NEW_RR,        4, "Quasar Net prompts different RR redshift"]
  - [QN_BGS,           5, "Quasar Net reclassifies BGS target as a QSO"]
  - [QN_ELG,           6, "Quasar Net reclassifies ELG target as a QSO"]
  - [QN_VAR_WISE,      7, "Quasar Net reclassifies VAR_WISE_QSO target as a QSO"]
  - [BPT_ANY_SY,      10, "At least one BPT diagnostic indicates SEYFERT (robust AGN)"]
  - [BPT_ANY_AGN,     11, "At least one BPT diagnostic indicates SEYFERT, LINER or COMPOSITE"]
  - [BROAD_LINE,      12, "Lines with FWHM >=1200 km/s in Halpha, Hbeta, MgII and/or CIV line"]
  - [OPT_OTHER_AGN,   13, "Rest frame optical emission lines diagnostic not BPT (4000-10000 ang) indicate AGN"]
  - [UV,              14, "Re

In [34]:
OPT_UV_TYPE

OPT_UV_TYPE:
  - [NII_BPT,          0, "NII BPT diagnostic is available (update_AGNTYPE_NIIBPT)"]
  - [NII_SF,           1, "NII BPT Star-forming (update_AGNTYPE_NIIBPT)"]
  - [NII_COMP,         2, "NII BPT Composite (update_AGNTYPE_NIIBPT)"]
  - [NII_SY,           3, "NII BPT Seyfert (update_AGNTYPE_NIIBPT)"]
  - [NII_LINER,        4, "NII BPT LINER (update_AGNTYPE_NIIBPT)"]
  - [SII_BPT,          5, "SII BPT diagnostic is available (update_AGNTYPE_SIIBPT)"]
  - [SII_SF,           6, "SII BPT Star-forming (update_AGNTYPE_SIIBPT)"]
  - [SII_SY,           7, "SII BPT Seyfert (update_AGNTYPE_SIIBPT)"]
  - [SII_LINER,        8, "SII BPT LINER (update_AGNTYPE_SIIBPT)"]
  - [OI_BPT,           9, "OI BPT diagnostic is available (update_AGNTYPE_OIBPT)"]
  - [OI_SF,           10, "OI BPT Star-forming (update_AGNTYPE_OIBPT)"]
  - [OI_SY,           11, "OI BPT Seyfert (update_AGNTYPE_OIBPT)"]
  - [OI_LINER,        12, "OI BPT LINER (update_AGNTYPE_OIBPT)"]
  - [WHAN,            13, "WHAN is avai

In [35]:
IR_TYPE

IR_TYPE:
  - [WISE_W12,         0, "WISE W1 and W2 available (update_AGNTYPE_WISE_colors)"]
  - [WISE_W123,        1, "WISE W1, W2 and W3 available"]
  - [WISE_AGN_J11,     2, "WISE diagnostic Jarrett et al. 2011 is AGN (based on W1,W2,W3)"]
  - [WISE_SF_J11,      3, "WISE diagnostic Jarrett et al. 2011 is not an AGN (based on W1,W2,W3)"]
  - [WISE_AGN_S12,     4, "WISE diagnostic Stern et al. 2012 is AGN (based on W1,W2)"]
  - [WISE_SF_S12,      5, "WISE diagnostic Stern et al. 2012 is not an AGN (based on W1,W2)"]
  - [WISE_AGN_M12,     6, "WISE diagnostic Mateos et al. 2012 is AGN (based on W1,W2,W3)"]
  - [WISE_SF_M12,      7, "WISE diagnostic Mateos et al. 2012 is not an AGN (based on W1,W2,W3)"]
  - [WISE_AGN_A18,     8, "WISE diagnostic Assef et al. 2018 is AGN (based on W1,W2)"]
  - [WISE_SF_A18,      9, "WISE diagnostic Assef et al. 2018 is not an AGN (based on W1,W2)"]
  - [WISE_AGN_Y20,    10, "WISE diagnostic Yao et al. 2020 is AGN (based on W1,W2,W3)"]
  - [WISE_SF_Y20,   
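
Each entry in the listings above has the form [name, bit number, comment]. Assuming the usual bitmask convention, where a set bit `n` contributes `2**n` to the stored integer, the columns can be read as in the sketch below. The `BITS` dictionary is a hand-copied subset of the AGN_MASKBITS listing, used here as a stand-in; it is not the object returned by `get_agn_maskbits`, and `mask_value` is a hypothetical helper.

```python
import numpy as np

# Bit numbers copied from the AGN_MASKBITS listing above (subset only)
BITS = {'AGN_ANY': 0, 'RR': 1, 'MGII': 2, 'QN': 3, 'BPT_ANY_SY': 10}

def mask_value(*names):
    """Hypothetical helper: integer with the named bits set."""
    value = 0
    for name in names:
        value |= 1 << BITS[name]
    return value

# Toy AGN_MASKBITS column for four objects
agn_maskbits = np.array([
    0,                                    # no classification set
    mask_value('AGN_ANY', 'RR'),          # 1 + 2
    mask_value('AGN_ANY', 'QN', 'MGII'),  # 1 + 8 + 4
    mask_value('AGN_ANY', 'BPT_ANY_SY'),  # 1 + 1024
], dtype=np.int64)

# Test individual bits with a bitwise AND
is_rr = (agn_maskbits & mask_value('RR')) != 0
is_any = (agn_maskbits & mask_value('AGN_ANY')) != 0
print(agn_maskbits, is_rr, is_any)
```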

## 6. Set QSO_MASKBITS part of AGN_MASKBITS

In [36]:
%%time
T = update_AGN_MASKBITS(T, AGN_MASKBITS, snr=3, snrOI=1, Kewley01=False, mask=None)

CPU times: user 29.5 s, sys: 14.8 s, total: 44.3 s
Wall time: 46.1 s


In [37]:
T.columns

<TableColumns names=('TARGETID','Z','ZERR','ZWARN','SPECTYPE','COADD_FIBERSTATUS','TARGET_RA','TARGET_DEC','MORPHTYPE','MASKBITS','COADD_NUMEXP','COADD_EXPTIME','TSNR2_LYA','TSNR2_QSO','Z_RR','Z_QN','C_LYA','C_CIV','C_CIII','C_MgII','C_Hbeta','C_Halpha','QSO_MASKBITS','SURVEY','PROGRAM','HEALPIX','OBJTYPE','CMX_TARGET','DESI_TARGET','BGS_TARGET','SCND_TARGET','SV1_DESI_TARGET','SV1_BGS_TARGET','SV1_SCND_TARGET','SV2_DESI_TARGET','SV2_BGS_TARGET','SV2_SCND_TARGET','SV3_DESI_TARGET','SV3_BGS_TARGET','SV3_SCND_TARGET','MIN_MJD','MAX_MJD','MEAN_MJD','TSNR2_LRG','MAIN_NSPEC','MAIN_PRIMARY','SV_NSPEC','SV_PRIMARY','ZCAT_NSPEC','ZCAT_PRIMARY','PHOTSYS','LS_ID','FIBERFLUX_G','FIBERFLUX_R','FIBERFLUX_Z','FIBERTOTFLUX_G','FIBERTOTFLUX_R','FIBERTOTFLUX_Z','FLUX_G','FLUX_R','FLUX_Z','FLUX_W1','FLUX_W2','FLUX_W3','FLUX_W4','FLUX_IVAR_G','FLUX_IVAR_R','FLUX_IVAR_Z','FLUX_IVAR_W1','FLUX_IVAR_W2','FLUX_IVAR_W3','FLUX_IVAR_W4','EBV','MW_TRANSMISSION_G','MW_TRANSMISSION_R','MW_TRANSMISSION_Z','MW_TRANSM

In [None]:
## Sanity check: are there several different values for the new AGN_MASKBITS column?
print(np.unique(T['AGN_MASKBITS']))

## 7. Set diagnostic bits 

In [38]:
from set_agn_masksDESI import *

In [39]:
%%time
## OPT_UV_TYPE
T = update_AGNTYPE_NIIBPT(T, OPT_UV_TYPE, snr=3, mask=None)
T = update_AGNTYPE_SIIBPT(T, OPT_UV_TYPE, snr=3, Kewley01=False, mask=None)
T = update_AGNTYPE_OIBPT(T, OPT_UV_TYPE, snr=3, snrOI=1, Kewley01=False, mask=None)
T = update_AGNTYPE_WHAN(T, OPT_UV_TYPE, snr=3, mask=None)
T = update_AGNTYPE_BLUE(T, OPT_UV_TYPE, snr=3, snrOII=3, mask=None)
T = update_AGNTYPE_MEX(T, OPT_UV_TYPE, snr=3, mask=None)
T = update_AGNTYPE_KEX(T, OPT_UV_TYPE, snr=3, mask=None)
T = update_AGNTYPE_HeII(T, OPT_UV_TYPE, snr=3, mask=None)
T = update_AGNTYPE_NeV(T, OPT_UV_TYPE, snr=2.5, mask=None)
## IR_TYPE
T = update_AGNTYPE_WISE_colors(T, IR_TYPE, snr=3, mask=None)

CPU times: user 1min 1s, sys: 9.76 s, total: 1min 11s
Wall time: 1min 13s


In [None]:
print(np.unique(T['OPT_UV_TYPE']))

In [None]:
## Example case: count the objects flagged with the [NII]-BPT star-forming bit (NII_SF)
is_nii_sf = (T['OPT_UV_TYPE'] & OPT_UV_TYPE.NII_SF) != 0
len(T[is_nii_sf])

In [None]:
T.columns

## 8. Join multiwavelength surveys

### NOTE: this should be done OUTSIDE of this workflow and could be run once on all targets (observed and unobserved)

In [None]:
# sdss, X-ray, IR
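
A positional cross-match is the usual first step for joining an external survey. The sketch below is a brute-force, NumPy-only nearest-neighbour match and is only meant to show the logic on small arrays. The function names, the toy coordinates and the 1 arcsec radius are assumptions, not choices made for this catalog. For full catalogs, `astropy.coordinates.SkyCoord.match_to_catalog_sky` is the more scalable tool.

```python
import numpy as np


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Haversine angular separation in degrees; inputs in degrees, broadcastable."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def nearest_match(ra, dec, ra_ext, dec_ext, radius_arcsec=1.0):
    """For each (ra, dec), find the nearest external source and flag matches within the radius."""
    ra = np.atleast_1d(np.asarray(ra, dtype=float))
    dec = np.atleast_1d(np.asarray(dec, dtype=float))
    ra_ext = np.atleast_1d(np.asarray(ra_ext, dtype=float))
    dec_ext = np.atleast_1d(np.asarray(dec_ext, dtype=float))
    # Full N x M separation matrix: fine for a demo, too memory-hungry for large catalogs
    sep = angular_sep_deg(ra[:, None], dec[:, None], ra_ext[None, :], dec_ext[None, :])
    idx = np.argmin(sep, axis=1)
    sep_arcsec = sep[np.arange(len(ra)), idx] * 3600.0
    return idx, sep_arcsec, sep_arcsec <= radius_arcsec


# Toy coordinates (degrees); the third target has no counterpart
ra_t, dec_t = [10.0, 150.0, 300.0], [0.0, 2.0, 40.0]
ra_x, dec_x = [150.0001, 10.0, 200.0], [2.0, 0.0001, -5.0]
idx, sep_arcsec, matched = nearest_match(ra_t, dec_t, ra_x, dec_x, radius_arcsec=1.0)
print(idx, sep_arcsec, matched)
```

Only rows where `matched` is true should take values from the external catalog; `idx` is still filled for unmatched rows and must be ignored there.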

## 9. Save catalog

In [40]:
## Units for Extension 1
cols_ext1 = ['TARGET_RA', 'TARGET_DEC', 'MEAN_MJD', 'MIN_MJD', 'MAX_MJD', 'COADD_EXPTIME']
units_ext1 = ['deg', 'deg', 'd', 'd', 'd', 's']
units_dict1 = dict(zip(cols_ext1,units_ext1))

units_dict1

{'TARGET_RA': 'deg',
 'TARGET_DEC': 'deg',
 'MEAN_MJD': 'd',
 'MIN_MJD': 'd',
 'MAX_MJD': 'd',
 'COADD_EXPTIME': 's'}

In [41]:
%%time
T_final_cols = T[final_cols]

CPU times: user 743 ms, sys: 2.76 s, total: 3.5 s
Wall time: 3.55 s


In [42]:
%%time
T_sec_ext = T[ext2_cols]

CPU times: user 804 ms, sys: 3.69 s, total: 4.5 s
Wall time: 4.58 s


In [44]:
## annotate_table only works for extension 1 (the file will need to be re-written for ext 2)
T_final_cols = annotate_table(T_final_cols, units_dict1)

INFO:annotate.py:306:annotate_table: Column 'TARGETID' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'SURVEY' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'PROGRAM' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'HEALPIX' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'Z' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'ZERR' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'ZWARN' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'SPECTYPE' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'AGN_MASKBITS' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'OPT_UV_TYPE' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'IR_TYPE' not found in units argument.
INFO:annotate.py:306:annotate_table: Column 'COADD_FIBERSTATUS' not found in units argum
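
The INFO messages above list columns without an entry in the units dictionary. A quick check before annotating can show the same information in one place, together with any dictionary keys that do not correspond to a column. This is a minimal pure-Python sketch; the helper name `split_by_units` is hypothetical and the column list is a short toy subset.

```python
def split_by_units(colnames, units):
    """Return (columns with no units entry, units keys that match no column)."""
    colnames = list(colnames)
    missing = [c for c in colnames if c not in units]
    unused = [k for k in units if k not in colnames]
    return missing, unused


# Toy subset of column names and a units dictionary with one stray key
toy_cols = ['TARGETID', 'Z', 'TARGET_RA', 'TARGET_DEC', 'COADD_EXPTIME']
toy_units = {'TARGET_RA': 'deg', 'TARGET_DEC': 'deg', 'COADD_EXPTIME': 's', 'MEAN_MJD': 'd'}

missing, unused = split_by_units(toy_cols, toy_units)
print("No units:", missing)
print("Unused keys:", unused)
```

In the notebook this could be called as `split_by_units(T_final_cols.colnames, units_dict1)`. Dimensionless columns such as `TARGETID` or `Z` are expected to show up in the first list.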

In [43]:
del T

In [None]:
%%time

## PROBLEM WITH DR1: Kernel keeps dying

## Write to a temporary path, then copy to an officially versioned file name
## that must not be overwritten,
# e.g., /global/cfs/cdirs/desi/science/gqp/agncatalog/edr/v1.0/agnqso_sum.fits

## This will become the official catalog: "agnqso_sum_v1.x.fits" with an updated Version number
primary_hdu = fits.PrimaryHDU()
table1_hdu = fits.BinTableHDU(T_final_cols)
table2_hdu = fits.BinTableHDU(T_sec_ext)
table1_hdu.name = 'AGNCAT'
table2_hdu.name = 'AUXDATA'
hdulist = fits.HDUList([primary_hdu, table1_hdu, table2_hdu])
hdulist.writeto(dir_for_cat+'agnqso_sum_dr1_v0.9.fits', overwrite=True)
#hdulist.writeto(dir_for_cat+'agnqso_sum_v1.8.fits', overwrite=False)
#hdulist.writeto(dir_for_cat+'agnqso_sum.fits', overwrite=True)
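
The comments in the cell above describe a "write to a temporary path, then publish under a version number without overwriting" pattern. One way this could be sketched with the standard library alone is shown below. The helper `publish_without_overwrite` and the demo file name are hypothetical, and the existence check is not safe against two processes writing at the same time.

```python
import os
import tempfile


def publish_without_overwrite(write_func, final_path):
    """Call write_func(tmp_path), then move the result to final_path unless it already exists."""
    final_path = os.path.abspath(final_path)
    if os.path.exists(final_path):
        raise FileExistsError(f"{final_path} already exists; bump the version number")
    # Temporary file in the same directory so the final move stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=os.path.dirname(final_path))
    os.close(fd)
    try:
        write_func(tmp_path)
        os.replace(tmp_path, final_path)
    finally:
        # Clean up the temporary file if the write or the move failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return final_path


def write_demo(path):
    """Stand-in for the real FITS writer."""
    with open(path, "wb") as fh:
        fh.write(b"demo catalog")


with tempfile.TemporaryDirectory() as demo_dir:
    demo_target = os.path.join(demo_dir, "catalog_v0.1.dat")
    publish_without_overwrite(write_demo, demo_target)
    print(sorted(os.listdir(demo_dir)))
```

For the catalog itself, `write_func` could be something like `lambda p: hdulist.writeto(p, overwrite=True)`. The `overwrite=True` is needed there only because `mkstemp` has already created the empty temporary file.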

## 10. Add units to existing catalog (for EDR)

In [None]:
## Input file
infile = dir_for_cat+'agnqso_sum_v1.8.fits'

## Test: write to a different file in case there are issues
outfile = dir_for_cat+'agnqso_sum_v1.9.fits'

In [None]:
%%time
#T = Table.read(infile)

## Catalog HDU List
agn_hdul = fits.open(infile)  # 'format' is a Table.read option, not a documented fits.open one

# Load the catalog into Astropy tables
T = Table(agn_hdul[1].data)
T2 = Table(agn_hdul[2].data)

In [None]:
## Extension 2
cols_readme = 'cols_units_ext2.txt'
tcols_ext2 = Table.read(cols_readme, format='ascii.basic')
tcols_ext2.remove_columns(['col0', 'Format', '_1'])
tcols_ext2[:3]

In [None]:
T2_cols = Table()
T2_cols['Name'] = T2.colnames
print(len(T2_cols))
T2_cols[:3]

In [None]:
T2_cols_info = join(T2_cols, tcols_ext2)

In [None]:
print(len(T2_cols_info))
T2_cols_info

In [None]:
## Convert to dictionary to use annotate_table
cols_ext2 = list(T2_cols_info['Name'].data)
units_ext2 = list(T2_cols_info['Units'].data)
units_dict2 = dict(zip(cols_ext2,units_ext2))

#units_dict2

In [None]:
T2 = annotate_table(T2, units_dict2, validate=True)

In [None]:
%%time
T2.write('test.fits',overwrite=True)

In [None]:
T2

In [None]:
T = annotate_table(T, units_dict1)

In [None]:
primary_hdu = fits.PrimaryHDU()
table1_hdu = fits.BinTableHDU(T)
table2_hdu = fits.BinTableHDU(T2)
table1_hdu.name = 'AGNCAT'
table2_hdu.name = 'AUXDATA'
hdulist = fits.HDUList([primary_hdu, table1_hdu, table2_hdu])
hdulist.writeto(outfile, overwrite=False)

## Check the file

In [None]:
hdul = fits.open(outfile)

In [None]:
hdul[1].header

In [None]:
hdul[2].header
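
Reading through a long header by eye is error-prone. In a FITS binary table the column names are stored in `TTYPEn` keywords and the units in the matching `TUNITn` keywords, so the units can be collected into a dictionary and compared with the input. The sketch below works on any mapping with `in` and `.get`; a plain dict stands in for a real header here, and the helper name is hypothetical.

```python
def column_units_from_header(header):
    """Collect {column name: unit} from TTYPEn / TUNITn keywords ('' when no unit is set)."""
    units = {}
    n = 1
    while f"TTYPE{n}" in header:
        units[header[f"TTYPE{n}"]] = header.get(f"TUNIT{n}", "")
        n += 1
    return units


# Toy header fragment standing in for hdul[1].header
toy_header = {
    "TTYPE1": "TARGETID",
    "TTYPE2": "TARGET_RA", "TUNIT2": "deg",
    "TTYPE3": "COADD_EXPTIME", "TUNIT3": "s",
}
found_units = column_units_from_header(toy_header)
print(found_units)
```

Applied to the file written above, `column_units_from_header(hdul[1].header)` could then be compared with `units_dict1` key by key.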

In [None]:
outfile = dir_for_cat+'agnqso_sum_v1.9.fits'

annotate_fits(outfile, 2, outfile, units=units_dict2, overwrite=True)
