# Final stage of HELP data processing

This notebook collates the final output files and prepares them to be written to CSV for ingestion into a VO server. At the bottom of the notebook we also summarise the pipeline products that have been processed for each field. These summaries are generated from the dmu32 meta_main.yml files, which contain links to the XID+, CIGALE and photo-z catalogues that feed into the final catalogues for publishing.

Summary of notebook:

- Take DR1 masterlist suffixes from overview table
- Find dmu32 full table names and write to a file
- Create summary of all the data products per field using the dmu32 meta_main.yml files

In [2]:
from  herschelhelp_internal  import git_version
print("This notebook was run with herschelhelp_internal version: \n{}".format(git_version()))
import datetime
print("This notebook was executed on: \n{}".format(datetime.datetime.now()))

This notebook was run with herschelhelp_internal version: 
1407877 (Mon Feb 4 12:56:29 2019 +0000)
This notebook was executed on: 
2020-10-29 15:27:07.370035


In [3]:
from astropy.table import Table, Column
from astropy import units as u
import numpy as np
import glob
from pymoc import MOC
import hashlib
from herschelhelp_internal.masterlist import find_last_ml_suffix

import yaml

import os
import time

import humanfriendly



In [4]:
TODAY = os.environ.get('SUFFIX', time.strftime("_%Y%m%d"))
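
The suffix logic above can be exercised in isolation. This is a minimal sketch, assuming a hypothetical helper name `run_suffix`; only the `SUFFIX` environment variable and the `_%Y%m%d` format come from the notebook.

```python
import os
import time


def run_suffix(environ=os.environ):
    """Return the SUFFIX override if set, else today's date as _YYYYMMDD.

    `run_suffix` is a hypothetical helper, not part of herschelhelp_internal.
    """
    return environ.get('SUFFIX', time.strftime("_%Y%m%d"))


# An explicit override wins over the date-based default.
print(run_suffix({'SUFFIX': '_20180516'}))
# Without an override the suffix is derived from the current local date.
print(run_suffix({}))
```

Setting `SUFFIX` before launching the notebook should therefore make the output file names reproducible across runs on different days.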

## The definition of HELP PDR1
Here we take the DR1 definition from the dmu32 YAML files, which define the final, official files for each field.

In [5]:
yaml_files = glob.glob('./*/meta_main.yml')

In [6]:
# safe_load parses plain YAML without constructing arbitrary Python objects
field_yamls = [yaml.safe_load(open(f, 'r')) for f in yaml_files]


In [7]:
field_yamls[0]

{'field': 'COSMOS',
 'region': 'dmu_products/dmu2/dmu2_field_coverages/COSMOS_MOC.fits',
 'surveys': ['CANDELS-3D-HST',
  'CFHTLS',
  'DECaLS',
  'HSC-DEEP',
  'HSC-UDEEP',
  'KIDS',
  'PanSTARRS-3SS',
  'UKIDSS-LAS',
  'CFHT-WIRDS',
  'COSMOS2015'],
 'masterlist': 'dmu_products/dmu1/dmu1_ml_COSMOS/data/master_catalogue_cosmos_20180516.fits',
 'depths': 'dmu_products/dmu1/dmu1_ml_COSMOS/data/depths_cosmos_20180516.fits',
 'flags': 'dmu_products/dmu6/dmu6_v_COSMOS/data/cosmos_20180516_flags.fits',
 'xid': ['dmu_products/dmu26/dmu26_XID+MIPS_COSMOS/data/dmu26_XID+MIPS_COSMOS_20170213.fits',
  'dmu_products/dmu26/dmu26_XID+PACS_COSMOS/data/dmu26_XID+PACS_COSMOS_20170303.fits',
  'dmu_products/dmu26/dmu26_XID+SPIRE_COSMOS/data/dmu26_XID+SPIRE_COSMOS_20161129.fits'],
 'photoz': 'dmu_products/dmu24/dmu24_COSMOS/data/COSMOS2015-HELP_selected_20160613_photoz_v1.0.fits',
 'cigale': 'dmu_products/dmu28/dmu28_COSMOS/data/zphot/final_results.fits',
 'cigale_ldust_prediction': 'dmu_products/dmu28/d

In [8]:
GAVO_FOLDER = '/mnt/hedam/data_vo/'
stilts_command = 'stilts tpipe {in_file} omode=out ofmt=csv out={GAVO_FOLDER}{out_file}'

# Write one stilts conversion command per field to a shell script
with open('help_to_vo.sh', 'w') as final_data:
    for y in field_yamls:
        print(y['field'])
        final_help_product = y['final'].replace('dmu_products', '..')

        if os.path.exists(final_help_product):
            print(final_help_product)
            # Output name: the input file name with .fits replaced by .csv
            final_data.write(stilts_command.format(
                in_file=final_help_product,
                GAVO_FOLDER=GAVO_FOLDER,
                out_file='herschelhelp/main/{} \n'.format(
                    final_help_product.split('/')[-1].replace('.fits', '.csv')
                )
            ))
        else:
            # Leave a comment in the script for fields with no final product
            final_data.write('# No data for {} \n'.format(y['field']))

COSMOS
ELAIS-S1
GAMA-09
Lockman-SWIRE
ELAIS-N2
Herschel-Stripe-82
SA13
GAMA-12
GAMA-15
AKARI-SEP
SGP
xFLS
CDFS-SWIRE
SSDF
XMM-LSS
SPIRE-NEP
AKARI-NEP
ELAIS-N1
HATLAS-NGP
XMM-13hr
Bootes
HDF-N
../dmu32/dmu32_HDF-N/data/HDF-N_20180427_cigale.fits
EGS


The output of this cell is a shell script, help_to_vo.sh, which when run converts each final FITS file to a CSV file in the VO folder.
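
To make the shape of each generated line concrete, here is a minimal sketch of how one command could be assembled. The helper name `stilts_line` and its `subfolder` argument are assumptions made for illustration; the command template, the VO folder and the example input path are taken from this notebook.

```python
import posixpath

GAVO_FOLDER = '/mnt/hedam/data_vo/'
STILTS_COMMAND = 'stilts tpipe {in_file} omode=out ofmt=csv out={out_file}'


def stilts_line(in_file, subfolder):
    """Build one shell line converting a FITS table to CSV in the VO folder.

    Hypothetical helper: the name and the `subfolder` argument are
    illustrative and not taken from the notebook.
    """
    # Keep only the file name and swap its extension for .csv.
    stem, _ = posixpath.splitext(posixpath.basename(in_file))
    out_file = posixpath.join(GAVO_FOLDER, subfolder, stem + '.csv')
    return STILTS_COMMAND.format(in_file=in_file, out_file=out_file)


line = stilts_line('../dmu32/dmu32_HDF-N/data/HDF-N_20180427_cigale.fits',
                   'herschelhelp/main')
print(line)
```

Using `posixpath.splitext` rather than a string replace means a `.fits` substring elsewhere in the file name would not be altered.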

In [19]:
# Write one stilts conversion command per depth map to a shell script
with open('depths_to_vo.sh', 'w') as depths_to_vo:
    for y in field_yamls:
        final_depth_product = y['depths'].replace('dmu_products', '..')

        if os.path.exists(final_depth_product):
            print(final_depth_product)
            # Output name: the input file name with .fits replaced by .csv
            depths_to_vo.write(stilts_command.format(
                in_file=final_depth_product,
                GAVO_FOLDER=GAVO_FOLDER,
                out_file='depth/{} \n'.format(
                    final_depth_product.split('/')[-1].replace('.fits', '.csv')
                )
            ))
        else:
            # Leave a comment in the script for fields with no depth map
            depths_to_vo.write('# No depths for {} \n'.format(y['field']))

## Summarise completeness of HELP data sets

Here we gather information about what is available on each field in order to summarise the data products per field. We take the CIGALE, XID+ and photo-z filenames from the per-field meta_main.yml files, check that the files exist and record how large they are. This gives a summary of all the data present.
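
The existence, size and checksum checks described here can be sketched with the standard library alone. The helper `summarise_file` is hypothetical and is not the code used in this notebook; it reads the file in chunks, which is one way to avoid loading a large catalogue into memory just to hash it.

```python
import hashlib
import os
import tempfile


def summarise_file(path, chunk_size=1 << 20):
    """Return (size_bytes, md5_hex) for `path`, or None if it is missing.

    Hypothetical helper; reads in chunks so large catalogues need not fit
    in memory.
    """
    if not os.path.exists(path):
        return None
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return os.stat(path).st_size, digest.hexdigest()


# Demonstrate on a small temporary file standing in for a catalogue.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'HELP')
demo = summarise_file(tmp.name)
os.remove(tmp.name)
print(demo)
print(summarise_file(tmp.name))  # file removed, so this is None
```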

In [13]:
dr1 = Table()
dr1['field'] = [y['field'] for y in field_yamls]
dr1.sort('field')

In [8]:
fields_info = yaml.safe_load(open("../dmu2/meta_main.yml", 'r'))

In [42]:
dr1['objects']             =np.full(len(dr1), 0, dtype=int)
dr1['dr1_file']            =np.full(len(dr1), 0, dtype=np.dtype('U250'))
dr1['dr1_file_hash']       =np.full(len(dr1), 0, dtype=np.dtype('U250'))
dr1['area_sq_degrees']     =np.full(len(dr1), 0, dtype='float64')
dr1['file_size_bytes']     =np.full(len(dr1), 0, dtype=int)
dr1['file_size_readable']  =np.full(len(dr1), 0, dtype=np.dtype('U250'))
dr1['xid_objects']         =np.full(len(dr1), 0, dtype=int)
dr1['photoz_objects']      =np.full(len(dr1), 0, dtype=int)
dr1['cigale_objects']      =np.full(len(dr1), 0, dtype=int)

In [43]:

def file_as_bytes(file):
    with file:
        return file.read()



In [44]:
for y in field_yamls:
    print(y['field'] + ':')
    this_row = dr1['field'] == y['field']
    final = y['final'].replace('dmu_products/', '../')
    moc = y['region'].replace('dmu_products/', '../')
    try:
        cat = Table.read(final)

        dr1['objects'][this_row]            = len(cat)
        dr1['dr1_file'][this_row]           = y['final']
        dr1['dr1_file_hash'][this_row]      = hashlib.md5(file_as_bytes(open(final, 'rb'))).hexdigest()
        dr1['area_sq_degrees'][this_row]    = MOC(filename=moc).area_sq_deg
        size = os.stat(final).st_size
        dr1['file_size_bytes'][this_row]    = size

        dr1['file_size_readable'][this_row] = humanfriendly.format_size(size)
        # Count objects with a positive value in each product's key column
        dr1['xid_objects'][this_row]        = np.sum(cat['f_spire_500'] > 0)
        dr1['photoz_objects'][this_row]     = np.sum(cat['redshift'] > 0)
        dr1['cigale_objects'][this_row]     = np.sum(cat['cigale_sfr'] > 0)

    except Exception:
        # A missing file or column leaves this row at its default values
        print('Problem reading {}'.format(y['final']))

COSMOS:
Problem reading dmu_products/dmu32/dmu32_COSMOS/data/COSMOS_20180516_cigale.fits
ELAIS-S1:
Problem reading dmu_products/dmu32/dmu32_ELAIS-S1/data/ELAIS-S1_20180416_cigale.fits
GAMA-09:
Problem reading dmu_products/dmu32/dmu32_AKARI-NEP/data/AKARI-NEP_20180215_cigale.fits
Lockman-SWIRE:
Problem reading dmu_products/dmu32/dmu32_AKARI-NEP/data/AKARI-NEP_20180215_cigale.fits
ELAIS-N2:
Problem reading dmu_products/dmu32/dmu32_ELAIS-N2/data/ELAIS-N2_20180218_cigale.fits
Herschel-Stripe-82:
Problem reading dmu_products/dmu32/dmu32_Herschel-Stripe-82/data/Herschel-Stripe-82_20180307_cigale.fits
SA13:
Problem reading dmu_products/dmu32/dmu32_SA13/data/SA13.fits
GAMA-12:
Problem reading dmu_products/dmu32/dmu32_GAMA-12/data/GAMA-12_20180218.fits
GAMA-15:
Problem reading dmu_products/dmu32/dmu32_GAMA-15/data/GAMA-15_20180213.fits
AKARI-SEP:
Problem reading dmu_products/dmu32/dmu32_AKARI-SEP/data/AKARI-SEP.fits
SGP:
Problem reading dmu_products/dmu32/dmu32_SGP/data/SGP_20180221_cigale.fits

In [45]:
final

'../dmu32/dmu32_EGS/data/EGS.fits'

In [46]:
dr1.show_in_notebook()

idx,field,objects,dr1_file,dr1_file_hash,area_sq_degrees,file_size_bytes,file_size_readable,xid_objects,photoz_objects,cigale_objects
0,AKARI-NEP,0,0,0,0.0,0,0,0,0,0
1,AKARI-SEP,0,0,0,0.0,0,0,0,0,0
2,Bootes,0,0,0,0.0,0,0,0,0,0
3,CDFS-SWIRE,0,0,0,0.0,0,0,0,0,0
4,COSMOS,0,0,0,0.0,0,0,0,0,0
5,EGS,0,0,0,0.0,0,0,0,0,0
6,ELAIS-N1,0,0,0,0.0,0,0,0,0,0
7,ELAIS-N2,0,0,0,0.0,0,0,0,0,0
8,ELAIS-S1,0,0,0,0.0,0,0,0,0,0
9,GAMA-09,0,0,0,0.0,0,0,0,0,0


In [53]:
print("""Totals:
Area:              {} Square degrees
Objects :          {}
XID+ objects :     {} (= {} percent)
Redshifts :        {} (= {} percent)
CIGALE objects :   {} (= {} percent)
Final tables size: {}
""".format(
    round(np.sum(dr1['area_sq_degrees'])),
    np.sum(dr1['objects']),
    np.sum(dr1['xid_objects']), round(100*np.sum(dr1['xid_objects'])/np.sum(dr1['objects'])),
    np.sum(dr1['photoz_objects']), round(100*np.sum(dr1['photoz_objects'])/np.sum(dr1['objects'])),
    np.sum(dr1['cigale_objects']), round(100*np.sum(dr1['cigale_objects'])/np.sum(dr1['objects'])),
    humanfriendly.format_size(np.sum(dr1['file_size_bytes']))
))

Totals:
Area:              1 Square degrees
Objects :          130679
XID+ objects :     834 (= 1 percent)
Redshifts :        7435 (= 6 percent)
CIGALE objects :   0 (= 0 percent)
Final tables size: 190.63 MB
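
The percentages in the totals are rounded ratios against the total object count, so they are undefined when no catalogue could be read. A minimal sketch of a guarded version follows; `percent` is a hypothetical helper, and the inputs are the counts printed above.

```python
def percent(part, total):
    """Rounded percentage of `part` in `total`, or 0 when `total` is zero.

    Hypothetical helper, not part of the notebook.
    """
    return round(100 * part / total) if total else 0


print(percent(7435, 130679))  # redshifts
print(percent(834, 130679))   # XID+ objects
print(percent(0, 0))          # no objects at all
```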



In [13]:
dr1.write('dr1_data_products_overview{}.csv'.format(TODAY), overwrite=True)