# Gaap pipeline 

Use this notebook to run `gaap` photometry on Merian-reduced data.

Make sure that you are in the right environment! To launch the Jupyter notebook:

        module load anaconda3/2022.5
        . /scratch/gpfs/am2907/Merian/gaap/lambo/scripts/setup_env_w40.sh
        jupyter notebook

In [1]:
import lsst.daf.butler as dafButler
import numpy as np
import glob
import os, sys
sys.path.append(os.path.join(os.getenv('LAMBO_HOME'), 'lambo/scripts/'))
from hsc_gaap.deploy_gaap_array import deploy_training_job
from hsc_gaap.check_gaap_run import checkRun
from hsc_gaap.find_patches_to_reduce import * 
from hsc_gaap.compile_catalogs import compileCatalogs

%load_ext autoreload
%autoreload 2

Overriding default configuration file with /scratch/gpfs/HSC/LSST/stack_20230302/conda/envs/lsst-scipipe-4.0.1/share/eups/Linux64/dustmaps_cachedata/g41a3ec361e+ac198e9f13/config/.dustmapsrc


---
# Step 1: What patches do we need to reduce?

We want to identify patches that have the necessary Merian data products for `gaap` processing and have not already been processed. Patches need to have:

- deepCoadd_ref
- deepCoadd_meas
- deepCoadd_scarletModelData
- deepCoadd_calexp


Get a list of all Merian tracts with reduced data; we will then search through them to see which patches meet our criteria:

In [13]:
repo = '/scratch/gpfs/am2907/Merian/gaap'

In [14]:
output_collection = "DECam/runs/merian/dr1_wide"
data_type = "deepCoadd_calexp"
skymap = "hsc_rings_v1"
butler = dafButler.Butler('/projects/MERIAN/repo/', collections=output_collection, skymap=skymap)

In [15]:

patches = np.array([[data_id['tract'], data_id['patch']]
                    for data_id in butler.registry.queryDataIds(['tract', 'patch'], datasets=data_type,
                                                                collections=output_collection, skymap=skymap)])
patches = patches[patches[:, 0].argsort()]
tracts, idx = np.unique(patches[:,0], return_index=True) 
patches_by_tract = np.split(patches[:,1] ,idx[1:])
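
The sort / `np.unique` / `np.split` idiom in the cell above groups patch indices by tract. As a minimal sketch on made-up `(tract, patch)` pairs (the helper name `group_patches_by_tract` is hypothetical and not part of `lambo`), this is roughly what `tracts` and `patches_by_tract` end up holding:

```python
import numpy as np

def group_patches_by_tract(pairs):
    """Group patch indices by tract (hypothetical helper, not part of lambo)."""
    pairs = np.asarray(pairs)
    # A stable sort keeps the original patch order within each tract.
    pairs = pairs[pairs[:, 0].argsort(kind="stable")]
    # return_index gives the first row of each tract in the sorted array.
    tracts, first_row = np.unique(pairs[:, 0], return_index=True)
    # Splitting at every first row except row 0 yields one array per tract.
    groups = np.split(pairs[:, 1], first_row[1:])
    return {int(t): g.tolist() for t, g in zip(tracts, groups)}

# Made-up (tract, patch) pairs, for illustration only.
example = [(9813, 4), (9812, 0), (9813, 5), (9812, 7), (9814, 2)]
grouped = group_patches_by_tract(example)
print(grouped)
```

The sketch uses a stable sort so that patches keep their input order within a tract; the notebook cell uses the default `argsort`, which does not guarantee that.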

In [17]:
tracts_n708 = []
for tract in tracts:
    patches = findReducedPatches(tract)
    if len(patches) > 0:
        tracts_n708.append(tract)

tracts_n708 = np.array(tracts_n708)
print(f"{len(tracts_n708)} tracts with necessary data products")

285 tracts with necessary data products


Save a csv with the info if you want:

In [8]:
# saveMerianReducedPatchList(tracts, os.path.join(repo, "reducedPatches_N708.csv"))

Now find patches that haven't yet been `gaap` processed:

In [18]:
tracts_n708_nogaap = []
for tract in tracts:
    patches_mer  = findReducedPatches(tract)
    patches_gaap = findGaapReducedPatches(tract, repo=repo)
    if len(set(patches_mer) - set(patches_gaap)) > 0:
        tracts_n708_nogaap.append(tract)
        
print(f"{len(tracts_n708_nogaap)} tracts to be reduced")

114 tracts to be reduced
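
The selection above is a plain set difference: patches that Merian has reduced minus patches that already have `gaap` output. A minimal sketch with made-up patch lists, assuming the two finder functions return iterables of hashable patch identifiers (the helper name `patches_missing_gaap` is hypothetical):

```python
def patches_missing_gaap(merian_patches, gaap_patches):
    """Return patches reduced by Merian that have no gaap output yet."""
    return sorted(set(merian_patches) - set(gaap_patches))

# Made-up patch lists standing in for the outputs of
# findReducedPatches(tract) and findGaapReducedPatches(tract, repo=repo).
merian_patches = [0, 1, 2, 3, 10]
gaap_patches = [1, 3]
todo = patches_missing_gaap(merian_patches, gaap_patches)
print(todo)
```

A tract is added to `tracts_n708_nogaap` whenever this difference is non-empty.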


In [None]:
# saveGaapReducedPatchList(tracts, os.path.join(repo, "GaapReduced.csv"))

In [12]:
# saveGaapNotReducedPatchList(tracts, os.path.join(repo, "notGaapReduced.csv"), notionformat=False)

Saved file to /scratch/gpfs/am2907/Merian/gaap/notGaapReduced.csv.


If you want, you can see how many patches need to be reduced for each tract:

In [19]:
npatches = [len(list(set(findReducedPatches(tract))- set(findGaapReducedPatches(tract, repo=repo)))) 
            for tract in tracts_n708_nogaap]
print(f"{sum(npatches)} patches to be reduced")

5543 patches to be reduced


In [20]:
print(f"It will take ~ {sum(npatches)/60:.1f} hours to download HSC images for {sum(npatches)} patches")
print(f"It will take ~ {sum(npatches)*.6/1000:.1f} TBs to download HSC images for {sum(npatches)} patches")
print(f"Once the data has been downloaded, it will take ~ {sum(npatches)/20/2:.1f} hours to run gaap on {sum(npatches)} patches")
print(f"It will take ~ {sum(npatches)*.212/1000:.1f} TBs to save the gaap catalogs for {sum(npatches)} patches")


It will take ~ 92.4 hours to download HSC images for 5543 patches
It will take ~ 3.3 TBs to download HSC images for 5543 patches
Once the data has been downloaded, it will take ~ 138.6 hours to run gaap on 5543 patches
It will take ~ 1.2 TBs to save the gaap catalogs for 5543 patches
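
The four estimates can be wrapped in one function for reuse with a different patch count. This is only a sketch: the function name `estimate_gaap_resources` is hypothetical, and the per-patch rates are read directly off the formulas in the cell above (about one patch downloaded per minute, 0.6 GB of images and 0.212 GB of catalogs per patch, taking 1 TB = 1000 GB).

```python
def estimate_gaap_resources(n_patches):
    """Rough cost estimates using the per-patch rates from the cell above."""
    return {
        "download_hours": n_patches / 60,        # ~1 patch per minute
        "download_tb": n_patches * 0.6 / 1000,   # ~0.6 GB of images per patch
        "gaap_hours": n_patches / 20 / 2,        # same factor as the cell above
        "catalog_tb": n_patches * 0.212 / 1000,  # ~0.212 GB of catalogs per patch
    }

estimate = estimate_gaap_resources(5543)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```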


In [17]:
for tract, npatch in zip(tracts_n708_nogaap, npatches):
    if npatch > 0:
        print(f'TRACT:{tract}, {npatch}')

TRACT:9618, 8
TRACT:9619, 48
TRACT:9620, 81
TRACT:9621, 29
TRACT:9697, 25
TRACT:9698, 80
TRACT:9699, 80
TRACT:9700, 79
TRACT:9701, 79
TRACT:9702, 73
TRACT:9703, 8
TRACT:9707, 27
TRACT:9708, 72
TRACT:9709, 77
TRACT:9710, 76
TRACT:9711, 76
TRACT:9712, 42
TRACT:9713, 42
TRACT:9714, 14
TRACT:9798, 13
TRACT:9799, 74
TRACT:9800, 81
TRACT:9801, 81
TRACT:9802, 81
TRACT:9803, 81
TRACT:9804, 81
TRACT:9805, 81
TRACT:9806, 81
TRACT:9807, 81
TRACT:9808, 80
TRACT:9809, 79
TRACT:9810, 79
TRACT:9811, 75
TRACT:9812, 4
TRACT:9814, 2
TRACT:9815, 81
TRACT:9816, 80
TRACT:9817, 80
TRACT:9818, 80
TRACT:9819, 81
TRACT:9820, 54
TRACT:9821, 2
TRACT:9828, 37
TRACT:9833, 81
TRACT:9837, 69
TRACT:9838, 64
TRACT:9839, 5
TRACT:9862, 21
TRACT:9863, 9
TRACT:9939, 4
TRACT:9940, 50
TRACT:9941, 81
TRACT:9942, 81
TRACT:9943, 81
TRACT:9944, 64
TRACT:9945, 2
TRACT:9949, 9
TRACT:9950, 41
TRACT:9951, 41
TRACT:9952, 41
TRACT:9953, 41
TRACT:10040, 5
TRACT:10041, 65
TRACT:10042, 81
TRACT:10043, 81
TRACT:10044, 79
TRACT:10045, 79


---
# Step 2: Download the data

We need to download the HSC data for all of the tracts we need to reduce. *Be warned, this takes a while and uses a lot of storage.*

Run the following in a `screen` session, because the download can take many hours depending on how much data you need.

The following will download images for tract 9813 to `/scratch/gpfs/am2907/Merian/gaap/S20A/deepCoadd_calexp/9813` and the blendedness catalogs to `/scratch/gpfs/am2907/Merian/gaap/S20A/gaapTable/9813`:
- Unless `--only_merian=False`, this will only download the patches that have been reduced by Merian.
- You can download all of the Merian-reduced data in one go if you set `--alltracts=True`. Be careful with this, because it is **lots** of data!

        screen -L -S downloadtract

        cd /scratch/gpfs/am2907/Merian/gaap
        . lambo/scripts/setup_env_w40.sh
        python3 lambo/scripts/hsc_gaap/download_S20A.py --tract=9813 --outdir="/scratch/gpfs/am2907/Merian/gaap/"


To detach from the screen session, press `Ctrl-a d`; to reattach, run `screen -r downloadtract`.
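
If you need several tracts but not all of them, one option is to generate one download command per tract. The sketch below only builds and prints the commands, using just the `--tract` and `--outdir` flags shown above; the helper name `build_download_command` is hypothetical, and the tract numbers are examples from the Step 1 listing.

```python
import shlex

def build_download_command(tract, outdir):
    """Build the download_S20A.py command for one tract as an argument list."""
    return [
        "python3",
        "lambo/scripts/hsc_gaap/download_S20A.py",
        f"--tract={tract}",
        f"--outdir={outdir}",
    ]

# Example tracts taken from the listing in Step 1.
outdir = "/scratch/gpfs/am2907/Merian/gaap/"
commands = [build_download_command(t, outdir) for t in (9618, 9619)]
for cmd in commands:
    # Paste the printed line into the screen session, or pass cmd to subprocess.run.
    print(shlex.join(cmd))
```

The commands assume the same working directory and environment setup as the manual example.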

---
# Step 3: Make slurm scripts and submit

Write one Slurm script per tract; each script is a job array with one job per patch.
You can submit the scripts as you write them if you want, but beware that there is an upper limit for the number of jobs you can submit at once to the queue.

In [21]:
for tract in tracts_n708_nogaap[:1]:
    deploy_training_job(tract, filter_jobs=5,
                        python_file='lambo/scripts/hsc_gaap/run_gaap.py',
                        name='gaap', email="am2907@princeton.edu", outname = None, 
                        repo='/scratch/gpfs/am2907/Merian/gaap', scriptdir="/scratch/gpfs/am2907/Merian/gaap/", 
                        submit=False, fixpatches=False)
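
Because each patch becomes one job, the patch counts from Step 1 tell you how many jobs a set of tracts will add to the queue. A minimal sketch for picking a batch of tracts that stays under a job limit; the helper name and the value `max_jobs=150` are placeholders, since the actual limit depends on your cluster's configuration:

```python
def select_tracts_within_limit(tracts, npatches, max_jobs):
    """Pick tracts in order until adding another would exceed max_jobs."""
    selected, total = [], 0
    for tract, n in zip(tracts, npatches):
        if total + n > max_jobs:
            break
        selected.append(tract)
        total += n
    return selected, total

# Patch counts taken from the Step 1 listing; max_jobs is a placeholder value.
batch, njobs = select_tracts_within_limit(
    [9618, 9619, 9620, 9621], [8, 48, 81, 29], max_jobs=150
)
print(batch, njobs)
```

In the notebook you would pass `tracts_n708_nogaap` and `npatches` and then loop over the returned batch with `deploy_training_job`.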

The gaap reduction will save one catalog for each patch to (for example):

        /scratch/gpfs/am2907/Merian/gaap/S20A/gaapTable/9813/0,0/objectTable_9813_0,0_S20A.fits
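
To see which patch catalogs have been written so far, you can rebuild the expected file names and test whether they exist. This sketch assumes that every patch follows the naming pattern of the example path above; both helper names are hypothetical.

```python
import os

def gaap_patch_catalog_path(repo, tract, patch):
    """Path of the per-patch gaap catalog, following the example path above."""
    return os.path.join(
        repo, "S20A", "gaapTable", str(tract), patch,
        f"objectTable_{tract}_{patch}_S20A.fits",
    )

def missing_catalogs(repo, tract, patches):
    """Return the patches whose catalog file does not exist yet."""
    return [p for p in patches
            if not os.path.exists(gaap_patch_catalog_path(repo, tract, p))]

path = gaap_patch_catalog_path("/scratch/gpfs/am2907/Merian/gaap", 9813, "0,0")
print(path)
```

Here `patch` is the `x,y` string that appears in the directory name, for example `"0,0"`.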

---
# Step 4: Check on it!

You can check on the logs while the jobs are running to check for any glaring problems:
- `logs/gaapPhot_array_9813_0.o` 
- `logs/gaap_9813_0.log`
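
With many patches per tract, it can help to scan the logs automatically rather than opening them one by one. A minimal sketch, assuming that all log files follow the two name patterns above and that problems show up as lines containing `Traceback` or `Error`; the keywords and the helper name `find_suspicious_logs` are assumptions, not something the pipeline defines.

```python
import glob
import os

def find_suspicious_logs(logdir, tract, keywords=("Traceback", "Error")):
    """Return {log path: [matching lines]} for one tract's log files."""
    hits = {}
    patterns = [f"gaapPhot_array_{tract}_*.o", f"gaap_{tract}_*.log"]
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(logdir, pattern))):
            with open(path, errors="replace") as f:
                lines = [line.rstrip("\n") for line in f
                         if any(k in line for k in keywords)]
            if lines:
                hits[path] = lines
    return hits

# Returns an empty dict if the directory has no matching logs.
for path, lines in find_suspicious_logs("logs", 9813).items():
    print(path, len(lines), "suspicious line(s)")
```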

Once the jobs are done running (for a given tract), you can check how things went.

In [None]:
for tract in tracts_n708_nogaap:
    problems = checkRun(tract)

In [25]:
checkRun(9617)

TRACT: 9617
NO PROBLEMS



array([], dtype=float64)

You might see messages like "Failed for 3 bands". This can happen when HSC images don't exist in every band, so it is not necessarily something you can fix!

---
# Step 5: Merge catalogs

If everything is looking good, you can merge the patch catalogs into a tract-level catalog. 

This step takes some time, so it's recommended to run it from a terminal inside a `screen` session.

Here is a notebook example for a single tract:

In [28]:
compileCatalogs([9617], repo, alltracts=False, rewrite=False)

COMPILING CATALOG FOR TRACT 9617 WITH 13 PATCHES
COMPILED TABLE OF 279484 ROWS and 69 COLUMNS
WROTE TABLE TO /scratch/gpfs/am2907/Merian/gaap/S20A/gaapTable/9617/objectTable_9617_S20A.fits


To run the same step from a terminal for several tracts:

        python3 lambo/scripts/hsc_gaap/compile_catalogs.py --tracts="[9327,9328,9329,9813,9812]"

This will save a catalog to (for example):
        
        /scratch/gpfs/am2907/Merian/gaap/S20A/gaapTable/9813/objectTable_9813_S20A.fits

If you want to change the columns that are used for the compiled catalog, edit these files:

        lambo/scripts/hsc_gaap/keep_table_columns_gaap.txt
        lambo/scripts/hsc_gaap/keep_table_columns_merian.txt
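
Before editing these files, it can be useful to load the current column list and check it for duplicates. The sketch below assumes one column name per line, with blank lines and `#` comments ignored; that layout is an assumption, so check it against the actual files. The helper name `read_keep_columns` is hypothetical.

```python
def read_keep_columns(path):
    """Read column names, one per line, skipping blanks and '#' comments.

    The one-name-per-line layout is an assumption about the file format.
    """
    columns = []
    with open(path) as f:
        for line in f:
            name = line.split("#", 1)[0].strip()
            if name:
                columns.append(name)
    return columns

def find_duplicates(columns):
    """Return column names that appear more than once, in first-seen order."""
    seen, dupes = set(), []
    for name in columns:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes
```

For example, `find_duplicates(read_keep_columns("lambo/scripts/hsc_gaap/keep_table_columns_gaap.txt"))` would list any repeated names.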

And you're all done!