## Clean the input shear catalogs

This is a simple utility notebook showing how to split and clean the input shear catalogs.

It takes input shear catalogs that are just the concatenation of the per-patch metadetect processing and de-duplicates them at the "tract" and "patch" levels (but not at the "cell" level unless `clean` is True).

If `clean` is False, this leaves sources in a 1" buffer around the central part of each patch; if it is True, it leaves no buffer.
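
To make the effect of the buffer concrete, here is a minimal sketch of a patch-level selection. It is not the `hpmcm` implementation: the function name `select_central`, the rectangular patch bounds, and the use of a single coordinate unit for both positions and buffer are all assumptions made for illustration.

```python
import numpy as np


def select_central(x, y, x_min, x_max, y_min, y_max, buffer=0.0):
    """Hypothetical helper: mask of sources inside the central region of a
    patch, optionally grown by `buffer` on every side."""
    x = np.asarray(x)
    y = np.asarray(y)
    return (
        (x >= x_min - buffer)
        & (x < x_max + buffer)
        & (y >= y_min - buffer)
        & (y < y_max + buffer)
    )


# Toy sources straddling the edges of a patch spanning [0, 10) in x and y
x = np.array([-0.5, 0.5, 9.5, 10.5])
y = np.array([5.0, 5.0, 5.0, 5.0])

mask_clean = select_central(x, y, 0, 10, 0, 10, buffer=0.0)     # no buffer
mask_buffered = select_central(x, y, 0, 10, 0, 10, buffer=1.0)  # keep a 1-unit buffer
```

With no buffer, each source falls in exactly one patch, so no duplicates survive. With a buffer, sources near a patch edge can be kept by two neighboring patches.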

*Note: this notebook is run on an unsplit, uncleaned input file taken directly from concatenating per-patch metadetect files, which are not provided as part of the test data.*

#### Standard imports

In [None]:
import numpy as np
import tables_io
from hpmcm import shear_utils

#### Set up the configuration

In [None]:
DATADIR = "sv38"                                                # Input data directory
shear_value_strs = ['0p0025', '0p005', '0p01', '0p02', '0p04']  # Applied shears as strings
shear_values = [0.0025, 0.005, 0.01, 0.02, 0.04]                # Decimal versions of the applied shears
cat_types = ['wmom', 'gauss', 'pgauss']                         # Which object characterization to use
tracts = [10463, 10705]                                         # Tracts to loop over
clean = True                                                    # Fully clean patches for de-duplication
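
Because the shear strings and the decimal shear values are maintained as two parallel lists, it is easy for them to drift out of sync. The following is a small optional sanity check, assuming the strings simply use `p` in place of the decimal point; the helper name `shear_str_to_float` is hypothetical and not part of `hpmcm`.

```python
def shear_str_to_float(shear_str):
    """Hypothetical helper: convert a string such as '0p0025' to 0.0025."""
    return float(shear_str.replace("p", "."))


shear_value_strs = ['0p0025', '0p005', '0p01', '0p02', '0p04']
shear_values = [0.0025, 0.005, 0.01, 0.02, 0.04]

# Parse every string and confirm it matches the corresponding decimal value
parsed = [shear_str_to_float(s) for s in shear_value_strs]
assert parsed == shear_values, "shear strings and values are out of sync"
```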

#### Loop over inputs and run the split and clean function

In [None]:
for tract_ in tracts:
    for shear_st_, shear_ in zip(shear_value_strs, shear_values):
        for cat_type_ in cat_types:
            outFile = f"{DATADIR}/shear_{cat_type_}_{shear_st_}.parq"
            print(f"Running {outFile} {tract_}")
            shear_utils.splitByTypeAndClean(outFile, tract_, shear=shear_, catType=cat_type_, clean=clean)
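
Since the concatenated input files are not shipped with the test data, it can help to check which of them are present before starting the loop. This is a sketch that only mirrors the file-name pattern used in the loop above; the helper name `expected_inputs` is hypothetical.

```python
import itertools
import os


def expected_inputs(datadir, cat_types, shear_value_strs):
    """Hypothetical helper: list the file paths that the loop will pass in,
    one per (shear, catalog type) combination."""
    return [
        f"{datadir}/shear_{cat_type}_{shear_str}.parq"
        for shear_str, cat_type in itertools.product(shear_value_strs, cat_types)
    ]


paths = expected_inputs(
    "sv38",
    ['wmom', 'gauss', 'pgauss'],
    ['0p0025', '0p005', '0p01', '0p02', '0p04'],
)

# Report which files are absent rather than failing partway through the loop
missing = [p for p in paths if not os.path.exists(p)]
print(f"{len(missing)} of {len(paths)} input files are missing")
```

The path does not depend on the tract, so the same file is passed once per tract in the loop.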