# Matching catalogs based on proximity (detailed)
Here we show the specific steps of matching two catalogs based on proximity.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#ClCatalogs" data-toc-modified-id="ClCatalogs-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>ClCatalogs</a></span></li><li><span><a href="#Matching" data-toc-modified-id="Matching-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Matching</a></span><ul class="toc-item"><li><span><a href="#Prepare-the-catalogs" data-toc-modified-id="Prepare-the-catalogs-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Prepare the catalogs</a></span></li><li><span><a href="#Multiple-matching" data-toc-modified-id="Multiple-matching-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Multiple matching</a></span></li><li><span><a href="#Unique-matching" data-toc-modified-id="Unique-matching-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Unique matching</a></span></li><li><span><a href="#Cross-matching" data-toc-modified-id="Cross-matching-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Cross matching</a></span></li></ul></li><li><span><a href="#Save-and-Load" data-toc-modified-id="Save-and-Load-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Save and Load</a></span></li><li><span><a href="#Getting-Matched-Pairs" data-toc-modified-id="Getting-Matched-Pairs-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Getting Matched Pairs</a></span></li><li><span><a href="#Outputing-matched-catalogs" data-toc-modified-id="Outputing-matched-catalogs-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Outputing matched catalogs</a></span><ul class="toc-item"><li><span><a href="#Outputing-matching-information-to-original-catalogs" data-toc-modified-id="Outputing-matching-information-to-original-catalogs-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Outputing matching information to original catalogs</a></span></li></ul></li></ul></div>

In [None]:
%load_ext autoreload
%autoreload 2

## ClCatalogs
Given some input data:

In [None]:
import numpy as np
from astropy.table import Table

input1 = Table(
    {
        "ID": [f"CL{i}" for i in range(5)],
        "RA": [0.0, 0.0001, 0.00011, 25, 20],
        "DEC": [0.0, 0.0, 0.0, 0.0, 0.0],
        "Z": [0.2, 0.3, 0.25, 0.4, 0.35],
        "MASS": [10**13.5, 10**13.4, 10**13.3, 10**13.8, 10**14],
        "RADIUS_ARCMIN": [1.0, 1.0, 1.0, 1.0, 1.0],
    }
)
input2 = Table(
    {
        "ID": ["CL0", "CL1", "CL2", "CL3"],
        "RA": [0.0, 0.0001, 0.00011, 25],
        "DEC": [0.0, 0, 0, 0],
        "Z": [0.3, 0.2, 0.25, 0.4],
        "MASS": [10**13.3, 10**13.4, 10**13.5, 10**13.8],
        "RADIUS_ARCMIN": [1.0, 1.0, 1.0, 1.0],
    }
)
display(input1)
display(input2)

Create two `ClCatalog` objects. They have the same properties as `astropy` tables, with additional functionality. You can tag the main properties of the catalog, or have columns with those names (see `catalogs.ipynb` for details). For proximity matching, the main tags/columns to include are:
- `id` - if not included, one will be assigned
- `ra` (in degrees) - necessary
- `dec` (in degrees) - necessary
- `z` - necessary if used as a matching criterion or for angular-to-physical conversion
- `mass` (or mass proxy) - necessary if used as the preference criterion for unique matches
- `radius` - necessary if used as a matching criterion (also requires `radius_unit` to be passed)

In [None]:
from clevar.catalog import ClCatalog

tags = {"id": "ID", "ra": "RA", "dec": "DEC", "z": "Z", "mass": "MASS"}
c1 = ClCatalog("Cat1", data=input1, tags=tags)
c2 = ClCatalog("Cat2", data=input2, tags=tags)
# Format for nice display
for c in ("ra", "dec", "z"):
    c1[c].info.format = ".2f"
    c2[c].info.format = ".2f"
for c in ("mass",):
    c1[c].info.format = ".2e"
    c2[c].info.format = ".2e"
display(c1)
display(c2)

The `ClCatalog` object can also be read directly from a file,
for details, see <a href='catalogs.ipynb'>catalogs.ipynb</a>.

## Matching
Import `ProximityMatch` and create an object for matching:

In [None]:
from clevar.match import ProximityMatch

mt = ProximityMatch()

### Prepare the catalogs
The first step is to prepare each catalog with the matching configuration:

- `delta_z`: Defines redshift window for matching. The possible values are:
  - `'cat'`: uses redshift properties of the catalog
  - `'spline.filename'`: interpolates data in `'filename'` assuming (z, zmin, zmax) format
  - `float`: uses `delta_z*(1+z)`
  - `None`: does not use z
- `match_radius`: Radius of the catalog to be used in the matching. If `'cat'`, uses the radius in the catalog; otherwise it must be in the format `'value unit'` (e.g. `'1 arcsec'`, `'1 Mpc'`).
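
To make the `float` option of `delta_z` concrete, here is a minimal sketch of the redshift window it implies. The text above only states that the width is `delta_z*(1+z)`; treating the window as symmetric around `z` is an assumption of this sketch, and `redshift_window` is a hypothetical helper, not part of `clevar`.

```python
def redshift_window(z, delta_z):
    """Return (zmin, zmax) for a float delta_z, or None when z is not used.

    Assumption: the window is symmetric, z -/+ delta_z * (1 + z).
    """
    if delta_z is None:
        return None
    half_width = delta_z * (1.0 + z)
    return (z - half_width, z + half_width)


# Redshifts of the first input catalog, with delta_z = 0.2
zs = [0.2, 0.3, 0.25, 0.4, 0.35]
windows = [redshift_window(z, 0.2) for z in zs]
for z, (zmin, zmax) in zip(zs, windows):
    print(f"z={z:.2f} -> [{zmin:.2f}, {zmax:.2f}]")
```

With windows this wide, every cluster in the toy catalogs is compatible in redshift with every other, so the angular criterion is what separates the candidates.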

In the example below, one of the configured radii is in physical units, so we also need a cosmology (`cosmo`) object to convert it to an angular size (this is done internally).

In [None]:
from clevar.cosmology import AstroPyCosmology

mt_config1 = {"delta_z": 0.2, "match_radius": "1 mpc", "cosmo": AstroPyCosmology()}
mt_config2 = {"delta_z": 0.2, "match_radius": "1 arcsec"}
mt.prep_cat_for_match(c1, **mt_config1)
mt.prep_cat_for_match(c2, **mt_config2)

This will add values to the `mt_input` attribute of the catalogs:

In [None]:
display(c1.mt_input)
display(c2.mt_input)
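
For intuition about the physical-to-angular conversion applied to `c1`, the sketch below converts 1 Mpc to an angle with the small-angle relation `theta = R / D_A` in a flat LambdaCDM cosmology. It is not `clevar` code: the helper names and the values `h0=70` km/s/Mpc and `omega_m=0.3` are assumptions chosen for illustration and need not match what `AstroPyCosmology()` uses by default.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance_mpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """Angular diameter distance in Mpc for flat LambdaCDM (illustrative, z > 0)."""

    def inv_e(zp):
        # 1 / E(z) for matter plus cosmological constant, no curvature
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    # Trapezoidal integration of dz / E(z) from 0 to z
    dz = z / steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(i * dz) for i in range(1, steps))
    integral *= dz
    comoving = (C_KM_S / h0) * integral
    return comoving / (1.0 + z)


def mpc_to_arcmin(radius_mpc, z, **cosmo):
    """Small-angle conversion of a physical radius to arcminutes."""
    theta_rad = radius_mpc / angular_diameter_distance_mpc(z, **cosmo)
    return math.degrees(theta_rad) * 60.0


theta = mpc_to_arcmin(1.0, 0.2)
print(f"1 Mpc at z=0.2 subtends about {theta:.1f} arcmin")
```

Under these assumptions, a 1 Mpc radius at the redshifts of the toy catalog spans a few arcminutes, which is far larger than the `1 arcsec` radius configured for `c2`.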

### Multiple matching
The next step is to match the catalogs and store all candidates that pass the matching criteria. You can also pass the argument:
- `radius_selection`: which radius is used for the matching, given a pair of clusters.

In [None]:
%%time
mt.multiple(c1, c2)
mt.multiple(c2, c1)

This will fill the `mt_multi_self` and `mt_multi_other` columns:

In [None]:
display(c1)
display(c2)
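
The candidate search can be pictured with the following standard-library sketch. It is only an illustration under stated assumptions, not the `clevar` implementation: a pair is kept when its angular separation is within a radius chosen from the two clusters and their redshift windows overlap. The field names (`ang`, `zmin`, `zmax`), the helper functions and the `pick_radius` argument (standing in for `radius_selection`) are all hypothetical.

```python
import math


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(h)))


def windows_overlap(a, b):
    """True when the two redshift windows intersect."""
    return a["zmin"] <= b["zmax"] and b["zmin"] <= a["zmax"]


def multiple_match(cat1, cat2, pick_radius=max):
    """Map each cat1 id to the list of cat2 ids passing both criteria."""
    out = {}
    for a in cat1:
        out[a["id"]] = [
            b["id"]
            for b in cat2
            if separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            <= pick_radius(a["ang"], b["ang"])
            and windows_overlap(a, b)
        ]
    return out


# Toy data: angular radii ("ang") in degrees
cat_a = [
    {"id": "A0", "ra": 0.0, "dec": 0.0, "ang": 0.05, "zmin": 0.0, "zmax": 0.4},
    {"id": "A1", "ra": 0.01, "dec": 0.0, "ang": 0.05, "zmin": 0.1, "zmax": 0.5},
    {"id": "A2", "ra": 25.0, "dec": 0.0, "ang": 0.05, "zmin": 0.1, "zmax": 0.5},
]
cat_b = [
    {"id": "B0", "ra": 0.0, "dec": 0.0, "ang": 0.001, "zmin": 0.1, "zmax": 0.5},
    {"id": "B1", "ra": 0.02, "dec": 0.0, "ang": 0.001, "zmin": 0.6, "zmax": 0.9},
]

with_max = multiple_match(cat_a, cat_b, pick_radius=max)
with_min = multiple_match(cat_a, cat_b, pick_radius=min)
print(with_max)
print(with_min)
```

Using the larger radius of each pair keeps `B0` as a candidate for both `A0` and `A1`, while using the smaller radius keeps it only for `A0`. `B1` is always rejected because its redshift window does not overlap any window in the first catalog.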

### Unique matching
Once all candidates are stored in each catalog, we can find the best candidates. You can also pass the argument:
- `preference`: how the best candidate is chosen when there are multiple matches.

In [None]:
%%time
mt.unique(c1, c2, preference="angular_proximity")
mt.unique(c2, c1, preference="angular_proximity")

This will fill the `mt_self` and `mt_other` columns:

In [None]:
display(c1)
display(c2)
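
As a rough picture of what a unique match with `preference="angular_proximity"` involves, the sketch below assigns each cluster its closest candidate that has not already been taken. This is a simplified, hypothetical illustration: the greedy strategy, the processing order and the `unique_match` helper are assumptions of the sketch, and `clevar` may order clusters or break ties differently.

```python
def unique_match(candidates, separations):
    """Greedy one-to-one assignment by smallest angular separation.

    candidates: dict mapping id1 -> list of candidate id2 values
    separations: dict mapping (id1, id2) -> angular separation
    """
    taken = set()
    best = {}
    for id1, cands in candidates.items():
        # Only candidates not yet assigned to an earlier cluster are eligible
        free = [id2 for id2 in cands if id2 not in taken]
        if not free:
            best[id1] = None
            continue
        choice = min(free, key=lambda id2: separations[(id1, id2)])
        best[id1] = choice
        taken.add(choice)
    return best


candidates = {"A0": ["B0", "B1"], "A1": ["B0", "B1"], "A2": []}
separations = {
    ("A0", "B0"): 0.000,
    ("A0", "B1"): 0.010,
    ("A1", "B0"): 0.001,
    ("A1", "B1"): 0.002,
}
best = unique_match(candidates, separations)
print(best)
```

Here `A1` ends up with `B1` even though `B0` is closer to it, because `B0` was already assigned to `A0`. This order dependence is one reason the match is run in both directions and then cross-checked.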

### Cross matching
If you want to make sure the same pair was found in both directions:

In [None]:
c1.cross_match()
c2.cross_match()

This will fill the `mt_cross` column:

In [None]:
display(c1)
display(c2)
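
Conceptually, a cross match keeps only the pairs on which both directions agree. A minimal sketch with plain dictionaries, where the values are made up for illustration and `cross_match` here is a hypothetical helper rather than the `ClCatalog` method:

```python
def cross_match(mt_self, mt_other):
    """Keep a match only when both directions agree on it."""
    cross = {}
    for cid, match in mt_self.items():
        agrees = match is not None and match == mt_other.get(cid)
        cross[cid] = match if agrees else None
    return cross


# mt_self: match chosen by this catalog; mt_other: match that chose this cluster
mt_self = {"CL0": "CL0", "CL1": "CL1", "CL2": "CL2", "CL3": None}
mt_other = {"CL0": "CL0", "CL1": "CL2", "CL2": None, "CL3": None}
mt_cross = cross_match(mt_self, mt_other)
print(mt_cross)
```

Only `CL0` survives in this example, since it is the only cluster whose own choice coincides with the cluster that chose it.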

## Save and Load
The results of the matching can easily be saved and loaded using `ClEvaR` tools:

In [None]:
mt.save_matches(c1, c2, out_dir="temp", overwrite=True)

In [None]:
mt.load_matches(c1, c2, out_dir="temp")
display(c1)
display(c2)

## Getting Matched Pairs

`clevar` has built-in functionality to plot some results of the matching, such as:
- Recovery rates
- Distances (angular and redshift) of cluster centers
- Scaling relations (mass, redshift, ...)

For those cases, check the <a href='match_metrics.ipynb'>match_metrics.ipynb</a> and <a href='match_metrics_advanced.ipynb'>match_metrics_advanced.ipynb</a> notebooks.

If those do not meet your needs, you can get the matched pairs of clusters directly:

In [None]:
from clevar.match import get_matched_pairs

mt1, mt2 = get_matched_pairs(c1, c2, "cross")

These will be catalogs with the corresponding matched pairs:

In [None]:
import matplotlib.pyplot as plt

plt.scatter(mt1["mass"], mt2["mass"])
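
The point of having the pairs row-aligned is that column-wise comparisons become trivial. The sketch below mimics that alignment with plain lists of dictionaries. It is an illustration only: `matched_pairs` is a hypothetical helper, and the ids and masses are made up.

```python
import math


def matched_pairs(cat1, cat2, key="mt_cross"):
    """Return two row-aligned lists containing only the matched clusters."""
    by_id = {row["id"]: row for row in cat2}
    rows1 = [row for row in cat1 if row[key] is not None]
    rows2 = [by_id[row[key]] for row in rows1]
    return rows1, rows2


cat_a = [
    {"id": "A0", "mass": 10**13.5, "mt_cross": "B1"},
    {"id": "A1", "mass": 10**13.4, "mt_cross": None},
    {"id": "A2", "mass": 10**14.0, "mt_cross": "B0"},
]
cat_b = [
    {"id": "B0", "mass": 10**13.8},
    {"id": "B1", "mass": 10**13.5},
]

pairs_a, pairs_b = matched_pairs(cat_a, cat_b)
# Row i of pairs_a corresponds to row i of pairs_b
log_ratio = [math.log10(b["mass"] / a["mass"]) for a, b in zip(pairs_a, pairs_b)]
print([(a["id"], b["id"]) for a, b in zip(pairs_a, pairs_b)])
```

Unmatched clusters such as `A1` are dropped, and the remaining rows can be compared element by element, for example through the logarithmic mass ratio computed above.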

## Outputing matched catalogs

To save the current catalogs, you can use the built-in `write` method:

In [None]:
c1.write("c1_temp.fits", overwrite=True)

This will allow you to save the catalog with its current labels and matching information.

### Outputing matching information to original catalogs

Assuming your input data came from initial files,
`clevar` also provides functions to create output files
that combine all the information in them with the matching results.

To add the matching information to an input catalog, use:

```
from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)
```

- note: `input_catalog.fits` must have the same number of rows as `c1`.


To create a matched catalog containing all columns of both input catalogs, use:

```
from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
    'output_catalog.fits', c1, c2, matching_type='cross')
```

where `matching_type` must be `cross`, `cat1` or `cat2`.

- note: `input_catalog1.fits` must have the same number of rows as `c1` (and likewise `input_catalog2.fits` and `c2`).