# Matching catalogs based on membership (detailed)
Here we show the specific steps for matching two catalogs based on membership.

In [None]:
%load_ext autoreload
%autoreload 2

## ClCatalogs
Given some input data

In [None]:
import numpy as np
from astropy.table import Table

input1 = Table({"ID": ["CL0a", "CL1a", "CL2a", "CL3a", "CL4a"]})
input1["MASS"] = 1e14 * np.arange(1, 6) * 10
input2 = Table({"ID": ["CL0b", "CL1b", "CL2b", "CL3b"]})
input2["MASS"] = 1e14 * np.arange(1, 5) * 10
display(input1)
display(input2)
# Create members
members_list = np.array(
    [
        # MEM_ID  CL1_ID  CL2_ID
        ["MEM0", "CL0a", "CL3b"],
        ["MEM1", "CL0a", "CL0b"],
        ["MEM2", "CL0a", "CL0b"],
        ["MEM3", "CL0a", "CL0b"],
        ["MEM4", "CL0a", "CL0b"],
        ["MEM5", "CL1a", "CL1b"],
        ["MEM6", "CL1a", "CL1b"],
        ["MEM7", "CL1a", "CL1b"],
        ["MEM8", "CL1a", "CL1b"],
        ["MEM9", "CL2a", "CL2b"],
        ["MEM10", "CL2a", "CL2b"],
        ["MEM11", "CL2a", "CL2b"],
        ["MEM12", "CL3a", "CL3b"],
        ["MEM13", "CL3a", "CL3b"],
        ["MEM14", "CL4a", "None"],
    ]
)

input1_mem = Table({"ID": members_list[:, 0], "ID_CLUSTER": members_list[:, 1]})
input2_mem = Table({"ID": members_list[:-1, 0], "ID_CLUSTER": members_list[:-1, 2]})
input1_mem["RA"] = np.arange(len(input1_mem)) * 10.0
input2_mem["RA"] = np.arange(len(input2_mem)) * 10.0
input1_mem["DEC"] = 0.0
input2_mem["DEC"] = 0.0
input1_mem["Z"] = 0.1
input2_mem["Z"] = 0.1
input1_mem["PMEM"] = 1.0
input2_mem["PMEM"] = 1.0
display(input1_mem)
display(input2_mem)

Create two `ClCatalog` objects. They have the same properties as `astropy` tables, with additional functionality. You can tag the main properties of the catalog, or have columns with those names (see `catalogs.ipynb` for details). For membership matching, the main columns to include are:
- `id` - must correspond to `id_cluster` in the cluster member catalog.
- `mass` (or mass proxy) - needed if `shared_member_fraction` is used as the preference criterion for unique matches (the default).


All of the columns can be added when creating the `ClCatalog` object, by passing them as keyword arguments:
```
cat = ClCatalog('Cat', ra=[0, 1])
```
or passing the whole data table:

```
cat = ClCatalog('Cat', data={'ra': [0, 1]})
```
and can also be added afterwards:
```
cat = ClCatalog('Cat')
cat['ra'] = [0, 1]
```

In [None]:
from clevar.catalog import ClCatalog

tags = {"id": "ID", "mass": "MASS"}
c1 = ClCatalog("Cat1", data=input1, tags=tags)
c2 = ClCatalog("Cat2", data=input2, tags=tags)

# Format for nice display
c1["mass"].info.format = ".2e"
c2["mass"].info.format = ".2e"

display(c1)
display(c2)

The members can be added to the cluster object using the `add_members` function.
It is called in much the same way as a `ClCatalog` object is instantiated: the data can be passed with tags, or the columns can be passed as keyword arguments (the tag/key `id_cluster` is always necessary and must correspond to `id` in the main cluster catalog).

In [None]:
mem_tags = {"id": "ID", "id_cluster": "ID_CLUSTER"}
c1.add_members(data=input1_mem, tags=mem_tags)
c2.add_members(data=input2_mem, tags=mem_tags)

display(c1.members)
display(c2.members)

The catalogs can also be read directly from files. For more details, see <a href='catalogs.ipynb'>catalogs.ipynb</a>.

## Matching
Import `MembershipMatch` and create an object for matching:

In [None]:
from clevar.match import MembershipMatch

mt = MembershipMatch()

### Prepare the matching object
Before matching the clusters, it is necessary to match the member catalogs and then fill the clusters with information about the shared members.

The matching of members can be done by `id`, if both member catalogs share the same `id`s, or by angular proximity.

The first step is to prepare each catalog with the matching configuration:

- `delta_z`: Defines redshift window for matching. The possible values are:
  - `'cat'`: uses redshift properties of the catalog
  - `'spline.filename'`: interpolates data in `'filename'` assuming (z, zmin, zmax) format
  - `float`: uses `delta_z*(1+z)`
  - `None`: does not use z
- `match_radius`: Radius of the catalog to be used in the matching. If `'cat'` uses the radius in the catalog, else must be in format `'value unit'`. (ex: `'1 arcsec'`, `'1 Mpc'`)

In this case, because one of the configuration radii has physical units, we also need a cosmology (`cosmo`) object to convert it to angular size (this is done internally).

To match the members by `id`, just run the function:

In [None]:
%%time
mt.match_members(c1.members, c2.members, method="id")

To match the members by angular proximity, you also have to provide:
  - `radius` (`str`, `None`): Radius for matching, with format `'value unit'` (ex: `'1 arcsec'`, `'1 Mpc'`).
  - `cosmo` (`clevar.Cosmology`, `None`): Cosmology object for when the radius has physical units.

Then call the same function with these arguments:

In [None]:
from clevar.cosmology import AstroPyCosmology

mt.match_members(
    c1.members, c2.members, method="angular_distance", radius="0.1 kpc", cosmo=AstroPyCosmology()
)
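
To make the idea of angular-proximity matching concrete, here is a minimal pure-Python sketch. It is not the `clevar` implementation: the helper names `angular_sep_deg` and `match_by_angle` are hypothetical, the search is brute force, and the radius is assumed to be already given as an angle (no cosmology conversion). It uses the same toy coordinates as the member tables above (`RA = 10 * index`, `DEC = 0`).

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(h)))


def match_by_angle(coords1, coords2, radius_deg):
    """Brute-force list of (i, j) index pairs separated by at most radius_deg."""
    return [
        (i, j)
        for i, (ra1, dec1) in enumerate(coords1)
        for j, (ra2, dec2) in enumerate(coords2)
        if angular_sep_deg(ra1, dec1, ra2, dec2) <= radius_deg
    ]


# Toy coordinates mirroring the member tables: 15 members in catalog 1, 14 in catalog 2.
coords1 = [(i * 10.0, 0.0) for i in range(15)]
coords2 = [(i * 10.0, 0.0) for i in range(14)]

# Hypothetical angular radius of 1 arcsec, expressed in degrees.
pairs = match_by_angle(coords1, coords2, radius_deg=1 / 3600)
print(pairs)
```

Because neighbouring toy members are 10 degrees apart, each member only pairs with the counterpart at the same position, and the last member of catalog 1 has no partner.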

This function adds a `matched_mems` attribute to your matching object (`mt.matched_mems` in this case) that contains the indices of the matched members.
This attribute can be saved and loaded so you don't have to redo this step.
Just use the functions:

In [None]:
mt.save_matched_members(filename="mem_mt.txt", overwrite=False)
mt.load_matched_members(filename="mem_mt.txt")

Now we fill the catalogs with the information regarding the matched members. In this step, each cluster catalog will have a `ClData` table in its `mt_input` attribute with the number of members in each cluster (`nmem`) and a dictionary containing the number of shared objects with the clusters of the other catalog (`shared_mems`).

If `pmem` is provided to the members, these quantities are computed as:

<center>$nmem=\sum_{cluster\;members} Pmem_i$</center>

<center>$shared\_mems=\sum_{shared\;members} Pmem_i$</center>

In [None]:
mt.fill_shared_members(c1, c2)

In [None]:
display(c1.mt_input)
display(c2.mt_input)
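
As a cross-check of the two sums above, the following pure-Python sketch recomputes `nmem` and `shared_mems` for the toy data, with members matched by `id`. The function `member_sums` is a hypothetical helper, not part of `clevar`, and it assumes that the weight of a shared member is the `pmem` from the catalog being filled.

```python
from collections import defaultdict


def member_sums(mems_self, mems_other):
    """Sum pmem per cluster (nmem) and per shared cluster pair (shared_mems).

    mems_self, mems_other: dicts mapping member id -> (cluster id, pmem).
    """
    nmem = defaultdict(float)
    shared = defaultdict(lambda: defaultdict(float))
    for mem_id, (cluster, pmem) in mems_self.items():
        nmem[cluster] += pmem
        if mem_id in mems_other:  # member also present in the other catalog
            shared[cluster][mems_other[mem_id][0]] += pmem
    return dict(nmem), {cl: dict(sh) for cl, sh in shared.items()}


# Same cluster assignments as members_list above, with pmem = 1 everywhere.
cl1 = ["CL0a"] * 5 + ["CL1a"] * 4 + ["CL2a"] * 3 + ["CL3a"] * 2 + ["CL4a"]
cl2 = ["CL3b"] + ["CL0b"] * 4 + ["CL1b"] * 4 + ["CL2b"] * 3 + ["CL3b"] * 2
mems1 = {f"MEM{i}": (cl, 1.0) for i, cl in enumerate(cl1)}
mems2 = {f"MEM{i}": (cl, 1.0) for i, cl in enumerate(cl2)}  # MEM14 is absent

nmem1, shared1 = member_sums(mems1, mems2)
nmem2, shared2 = member_sums(mems2, mems1)
print(nmem1["CL0a"], shared1["CL0a"])  # 5.0 {'CL3b': 1.0, 'CL0b': 4.0}
print(nmem2["CL3b"], shared2["CL3b"])  # 3.0 {'CL0a': 1.0, 'CL3a': 2.0}
```

In this sketch `CL4a` has one member but no entry in `shared1`, since its only member does not appear in the second member catalog.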

Again, these results can be saved and loaded so you don't have to redo this step.
Just use the functions:

In [None]:
mt.save_shared_members(c1, c2, fileprefix="mem_share")
mt.load_shared_members(c1, c2, fileprefix="mem_share")

Once this step is done, you can actually start matching the clusters.

### Multiple matching
The next step is to match the catalogs and store all candidates that pass the matching criteria.

In [None]:
%%time
mt.multiple(c1, c2)
mt.multiple(c2, c1)

This will fill the `mt_multi_self` and `mt_multi_other` columns:

In [None]:
display(c1)
display(c2)
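
The sketch below illustrates what the two candidate columns hold for the toy data. It assumes that the matching criterion is simply "shares at least one member", which may differ from the exact criteria applied by `clevar`. The helper `multiple_candidates` and the hardcoded `shared1` dictionary (shared members seen from catalog 1) are illustrative only.

```python
def multiple_candidates(shared_mems):
    """List, for each cluster, every cluster of the other catalog sharing members with it."""
    return {
        cl: sorted(other for other, weight in shared.items() if weight > 0)
        for cl, shared in shared_mems.items()
    }


# Shared members of the toy catalogs, seen from catalog 1 (pmem = 1 everywhere).
shared1 = {
    "CL0a": {"CL3b": 1.0, "CL0b": 4.0},
    "CL1a": {"CL1b": 4.0},
    "CL2a": {"CL2b": 3.0},
    "CL3a": {"CL3b": 2.0},
    "CL4a": {},
}

# Candidates found by catalog 1 (analogous to mt_multi_self in catalog 1).
multi_self = multiple_candidates(shared1)

# Inverse view: which catalog 1 clusters picked each catalog 2 cluster
# (analogous to mt_multi_other in catalog 2).
multi_other = {}
for cl, candidates in multi_self.items():
    for cand in candidates:
        multi_other.setdefault(cand, []).append(cl)

print(multi_self["CL0a"])   # ['CL0b', 'CL3b']
print(multi_other["CL3b"])  # ['CL0a', 'CL3a']
```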

### Unique matching
Once all candidates are stored in each catalog, we can find the best candidates. You can also pass the argument:
- `preference`: In cases where there are multiple matches, defines how the best candidate is chosen. Options are: `'more_massive'`, `'angular_proximity'`, `'redshift_proximity'`, `'shared_member_fraction'` (default value).


In [None]:
%%time
mt.unique(c1, c2, preference="shared_member_fraction")
mt.unique(c2, c1, preference="shared_member_fraction")

This will fill the matching columns:
- `mt_self`: Best candidate found
- `mt_other`: Best candidate found by the other catalog
- `mt_frac_self`: Fraction of shared members with the best candidate found
- `mt_frac_other`: Fraction of shared members by the best candidate found by the other catalog, relative to the other catalog


If `pmem` is present in the members catalogs, the shared fractions are computed by:
<h1><center>$\frac{\sum_{shared\;members}Pmem_i}{\sum_{cluster\;members}Pmem_i}$</center></h1>

In [None]:
display(c1)
display(c2)
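
The shared fraction and the `'shared_member_fraction'` preference can be illustrated with a few lines of plain Python. This is a simplified sketch under stated assumptions: `best_candidate` is a hypothetical helper, the input numbers are the toy values with `pmem = 1`, and it ignores any bookkeeping `clevar` does to keep matches unique across clusters.

```python
def best_candidate(nmem, shared_mems):
    """Return (candidate id, shared fraction) for the largest shared fraction.

    nmem: summed pmem of the cluster; shared_mems: {other cluster: summed shared pmem}.
    """
    if not shared_mems:
        return None, 0.0
    best = max(shared_mems, key=shared_mems.get)
    return best, shared_mems[best] / nmem


# CL0a has 5 members: 4 shared with CL0b and 1 shared with CL3b.
print(best_candidate(5.0, {"CL3b": 1.0, "CL0b": 4.0}))  # ('CL0b', 0.8)
# CL3b has 3 members: 2 shared with CL3a and 1 shared with CL0a.
print(best_candidate(3.0, {"CL0a": 1.0, "CL3a": 2.0}))
# CL4a shares no members, so it has no candidate.
print(best_candidate(1.0, {}))  # (None, 0.0)
```

Note that the fraction is always relative to the cluster doing the search, so the same pair can have different fractions in the two directions.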

### Cross matching
If you want to make sure the same pair was found in both directions:

In [None]:
c1.cross_match()
c2.cross_match()

This will fill the `mt_cross` column:

In [None]:
display(c1)
display(c2)
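
Conceptually, a cross match keeps a pair only when both directions agree. The sketch below shows that logic on plain dictionaries. The function is a hypothetical stand-in rather than the `clevar` method, and the `mt_self` / `mt_other` values are made up to include one disagreement.

```python
def cross_match(mt_self, mt_other):
    """Keep a match only when the cluster's own choice equals the choice made for it
    by the other catalog; otherwise store None."""
    return {
        cl: best if best is not None and best == mt_other.get(cl) else None
        for cl, best in mt_self.items()
    }


# Hypothetical matching columns for catalog 1.
mt_self = {"CL0a": "CL0b", "CL1a": "CL1b", "CL3a": "CL3b", "CL4a": None}
mt_other = {"CL0a": "CL0b", "CL1a": "CL1b", "CL3a": None, "CL4a": None}

mt_cross = cross_match(mt_self, mt_other)
print(mt_cross)
# {'CL0a': 'CL0b', 'CL1a': 'CL1b', 'CL3a': None, 'CL4a': None}
```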

## Save and Load
The results of the matching can easily be saved and loaded using `ClEvaR` tools:

In [None]:
mt.save_matches(c1, c2, out_dir="temp", overwrite=True)

In [None]:
mt.load_matches(c1, c2, out_dir="temp")
display(c1)
display(c2)

## Getting Matched Pairs

`clevar` has built-in functionality to plot some results of the matching, such as:
- Recovery rates
- Distances (angular and redshift) of cluster centers
- Scaling relations (mass, redshift, ...)

For those cases, check the <a href='match_metrics.ipynb'>match_metrics.ipynb</a> and <a href='match_metrics_advanced.ipynb'>match_metrics_advanced.ipynb</a> notebooks.

If those do not meet your needs, you can directly get the matched pairs of clusters:

In [None]:
from clevar.match import get_matched_pairs

mt1, mt2 = get_matched_pairs(c1, c2, "cross")

These will be catalogs with the corresponding matched pairs:

In [None]:
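import matplotlib.pyplot as plt

# Compare the masses of the cross-matched pairs (rows of mt1 and mt2 are aligned)
plt.scatter(mt1["mass"], mt2["mass"])
plt.xlabel("mass (Cat1)")
plt.ylabel("mass (Cat2)")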

### Members of matched pairs

The members also carry the information on the matched clusters.
The column `match` shows to which clusters of the other catalog this member also belongs.
The column `in_mt_sample` says whether those clusters are present in the matched sample:

In [None]:
mt1.members

## Outputting matched catalogs

To save the current catalogs, you can use the built-in `write` function:

In [None]:
c1.write("c1_temp.fits", overwrite=True)

This will allow you to save the catalog with its current labels and matching information.

### Outputting matching information to original catalogs

Assuming your input data came from initial files,
`clevar` also provides functions to create output files
that combine all the information in them with the matching results.

To add the matching information to an input catalog, use:

```
from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)
```

- note: `input_catalog.fits` must have the same number of rows as `c1`.


To create a matched catalog containing all columns of both input catalogs, use:

```
from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
    'output_catalog.fits', c1, c2, matching_type='cross')
```

where `matching_type` must be `cross`, `cat1` or `cat2`.

- note: `input_catalog1.fits` must have the same number of rows as `c1` (and likewise `input_catalog2.fits` and `c2`).