# Matching catalogs based on membership (simple)
Matching two catalogs based on membership using a configuration dictionary

In [None]:
%load_ext autoreload
%autoreload 2

## ClCatalogs
Given some input data

In [None]:
import numpy as np
from astropy.table import Table
input1 = Table({'ID': ['CL0a', 'CL1a', 'CL2a', 'CL3a', 'CL4a']})
input1['MASS'] = 1e14*np.arange(1, 6)*10
input2 = Table({'ID': ['CL0b', 'CL1b', 'CL2b', 'CL3b']})
input2['MASS'] = 1e14*np.arange(1, 5)*10
display(input1)
display(input2)
input1_mem = Table(
    {'ID':[
        'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
        'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
        'MEM10', 'MEM11', 'MEM12', 'MEM13', 'MEM14'],
     'ID_CLUSTER': [
         'CL0a', 'CL0a', 'CL0a', 'CL0a', 'CL0a',
         'CL1a', 'CL1a', 'CL1a', 'CL1a', 'CL2a',
         'CL2a', 'CL2a', 'CL3a', 'CL3a', 'CL4a'],
    })
input2_mem = Table(
    {'ID':[
        'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
        'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
        'MEM10', 'MEM11', 'MEM12', 'MEM13'],
     'ID_CLUSTER': [
         'CL3b', 'CL0b', 'CL0b', 'CL0b', 'CL0b',
         'CL1b', 'CL1b', 'CL1b', 'CL1b', 'CL2b',
         'CL2b', 'CL2b', 'CL3b', 'CL3b'],
    })
input1_mem['RA'] = np.arange(len(input1_mem))*10.0
input2_mem['RA'] = np.arange(len(input2_mem))*10.0
input1_mem['DEC'] = 0.0
input2_mem['DEC'] = 0.0
input1_mem['Z'] = 0.1
input2_mem['Z'] = 0.1
input1_mem['PMEM'] = 1.0
input2_mem['PMEM'] = 1.0
display(input1_mem)
display(input2_mem)

Create two `ClCatalog` objects. They have the same properties as `astropy` tables, with additional functionality. For membership matching, the main columns to include are:
- `id` - must correspond to `id_cluster` in the cluster member catalog.
- `mass` (or mass proxy) - necessary for matching if `shared_member_fraction` is used as the preference criterion for unique matches (the default).


All of the columns can be added when creating the `ClCatalog` object by passing them as keyword arguments:
```
cat = ClCatalog('Cat', ra=[0, 1])
```
or can also be added afterwards:
```
cat = ClCatalog('Cat')
cat['ra'] = [0, 1]
```

In [None]:
from clevar.catalog import ClCatalog
c1 = ClCatalog('Cat1', id=input1['ID'], mass=input1['MASS'])
c2 = ClCatalog('Cat2', id=input2['ID'], mass=input2['MASS'])

# Format for nice display
c1['mass'].info.format = '.2e'
c2['mass'].info.format = '.2e'

display(c1)
display(c2)

The members can be added to the cluster object using the `add_members` function.
It is called in a similar way to the `ClCatalog` constructor: the columns are passed as keyword arguments (the key `id_cluster` is always required and must correspond to `id` in the main cluster catalog).

In [None]:
c1.add_members(id=input1_mem['ID'], id_cluster=input1_mem['ID_CLUSTER'],
               ra=input1_mem['RA'], dec=input1_mem['DEC'], pmem=input1_mem['PMEM'])
c2.add_members(id=input2_mem['ID'], id_cluster=input2_mem['ID_CLUSTER'],
               ra=input2_mem['RA'], dec=input2_mem['DEC'], pmem=input2_mem['PMEM'])

display(c1.members)
display(c2.members)
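
Because `id_cluster` must correspond to `id`, a quick consistency check on the raw inputs can catch mislabeled members before they are added. The sketch below uses plain Python on the toy IDs of catalog 1 defined above. The variable names are illustrative and not part of `clevar`.

```python
# Cluster IDs and member-to-cluster assignments from the toy data of catalog 1
cluster_ids = ['CL0a', 'CL1a', 'CL2a', 'CL3a', 'CL4a']
member_cluster_ids = (['CL0a'] * 5 + ['CL1a'] * 4 + ['CL2a'] * 3
                      + ['CL3a'] * 2 + ['CL4a'])

# id_cluster values that do not exist in the cluster catalog
unknown = sorted(set(member_cluster_ids) - set(cluster_ids))
# Clusters that received no members at all
empty = sorted(set(cluster_ids) - set(member_cluster_ids))

print('unknown id_cluster values:', unknown)
print('clusters without members:', empty)
```

Both lists are empty for the toy data, so every member points to a known cluster and every cluster has at least one member.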

The catalogs can also be read directly from files. For more details, see <a href='catalogs.ipynb'>catalogs.ipynb</a>.

## Matching
Import `MembershipMatch` and create an object for matching:

In [None]:
from clevar.match import MembershipMatch
mt = MembershipMatch()

Prepare the configuration. The main values are:

- `type`: Type of matching to be considered. Can be a simple match of ClCatalog1->ClCatalog2 (`cat1`), ClCatalog2->ClCatalog1 (`cat2`) or cross matching (`cross`).
- `preference`: In cases where there are multiple matches, how the best candidate will be chosen.
- `minimum_share_fraction1`: Minimum share fraction of catalog 1 to consider in matches (default=`0`).
- `minimum_share_fraction2`: Minimum share fraction of catalog 2 to consider in matches (default=`0`).
- `match_members`: Match the members catalogs (default=`True`), necessary if not already done.
- `match_members_kwargs`: dictionary of arguments to match members, needed if `match_members=True`. Keys are:
  - `method`(str): Method for matching. Options are `id` or `angular_distance`.
  - `radius`(str, None): For `method='angular_distance'`. Radius for matching, with format `'value unit'` (ex: `1 arcsec`, `1 Mpc`).
  - `cosmo`(clevar.Cosmology, None): For `method='angular_distance'`. Cosmology object for when radius has physical units.
- `match_members_save`: saves file with matched members (default=`False`).
- `match_members_load`: load matched members (default=`False`), if `True` skips matching (and save) of members.
- `match_members_file`: file to save matching of members, needed if `match_members_save` or `match_members_load` is `True`.
- `shared_members_fill`: Adds shared members dicts and nmem to mt_input in catalogs (default=`True`), necessary if not already done.
- `shared_members_save`: saves files with shared members (default=`False`).
- `shared_members_load`: load files with shared members (default=`False`), if `True` skips matching (and save) of members and fill (and save) of shared members.
- `shared_members_file`: Prefix of file names to save shared members, needed if `shared_members_save` or `shared_members_load` is `True`.
- `verbose`: Print result for individual matches (default=`True`).

In [None]:
match_config = {
    'type': 'cross', # options are cross, cat1, cat2
    'preference': 'shared_member_fraction', # other options are more_massive, angular_proximity or redshift_proximity
    'minimum_share_fraction1': 0,
    'minimum_share_fraction2': 0,
    'match_members_kwargs': {'method':'id'},
}
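
As a variation, the keys listed above can also describe a member match by angular separation whose result is saved to disk. The dictionary below is only a sketch built from that list: the name `match_config_radius` and the file name are hypothetical, and it is not used in the rest of this notebook.

```python
# Hypothetical variation of the configuration: members are matched by
# angular separation instead of by ID, and the member matching is saved.
match_config_radius = {
    'type': 'cross',
    'preference': 'shared_member_fraction',
    'match_members_kwargs': {
        'method': 'angular_distance',
        'radius': '1 arcsec',  # angular units, so no `cosmo` entry is given
    },
    'match_members_save': True,
    'match_members_file': 'temp_mem_match.txt',  # hypothetical file name
}
```

With a physical radius such as `1 Mpc`, the list above indicates that a `cosmo` entry would also be needed in `match_members_kwargs`.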

Once the configuration is prepared, the whole process can be done with one call:

In [None]:
%%time
mt.match_from_config(c1, c2, match_config)

This will fill the matching columns in the catalogs:
- `mt_multi_self`: Multiple matches found
- `mt_multi_other`: Multiple matches found by the other catalog
- `mt_self`: Best candidate found
- `mt_other`: Best candidate found by the other catalog
- `mt_frac_self`: Fraction of shared members with the best candidate found
- `mt_frac_other`: Fraction of shared members with the best candidate found by the other catalog, relative to the other catalog
- `mt_cross`: Best candidate found in both directions
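
To make the relation between these columns concrete, the sketch below reproduces the idea in plain Python for the toy catalogs: each cluster picks the candidate with the largest shared fraction, and a cross match is kept only when both directions agree. This is an illustration under simplifying assumptions, not `clevar`'s implementation. The fractions are worked out by hand from the member tables (all `pmem` equal to 1), and the bookkeeping that keeps matches unique is ignored.

```python
# Shared-member fractions for the toy catalogs (all pmem = 1),
# written as {cluster: {candidate in the other catalog: fraction}}
frac_1to2 = {
    'CL0a': {'CL0b': 4 / 5, 'CL3b': 1 / 5},
    'CL1a': {'CL1b': 1.0},
    'CL2a': {'CL2b': 1.0},
    'CL3a': {'CL3b': 1.0},
    'CL4a': {},  # its only member (MEM14) is absent from catalog 2
}
frac_2to1 = {
    'CL0b': {'CL0a': 1.0},
    'CL1b': {'CL1a': 1.0},
    'CL2b': {'CL2a': 1.0},
    'CL3b': {'CL0a': 1 / 3, 'CL3a': 2 / 3},
}

def best_candidates(fractions):
    """Pick the candidate with the largest shared fraction (None if there is none)."""
    return {cl: max(cands, key=cands.get) if cands else None
            for cl, cands in fractions.items()}

best_1 = best_candidates(frac_1to2)  # plays the role of mt_self in catalog 1
best_2 = best_candidates(frac_2to1)  # plays the role of mt_self in catalog 2

# A cross match requires the best candidate to point back to the same cluster
cross_1 = {cl1: cl2 if cl2 is not None and best_2.get(cl2) == cl1 else None
           for cl1, cl2 in best_1.items()}
print(cross_1)
```

Here `CL0a` has two candidates but prefers `CL0b`, and `CL4a` ends up without a cross match because it has no candidate at all.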


If `pmem` is present in the members catalogs, the shared fractions are computed as:
<h1><center>$\frac{\sum_{shared\;members}Pmem_i}{\sum_{cluster\;members}Pmem_i}$</center></h1>
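
As a worked instance of this expression, the sketch below evaluates it in plain Python for cluster `CL0a`. The `pmem` values are hypothetical (the toy data uses 1.0 everywhere) and were chosen only to show the effect of the weighting. The function name `shared_fraction` is illustrative and not part of `clevar`.

```python
def shared_fraction(members, pmem, other_members):
    """Pmem-weighted fraction of `members` that also appear in `other_members`."""
    total = sum(pmem[m] for m in members)
    shared = sum(pmem[m] for m in members if m in other_members)
    return shared / total

# Members of CL0a, with hypothetical non-uniform membership probabilities
cl0a = ['MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4']
pmem = {'MEM0': 0.2, 'MEM1': 1.0, 'MEM2': 1.0, 'MEM3': 0.9, 'MEM4': 0.9}

# Members of the two catalog 2 clusters that overlap with CL0a
cl0b = {'MEM1', 'MEM2', 'MEM3', 'MEM4'}
cl3b = {'MEM0', 'MEM12', 'MEM13'}

frac_cl0b = shared_fraction(cl0a, pmem, cl0b)  # 3.8 / 4.0, about 0.95
frac_cl3b = shared_fraction(cl0a, pmem, cl3b)  # 0.2 / 4.0, about 0.05
print(frac_cl0b, frac_cl3b)
```

With all `pmem` equal to 1 the same function gives 4/5 and 1/5, which is simply the fraction of member counts.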

In [None]:
display(c1)
display(c2)

## Save and Load
The results of the matching can easily be saved and loaded using `ClEvaR` tools:

In [None]:
mt.save_matches(c1, c2, out_dir='temp', overwrite=True)

In [None]:
mt.load_matches(c1, c2, out_dir='temp')
display(c1)
display(c2)

## Getting Matched Pairs

`clevar` has built-in functionality to plot some results of the matching, such as:
- Recovery rates
- Distances (angular and redshift) of cluster centers
- Scaling relations (mass, redshift, ...)

For those cases, check the <a href='match_metrics.ipynb'>match_metrics.ipynb</a> and <a href='match_metrics_advanced.ipynb'>match_metrics_advanced.ipynb</a> notebooks.

If those do not meet your needs, you can get the matched pairs of clusters directly:

In [None]:
from clevar.match import get_matched_pairs
mt1, mt2 = get_matched_pairs(c1, c2, 'cross')

These will be catalogs with the corresponding matched pairs:

In [None]:
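import matplotlib.pyplot as plt
# Compare the masses of the matched pairs
plt.scatter(mt1['mass'], mt2['mass'])
plt.xlabel('mass (Cat1)')
plt.ylabel('mass (Cat2)')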

### Members of matched pairs

The members also carry the information on the matched clusters.
The column `match` shows to which clusters of the other catalog this member also belongs.
The column `in_mt_sample` says whether those clusters are present in the matched sample:

In [None]:
mt1.members

## Outputting matched catalogs

To save the current catalogs, you can use the built-in `write` method:

In [None]:
c1.write('c1_temp.fits', overwrite=True)

This will allow you to save the catalog with its current labels and matching information.

### Outputting matching information to original catalogs

Assuming your input data came from initial files,
`clevar` also provides functions to create output files
that combine all the information in them with the matching results.

To add the matching information to an input catalog, use:

```
from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)
```

- note: `input_catalog.fits` must have the same number of rows as `c1`.


To create a matched catalog containing all columns of both input catalogs, use:

```
from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
    'output_catalog.fits', c1, c2, matching_type='cross')
```

where `matching_type` must be `cross`, `cat1` or `cat2`.

- note: `input_catalog1.fits` must have the same number of rows as `c1` (and likewise `input_catalog2.fits` and `c2`).