# Galaxy Cluster Catalogs
The main object for galaxy cluster catalogs is `ClCatalog`. It has the same properties as `astropy` tables, with additional functionality.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#ClCatalog-attributes" data-toc-modified-id="ClCatalog-attributes-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>ClCatalog attributes<a id="cat"></a></a></span></li><li><span><a href="#Creating-a-catalog" data-toc-modified-id="Creating-a-catalog-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Creating a catalog<a id="creating"></a></a></span><ul class="toc-item"><li><span><a href="#From-columns" data-toc-modified-id="From-columns-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>From columns<a id="from_cols"></a></a></span></li><li><span><a href="#From-data-table" data-toc-modified-id="From-data-table-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>From data table</a></span></li><li><span><a href="#Create-a-catalog-from-fits-files" data-toc-modified-id="Create-a-catalog-from-fits-files-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Create a catalog from <code>fits</code> files<a id="creating_fits"></a></a></span></li></ul></li><li><span><a href="#ClCatalog-necessary-columns" data-toc-modified-id="ClCatalog-necessary-columns-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>ClCatalog necessary columns</a></span><ul class="toc-item"><li><span><a href="#Important-inputs-of-ClCatalog" data-toc-modified-id="Important-inputs-of-ClCatalog-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Important inputs of <code>ClCatalog</code><a id="clcat_input"></a></a></span></li><li><span><a href="#Reserved-keyword-arguments" data-toc-modified-id="Reserved-keyword-arguments-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Reserved keyword arguments<a id="clcat_input_special"></a></a></span></li><li><span><a href="#Catalog-labels" data-toc-modified-id="Catalog-labels-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Catalog labels</a></span></li><li><span><a href="#Catalog-mt_input" data-toc-modified-id="Catalog-mt_input-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Catalog mt_input</a></span></li></ul></li><li><span><a href="#Saving-catalogs" data-toc-modified-id="Saving-catalogs-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Saving catalogs<a id="saving"></a></a></span></li><li><span><a href="#Accessing-catalog-data" data-toc-modified-id="Accessing-catalog-data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Accessing catalog data<a id="data"></a></a></span></li><li><span><a href="#Inbuilt-function-of-catalogs" data-toc-modified-id="Inbuilt-function-of-catalogs-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Inbuilt function of catalogs<a id="funcs"></a></a></span></li><li><span><a href="#Adding-members-to-cluster-catalogs" data-toc-modified-id="Adding-members-to-cluster-catalogs-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Adding members to cluster catalogs<a id="memcat"></a></a></span><ul class="toc-item"><li><span><a href="#Read-members-from-fits-files" data-toc-modified-id="Read-members-from-fits-files-7.1"><span class="toc-item-num">7.1&nbsp;&nbsp;</span>Read members from <code>fits</code> files<a id="memcat_fits"></a></a></span></li><li><span><a href="#Important-inputs-of-members-catalog" data-toc-modified-id="Important-inputs-of-members-catalog-7.2"><span class="toc-item-num">7.2&nbsp;&nbsp;</span>Important inputs of members catalog<a id="memcat_input"></a></a></span></li><li><span><a href="#Reserved-keyword-arguments" data-toc-modified-id="Reserved-keyword-arguments-7.3"><span class="toc-item-num">7.3&nbsp;&nbsp;</span>Reserved keyword arguments<a id="memcat_input_special"></a></a></span></li><li><span><a href="#Saving-members" data-toc-modified-id="Saving-members-7.4"><span class="toc-item-num">7.4&nbsp;&nbsp;</span>Saving members<a id="memcat_saving"></a></a></span></li><li><span><a href="#Memory-consumption" data-toc-modified-id="Memory-consumption-7.5"><span class="toc-item-num">7.5&nbsp;&nbsp;</span>Memory consumption<a id="memcat_memory"></a></a></span></li></ul></li></ul></div>

In [None]:
%load_ext autoreload
%autoreload 2
from IPython.display import HTML


## ClCatalog attributes<a id='cat'/>

The `ClCatalog` has the following internal attributes:
- `name`: ClCatalog name
- `data`: Table with main catalog data (ex: id, ra, dec, z) and matching data (mt_self, mt_other, mt_cross, mt_multi_self, mt_multi_other)
- `tags`: Dictionary that tells which are the default columns to be used
- `mt_input`: Table containing the necessary inputs for the match (added by Match objects)
- `size`: Number of objects in the catalog
- `id_dict`: Dictionary of indices given the object id
- `labels`: Labels of data columns for plots
- `members`: Members of clusters (optional)
- `leftover_members`: Galaxies in the input members not hosted by the cluster catalog (optional)

## Creating a catalog<a id='creating'/>
The catalog can be created by passing individual columns or a whole data table. Below we show how each case can be used.

In [None]:
from clevar import ClCatalog

### From columns<a id='from_cols'/>
To create a catalog from columns, pass the name as the first argument and the data columns for the table as keyword arguments:

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14])
cat['mass'].info.format = '.2e' # Format for nice display
cat

You can also pass a `tags` dictionary as input if the columns of your catalog have names that differ from the `ClEvaR` defaults:

In [None]:
cat = ClCatalog('cluster', ID_CLUSTER=['c1', 'c2'], M200=[1e13, 1e14],
                tags={'id':'ID_CLUSTER', 'mass':'M200'})
cat['mass'].info.format = '.2e' # Format for nice display
cat

Almost all keyword arguments will become columns of the catalog (see exceptions in [Important inputs of `ClCatalog`](#clcat_input)):

In [None]:
cat = ClCatalog('test name', id=['c1', 'c2'], test_column=[1, 2],
                other=[True, False], third=[None, []])
cat

### From data table
You can also create a `ClCatalog` by passing a full data table directly:

In [None]:
from astropy.table import Table
ap_table = Table([['c1', 'c2'],[1e13, 1e14]], names=['id', 'mass'])
cat = ClCatalog('cluster', data=ap_table)
cat['mass'].info.format = '.2e' # Format for nice display
cat

You can also pass a `tags` dictionary as input if the columns of your table have names that differ from the `ClEvaR` defaults:

In [None]:
from astropy.table import Table
ap_table = Table([['c1', 'c2'],[1e13, 1e14]], names=['ID_CLUSTER', 'M200'])
cat = ClCatalog('cluster', data=ap_table, tags={'id':'ID_CLUSTER', 'mass':'M200'})
cat['mass'].info.format = '.2e' # Format for nice display
cat

You can also pass a dictionary or a `numpy` array with names:

In [None]:
cat = ClCatalog('cluster', data={'id':['c1', 'c2'], 'mass':[1e13, 1e14]})
cat['mass'].info.format = '.2e' # Format for nice display
cat

In [None]:
import numpy as np
np_table = np.array([('c1', 1e13),('c2', 1e14)],
                    dtype=[('id', 'U10'), ('mass', 'f4')])
cat = ClCatalog('cluster', data=np_table)
cat['mass'].info.format = '.2e' # Format for nice display
cat

### Create a catalog from `fits` files<a id='creating_fits'/>
Catalog objects can also be read directly from a file, by passing the fits file as the first argument, the catalog name as the second, and the `tags` argument listing the main columns to be read:

In [None]:
cat = ClCatalog.read('../demo/cat1.fits', 'my cluster',
                     tags={'id':'ID', 'mass':'MASS'})
cat['mass'].info.format = '.2e' # Format for nice display
cat

If you want to read all columns in the `.fits` file, set the argument `full=True`.

In [None]:
cat = ClCatalog.read('../demo/cat1.fits', 'my cluster', full=True,
                     tags={'id':'ID', 'mass':'MASS'})
cat['mass'].info.format = '.2e' # Format for nice display
cat

## ClCatalog necessary columns
There are a few columns that will always be present in `ClCatalog` objects, and are added when not provided.
For instance, the matching columns (with prefix `mt_`):

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14])
cat['mass'].info.format = '.2e' # Format for nice display
cat

All catalogs have an `id` column. If it is not included in the input, one will be created:

In [None]:
cat = ClCatalog('cluster', mass=[1e13, 1e14])
cat['mass'].info.format = '.2e' # Format for nice display
cat

Each cluster must have a unique `id`. Repeated values will have a suffix `_r#` added:

In [None]:
cat = ClCatalog('cluster', id=['cluster', 'cluster'])
cat
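
To make the renaming rule concrete, here is a standalone sketch that appends a `_r#` suffix to repeated ids using plain Python. The helper `make_unique_ids` is hypothetical, and the numbering scheme (first occurrence left unchanged, repetitions numbered from 1) is an assumption made for this example; `ClEvaR` may number the repetitions differently.

```python
from collections import Counter


def make_unique_ids(ids):
    """Return a copy of ids in which repeated values get a `_r#` suffix.

    Hypothetical helper: the first occurrence is kept as is and the
    n-th repetition becomes `<id>_r<n>` (numbering scheme assumed).
    """
    seen = Counter()
    unique = []
    for id_ in ids:
        count = seen[id_]
        unique.append(id_ if count == 0 else f"{id_}_r{count}")
        seen[id_] += 1
    return unique


new_ids = make_unique_ids(['cluster', 'cluster', 'other', 'cluster'])
print(new_ids)
```

Whatever the exact numbering, the point is that every row ends up with a distinct `id`, which is what the id-based lookups rely on.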

### Important inputs of `ClCatalog`<a id='clcat_input'/>

As shown above, `ClCatalog` can have any column in its main data table.
There are a few key columns that must exist (or be tagged) to be used for matching:

- `id` - necessary in membership matching (must correspond to `id_cluster` in the cluster member catalog).
- `ra` (in degrees) - necessary for proximity matching.
- `dec` (in degrees) - necessary for proximity matching.
- `z` - necessary for proximity matching if used as a matching criterion (or for angular to physical conversion).
- `mass` (or mass proxy) - necessary for proximity matching if `shared_member_fraction` is used as the preference criterion for unique matches (default in membership matching).
- `radius` - necessary for proximity matching if used as a matching criterion (also requires `radius_unit` to be passed).

### Reserved keyword arguments<a id='clcat_input_special'/>

There are some keyword arguments that have a fixed meaning and do not become columns in the cluster data table:

- `radius_unit`: unit of the `radius` column. It can be an angular unit (`radians`, `degrees`, `arcmin`, `arcsec`), a physical unit (`Mpc`, `kpc`, `pc`), or a mass overdensity (`m200b`, `m500c`). Unit names are case insensitive. In the proximity matching, the radius is converted to an angular distance in degrees.
- `data`: Data table to be added to the catalog.
- `tags`: Dictionary that tags the important columns in the catalog.
- `labels`: Dictionary with labels of data columns to be used in plots.
- `members`: Members of clusters, see [cluster members](#memcat) section for details.
- `members_warning`: Warn if the members catalog contains galaxies not hosted by the cluster catalog.
- `mt_input`: Table containing the necessary inputs for the match. This attribute is usually added during the matching process, but it can be passed in the `ClCatalog` construction.
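
For the angular units, the conversion to degrees is a simple rescaling. The sketch below shows it in plain Python; the function `angular_radius_to_degrees` and the `TO_DEGREES` table are names invented for this illustration and are not part of the `ClEvaR` API.

```python
import math

# Conversion factors from each angular unit to degrees.
TO_DEGREES = {
    'degrees': 1.0,
    'arcmin': 1.0 / 60.0,
    'arcsec': 1.0 / 3600.0,
    'radians': 180.0 / math.pi,
}


def angular_radius_to_degrees(radius, radius_unit):
    """Convert an angular radius to degrees (unit name is case insensitive)."""
    unit = radius_unit.lower()
    if unit not in TO_DEGREES:
        raise ValueError(f"'{radius_unit}' is not an angular unit")
    return radius * TO_DEGREES[unit]


radius_deg = angular_radius_to_degrees(0.01, 'Radians')
print(f"{radius_deg:.4f} deg")
```

Physical units and mass overdensities need extra information (a redshift and a cosmology), so they are not covered by this table.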

### Catalog labels
The catalogs have a `labels` attribute that is used for plots. If it is not provided as an argument, a default value is assigned:

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14])
cat.labels

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14],
                labels={'id':'cluster ID', 'mass':'cluster M_200'})
cat.labels

### Catalog mt_input
Here are some examples of information being added to `mt_input` after the catalog creation. In the proximity matching, it will add an angular distance and min/max redshift when `delta_z` is not `None`:

In [None]:
from clevar.match import ProximityMatch
mt = ProximityMatch()

In [None]:
cat = ClCatalog('Cat',id=['c1', 'c2'],  radius=[0.01, 0.02], radius_unit='radians')
mt.prep_cat_for_match(cat, delta_z=None, match_radius='cat')
cat.mt_input['ang']

This information is also shown directly when displaying the catalog:

In [None]:
cat = ClCatalog('Cat',id=['c1', 'c2'],  radius=[0.01, 0.02], radius_unit='degrees')
mt.prep_cat_for_match(cat, delta_z=None, match_radius='cat')
cat

Using physical units (requires a cosmology):

In [None]:
from clevar.cosmology import AstroPyCosmology
cosmo = AstroPyCosmology()

display(HTML('<h3>Radius in Mpc</h3>'))
cat = ClCatalog('Cat',id=['c1', 'c2'],  radius=[1, 1.5], z=[.4, .5], radius_unit='mpc')
mt.prep_cat_for_match(cat, delta_z=None, match_radius='cat', cosmo=cosmo)
display(cat)

display(HTML('<h3>Radius from M200c</h3>'))
cat = ClCatalog('Cat', id=['c1', 'c2'], mass=[1e13, 1e14], z=[.4, .5],
                tags={'radius':'mass'}, radius_unit='m200c')
mt.prep_cat_for_match(cat, delta_z=None, match_radius='cat', cosmo=cosmo)
cat['mass'].info.format = '.2e' # Format for nice display
display(cat)
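
To see why a cosmology and a redshift are needed here, note that a physical radius $R$ at redshift $z$ subtends an angle $\theta \approx R / D_A(z)$, where $D_A$ is the angular diameter distance. The sketch below evaluates this in plain Python for a flat ΛCDM model. The parameter values (`h0=70`, `omega_m=0.3`) and the function names are assumptions chosen for illustration; they are not necessarily the defaults of `AstroPyCosmology`, so the numbers need not match the table above.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Angular diameter distance in Mpc for a flat LCDM cosmology.

    h0 and omega_m are illustrative values chosen for this sketch.
    """
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    # Midpoint-rule integral of 1/E(z') from 0 to z.
    dz = z / n_steps
    integral = sum(inv_e((i + 0.5) * dz) for i in range(n_steps)) * dz
    comoving = C_KM_S / h0 * integral
    return comoving / (1.0 + z)


def physical_radius_to_degrees(radius_mpc, z, **cosmo_kwargs):
    """Angle in degrees subtended by radius_mpc at redshift z (small-angle limit)."""
    d_a = angular_diameter_distance(z, **cosmo_kwargs)
    return math.degrees(radius_mpc / d_a)


for radius_mpc, z in [(1.0, 0.4), (1.5, 0.5)]:
    print(radius_mpc, z, physical_radius_to_degrees(radius_mpc, z))
```

In practice the cosmology object passed as `cosmo` does this computation, so the sketch is only meant to show what the conversion involves.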

## Saving catalogs<a id='saving'/>

The `ClCatalog` object has an inbuilt `write` function to save the catalog to a `.fits` file.
This function also takes the argument `add_header`, which adds the name and labels information to the file.
If the file was saved with this argument, it can be read without requiring a `name` argument:

In [None]:
cat = ClCatalog('cluster', ID_CLUSTER=['c1', 'c2'], M200=[1e13, 1e14],
                tags={'id':'ID_CLUSTER', 'mass':'M200'},
                labels={'id':'cluster ID', 'mass':'cluster M_200'},
               )
cat.write('cat1_with_info.fits', overwrite=True)

In [None]:
cat_temp = cat.read_full('cat1_with_info.fits')
cat_temp['mass'].info.format = '.2e' # Format for nice display
cat_temp

## Accessing catalog data<a id='data'/>

The main data table of the catalog can be accessed with `[]` operations in the same way as `astropy` tables. The output is a new `ClCatalog` object, except when only one row or column is requested, in which case the row/column is returned:

In [None]:
cat = ClCatalog('cluster', ID_CLUSTER=['c1', 'c2'], M200=[1e13, 1e14],
                tags={'id':'ID_CLUSTER', 'mass':'M200'},
                labels={'id':'cluster ID', 'mass':'cluster M_200'},
               )
cat['mass'].info.format = '.2e' # Format for nice display

In [None]:
cat['ID_CLUSTER']

In [None]:
cat['ID_CLUSTER', 'M200']

In [None]:
cat[[1, 0]]

In [None]:
cat[[True, False]]

In [None]:
cat[:1]

In [None]:
cat[0]

An important detail is that when the catalog has tags, passing a string that is tagged will return the tagged column:

In [None]:
cat['id']

In [None]:
cat['id_cluster', 'M200']
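
The tag lookup can be pictured as one extra dictionary step before the column access. The following standalone sketch uses plain dictionaries; `resolve_column` is a hypothetical helper and the lookup order (tags first, then column names) is an assumption for this example. It also ignores any case handling of column names.

```python
def resolve_column(name, tags, columns):
    """Return the real column name for `name`, which may be a tag or a column.

    Hypothetical helper, only meant to illustrate the lookup order.
    """
    if name in tags:          # tagged name, e.g. 'id' -> 'ID_CLUSTER'
        return tags[name]
    if name in columns:       # plain column name
        return name
    raise KeyError(f"'{name}' is neither a tag nor a column")


tags = {'id': 'ID_CLUSTER', 'mass': 'M200'}
columns = {'ID_CLUSTER': ['c1', 'c2'], 'M200': [1e13, 1e14]}

id_values = columns[resolve_column('id', tags, columns)]
mass_values = columns[resolve_column('M200', tags, columns)]
print(id_values, mass_values)
```

With this picture, `'id'` and `'ID_CLUSTER'` both lead to the same data, which is the behaviour shown in the cells above.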

## Inbuilt function of catalogs<a id='funcs'/>
The `ClCatalog` object has some inbuilt functionality to facilitate the matching. `ids2inds` returns the indices of objects given an id list. Other functions are related to footprint computations, see <a href='footprint.ipynb'>footprint.ipynb</a> for information on those.

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14])
cat['mass'].info.format = '.2e' # Format for nice display

display(HTML('<h3>Catalog</h3>'))
display(cat)

display(HTML('<h3>Catalog sorted by id list</h3>'))
inds = cat.ids2inds(['c2', 'c1'])
display(cat[inds])
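
Conceptually, this kind of lookup is a dictionary from id to row index. The sketch below reproduces the idea with plain Python lists; `ids_to_inds` is a name made up for this illustration and is not the `ClEvaR` implementation.

```python
ids = ['c1', 'c2', 'c3']

# Dictionary from object id to its row index, built once.
id_dict = {id_: ind for ind, id_ in enumerate(ids)}


def ids_to_inds(id_list, id_dict):
    """Return the row index of each id, in the order given in id_list."""
    return [id_dict[id_] for id_ in id_list]


inds = ids_to_inds(['c3', 'c1'], id_dict)
print(inds)
```

Because the dictionary is built once, each lookup avoids scanning the whole id column, which matters when matching large catalogs.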

## Adding members to cluster catalogs<a id='memcat'/>

The members are stored as an internal table-like object of `ClCatalog`, accessed by `.members`.
This object has the following attributes:
- `name`: ClCatalog name
- `data`: Table with main catalog data (ex: id, id_cluster, ra, dec, z)
- `size`: Number of objects in the catalog
- `id_dict`: Dictionary of indices given the object id
- `labels`: Labels of data columns for plots
- `id_dict_list`: Dictionary of indices given the object id, returning lists to account for members with repeated `id`.
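
The difference between `id_dict` and `id_dict_list` can be illustrated with a small standalone example. Here the same member id appears in two rows (for instance, a galaxy listed as a member of more than one cluster, which is an assumed scenario for this sketch), so each id maps to a list of row indices rather than a single one.

```python
from collections import defaultdict

member_ids = ['m1', 'm2', 'm1', 'm3']

# Each id maps to the list of all rows where it appears.
id_dict_list = defaultdict(list)
for ind, id_ in enumerate(member_ids):
    id_dict_list[id_].append(ind)
id_dict_list = dict(id_dict_list)

print(id_dict_list)
```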

The members can be added to the cluster object using the `add_members` function.
Its call signature is similar to the one used to instantiate a `ClCatalog` object, where the columns are added by keyword arguments (the key `id_cluster` is always necessary and must correspond to `id` in the main cluster catalog):

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14])
cat['mass'].info.format = '.2e' # Format for nice display
cat.add_members(id=['m1', 'm2', 'm3'], id_cluster=['c1', 'c2', 'c1'])
display(cat)
display(cat.members)

The same can be done using `tags`:

In [None]:
cat = ClCatalog('cluster', id=['c1', 'c2'], mass=[1e13, 1e14])
cat['mass'].info.format = '.2e' # Format for nice display
cat.add_members(
    ID=['m1', 'm2', 'm3'], IDCL=['c1', 'c2', 'c1'],
    tags={'id':'ID', 'id_cluster':'IDCL'})
display(cat)
display(cat.members)

### Read members from `fits` files<a id='memcat_fits'/>
The members can also be read directly from a file with `read_members`, by passing the fits file as the first argument and the `tags` argument listing the main columns to be read:

In [None]:
cat = ClCatalog.read(
    '../demo/cat1.fits', 'my cluster',
    tags={'id':'ID', 'mass':'MASS'})
cat.read_members(
    '../demo/cat1_mem.fits',
    tags={'id':'ID', 'id_cluster':'ID_CLUSTER'})
cat['mass'].info.format = '.2e' # Format for nice display
display(cat)
display(cat.members)

Again, passing `full=True` will read all columns in the file:

In [None]:
cat = ClCatalog.read(
    '../demo/cat1.fits', 'my cluster',
    tags={'id':'ID', 'mass':'MASS'}, full=True)
cat.read_members(
    '../demo/cat1_mem.fits',
    tags={'id':'ID', 'id_cluster':'ID_CLUSTER'}, full=True)
cat['mass'].info.format = '.2e' # Format for nice display
display(cat)
display(cat.members)

### Important inputs of members catalog<a id='memcat_input'/>

There are a few key columns these catalogs must have to be used for matching:

- `id` - necessary in membership matching of members.
- `id_cluster` - always necessary and must correspond to `id` in the main cluster catalog.
- `ra` (in degrees) - necessary for proximity matching of members.
- `dec` (in degrees) - necessary for proximity matching of members.
- `pmem` - Probability of the galaxy being a member, must be in the range [0, 1]. If not provided, a value of 1 is assigned to all members.

### Reserved keyword arguments<a id='memcat_input_special'/>

There are a few keyword arguments with specific uses:

- `data`: Data table to be added to the catalog.
- `tags`: Dictionary that tags the important columns in the catalog.
- `labels`: Dictionary with labels of data columns to be used in plots.
- `members_consistency`: Require that all input members belong to this cluster catalog.
- `members_warning`: Raise a warning if some members do not belong to this cluster catalog, and save them in the `leftover_members` attribute.
- `members_catalog`: Members catalog if available, mostly for internal use.

When `members_consistency=True`, only galaxies hosted by the cluster catalog are kept. If `members_warning=True`, a warning is raised and the galaxies not hosted are stored in `leftover_members`:

In [None]:
cat = ClCatalog('cluster', id=['c1'], mass=[1e13])
cat['mass'].info.format = '.2e' # Format for nice display
cat.add_members(id=['m1', 'm2', 'm3'], id_cluster=['c1', 'c2', 'c1'])
display(cat)
display(cat.members)
display(cat.leftover_members)
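
The consistency check amounts to splitting the input members according to whether their `id_cluster` exists in the cluster catalog. The following standalone sketch shows that split with plain Python containers; the variable names `hosted` and `leftover` are chosen for this example and it does not use the `ClEvaR` classes.

```python
cluster_ids = {'c1'}
members = [
    {'id': 'm1', 'id_cluster': 'c1'},
    {'id': 'm2', 'id_cluster': 'c2'},
    {'id': 'm3', 'id_cluster': 'c1'},
]

# Members whose host cluster exists in the cluster catalog are kept.
hosted = [m for m in members if m['id_cluster'] in cluster_ids]
# The remaining members play the role of `leftover_members`.
leftover = [m for m in members if m['id_cluster'] not in cluster_ids]

print([m['id'] for m in hosted])
print([m['id'] for m in leftover])
```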

### Saving members<a id='memcat_saving'/>

The `members` object has an inbuilt `write` function to save the members to a `.fits` file.
This function also takes the argument `add_header`, which adds the name and labels information to the file.
If the file was saved with this argument, it can be read without requiring a `name` argument:

In [None]:
cat.members.write('mem1_with_info.fits', overwrite=True)

### Memory consumption<a id='memcat_memory'/>

IMPORTANT! The member catalogs are usually hundreds of times larger than the cluster catalogs. Therefore it is advised not to add them unless you need them for a specific goal (ex: membership matching). A members catalog can also lead to memory overload and make other functions slower.

There are two options to handle this: you can either pass a member-free version of the catalog or remove the members altogether. To use the member-free version of the catalog, use the `raw` function:

In [None]:
cat_raw = cat.raw()
print("Original:")
display(cat.members)
print("Raw:")
display(cat_raw.members)

To remove the members from the cluster catalog, use the `remove_members` function:

In [None]:
cat.remove_members()
print(cat.members, cat.leftover_members)