# Read And Play with redMaPPer Catalog from cosmoDC2 v1.1.4 and DC2 DR6

This tutorial describes access to the redMaPPer v0.8.1 run on cosmoDC2 (truth input catalog) and DC2 DR6 (observed catalog).  Access to the catalogs is via GCRCatalogs.

In [None]:
import GCRCatalogs
from astropy.table import Table
import esutil
import numpy as np
import matplotlib.pyplot as plt


## Working with the cosmoDC2 redMaPPer Catalog

First we read in the "truth" cluster catalog. The distributed catalog has been cut to lambda>20, meaning that each cluster contains the equivalent of at least 20 red galaxies brighter than 0.2L* within the optimal cluster radius. This corresponds to a mass threshold of approximately 1e14 M_sun.

In [None]:
# Get the redMaPPer catalog
gc = GCRCatalogs.load_catalog('cosmoDC2_v1.1.4_redmapper_v0.8.1')

In [None]:
# Select out the cluster and member quantities into different lists
quantities = gc.list_all_quantities()
# These are the quantities that describe the clusters and the central galaxies
cluster_quantities = [q for q in quantities if 'member' not in q]
# These are the quantities that describe the members
member_quantities = [q for q in quantities if 'member' in q]


In [None]:
# The clusters and members are linked via "cluster_id"
print(cluster_quantities)

In [None]:
print(member_quantities)

In [None]:
# Read in the cluster and member data
cluster_data = Table(gc.get_quantities(cluster_quantities))
member_data = Table(gc.get_quantities(member_quantities))
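Because the two tables are linked through the cluster ID, a quick sanity check is to count how many member rows point at each cluster. The following is a minimal sketch on toy arrays, not on the real catalog; the names `cluster_id` and `cluster_id_member` mirror the quantities used later in this tutorial, and the values are made up for illustration.

```python
import numpy as np

# Toy stand-ins for cluster_data['cluster_id'] and member_data['cluster_id_member']
# (hypothetical values, for illustration only)
cluster_id = np.array([7, 3, 9])
cluster_id_member = np.array([3, 7, 7, 9, 3, 7])

# Count how many member rows carry each cluster ID
ids, counts = np.unique(cluster_id_member, return_counts=True)
lookup = dict(zip(ids.tolist(), counts.tolist()))

# Number of members per cluster, in the order of the cluster table
n_members = np.array([lookup.get(int(c), 0) for c in cluster_id])
print(n_members)
```

On the real tables the same idea applies with the toy arrays replaced by the corresponding columns.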

In [None]:
# Compare the cluster photo-z to the central galaxy true redshift
plt.hexbin(cluster_data['redshift'], cluster_data['redshift_true_cg'], bins='log')
plt.plot([0.1, 1.15], [0.1, 1.15], 'k--')
plt.xlabel('z_lambda')
plt.ylabel('z_spec_central')

This plot compares the cluster photo-z (z_lambda), which is computed by fitting all the members to the red-sequence model simultaneously, with the true (spectroscopic) redshift of the central galaxy (z_spec_central). The photo-z performance is very good, for two reasons: (a) fitting 20 or more red galaxies simultaneously makes the cluster redshift very precise, and (b) the extragalactic catalog is blissfully free of systematics. However, there are some outliers. As shown below, these outliers occur where redMaPPer has chosen an incorrect central galaxy: the central redshift does not agree with the cluster redshift, but the average of the member redshifts is consistent with it.

Now let's match the cluster centrals and members by galaxy ID to look at the colors of the centrals.
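The visual impression from the plot can be backed up with a few summary numbers. The sketch below is one possible way to do that, assuming the common convention of normalizing the redshift error by (1 + z_true); the function name `photoz_summary` and the outlier threshold of 0.05 are illustrative choices, not something defined by redMaPPer or by this catalog.

```python
import numpy as np

def photoz_summary(z_photo, z_true, outlier_cut=0.05):
    """Return (bias, robust scatter, outlier fraction) of dz = (z_photo - z_true) / (1 + z_true).

    outlier_cut is an assumed, illustrative threshold on |dz|.
    """
    z_photo = np.asarray(z_photo, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    dz = (z_photo - z_true) / (1.0 + z_true)
    bias = np.median(dz)
    # Robust scatter estimate: 1.4826 * median absolute deviation
    scatter = 1.4826 * np.median(np.abs(dz - bias))
    outlier_fraction = np.mean(np.abs(dz) > outlier_cut)
    return bias, scatter, outlier_fraction

# Toy data: three good clusters and one with a badly mismatched central
z_photo_toy = np.array([0.30, 0.50, 0.70, 0.90])
z_true_toy = np.array([0.30, 0.50, 0.70, 0.45])
bias, scatter, outlier_fraction = photoz_summary(z_photo_toy, z_true_toy)
print(bias, scatter, outlier_fraction)
```

With the real data, `cluster_data['redshift']` and `cluster_data['redshift_true_cg']` would take the place of the toy arrays.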

In [None]:
# First, we will want to read in the "truth" as a comparison
# Note that we're just reading in the small subset for a faster comparison
# Nevertheless, this takes a bit of time.
gc_truth = GCRCatalogs.load_catalog('cosmoDC2_v1.1.4_small')
quantities_wanted = ['mag_true_g_lsst', 'mag_true_r_lsst', 'mag_true_i_lsst', 'mag_true_z_lsst', 'mag_true_y_lsst', 'redshift']
query = GCRCatalogs.GCRQuery('(is_central == True) & (halo_mass > 5e13)')
truth_data = Table(gc_truth.get_quantities(quantities_wanted, [query]))

In [None]:
# Match by id
uid, ind = np.unique(cluster_data['id_cen_0'], return_index=True)
a, b = esutil.numpy_util.match(cluster_data['id_cen_0'][ind], member_data['id_member'])
print(a.size)
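The ID match above returns two index arrays such that the first array indexed by `a` equals the second array indexed by `b`. For readers without esutil, the sketch below shows a NumPy-only analogue of that behavior. It is an illustration under the assumption that the first array is unique, not a description of esutil's implementation; the function name `match_ids` is hypothetical.

```python
import numpy as np

def match_ids(unique_ids, other_ids):
    """Return (ind1, ind2) with unique_ids[ind1] == other_ids[ind2].

    unique_ids must not contain duplicates; other_ids may.
    """
    unique_ids = np.asarray(unique_ids)
    other_ids = np.asarray(other_ids)
    order = np.argsort(unique_ids)
    # Position of each other_id in the sorted unique_ids
    pos = np.searchsorted(unique_ids, other_ids, sorter=order)
    # Values larger than every unique_id land one past the end; clip them back
    pos = np.clip(pos, 0, unique_ids.size - 1)
    candidate = order[pos]
    hit = unique_ids[candidate] == other_ids
    return candidate[hit], np.nonzero(hit)[0]

# Toy IDs (hypothetical): three centrals, six member rows
central_ids = np.array([30, 10, 20])
member_ids = np.array([10, 11, 20, 20, 40, 30])
ind1, ind2 = match_ids(central_ids, member_ids)
print(ind1, ind2)
```

This is why the unique central IDs are selected with `np.unique` before matching: a central shared by two clusters would otherwise violate the uniqueness assumption.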

In [None]:
plt.plot(truth_data['redshift'], truth_data['mag_true_g_lsst'] - truth_data['mag_true_r_lsst'], 'b+', label='True Centrals')
plt.plot(cluster_data['redshift'][ind[a]], member_data['mag_g_lsst_member'][b] - member_data['mag_r_lsst_member'][b], 'r.', label='redMaPPer Centrals')
plt.xlim(0.0, 1.2)
plt.ylim(0.6,2.0)
plt.legend()
plt.xlabel('cluster redshift')
plt.ylabel('g-r of central galaxy')

The above plot compares the g-r color as a function of redshift for "true" centrals from the extragalactic catalog (in blue) with that of redMaPPer centrals (in red). redMaPPer only finds red centrals, so any halo/cluster with a blue central will be miscentered by redMaPPer (the true rate of clusters with blue centrals at z<1.0 is a matter of some debate). Other than that, the color distribution of centrals found by redMaPPer is consistent with that of the true centrals. The g-r color is most reliable at z<0.4 due to the location of the 4000A break.

In [None]:
plt.plot(truth_data['redshift'], truth_data['mag_true_r_lsst'] - truth_data['mag_true_i_lsst'], 'b+', label='True Centrals')
plt.plot(cluster_data['redshift'][ind[a]], member_data['mag_r_lsst_member'][b] - member_data['mag_i_lsst_member'][b], 'r.', label='redMaPPer Centrals')
plt.xlim(0.0, 1.2)
plt.ylim(0.4, 1.3)
plt.legend()
plt.xlabel('cluster redshift')
plt.ylabel('r-i of central galaxy')

Same as the plot above, except for the r-i color. This is more reliable at z<0.8. The hitch between 0.35<z<0.40 is a feature of the extragalactic catalog that is not seen in real data.

In [None]:
plt.plot(truth_data['redshift'], truth_data['mag_true_i_lsst'] - truth_data['mag_true_z_lsst'], 'b+', label='True Centrals')
plt.plot(cluster_data['redshift'][ind[a]], member_data['mag_i_lsst_member'][b] - member_data['mag_z_lsst_member'][b], 'r.', label='redMaPPer Centrals')
plt.xlim(0.0, 1.2)
plt.ylim(0.2, 1.0)
plt.legend()
plt.xlabel('cluster redshift')
plt.ylabel('i-z of central galaxy')

Same as the plots above, except for the i-z color. This is most useful at 0.7<z<1.0.

In [None]:
plt.plot(truth_data['redshift'], truth_data['mag_true_z_lsst'] - truth_data['mag_true_y_lsst'], 'b+', label='True Centrals')
plt.plot(cluster_data['redshift'][ind[a]], member_data['mag_z_lsst_member'][b] - member_data['mag_y_lsst_member'][b], 'r.', label='redMaPPer Centrals')
plt.xlim(0.0, 1.2)
plt.ylim(0.1, 0.8)
plt.legend()
plt.xlabel('cluster redshift')
plt.ylabel('z-y of central galaxy')

Same as the plots above, except for the z-y color. If the y-band galaxy measurements are reliable in real LSST data, this will allow robust red-sequence cluster finding to z<~1.2.
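The redshift ranges quoted for each color follow from where the 4000A break falls in the observed frame, at 4000A x (1 + z). The sketch below makes that arithmetic explicit. The band edges are approximate nominal LSST values that I am assuming for illustration; they are not taken from this tutorial, and the real throughput curves have soft edges.

```python
# Approximate LSST band edges in nanometres. These are assumed nominal values
# for illustration only; consult the official throughput curves for real work.
LSST_BAND_EDGES_NM = {
    'g': (400.0, 552.0),
    'r': (552.0, 691.0),
    'i': (691.0, 818.0),
    'z': (818.0, 922.0),
    'y': (950.0, 1080.0),
}
BREAK_REST_NM = 400.0  # rest-frame 4000A break

def band_containing_break(redshift):
    """Return the band whose assumed edges contain the redshifted break, or None."""
    observed_nm = BREAK_REST_NM * (1.0 + redshift)
    for band, (lo, hi) in LSST_BAND_EDGES_NM.items():
        if lo <= observed_nm < hi:
            return band
    return None

bands_by_redshift = {z: band_containing_break(z) for z in (0.2, 0.5, 0.9, 1.1)}
print(bands_by_redshift)
```

Under these assumed edges the break leaves g near z of 0.38 and leaves r near z of 0.73, which is roughly in line with the ranges quoted above for the g-r and r-i colors.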

### Match Clusters and Members to look at the median member redshift

In this section we are going to compare clusters to members and look at the median member redshift. This is only possible with synthetic catalogs where we have true redshifts for all members.

In [None]:
# Clean out any members without ztrue information
# There are a few of these due to a very small id-matching bug that was fixed in post-processing for ~99.9% of the members.
ok, = np.where(member_data['redshift_true_member'] > 0.0)
mem = member_data[ok]

In [None]:
# Match clusters to members using cluster_id
a, b = esutil.numpy_util.match(cluster_data['cluster_id'], mem['cluster_id_member'])
# Use the reverse indices from the esutil histogram to group the matches into cluster bins
h, rev = esutil.stat.histogram(a, rev=True)
mem_zmedian = np.zeros(len(cluster_data))
for i in range(len(cluster_data)):
    # rev indexes the matched pairs (a, b), so map back to member rows through b
    i1a = rev[rev[i]: rev[i + 1]]
    mem_zmedian[i] = np.median(mem['redshift_true_member'][b[i1a]])
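The grouping step can also be written with plain NumPy, which may help in seeing what the reverse indices accomplish. This is a minimal sketch on toy data; `median_by_group` is a hypothetical helper, and it returns NaN for a cluster with no matched members rather than raising a warning.

```python
import numpy as np

def median_by_group(group_index, values, n_groups):
    """Median of `values` for each group 0..n_groups-1; NaN where a group is empty."""
    group_index = np.asarray(group_index)
    values = np.asarray(values, dtype=float)
    order = np.argsort(group_index, kind='stable')
    sorted_groups = group_index[order]
    sorted_values = values[order]
    # Start and stop of each group's run in the sorted arrays
    starts = np.searchsorted(sorted_groups, np.arange(n_groups), side='left')
    stops = np.searchsorted(sorted_groups, np.arange(n_groups), side='right')
    out = np.full(n_groups, np.nan)
    for g in range(n_groups):
        if stops[g] > starts[g]:
            out[g] = np.median(sorted_values[starts[g]:stops[g]])
    return out

# Toy data (hypothetical): cluster index of each matched member and its true redshift
toy_cluster_index = np.array([0, 2, 0, 2, 2, 0])
toy_member_z = np.array([0.30, 0.52, 0.31, 0.50, 0.51, 0.90])
toy_zmedian = median_by_group(toy_cluster_index, toy_member_z, 3)
print(toy_zmedian)
```

Note how the interloper at z=0.90 in the first toy cluster does not move the median, which is the reason for using a median rather than a mean here.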

In [None]:
plt.plot(cluster_data['redshift'], mem_zmedian, 'r.')
plt.plot([0.1, 1.2], [0.1, 1.2], 'k--')
plt.xlabel('z_lambda')
plt.ylabel('Median true redshift from members')

This is a plot of the median true redshift of the members versus the cluster photo-z (z_lambda). The outliers seen above are gone, which shows that the cases where the central galaxy redshift disagreed with the cluster photo-z were due to miscentering and not to problems with the photo-zs.

## Looking at redMaPPer from DC2 DR6

In [None]:
# Get the redMaPPer catalog
gc_dr6 = GCRCatalogs.load_catalog('dc2_redmapper_run2.2i_dr6_wfd_v0.8.1')

In [None]:
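# Read in the cluster and member data.
# Build the quantity lists from the DR6 catalog itself, since the observed catalog
# may not carry every quantity (e.g. truth redshifts) present in the cosmoDC2 run.
quantities_dr6 = gc_dr6.list_all_quantities()
cluster_data_dr6 = Table(gc_dr6.get_quantities([q for q in quantities_dr6 if 'member' not in q]))
member_data_dr6 = Table(gc_dr6.get_quantities([q for q in quantities_dr6 if 'member' in q]))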

In [None]:
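# Do some simple ra/dec matching: for each DR6 cluster, find the closest
# cosmoDC2 cluster within 10 arcmin (the radius is given in degrees)
matcher = esutil.htm.Matcher(11, cluster_data['ra'], cluster_data['dec'])
matches = matcher.match(cluster_data_dr6['ra'], cluster_data_dr6['dec'],
                        10/60.0,
                        maxmatch=1)
# i1 indexes the cosmoDC2 clusters, i2 the DR6 clusters
i1 = matches[1]
i2 = matches[0]
print(i1.size)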
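To make the positional match concrete, here is a brute-force sketch of the same idea: for each input position, find the closest reference position and keep it if it lies within the search radius. This is an assumption-laden illustration that compares every pair, so it only suits small samples; it is not how the HTM tree works internally, and the helper names are hypothetical.

```python
import numpy as np

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    h = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))

def nearest_within(ra_ref, dec_ref, ra_in, dec_in, radius_deg):
    """Return (index into input, index into reference, separation) for matched inputs."""
    ra_ref = np.asarray(ra_ref, dtype=float)
    dec_ref = np.asarray(dec_ref, dtype=float)
    ra_in = np.asarray(ra_in, dtype=float)
    dec_in = np.asarray(dec_in, dtype=float)
    # Separation matrix with shape (n_input, n_reference)
    sep = angular_separation_deg(ra_in[:, None], dec_in[:, None],
                                 ra_ref[None, :], dec_ref[None, :])
    nearest = np.argmin(sep, axis=1)
    dist = sep[np.arange(ra_in.size), nearest]
    keep = dist <= radius_deg
    return np.nonzero(keep)[0], nearest[keep], dist[keep]

# Toy positions in degrees (hypothetical values)
ref_ra, ref_dec = [60.0, 61.0], [-35.0, -35.0]
in_ra, in_dec = [60.05, 65.0, 61.0], [-35.0, -35.0, -35.1]
idx_in, idx_ref, dist = nearest_within(ref_ra, ref_dec, in_ra, in_dec, 10 / 60.0)
print(idx_in, idx_ref, dist)
```

The second toy input has no reference cluster within 10 arcmin and is dropped, just as unmatched DR6 clusters are absent from the match arrays.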

In [None]:
# Keep only the matches whose cosmoDC2 and DR6 cluster redshifts agree to within 0.05
gd, = np.where(np.abs(cluster_data['redshift'][i1] - cluster_data_dr6['redshift'][i2]) < 0.05)

In [None]:
# Red points: all positional matches; blue crosses: matches that also agree in redshift
plt.plot(cluster_data['redshift'][i1], cluster_data_dr6['redshift'][i2], 'r.')
plt.plot(cluster_data['redshift'][i1[gd]], cluster_data_dr6['redshift'][i2[gd]], 'b+')
plt.xlabel('z_lambda_cosmodc2')
plt.ylabel('z_lambda_dr6')
plt.plot([0.1, 1.2], [0.1, 1.2], 'k--')


In [None]:
plt.hexbin(cluster_data['richness'][i1[gd]], cluster_data_dr6['richness'][i2[gd]], bins='log')
plt.xlabel('lambda_cosmodc2')
plt.ylabel('lambda_dr6')
plt.plot([20, 200], [20, 200], 'r--')

In [None]:
plt.hexbin(cluster_data['redshift'][i1[gd]], cluster_data_dr6['richness'][i2[gd]]/cluster_data['richness'][i1[gd]], bins='log', extent=[0.15, 1.2, 0, 2])
plt.xlabel('z_lambda_cosmodc2')
plt.ylabel('lambda_dr6/lambda_cosmodc2')
plt.plot([0.15, 1.2], [1.0, 1.0], 'r--')
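The richness-ratio plot above can be summarized numerically by taking the median ratio in redshift bins. The sketch below shows one way to do this on toy values; `binned_median` is a hypothetical helper, and the bin edges are arbitrary illustrative choices.

```python
import numpy as np

def binned_median(x, y, edges):
    """Median of y in each [edges[k], edges[k+1]) bin of x; NaN for empty bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.digitize(x, edges) - 1
    out = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        sel = idx == k
        if sel.any():
            out[k] = np.median(y[sel])
    return out

# Toy data (hypothetical): cluster redshift and DR6/cosmoDC2 richness ratio
toy_z = np.array([0.2, 0.3, 0.6, 0.7, 0.8, 1.1])
toy_ratio = np.array([1.0, 0.9, 0.8, 1.2, 1.0, 0.5])
toy_edges = np.array([0.15, 0.5, 0.85, 1.2])
ratio_median = binned_median(toy_z, toy_ratio, toy_edges)
print(ratio_median)
```

On the real data, `cluster_data['redshift'][i1[gd]]` and the richness ratio from the plot would replace the toy arrays.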