## Joining catalogs

In this tutorial we join a cone region of the Gaia DR3 catalog with the Gaia Early Data Release 3 (EDR3) distances catalog, and compute the ratio between two distance estimates: the one implied by Gaia's `parallax` column and the one stored in the `r_med_geo` column of the distances catalog.

In [None]:
import lsdb
from lsdb import ConeSearch

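First we load the Gaia catalog, keeping only the `source_id`, position (`ra`, `dec`) and `parallax` columns. The cone search restricts it to a radius of 10 degrees (`10 * 3600` arcseconds) around `ra=0`, `dec=0`.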

In [None]:
gaia = lsdb.read_hats(
    "https://data.lsdb.io/hats/gaia_dr3/gaia",
    margin_cache="https://data.lsdb.io/hats/gaia_dr3/gaia_10arcs",
    columns=["source_id", "ra", "dec", "parallax"],
    search_filter=ConeSearch(ra=0, dec=0, radius_arcsec=10 * 3600),
)
gaia
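
To make the cone filter concrete, here is a minimal sketch of the selection it describes: keep the points whose angular separation from the cone center is at most the radius. It uses only the standard library, the helper names `angular_separation_deg` and `in_cone` are hypothetical, the sample points are made up, and it is not meant to reflect how LSDB implements `ConeSearch` internally.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small separations.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(ra, dec, center_ra=0.0, center_dec=0.0, radius_arcsec=10 * 3600):
    """True if (ra, dec) lies within radius_arcsec of the cone center."""
    separation_arcsec = angular_separation_deg(ra, dec, center_ra, center_dec) * 3600
    return separation_arcsec <= radius_arcsec


# Made-up sample positions (ra, dec) in degrees, not real Gaia sources.
points = [(5.0, 0.0), (355.0, 3.0), (0.0, 12.0), (180.0, 0.0)]
selected = [p for p in points if in_cone(*p)]
```

The second point shows that right ascension wraps around: `ra=355` is only 5 degrees away from `ra=0`, so it falls inside the cone.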

We do the same for the Gaia EDR3 distances catalog. Here the distance column is `r_med_geo`, the median of the geometric distance estimate.

In [None]:
gaia_edr3 = lsdb.read_hats(
    "https://data.lsdb.io/hats/gaia_dr3/gaia_edr3_distances",
    margin_cache="https://data.lsdb.io/hats/gaia_dr3/gaia_edr3_distances_10arcs",
    columns=["source_id", "ra", "dec", "r_med_geo"],
    search_filter=ConeSearch(ra=0, dec=0, radius_arcsec=10 * 3600),
)
gaia_edr3

We are now able to join both catalogs on the `source_id` column, as follows:

In [None]:
joined = gaia.join(gaia_edr3, left_on="source_id", right_on="source_id")
joined
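
As a rough illustration of what the join produces, the plain-Python sketch below matches rows with equal `source_id` and appends a per-catalog suffix to the column names. It assumes inner-join behavior (unmatched rows are dropped) and suffixes every column; the `inner_join` helper and the toy values are hypothetical, and LSDB's actual implementation and naming rules may differ.

```python
def inner_join(left, right, key, suffixes):
    """Inner-join two lists of row dicts on `key`, suffixing every column name."""
    right_by_key = {row[key]: row for row in right}
    joined_rows = []
    for left_row in left:
        right_row = right_by_key.get(left_row[key])
        if right_row is None:
            continue  # no match on the right side: the row is dropped
        out = {f"{col}{suffixes[0]}": val for col, val in left_row.items()}
        out.update({f"{col}{suffixes[1]}": val for col, val in right_row.items()})
        joined_rows.append(out)
    return joined_rows


# Toy rows with made-up values (not real Gaia data).
left = [{"source_id": 1, "parallax": 2.0}, {"source_id": 2, "parallax": 0.5}]
right = [{"source_id": 2, "r_med_geo": 1900.0}, {"source_id": 3, "r_med_geo": 400.0}]

rows = inner_join(left, right, "source_id", ("_gaia", "_gaia_edr3_distances"))
```

Only `source_id` 2 appears in both toy tables, so `rows` holds a single record whose keys include `parallax_gaia` and `r_med_geo_gaia_edr3_distances`.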

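Let's compute the ratio between the two distance estimates and plot its histogram. Gaia parallaxes are given in milliarcseconds, so `1e3 / parallax` is a distance in parsecs, the same unit as `r_med_geo`.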

In [None]:
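# Joined columns carry a suffix naming the catalog they came from.
results = (1e3 / joined["parallax_gaia"]) / joined["r_med_geo_gaia_edr3_distances"]
# The catalog is lazy: compute() runs the pipeline and returns the values.
ratios = results.compute().to_numpy()
ratios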
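
A small worked example may help when checking the numbers by hand. The sketch below applies the same ratio to a few made-up values and, as an extra step not present in the cell above, returns NaN for non-positive parallaxes, since a naive inverse-parallax distance is not meaningful there. The function names are hypothetical.

```python
import math


def parallax_to_distance_pc(parallax_mas):
    """Naive inverse-parallax distance in parsecs; NaN for non-positive or NaN input."""
    if not parallax_mas > 0:
        return math.nan
    return 1e3 / parallax_mas


def distance_ratio(parallax_mas, r_med_geo_pc):
    """Ratio of the inverse-parallax distance to the catalog distance estimate."""
    return parallax_to_distance_pc(parallax_mas) / r_med_geo_pc


# Made-up (parallax [mas], r_med_geo [pc]) pairs, not real catalog rows.
examples = [(2.0, 500.0), (1.0, 950.0), (-0.3, 1200.0)]
toy_ratios = [distance_ratio(p, r) for p, r in examples]
```

A parallax of 2 mas corresponds to 500 pc, so the first pair gives a ratio of exactly 1. The second gives 1000 / 950, slightly above 1, and the negative parallax yields NaN.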

In [None]:
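import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced edges between 0.8 and 1.2, i.e. 99 bins.
# Ratios outside this range are not shown.
plt.hist(ratios, bins=np.linspace(0.8, 1.2, 100))
plt.title("Histogram of Gaia distance / Gaia EDR3 distance")
plt.xlabel("(1e3 / parallax) / r_med_geo")
plt.ylabel("Number of sources")
plt.show()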