# Computational Astrophysics 2021
---
## Eduard Larrañaga

Observatorio Astronómico Nacional\
Facultad de Ciencias\
Universidad Nacional de Colombia

---

## 04. Crossmatching with `Astropy`
### About this notebook

In this worksheet we use the `astropy` package to crossmatch two catalogues.

---

Since crossmatching is a common task in astrophysics, there are many optimised implementations of it. One of these can be found in the `astropy` package, which uses objects called **k-d trees** to perform crossmatching very quickly.



### k-d Trees

Consider again our case study, which uses the BSS as the first catalogue and SuperCOSMOS as the second catalogue. The implementation of crossmatching in `astropy` constructs a k-d tree out of the second catalogue. This structure can be searched efficiently to find a match for each object in the first catalogue. The construction of a k-d tree is similar to the binary search described in this lesson. In this case, the k-dimensional (here 2-dimensional) space is recursively divided into two parts until each division contains only a single object. The result is a structure called a k-d tree, which looks like this:




<br>
<center>
<a href="https://ibb.co/t4vs3Rf"><img src="https://i.ibb.co/L8ctvKw/kdTree.png" alt="kdTree" border="0"></a>
</center>

Note that each cell contains only one element of the second catalogue (A, B, C, D, E, F, G, H, I).

In our particular example, the procedure (based on right ascension and declination) is as follows

1. Find the object of the second catalogue with the median right ascension. Then, split the catalogue using the division line into objects to the left and objects to the right of this element.

2. Now, find the object with the median declination in each partition. These objects define the division lines that split each partition into smaller partitions of objects below and above them.

3. Find the object with the median right ascension in each of the new partitions, and split each one into smaller partitions of objects to the left and to the right of it.

4. Repeat steps 2. and 3. until each partition only has one object in it.

The result is a binary tree in which each object used to split a partition (a node) links to the two objects that then split the partitions it has created (its children).
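The construction steps above can be sketched in a few lines of Python. This is only a minimal illustration and not the implementation used by `astropy`: the names `Node` and `build_kdtree` are hypothetical, the five coordinates are illustrative values in degrees, and the sketch treats right ascension and declination as flat Cartesian axes (it ignores the wrap-around of right ascension at 360 degrees).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (right ascension, declination) in degrees


@dataclass
class Node:
    point: Point                    # object used to split this partition
    axis: int                       # 0 = split in RA, 1 = split in Dec
    left: Optional["Node"] = None   # objects with the smaller coordinate
    right: Optional["Node"] = None  # objects with the larger coordinate


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split `points` at the median, alternating RA and Dec."""
    if not points:
        return None
    axis = depth % 2                         # RA first, then Dec, then RA, ...
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                  # position of the median object
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


# Illustrative catalogue (hypothetical values, in degrees)
catalogue = [(185, 20), (280, -30), (180, 32), (302, -44), (55, 10)]
tree = build_kdtree(catalogue)

print(tree.point)        # root: object with the median right ascension
print(tree.left.point)   # median declination of the left partition
print(tree.right.point)  # median declination of the right partition
```

Sorting at every level keeps the sketch short; a production implementation would typically select the median more efficiently.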

### Searching the Match of an Object using the k-d Tree

In order to match an object of the first catalogue, using the defined k-d tree, we use the following steps

1. Calculate the distance from the object to the highest level node (the root node), then go to the child node closest (in right ascension) to the object.

2. Calculate the distance from the object to this child, then go to the child node closest (in declination) to the object.

3. Calculate the distance from the object to this child, then go to the child node closest (in right ascension) to the object.

4. Repeat steps 2. and 3. until you reach a child node with no further children (known as a *leaf node*).

5. Find the shortest distance of all distances calculated. This corresponds to the closest object.

In the following image, we consider a target X. Matching this object needs only 4 distance calculations along one branch of the k-d tree.

<center>
<a href="https://ibb.co/PxN54HN"><img src="https://i.ibb.co/7vXGVmX/kd-Tree-Target.png" alt="kd-Tree-Target" border="0"></a>
</center>
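The search along one branch can be sketched as follows. This is a simplified illustration under stated assumptions, not the `astropy` implementation: the helper names `build` and `descend` are hypothetical, distances are computed on a flat (RA, Dec) plane instead of as angular separations, and only the single descent described in the steps above is performed. Full k-d tree searches usually also check neighbouring branches, because the closest object can occasionally lie on the other side of a division line.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (right ascension, declination) in degrees


def build(points: List[Point], depth: int = 0):
    """Median-split k-d tree; a node is (point, axis, left, right) or None."""
    if not points:
        return None
    axis = depth % 2                          # 0 = RA, 1 = Dec
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (ordered[mid], axis,
            build(ordered[:mid], depth + 1),
            build(ordered[mid + 1:], depth + 1))


def descend(tree, target: Point) -> Tuple[Point, float, int]:
    """Follow one branch from the root towards `target`.

    Returns the closest object met on the way, its distance, and the
    number of distance calculations that were needed.
    """
    best_point, best_dist, n_calcs = None, math.inf, 0
    node = tree
    while node is not None:
        point, axis, left, right = node
        # Flat-sky distance (an assumption made to keep the sketch short)
        dist = math.hypot(point[0] - target[0], point[1] - target[1])
        n_calcs += 1
        if dist < best_dist:
            best_point, best_dist = point, dist
        # Go to the side of the division line on which the target lies;
        # the loop stops when there is no child on that side.
        node = left if target[axis] < point[axis] else right
    return best_point, best_dist, n_calcs


# Illustrative catalogue and target (hypothetical values, in degrees)
catalogue = [(185, 20), (280, -30), (180, 32), (302, -44), (55, 10)]
tree = build(catalogue)
match, dist, n_calcs = descend(tree, (300, -45))
print(match, dist, n_calcs)
```

For this small catalogue the target is matched after three distance calculations, one per level of the tree, instead of one per catalogue object.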

This scheme speeds up the crossmatching algorithm considerably. Note that, since each node branches into two children, a catalogue with $N$ objects will have, on average, about $\log_2 N$ nodes from the root to any leaf.
For example, if we were using the complete SuperCOSMOS catalogue (of the order of 250 million objects), reaching any leaf node would need only about 28 distance calculations!
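The figure of 28 can be checked with a short calculation. The catalogue size below is the approximate value quoted above, not an exact count.

```python
import math

n_objects = 250_000_000           # approximate size of the full catalogue
depth = math.log2(n_objects)      # average number of levels, root to leaf

print(f"log2(N) is about {depth:.1f}")
print("levels needed:", math.ceil(depth))   # 28
print("brute force, per object:", n_objects, "distance calculations")
```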
 


### Example

To illustrate the k-d tree implementation in `astropy`, we will build two simple catalogues with only 5 objects each, with coordinates in degrees.

In [6]:
import numpy as np
from astropy.coordinates import SkyCoord
from astropy import units as u
coords1 = [[270, -30], [185, 15], [180, 30], [45, 10], [300, -45]]
coords2 = [[185, 20], [280, -30], [180, 32], [302, -44], [55, 10]]
sky_cat1 = SkyCoord(coords1*u.degree, frame='icrs')
sky_cat2 = SkyCoord(coords2*u.degree, frame='icrs')


Here, the `astropy.coordinates.SkyCoord` class provides a flexible interface for celestial coordinate representation, manipulation, and transformation between systems.

The minimum input needed to initialize a `SkyCoord` object is one or more celestial coordinate values with unambiguous units. The option `frame='icrs'` refers to the type of coordinate frame used. The default is the **I**nternational **C**elestial **R**eference **S**ystem (**ICRS**), which is essentially the same as equatorial coordinates. For complete information about this frame, see

https://www.iers.org/IERS/EN/Science/ICRS/ICRS.html

Other frames included in `astropy` are FK5, FK4, FK4NoETerms, and Galactic. Complete information about the `SkyCoord` class can be found at

https://docs.astropy.org/en/stable/api/astropy.coordinates.SkyCoord.html


Now, let's see the information in each catalogue:

In [7]:
sky_cat1

<SkyCoord (ICRS): (ra, dec) in deg
    [(270., -30.), (185.,  15.), (180.,  30.), ( 45.,  10.), (300., -45.)]>

In [8]:
sky_cat2

<SkyCoord (ICRS): (ra, dec) in deg
    [(185.,  20.), (280., -30.), (180.,  32.), (302., -44.), ( 55.,  10.)]>

Now, we will use the method `.match_to_catalog_sky()`, which finds, for each object in the first catalogue, the nearest on-sky object in the second catalogue.

This method returns three results:
- Indices (into the second catalogue) of the matched objects
- On-sky separation (angular distance, printed in DMS format) between each element and its closest match
- 3-dimensional distance between each element and its closest match (only physically meaningful when both catalogues include distance information)

Complete information about this method can be found at

https://docs.astropy.org/en/stable/api/astropy.coordinates.SkyCoord.html#astropy.coordinates.SkyCoord.match_to_catalog_sky

In [12]:
closest_ids, closest_dists, closest_dists3d = sky_cat1.match_to_catalog_sky(sky_cat2)

print(closest_ids)
print(closest_dists)

[1 0 2 4 3]
[8d39m27.0001s 5d00m00s 2d00m00s 9d50m51.7182s 1d44m31.2393s]


Astropy returns the separations as `Angle` objects (a kind of `Quantity`), but you can convert these to NumPy arrays by accessing their `value` attribute:


In [14]:
closest_dists_array = closest_dists.value
closest_dists_array

array([8.65750003, 5.        , 2.        , 9.84769951, 1.7420109 ])

Finally, note that `.match_to_catalog_sky()` does not let us specify a maximum radius. Therefore, we need a separate function that discards the matches with distances greater than the desired maximum radius (e.g. 5 degrees).
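One possible way to apply such a cut is sketched below. The function name `apply_max_radius` is hypothetical and not part of `astropy`. To keep the block self-contained, the indices and separations are copied from the outputs shown above; in the notebook you would pass `closest_ids` and `closest_dists.value` directly.

```python
import numpy as np

# Values copied from the outputs above (separations in degrees)
closest_ids = np.array([1, 0, 2, 4, 3])
closest_dists_deg = np.array([8.65750003, 5.0, 2.0, 9.84769951, 1.7420109])


def apply_max_radius(ids, dists_deg, max_radius_deg):
    """Split nearest-neighbour results into matches and non-matches.

    Returns (matches, no_matches). `matches` is a list of tuples
    (index in catalogue 1, index in catalogue 2, separation in degrees);
    `no_matches` lists the indices in catalogue 1 whose nearest
    neighbour lies beyond the maximum radius.
    """
    within = dists_deg <= max_radius_deg      # boolean mask, one per object
    idx1 = np.arange(len(ids))                # indices in the first catalogue
    matches = [(int(i), int(j), float(d))
               for i, j, d in zip(idx1[within], ids[within], dists_deg[within])]
    no_matches = idx1[~within].tolist()
    return matches, no_matches


matches, no_matches = apply_max_radius(closest_ids, closest_dists_deg, 5.0)
print(matches)      # pairs closer than (or exactly at) 5 degrees
print(no_matches)   # objects of the first catalogue without a counterpart
```

With a 5 degree radius, the first and fourth objects of the first catalogue (indices 0 and 3) are left without a counterpart, since their nearest neighbours are more than 8 degrees away.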

### Exercises

1. Use the `astropy` package to crossmatch the following random catalogues.

In [11]:
def random_cat(n):
    ra = np.random.uniform(0, 360, size=(n, 1))
    dec = np.random.uniform(-90, 90, size=(n, 1))
    return np.hstack((ra, dec))


coords1 = random_cat(200)
coords2 = random_cat(200)

coords1

array([[ 8.65628377e+01, -6.32115733e+01],
       [ 3.55944359e+02,  8.62078973e+01],
       [ 1.41941655e+02, -6.73992278e+01],
       [ 7.02751901e+01, -3.78157206e+01],
       [ 3.13379398e+02, -2.73630807e+01],
       [ 1.08325998e+02,  6.02402988e+01],
       [ 1.55616496e+02,  7.58250753e+01],
       [ 6.10960684e+01,  6.45530016e+00],
       [ 1.47182919e+02, -2.59751904e+01],
       [ 2.08889204e+02,  2.60221722e+01],
       [ 3.28488732e+02, -7.59025554e+01],
       [ 8.24341074e+01, -3.45920197e+01],
       [ 1.44128918e+02,  2.59006634e+01],
       [ 1.11924064e+02,  8.43788428e+00],
       [ 1.50738426e+01,  5.12715640e+01],
       [ 2.38066394e+02,  6.47896674e+01],
       [ 2.87001610e+02,  6.54909908e+01],
       [ 2.45014678e+02, -4.25143737e+01],
       [ 2.09848078e+01, -5.29289990e+01],
       [ 1.57273080e+02,  2.50557764e+01],
       [ 2.21261707e+02, -5.98004630e+01],
       [ 3.35620926e+02,  8.36535009e+01],
       [ 6.34552090e+01, -4.58465476e+01],
       [ 1.

2. Use the `astropy` package to crossmatch the BSS and SuperCOSMOS catalogues presented in this lesson. Measure the time needed to complete the crossmatch.