# Computational Astrophysics 2021
---
## Eduard Larrañaga

Observatorio Astronómico Nacional\
Facultad de Ciencias\
Universidad Nacional de Colombia

---

## 02. Datasets for crossmatching
### About this notebook

In this worksheet we introduce the datasets that we will use for crossmatching.

---

This exercise of crossmatching two catalogues is based on the course *Data-Driven Astronomy* by the University of Sydney, available at

https://www.coursera.org/learn/data-driven-astronomy



### Introducing the Data

Before we can make the crossmatch of two catalogues we first have to import the raw data. 

Since every astronomy catalogue has its own unique format, we will need to load each dataset and study its structure individually.

#### BSS Catalogue

The first catalogue that we need is the Australia Telescope 20-GHz **AT20G** Bright Source Sample (BSS) survey. This survey was carried out by the Australia Telescope Compact Array, from 2004 to 2007. In particular, the BSS is a complete flux-limited subsample of the AT20G Survey catalogue comprising 320 extragalactic radio sources.

The raw data is the file table2.dat, which can be downloaded from the VizieR archive,

http://cdsarc.u-strasbg.fr/viz-bin/Cat?J/MNRAS/384/775 


As we have already seen, every catalogue in VizieR has a detailed README file that gives the exact format of each table in the catalogue.

In [1]:
import numpy as np

In [2]:
path = ''  # Empty prefix, used when working locally

In [3]:
# Working in Google Colab requires mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# we define the path to the files
path = '/content/drive/MyDrive/Colab Notebooks/CA2021/08. Cross-match Images/presentation/'

The full BSS catalogue contains 320 objects. From the ReadMe file, we see that the contents in the table2.dat file include the features

| Column | Feature |
|---|---|
|1:| Object catalogue ID number (sometimes with an asterisk indicating a note)|
|2-4:| Right ascension in HMS notation|
|5-7:| Declination in DMS notation|
|8-:| Other information|


The information in columns 8 onward includes flux densities, spectral intensities, redshift, etc.

For crossmatching, we only need the coordinates. Therefore, we will load just the columns 2-7 with the `usecols` argument in the `numpy.loadtxt` function. 

Note: We will not load the ID column because this number is always the same as the row number.
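
As a minimal sketch of how `usecols` selects the coordinate columns, the snippet below reads two whitespace-separated rows from an in-memory string instead of `table2.dat`. The coordinates are those of the first two BSS sources; the trailing values are placeholders invented for illustration and are not the actual contents of the catalogue.

```python
import io
import numpy as np

# Two rows laid out like the table above: ID, RA (h m s), Dec (d m s), other data.
# The last two values in each row are placeholders for illustration only.
sample = io.StringIO(
    "1 00 04 35.65 -47 36 19.1 0.87 0.01\n"
    "2 00 10 35.92 -30 27 48.3 0.32 0.02\n"
)

# Columns are 0-indexed, so range(1, 7) keeps columns 2-7 of the table and skips the ID
coords = np.loadtxt(sample, usecols=range(1, 7))
print(coords.shape)  # (2, 6)
```

Because `usecols` counts from zero, `range(1, 7)` corresponds to columns 2-7 in the table above.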


In [5]:
bbs_raw = np.loadtxt(path+'J_MNRAS_384_775/table2.dat', usecols=range(1, 7))
bbs_raw.shape

(320, 6)

In [6]:
bbs_raw[0:3]

array([[  0.  ,   4.  ,  35.65, -47.  ,  36.  ,  19.1 ],
       [  0.  ,  10.  ,  35.92, -30.  ,  27.  ,  48.3 ],
       [  0.  ,  11.  ,   1.27, -26.  ,  12.  ,  33.1 ]])

Since the Right Ascension is given in hours-minutes-seconds (HMS) notation and the Declination is given in degrees-minutes-seconds (DMS) notation, we will use the functions defined earlier in this course to transform this information to decimal degrees.

In [7]:
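def HMStoDeg(H,M,S):
  # 1 hour of right ascension corresponds to 15 degrees
  return 15*(H + M/60 + S/(60*60))

def DMStoDeg(D,M,S):
  # Take the sign from the degrees field. np.signbit also detects -0.0
  # (e.g. a declination of -00 12 34), which the test D>=0 would miss.
  sign = -1 if np.signbit(D) else 1
  return sign*(abs(D) + M/60 + S/(60*60))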


Now, we will define an array with the position of the sources in degrees.

In [8]:
bss_cat = np.zeros([320,3])

for i in range(320):
  bss_cat[i,0]=i+1
  ascension = HMStoDeg(bbs_raw[i,0], bbs_raw[i,1], bbs_raw[i,2])
  declination = DMStoDeg(bbs_raw[i,3], bbs_raw[i,4], bbs_raw[i,5])
  bss_cat[i,1], bss_cat[i,2] = ascension, declination

bss_cat[0:5]

array([[  1.        ,   1.14854167, -47.60530556],
       [  2.        ,   2.64966667, -30.46341667],
       [  3.        ,   2.75529167, -26.20919444],
       [  4.        ,   3.24954167, -39.90733333],
       [  5.        ,   6.45491667, -26.03686111]])
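
The same conversion can also be applied to whole columns at once. The following is a minimal sketch rather than the notebook's own method: it uses the first three raw rows printed earlier as input, and the variable names (`raw`, `ra_deg`, `dec_deg`, `cat`) are chosen here for illustration.

```python
import numpy as np

# First three rows of the raw array: RA (h, m, s) and Dec (d, m, s)
raw = np.array([
    [0.0,  4.0, 35.65, -47.0, 36.0, 19.1],
    [0.0, 10.0, 35.92, -30.0, 27.0, 48.3],
    [0.0, 11.0,  1.27, -26.0, 12.0, 33.1],
])

# Right ascension: 1 hour = 15 degrees
ra_deg = 15 * (raw[:, 0] + raw[:, 1] / 60 + raw[:, 2] / 3600)

# Declination: take the sign from the degrees field; signbit also flags -0.0
sign = np.where(np.signbit(raw[:, 3]), -1.0, 1.0)
dec_deg = sign * (np.abs(raw[:, 3]) + raw[:, 4] / 60 + raw[:, 5] / 3600)

# Prepend a 1-based ID column, giving rows of (ID, RA, Dec)
ids = np.arange(1, len(raw) + 1)
cat = np.column_stack((ids, ra_deg, dec_deg))
print(np.round(cat, 8))
```

The printed rows should agree with the first three rows of `bss_cat`.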

---

#### SuperCOSMOS Catalogue

The SuperCOSMOS all-sky catalogue is a catalogue of galaxies generated from several visible light surveys available at 

http://ssa.roe.ac.uk/allSky

The complete description of the catalogue can be found in the paper

https://arxiv.org/abs/1607.01189

The raw data corresponds to the file 

http://ssa.roe.ac.uk/cats/SCOS_XSC_mCl1_B21.5_R20_noStepWedges.csv.gz

This dataset includes more than 241 million objects and its size is more than 8 GB.
 
Since this catalogue is so large, we will use a reduced version given in the course *Data-Driven Astronomy* by the University of Sydney, available at

https://www.coursera.org/learn/data-driven-astronomy

The corresponding file is available at

https://github.com/ashcat2005/data-driven-astronomy/blob/master/week2/data3/super.csv

and includes only 500 objects.

Although this is a .csv file and can be read directly with `pandas`, we will use `numpy.loadtxt` to load the data as an array. 

The first row of the catalogue file contains the column labels. The columns are

| Column | Feature |
|---|---|
|1:| Right ascension in decimal degrees|
|2:| Declination in decimal degrees|
|3-:| Other data, including magnitude and apparent shape|


Since we only need the coordinates, we will load just the first two columns with the `usecols` argument in the `numpy.loadtxt` function. 

In [9]:
super_raw = np.loadtxt(path+'super.csv', delimiter=',', skiprows=1, usecols=[0, 1])
super_raw.shape

(500, 2)

In [10]:
super_raw[0:3]

array([[ 6.03176000e-02, -8.56561327e+01],
       [ 1.14850820e+00, -4.76054131e+01],
       [ 1.27943310e+00, -1.54590140e+00]])

Since the coordinates in this dataset are already in degrees, no conversion is needed. Now, we will define an array with the loaded data, including an index column.

In [11]:
super_cat = np.zeros([500,3])

for i in range(500):
  super_cat[i,0]=int(i+1)
  super_cat[i,1], super_cat[i,2] = super_raw[i,0], super_raw[i,1]

super_cat[0:5]

array([[ 1.00000000e+00,  6.03176000e-02, -8.56561327e+01],
       [ 2.00000000e+00,  1.14850820e+00, -4.76054131e+01],
       [ 3.00000000e+00,  1.27943310e+00, -1.54590140e+00],
       [ 4.00000000e+00,  2.64883140e+00, -3.04631581e+01],
       [ 5.00000000e+00,  2.75505950e+00, -2.62091826e+01]])
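
As an alternative sketch, the load and the index column can be combined without an explicit loop. The in-memory CSV below stands in for `super.csv`: its coordinates are the first three rows shown above, while the header labels and the third column are placeholders assumed for illustration.

```python
import io
import numpy as np

# In-memory stand-in for super.csv. The header labels and the third column
# are placeholders; only the first two columns (RA, Dec in degrees) matter here.
csv_text = io.StringIO(
    "ra,dec,other\n"
    "0.0603176,-85.6561327,1.0\n"
    "1.1485082,-47.6054131,2.0\n"
    "1.2794331,-1.5459014,3.0\n"
)

# skiprows=1 drops the header line; usecols keeps only RA and Dec
raw = np.loadtxt(csv_text, delimiter=",", skiprows=1, usecols=[0, 1])

# Build the (ID, RA, Dec) array in one step instead of filling it row by row
cat = np.column_stack((np.arange(1, len(raw) + 1), raw))
print(cat.shape)  # (3, 3)
```

Using `len(raw)` instead of a hard-coded 500 keeps the code valid if a different subset of the catalogue is loaded.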


---
### Angular Distance between Two Objects in the Catalogues

The angular distance between one object in the first catalogue with coordinates $(\alpha_1 , \delta_1)$ and another object in the second catalogue with coordinates $(\alpha_2, \delta_2)$ can be calculated using the haversine formula,

\begin{equation}
d = 2 \arcsin \sqrt{\sin^2 \frac{\left| \delta_1 - \delta_2 \right| }{2} + \cos \delta_1 \cos \delta_2 \sin^2 \frac{\left| \alpha_1 - \alpha_2\right|}{2}}
\end{equation}

https://en.wikipedia.org/wiki/Haversine_formula

We implemented this formula as a function before,

In [12]:
def ang_dist(RA1, dec1, RA2, dec2):
  '''
  The arguments RA and dec are given in decimal degrees
  and the function returns the angular distance in
  decimal degrees
  '''
  # Transform the arguments from decimal degrees to radians
  RA1 = np.radians(RA1)
  dec1 = np.radians(dec1)
  RA2 = np.radians(RA2)
  dec2 = np.radians(dec2)

  a = np.sin(np.abs(dec1 - dec2)/2)**2
  b = np.cos(dec1)*np.cos(dec2)*np.sin(np.abs(RA1 - RA2)/2)**2
  d = 2*np.arcsin(np.sqrt(a + b))
  return np.degrees(d)

For example, the distance between the first object in the BSS catalogue and the first object in the SuperCOSMOS catalogue is

In [13]:
ang_dist(bss_cat[0,1], bss_cat[0,2], super_cat[0,1], super_cat[0,2])

38.05168335509249
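
A few sanity checks help confirm that the formula behaves as expected. The sketch below redefines the function under the hypothetical name `haversine_deg` so that it is self-contained. The absolute values are omitted because $\sin^2$ is an even function, so the result is unchanged. The catalogue coordinates are copied from the rounded values printed above.

```python
import numpy as np

def haversine_deg(ra1, dec1, ra2, dec2):
    """Angular distance in degrees; inputs in decimal degrees (scalars or arrays)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = np.sin((dec1 - dec2) / 2) ** 2
    b = np.cos(dec1) * np.cos(dec2) * np.sin((ra1 - ra2) / 2) ** 2
    return np.degrees(2 * np.arcsin(np.sqrt(a + b)))

# Two points on the celestial equator, 90 degrees apart in RA
print(haversine_deg(0.0, 0.0, 90.0, 0.0))
# A pole and any point on the equator are 90 degrees apart
print(haversine_deg(123.0, 0.0, 0.0, 90.0))
# First BSS source against the second SuperCOSMOS source, in arcseconds
print(3600 * haversine_deg(1.14854167, -47.60530556, 1.1485082, -47.6054131))
```

The last separation is well below one arcsecond, which suggests that these two entries are the same physical source.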

---
## Crossmatching 

The crossmatching process consists of finding a correspondence between the objects in two catalogues.

The process will be as follows:

1. Select an object from the BSS catalogue

2. Go through all the objects in the SuperCOSMOS catalogue, calculating the angular distance to find the closest one to the BSS object.

3. Record the match of the closest objects (* see note below).

4. Repeat steps 1-3 for all the objects in the BSS catalogue.

Note: If the closest object is not within a given maximum distance, it is unlikely that the two objects are actually counterparts, and it may be more likely that they are just nearby objects.

This maximum distance is chosen depending on the uncertainty of the measured object positions in each catalogue.
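
Matching radii are usually quoted in arcseconds, whereas the angular distances in this notebook are computed in decimal degrees, so both quantities must be expressed in the same units before they are compared. A minimal sketch of the conversion follows; the helper name `arcsec_to_deg` and the example separation are hypothetical.

```python
def arcsec_to_deg(arcsec):
    """Convert an angle from arcseconds to decimal degrees (1 degree = 3600 arcseconds)."""
    return arcsec / 3600.0

max_radius_deg = arcsec_to_deg(10.0)  # a 10 arcsecond matching radius
separation_deg = 0.0011               # example separation in degrees (about 4 arcseconds)

# The pair is a candidate counterpart only if it lies within the radius
print(separation_deg < max_radius_deg)  # True
```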

### Exercises

1. Write a crossmatch function that crossmatches two catalogues within a maximum distance. It should return a list of matches and non-matches for the first catalogue against the second.

2. The list of matches contains tuples of the first and second catalogue object IDs and their distance. 
The list of non-matches contains the unmatched object IDs from the first catalogue only. Both lists should be ordered by the first catalogue's IDs.

3. Include a function to calculate the time taken (in seconds) to run the crossmatcher.

4. Using the BSS and SuperCOSMOS catalogues as input arguments, run the crossmatch function with a maximum distance of 10 arcseconds.

5. Run the crossmatch function again using the BSS and SuperCOSMOS catalogues as input arguments with a maximum distance of 5 arcseconds.

In [18]:

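def crossmatch(cat1, cat2, min_dist=5.):
  '''
  Crossmatch two catalogues cat1 and cat2 by computing the angular
  distance between every pair of entries and keeping, for each object
  in cat1, the closest object in cat2 if it lies within min_dist.

  Despite its name, min_dist is the maximum separation allowed. It is
  given in decimal degrees, the units returned by ang_dist; for a
  radius in arcseconds pass, e.g., min_dist=5/3600.

  Returns two lists of 0-based row indices: the matches as [i, j]
  pairs and the indices i of the unmatched objects in cat1.
  '''
  # array to store the distances from one cat1 object to all of cat2
  distances = np.zeros(len(cat2))
  # list to store the match results
  match_results = []
  # list to store the no-match results
  nomatch_results = []

  # Main loop
  for i in range(len(cat1)):
    for j in range(len(cat2)):
      distances[j] = ang_dist(cat1[i,1], cat1[i,2], cat2[j,1], cat2[j,2])
    # Compare with the maximum distance allowed
    if distances.min() < min_dist:
      # Append a match: argmin gives the index of the closest object
      match_results.append([i, int(distances.argmin())])
    else:
      # Append a no-match
      nomatch_results.append(i)

  return match_results, nomatch_results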
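
The exercises ask for matches given as catalogue IDs together with their distance, and for a radius in arcseconds. The following is one possible sketch under those assumptions, not a definitive solution. The names `ang_dist_deg` and `crossmatch_ids` are hypothetical, the catalogues are assumed to have rows of (ID, RA, Dec) in degrees, and the two small catalogues are made-up positions used only to exercise the function.

```python
import numpy as np

def ang_dist_deg(ra1, dec1, ra2, dec2):
    """Haversine angular distance in degrees; inputs in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = np.sin((dec1 - dec2) / 2) ** 2
    b = np.cos(dec1) * np.cos(dec2) * np.sin((ra1 - ra2) / 2) ** 2
    return np.degrees(2 * np.arcsin(np.sqrt(a + b)))

def crossmatch_ids(cat1, cat2, max_radius_arcsec):
    """Return (matches, no_matches) for catalogues with rows (ID, RA, Dec) in degrees.

    matches holds tuples (id1, id2, distance in degrees); no_matches holds id1 values.
    """
    max_radius_deg = max_radius_arcsec / 3600.0
    matches, no_matches = [], []
    for id1, ra1, dec1 in cat1:
        # Distances from this object to every object in cat2 at once
        dists = ang_dist_deg(ra1, dec1, cat2[:, 1], cat2[:, 2])
        j = int(np.argmin(dists))
        if dists[j] <= max_radius_deg:
            matches.append((int(id1), int(cat2[j, 0]), float(dists[j])))
        else:
            no_matches.append(int(id1))
    return matches, no_matches

# Tiny synthetic catalogues (made-up positions, for illustration only)
cat_a = np.array([[1, 10.0, -30.0], [2, 50.0, -60.0]])
cat_b = np.array([[1, 200.0, 5.0], [2, 10.001, -30.0]])
matches, no_matches = crossmatch_ids(cat_a, cat_b, max_radius_arcsec=10)
print(matches)
print(no_matches)  # [2]
```

Because the outer loop visits the first catalogue in row order, both lists come out ordered by the first catalogue's IDs whenever that catalogue is itself sorted by ID.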


In [19]:
%%time
match, no_match = crossmatch(bss_cat, super_cat)


CPU times: user 3.79 s, sys: 4.19 ms, total: 3.79 s
Wall time: 3.8 s
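
Outside a notebook, where the `%%time` magic is unavailable, the elapsed time can be measured with `time.perf_counter`. The sketch below also compares a double loop with a NumPy broadcasting version of the nearest-neighbour search. All names are hypothetical and the catalogues are random positions, which are not uniform on the sphere and are meant for timing only. Timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np

def ang_dist_deg(ra1, dec1, ra2, dec2):
    """Haversine angular distance in degrees; inputs in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = np.sin((dec1 - dec2) / 2) ** 2
    b = np.cos(dec1) * np.cos(dec2) * np.sin((ra1 - ra2) / 2) ** 2
    return np.degrees(2 * np.arcsin(np.sqrt(a + b)))

def nearest_loop(cat1, cat2):
    """Index of the closest cat2 row for each cat1 row, using two Python loops."""
    out = []
    for _, ra1, dec1 in cat1:
        d = [ang_dist_deg(ra1, dec1, ra2, dec2) for _, ra2, dec2 in cat2]
        out.append(int(np.argmin(d)))
    return out

def nearest_broadcast(cat1, cat2):
    """Same result from a single (len(cat1), len(cat2)) distance table."""
    d = ang_dist_deg(cat1[:, 1, None], cat1[:, 2, None],
                     cat2[None, :, 1], cat2[None, :, 2])
    return d.argmin(axis=1).tolist()

def timed(func, *args):
    """Run func(*args) and return (result, elapsed time in seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

rng = np.random.default_rng(0)

def random_cat(n):
    # Rows of (ID, RA, Dec) with random coordinates, for timing only
    return np.column_stack((np.arange(1, n + 1),
                            rng.uniform(0, 360, n), rng.uniform(-90, 90, n)))

cat1, cat2 = random_cat(40), random_cat(60)
res_loop, t_loop = timed(nearest_loop, cat1, cat2)
res_fast, t_fast = timed(nearest_broadcast, cat1, cat2)
print(res_loop == res_fast, f"loop: {t_loop:.4f} s, broadcast: {t_fast:.4f} s")
```

The broadcasting version trades memory for speed: it builds the full table of pairwise distances, which is manageable for catalogues of a few hundred objects but not for the complete SuperCOSMOS catalogue.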


In [20]:
match

[[0, 1],
 [1, 3],
 [2, 4],
 [3, 5],
 [4, 10],
 [6, 10],
 [7, 13],
 [8, 14],
 [9, 16],
 [10, 27],
 [11, 17],
 [12, 21],
 [13, 23],
 [14, 24],
 [15, 25],
 [16, 26],
 [17, 28],
 [18, 29],
 [19, 30],
 [20, 31],
 [21, 32],
 [22, 33],
 [23, 34],
 [24, 36],
 [25, 38],
 [26, 40],
 [27, 41],
 [28, 41],
 [29, 42],
 [30, 43],
 [31, 45],
 [32, 46],
 [33, 47],
 [34, 48],
 [35, 50],
 [36, 51],
 [37, 53],
 [38, 54],
 [39, 56],
 [40, 57],
 [41, 58],
 [42, 59],
 [43, 60],
 [44, 63],
 [45, 62],
 [46, 64],
 [47, 65],
 [48, 66],
 [49, 67],
 [50, 68],
 [51, 69],
 [52, 70],
 [53, 71],
 [54, 72],
 [55, 74],
 [56, 75],
 [57, 76],
 [58, 76],
 [59, 77],
 [60, 79],
 [61, 80],
 [62, 81],
 [63, 83],
 [64, 86],
 [65, 88],
 [66, 89],
 [67, 90],
 [68, 91],
 [69, 92],
 [70, 93],
 [71, 95],
 [72, 97],
 [73, 101],
 [74, 102],
 [75, 103],
 [76, 104],
 [77, 108],
 [78, 109],
 [79, 110],
 [80, 111],
 [81, 115],
 [82, 112],
 [83, 114],
 [84, 115],
 [85, 116],
 [86, 117],
 [87, 118],
 [88, 119],
 [89, 120],
 [90, 121],
 [91,

In [21]:
no_match

[5,
 174,
 176,
 177,
 178,
 179,
 183,
 203,
 205,
 254,
 258,
 265,
 267,
 269,
 271,
 272,
 274,
 276,
 277,
 278,
 280,
 284,
 286,
 287,
 289,
 291,
 293,
 294,
 295,
 296,
 297,
 298,
 299,
 301,
 302,
 304,
 305,
 309]

In [22]:
len(no_match)

38