# Matchup

Write a `crossmatch` function that matches each object in the first catalogue to its closest counterpart in the second, provided the two lie within a maximum angular distance.
It should return two lists: the matches and the non-matches for the first catalogue against the second.

Each entry in the list of matches is a tuple of the first catalogue object ID, the second catalogue object ID, and the distance between the two objects in degrees.
The list of non-matches contains the unmatched object IDs from the first catalogue only.
Both lists should be ordered by the first catalogue's IDs.

The BSS and SuperCOSMOS catalogues will be given as input arguments, each in the format you’ve seen previously.
The maximum distance is given in decimal degrees.
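
As a reference point, the specification above can be sketched with a brute-force loop. This is only an illustration under stated assumptions, not the intended solution: `angular_dist_deg` is a hypothetical haversine helper written for this example (the `angular_dist` used elsewhere in this notebook may differ), catalogues are assumed to be lists of `(id, ra, dec)` tuples in decimal degrees, and the toy coordinates are made up.

```python
import math

def angular_dist_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees via the haversine formula (inputs in degrees)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = math.sin((d2 - d1) / 2) ** 2
    b = math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2
    # min() guards against rounding pushing the argument just above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a + b))))

def crossmatch_brute(cat1, cat2, max_dist):
    """Match each cat1 object to its closest cat2 object closer than max_dist degrees."""
    matches, no_matches = [], []
    for id1, ra1, dec1 in sorted(cat1):  # sorted so both lists are ordered by cat1 ID
        best_id, best_dist = None, math.inf
        for id2, ra2, dec2 in cat2:
            dist = angular_dist_deg(ra1, dec1, ra2, dec2)
            if dist < best_dist:
                best_id, best_dist = id2, dist
        if best_dist < max_dist:
            matches.append((id1, best_id, best_dist))
        else:
            no_matches.append(id1)
    return matches, no_matches

# Toy catalogues (made-up coordinates): object 1 has a neighbour 0.005 degrees away
toy_cat1 = [(1, 10.0, -30.0), (2, 50.0, 20.0)]
toy_cat2 = [(1, 200.0, 5.0), (2, 10.0, -30.005)]
toy_matches, toy_no_matches = crossmatch_brute(toy_cat1, toy_cat2, 40 / 3600)
print(toy_matches)
print(toy_no_matches)
```

The nested loop costs one distance evaluation per pair of objects, which is fine for small samples but is the main reason to look for a faster search structure on full catalogues.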

Here's how crossmatch should work on our sample catalogues with a maximum distance of 40 arcseconds:

```python
bss_cat = import_bss()
super_cat = import_super()
max_dist = 40/3600
matches, no_matches = crossmatch(bss_cat, super_cat, max_dist)
print(matches[:3])
print(no_matches[:3])
print(len(no_matches))
```

```python
[(1, 2, 0.00010988610938710059), (2, 4, 0.00076498459672424946), (3, 5, 0.00020863352870707666)]
[5, 6, 11]
9
```

Only 9 objects have no match. Let's try a 5 arcsecond maximum:

```python
bss_cat = import_bss()
super_cat = import_super()
max_dist = 5/3600
matches, no_matches = crossmatch(bss_cat, super_cat, max_dist)
print(matches[:3])
print(no_matches[:3])
print(len(no_matches))
```

```python
[(1, 2, 0.00010988610938710059), (2, 4, 0.00076498459672424946), (3, 5, 0.00020863352870707666)]
[5, 6, 11]
40
```

Now 40 objects have no match with the tighter search radius.

In [1]:
%run 2d.ipynb

import numpy as np
from scipy.spatial import cKDTree

def crossmatch(cat1, cat2, max_dist):
  # Drop the ID column and keep (RA, Dec) in degrees.
  # IDs are assumed to be the 1-based row numbers of each catalogue.
  dat1 = np.array(cat1)[:,1:]
  dat2 = np.array(cat2)[:,1:]

  # The k-d tree uses flat Euclidean distance in (RA, Dec), which only
  # approximates the angular separation, so search out to 4x max_dist and
  # confirm each candidate with angular_dist (expected from the %run above).
  tree = cKDTree(dat2)
  _, idx = tree.query(dat1, distance_upper_bound=max_dist*4)

  matches = []
  no_matches = []
  for id1, id2 in enumerate(idx):
    # query returns index tree.n when nothing lies within the bound
    if id2 < tree.n:
      ra1, dec1 = dat1[id1]
      ra2, dec2 = dat2[id2]
      dist = angular_dist(ra1, dec1, ra2, dec2)
      if dist < max_dist:
        matches.append((id1+1, int(id2)+1, float(dist)))
        continue
    no_matches.append(id1+1)
  return matches, no_matches
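
The check against the tree size in the solution relies on how `cKDTree.query` reports a missing neighbour when `distance_upper_bound` is set: the returned index equals the number of points in the tree and the returned distance is infinite. A minimal sketch of that behaviour, using made-up `(RA, Dec)` points rather than catalogue data:

```python
import numpy as np
from scipy.spatial import cKDTree

# Made-up (RA, Dec) points in degrees, for illustration only
reference = np.array([[10.0, 20.0], [30.0, 40.0]])
queries = np.array([[10.0, 20.001], [100.0, -50.0]])

tree = cKDTree(reference)
dists, idx = tree.query(queries, distance_upper_bound=0.01)

# The first query finds reference point 0; the second finds nothing within
# the bound, so its index is tree.n (here 2) and its distance is inf.
found = idx < tree.n
print(idx)
print(dists)
print(found)
```

Note that the tree measures flat Euclidean distance in `(RA, Dec)`. For nearby points away from the RA wrap-around this is at least as large as the angular separation, by up to roughly 1/cos(Dec) along the RA direction, which is presumably why the solution widens the bound and re-checks with `angular_dist`. A factor of 4 covers declinations up to about 75 degrees; the nearest Euclidean neighbour is also not guaranteed to be the nearest on the sky.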


You can use this to test your function.

In [None]:
bss_cat = import_bss()
super_cat = import_super()

# First example in the question
max_dist = 40/3600
matches, no_matches = crossmatch(bss_cat, super_cat, max_dist)
print(matches[:3])
print(no_matches[:3])
print(len(no_matches))

# Second example in the question
max_dist = 5/3600
matches, no_matches = crossmatch(bss_cat, super_cat, max_dist)
print(matches[:3])
print(no_matches[:3])
print(len(no_matches))
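
Beyond comparing against the printed examples, a few structural properties of the output can be checked directly. The helper below is a hypothetical sketch, not part of the exercise: `check_crossmatch` is an invented name, and it assumes the first catalogue's IDs run from 1 to `n_cat1`.

```python
def check_crossmatch(matches, no_matches, n_cat1, max_dist):
    """Basic consistency checks on crossmatch output; assumes cat1 IDs are 1..n_cat1."""
    matched_ids = [int(m[0]) for m in matches]
    unmatched_ids = [int(i) for i in no_matches]
    # Both lists must be ordered by first-catalogue ID
    assert matched_ids == sorted(matched_ids)
    assert unmatched_ids == sorted(unmatched_ids)
    # Every first-catalogue object appears exactly once across the two lists
    assert sorted(matched_ids + unmatched_ids) == list(range(1, n_cat1 + 1))
    # No reported match lies beyond the search radius
    assert all(m[2] <= max_dist for m in matches)
    return True

# Hand-made output for a 3-object catalogue, not real catalogue data
print(check_crossmatch([(1, 2, 0.001), (3, 5, 0.002)], [2], n_cat1=3, max_dist=40/3600))
```

With real data this would be called as `check_crossmatch(matches, no_matches, len(bss_cat), max_dist)` after each `crossmatch` call.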
