# Accessing  astronomical catalogs

There are two ways to access astronomical data catalogs that are provided as table data with a VO API.

Firstly, there is a __[Simple Cone Search (SCS) protocol](http://www.ivoa.net/documents/latest/ConeSearch.html)__  used to search a given table with a given position and radius, getting back a table of results.  The interface requires only a position and search radius.   

For more complicated searches, the __[Table Access Protocol](http://www.ivoa.net/documents/TAP/)__ (TAP) protocol is  a powerful tool to search any VO table.  Here, we expand on its usage and that of the __[Astronomical Data Query Language](http://www.ivoa.net/documents/latest/ADQL.html)__ (ADQL) that it uses.  


* [1. Simple cone search](#cone)queries
* [2. Basic Table Access Protocol](#basic) queries
* [3 Expressing queries in ADQL](#adql)
* [4. Cross-correlating](#cc) our own catalog with a HEASARC catalog
* [5. Combining](#combo) data from multiple catalogs and cross-correlating

#### Throughout this notebook, we will step through using TAP for the following science goal: we want to select a sample of bright, nearby spiral galaxies. 


In [1]:
import numpy as np
import astropy.units as u
## There are a number of relatively unimportant warnings that 
## show up, so for now, suppress them:
import warnings
warnings.filterwarnings("ignore", module="astropy.io.votable.*")

## For simple astropy tables
import astropy, io

## For handling ordinary astropy Tables
from astropy.table import Table

## For handling VO table type objects
from astropy.io import votable as apvot

## Use NAVO utility for Registry and Cone searches
import sys
from navo_utils.registry import Registry
from navo_utils.cone import Cone
from navo_utils.tap import Tap


<a id="scs1"></a>

# 1. Simple cone search


Starting with a single simple source first: 

In [2]:
import astropy.coordinates as coord
coord=coord.SkyCoord.from_name("m51")
print(coord)


<SkyCoord (ICRS): (ra, dec) in deg
    (202.469575, 47.1952583)>


Below, we go through the exercise of how we can figure out the most relevant table. But for now, let's assume that we know that we want the "zcat" catalog. VO services are listed in a central Registry that can be searched through a [web interface](http://vao.stsci.edu/keyword-search/) or the python Registry class.  We use the registry to find the corresponding cone service and then submit our cone search. 

In [3]:
services=Registry.query(service_type='cone',source='zcat')
services

waveband,short_name,ivoid,res_description,access_url,reference_url,publisher,service_type
str23,str9,str28,str24,str77,str57,str17,str10
optical,CFAZ,ivo://nasa.heasarc/zcat,No Description Available,https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=zcat&amp;,https://heasarc.gsfc.nasa.gov/W3Browse/all/zcat.html,NASA/GSFC HEASARC,conesearch
optical,ABELLZCAT,ivo://nasa.heasarc/abellzcat,No Description Available,https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=abellzcat&amp;,https://heasarc.gsfc.nasa.gov/W3Browse/all/abellzcat.html,NASA/GSFC HEASARC,conesearch
gamma-ray#optical#x-ray,ROMABZCAT,ivo://nasa.heasarc/romabzcat,No Description Available,https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=romabzcat&amp;,https://heasarc.gsfc.nasa.gov/W3Browse/all/romabzcat.html,NASA/GSFC HEASARC,conesearch


Here, the results of the Registry query for cone services show that the HEASARC lists every catalog as a separate cone service. Note: We could also find it with a search for source='heasarc%zcat', where the "%" is a wildcard within radius (given in degrees). 

Supposing that we want the table with the short_name CFAZ, and we want to retrieve the data for all sources within an arcminute of our specified location:

In [4]:
## Different tries may come back in different order, so find the one that's CFAZ. 
cfaz_service=services[np.isin(services['short_name'],['CFAZ'])][0]
## We are searching for sources within 1 arcminute of M51. 
table=Cone.query(service=cfaz_service,coords=coord,radius=1./60.)
table[0]

name,ra,dec,bmag,radial_velocity,radial_velocity_error,redshift,class,Search_Offset
Unnamed: 0_level_1,deg,deg,Unnamed: 3_level_1,km / s,km / s,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
str5,float64,float64,float32,int32,int32,float64,int32,float64
N5194,202.46823,47.19815,9.03,474,23,--,6200,0.1819


The SCS is quite straightforward and returns all of the columns of the given table (which can be anything) for the sources in the region queried.

<a id="basic"></a>

# 2. Basic Table Access Protocol queries

A TAP query is the most powerful way to search a catalog. A Simple Cone Search only allows you to ask for a position and radius, but TAP allows you to do much more, since the available tables contain much more information. 



Many services list a single TAP service in the Registry that can access many catalogs, boosting your efficiency. This is the power of the TAP! 
With the TAP, you can refine the search based on any other attribute in the given catalog. Suppose for our example, we want to select bright galaxy candidates but don't know the coordinates. Therefore, we start from figuring out the best table to query.   

As before, we use the Registry query, but for TAP service, the service_type='table'. 

In [5]:
tap_services=Registry.query(service_type='table',source='heasarc')
url=str(tap_services[0]['access_url'])
print(url)

https://heasarc.gsfc.nasa.gov/xamin/vo/tap


This url will contain the list of all available tables, which we can see by using the Tap.list_tables() function:

In [6]:
Tap.list_tables(url)

Retrieving tables...
Parsing tables...
Done.
TAP_SCHEMA.TAP_SCHEMA.columns
TAP_SCHEMA.TAP_SCHEMA.key_columns
TAP_SCHEMA.TAP_SCHEMA.keys
TAP_SCHEMA.TAP_SCHEMA.schemas
TAP_SCHEMA.TAP_SCHEMA.tables
public.a1
public.a1point
public.a2lcpoint
public.a2lcscan
public.a2led
public.a2pic
public.a2point
public.a2rtraw
public.a2specback
public.a2spectra
public.a3
public.a4
public.a4spectra
public.aavsovsx
public.abell
public.abellzcat
public.acceptcat
public.acrs
public.actegsrcat
public.actssrcat
public.aegis20
public.aegis20id
public.aegisx
public.aegisxdcxo
public.agilecat
public.agileupvar
public.agnsdssxm2
public.agnsdssxmm
public.akaribsc
public.akaripsc
public.aknepdfcxo
public.alfperxmm
public.allwiseagn
public.ami10c15gz
public.amigps16gh
public.ansuvpscat
public.arcquincxo
public.ariel3a
public.ariel5
public.arxa
public.ascaegclus
public.ascagis
public.ascagps
public.ascalss
public.ascamaster
public.ascao
public.ascaprspec
public.ascasis
public.asiagosn
public.askapbeta
public.at20g
publ

['TAP_SCHEMA.TAP_SCHEMA.columns',
 'TAP_SCHEMA.TAP_SCHEMA.key_columns',
 'TAP_SCHEMA.TAP_SCHEMA.keys',
 'TAP_SCHEMA.TAP_SCHEMA.schemas',
 'TAP_SCHEMA.TAP_SCHEMA.tables',
 'public.a1',
 'public.a1point',
 'public.a2lcpoint',
 'public.a2lcscan',
 'public.a2led',
 'public.a2pic',
 'public.a2point',
 'public.a2rtraw',
 'public.a2specback',
 'public.a2spectra',
 'public.a3',
 'public.a4',
 'public.a4spectra',
 'public.aavsovsx',
 'public.abell',
 'public.abellzcat',
 'public.acceptcat',
 'public.acrs',
 'public.actegsrcat',
 'public.actssrcat',
 'public.aegis20',
 'public.aegis20id',
 'public.aegisx',
 'public.aegisxdcxo',
 'public.agilecat',
 'public.agileupvar',
 'public.agnsdssxm2',
 'public.agnsdssxmm',
 'public.akaribsc',
 'public.akaripsc',
 'public.aknepdfcxo',
 'public.alfperxmm',
 'public.allwiseagn',
 'public.ami10c15gz',
 'public.amigps16gh',
 'public.ansuvpscat',
 'public.arcquincxo',
 'public.ariel3a',
 'public.ariel5',
 'public.arxa',
 'public.ascaegclus',
 'public.ascagis',
 

Here we find the *public.zcat* table. If we want to know what columns this table contains, we can query:  

In [7]:
Tap.list_columns(url,tablename="public.zcat")

<TableColumns names=('__row','name','ra','dec','lii','bii','bmag','radial_velocity','radial_velocity_error','ref_bmag','ref_radial_velocity','morph_type','bar_type','luminosity_class','structure','diameter_1','diameter_2','bt_mag','ugc_or_eso','distance','rfn_number','comments','redshift','ref_redshift','notes','class','__x_ra_dec','__y_ra_dec','__z_ra_dec')>

Success! This appears to be useful table for our goal since it contains columns with the information that we need to select a sample of the brightest nearby spiral galaxy candidates. 

Now that we know all the possible column information in the zcat catalog, we can do more than query on position (as in a cone search) but also on any other column (e.g., redshift, bmag, morph_type).  The query has to be expressed in a language called __[ADQL](http://www.ivoa.net/documents/latest/ADQL.html)__.  


<a id="adql"></a>

# 3. Expressing queries in ADQL


The basics of ADQL:

* *SELECT &#42; FROM my.interesting.catalog as cat...* 

says you want all ("&#42;") columns from the catalog called "my.interesting.catalog", which you will refer to in the rest of the query by the more compact name of "cat".  

Instead of returning all columns, you can 

* *SELECT cat.RA, cat.DEC, cat.bmag from catalog as cat...* 

to only return the columns you're interested in. To use multiple catalogs, your query could start, e.g.,

* *SELECT c1.RA,c1.DEC,c2.BMAG FROM catalog1 as c1 natural join catalog2 as c2...* 

says that you want to query two catalogs zipped together the "natural" way, i.e., by looking for a common column.

To select only some rows of the catalog based on the value in a column, you can add:  

* *WHERE cat.bmag < 14* 

says that you want to retrieve only those entries in the catalog whose bmag column has a value less than 14.

You can also append 

* *ORDER by cat.bmag* 

to return the result sorted ascending by one of the columns, adding *DESC* to the end for descending. 

A few special functions in the ADQL allow you to query regions:

* *WHERE contains( point('ICRS', cat.ra, cat.dec), circle('ICRS', 210.5, -6.5, 0.5))=1*

is how you would ask for any catalog entries whose RA,DEC lie within a circular region defined by RA,DEC 210.5,-6.5 and a radius of 0.5 (all in degrees).  The 'ICRS' specifies the coordinate system.  

See the ADQL documentation for more.

Here is a simple ADQL query where we print out the relevant columns for the bright (Bmag <14) sources found within 1 degree of M51 (we will discuss how to define the table and column names below):

In [8]:
##  Inside the format call, the {} are replaced by the given variables in order.
##  So this asks for 
##  rows of public.zcat where that row's ra and dec (cat.ra and cat.dec from the catalog) 
##  are within radius 1deg of the given RA and DEC we got above for M51 
##  (coord.ra.deg and coord.dec.deg from our variables defined above), and where 
##  the bmag column is less than 14.  
query="""SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where 
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},1.0))=1 and
    cat.bmag < 14
    order by cat.radial_velocity_error 
    """.format(coord.ra.deg, coord.dec.deg)

In [9]:
results=Tap.query(url,query)
results

ra,dec,radial_velocity,radial_velocity_error,bmag,morph_type
float64,float64,int32,int32,float32,int32
202.46823227,47.19814872,474,23,9.03,4
202.49490508,47.2679204,558,23,10.94,0
202.54744757,46.66995898,2569,25,13.3,-5


See the __[information on the zcat](https://heasarc.gsfc.nasa.gov/W3Browse/galaxy-catalog/zcat.html)__ for column information. (We will use the 'radial_velocity' column rather than the 'redshift' column.) We note that spiral galaxies have morph_type between 1 - 9. 

Therefore, we can generalize the query above to complete our exercise and select the brightest (bmag < 14), nearby (radial velocity < 3000), spiral ( morph_type = 1 - 9) galaxies as follows: 

In [10]:
query="""SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where 
    cat.bmag < 14 and cat.morph_type between 1 and 9 and cat.Radial_Velocity < 3000 
    order by cat.Radial_velocity 
    """.format(coord.ra.deg, coord.dec.deg)

In [11]:
results=Tap.query(url,query)
results

ra,dec,radial_velocity,radial_velocity_error,bmag,morph_type
float64,float64,int32,int32,float32,int32
10.6847407,41.26882595,-297,1,4.3,3
189.20757439,13.16274802,-223,18,10.58,2
186.73679277,15.04781754,-182,12,12.23,1
23.46217921,30.6601917,-180,1,6.5,5
186.49550755,15.67102319,-155,26,13.7,7
183.45114982,14.90009805,-98,15,11.0,2
183.91295676,13.90133935,-84,27,12.08,4
148.89928698,69.06297291,-40,5,7.75,2
68.21187756,71.88871924,-28,10,12.1,7
184.18072097,69.46565852,-7,4,10.4,8


<a id="cc"></a>
# 4. TAP:  Using the TAP to cross-correlate our objects with a catalog

TAP can also be a powerful way to collect a lot of useful information from existing catalogs in one quick step. For this exercise, we will start with a list of sources, uploaded from our own table, and do a 'cross-correlation' with the *zcat* table. 

For more on creating and working with VO tables, see that [notebook](VO_Tables.ipynb).  Here, we just read one in that's already prepared:  


(These take a while, i.e. half a minute.)

In [12]:
query="""
    SELECT cat.ra, cat.dec, Radial_Velocity, bmag, morph_type
    FROM zcat cat, tap_upload.mysources mt 
    WHERE
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,0.01))=1
    and Radial_Velocity > 0
    ORDER by cat.ra"""
zcattable=Tap.query(url,query,upload_file="data/my_sources.xml",upload_name='mysources')
zcattable

ra,dec,radial_velocity,bmag,morph_type
float64,float64,int32,float32,int32
136.00074371,21.96791867,3093,13.8,20
146.70334308,22.01827217,7446,14.6,-1
146.70334308,22.01827217,7597,15.0,-2
148.77805631,14.29613296,7194,15.1,2
175.03940162,15.32725392,3325,14.4,3
191.54198643,30.73226569,6651,15.0,20
191.54739911,30.72338192,6517,14.7,-2
194.91294475,28.89537435,6093,15.0,0
206.57163823,43.85050581,2229,15.0,-1
206.57996426,43.84385707,25864,18.5,-1


Therefore we now have the Bmag, morphological type and radial velocities for all the sources in our list with a single TAP query. 

FYI, this is what the same query looks like with the __[TapPlus library developed by ESDC](https://astroquery.readthedocs.io/en/latest/utils/tap.html)__.  A note about that library:  the "Plus" part refers to functions that are beyond the defined VO protocol and developed to work with extensions to the Gaia services.  Not all TapPlus functions will then work on all VO-compliant TAP services.  This function, however, is generic:

In [13]:
#from astroquery.utils.tap.core import TapPlus
#xamin=TapPlus(url=tap_services[0]['access_url'])
#job=xamin.launch_job(query=query, upload_resource='data/my_sources.xml', upload_table_name="mysources", verbose=True)
#result = job.get_results()
#result.pprint()

<a id="combo"></a>

# 5.  Combining data from different catalogs and cross-correlating

Our input list of sources contains galaxy pair candidates that may be interacting with each other. Therefore it would be interesting to know what the morphological type and the Bmagnitude are for the potential companions. 

In this advanced example, we want our search to be physically motivated since the criterion for galaxy interaction depends on the physical separation of the galaxies. Unlike the previous case, the search radius is not a constant, but varies for each candidate by the distance to the source. Specifically, we want to search for companions that are within 50 kpc of the candidate and therefore first need to find the angular diameter distance that corresponds to galaxy's distance (in our case the radial velocity).

Therefore, we begin by taking our table of objects and adding an angDdeg column:

In [14]:
mytable = zcattable
## The radial_velocity is in km/s, and this is just c*z, so
c=3.0e5 # km/s
redshifts=mytable['radial_velocity'].filled(0.)/c  # Filling masked values with zero
mytable['redshift']=redshifts
from astropy import units
physdist=0.05*units.Mpc # 50 kpc physical distance

## This needs scipy.  
from astropy.cosmology import Planck15
angDdist=Planck15.angular_diameter_distance(mytable['redshift'])
## angDdist is returned from the astropy.cosmology module as a Quantity object, 
##  i.e. a value and a unit.  Arctan is smart enough not to operate on quantities
##  that aren't unitless.  So angDdist.value to just get the value.
angDrad=np.arctan(physdist/angDdist)
angDdeg=angDrad*units.deg/units.rad
mytable['angDdeg']=angDdeg
mytable

ra,dec,radial_velocity,bmag,morph_type,redshift,angDdeg
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,deg
float64,float64,int32,float32,int32,float64,float64
136.00074371,21.96791867,3093,13.8,20,0.01031,0.0011097660332591
146.70334308,22.01827217,7446,14.6,-1,0.02482,0.0004691999680834
146.70334308,22.01827217,7597,15.0,-2,0.0253233333333333,0.0004601544041704
148.77805631,14.29613296,7194,15.1,2,0.02398,0.0004851417117951
175.03940162,15.32725392,3325,14.4,3,0.0110833333333333,0.0010333094157203
191.54198643,30.73226569,6651,15.0,20,0.02217,0.0005235991638639
191.54739911,30.72338192,6517,14.7,-2,0.0217233333333333,0.0005340756468748
194.91294475,28.89537435,6093,15.0,0,0.02031,0.0005702614527842
206.57163823,43.85050581,2229,15.0,-1,0.00743,0.0015345104005632
206.57996426,43.84385707,25864,18.5,-1,0.0862133333333333,0.0001452910436382


This time, rather than write the table to disk, we'll keep it in memory and give Tap.query() a "file-like" object using io.BytesIO():

In [15]:
## In memory only, use an IO stream. 
vot_obj=io.BytesIO()
print(mytable.columns)
apvot.writeto(apvot.from_table(mytable),vot_obj)
## (Reset the "file-like" object to the beginning.)
vot_obj.seek(0)
## 'uplt' is what we'll call it (for 'upload table') 
##   in the requests parameters below, or what you will:
files={'uplt':vot_obj}


<TableColumns names=('ra','dec','radial_velocity','bmag','morph_type','redshift','angDdeg')>


Now we construct and run a query that uses the new angDdeg column in every row search. Note, we also don't want to list the original candidates since we know these are in the catalog and we want rather to find any companions. Therefore, we exclude the match if the radial velocities match exactly.

This takes half a minute:

In [16]:
query="""SELECT mt.ra, mt.dec, cat.ra, cat.dec, cat.Radial_Velocity, cat.morph_type, cat.bmag 
    FROM zcat cat, tap_upload.mytable mt 
    WHERE
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,mt.angDdeg))=1
    and cat.Radial_Velocity > 0 and cat.radial_velocity != mt.radial_velocity
    ORDER by cat.ra"""

mytable2=Tap.query(url,query,upload_file=vot_obj,upload_name='mytable')
mytable2

ra,dec,ra_2,dec_2,radial_velocity,morph_type,bmag
float64,float64,float64,float64,int32,int32,float32
146.70334308,22.01827217,146.70334308,22.01827217,7446,-1,14.6
146.70334308,22.01827217,146.70334308,22.01827217,7597,-2,15.0


#### Therefore, by adding new information to our original data table, we could cross-correlate with the TAP.  We find that, in our candidate list, there is one true pair of galaxies!