# Accessing  HEASARC tables through the TAP with ADQL

We have used the __[Table Access Protocol](http://www.ivoa.net/documents/TAP/)__ (TAP) protocol in several other notebooks for basic queries.  Here, we expand on its usage and that of the __[Astronomical Data Query Language](http://www.ivoa.net/documents/latest/ADQL.html)__ (ADQL) that it uses.  

* [1. Basic](#basic) Table Access Protocol queries
* [2. Cross-correlating](#cc) our own catalog with a HEASARC catalog
* [3. Combining](#combo) data from multiple catalogs and cross-correlating


In [1]:
import numpy
## There are a number of relatively unimportant warnings that 
## show up, so for now, suppress them:
import warnings
warnings.filterwarnings("ignore")

## For simple astropy tables
import astropy, io, requests

## For handling ordinary astropy Tables
from astropy.table import Table

## For handling VO table type objects
from astropy.io import votable as apvot

# Use the astroquery TapPlus library.
from astroquery.utils.tap.core import TapPlus

## Use NAVO utility for Registry and Cone searches
from navo_utils.registry import Registry
from navo_utils.cone import Cone
from navo_utils.tap import Tap


<a id="basic"></a>

# 1. Basic Table Access Protocol queries

A TAP query is the most powerful way to search a catalog. Supposed you already know that you want to query the "zcat" at the HEASARC.  

A Simple Cone Search only allows you to ask for a position and radius:  

In [2]:
import astropy.coordinates as coord
coord=coord.SkyCoord.from_name("m51")
print(coord)
services=Registry.query(service_type='cone',source='heasarc%zcat')
services

<SkyCoord (ICRS): (ra, dec) in deg
    (202.469575, 47.1952583)>


waveband,short_name,ivoid,res_description,access_url,reference_url,publisher,service_type
str23,str9,str28,str1745,str77,str57,str17,str10
optical,ABELLZCAT,ivo://nasa.heasarc/abellzcat,"The all-sky ACO (Abell, Corwin and Olowin 1989, ApJS, 70, 1) Catalog of 4073 rich clusters of galaxies and 1175 southern poor or distant S-clusters has been searched for published redshifts. Data for 1059 of them were found and classified into various quality classes, e.g. to reduce the problem of foreground contamination of redshifts. Taking the ACO selection criteria for redshifts, a total of 992 entries remain, 21 percent more than ACO. Redshifts for rich clusters are now virtually complete out to a redshift z of 0.05 in the north and of 0.04 in the south. In the north, the magnitude-redshift (m_10 - z) relation agrees with that of Kalinkov et al. (1985, Astr. Nachr., 306, 283). For the southern rich clusters, minor adjustments to the m_10 - z relation of ACO are suggested, while for the S-clusters the redshifts are about 30 percent lower than estimated.",https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=abellzcat&amp;,https://heasarc.gsfc.nasa.gov/W3Browse/all/abellzcat.html,NASA/GSFC HEASARC,conesearch
optical,CFAZ,ivo://nasa.heasarc/zcat,"The ZCAT database contains the CfA Redshift Catalog, which incorporates much of the latest velocity data from the Whipple Observatory and other sources, as well as velocities from earlier compilations such as the &amp;quot;Second Reference Catalog&amp;quot; of de Vaucouleurs, de Vaucouleurs, and Corwin; the &amp;quot;Index of Galaxy Spectra&amp;quot; of Gisler and Friel; and the &amp;quot;Catalog of Radial Velocities of Galaxies&amp;quot; of Palumbo, Tanzella-Nitti, and Vettolani. It includes BT magnitudes, some UGC numbers, and increased &amp;quot;accuracy&amp;quot; in the velocity source information. The data presented here have primarily been assembled for the purpose of studying the large scale structure of the universe, and, as such, are nearly complete in redshift information, but are not necessarily complete in such categories as diameter, magnitude, and cross-references to other catalogues.",https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=zcat&amp;,https://heasarc.gsfc.nasa.gov/W3Browse/all/zcat.html,NASA/GSFC HEASARC,conesearch
gamma-ray#optical#x-ray,ROMABZCAT,ivo://nasa.heasarc/romabzcat,"This table contains the 5th edition of the Roma-BZCAT catalog of blazars which contains coordinates and multi-frequency data of 3561 sources. It presents several relevant changes with respect to the past editions which are briefly described in the reference paper. The Roma-BZCAT catalog contains data on 3561 sources, about 30% more than in the 1st edition, which either confirmed blazars or exhibiting characteristics close to this type of sources. With respect to the previous editions, this new edition has relevant changes in the sources' classification. The authors emphasize that all the sources in the Roma-BZCAT have a detection in the radio band. Moreover, complete spectroscopic information is published and could be accessed by the authors for all of them, with the exception of BL Lac candidates. Consequently, peculiar sources such as the so called &amp;quot;radio quiet BL Lacs&amp;quot;, which are reported in some other catalogs, are not included here because of possible contamination by hot stars and other extragalactic objects. In the 5th edition, the authors use a similar denomination for the blazars to that adopted in the previous editions. Each blazar is identified by a code, with 5BZ for all blazars, a fourth letter that specifies the type (B, G, Q or U), followed by the truncated equatorial coordinates (J2000). The authors introduced the edition number before the letters BZ to avoid possible confusion due to the fact that several sources changed their old names because of a newly adopted classification. The 5th edition contains 1151 BZB sources (92 of which are reported as candidates because their optical spectra could not be found in the literature), 1909 BZQ sources, 274 BZG sources, and 227 BZU objects.",https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=romabzcat&amp;,https://heasarc.gsfc.nasa.gov/W3Browse/all/romabzcat.html,NASA/GSFC HEASARC,conesearch


In [3]:
## The 3rd is the one we want:
table=Cone.query(service=services[2],coords=coord,radius=1)
table[0]

source_number,name,ra,dec,redshift,redshift_flag,flux_1p4_ghz,rmag,xray_flux,object_type,Search_Offset
Unnamed: 0_level_1,Unnamed: 1_level_1,deg,deg,Unnamed: 4_level_1,Unnamed: 5_level_1,mJy,mag,erg/cm^2/s,Unnamed: 9_level_1,Unnamed: 10_level_1
int32,str25,float64,float64,float64,str2,float64,float64,float64,str24,float64
2149,[MML2015] 5BZQ J1332+4722,203.1885,47.372969,0.668,,233.0,17.4,0.0,QSO RLoud flat radio sp.,31.1436


With the TAP, you can refine the search based on any other attribute in the given catalog.  

The basics of ADQL:

* *SELECT &#42; FROM catalog as cat* says you want all ("&#42;") columns from the catalog called "catalog", which you will refer to below by the more compact name of "cat", 
* *WHERE cat.bmag < 14* says that you want to retrieve only those entries in the catalog whose bmag column has a value less than 14
* *FROM catalog1 as c1 natural join catalog2 as c2* says that you want to query two catalogs zipped together the "natural" way, i.e., by looking for a common column,
* etc.

There are many other options.  Instead of returning all columns, you can *SELECT cat.RA, cat.DEC, cat.bmag from catalog as cat...* to only return the columns you're interested in.

You can also append *ORDER by cat.bmag* to return the result sorted ascending by one of the columns, adding *DESC* to the end for descending. 

A few special functions in the ADQL allow you to query regions:

* *WHERE contains( point('ICRS', cat.ra, cat.dec), circle('ICRS', 210.5, -6.5, 0.5))=1*

is how you would ask for any catalog entries whose RA,DEC lie within a circular region defined by RA,DEC 210.5,-6.5 and a radius of 0.5 (all in degrees).  The 'ICRS' specifies the coordinate system.  

See the ADQL documentation for more.

With these basics, we do the following:

In [4]:
## Find the zcat TAP service we want:
tap_services=Registry.query(service_type='table',source='heasarc')
tap_services['access_url'][0]

'https://heasarc.gsfc.nasa.gov/xamin/vo/tap'

In [5]:
## Not sure why TapPlus doesn't work: 
##
#url=tap_services[0]['access_url']+"/sync"
#url="https://heasarc.gsfc.nasa.gov/xamin_test/vo/tap/sync"
#print(url)
#xamin_service = TapPlus(url=url,verbose=True)
#xamin_job = xamin_service.launch_job(
#    f"""SELECT ra, dec, Radial_Velocity FROM zcat as cat where 
#    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{coord.ra.deg},{coord.dec.deg},{1.0}))=1 and
#    cat.bmag < 14
#    order by cat.radial_velocity_error
#    """)
#xamin_results = xamin_job.get_results()
#print(xamin_results)

In [6]:
query=f"SELECT ra, dec, Radial_Velocity FROM zcat as cat where contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{coord.ra.deg},{coord.dec.deg},1.0))=1 and cat.bmag < 14 order by cat.radial_velocity_error"
service=tap_services[0]
table=Tap.query(service,query)
table

ra,dec,radial_velocity
float64,float64,int32
202.46823227,47.19814872,474
202.49490508,47.2679204,558
202.54744757,46.66995898,2569


If you aren't sure what columns are available, get all attributes of one row and take a look:

In [7]:
url=str(tap_services['access_url'][0])
print(type(str(url)))
table=Tap.query(url,"SELECT top 1 * FROM zcat")
table

<class 'str'>


__row,name,ra,dec,lii,bii,bmag,radial_velocity,radial_velocity_error,ref_bmag,ref_radial_velocity,morph_type,bar_type,luminosity_class,structure,diameter_1,diameter_2,bt_mag,ugc_or_eso,distance,rfn_number,comments,redshift,ref_redshift,notes,class,__x_ra_dec,__y_ra_dec,__z_ra_dec
int32,str5,float64,float64,float64,float64,float32,int32,int32,str2,int32,int32,str2,int32,str2,float64,float64,float64,str6,float64,str2,str2,float64,int32,str2,int32,float64,float64,float64
1,N2573,25.40675631,-89.33515576,302.7686762,-27.77712257,--,2294,-1,,3100,4,,-1,,2.3,1.0,--,1001,--,,,--,-1,,6200,0.0049783679012001,0.0104812285064099,-0.999932677584865


(See the __[information on the zcat](https://heasarc.gsfc.nasa.gov/W3Browse/galaxy-catalog/zcat.html)__ for column information. We will use the 'radial_velocity' column rather than the 'redshift' column.)

<a id="cc"></a>
# 2. TAP:  Using the TAP to cross-correlate our objects with a catalog

Now to search all of our sources in one go, we need to upload our own table and do a 'cross-correlation' with the *zcat* table. For more on creating VOTable objects, see that notebook.  Here, we just read one in:  

This is how we'd like it to work, but for the moment, this is broken.

In [8]:
#query="""
#    SELECT cat.ra, cat.dec, Radial_Velocity 
#    FROM zcat cat, tap_upload.mysources mt 
#    WHERE
#    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,0.1))=1
#    and Radial_Velocity > 0
#    ORDER by cat.ra"""
#url='https://heasarc.gsfc.nasa.gov/xamin_test/vo/tap'
#table=Tap.query(url,query,upload_file="../my_sources.xml",upload_name='mysources')
#table.meta

### This can be done with requests:

(These take a while, i.e. half a minute.)

In [9]:
files={'uplt':open('../my_sources.xml', 'rb')}

cc_params={
    'lang': 'ADQL', 
    'request': 'doQuery',
    'upload':'mysources,param:uplt'
    }

cc_params["query"]=query
r = requests.post(tap_services[0]['access_url']+'/sync',data=cc_params,stream=True,files=files)
#r.text
mytable=Table.read(io.BytesIO(r.content))
mytable

ra,dec,radial_velocity
float64,float64,int32
202.46823227,47.19814872,474
202.49490508,47.2679204,558
202.54744757,46.66995898,2569


###  Or with TapPlus

In [10]:
from astroquery.utils.tap.core import TapPlus
xamin=TapPlus(url=tap_services[0]['access_url'])
job=xamin.launch_job(query=query, upload_resource='../my_sources.xml', upload_table_name="mysources", verbose=True)
result = job.get_results()
result.pprint()

Created TAP+ (v1.0.1) - Connection:
	Host: heasarc.gsfc.nasa.gov
	Use HTTPS: True
	Port: 443
	SSL Port: 443
Launched query: 'SELECT  TOP 2000 ra, dec, Radial_Velocity FROM zcat as cat where contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',202.469575,47.1952583,1.0))=1 and cat.bmag < 14 order by cat.radial_velocity_error'
200 OK
[('Date', 'Fri, 01 Jun 2018 19:30:02 GMT'), ('Query-Defer', '261'), ('Content-Type', 'text/xml'), ('Transfer-Encoding', 'chunked'), ('Strict-Transport-Security', 'max-age=31536000; includeSubDomains')]
Retrieving sync. results...
Query finished.
     ra          dec     radial_velocity
------------ ----------- ---------------
202.46823227 47.19814872             474
202.49490508  47.2679204             558
202.54744757 46.66995898            2569


<a id="combo"></a>

# 3.  Combining data from different catalogs and cross-correlating
Now we'd like to take the redshift information (above, as a radial velocity) and determine a search radius to use for each galaxy based on its distance, so that we are are searching within a given physical distance. 

In [11]:
## The radial_velocity is in km/s, and this is just c*z, so
c=3.0e5 # km/s
redshifts=mytable['radial_velocity'].filled(0.)/c  # Filling masked values with zero
mytable['redshift']=redshifts
from astropy import units
physdist=0.05*units.Mpc # 50 kpc physical distance

## This needs scipy.  
from astropy.cosmology import Planck15
angDdist=Planck15.angular_diameter_distance(mytable['redshift'])
## angDdist is returned from the astropy.cosmology module as a Quantity object, 
##  i.e. a value and a unit.  Arctan is smart enough not to operate on quantities
##  that aren't unitless.  So angDdist.value to just get the value.
angDrad=numpy.arctan(physdist/angDdist)
angDdeg=angDrad*units.deg/units.rad
mytable['angDdeg']=angDdeg
mytable

ra,dec,radial_velocity,redshift,angDdeg
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,deg
float64,float64,int32,float64,float64
202.46823227,47.19814872,474,0.00158,0.007164315980361
202.49490508,47.2679204,558,0.00186,0.0060879425744679
202.54744757,46.66995898,2569,0.0085633333333333,0.0013332722231319


This time, rather than write the table to disk, we'll keep it in memory and give requests a "file-like" object using io.BytesIO():

In [12]:
## In memory only, use an IO stream. 
vot_obj=io.BytesIO()
print(mytable.columns)
apvot.writeto(apvot.from_table(mytable),vot_obj)
## (Reset the "file-like" object to the beginning.)
vot_obj.seek(0)
## 'uplt' is what we'll call it (for 'upload table') 
##   in the requests parameters below, or what you will:
files={'uplt':vot_obj}


<TableColumns names=('ra','dec','radial_velocity','redshift','angDdeg')>


This takes half a minute:

In [13]:
cc_params={
    'lang': 'ADQL', 
    'request': 'doQuery',
    'upload':'mytable,param:uplt'
    }
## This is your ADQL query, where "mytable" here has to 
##  match what you specified in the upload parameter above
cc_params["query"]="""
    SELECT cat.ra, cat.dec, cat.Radial_Velocity 
    FROM zcat cat, tap_upload.mytable mt 
    WHERE
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,mt.angDdeg))=1
    and cat.Radial_Velocity > 0
    ORDER by cat.ra"""
## The name you give here (tab1) matches what's in the cc_params
r = requests.post('https://heasarc.gsfc.nasa.gov/xamin/vo/tap/sync',data=cc_params,stream=True,files=files)
mytable=Table.read(io.BytesIO(r.content))
mytable

ra,dec,radial_velocity
float64,float64,int32
202.46823227,47.19814872,474
202.49490508,47.2679204,558
202.54744757,46.66995898,2569
