# Registry-powered and Table Searches


If NAVO develops astroquery.vo, usage could look like the following. This summarizes what the rest of this notebook covers in more detail.

RegTAP:  

    query_results=astroquery.vo.Registry.query( ... lots of options, this already exists in our github ...)
    heasarc_image_services=astroquery.vo.Registry.list_image_services(source='heasarc') 

TAP:

    # tap_url comes from Registry.query(keyword='caom', service_type='table', publisher='Space Telescope')
    CAOM_service = TapPlus(url=tap_url)
    job = CAOM_service.launch_job("""
        SELECT TOP 10 s_ra, s_dec, access_estsize, access_url FROM ivoa.Obscore
        WHERE CONTAINS(POINT('ICRS', 16.0, 40.0), s_region) = 1
        AND obs_collection = 'GALEX' AND dataproduct_type = 'image'
        """)
    CAOM_results = job.get_results()

In [1]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline  
import requests, io, astropy
from IPython.display import Image, display

## For handling ordinary astropy Tables
from astropy.table import Table, vstack

## For reading FITS files
import astropy.io.fits as apfits

## There are a number of relatively unimportant warnings that 
## show up, so for now, suppress them:
import warnings
warnings.filterwarnings("ignore")

## our stuff
import sys
# Use the NASA_NAVO/astroquery
from navo_utils.cone import Cone
from navo_utils.registry import Registry

The Registry query methods exist in an astroquery.vo.Registry() class and offer different levels of simplicity and power. For example, if you already know you want to search NED, you can get the related service URLs as follows. Note that you may get *more* results than you expect; a human reader can usually tell them apart easily.

In [2]:
results = Registry.query(source='ned', service_type='cone', debug=True)
print('Found {} results:'.format(len(results)))
print(results[:]['access_url'])
print(results[1]['ivoid'])
print(results.columns)

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url,res.reference_url,res_role.role_name as publisher,cap.cap_type as service_type
          from rr.capability as cap
            natural join rr.resource as res
            natural join rr.interface as intf 
		    natural join rr.res_role as res_role
             where cap.cap_type='conesearch' and cap.ivoid like '%ned%' and res_role.base_role = 'publisher'

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

Found 2 results:
                                              access_url                                              
------------------------------------------------------------------------------------------------------
                                                https://irsa.ipac.caltech.edu/SCS?table=shelacomb&amp;
http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&amp;of=xml_main&amp;
ivo://ned.ipac/bas

The Registry.query() method takes arguments which we can use to further filter the results (passed to internal function _build_adql):  

    service_type : "conesearch", "simpleimageaccess", "simplespectralaccess", or "tableaccess". May be shortened to "cone", "image", "spectr", or "table" (or "tap"), respectively.
    keyword      : any keyword contained in the ivoid, title, or description
    waveband     : waveband string. Multiple values may be comma-delimited, e.g. "optical, infrared"
    source       : any substring of the ivoid
    publisher    : the name of any publishing organization
    order_by     : the field to order the results by; currently you have to know the field names
                    ("waveband", "short_name", "ivoid", "res_description", "access_url", "reference_url", "publisher", "service_type")
    logic_string : any other string to append to the ADQL where clause; it should start with " and "

The results are returned by Registry.query() in an astropy table using the conversion function _astropy_table_from_votable_response(). 
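
To make the filter arguments concrete, here is a minimal sketch of how some of them might be assembled into the where clause seen in the debug output above. The names build_where_clause, quote_like, and SERVICE_TYPES are hypothetical and this is not the actual _build_adql implementation; only the column names and the shape of the clause are taken from the logged ADQL. The keyword, waveband, and order_by arguments are left out because their ADQL form is not shown in this notebook.

```python
# Hypothetical sketch: not the navo_utils internals, just the same clause shape.
SERVICE_TYPES = {
    "cone": "conesearch",
    "image": "simpleimageaccess",
    "spectr": "simplespectralaccess",
    "table": "tableaccess",
    "tap": "tableaccess",
}


def quote_like(value):
    """Wrap a value as an ADQL substring pattern, doubling any single quotes."""
    return "'%" + value.replace("'", "''") + "%'"


def build_where_clause(service_type=None, source=None, publisher=None, logic_string=""):
    """Assemble a where clause from Registry.query()-style filter arguments."""
    conditions = []
    if service_type:
        key = service_type.lower()
        # Accept either the short form or the full capability type.
        conditions.append("cap.cap_type='{}'".format(SERVICE_TYPES.get(key, key)))
    if source:
        conditions.append("cap.ivoid like " + quote_like(source))
    # The logged queries always restrict res_role to publishers.
    conditions.append("res_role.base_role = 'publisher'")
    if publisher:
        conditions.append("res_role.role_name like " + quote_like(publisher))
    # logic_string is appended verbatim, so it should start with " and ".
    return "where " + " and ".join(conditions) + logic_string


print(build_where_clause(service_type="cone", source="ned",
                         publisher="Extragalactic Database"))
```

Doubling single quotes in quote_like is the usual way to keep a user-supplied value from terminating an ADQL string literal early.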

The Registry.query_counts() method takes arguments that show which keyword values might help us narrow down a search, or which would give us too many results (these are passed to the internal function _build_counts_adql):

    field      : keyword field for which to see popular values; "waveband", "publisher", and "service_type" are currently supported.
    minimum    : minimum number of occurrences a keyword value must have to be listed (optional, defaults to 1)
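
As an illustration of how the minimum argument could feed into the counting query, the sketch below builds the ADQL for the 'publisher' field only. The function name build_publisher_counts_adql is hypothetical; the table and column names follow the rr.res_role query that the debug output of query_counts() reports, and the queries for "waveband" and "service_type" are not shown in this notebook, so they are not attempted here.

```python
def build_publisher_counts_adql(minimum=1):
    """Build a publisher-count query (hypothetical helper, publisher field only)."""
    minimum = int(minimum)
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    return (
        "select * from ("
        "select role_name as publisher, count(role_name) as count_publisher "
        "from rr.res_role where base_role = 'publisher' group by role_name"
        ") as count_table "
        "where count_publisher >= {} order by count_publisher desc".format(minimum)
    )


print(build_publisher_counts_adql(15))
```

The grouping happens in the inner select so that the outer where clause can filter on the aggregated count.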

In [9]:
results = Registry.query_counts('publisher', 15, debug=True)
print(results)

Registry:  sending query ADQL = select * from (select role_name as publisher, count(role_name) as count_publisher from rr.res_role where base_role = 'publisher'  group by role_name) as count_table where count_publisher >= 15 order by count_publisher desc

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

                         publisher                           count_publisher
------------------------------------------------------------ ---------------
                                                         CDS           17148
                                           NASA/GSFC HEASARC            1039
                          NASA/IPAC Infrared Science Archive             520
                                            The GAVO DC team             159
                   Space Telescope Science Institute Archive             101
      WFAU, Institute for Astronomy, University of Edinburgh              99
                                                     SVO CAB         

With a 'publisher' value to work from, we can narrow down the query:

In [4]:
results = Registry.query(source='ned', publisher='Extragalactic Database', service_type='cone', debug=True)
print('Found {} results:'.format(len(results)))
print(results[:]['access_url'])

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url,res.reference_url,res_role.role_name as publisher,cap.cap_type as service_type
          from rr.capability as cap
            natural join rr.resource as res
            natural join rr.interface as intf 
		    natural join rr.res_role as res_role
             where cap.cap_type='conesearch' and cap.ivoid like '%ned%' and res_role.base_role = 'publisher' and res_role.role_name like '%Extragalactic Database%'

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

Found 1 results:
                                              access_url                                              
------------------------------------------------------------------------------------------------------
http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&amp;of=xml_main&amp;



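Note that the access_url values in our results are XML-escaped (see the escaped ampersands above), as the registry resource standard expects, so we need to unescape the character entities before using them. This is entity decoding, not URL percent-decoding.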

In [5]:
from html import unescape

for result in results:
    print(unescape(result['access_url']))

http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&of=xml_main&
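
A small check, using only the standard library, of why html.unescape is the right tool here: the escaped ampersands are character entities, which percent-decoding with urllib.parse.unquote leaves untouched. The variable name raw is just for illustration; its value is the NED access_url returned above.

```python
from html import unescape
from urllib.parse import unquote

# The NED access_url exactly as the registry returns it.
raw = ("http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch"
       "?search_type=Near+Position+Search&amp;of=xml_main&amp;")

# Percent-decoding finds no %XX sequences, so the entities survive.
print(unquote(raw))

# Entity decoding turns each escaped ampersand back into a plain one.
print(unescape(raw))
```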


# 11. TAP

First, look up table services in the registry, go through the results, and find the one you want.
  

Then, assuming you know how to construct ADQL, you can query the service with the astroquery utility TapPlus. TapPlus was created as a library layer under the Gaia archive, but it works for all TAP services. Documentation: https://github.com/astropy/astroquery/blob/master/docs/utils/tap.rst


In general, one opens a connection to the service URL. Then, if one does not already know the database table information associated with the service, one can ask the service for it. Since this is a standard service using the known CAOM and ObsCore data models, we know this information from http://www.ivoa.net/documents/ObsCore/.

Then one can build and run the main query, either synchronously or asynchronously. We'll do a synchronous call. Geometry-based queries allow one to do the equivalent of a cone search or bounding-box footprint search without being limited to cone-search filtering or to a predefined set of returned columns.
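
For orientation, a synchronous TAP call is an HTTP request to the service's /sync endpoint carrying the ADQL in a QUERY parameter. The sketch below only builds the request and does not send it. The helper name build_sync_request is hypothetical, the base URL is the CAOM TAP service used in this notebook, and the FORMAT parameter name follows TAP 1.0 (newer services may prefer RESPONSEFORMAT), so treat the details as assumptions to check against the service.

```python
from urllib.parse import urlencode


def build_sync_request(tap_url, adql, response_format="votable"):
    """Return the /sync endpoint and parameters for a synchronous TAP query."""
    endpoint = tap_url.rstrip("/") + "/sync"
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": response_format,
        # Collapse newlines and indentation so the query is a single line.
        "QUERY": " ".join(adql.split()),
    }
    return endpoint, params


endpoint, params = build_sync_request(
    "http://vao.stsci.edu/CAOMTAP/TapService.aspx",
    """
    SELECT * FROM ivoa.Obscore
    WHERE CONTAINS(POINT('ICRS', 16.0, 40.0), s_region) = 1
    """,
)
print(endpoint + "?" + urlencode(params))
```

TapPlus handles this request building, job handling, and result parsing for us, which is why the cells below use it rather than raw HTTP.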


In [15]:
tap_services_CAOM=Registry.query(keyword='caom',service_type='table', publisher='Space Telescope')
print('Found {} results:'.format(len(tap_services_CAOM)))
tap_url = unescape(tap_services_CAOM[0]['access_url'])
print(tap_url) 

from astroquery.utils.tap.core import TapPlus
CAOM_service = TapPlus(url=tap_url)


Found 1 results:
http://vao.stsci.edu/CAOMTAP/TapService.aspx
Created TAP+ (v1.0.1) - Connection:
	Host: vao.stsci.edu
	Use HTTPS: False
	Port: 80
	SSL Port: 443


In [11]:
job = CAOM_service.launch_job("""
    SELECT * FROM ivoa.Obscore 
    WHERE CONTAINS(POINT('ICRS', 16.0, 40.0),s_region)=1
  """)
CAOM_results = job.get_results()
print(CAOM_results)

Launched query: '
    SELECT  TOP 2000 * FROM ivoa.Obscore 
    WHERE CONTAINS(POINT('ICRS', 16.0, 40.0),s_region)=1
  '
Retrieving sync. results...
Query finished.
dataproduct_type calib_level obs_collection ... facility_name instrument_name
---------------- ----------- -------------- ... ------------- ---------------
           image           2          GALEX ...       CALTECH           GALEX
           image           2          GALEX ...       CALTECH           GALEX
           image           2          GALEX ...       CALTECH           GALEX
           image           3            PS1 ...           IfA            GPC1
           image           3            PS1 ...           IfA            GPC1
           image           2          GALEX ...       CALTECH           GALEX
           image           2          GALEX ...       CALTECH           GALEX
           image           2          GALEX ...       CALTECH           GALEX
           image           2          GALEX ...       C

Since this is TAP and not just a cone search, we can narrow down our query to, in this case, GALEX images. We can also request only the columns we want. To get just the position, estimated file size, and link to the data for each result, the ADQL could be written as follows:

In [7]:
job = CAOM_service.launch_job("""
    SELECT top 10 s_ra, s_dec, access_estsize, access_url FROM ivoa.Obscore 
    WHERE CONTAINS(POINT('ICRS', 16.0, 40.0),s_region)=1
    AND obs_collection = "GALEX" AND dataproduct_type = 'image'
  """)
CAOM_results = job.get_results()
print(CAOM_results[1])

Launched query: '
    SELECT top 10 s_ra, s_dec, access_estsize, access_url FROM ivoa.Obscore 
    WHERE CONTAINS(POINT('ICRS', 16.0, 40.0),s_region)=1
    AND obs_collection = "GALEX" AND dataproduct_type = 'image'
  '
Retrieving sync. results...
Query finished.
      s_ra             s_dec        access_estsize                                                                                         access_url                                                                                        
---------------- ------------------ -------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15.8587956991105 40.445530075031598        9841662 https://mast.stsci.edu/portal/Download/file?uri=http://galex.stsci.edu/data/GR6/pipe/01-vsn/22075-GI2_033048_M31_E_Axis_4/d/01-main/0001-img/07-try/GI2_033048_M31_E_Axis_4-fd-int.fits.gz
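
When the position, collection, or column list varies, it can be convenient to generate the ADQL rather than edit it by hand. The helper below is a hypothetical sketch (build_obscore_query is not part of astroquery or navo_utils) that reproduces the shape of the queries above; the ObsCore column names are the ones used in this notebook. Note that ADQL string literals take single quotes, so the helper quotes values that way.

```python
def build_obscore_query(ra, dec, columns=None, collection=None,
                        product_type=None, top=None):
    """Build an ADQL query for ObsCore rows whose s_region contains a point."""

    def literal(value):
        # ADQL string literal: single quotes, with embedded quotes doubled.
        return "'" + str(value).replace("'", "''") + "'"

    select = "SELECT "
    if top is not None:
        select += "TOP {} ".format(int(top))
    select += ", ".join(columns) if columns else "*"

    where = ["CONTAINS(POINT('ICRS', {!r}, {!r}), s_region) = 1".format(
        float(ra), float(dec))]
    if collection is not None:
        where.append("obs_collection = " + literal(collection))
    if product_type is not None:
        where.append("dataproduct_type = " + literal(product_type))

    return select + " FROM ivoa.Obscore WHERE " + " AND ".join(where)


query = build_obscore_query(
    16.0, 40.0,
    columns=["s_ra", "s_dec", "access_estsize", "access_url"],
    collection="GALEX", product_type="image", top=10,
)
print(query)
```

The resulting string can be passed to CAOM_service.launch_job() in place of the hand-written query.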
