# Registry-powered Searches

Archives register their data sets so that programs can discover them: http://vao.stsci.edu/keyword-search/

The link above opens a graphical interface, so you can explore it interactively.

There is also an application programming interface (API) to this service, so that programs can send queries and retrieve search results automatically. The standard search API is a Table Access Protocol (TAP) service built on a relational data model for registries, which is a very low-level interface. Here we provide several simplified tools for common, useful registry queries.
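
To give a sense of what "low-level" means here, the following is a minimal sketch of how a raw synchronous TAP request could be assembled with only the standard library. The helper name `build_tap_request` is hypothetical; the endpoint is the one echoed in the query output later in this notebook, and `REQUEST`, `LANG`, and `QUERY` are the standard TAP parameter names. The example query is only an illustration against the `rr.resource` table. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Endpoint as printed in the "Queried:" lines of this notebook's output.
TAP_SYNC_URL = "http://vao.stsci.edu/RegTAP/TapService.aspx/sync"


def build_tap_request(adql, url=TAP_SYNC_URL):
    """Return (url, form-encoded body) for a synchronous TAP query.

    Hypothetical helper for illustration; it only builds the request
    and does not contact the service.
    """
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return url, urlencode(params)


url, body = build_tap_request("select top 5 ivoid, short_name from rr.resource")
print(url)
print(body)
```

The resulting body could then be POSTed to the URL, for example with `urllib.request.urlopen(url, data=body.encode())`, and the response would come back as a VOTable document that still has to be parsed.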


In [1]:
from astroquery.vo import Registry


## For handling ordinary astropy Tables
from astropy.table import Table, vstack

## There are a number of relatively unimportant warnings that 
## show up, so for now, suppress them:
import warnings
warnings.filterwarnings("ignore")

ModuleNotFoundError: No module named 'astroquery.vo'

Registry query methods exist in an astroquery.vo.Registry() class with different levels of simplicity and power. For example, if you already know you want to search NED, you can get related service URLs as follows. Note that you may get *more* results than you expect, some of which should be easily differentiated by inspection.

In [2]:
results = Registry.query(source='ned', service_type='cone', verbose=True)
print('Found {} results:'.format(len(results)))
print(results[:]['access_url'])
print(results[1]['ivoid'])
print(results.columns)

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url,res.reference_url,res_role.role_name as publisher,cap.cap_type as service_type
          from rr.capability as cap
            natural join rr.resource as res
            natural join rr.interface as intf
		    natural join rr.res_role as res_role
             where cap.cap_type='conesearch' and cap.ivoid like '%ned%' and res_role.base_role = 'publisher'

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

Found 2 results:
                                              access_url                                              
------------------------------------------------------------------------------------------------------
                                                https://irsa.ipac.caltech.edu/SCS?table=shelacomb&amp;
http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&amp;of=xml_main&amp;
ivo://ned.ipac/basi

The Registry.query() method takes arguments which we can use to further filter the results (passed to internal function _build_adql):  

    service_type : "conesearch", "simpleimageaccess", "simplespectralaccess", or "tableaccess". May be shortened to "cone", "image", "spectr", or "table"/"tap", respectively.
    keyword      : any keyword contained in ivoid, title, or description
    waveband     : waveband string. Multiple options may be comma-delimited, e.g. "optical, infrared"
    source       : any substring in ivoid
    publisher    : the name of any publishing organization
    order_by     : the field to sort the results by; currently you need to know the field names
                    ("waveband", "short_name", "ivoid", "res_description", "access_url", "reference_url", "publisher", "service_type")
    logic_string : any other string to add to the ADQL where clause; it should start with " and "

The results are returned by Registry.query() in an astropy table using the conversion function _astropy_table_from_votable_response(). 
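
As a rough illustration of how such arguments could turn into the where clause seen in the verbose output, here is a minimal sketch. It is not the actual `_build_adql` implementation: the function `build_where_sketch`, the helper `adql_quote`, and the `SERVICE_TYPES` mapping are hypothetical names, and only the `service_type`, `source`, and `publisher` arguments are covered because those are the ones whose ADQL appears in this notebook.

```python
# Short forms listed in the argument description, mapped to full names.
SERVICE_TYPES = {
    "cone": "conesearch",
    "image": "simpleimageaccess",
    "spectr": "simplespectralaccess",
    "table": "tableaccess",
    "tap": "tableaccess",
}


def adql_quote(value):
    """Escape single quotes by doubling them, as in SQL string literals."""
    return str(value).replace("'", "''")


def build_where_sketch(service_type=None, source=None, publisher=None):
    """Hypothetical stand-in for the where-clause part of the query."""
    conditions = []
    if service_type:
        # Expand a short form; full names pass through unchanged.
        cap_type = SERVICE_TYPES.get(service_type, service_type)
        conditions.append(f"cap.cap_type='{adql_quote(cap_type)}'")
    if source:
        conditions.append(f"cap.ivoid like '%{adql_quote(source)}%'")
    # This condition appears in every query shown in the notebook output.
    conditions.append("res_role.base_role = 'publisher'")
    if publisher:
        conditions.append(f"res_role.role_name like '%{adql_quote(publisher)}%'")
    return " and ".join(conditions)


where = build_where_sketch(service_type="cone", source="ned")
print(where)
```

Because `source` and `publisher` become `like '%...%'` patterns, they match substrings, which is one reason a query can return more rows than expected.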



## Allowed Waveband Terms and Wavelength Ranges in Ångströms (Å)
* gamma-ray: less than 0.1
* X-ray: 0.1-100
* EUV: 100-1,000
* UV: 1,000-3,000
* Optical: 3,000-10,000
* Infrared: 10,000-1,000,000
* Millimeter: 10^6-10^8
* Radio: over 10^8
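
To pick a waveband term for a given wavelength, the ranges above can be encoded in a small lookup. This is a sketch under two assumptions: that the "(A)" in the heading denotes Ångströms, and that each range includes its lower bound and excludes its upper bound (the list does not say how boundaries are treated). The names `WAVEBANDS` and `waveband_for` are hypothetical.

```python
import math

# (term, lower bound, upper bound) copied from the list above.
# Assumed unit: Ångströms. Assumed convention: lower <= value < upper.
WAVEBANDS = [
    ("gamma-ray", 0.0, 0.1),
    ("X-ray", 0.1, 100.0),
    ("EUV", 100.0, 1_000.0),
    ("UV", 1_000.0, 3_000.0),
    ("Optical", 3_000.0, 10_000.0),
    ("Infrared", 10_000.0, 1e6),
    ("Millimeter", 1e6, 1e8),
    ("Radio", 1e8, math.inf),
]


def waveband_for(wavelength):
    """Return the waveband term whose range contains the wavelength."""
    for term, low, high in WAVEBANDS:
        if low <= wavelength < high:
            return term
    raise ValueError(f"wavelength must be non-negative, got {wavelength!r}")


print(waveband_for(5500))    # a visible-light wavelength
print(waveband_for(2.1e9))   # the 21 cm line expressed in Ångströms
```

The returned term could then be passed as the `waveband` argument of `Registry.query()`, though the exact capitalization the registry expects is not confirmed here.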






The Registry.query_counts() method takes arguments which we can use to see which keyword values might help us narrow down our search, or possibly give us too MANY results (these are passed to internal function _build_counts_adql):

    field      : keyword field for which to see popular values: "waveband", "publisher", "service_type" currently supported.
    minimum    : A minimum count of occurrences for the keyword value to use as a cutoff (optional, defaults to 1)
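
For the `"publisher"` field, the ADQL echoed in this notebook's verbose output has a simple shape: an inner grouped count wrapped in an outer filter on the minimum. The sketch below reproduces that shape; it is not the actual `_build_counts_adql` code, the function name `build_publisher_counts_adql` is hypothetical, and the forms used for `"waveband"` and `"service_type"` are not shown in the source, so they are left out.

```python
def build_publisher_counts_adql(minimum=1):
    """Hypothetical sketch of the counts query for the 'publisher' field."""
    minimum = int(minimum)  # refuse anything that is not a whole number
    return (
        "select * from ("
        "select role_name as publisher, count(role_name) as count_publisher "
        "from rr.res_role where base_role = 'publisher' group by role_name"
        ") as count_table "
        f"where count_publisher >= {minimum} order by count_publisher desc"
    )


adql = build_publisher_counts_adql(15)
print(adql)
```

Raising `minimum` trims the long tail of publishers with only a handful of registered resources.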

In [3]:
results = Registry.query_counts('publisher', 15, verbose=True)
print(results)

Registry:  sending query ADQL = select * from (select role_name as publisher, count(role_name) as count_publisher from rr.res_role where base_role = 'publisher'  group by role_name) as count_table where count_publisher >= 15 order by count_publisher desc

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

                         publisher                           count_publisher
------------------------------------------------------------ ---------------
                                                         CDS           17286
                                           NASA/GSFC HEASARC            1041
                          NASA/IPAC Infrared Science Archive             521
                                            The GAVO DC team             159
                   Space Telescope Science Institute Archive             101
      WFAU, Institute for Astronomy, University of Edinburgh              99
                                                     SVO CAB         

With a 'publisher' value to work from, we can narrow the query down:

In [4]:
results = Registry.query(source='ned', publisher='Extragalactic Database', service_type='cone', verbose=True)
print('Found {} results:'.format(len(results)))
print(results[:]['access_url'])

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url,res.reference_url,res_role.role_name as publisher,cap.cap_type as service_type
          from rr.capability as cap
            natural join rr.resource as res
            natural join rr.interface as intf
		    natural join rr.res_role as res_role
             where cap.cap_type='conesearch' and cap.ivoid like '%ned%' and res_role.base_role = 'publisher' and res_role.role_name like '%Extragalactic Database%'

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

Found 1 results:
                                              access_url                                              
------------------------------------------------------------------------------------------------------
http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&amp;of=xml_main&amp;



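Note that we need to decode the access_url values in our results, because the registry resource standard expects them to be encoded. The encoding seen here is XML/HTML entity escaping (`&amp;` in place of `&`), so `html.unescape` is the appropriate tool rather than URL percent-decoding.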

In [5]:
from html import unescape

for result in results:
    print(unescape(result['access_url']))

http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&of=xml_main&
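
The difference between entity unescaping and URL percent-decoding can be checked directly on the NED access URL from the results above. In this small sketch, `raw` is simply that URL copied as a string literal; `urllib.parse.unquote` leaves it unchanged because it contains no percent-escapes, while `html.unescape` restores the ampersands.

```python
from html import unescape
from urllib.parse import unquote

# access_url exactly as returned by the registry query above.
raw = (
    "http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch"
    "?search_type=Near+Position+Search&amp;of=xml_main&amp;"
)

percent_decoded = unquote(raw)  # URL percent-decoding: nothing to decode here
decoded = unescape(raw)         # entity unescaping: &amp; becomes &

print(percent_decoded == raw)
print(decoded)
```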


# Appendix: Documentation on the Standards

### Table Access Protocol 
* IVOA standard for RESTful web service access to tabular data
* http://www.ivoa.net/documents/TAP/


### Registry Relational Schema
* IVOA standard for modeling registry metadata for querying with TAP
* http://www.ivoa.net/documents/RegTAP/


### Astronomical Data Query Language (2.0)
* IVOA standard for querying astronomical data in tabular format, with geometric search support
* http://www.ivoa.net/documents/latest/ADQL.html

