# Brainstorming for astroquery.vo needs


If NAVO develops astroquery.vo, we could use things like the following. This is a summary of what is described in more detail below. 

RegTAP:  

    query_results=Registry.query( ... lots of options, this already exists in our github ...)
    heasarc_image_services=Registry.list_image_services(source='heasarc') 

SCS:

    ned_services=Registry.list_cone_services(source='ned')
    ned_results=Cone.query(ras,decs,radius,ned_services)
    
SIA:  
    
    images_info=Image.query(heasarc_image_services) 
    plt.imshow( Image.get( images_info[0] ) )
    Image.get( images_info[0], filename='image.fits')

SSA:

    Same as SIA basically. 

TAP(?):

    tap_services_2mass=Registry.query(keyword='2mass',service_type='table')
    tap_results=Tap.query(
        source=tap_services_2mass[32],
        logic_string="CONTAINS(POINT('J2000',ra,dec),CIRCLE('J2000',9.90704,8.96507,0.001))"
        )

Below, we go through Vandana's presentation to make a list of functions we could use, with one basic version of a Cone class defined to play with.

In [1]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline  
import requests, io, astropy
from IPython.display import Image, display

## For handling ordinary astropy Tables
from astropy.table import Table, vstack

## For reading FITS files
import astropy.io.fits as apfits

## There are a number of relatively unimportant warnings that 
## show up, so fix later but for now, suppress them:
import warnings
warnings.filterwarnings("ignore")

## our stuff
# Use the NASA_NAVO/astroquery
from navo_utils.cone import Cone
from navo_utils.registry import Registry

# For debugging in WingIDE:

#import wingdbstub
#wingdbstub.Ensure()

Registry queries are already coded by TomD and TJ in the astroquery.vo.Registry() class. So, for example, if you already know which archive you want to search (HEASARC here), you can get its service URLs as follows. Unfortunately, *with the current implementation, a search for NED (see below) returns two results, one of which isn't NED but just has "ned" in its ivoid ("shela_combined", since "combined" contains "ned"). Not sure what to do about that.* We could hard-wire things like "ned", "heasarc", etc., but that's not ideal. 

In [2]:
results = Registry.query(source='heasarc', service_type='cone',debug=True)
print('Found {} results:'.format(len(results)))
print(results[:]['access_url'])
print(results[1]['ivoid'])
print(results.columns)

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url, res.reference_url
          from rr.capability as cap
          natural join rr.resource as res
          natural join rr.interface as intf
           where cap.cap_type='conesearch' and cap.ivoid like '%heasarc%'

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

Found 917 results:
                                  access_url                                  
------------------------------------------------------------------------------
https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=ngc2362cxo&amp;
https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=ngc2403cx2&amp;
https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=fer2fusrid&amp;
https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=maghmxbcat&amp;
   https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=ngc2000&amp;
https://heasarc.gs

The Registry.query() method takes arguments (passed to internal function _build_adql):  

    service_type   : "image", "cone", or "spectr"
    keyword        : any keyword contained in ivoid, title, or description
    waveband       : waveband string
    source         : any substring in ivoid
    order_by       : what field to order it by, but then you have to know the names, currently
                      ("waveband","short_name","ivoid","res_description", "access_url", "reference_url")
    logic_string   : any other string you want to add to the ADQL where clause, should start with " and "
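
Judging from the debug ADQL printed above, `_build_adql` turns these arguments into `like` clauses joined by `and`. The sketch below shows one way that assembly could look; `build_where`, `_quote`, and `CAP_TYPES` are hypothetical names, and of the cap_type values only `'conesearch'` is confirmed by the printed query (the image and spectral ones are assumptions).

```python
# Hypothetical mapping from service_type to RegTAP cap_type. Only 'conesearch'
# is confirmed by the ADQL printed in this notebook; the others are assumptions.
CAP_TYPES = {
    "cone": "conesearch",
    "image": "simpleimageaccess",
    "spectr": "simplespectralaccess",
}


def _quote(value):
    """Double any single quotes so user input cannot break out of an ADQL string."""
    return str(value).replace("'", "''")


def build_where(service_type, source=None, keyword=None, waveband=None, logic_string=""):
    """Assemble an ADQL WHERE clause from Registry.query()-style arguments (sketch)."""
    clauses = [f"cap.cap_type='{CAP_TYPES[service_type]}'"]
    if source:
        clauses.append(f"cap.ivoid like '%{_quote(source)}%'")
    if waveband:
        clauses.append(f"res.waveband like '%{_quote(waveband)}%'")
    if keyword:
        kw = _quote(keyword)
        clauses.append(
            f"(res.res_description like '%{kw}%' or "
            f"res.res_title like '%{kw}%' or "
            f"cap.ivoid like '%{kw}%')"
        )
    # logic_string is appended verbatim, so it should start with " and ".
    return "where " + " and ".join(clauses) + logic_string
```

Doubling single quotes is the standard ADQL/SQL way to put a literal `'` inside a string, so a search term such as "o'neil" cannot break the query.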

The results are already in an astropy table from Tom's _astropy_table_from_votable_response(). 

**But note that the URLs are HTML-escaped (`&amp;` instead of `&`) and should not be by the time we get them back. How to fix?**

In [3]:
import html
print(results[0]['access_url'])
print(html.unescape(results[0]['access_url']))

https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=ngc2362cxo&amp;
https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=ngc2362cxo&
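
To fix every row rather than one value, the whole column can be unescaped at once. A minimal sketch, assuming a plain list (or table column) of strings; `unescape_urls` is a hypothetical helper, not part of navo_utils:

```python
import html


def unescape_urls(urls):
    """Return a new list of URLs with HTML entities such as '&amp;' decoded."""
    # str() so that numpy string scalars from a table column are handled too.
    return [html.unescape(str(u)) for u in urls]
```

With an astropy Table, the cleaned list could be put back with `results.replace_column('access_url', unescape_urls(results['access_url']))`. Doing this once inside `_astropy_table_from_votable_response()` would mean every caller gets clean URLs. Note that `html.unescape` decodes one level only, so a doubly escaped `&amp;amp;` would become `&amp;`.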


## 3. Workshop section on data discovery using NED's Cone search

Instead of searching NED ‘manually’, we could have a generic cone search that takes a list of RAs, Decs, and radii (or just one of each) and optionally lets you specify that you want ‘ned’ results or results from some other ivoid substring. It queries RegTAP to find out which cone searches are available that match what you asked for. (Is there a way to get the NED URL dynamically from RegTAP without the above ambiguity? A special case if the ivoid requested is "ned"?) So the user would call:

    cone_results = astroquery.Cone.query(ras, decs, radii, [source='some_ivoid_string_eg_ned', waveband=etc.]) 

where ras, decs, and radii can be floats, strings, or arrays of either. If a single source (i.e., ivoid) matches, you get back a table of objects; if several sources match (which will likely have different columns), perhaps you get back a list of tables, one per matching source? Since every table will have different columns, we also need to return some kind of metadata. Should it be a separate object, or attached to each result column's metadata? Or should we standardize the tables as discussed below?

Note that with kwargs, you can pass through any parameters to the Registry.query() call.
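
Underneath, each cone query is a plain HTTP GET: the Simple Cone Search standard takes RA, DEC, and SR (search radius), all in decimal degrees, appended to the service's access_url. A sketch assuming decimal-degree inputs; `scs_url` is a hypothetical name. Because registry access URLs such as those above already end in `&`, it only adds a separator when one is missing:

```python
from urllib.parse import urlencode


def scs_url(access_url, ra, dec, radius):
    """Build a Simple Cone Search request URL; ra, dec, radius in decimal degrees."""
    # Registry access URLs often already end in '?' or '&'
    # (e.g. '...coneGet.pl?table=swiftgrb&'), so only add a separator if needed.
    if access_url.endswith(("?", "&")):
        sep = ""
    elif "?" in access_url:
        sep = "&"
    else:
        sep = "?"
    return access_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": radius})
```

For example, `scs_url(cone_services[0]['access_url'], 185.47873, 4.47365, 0.03)` would give the URL to fetch for the first position, once the access URL has been unescaped.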

So, like the Registry, we need a Cone class that works as follows:

In [4]:
#  Single arguments:  should take floats or strings, converts floats to string for the query.
#  For now, make them all arrays until we sort the above issue
cone_services=Registry.query(service_type='cone',source='ned')
print(cone_services[:]['short_name'])
print(cone_services[0]['access_url'])
print(type(cone_services[-1]) is astropy.table.row.Row)


  short_name  
--------------
SHELA_Combined
  NED(sources)
https://irsa.ipac.caltech.edu/SCS?table=shelacomb&amp;
True
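
One possible way around the "ned" ambiguity is to match the source string against whole tokens of the ivoid instead of any substring, so that "combined" no longer matches "ned". A sketch of a client-side filter; `ivoid_matches` is hypothetical, and the ivoids in any example are made up for illustration, not checked against the registry:

```python
import re


def ivoid_matches(ivoid, source):
    """True if `source` is a whole token of the ivoid, not just a substring.

    Tokens are the runs of letters and digits, so 'ivo://irsa.ipac/shela_combined'
    splits into ivo, irsa, ipac, shela, combined.
    """
    tokens = re.split(r"[^0-9a-z]+", ivoid.lower())
    return source.lower() in tokens
```

This only helps if the archive's name appears as its own token in its ivoids, which would need checking for NED. The results could then be filtered with a boolean NumPy array, e.g. `cone_services[np.array([ivoid_matches(v, 'ned') for v in cone_services['ivoid']])]`.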


In [5]:
x=Table([{'access_url':cone_services[0]['access_url']}])
print(type(x))
print(x)

<class 'astropy.table.table.Table'>
                      access_url                      
------------------------------------------------------
https://irsa.ipac.caltech.edu/SCS?table=shelacomb&amp;


We would like the most generic way to pass coordinates to SkyCoord's constructor. See [parse_coords_examples.ipynb](parse_coords_examples.ipynb)
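
Until the SkyCoord-based parsing is settled, a minimal sketch of the argument handling described above (floats or strings, a single position or a list, one radius or one per position) might look like this. It only handles decimal degrees; sexagesimal strings would still need SkyCoord. `normalize_positions` is a hypothetical name:

```python
from numbers import Real


def normalize_positions(coords, radius):
    """Return a list of (ra, dec, radius) float triples in decimal degrees.

    coords may be one (ra, dec) pair or a list of pairs; each value may be a
    float or a numeric string. radius may be a single value (applied to every
    position) or a list with one value per position.
    """
    # A single pair looks like [ra, dec] whose first element is a scalar.
    if len(coords) == 2 and isinstance(coords[0], (Real, str)):
        coords = [coords]
    pairs = [(float(ra), float(dec)) for ra, dec in coords]
    if isinstance(radius, (Real, str)):
        radii = [float(radius)] * len(pairs)
    else:
        radii = [float(r) for r in radius]
        if len(radii) != len(pairs):
            raise ValueError("need one radius per position")
    return [(ra, dec, r) for (ra, dec), r in zip(pairs, radii)]
```

With the `coords` and `radius` used below, this yields one (ra, dec, 0.03) triple per position.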

(On 2018-04-12, the NED example below was not working: even though the URL has of=xml_main, the service was returning HTML. No idea what's going on with that service, but they seem to be changing it. It worked again on the 13th.)

In [6]:
# Fudge the URL, since it's currently incorrect in the registry (!)
coords=[[185.47873,4.47365],[35.323,6.934]]
radius=0.03

# The registry entry is broken?
#results=Cone.query(coords,radius,cone_services[0])
# http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&amp;
# http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&amp;of=xml_main&amp;
# Hard-wire the URL it ought to be with the 'of=xml_main&' 
results=Cone.query(
    coords,radius,
    "http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&of=xml_main&")
#  Note that the meta data isn't getting correctly merged. TO BE FIXED.
#print(ned_results[0].meta['url'])
#print(ned_results[0].meta["xml_raw"])

Found 1 services to query.
    Querying service http://ned.ipac.caltech.edu/cgi-bin/NEDobjsearch?search_type=Near+Position+Search&of=xml_main&
    Got 494 results for source number 0
    Got 7 results for source number 1


In [7]:
#print(results[0][0].meta)

Or, if you don't know which source you want but want to do a cone search on all catalogs matching some search term such as a waveband:

(Note that, for safety, by default the Cone query will not run if the Registry query returns more than 10 services. If you're sure, you can raise that limit with max_services=N.) 

In [8]:
#gamma_services=Registry.query(waveband='gamma',service_type='cone',debug=True,keyword='swift',source='heasarc')
#results=Cone.query(coords,0.01,gamma_services)
## is equivalent to
results=Cone.query(coords,0.01,waveband='gamma',debug=True,keyword='swift',source='heasarc')
print("Got results from {} different services.".format(len(results)))

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url, res.reference_url
          from rr.capability as cap
          natural join rr.resource as res
          natural join rr.interface as intf
           where cap.cap_type='conesearch' and cap.ivoid like '%heasarc%' and res.waveband like '%gamma%' and 
             (res.res_description like '%swift%' or
            res.res_title like '%swift%' or
            cap.ivoid like '%swift%') 
            

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync

Found 8 services to query.
    Querying service https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=swiftbalog&
    Got 42 results for source number 0
    (Got no results for source number 0)
    Querying service https://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=swiftgrb&
    (Got no results for source number 1)
    (Got no results for source number 1)
    Querying service https

In [16]:
## Look at results. Queries with no results should have empty tables. 
##  But there should be an entry in the list of lists regardless. 
print("Resulting list has {} elements (i.e., from that many services queried).".format(len(results)))
for i,r in enumerate(results):
    print("    Entry {} has {} elements (i.e., from that many sources queried).".format(i,len(r)))
    for j,t in enumerate(r):
        print("        Query of service {} for object {} returned {} rows (i.e., from that many results).".format(i,j,len(t)))
    

Resulting list has 8 elements (i.e., from that many services queried).
    Entry 0 has 2 elements (i.e., from that many sources queried).
        Query of service 0 for object 0 returned 42 rows (i.e., from that many results).
        Query of service 0 for object 1 returned 0 rows (i.e., from that many results).
    Entry 1 has 2 elements (i.e., from that many sources queried).
        Query of service 1 for object 0 returned 0 rows (i.e., from that many results).
        Query of service 1 for object 1 returned 0 rows (i.e., from that many results).
    Entry 2 has 2 elements (i.e., from that many sources queried).
        Query of service 2 for object 0 returned 0 rows (i.e., from that many results).
        Query of service 2 for object 1 returned 0 rows (i.e., from that many results).
    Entry 3 has 2 elements (i.e., from that many sources queried).
        Query of service 3 for object 0 returned 18 rows (i.e., from that many results).
        Query of service 3 for object 1 ret
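
The list-of-lists structure above (one entry per service, each holding one result per queried position, empty when nothing is found), together with passing extra keywords straight through to the registry, can be sketched as follows. `cone_query`, `query_one`, and `registry_query` are hypothetical stand-ins for Cone.query, a single HTTP cone request, and Registry.query:

```python
def cone_query(positions, query_one, registry_query, services=None, **registry_kwargs):
    """Fan a cone search out over services and positions (sketch).

    positions: list of (ra, dec, radius) triples.
    Returns results[i][j]: whatever service i returned for position j, or an
    empty list when the query found nothing, so every entry is always present.
    """
    if services is None:
        # Extra keywords (waveband=, keyword=, source=, ...) go to the registry.
        services = registry_query(service_type="cone", **registry_kwargs)
    results = []
    for url in services:
        per_service = []
        for ra, dec, radius in positions:
            rows = query_one(url, ra, dec, radius)
            per_service.append(rows if rows is not None else [])
        results.append(per_service)
    return results
```

Passing the two callables in keeps the fan-out logic testable without network access.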

In [9]:
## Test if you get too many:
results=Cone.query(coords,0.01,waveband='gamma',debug=True,keyword='swift')

Registry:  sending query ADQL = 
          select res.waveband,res.short_name,cap.ivoid,res.res_description,
          intf.access_url, res.reference_url
          from rr.capability as cap
          natural join rr.resource as res
          natural join rr.interface as intf
           where cap.cap_type='conesearch' and res.waveband like '%gamma%' and 
             (res.res_description like '%swift%' or
            res.res_title like '%swift%' or
            cap.ivoid like '%swift%') 
            

Queried: http://vao.stsci.edu/RegTAP/TapService.aspx/sync



AssertionError: ERROR: You're asking to query more than 69 services; max_services is set to 10. If you really want to do more, then set the max_services parameter to a larger number.
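
A sketch of the kind of guard behind this error, assuming it simply compares the number of matched services with max_services; `check_service_count` is a hypothetical name. It raises ValueError rather than using assert, since assert statements are skipped when Python runs with -O:

```python
def check_service_count(services, max_services=10):
    """Refuse to fan a query out over more than max_services services (sketch)."""
    n = len(services)
    if n > max_services:
        raise ValueError(
            f"Registry query matched {n} services; max_services is set to "
            f"{max_services}. If you really want to query them all, pass a "
            "larger max_services."
        )
    return n
```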

In [None]:
#print(results)

*Note: _astropy_table_from_votable_response() should then be generic, not just in the Registry class.*

But these come with different columns:

In [None]:
print(results[0][0].columns)
print(results[-1][-1].columns)


There's only a UCD in some columns, and it depends on the service. 

**So to merge into standard tables, perhaps go through each table looking for UCDs or UTYPEs and rename every column that has one to a standard name for it. Then do the merge.** If you use the default 'outer' join, you'll end up with many columns that are filled only for rows from the services that provide them and empty for all other rows. Give the user the option to do an 'inner' join, which keeps only the columns common to all results, probably only the ones with the UCDs. 
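
A minimal sketch of the rename-then-merge idea, using plain lists of dicts so it runs without any service. The Simple Cone Search 1.03 standard requires one column each with the UCD1 values ID_MAIN, POS_EQ_RA_MAIN, and POS_EQ_DEC_MAIN; the UCD1+ forms are included as well. `STANDARD_NAMES`, `standardize`, and `merge` are hypothetical names:

```python
# Hypothetical mapping from UCD (UCD1 and UCD1+ forms) to a standard column name.
STANDARD_NAMES = {
    "ID_MAIN": "id", "meta.id;meta.main": "id",
    "POS_EQ_RA_MAIN": "ra", "pos.eq.ra;meta.main": "ra",
    "POS_EQ_DEC_MAIN": "dec", "pos.eq.dec;meta.main": "dec",
}


def standardize(rows, ucds):
    """Rename columns that carry a known UCD. rows: list of dicts; ucds: {colname: ucd}."""
    rename = {c: STANDARD_NAMES[u] for c, u in ucds.items() if u in STANDARD_NAMES}
    return [{rename.get(k, k): v for k, v in row.items()} for row in rows]


def merge(tables, join="outer"):
    """Stack standardized tables.

    'outer' keeps every column (missing values become None);
    'inner' keeps only the columns present in every non-empty table.
    """
    colsets = [set(t[0]) for t in tables if t]
    if not colsets:
        return []
    keep = set.intersection(*colsets) if join == "inner" else set.union(*colsets)
    return [{c: row.get(c) for c in sorted(keep)} for t in tables for row in t]
```

With astropy Tables the same steps map onto `Table.rename_column` followed by `vstack(tables, join_type='inner')` (or `'outer'`). If a table was built with astropy.io.votable, the UCD is usually in `col.meta['ucd']`, though that depends on how `_astropy_table_from_votable_response()` builds it.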


I started to define a function for the second part of the workshop notebook cell that got the passbands from NED. This is very NED-specific. Is there any way to generalize it?

    ned_info = astroquery.get_ned_info( ra, dec, radii )

This calls the cone search and passes the ACREF for each match back to NED to get the info. But ACREF isn't a required column in cone search results; only the ID, RA, and Dec are required. So I don't think this can be generalized.


## 4. Listing image (SIA) services

    sia_services = Registry.list_sia_services( [source='ivoid_string_eg_heasarc'] , [keyword='allwise'], [waveband='whatever'] ... ) 

This one can easily be generalized so you can get images from any service (or a chosen one) using the same options as Registry.query(), i.e., waveband, keyword, etc. 

It returns a table of information, including the ‘access_url’, which you can then plug into another generic function.




In [None]:
# Thin wrappers around Registry.query(); any other keywords
# (source, keyword, waveband, ...) are passed straight through.
def list_image_services(**kwargs):
    return Registry.query(service_type="image", **kwargs)

def list_spectra_services(**kwargs):
    return Registry.query(service_type="spectr", **kwargs)

def list_cone_services(**kwargs):
    return Registry.query(service_type="cone", **kwargs)

#  This one isn't in Registry yet, but presumably can be added.
def list_tap_services(**kwargs):
    return Registry.query(service_type="tap", **kwargs)

## 5. SIA

Then pick one of the listed services (say number 20, after looking at the descriptions) and query it to get the URL of an image at a given coordinate.

    image_info = Image.query(coords, 0, access_url=sia_services[20]['access_url'])

or perhaps you don't know which service is quite what you want, so get info for all of them:

    images_info = Image.query(coords, 0, access_url=sia_services[:]['access_url'])

to get a table list of all images in a list of services that contain that point. 

The standardize() function would be a version of _astropy_table_from_votable_response.

Or, like with the Cone search above, the user doesn't give a service but just asks for information on images matching whatever criteria:

    heasarc_images_info = Image.query(coords, '0', naxis='300,300', source='heasarc')
    uv_images_info = Image.query(coords, '0', naxis='300,300', waveband='uv')
    twomass_images_info = Image.query(coords, '0', naxis='300,300', keyword='2mass')
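
For a single SIA (version 1.0) service, such a query reduces to a GET on the access URL with POS=ra,dec and SIZE in decimal degrees; SIZE=0 asks for images that contain the position, which matches the `coords, 0` calls above. FORMAT and the optional NAXIS are standard SIA 1.0 parameters too. A sketch, with `sia_url` as a hypothetical name:

```python
from urllib.parse import urlencode


def sia_url(access_url, ra, dec, size=0.0, fmt="image/fits", naxis=None):
    """Build a Simple Image Access (v1.0) query URL; ra, dec, size in degrees."""
    params = {"POS": f"{ra},{dec}", "SIZE": size, "FORMAT": fmt}
    if naxis is not None:
        # e.g. '300,300' for services that can cut out or resample images
        params["NAXIS"] = naxis
    # Registry access URLs often already end in '?' or '&'.
    if access_url.endswith(("?", "&")):
        sep = ""
    elif "?" in access_url:
        sep = "&"
    else:
        sep = "?"
    return access_url + sep + urlencode(params)
```

The response is a VOTable listing matching images, with a column giving each image's download URL; that table is what `images_info` would hold.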


## 7-8. Retrieving images

You can look at the images_info and pick one to download:

    Image.get( images_info[6], filename='my_file.fits')
    
or get the image data to hand to the plotter:

    image=Image.get( images_info[6] )
    plt.imshow( image,  cmap='gray', origin='lower',vmax=0.02 )
    
It downloads the image to filename if one is specified; otherwise it downloads it to a temporary file, reads it in, and returns the image data.

Or hand a list and return a list of images:

    images=Image.get(images_info )
    plt.imshow( images[0],  cmap='gray', origin='lower',vmax=0.02 )

In [None]:
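import os
import tempfile

from astroquery.query import BaseQuery

class ImageClass(BaseQuery):

    def query(self, **kwargs):
        """Get information on what images are available"""
        services=Registry.query(service_type='image',**kwargs)
        # Like the Cone class above, query each service and collect results...
        return

    def get(self, image_url, filename=''):
        """Returns the data that can be handed to plt.imshow() from a URL

        If filename is given, the image is saved there and nothing is returned.
        Otherwise it goes to a temporary file that is read in and then deleted.

        For now, input URL. But could just get a list of URLs or a
        list of tables that have an 'access_url' column.
        """
        if filename:
            self._download(image_url, filename)
            return None
        with tempfile.TemporaryDirectory() as tmpdir:
            tmpname = os.path.join(tmpdir, 'tmp.fits')
            self._download(image_url, tmpname)
            # Which extension? TBD; getdata() without ext= starts from the
            # primary HDU. memmap=False reads the data into memory so the
            # temporary file can be removed afterwards.
            return apfits.getdata(tmpname, memmap=False)

    def _download(self, url, filename):
        """Simple wrapper around requests: save the content at url to filename."""
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        with open(filename, 'wb') as f:
            f.write(response.content)

# Instance used as Image.get(...) above; note this shadows IPython.display.Image.
Image = ImageClass()

#class SpectralClass(ImageClass):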

## 10. Cone search by keyword

This is now easy:

In [None]:
services=Registry.query(service_type='cone',keyword='chandra')
chandra_results=Cone.query(coords,10,services)
len(chandra_results)

## 11. TAP

This currently doesn't work but should be perfectly doable:

    tap_services_2mass=Registry.query(keyword='2mass',service_type='table')
    
Look through the results and find the one you want. Then, assuming you know how to construct ADQL and know the names of the columns in the catalog you're searching:

    tap_results=Tap.query(
        source=tap_services_2mass[32],
        logic_string="CONTAINS(POINT('J2000',ra,dec),CIRCLE('J2000',9.90704,8.96507,0.001))"
        )

is equivalent to a cone search, but you could do whatever you wanted. Unlike the image case above, where you can get image information from all services in the registry, you probably couldn't do this without knowing which TAP service you want. The reason is that the TAP query depends on the column names, and they differ from catalog to catalog.
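
For reference, the cone-equivalent query above could be assembled as a full ADQL statement and sent to a TAP service's synchronous endpoint (`<base>/sync`, like the RegTAP URL printed earlier) with the standard parameters REQUEST=doQuery, LANG=ADQL, and QUERY. In ADQL, CONTAINS returns 1 or 0, hence the `= 1`. A sketch; `adql_cone`, `tap_sync_params`, and the table name in any example are hypothetical, and the `ra`/`dec` column names are assumptions that vary by catalog, which is exactly the problem described above:

```python
def adql_cone(table, ra, dec, radius, ra_col="ra", dec_col="dec", columns="*"):
    """Return an ADQL query for rows within `radius` degrees of (ra, dec)."""
    return (
        f"SELECT {columns} FROM {table} "
        f"WHERE CONTAINS(POINT('J2000', {ra_col}, {dec_col}), "
        f"CIRCLE('J2000', {ra}, {dec}, {radius})) = 1"
    )


def tap_sync_params(query, fmt="votable"):
    """Form parameters for a synchronous TAP request (to be POSTed to <base>/sync)."""
    return {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": query, "FORMAT": fmt}
```

Usage would be along the lines of `requests.post(tap_base_url + "/sync", data=tap_sync_params(adql_cone(table_name, 9.90704, 8.96507, 0.001)))`, where `tap_base_url` and `table_name` come from the registry entry you picked.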

On the other hand, since people have to know how to use ADQL and know the columns of the catalog they're interested in, it's not clear we can add much value with a wrapper.