# Using DataLink to retrieve associated data products

In this example, we query an SSA service for spectra and then use DataLink to see what other data products are associated with each spectrum.


In [None]:
import pyvo as vo
from astropy.coordinates import SkyCoord
from IPython.display import Image as ipImage, display, HTML
import warnings
#  Suppress warnings about the mirrorURL element, which appears in some Registry records
warnings.filterwarnings('ignore', '.*Unknown element mirrorURL.*', vo.utils.xml.elements.UnknownElementWarning)

In [None]:
#  Pick an object
pos = SkyCoord.from_name('NGC 2264')

### Simple example

Search the Registry for SSA services that mention DataLink, and take the first one:

In [None]:
service=vo.regsearch(keywords=['datalink'],servicetype='ssa')[0]
service.describe()

In [None]:
#  Search for spectra:
results=service.search(pos=pos,size=0.1)
results.to_table()

The above table shows the results of the SSA query, i.e. the spectra we can download and their metadata.  

But each of these results may have further associated data available through DataLink. <font color=red>(How should this be tested for? As shown in the Error handling section below, calling getdatalink() on a result from a service that doesn't offer DataLink raises an exception instead of failing gracefully.)</font> Let's see what else is available through DataLink for *one* of these spectra:

In [None]:
links=results[2].getdatalink()
links.to_table()

This shows four linked datasets that we might be interested in, and the service provides such links for *each* of the spectra returned by our SSA query.

Note that, as usual, the 'links' object used above isn't actually a table; internally it is built from a set of XML objects. You can browse it as a table with the to_table() method, and you can access the individual linked datasets by iterating over the object as though its entries were rows. (Be careful: to_table() doesn't consistently return rows in the same order as iterating over the results object does, so don't rely on row numbers; check attributes instead.)

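For example, from the description and content_type columns above, we see that this service offers PNG previews of the spectra (the last of the four linked objects in the table above). Let's take a look at one, selecting it by its content_type rather than by its row number:

In [None]:
#  Pick the PNG preview by content type, not by its position in the table
png_links = [link for link in links if link.content_type == "image/png"]
display(ipImage(data=png_links[0].getdataset().data))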
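Following the advice above to check attributes rather than row numbers, here is a minimal sketch of a reusable selector. It assumes only that iterating over a DataLink result yields records with the `description` and `content_type` attributes used elsewhere in this notebook; the helper name `select_links` is hypothetical, not part of pyvo.

```python
def select_links(links, content_type=None, description_contains=None):
    """Return the DataLink records whose attributes match, ignoring row order.

    `links` is any iterable of records with `content_type` and
    `description` attributes, such as the object returned by getdatalink().
    `content_type` is compared with the record's MIME type after dropping
    any ";parameters"; `description_contains` is a case-insensitive substring.
    """
    matches = []
    for link in links:
        # Missing values arrive as None; treat them as empty strings
        base_type = (link.content_type or "").split(";")[0].strip().lower()
        description = (link.description or "").lower()
        if content_type is not None and base_type != content_type.lower():
            continue
        if description_contains is not None and description_contains.lower() not in description:
            continue
        matches.append(link)
    return matches
```

For example, `select_links(links, content_type="image/png")` would return the preview records for the spectrum above, whatever order the service lists them in.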

### Accessing multiple datalinks

Now suppose that, instead of inspecting the linked objects of a single result by hand, we want a script to go through all 19 results and look for *any* linked object of type image/png. You could write your own for loop over the results above, sending one DataLink query per row, like this:

In [None]:
#  One DataLink query per result row: simple, but slower
for result in results:
    pngs = [link for link in result.getdatalink() if link.content_type == "image/png"]
    print(f"{len(pngs)} PNG preview(s) for this result")

But there is a more efficient way: the iter_datalinks() iterator demonstrated below.

Under the hood, each VOTable returned by the service carries at most one DataLink resource. That resource tells PyVO whether a DataLink service is available and how to construct the DataLink query for each row of the result. The iter_datalinks() iterator is more efficient than writing your own loop because it collects the individual DataLink calls into one query to the server and then gives you an iterator over the results. The code looks almost the same, but it will generally be faster (though, depending on the size of the job and the server, there may be circumstances in which one method is preferable to the other).

In [None]:
#  Iterate over the SSA results, fetching the DataLinks for all of them together
for cnt, links in enumerate(results.iter_datalinks()):
    print(f"Datalink result #{cnt}")
    #  Then look at all the linked objects for each SSA result
    #   and find the type we want
    for link in links:
        if link.content_type == "image/png":
            display(ipImage(data=link.getdataset().data))
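To see why batching helps: the DataLink protocol lets a client pass several ID parameters in a single request, so the links for many rows can be fetched with one HTTP call instead of one call per row. The sketch below only builds such a URL with the standard library; it is an illustration, not pyvo's internal code, and the base URL and identifiers in the test are placeholders rather than a real service. The function name `batched_links_url` is hypothetical.

```python
from urllib.parse import urlencode


def batched_links_url(base_url, ids):
    """Build a single DataLink {links} request URL for several datasets.

    `base_url` is the service's links endpoint (a placeholder here) and
    `ids` are dataset identifiers. Each identifier becomes its own ID=...
    parameter, so one HTTP request covers all of them.
    """
    if not ids:
        raise ValueError("need at least one dataset identifier")
    query = urlencode([("ID", dataset_id) for dataset_id in ids])
    # Append to any parameters already present in the base URL
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query
```

In the links table that comes back, the ID column records which input identifier each row belongs to, which is how a client can split one batched response back into per-row results.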

### Error handling

Let's take a look at a different service:

In [None]:
service=vo.regsearch(servicetype='sia',keywords=['uvot'])[0]

In [None]:
results=service.search(pos=pos,size=0.1)
results.to_table()

<font color=red>SkyView doesn't provide any DataLink. Where should this error checking be done? This should return None rather than raise an exception. That said, buried in the two chained exceptions is a sensible message:</font>

    DALServiceError: No Adhoc Service with ivo-id b'ivo://ivoa.net/std/datalink'!

<font color=red>Is there a way to check a TAPResults or SIAResults object for whether there's any DataLink before calling this? </font>
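One possible approach, sketched under assumptions: a DataLink-capable service declares itself in the returned VOTable through a RESOURCE with `type="meta"` and `utype="adhoc:service"` whose `standardID` PARAM begins with `ivo://ivoa.net/std/DataLink` (the service-descriptor convention of the DataLink specification). The sketch repeats that check on the parsed VOTable, which pyvo exposes as `results.votable` (an astropy `VOTableFile`). It only inspects top-level resources, and the names `has_datalink` and `DATALINK_STANDARD_PREFIX` are hypothetical.

```python
DATALINK_STANDARD_PREFIX = "ivo://ivoa.net/std/datalink"


def _as_text(value):
    """PARAM values may be str or bytes depending on library versions."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return "" if value is None else str(value)


def has_datalink(votable):
    """Return True if the VOTable declares a DataLink service descriptor.

    `votable` should look like astropy's VOTableFile: a `resources` list
    whose items have `type`, `utype` and `params`, each param having
    `name` and `value`. Only top-level resources are inspected.
    """
    for resource in votable.resources:
        if resource.type != "meta" or resource.utype != "adhoc:service":
            continue
        for param in resource.params:
            # Compare case-insensitively: the spec writes "DataLink"
            standard_id = _as_text(param.value).lower()
            if param.name == "standardID" and standard_id.startswith(DATALINK_STANDARD_PREFIX):
                return True
    return False
```

With this, `has_datalink(results.votable)` could be tested before calling `getdatalink()` on any row.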

In [None]:
try:
    results[0].getdatalink()
except Exception as e:
    print(f"Exception:  {e}")
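Until a cleaner check is available, a small wrapper can provide the behaviour the note above asks for, returning None when no links can be obtained. This is a sketch: it catches any `Exception` because, as seen above, the failure surfaces as more than one chained exception and the outermost type is not guaranteed. The name `getdatalink_or_none` is hypothetical.

```python
def getdatalink_or_none(record, verbose=True):
    """Return record.getdatalink(), or None if it raises.

    Deliberately broad: when a service has no DataLink the failure can
    surface as a chain of different exception types. The trade-off is
    that real network problems are swallowed too, so the error is
    printed unless `verbose` is False.
    """
    try:
        return record.getdatalink()
    except Exception as exc:
        if verbose:
            print(f"getdatalink() failed: {exc}")
        return None
```

For instance, `getdatalink_or_none(results[0])` on the SkyView results above would print the failure and return None rather than raising.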

### A more complicated example


Let's try HEASARC's DataLink service through a TAP query to the Chandra master catalog:

In [None]:
#  Get the HEASARC TAP resource from the Registry
services=vo.regsearch(servicetype='tap',keywords=['heasarc'])
#  Construct a query to get objects near our source:
query="""SELECT * FROM chanmaster WHERE 
        1=CONTAINS(POINT('ICRS', ra, dec),
        CIRCLE('ICRS', {}, {}, 1))""".format(pos.ra.deg,pos.dec.deg)
results = services[0].search(query)
results.to_table()

Now look at what DataLinks are available for one of these results:

In [None]:
links=results[0].getdatalink()
links.to_table()['description','content_type']

So this service offers, for each row in the Chandra catalog, access to the proposal abstract, related ADS bibcodes, nearby XMM, RXTE, ASCA, and ROSAT observations, etc. as HTML results.   
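When a service offers many links per row, as here, a tally of content types can be quicker to read than the full table. A minimal sketch with `collections.Counter`, relying only on the `content_type` attribute used throughout this notebook; the name `summarize_links` is hypothetical.

```python
from collections import Counter


def summarize_links(links):
    """Count DataLink records by their content_type string."""
    # Records with no content type are grouped under "unknown"
    return Counter((link.content_type or "unknown").strip() for link in links)
```

Applied to the `links` object above, `summarize_links(links).most_common()` would list each content type with the number of links that carry it.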

As above, one can iterate over each of these results:

In [None]:
##  If you need to re-run this cell, first re-run the query to
##   make sure the iterator is also reset.
#results = services[0].search(query)
ads_link = None
for links in results.iter_datalinks():
    #  Look at all the linked objects for the first TAP result
    #   and remember the one pointing to ADS
    for link in links:
        print(link.description)
        if "ADS" in link.description:
            ads_link = link
    #  Only examine the first result
    break
if ads_link is not None:
    display(HTML(ads_link.getdataset().data.decode()))

Note that one of the DataLinks takes us to the Chandra observation itself, with a content_type indicating that it is itself a DataLink. In other words, our original TAP query returned a number of rows from the catalog, each with an associated DataLink, and that DataLink leads to a number of HTML results as well as to another DataLink. Let's look at the next DataLink level down and see what it gives:

In [None]:
for link in links:
    if "datalink" in link.content_type:
        break
link.getdatalink().to_table()['description','content_type']
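The cell above uses the substring test `"datalink" in link.content_type`, which is a quick heuristic. The DataLink specification labels links that point to another DataLink document with the content type `application/x-votable+xml;content=datalink`, so a stricter check can parse the MIME parameters instead. A sketch, with the helper name `is_datalink_content_type` being hypothetical:

```python
def is_datalink_content_type(content_type):
    """True if a MIME type denotes a nested DataLink (VOTable with content=datalink)."""
    if not content_type:
        return False
    base, *params = [part.strip() for part in content_type.split(";")]
    if base.lower() != "application/x-votable+xml":
        return False
    for param in params:
        # Parameters look like key=value, possibly with a quoted value
        key, _, value = param.partition("=")
        if key.strip().lower() == "content" and value.strip().strip('"').lower() == "datalink":
            return True
    return False
```

The trade-off: a service that labels its nested links less precisely would be caught by the substring test but missed by this stricter version.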

This shows that there are more datalinks below that, for the FITS images, orbit files, etc.  The user can therefore continue down the chain of links looking for particular types of data they are interested in.

Here's an example routine that recurses down to see what's there, showing the content types it finds at each level.  Note that there is no method iter_datalinks() for the results that are already DataLink results.  (Each row of a DataLink result has its own independent URL.)

In [None]:
def linkwalker(result, level):
    """Recursively print the DataLinks below `result`, one level at a time."""
    print("LEVEL {}".format(level))
    try:
        result2 = result.getdatalink()
        print(result2.to_table()['description', 'content_type'])
    except Exception as e:
        print("Exception {}".format(e))
        return
    #  Follow only the links that are themselves DataLinks
    for link in [lk for lk in result2 if "datalink" in (lk.content_type or "")]:
        linkwalker(link, level + 1)

In [None]:
linkwalker(results[0], 0)
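The recursive `linkwalker` above has no depth limit and would loop forever if two DataLink documents pointed at each other. Below is a sketch of a guarded variant. It assumes that DataLink records expose `access_url`, `description` and `content_type`, and that calling `getdatalink()` on a nested record fetches the next level, as in the cells above. It yields `(level, record)` pairs instead of printing, so the caller decides what to keep. The function name `walk_links` and its parameters are hypothetical.

```python
def walk_links(record, max_depth=3, _level=0, _seen=None):
    """Yield (level, link) for every DataLink record reachable from `record`.

    Recursion stops after `max_depth` levels and never revisits an
    access URL, which protects against cycles between DataLink documents.
    """
    seen = set() if _seen is None else _seen
    if _level >= max_depth:
        return
    try:
        links = record.getdatalink()
    except Exception as exc:  # same broad handling as linkwalker above
        print(f"LEVEL {_level}: could not follow link ({exc})")
        return
    for link in links:
        yield _level, link
        url = getattr(link, "access_url", None)
        # Descend only into nested DataLinks we have not visited yet
        if "datalink" in (link.content_type or "").lower() and url and url not in seen:
            seen.add(url)
            yield from walk_links(link, max_depth, _level + 1, seen)
```

For example, looping `for level, link in walk_links(results[0]):` and printing the records whose `content_type` contains "fits" would list the FITS products within three levels of the first Chandra row.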

You could then add a check for the specific type of data product you're interested in, and submit the same search to a set of different services. For instance, let's look for all FITS products.

In [None]:
def linkwalker(result, level, keyword=None):
    """Recursively walk the DataLinks below `result`.

    With no keyword, print each level's table; otherwise print only the
    links whose description or content_type contains the keyword.
    """
    print("LEVEL {}".format(level))
    try:
        result2 = result.getdatalink()
    except Exception as e:
        print("Exception {}".format(e))
        return
    if keyword is None:
        print(result2.to_table()['description', 'content_type'])
    else:
        keyword = keyword.lower()
        #  Iterate over the records rather than the table rows: the record
        #   attributes are str, while table columns may hold bytes in some versions
        for link in result2:
            description = (link.description or "").lower()
            content_type = (link.content_type or "").lower()
            if keyword in description or keyword in content_type:
                print(link.description, link.content_type)
    for link in [lk for lk in result2 if "datalink" in (lk.content_type or "")]:
        linkwalker(link, level + 1, keyword)

In [None]:
linkwalker(results[0], 0, keyword='fits')