#  UCDs:  what are they good for?

Suppose you want to do something using a column that you expect to find in a bunch of different tables, like coordinates and time.  It's a good bet that many if not most of the tables have coordinate columns, but there's no rule about what they have to be named.  

When doing detailed catalog queries with TAP, you can examine the columns of every table you're interested in, find the ones you want, and hard-code them into each query for each table and service.  

Or you can search the column names and descriptions for keywords like "ra" or "ascension" and pick out the columns you want automatically.  
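A keyword search of this kind might look like the sketch below. The column list is made-up stand-in data (hypothetical names and descriptions, not taken from any real service), so it runs without a network connection. It also hints at why the approach is fragile: a short keyword like "ra" matches unrelated columns.

```python
from types import SimpleNamespace

# Hypothetical column metadata standing in for what a TAP service returns.
mock_columns = [
    SimpleNamespace(name="MatchRA", description="Right ascension of the match"),
    SimpleNamespace(name="MatchDec", description="Declination of the match"),
    SimpleNamespace(name="Radius", description="Search radius in arcsec"),
    SimpleNamespace(name="CountRate", description=None),  # descriptions are optional
]

def columns_matching(columns, keyword):
    """Return names of columns whose name or description contains keyword."""
    keyword = keyword.lower()
    hits = []
    for c in columns:
        text = f"{c.name} {c.description or ''}".lower()
        if keyword in text:
            hits.append(c.name)
    return hits

ra_hits = columns_matching(mock_columns, "ra")
ascension_hits = columns_matching(mock_columns, "ascension")
print(ra_hits)         # includes Radius and CountRate as false positives
print(ascension_hits)  # only the column whose description mentions it
```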

But is there a more generic way?  [Unified Content Descriptors (UCDs)](http://www.ivoa.net/documents/latest/UCD.html) are a VO standard that lets table publishers name their columns whatever they (or their contributors) want while still identifying those that contain standard sorts of data.  For example, the RA column could be called "RA", "ra", "Right_Ascension", etc.  But in all cases, a VO service can label the column with its UCD, which is "pos.eq.ra".  This information is not part of the table itself but part of the meta-data that the service may provide with the data.  Though not required of all VO services, UCDs are commonly provided precisely to make tasks such as identifying the columns of interest easier to automate.  

This is easiest to show by example.

In [None]:
# Generic VO access routines
import pyvo as vo

# For specifying coordinates
from astropy.coordinates import SkyCoord

# Ignore unimportant warnings
import warnings
warnings.filterwarnings('ignore', '.*Unknown element mirrorURL.*', vo.utils.xml.elements.UnknownElementWarning)

Let's look at some tables in a little more detail.  

In [None]:
#  Let's find the Hubble Source Catalog version 3 (HSCv3), assuming there's only one at MAST
services = vo.regsearch(servicetype='tap', keywords=['mast'])
service=[s for s in services if 'HSCv3' in s.res_title][0]

print(f'Title: {service.res_title}')
print(f'{service.res_description}')


Now let's see what tables are provided by this service for HSCv3.  Note that this is another query to the service:

In [None]:
tables = service.service.tables  # Queries for details of the service's tables
print(f'{len(tables)} tables:')
for t in tables:
    print(f'{t.name:30s} - {t.description}\n----')  # A more succinct option than t.describe()

Let's look at the columns of the DetailedCatalog table, along with their UCDs.  Again, note that accessing the columns attribute sends another query to the service to ask for the columns.

In [None]:
columns=tables['dbo.DetailedCatalog'].columns
for c in columns:
    # Show each column as "name [ucd]", padded to 30 characters, then its description
    label = f"{c.name} [{c.ucd}]"
    print(f"{label:30s} - {c.description}")

The PyVO method to get the columns will automatically fetch all the meta-data about those columns.  It's up to the service provider to set them correctly, of course, but in this case, we see that the column named "MatchRA" is identified with the UCD "pos.eq.ra".  

So if we did not know the exact name used in HSCv3 for the RA, we could do something like this:

In [None]:
ra_name=[c.name for c in columns if 'RA' in c.name]
print(ra_name)

Or you can check for the exact UCD.  But the UCD is not required: if a column doesn't have one, you get None, so code the check carefully:

In [None]:
ra_name=[c.name for c in columns if c.ucd and 'pos.eq.ra' in c.ucd][0]
dec_name=[c.name for c in columns if c.ucd and 'pos.eq.dec' in c.ucd][0]

ra_name,dec_name
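The substring test above is convenient but loose. In the UCD1+ syntax, a UCD is a list of words separated by semicolons with the primary word first, so a column holding the error on the RA can carry something like "stat.error;pos.eq.ra" and would also pass a substring check. The sketch below matches on the primary word only. The helper name `has_primary_ucd` and the mock columns are illustrative assumptions, not part of PyVO or of HSCv3.

```python
from types import SimpleNamespace

def has_primary_ucd(column, ucd):
    """True if the first (primary) word of the column's UCD equals ucd."""
    if not column.ucd:  # the UCD is optional: it may be None or empty
        return False
    primary = column.ucd.split(";")[0].strip().lower()
    return primary == ucd.lower()

# Hypothetical columns; only the first one is "the RA" itself.
mock_columns = [
    SimpleNamespace(name="ra", ucd="pos.eq.ra;meta.main"),
    SimpleNamespace(name="e_ra", ucd="stat.error;pos.eq.ra"),
    SimpleNamespace(name="ra_raw", ucd=None),
]

loose = [c.name for c in mock_columns if c.ucd and "pos.eq.ra" in c.ucd]
strict = [c.name for c in mock_columns if has_primary_ucd(c, "pos.eq.ra")]
print(loose)   # the substring test also picks up the error column
print(strict)  # the primary-word test keeps only the RA column
```

Whether the stricter test is what you want depends on the service: some publishers put qualifiers first or use UCDs loosely, so it is worth printing the UCDs before relying on either check.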

In [None]:
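# fieldname_with_ucd() is a method of query results, not of the table listing,
# so count the matching columns from the column meta-data instead
len([c.name for c in columns if c.ucd and 'pos.eq.ra' in c.ucd])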

In [None]:
len([c.name for c in columns if c.ucd and 'pos.foo.ra' in c.ucd])


Together, these checks show that two columns in this table have "RA" in their names, but only one has the 'pos.eq.ra' UCD.  

<font color=red>Is there a reason for this?  Does that mean MatchRA is the 'better' RA to use than SourceRA?</font>

In [None]:
coord = SkyCoord.from_name("m83")
#  For zcat
#query = f'''
#SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where 
#contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{coord.ra.deg},{coord.dec.deg},1.0))=1
#'''
query=f"select top 10 {ra_name}, {dec_name} from dbo.DetailedCatalog"
results = service.search(query)
results

This means that you can write one query and run it against different tables in a loop.  The loop below sends many queries, one set per service, but it should not take long, perhaps a minute.  

In [None]:
#  Look for all TAP services with x-ray and optical data
collection={}
for s in vo.regsearch(servicetype='tap',keywords=['x-ray','optical']):
    print(f"Looking at service from {s.ivoid}")
    tables=s.service.tables
    #  Find all the tables that have an RA,DEC and a start and end time
    for t in tables:
        names={}
        for ucd in ['pos.eq.ra','pos.eq.dec','time.start','time.end']:
            cols=[c.name for c in t.columns if c.ucd and ucd in c.ucd]
            if cols:
                names[ucd]=cols[0]  # use the first that matches
        if len(names) == 4:  # all four UCDs were found
            query="select top 10 {}, {}, {}, {} from {}".format(
                names['pos.eq.ra'],
                names['pos.eq.dec'],
                names['time.start'],
                names['time.end'],
                t.name)
            print(f"Table {t.name} has the right columns.  Executing query:\n{query}")
            results=s.search(query)
            print("Found {} results\n".format(len(results)))
            #  Careful.  We're assuming the table names are unique
            collection[t.name]=results
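The column lookup and query construction inside the loop can be factored into a small function, which also makes the logic testable without contacting any service. This is a sketch under assumptions: `build_ucd_query` is a hypothetical helper, and the mock table and column names are invented for illustration.

```python
from types import SimpleNamespace

def build_ucd_query(table_name, columns, ucds, top=10):
    """Build a 'select top N' query using the first column matching each UCD.

    Returns None if any requested UCD has no matching column.
    """
    selected = []
    for ucd in ucds:
        matches = [c.name for c in columns if c.ucd and ucd in c.ucd]
        if not matches:
            return None
        selected.append(matches[0])  # use the first that matches
    return f"select top {top} {', '.join(selected)} from {table_name}"

wanted = ["pos.eq.ra", "pos.eq.dec", "time.start", "time.end"]

# Hypothetical column metadata for one table.
mock_columns = [
    SimpleNamespace(name="ra", ucd="pos.eq.ra;meta.main"),
    SimpleNamespace(name="dec", ucd="pos.eq.dec;meta.main"),
    SimpleNamespace(name="t_min", ucd="time.start"),
    SimpleNamespace(name="t_max", ucd="time.end"),
    SimpleNamespace(name="note", ucd=None),
]

query = build_ucd_query("mock.obs", mock_columns, wanted)
no_query = build_ucd_query("mock.obs", mock_columns[:2], wanted)  # no time columns
print(query)
print(no_query)
```

In the loop, the function would be called with `t.name` and `t.columns`, and the table skipped whenever it returns None.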


You can also use UCDs to look at the results.  Above, we collected just the first 10 rows of the four columns we're interested in from every catalog that had them.  But these tables still have their original column names.  So the UCDs will still be useful, and PyVO provides a simple routine to convert from UCD to column (field) name.  

Note, however, that returning the UCDs as part of the result is not mandatory, and some services do not do it, so you'll have to check.

In [None]:
for tname,results in collection.items():
    raname=results.fieldname_with_ucd('pos.eq.ra')
    if raname:
        print(f"Table {tname} has the RA column named {raname}")
    else:
        print(f"(Table {tname} didn't give back the UCD.)")
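One way to cope with services that omit UCDs from their results is to remember the column names found in the table meta-data and fall back to them. The sketch below assumes you saved such a UCD-to-name mapping when building each query (the loop above does not currently do so). `FakeResults` is a stand-in that mimics only the `fieldname_with_ucd()` method of a result object, and `field_for_ucd` is a hypothetical helper.

```python
class FakeResults:
    """Stand-in for a query result; mimics only fieldname_with_ucd()."""

    def __init__(self, ucd_to_field):
        self._ucd_to_field = ucd_to_field

    def fieldname_with_ucd(self, ucd):
        return self._ucd_to_field.get(ucd)  # None when the UCD is absent

def field_for_ucd(results, ucd, fallback_names):
    """Prefer the UCD in the results; otherwise use the saved meta-data name."""
    return results.fieldname_with_ucd(ucd) or fallback_names.get(ucd)

with_ucds = FakeResults({"pos.eq.ra": "ra"})
without_ucds = FakeResults({})
saved = {"pos.eq.ra": "RAJ2000"}  # hypothetical name found in the meta-data

found_direct = field_for_ucd(with_ucds, "pos.eq.ra", saved)
found_fallback = field_for_ucd(without_ucds, "pos.eq.ra", saved)
print(found_direct)
print(found_fallback)
```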

<font color=red>This appears to be the current state of things.  Is this something we want to advertise yet?  There are a few ways it could be improved.  Do *all* NASA centers provide UCDs with the TAP results?  HEASARC was not until I asked TomM about it.  

Second issue:  HEASARC has one TAP service that does not list wavebands, so it doesn't show up in the above search.  TomM says that this is correct according to something MarkusD wrote up about how to register TAP services that have many tables.  One TAP service is listed in the Registry, and each table has its own list as a cone search, with the TAP service listed as auxiliary info.  So should PyVO then change the regsearch() to search Cone Services as well when asked for TAP services and include the associated TAP services in the result?  
</font>