DOC: Improvements on doc for TAP service and other minor changes #436

Merged
merged 18 commits into from
Jul 6, 2023
200 changes: 196 additions & 4 deletions docs/dal/index.rst
@@ -146,10 +146,196 @@ Table Access Protocol

-- `Table Access Protocol <https://www.ivoa.net/documents/TAP/>`_


Consider the following example of using TAP and ADQL. It retrieves 5
objects from the Gaia DR3 database whose mean G-band magnitude lies
between 19 and 20, showing their id, position and magnitude:

.. doctest-remote-data::

>>> import pyvo as vo
>>> tap_service = vo.dal.TAPService("http://dc.g-vo.org/tap")
>>> tap_results = tap_service.search("SELECT TOP 10 * FROM ivoa.obscore")
>>> ex_query = """
Contributor


The "explore" step is missing here, IMO. Where does the user find the names of the tables and columns?

Some services have a description that can be used here (similar to the SIA service above)

Others (like this one) have examples:

>>> for e in tap_service.examples:
...     print(e['QUERY'])

One can find the names of the tables:

[t for t in tap_service.tables.keys()]

And the columns in a table:

tap_service.search('select * from gaia.dr3lite', maxrec=1).table.columns

Contributor Author


Sorry for the late reply and many thanks for the review. I have added your suggestions accordingly into that location.

... SELECT TOP 5
... source_id, ra, dec, phot_g_mean_mag
... FROM gaia.dr3lite
... WHERE phot_g_mean_mag BETWEEN 19 AND 20
... ORDER BY phot_g_mean_mag
... """
>>> result = tap_service.search(ex_query)
>>> print(result)
<Table length=5>
source_id ra dec phot_g_mean_mag
deg deg mag
int64 float64 float64 float32
------------------- ------------------ ------------------ ---------------
2162809607452221440 315.96596187101636 45.945474015208106 19.0
2000273643933171456 337.1829026565382 50.7218533537033 19.0
2171530448339798784 323.9151025188806 51.27690705826792 19.0
2171810342771336704 323.25913736080776 51.94305655940998 19.0
2180349528028140800 310.5233961869657 50.3486391034819 19.0
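
If you need several variations of the query above, you could assemble the ADQL string in a small helper. The following is only a sketch: ``build_magnitude_query`` is a hypothetical name, not part of PyVO, and it does no escaping, so it should only be fed trusted table and column names.

```python
def build_magnitude_query(table, columns, mag_column, low, high, top):
    """Return an ADQL query for `top` rows with `mag_column` in [low, high].

    Hypothetical helper for illustration; it is not part of PyVO.
    """
    if top <= 0:
        raise ValueError("top must be a positive integer")
    if low > high:
        raise ValueError("low must not exceed high")
    column_list = ", ".join(columns)
    return (
        f"SELECT TOP {int(top)} {column_list} "
        f"FROM {table} "
        f"WHERE {mag_column} BETWEEN {low} AND {high} "
        f"ORDER BY {mag_column}"
    )


# Rebuild the example query used in this section.
query = build_magnitude_query(
    table="gaia.dr3lite",
    columns=["source_id", "ra", "dec", "phot_g_mean_mag"],
    mag_column="phot_g_mean_mag",
    low=19,
    high=20,
    top=5,
)
print(query)
```

The resulting string can be passed to ``search()`` in the same way as ``ex_query``.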

To explore more query examples, try the ``description`` attribute,
which some services provide. For other services, like this one, try
the ``examples`` attribute.

.. doctest-remote-data::

>>> print(tap_service.examples[0]['QUERY'])
SELECT TOP 50 l.id, l.pmra as lpmra, l.pmde as lpmde,
g.source_id, g.pmra as gpmra, g.pmdec as gpmde
FROM
lspm.main as l
JOIN gaia.dr3lite AS g
ON (DISTANCE(g.ra, g.dec, l.raj2000, l.dej2000)<0.01) -- rough pre-selection
WHERE
DISTANCE(
ivo_epoch_prop_pos(
g.ra, g.dec, g.parallax,
g.pmra, g.pmdec, g.radial_velocity,
2016, 2000),
POINT(l.raj2000, l.dej2000)
)<0.0002 -- fine selection with PMs

Furthermore, one can find the names of the tables using:

.. doctest-remote-data::

>>> print([tab_name for tab_name in tap_service.tables.keys()]) # doctest: +ELLIPSIS, +IGNORE_WARNINGS
['amanda.nucand', 'annisred.main', 'antares.data', ..., 'wfpdb.main', 'wise.main', 'zcosmos.data']


You can also list the column names of a known table, for instance
its first three columns:

.. doctest-remote-data::

>>> result.table.columns[:3] # doctest: +IGNORE_WARNINGS
<TableColumns names=('source_id','ra','dec')>

If you know a TAP service's access URL, you can directly pass it to
:py:class:`~pyvo.dal.TAPService` to obtain a service object.
Sometimes, such URLs are published in papers or passed around through
other channels. Most commonly, you will discover them in the VO
registry (cf. :ref:`pyvo.registry<pyvo-registry>`).

To perform a query using ADQL, use the ``search()`` method.
TAPService instances also have several methods, discussed below, for
inspecting the metadata of the service, in particular which tables
and columns are available.

To get an idea of how to write queries in ADQL, have a look at
`GAVO's ADQL course`_; it is basically a standardised subset of SQL
with some extensions to make it work better for astronomy.
Comment on lines +228 to +229
Member


The definition of ADQL is already provided in the first paragraph, no need for the second half of the sentence.
I'm also on the fence about the ADQL tutorial, but if @msdemlei wants to have the link here, then it should be fine.

Contributor


Well, I give you the ADQL course shows its age, but frankly I think it would be a good idea to link to some accessible ADQL resource. The ADQL spec definitely is a terrible place to send anyone but an implementor to...

So, I'm happy to yield to any other gentle ADQL intro, but we really should avoid sending folks new to SQL-like stuff to IVOA documentation, and not saying anything about "what's this ADQL that I'm supposed to enter" isn't a good option either.

Talking about which: I think we're not saying anything about TAP examples anywhere yet? They're also great to get people over the first few metres (try the examples attribute of a TAPService object). Would this fit here, in addition?


.. _GAVO's ADQL course: https://docs.g-vo.org/adql

Synchronous vs. asynchronous query
Member


Not sure whether this belongs to this exact document. Again, the opening paragraph mentions them, and they are well-defined in the specs, so it's not exactly pyvo's task to explain what they mean on the server side.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In synchronous (“sync”) mode, the client keeps a connection open for
the entire runtime of the query, and query processing generally starts
as soon as the request is submitted. This is convenient but becomes
brittle once queries run for minutes, because you may then encounter
timeouts. Also, many data providers impose rather strict limits on
the runtime allotted to sync queries.

In asynchronous (“async”) mode, on the other hand, the client just
submits a query and receives a URL that lets it inspect the
execution status (and retrieve the result) later. This means that
no connection needs to be held open, which makes this mode a lot more
robust for long-running queries. It also supports queuing queries,
which allows service operators to be a lot more generous with
resource limits.
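
On the wire, the two modes differ mainly in the endpoint the query is sent to: the TAP standard defines ``sync`` and ``async`` resources below the service's base URL. The sketch below only builds such a request without sending it. ``tap_request`` is a hypothetical helper, not a PyVO function, and the parameter set shown is a minimal assumption based on the TAP specification.

```python
from urllib.parse import urlencode


def tap_request(base_url, query, mode="sync", maxrec=None):
    """Return (endpoint URL, form-encoded body) for a TAP query request.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    params = {
        "REQUEST": "doQuery",  # required by TAP 1.0
        "LANG": "ADQL",
        "QUERY": query,
    }
    if maxrec is not None:
        params["MAXREC"] = str(maxrec)
    # The endpoint is the base URL plus the mode name.
    endpoint = f"{base_url.rstrip('/')}/{mode}"
    return endpoint, urlencode(params)


url, body = tap_request(
    "http://dc.g-vo.org/tap",
    "SELECT TOP 1 * FROM ivoa.obscore",
    mode="async",
)
print(url)
print(body)
```

In normal use you never build these requests yourself; the service object does it for you.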
Contributor


Async queries are not bullet proof either. Service operators might impose limits on the size of the result set which can cause the query to fail or have the result truncated. Also, results might not be available before execution completes (i.e. streaming).


To specify the query mode, you can use either ``run_sync()`` for a
synchronous query or ``run_async()`` for an asynchronous query.

To look at an asynchronous query in more detail, consider the
``submit_job()`` method. It submits an asynchronous query without
starting it and creates a new :py:class:`~pyvo.dal.AsyncTAPJob` object.

.. doctest-remote-data::

>>> job = tap_service.submit_job(ex_query)

.. doctest-remote-data::

>>> job.url # doctest: +ELLIPSIS
'http://dc.zah.uni-heidelberg.de/__system__/tap/run/async/...'

The job URL mentioned before is available in the ``url`` attribute.
Opening the URL in a browser leads you to the job itself, where you
can check the status (phase) of the query and decide to run, modify
or delete the job. You can also do this through the job object:

.. doctest-remote-data::

>>> job.phase
'PENDING'

A newly created job is in the PENDING state.
While it is pending, it can be configured, for instance by overriding
the server's default time limit (after which the query will be
canceled):

.. doctest-remote-data::

>>> job.executionduration = 700
>>> job.executionduration
700

When you are ready, you can start the job:

.. doctest-remote-data::

>>> job.run() # doctest: +ELLIPSIS
<pyvo.dal.tap.AsyncTAPJob object at 0x...>

This will put the job into the QUEUED state. Depending on how busy
the server is, it may move on to the EXECUTING phase almost immediately:

.. doctest-remote-data::

>>> job.phase # doctest: +IGNORE_OUTPUT
'EXECUTING'

The job will eventually end up in one of the following phases:

* COMPLETED - if all went to plan,
* ERROR - if the query failed for some reason; look at the error
  attribute of the job to find out details,
* ABORTED - if you manually killed the query using the ``abort()``
  method or the server killed your query, presumably because it hit
  the time limit.
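
Waiting for one of these final phases amounts to a simple polling loop. The sketch below shows the idea with a stand-in object: ``FakeJob`` and ``wait_for_terminal_phase`` are hypothetical names made up for this illustration, so that the example runs without a network connection. With a real job you would normally rely on the job's own ``wait()`` method instead.

```python
import time

# Phases after which the job will not change any more.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


class FakeJob:
    """Hypothetical stand-in for a job object; it only exposes ``phase``."""

    def __init__(self, phases):
        self._phases = iter(phases)
        self._current = next(self._phases)

    @property
    def phase(self):
        # Report the current phase, then advance to the next scripted
        # one; the last phase in the script is repeated forever.
        current = self._current
        self._current = next(self._phases, current)
        return current


def wait_for_terminal_phase(job, poll_interval=0.0, max_polls=100):
    """Poll ``job.phase`` until it is terminal and return that phase."""
    for _ in range(max_polls):
        phase = job.phase
        if phase in TERMINAL_PHASES:
            return phase
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal phase")


job = FakeJob(["QUEUED", "EXECUTING", "EXECUTING", "COMPLETED"])
final_phase = wait_for_terminal_phase(job)
print(final_phase)
```

Against a real service, a non-zero ``poll_interval`` keeps the number of status requests reasonable.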

After the job ends up in COMPLETED, you can retrieve the result:

.. doctest-remote-data::

>>> job.phase # doctest: +IGNORE_OUTPUT
'COMPLETED'
>>> job.fetch_result() # doctest: +SKIP
(result table as shown before)

Finally, it is good practice to delete the job yourself rather than
relying on the server to clean it up once ``job.destruction`` (a
datetime that you can change if you need to) is reached.

.. doctest-remote-data::

>>> job.delete()

For more attributes, see the description of the job object,
:py:class:`~pyvo.dal.AsyncTAPJob`.

With ``run_async()``, you submit an asynchronous query and get its
result back in a single call. It is like calling ``submit_job()``
first and then running the job and fetching its result manually.
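
The manual equivalent can be sketched as follows. This is an approximation for illustration, not PyVO's actual implementation: ``run_like_run_async`` is a hypothetical function, and ``StubService`` and ``StubJob`` are stand-ins that only record which methods were called, so that the example runs offline. The method names mirror those of the job object described above.

```python
def run_like_run_async(service, query):
    """Submit, run, wait for, fetch and clean up one asynchronous query."""
    job = service.submit_job(query)
    try:
        job.run()
        job.wait()            # block until the job has finished
        job.raise_if_error()  # turn an ERROR phase into an exception
        return job.fetch_result()
    finally:
        job.delete()          # clean up even if something failed


class StubJob:
    """Hypothetical job that records the order of method calls."""

    def __init__(self, query):
        self.query = query
        self.calls = []

    def run(self):
        self.calls.append("run")

    def wait(self):
        self.calls.append("wait")

    def raise_if_error(self):
        self.calls.append("raise_if_error")

    def fetch_result(self):
        self.calls.append("fetch_result")
        return f"result of: {self.query}"

    def delete(self):
        self.calls.append("delete")


class StubService:
    """Hypothetical service whose submit_job() returns a StubJob."""

    def submit_job(self, query):
        self.last_job = StubJob(query)
        return self.last_job


service = StubService()
result = run_like_run_async(service, "SELECT TOP 1 * FROM ivoa.obscore")
print(result)
print(service.last_job.calls)
```

Putting ``delete()`` into a ``finally`` clause means the job is removed from the server even when the query fails.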

Query limit
^^^^^^^^^^^

As a sanity precaution, most services have some default limit of how many
rows they will return before overflowing:
@@ -256,6 +442,7 @@ Finally, tables and their content can be removed:

>>> tap_service.remove_table(name='test_schema.test_table')

For further information about the service's parameters, see :py:class:`~pyvo.dal.TAPService`.

.. _pyvo-sia:

@@ -274,7 +461,7 @@ Simple Image Access
referred to as datacubes, cube or image cube datasets and may be considered examples
of hypercube or n-cube data. PyVO supports both versions of SIA.

- -- `Simple IMage Access <https://www.ivoa.net/documents/SIA/>`_
+ -- `Simple Image Access <https://www.ivoa.net/documents/SIA/>`_

Basic queries are done with the ``pos`` and ``size`` parameters described in
:ref:`pyvo-astro-params`, with ``size`` being the rectangular region around
@@ -316,6 +503,8 @@ Available values:

This service exposes the :ref:`verbosity <pyvo-verbosity>` parameter

For further information about the service's parameters, see :py:class:`~pyvo.dal.SIAService`.

.. _pyvo-ssa:

Simple Spectrum Access
@@ -349,6 +538,7 @@ SSA queries can be further constrained by the ``band`` and ``time`` parameters.
... time=time, band=Quantity((1e-13, 1e-12), unit="meter")
... )

For further information about the service's parameters, see :py:class:`~pyvo.dal.SSAService`.

.. _pyvo-scs:

@@ -375,7 +565,8 @@ within a circular region on the sky defined by the parameters ``pos``
>>> scs_srv = vo.dal.SCSService('http://dc.zah.uni-heidelberg.de/arihip/q/cone/scs.xml')
>>> scs_results = scs_srv.search(pos=pos, radius=size)

- This service exposes the :ref:`verbosity <pyvo-verbosity>` parameter
+ This service exposes the :ref:`verbosity <pyvo-verbosity>` parameter.
For further information about the service's parameters, see :py:class:`~pyvo.dal.SCSService`.

.. _pyvo-slap:

@@ -394,6 +585,7 @@ Simple Line Access
This service lets you query for spectral lines in a certain ``wavelength``
range. The unit of the values is meters, but any unit may be specified using
`~astropy.units.Quantity`.
For further information about the service's parameters, see :py:class:`~pyvo.dal.SLAService`.

Jobs
====
@@ -411,7 +603,7 @@ to run several queries in parallel from one script.
.. note::
It is good practice to test the query with a maxrec constraint first.

- When you invoke ``submit job`` you will get a job object.
+ When you invoke ``submit_job`` you will get a job object.

.. doctest-remote-data::

3 changes: 3 additions & 0 deletions pyvo/dal/query.py
@@ -111,6 +111,9 @@ def create_query(self, **keywords):
        return q

    def describe(self):
        """
        describe the general information about the DAL service
        """
        print('DAL Service at {}'.format(self.baseurl))


16 changes: 16 additions & 0 deletions pyvo/dal/tap.py
@@ -858,6 +858,15 @@ def result_uri(self):

    @property
    def uws_version(self):
        """
        the version of the UWS serving this async job

        Asynchronous TAP jobs are managed using a standard called Universal
        Worker Service (UWS). For instance, starting with version 1.1, you
        can have long polls, which save on monitoring requests. Normal users
        generally will not have to look at this.

        """
        self._update()
        return self._job.version

@@ -1038,6 +1047,13 @@ def __init__(

    @property
    def queryurl(self):
        """
        the URL to which to submit queries

        In TAP, that varies depending on whether we run sync or async
        queries.

        """
        return '{baseurl}/{mode}'.format(baseurl=self.baseurl, mode=self._mode)

def execute_stream(self, post=False):