-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Improvements on doc for TAP service and other minor changes #436
Conversation
2) improved the docstring by adding details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
I had a few comments, 3 of which I would like to see changed (fix the code examples, remove duplicated heading name and the references to topcat. Python users may very well not know topcat and may prefer not to use GUIs, so self-contained documentation is much nicer than referencing totally unrelated tools and their features), the rest are more comments that can go either way, especially if the maintainers disagree with me.
docs/dal/index.rst
Outdated
available - discussed below. Note that for exploratory query | ||
construction, you can also use interactive TAP clients such as | ||
TOPCAT_, which include table browsers. | ||
|
||
.. _TOPCAT: http://www.star.bris.ac.uk/~mbt/topcat/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a how-to for TAP, but a doc page for pyvo, please remove the cross reference.
docs/dal/index.rst
Outdated
@@ -146,10 +146,168 @@ Table Access Protocol | |||
|
|||
-- `Table Access Protocol <https://www.ivoa.net/documents/TAP/>`_ | |||
|
|||
Getting started |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a getting started heading in this very same document, I strongly suggest not to repeat it (it also messes up the heading references)
`GAVO's ADQL course`_; it is basically a standardised subset of SQL | ||
with some extensions to make it work better for astronomy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition of ADQL is already provided in the first paragraph, no need for the second half of the sentence.
I'm also on the fence about the ADQL tutorial, but if @msdemlei wants to have the link here, then it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I give you the ADQL course shows its age, but frankly I think it would be a good idea to link to some accessible ADQL resource. The ADQL spec definitely is a terrible place to send anyone but an implementor to...
So, I'm happy to yield to any other gentle ADQL intro, but we really should avoid sending folks new to SQL-like stuff to IVOA documentation, and not saying anything about "what's this ADQL that I'm supposed to enter" isn't a good option either.
Talking about which: I think we're not saying anything about TAP examples anywhere yet? They're also great to get people over the first few metres (try the examples attribute of a TAPService object). Would this fit here, in addition?
|
||
.. _GAVO's ADQL course: https://docs.g-vo.org/adql | ||
|
||
Synchronous vs. asynchronous query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether this belongs to this exact document. Again, the opening paragraph mentions them, and they are well-defined in the specs, so it's not exactly pyvo's task to explain what they mean on the server side.
docs/dal/index.rst
Outdated
(...) | ||
|
||
The job URL mentioned before is available in the ``url`` attribute; | ||
you can use that URL later to resume the job even from clients like TOPCAT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, why is there a TOPCAT reference?
docs/dal/index.rst
Outdated
>>> """ SELECT TOP 5 | ||
>>> source_id, ra, dec, phot_g_mean_mag | ||
>>> FROM gaia.dr3lite | ||
>>> WHERE phot_g_mean_mag BETWEEN 19 AND 20 | ||
>>> ORDER BY phot_g_mean_mag | ||
>>> """) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix multi-line syntax, ensure that all doctest run and pass with locally running pytest docs/dal/ --remote-data
>>> """ SELECT TOP 5 | |
>>> source_id, ra, dec, phot_g_mean_mag | |
>>> FROM gaia.dr3lite | |
>>> WHERE phot_g_mean_mag BETWEEN 19 AND 20 | |
>>> ORDER BY phot_g_mean_mag | |
>>> """) | |
... """ SELECT TOP 5 | |
... source_id, ra, dec, phot_g_mean_mag | |
... FROM gaia.dr3lite | |
... WHERE phot_g_mean_mag BETWEEN 19 AND 20 | |
... ORDER BY phot_g_mean_mag | |
... """) |
remove repeated heading name
Thank you for your detailed review. I just fixed the three issues that you mentioned. For the ADQL reference I will wait for @msdemlei 's comments. |
Codecov Report
@@ Coverage Diff @@
## main #436 +/- ##
==========================================
+ Coverage 79.84% 79.94% +0.10%
==========================================
Files 52 52
Lines 5988 6018 +30
==========================================
+ Hits 4781 4811 +30
Misses 1207 1207
... and 4 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
>>> tap_service = vo.dal.TAPService("http://dc.g-vo.org/tap") | ||
>>> tap_results = tap_service.search("SELECT TOP 10 * FROM ivoa.obscore") | ||
>>> ex_query = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "explore" step is missing here, IMO. Where do the user find the name of the table and columns?
Some services have a description
that can be used here (similar to the SIA service above)
Others (like this one) have examples:
>>> for e in tap_service.examples:
>>> ... print(e['QUERY'])
One can find the names of the tables:
[t for t in tap_service.tables.keys()]
And the columns in a table:
tap_service.search('select * from gaia.dr3lite', maxrec=1).table.columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply and many thanks for the review. I have added your suggestions accordingly into that location.
docs/dal/index.rst
Outdated
>>> for e in tap_service.examples: | ||
... print(e['QUERY']) # doctest: +SKIP | ||
<query examples> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a firm believer in including working and testable examples in the documentation. Could you please rewrite this in a way that it's not skipped? (and if needed, condense it enough so it runs quickly)
Also, please refrain from using single-letter variables, and use full, descriptive variable names at all times, especially in user facing examples (but also in code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
Actually, all the doctest: +SKIP
I put in the doc are because of their output EITHER are temporarily i.e. might be different later in time, OR the outputs are simply too long (that the output can not be shown fully), as far as I understand from doctest
, it checks the code output 1 to 1 with the strings that I wrote in the doc.
In this case, since the output is related to the names of the tables, it can indeed be updated later in time, and the list of the names is also too long. I am not sure if there is an ambiguous way for doctest
to solve this issue. I might go back to doctest documentation and check further details.
And for the variable issues, thanks for that hint, sure I will fix it later asap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah also here, the output for the query examples are also too long, there are many examples shown in the output, I could try too write them 1 to 1 into the doc, if that is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for long outputs ellipses would work, or you can rewrite the example of looking for one special case in there.
For variable outputs, doctest: +IGNORE_OUTPUT
is way more preferable than skipping it altogether, as it at least runs the line and spots any syntax, etc issues even if it doesn't compare the expected and actual outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hints. I have changed the doctest
flags accordingly. Also I have to leave one docktest: +SKIP
to line 319: job.fetch_result()
. This is because the actual query is not faster than doctest
's processing speed, so it would turn out an error that the result does not exist yet.
Furthermore I have optimised some code outputs so that docktest: +ELLIPSIS
would not be needed.
And dockets: + IGNORE_OUTPUTS
is indeed used for variable outputs.
DalOverflowing warnings
no connection needs to be held, which makes this mode a lot more | ||
robust of long-running queries. It also supports queuing queries, | ||
which allows service operators to be a lot more generous with | ||
resource limits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Async queries are not bullet proof either. Service operators might impose limits on the size of the result set which can cause the query to fail or have the result truncated. Also, results might not be available before execution completes (i.e. streaming).
On Wed, May 17, 2023 at 10:28:55AM -0700, Adrian wrote:
Async queries are not bullet proof either. Service operators might
impose limits on the size of the result set which can cause the
query to fail or have the result truncated. Also, results might not
be available before execution completes (i.e. streaming).
Do you argue text to this effect should be added to the brief
discussion of async? Me... I'd say both caveats are perhaps not
overly central in this overview: That there will be resource limits
is, I think, more or less expectable. And it would be the
availability of partial results up front that would require
mentioning in my view (but then I'm an implementor for whom this kind
of thing is an actual effort...).
On the other hand, I'm strongly against including these points
either, except that I think "bullet proof" doesn't quite fit what we
ought to say (perhaps: "note that resource limits even apply to async
queries" or so, although that sounds a bit trite to me). Perhaps
let's merge and you add the extra material in another PR?
|
The API presents both options because there's not clear winner but it makes it difficult for the user to choose the rights one without supporting information on the benefits and drawbacks. I remember it was one of the questions I wondered myself when I first came across this years ago. |
Given the approval, the fact that the PR is clearly an improvement over what we have atm, and the sentiment in Thank you @chmmao! |
@astropy/pyvo-maintainers - note to ourselves: older PRs like this one should be rebased before merge to get a more recent CI status, so to ensure we don't add a breakage into |
DOC: Improvements on doc for TAP service and other minor changes
Greetings,
I would like to add some extensions of the TAP doc from a beginner user perspective, to make it more detailed and easier to start with.
Change suggestions: