Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Improvements on doc for TAP service and other minor changes #436

Merged
merged 18 commits into from
Jul 6, 2023

Conversation

chmmao
Copy link
Contributor

@chmmao chmmao commented Mar 15, 2023

Greetings,

I would like to add some extensions of the TAP doc from a beginner user perspective, to make it more detailed and easier to start with.

Change suggestions:

  1. overall extension of the documentation on how to use TAP service, e.g. sync and async query
  2. add links to parameter/class description for each service protocol
  3. add some doc strings for several functions
  4. minor fixes on spelling

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I had a few comments, 3 of which I would like to see changed (fix the code examples, remove duplicated heading name and the references to topcat. Python users may very well not know topcat and may prefer not to use GUIs, so self-contained documentation is much nicer than referencing totally unrelated tools and their features), the rest are more comments that can go either way, especially if the maintainers disagree with me.

Comment on lines 187 to 191
available - discussed below. Note that for exploratory query
construction, you can also use interactive TAP clients such as
TOPCAT_, which include table browsers.

.. _TOPCAT: http://www.star.bris.ac.uk/~mbt/topcat/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a how-to for TAP, but a doc page for pyvo, please remove the cross reference.

@@ -146,10 +146,168 @@ Table Access Protocol

-- `Table Access Protocol <https://www.ivoa.net/documents/TAP/>`_

Getting started
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a getting started heading in this very same document, I strongly suggest not to repeat it (it also messes up the heading references)

Comment on lines +194 to +195
`GAVO's ADQL course`_; it is basically a standardised subset of SQL
with some extensions to make it work better for astronomy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of ADQL is already provided in the first paragraph, no need for the second half of the sentence.
I'm also on the fence about the ADQL tutorial, but if @msdemlei wants to have the link here, then it should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I give you the ADQL course shows its age, but frankly I think it would be a good idea to link to some accessible ADQL resource. The ADQL spec definitely is a terrible place to send anyone but an implementor to...

So, I'm happy to yield to any other gentle ADQL intro, but we really should avoid sending folks new to SQL-like stuff to IVOA documentation, and not saying anything about "what's this ADQL that I'm supposed to enter" isn't a good option either.

Talking about which: I think we're not saying anything about TAP examples anywhere yet? They're also great to get people over the first few metres (try the examples attribute of a TAPService object). Would this fit here, in addition?


.. _GAVO's ADQL course: https://docs.g-vo.org/adql

Synchronous vs. asynchronous query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether this belongs to this exact document. Again, the opening paragraph mentions them, and they are well-defined in the specs, so it's not exactly pyvo's task to explain what they mean on the server side.

(...)

The job URL mentioned before is available in the ``url`` attribute;
you can use that URL later to resume the job even from clients like TOPCAT.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, why is there a TOPCAT reference?

Comment on lines 161 to 166
>>> """ SELECT TOP 5
>>> source_id, ra, dec, phot_g_mean_mag
>>> FROM gaia.dr3lite
>>> WHERE phot_g_mean_mag BETWEEN 19 AND 20
>>> ORDER BY phot_g_mean_mag
>>> """)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix multi-line syntax, ensure that all doctest run and pass with locally running pytest docs/dal/ --remote-data

Suggested change
>>> """ SELECT TOP 5
>>> source_id, ra, dec, phot_g_mean_mag
>>> FROM gaia.dr3lite
>>> WHERE phot_g_mean_mag BETWEEN 19 AND 20
>>> ORDER BY phot_g_mean_mag
>>> """)
... """ SELECT TOP 5
... source_id, ra, dec, phot_g_mean_mag
... FROM gaia.dr3lite
... WHERE phot_g_mean_mag BETWEEN 19 AND 20
... ORDER BY phot_g_mean_mag
... """)

remove repeated heading name
@chmmao
Copy link
Contributor Author

chmmao commented Mar 22, 2023

Thanks for the PR!

I had a few comments, 3 of which I would like to see changed (fix the code examples, remove duplicated heading name and the references to topcat. Python users may very well not know topcat and may prefer not to use GUIs, so self-contained documentation is much nicer than referencing totally unrelated tools and their features), the rest are more comments that can go either way, especially if the maintainers disagree with me.

Thank you for your detailed review. I just fixed the three issues that you mentioned. For the ADQL reference I will wait for @msdemlei 's comments.

@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Merging #436 (cfcbf7f) into main (af6c204) will increase coverage by 0.10%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
+ Coverage   79.84%   79.94%   +0.10%     
==========================================
  Files          52       52              
  Lines        5988     6018      +30     
==========================================
+ Hits         4781     4811      +30     
  Misses       1207     1207              
Impacted Files Coverage Δ
pyvo/dal/query.py 84.81% <ø> (ø)
pyvo/dal/tap.py 73.11% <ø> (ø)

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

>>> tap_service = vo.dal.TAPService("http://dc.g-vo.org/tap")
>>> tap_results = tap_service.search("SELECT TOP 10 * FROM ivoa.obscore")
>>> ex_query = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "explore" step is missing here, IMO. Where do the user find the name of the table and columns?

Some services have a description that can be used here (similar to the SIA service above)

Others (like this one) have examples:

>>> for e in tap_service.examples:
>>> ...     print(e['QUERY'])

One can find the names of the tables:

[t for t in tap_service.tables.keys()]

And the columns in a table:

tap_service.search('select * from gaia.dr3lite', maxrec=1).table.columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply and many thanks for the review. I have added your suggestions accordingly into that location.

Comment on lines 183 to 185
>>> for e in tap_service.examples:
... print(e['QUERY']) # doctest: +SKIP
<query examples>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a firm believer in including working and testable examples in the documentation. Could you please rewrite this in a way that it's not skipped? (and if needed, condense it enough so it runs quickly)

Also, please refrain from using single-letter variables, and use full, descriptive variable names at all times, especially in user facing examples (but also in code).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

Actually, all the doctest: +SKIP I put in the doc are because of their output EITHER are temporarily i.e. might be different later in time, OR the outputs are simply too long (that the output can not be shown fully), as far as I understand from doctest, it checks the code output 1 to 1 with the strings that I wrote in the doc.
In this case, since the output is related to the names of the tables, it can indeed be updated later in time, and the list of the names is also too long. I am not sure if there is an ambiguous way for doctest to solve this issue. I might go back to doctest documentation and check further details.

And for the variable issues, thanks for that hint, sure I will fix it later asap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah also here, the output for the query examples are also too long, there are many examples shown in the output, I could try too write them 1 to 1 into the doc, if that is okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for long outputs ellipses would work, or you can rewrite the example of looking for one special case in there.

For variable outputs, doctest: +IGNORE_OUTPUT is way more preferable than skipping it altogether, as it at least runs the line and spots any syntax, etc issues even if it doesn't compare the expected and actual outputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hints. I have changed the doctest flags accordingly. Also I have to leave one docktest: +SKIP to line 319: job.fetch_result(). This is because the actual query is not faster than doctest's processing speed, so it would turn out an error that the result does not exist yet.
Furthermore I have optimised some code outputs so that docktest: +ELLIPSIS would not be needed.
And dockets: + IGNORE_OUTPUTS is indeed used for variable outputs.

no connection needs to be held, which makes this mode a lot more
robust of long-running queries. It also supports queuing queries,
which allows service operators to be a lot more generous with
resource limits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async queries are not bullet proof either. Service operators might impose limits on the size of the result set which can cause the query to fail or have the result truncated. Also, results might not be available before execution completes (i.e. streaming).

@msdemlei
Copy link
Contributor

msdemlei commented May 19, 2023 via email

@andamian
Copy link
Contributor

The API presents both options because there's not clear winner but it makes it difficult for the user to choose the rights one without supporting information on the benefits and drawbacks. I remember it was one of the questions I wondered myself when I first came across this years ago.

@bsipocz
Copy link
Member

bsipocz commented Jul 6, 2023

Given the approval, the fact that the PR is clearly an improvement over what we have atm, and the sentiment in Perhaps let's merge and you add the extra material in another PR?, I will go ahead and merge this. Any further improvements is welcome in a follow-up PR.

Thank you @chmmao!

@bsipocz bsipocz merged commit 52ae511 into astropy:main Jul 6, 2023
10 checks passed
@bsipocz bsipocz modified the milestones: v1.5, v1.4.2 Jul 6, 2023
@bsipocz
Copy link
Member

bsipocz commented Jul 6, 2023

@astropy/pyvo-maintainers - note to ourselves: older PRs like this one should be rebased before merge to get a more recent CI status, so to ensure we don't add a breakage into main.

bsipocz added a commit that referenced this pull request Aug 16, 2023
DOC: Improvements on doc for TAP service and other minor changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants