Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better docs and infrastructure for unified read/write functions #8255

Merged
merged 23 commits into from Apr 20, 2019

Conversation

@taldcroft
Copy link
Member

taldcroft commented Dec 10, 2018

This PR is a decent-sized update to the unified I/O infrastructure, with the main driver being to make it possible to get format-specific documentation in a simple way. This is done with (e.g.):

>>> Table.read.help('ascii.latex')
>>> Table.write.help('fits')
>>> CCDData.read.help('fits')
>>> Table.read.help()  # Generic reader help
>>> Table.read?  # Basically the same thing

This prints output to the console via a pager (pydoc.pager) and includes the generic read/write interface help and format-specific docs.

In order to make the read.help and write.help methods in a clean way I ended up with some new general-purpose infrastructure for defining read and write methods as class descriptors. Along the way I cleaned up the io.ascii docs to make them more consistent within this framework.

I also did cleanup of the Table registry by removing the custom-installed 'ascii' and 'csv' entry points in favor of generic mechanisms now in place.

NOTE for future readers: the comments up through #8255 (comment) are mostly overtaken by the new code.

@@ -414,6 +328,8 @@ def read(table, guess=None, **kwargs):

return dat

read.__doc__ = core.READ_DOCSTRING

This comment has been minimized.

Copy link
@taldcroft

taldcroft Dec 10, 2018

Author Member

Needed to avoid circular import.

@mhvk

This comment has been minimized.

Copy link
Contributor

mhvk commented Dec 10, 2018

Yes to having this information!

A possible alternative implementation might be to have read have attributes which have the relevant docstring (and possibly can be used as short-cuts to reading), i.e.,

Table.read?
# Tells "look at table.read.format"
Table.read.fits?
# Gives information on how to read a latex-formatted file.

# MAYBE???
t = Table.read.fits('event.fits')  # same as Table.read('event.fits', format='fits')

To implement this, read will have to become an instance of a "TableReader" class that has the appropriate general docstring, a __call__ method that implements the current read class method, and a __getattr__ which helps get the format documenation. (Possibly, this could be pushed up to the io_registry.read implementation; not sure).

One advantage would be that it is more discoverable. Arguably, it is also a bit more pythonic, but it may well be too tricky to get right... Certainly your implementation here has the benefit of simplicity!

@taldcroft

This comment has been minimized.

Copy link
Member Author

taldcroft commented Dec 10, 2018

@mhvk - very nice idea, I like it on every level! The main question is whether the implementation can be kept manageable. Mostly I think it isn't too bad, but there is the issue of the hierarchy of readers, namely:

t = Table.read('file.fits')  # auto-identify
t = Table.read.fits('file.fits')
t = Table.read.ascii('file.dat')
t = Table.read.ascii.ecsv('file.ecsv')

So presumably a UnifiedReader class needs to have a __call__ method that calls a particular reader and an optional list of registered sub-readers (also UnifiedReader objects). (This should be generalized to any class that is supported in the unified I/O framework).

At that point one starts to think about solving the whole issue where the table __init__.py needs to import (in advance) every class that has a reader. I'm not sure at this point if the import time issue is a real driver.

@mhvk

This comment has been minimized.

Copy link
Contributor

mhvk commented Dec 10, 2018

Yes, indeed, I do worry about the implementation. Though I don't think there is a need to "pre-import" too much - that can be done in the __getattr__ method.

The recursion may be trickier, as one wants to be sure not to reimplement things that python and/or the io registry already do - indeed, this is partially why I wondered if we could push part of this up to io.registry (i.e., should it be the module gaining the new Reader class, with Table.read just a thin veneer?)

Copy link
Member

astrofrog left a comment

I don't think using attribute notation would necessarily work since some formats (defined outside astropy) may not have Python-variable-compliant names?

In any case, this here is much better than the current situation, so I would actually propose that we consider merging this and then consider making more substantial changes if we really want.

This should be mentioned in the unified I/O docs though (and anywhere Table.read is mentioned and in the Table.read docstring)

if format and not args and not kwargs:
reader = io_registry.get_reader(format, cls)
reader_doc = ('Table.read() general documentation\n'
'==================================\n')

This comment has been minimized.

Copy link
@astrofrog

astrofrog Dec 10, 2018

Member

The headings don't look quite right - the top heading seems a bit 'stuck' to the Table.read call - maybe try with heading lines above too? (since I guess we don't need to be ReST compliant here?)

@taldcroft

This comment has been minimized.

Copy link
Member Author

taldcroft commented Dec 11, 2018

Good points @astrofrog.

I suspect in the wild there are not many (probably zero) user-defined formats that have a name that doesn't work as an attribute. The most likely problem is - which is commonly translated to _.

But the issue of just getting something that works without going crazy (given the reality of time constraints) is well-taken. I can clean this implementation up a bit, include the writers, and we'll see where it goes.

@taldcroft taldcroft changed the title POC: better docs for unified read/write functions [skip-ci] Better docs and infrastructure for unified read/write functions Dec 16, 2018
@taldcroft taldcroft mentioned this pull request Jan 26, 2019
3 of 3 tasks complete
@pllim pllim added this to the v3.2 milestone Feb 8, 2019
@astrofrog astrofrog self-requested a review Feb 26, 2019
Copy link
Member

astrofrog left a comment

@taldcroft - this looks great! In addition to the in-line comments, one thing I was wondering was whether when a user does:

Table.read.help('ascii') 

it would make sense to just show the format-specific docs, but to include a sentence saying that for general information about the read method, just to do Table.read.help()? As a user, I feel like the workflow I might follow is to first check the help for Table.read, then to find out that I can do Table.read.help('ascii') so I don't necessarily want to see the general docs again. Alternatively, you could swap the order to have the format-specific docs first?

see: https://github.com/astropy/astropy-APEs/blob/master/APE6.rst.
"""ECSV (Enhanced Character Separated Values) format table.
Th ECSV format allows for specification of key table and column meta-data, in

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

Th -> The

This parameter is ignored when the ``output`` arg is not a string
(e.g., a file object).
Writer : ``Writer``
Writer class (DEPRECATED).

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

I wonder if it's time to finally remove Reader/Writer?

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 11, 2019

Author Member

Punting to #8567 and 4.0 release.

output : str, file_like
Output [filename, file-like object]. Defaults to``sys.stdout``.
format : str
Output table format. Defaults to 'basic'.

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

I wonder whether when this docstring is used for a specific format it would make sense to not list format? In other words, maybe that is a keyword that is specific to the generic read and write methods?

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 11, 2019

Author Member

Good point but I can't think of a clean and easy way to handle this. For now I'd just say to live with this.

Get help on the writer for a particular format (e.g. 'fits') using the
``help()`` method::
>>> CCDData.write.help('fits') # Get detailed help on FITS writer

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

I wonder if we should also provide a way and document how to get a list of available format

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 11, 2019

Author Member

Done

@@ -8,6 +8,31 @@ For many common cases this will simplify the process of file I/O and reduce the
master the separate details of all the I/O packages within Astropy. For
details on the implementation see :ref:`io_registry`.

Getting started with Image I/O
==============================

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

What is the rationale for having this section here? Should it not be in the CCDData docs?

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 11, 2019

Author Member

The rationale was that the Unified I/O was never supposed to be just for Table, so I'm trying to make the docs more inclusive / symmetric with Table.

.format(cls.__name__, method_name))
reader_doc = re.sub('.', '=', header)
reader_doc += header
reader_doc += re.sub('.', '=', header)

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

Can you add a newline after the heading? That is,

================================
Table.read general documentation
================================

Read and parse a data table and return as a Table.

instead of

================================
Table.read general documentation
================================
Read and parse a data table and return as a Table.

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 11, 2019

Author Member

Done.

Notes
-----

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 8, 2019

Member

Maybe remove the newline here? At the moment this has too many newlines in some cases:


Notes
-----


The available built-in formats are:

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 11, 2019

Author Member

Done.

@taldcroft taldcroft force-pushed the taldcroft:unified-docs branch from e238ce9 to 94dc13c Apr 11, 2019
@taldcroft taldcroft force-pushed the taldcroft:unified-docs branch from 94dc13c to d4c0f44 Apr 11, 2019
@taldcroft

This comment has been minimized.

Copy link
Member Author

taldcroft commented Apr 11, 2019

@astrofrog - I think I have addressed all your comments.

@astrofrog astrofrog self-requested a review Apr 11, 2019
Copy link
Member

bsipocz left a comment

My main comment is that using intersphinx linking rather than hard wired urls is preferable.
The other comment is whether it's possible to change the order of the doc, so the format specific part comes up before the generic one.

The rest is then just nitpicking.

astropy/io/ascii/docs.py Show resolved Hide resolved
----------
table : str, file-like, list, pathlib.Path object
Input table as a file name, file-like object, list of strings,
single newline-separated string or pathlib.Path object .

This comment has been minimized.

Copy link
@bsipocz

bsipocz Apr 11, 2019

Member

could you link `pathlib.Path`

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 12, 2019

Author Member

Done.

Reader : `~astropy.io.ascii.BaseReader`
Reader class (DEPRECATED)
encoding: str

This comment has been minimized.

Copy link
@bsipocz

bsipocz Apr 11, 2019

Member
Suggested change
encoding: str
encoding : str

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 12, 2019

Author Member

Done.

astropy/io/ascii/docs.py Show resolved Hide resolved
table for each chunk. The default is to return a single stacked table
for all the chunks.
Reader : `~astropy.io.ascii.BaseReader`

This comment has been minimized.

Copy link
@bsipocz

bsipocz Apr 11, 2019

Member

I think we normally remove the deprecated parameters from the docstring when they get deprecated. Same for Writer below.

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 12, 2019

Author Member

Done

See also:
- http://docs.astropy.org/en/stable/io/ascii/

This comment has been minimized.

Copy link
@bsipocz

bsipocz Apr 11, 2019

Member

same here, if possible use intersphinx instead

This comment has been minimized.

Copy link
@taldcroft

taldcroft Apr 12, 2019

Author Member

Won't fix.

astropy/io/ascii/fixedwidth.py Show resolved Hide resolved
astropy/table/connect.py Show resolved Hide resolved
astropy/table/connect.py Show resolved Hide resolved
astropy/table/connect.py Show resolved Hide resolved
@taldcroft

This comment has been minimized.

Copy link
Member Author

taldcroft commented Apr 12, 2019

@bsipocz - comments all addressed, I think.

@taldcroft taldcroft force-pushed the taldcroft:unified-docs branch from d947005 to 8c0ce56 Apr 12, 2019
@codecov

This comment has been minimized.

Copy link

codecov bot commented Apr 12, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@780d1f4). Click here to learn what that means.
The diff coverage is 88%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #8255   +/-   ##
=========================================
  Coverage          ?   86.89%           
=========================================
  Files             ?      399           
  Lines             ?    59197           
  Branches          ?     1100           
=========================================
  Hits              ?    51442           
  Misses            ?     7114           
  Partials          ?      641
Impacted Files Coverage Δ
astropy/io/ascii/daophot.py 98.76% <ø> (ø)
astropy/io/ascii/cds.py 96.63% <ø> (ø)
astropy/nddata/ccddata.py 94.59% <ø> (ø)
astropy/io/ascii/rst.py 100% <ø> (ø)
astropy/io/ascii/ipac.py 97.47% <ø> (ø)
astropy/io/ascii/html.py 97.87% <ø> (ø)
astropy/io/ascii/sextractor.py 98.48% <ø> (ø)
astropy/io/ascii/ecsv.py 93.75% <ø> (ø)
astropy/io/ascii/latex.py 97.88% <ø> (ø)
astropy/io/ascii/fixedwidth.py 96.75% <ø> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 780d1f4...ebddb39. Read the comment docs.

@bsipocz

This comment has been minimized.

Copy link
Member

bsipocz commented Apr 12, 2019

Thanks @taldcroft!

@pllim

This comment has been minimized.

Copy link
Member

pllim commented Apr 19, 2019

Please rebase to resolve conflicts.

@bsipocz bsipocz force-pushed the taldcroft:unified-docs branch from 8c0ce56 to ebddb39 Apr 19, 2019
@bsipocz

This comment has been minimized.

Copy link
Member

bsipocz commented Apr 19, 2019

Rebased, this should be now good to go.

@pllim pllim merged commit ed4ccb4 into astropy:master Apr 20, 2019
15 checks passed
15 checks passed
astropy-bot:changelog Changelog entry consistent with milestone
astropy-bot:milestone This pull request has a milestone set.
ci/circleci: 32bit Your tests passed on CircleCI!
Details
ci/circleci: egg-info-37 Your tests passed on CircleCI!
Details
ci/circleci: html-docs Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl202 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl212 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl222 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl300 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpldev Your tests passed on CircleCI!
Details
codecov/patch No report found to compare against
Details
codecov/project No report found to compare against
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
giles Click details to preview the documentation build
Details
@taldcroft taldcroft deleted the taldcroft:unified-docs branch Oct 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.