Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow round-trip of MaskedColumn in ECSV, FITS, HDF5 #7481

Merged
merged 7 commits into from
Jul 16, 2018

Conversation

taldcroft
Copy link
Member

@taldcroft taldcroft commented May 18, 2018

This provides a workaround for #4708. It allows fully robust round-trip of MaskedColumn in ECSV, FITS, and HDF5 by explicitly writing the mask as a separate column (if there are any masked elements). However, the new machinery is only enabled by default for HDF5, where writing a masked column is currently entirely unsupported.

For FITS the standard is mostly adequate, with the exception that it may not always be possible to choose a TNULL value that is not represented within the data. In this case it would be useful to issue a warning when writing, and point to a new opt-in feature to use this MaskedColumn serialization and store the data and mask separately.

For ECSV the current standard of using "" is robust for all cases except storing a blank entry in a string column. Like FITS, this situation could generate a warning on write and allow for explicitly choosing to separate the data and mask.

To get MaskedColumn to work I had to introduce some generalizations in the table serialization machinery. I added some more inline docs, though this really would deserve a longer treatment in some design docs somewhere (in a perfect world).

@astropy-bot
Copy link

astropy-bot bot commented May 18, 2018

Hi there @taldcroft 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

Everything looks good from my point of view! 👍

If there are any issues with this message, please report them here.

@taldcroft
Copy link
Member Author

BTW, test failures are numpy upstream (numpy/numpy#11121) but I'll fix.

Copy link
Member Author

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some inline comments to help out reviewers. The real code changes are actually fairly minimal.

table = table_cls(table, names=kwargs.get('names'))
# Ensure that `table` is a Table subclass.
names = kwargs.get('names')
if isinstance(table, Table):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been refactored a bit but the only real diff is setting serialized_meta where appropriate.

def __init__(self, bound=False):
super().__init__(bound)

# If bound to a data object instance then create the dict of attributes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to init code for TimeInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see that being reusable!

# If not a mixin, or if class in ``exclude_classes`` tuple then
# treat as a normal column. Excluded sub-classes must be explicitly
# specified.
if not has_info_class(col, MixinInfo) or col.__class__ in exclude_classes:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a key change here to not use "being a mixin" as the criteria for getting into the serialize code, instead ask the object if it has columns that need to be serialized separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes OOP to the rescue!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adjust the comment above this line to reflect the change.

@@ -51,52 +56,44 @@ def _represent_mixin_as_column(col, name, new_cols, mixin_cols,
# - description: DO store
# - meta: DO store
info = {}
for attr, nontrivial, xform in (('unit', lambda x: x not in (None, ''), str),
for attr, nontrivial, xform in (('unit', lambda x: x is not None and x != '', str),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work around bug in unit where doing an equality test of None with an invalid unit raises an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment in the code for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better still, raise an issue! That should be relatively trivial to fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New issue #7603.

I'm not adding a comment about this because in retrospect the x not in (None, '') code is non-ideal anyway. It effectively expands to testing x == None instead of the preferred x is None test.

data_attrs = [key for key in ordered_keys if key in obj_attrs and
getattr(obj_attrs[key], 'shape', ())[:1] == col.shape[:1]]

for data_attr in data_attrs:
data = obj_attrs[data_attr]
if len(data_attrs) == 1 and not has_info_class(data, MixinInfo):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this ad-hoc rule (and mostly repeated code) with explicit test that uses col.info._represent_as_dict_primary_data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better indeed (even if not completely sold on profusion of names in info - at some point, we should have a nice, explicit API, but right now I think we're still finding out what needs to be in it!).

p.s. Having looked further down now: Do very much like the added comments.

for col in tbl.itercols():
_represent_mixin_as_column(col, col.info.name, new_cols, mixin_cols,
exclude_classes=exclude_classes)

# If no metadata was created then just return the original table.
if not mixin_cols:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For code clarity this should probably be named serialized_cols now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you have the energy, that would be a good change. But perhaps easier to do in a follow-up PR, as this one is large enough already!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to make this change be a followup action.

Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I am quite jet lagged so someone else should review this too! ✈️

Looks good to me, just a few small comments below - in addition I wonder whether we might want to think about actually having extensions to the FITS format for this kind of thing, so that users could easily write files from other languages that follow this serialization convention. For instance we could have TMASK4 = 5 mean that column 5 is a mask for column 4. I'm not suggesting we solve this now since we are already using the YAML for other things, but it would be nice to provide ways to have better inter-operability with other packages than Astropy.

Maybe the right thing to do is to think about cases like this which could even belong in the FITS standard itself and propose them.

assert np.all(t2['a'] == [1, 2, 3])
assert np.all(t2['b'][:2] == [1.0, 2.0]) # OK
assert np.isnan(t2['b'][2]) # WRONG
assert np.all(t2['c'] == ['c', 'N', 'e']) # WRONG ('N' is first char of 'N/A')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather have an xfail test with the right asserts than a test with the wrong asserts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @astrofrog here.

Also, for someone who comes across this later, it would probably be good to put a link to the relevant github issue .

Alternatively to both (and perhaps better), briefly explain why things go wrong here.

Copy link
Member Author

@taldcroft taldcroft Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end I think it makes more sense to take this "failure regression" test out and open an issue that can be tracked. XFAILing tests is not the most sensible / useful way to track bugs. I have edited the original #4708 issue to reflect these additional problems and removed this test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; just be sure there is a link to that issue in the test file.

@@ -51,52 +56,44 @@ def _represent_mixin_as_column(col, name, new_cols, mixin_cols,
# - description: DO store
# - meta: DO store
info = {}
for attr, nontrivial, xform in (('unit', lambda x: x not in (None, ''), str),
for attr, nontrivial, xform in (('unit', lambda x: x is not None and x != '', str),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment in the code for this

3 -- e True

>>> for col in t.itercols():
... col.info.serialize_method['fits'] = 'data_mask'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a keyword argument to the Table write function for FITS files (or maybe even at a higher level so it works for HDF5 and ECSV) so that one can do:

t.write('myfile.fits', serialize_method='data_mask')

I think this would be cleaner for users than having to loop as above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed the interface is a little clunky and maybe we can come up with something better. The problem is that we are dealing with different formats that have different requirements and format-specific rules and defaults. tl;dr - it's a little complicated, and basically instead of thinking to invent something elegant I just shoe-horned onto the existing interface.

serialize_method came into being as a way to specify that, by default, a Time object should be serialized to most ASCII table formats (including ECSV) as the formatted value, but for FITS and HDF5 it should be a jd1/jd2 tuple. The code doing the conversion would not necessarily know the context (FITS or ECSV), so we put in a context method to provide that. So:

  1. serialize_method defines default reasonable behaviors for column types that depend on the format (FITS, HDF5, etc). Previously there was only Time, but now also MaskedColumn.
  2. serialize_method exists as an Info attribute that can be customized.

Within a single table you can have Time and MaskedColumn objects, so any high-level serialize_method needs to allow a dict keyed by column name. Not a problem, though if you have a masked table with Time columns you will need to fill in the dict appropriately for every column anyway.

One way forward is to open an issue to define a better high-level interface. That is effectively independent of this PR, because no matter what you will need to retain the Info.serialize_method attribute for the internal code. This provides a very convenient way to get this information down from the user interface into the low-level routines that need it. (And of course it is already in the public API for Time). If there is a better UI that could be what people see in the user docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. Just to clarify, we could implement this keyword argument just for the FITS writer though.

@taldcroft
Copy link
Member Author

@astrofrog - I agree that working for broader interoperability is worthwhile. Having said that, I think so far we have adopted a well-defined two pronged approach:

  1. Work entirely within existing FITS conventions / standards and accept the limitations.
  2. Use the YAML serialized meta machinery to enable lossless serialization of astropy objects.

I fully appreciate that the YAML serialized meta is not in the spirit of FITS, and one can argue we shouldn't be going down this path. But my impression that the user base of research astronomers do want the convenience that it brings.

The issue I see with trying to establish new FITS conventions to chip away at (2) is that you quickly run into problems and it turns into a slippery slope. TMASK4 = 5 could do the job for MaskedColumn, but most of the other astropy objects (e.g. Longitude) are not handled, and even for MaskedColumn if there is non-trivial meta for that column then it gets lost. So you end up with a fragmented solution which is confusing.

One thing that would certainly be a good thing is to carefully document the __serialized_columns__ data model. This could be a new APE which extends APE-6, but at least with this written down as a standard then other tools could interpret or even create the information.

@astrofrog
Copy link
Member

I agree with your points - I think what we really need is a binary equivalent of ECSV, which at this point is basically ASDF. So we might want to recommend that if people want to save to a binary format that preserves all the Astropy-ness, the best solution is ASDF?

@perrygreenfield
Copy link
Member

@astrofrog You won't get an argument from me. But I do think we need other compelling examples to motivate use of asdf. I'll try to work on that in the coming months.

Copy link
Contributor

@saimn saimn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand all the serialization implementation, but at least the idea implemented here makes sense so 👍 from my side (modulo @astrofrog 's comments). And it would be good to have an easier API, as discussed above (this can be for a follow-up PR of course).

sentinel values according to the FITS standard:

- ``NaN`` for float columns
- Value of ``TNULLn`` for integer columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention Column.fill_value here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@saimn
Copy link
Contributor

saimn commented May 23, 2018

For FITS the standard is mostly adequate, with the exception that it may not always be possible to choose a TNULL value that is not represented within the data. In this case it would be useful to issue a warning when writing, and point to a new opt-in feature to use this MaskedColumn serialization and store the data and mask separately.

👍 for adding the warning, this would close the original issue. I can have a look if you want.

@taldcroft
Copy link
Member Author

What about adding a keyword serialize_method to Table.write. It would require a dict, which can contain keys of two different categories:

  • class, such as Time or MaskedColumn. This would apply the corresponding serialize_method value to all columns which are an instance of the type.
  • str, This would apply serialize_method to the column of that name.
    In the case where both apply, the str key version (column name) would have precedence.

So an example would be t.write(filename, serialize_method={Time: 'jd1_jd2', 'x': 'data_mask'}).

This would set all the appropriate col.info.serialize_method[format] = value in a context manager around the call to the appropriate write function. (I haven't looked at the registry code in a while to remember if there are any stumbling blocks, but seems like it should be straightforward).

With this it would make sense to de-emphasize (or remove?) refs to the info.serialize_method dict.

@astrofrog
Copy link
Member

What about adding a keyword serialize_method to Table.write. It would require a dict, which can contain keys of two different categories:

Sounds good but can we also make it accept a string for the case when the same serialization method can be applied to all columns?

@taldcroft
Copy link
Member Author

I've added a couple of commits that implement the suggestion to allow a serialize_method kwarg in Table.write. In theory this PR is ready for final review from the code perspective, I just need to update the narrative docs.

@taldcroft
Copy link
Member Author

This is really ready for final review now.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcroft - I finally had time to review; this is great! It makes the whole serialization part even clearer. Apart from my question about using TableColumns in the latter,
my comments are mostly nitpicks, asking for a line or two of extra comments, etc.

Do also address the few leftover comments from @astrofrog.

CHANGES.rst Outdated
@@ -72,6 +77,10 @@ astropy.io.fits

- Add an ``ignore_hdus`` keyword to ``FITSDiff`` to allow ignoring HDUs by
NAME when diffing two FITS files [#7538]
- Optionally allow writing masked columns to FITS with the mask explicitly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase artefact -> add new line.

if isinstance(table, Table):
# Note that making a copy of the table here is inefficient but
# without this copy a number of tests break (e.g. in test_fixedwidth).
# TODO: investigate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you raise an issue? (perhaps pointing also to the problem below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert np.all(t2['a'] == [1, 2, 3])
assert np.all(t2['b'][:2] == [1.0, 2.0]) # OK
assert np.isnan(t2['b'][2]) # WRONG
assert np.all(t2['c'] == ['c', 'N', 'e']) # WRONG ('N' is first char of 'N/A')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @astrofrog here.

Also, for someone who comes across this later, it would probably be good to put a link to the relevant github issue .

Alternatively to both (and perhaps better), briefly explain why things go wrong here.

def __init__(self, bound=False):
super().__init__(bound)

# If bound to a data object instance then create the dict of attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see that being reusable!

# if an exception occurred.
if serialize_method:
for col in tbl.itercols():
# Go through every column and if it has a serialize_method info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a copy/paste artefact, I think. At least I expected it to note that this would restore the original method.

Also, why not directly iterate over name, original_value in original_sms.items() and do tbl[name].info.serialize_method = original_value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I think I just did the copy/paste without thinking.

data_attrs = [key for key in ordered_keys if key in obj_attrs and
getattr(obj_attrs[key], 'shape', ())[:1] == col.shape[:1]]

for data_attr in data_attrs:
data = obj_attrs[data_attr]
if len(data_attrs) == 1 and not has_info_class(data, MixinInfo):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better indeed (even if not completely sold on profusion of names in info - at some point, we should have a nice, explicit API, but right now I think we're still finding out what needs to be in it!).

p.s. Having looked further down now: Do very much like the added comments.

for col in tbl.itercols():
_represent_mixin_as_column(col, col.info.name, new_cols, mixin_cols,
exclude_classes=exclude_classes)

# If no metadata was created then just return the original table.
if not mixin_cols:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you have the energy, that would be a good change. But perhaps easier to do in a follow-up PR, as this one is large enough already!

@@ -150,6 +154,32 @@ def _construct_mixin_from_obj_attrs_and_info(obj_attrs, info):
return mixin


class _TableLite(OrderedDict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be an idea to use TableColumns instead? It would mean either adding a few properties to it, or slightly reworking _construct_mixin_from_columns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods/properties needed here are a strictly orthogonal set to TableColumns. The only thing in common is the OrderedDict base and the concept of containing columns. So I think it makes sense to not clutter up the core TableColumns class for this purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's leave it for now. In the end we'd probably want Table to be fast enough that it doesn't matter...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Table is changed to allow a mixture of MaskedColumn and Column in the same table, then we can probably revert this. Performance wasn't the driver, but the issue in the comment for this class.

passed through to the underlying data reader (e.g. `~astropy.io.ascii.write`).
The arguments and keywords (other than ``format`` and
``serialize_methods``) provided to this function are passed through to
the underlying data reader (e.g. `~astropy.io.ascii.write`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a Parameters section here that describes format and serialize_method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent idea, and one that I applied to Table.read as well.

and the ``mc1`` column written as a data / mask pair and
write to ECSV:

.. doctest-skip::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised the doctest fails; normally it happily ignores whitespace differences.

@taldcroft
Copy link
Member Author

@mhvk - thanks for the review. I just got back to this and addressed your comments in the final 2 commits. I squashed the previous 19 to make the rebase manageable, and probably a fine idea anyway.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taldcroft - looks great, only one post-rebase cleanup and a nitpick. Also what is probably a general question: I wasn't sure why the writing tests in the documentation are skipped.

@astrofrog - I think your comments have all been addressed, could you check/change your review status?

CHANGES.rst Outdated
@@ -59,6 +59,10 @@ astropy.io.ascii
- Emit a warning when reading an ECSV file without specifying the ``format``
and without PyYAML installed. Previously this silently fell through to
parsing as a basic format file and the file metadata was lost. [#7580]
- Optionally allow writing masked columns to ECSV with the mask explicitly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs empty line above.

# If not a mixin, or if class in ``exclude_classes`` tuple then
# treat as a normal column. Excluded sub-classes must be explicitly
# specified.
if not has_info_class(col, MixinInfo) or col.__class__ in exclude_classes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adjust the comment above this line to reflect the change.

out : `Table`
Table corresponding to file contents

Notes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the notes section without actual notes? Was this meant to have something added for both read and write after the class is defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I got completely confused by this. Turns out it was some past "cleverness" [my fault!] and these docstrings get munged to add a table of available formats. So without the "Notes" section the content gets dumped into the Return or Parameters section and the sphinx output is broken. And if you do a Table.read? you'll see the table of formats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that is good to know -- but to ensure confusion doesn't recur, let's add a comment!

that tells the ECSV reader to interpret the ``'c'`` and ``'c.masked'``
columns as components of one `~astropy.table.MaskedColumn` object:

.. doctest-skip::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test really not work?

of later unmasking the values.

Astropy provides a work-around for this limitation that users can choose to
use. The key part using the ``serialize_method='data_mask'`` keyword argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword: The key part is to use the

2 2.0 -- False
3 -- e True

.. doctest-skip::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we are skipping because this writes in the wrong place? I'm still a bit confused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC (maybe not) some of the skipping was because PyYAML is not installed for docs generation / testing. There was, at some point, a reason I put in the doctest-skip. Also, in general I assume that docstring examples which demonstrate writing to a local file should be skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me ping @astrofrog and @bsipocz - my own sense would be to have as many doctests as feasible, and it would seem OK to require yaml for them (or treat it like scipy?)

Anyway, fine to treat this as out of scope for this PR!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And basically I don't have a good relationship with sphinx and doc testing. For instance in that unified.rst file I changed a line of code within a not-skipped section to obviously fail, and my local testing (e.g. python setup.py test --package=io --args='-k rst') showed no problem. So without having a way to reliably test locally (and not having time / desire to understand our sphinx testing), I don't want to risk burning more CI time.

Copy link
Member

@bsipocz bsipocz Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's perfectly fine to require all optional dependencies for the docs build, and it looks like we have added pyyaml at some point to the docs build.

However doctests runs along with normal tests, so the incantation you need (most probably) is .. doctest-requires:: pyyaml before all code snippets that requires pyyaml (or any other optional dependencies). Example for scipy is e.g. in this file docs/stats/lombscargle.rst

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bsipocz, nice to know! Since this is passing tests now, maybe I'll let this slide for now and clean it up later in some other context to avoid burning CI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. That docs cleanup seems to me an ideal first PR issue, so if you could open a reminder for it we can label it accordingly to be picked up by someone else.

@taldcroft taldcroft force-pushed the masked-roundtrip branch 2 times, most recently from 64967c9 to 2b6d861 Compare July 5, 2018 15:41
@taldcroft
Copy link
Member Author

@mhvk - good to merge? I also believe that I addressed all the comments from @astrofrog .

@mhvk
Copy link
Contributor

mhvk commented Jul 5, 2018

@taldcroft - if you can, could you add/adjust the comments (see first two queries in my last review). That can be skip ci to not waste too much electricity.

- Generalize detection of col types requiring serialization
- Rework construct_mixins_from_columns to make Table at end
- Allow round-trip of MaskedColumn in ecsv, FITS, HDF5
- Fix issue with invalid unit in serialized columns
- Add serialize_method attr to MaskedColumn for stability of FITS, ECSV
- Revert to always copying table for io.ascii writing
- Update changelog [skip ci]
- Add narrative docs
- Fix errors and improve narrative docs
- Add serialize_methods kwarg to Table.write interface [skip ci]
- Allow for single string as serialize_method
- Handle corner case of ECSV with blank string in data
- Update ECSV narrative docs
- Update FITS docs for serialize_method
- Handle exceptions correctly
- Change default MaskedColumn serialize method from None to 'null_value'
- More narrative docs
- Add another changelog entry
- More docs tweaks / fixes
@taldcroft
Copy link
Member Author

@bsipocz - can you (or someone) restart the ci/circleci 32bit tests? This one never got off the ground. The changes since last time passing are trivial, but I still feel uncomfortable merging without see at least that set of tests passing.

@bsipocz
Copy link
Member

bsipocz commented Jul 16, 2018

@taldcroft - Done.

@taldcroft taldcroft dismissed astrofrog’s stale review July 16, 2018 13:28

Comments addressed

@taldcroft taldcroft merged commit 8217143 into astropy:master Jul 16, 2018
@taldcroft taldcroft deleted the masked-roundtrip branch July 16, 2018 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants