Allow round-trip of MaskedColumn in ECSV, FITS, HDF5 #7481
Conversation
Hi there @taldcroft 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃. Everything looks good from my point of view! 👍 If there are any issues with this message, please report them here.
BTW, test failures are numpy upstream (numpy/numpy#11121) but I'll fix.
(force-pushed from a18ef4e to 5ec9778)
I left some inline comments to help out reviewers. The real code changes are actually fairly minimal.
table = table_cls(table, names=kwargs.get('names'))
# Ensure that `table` is a Table subclass.
names = kwargs.get('names')
if isinstance(table, Table):
This has been refactored a bit but the only real diff is setting serialized_meta where appropriate.
def __init__(self, bound=False):
    super().__init__(bound)

# If bound to a data object instance then create the dict of attributes
Similar to init code for TimeInfo.
Nice to see that being reusable!
astropy/table/serialize.py (outdated)
# If not a mixin, or if class in ``exclude_classes`` tuple then
# treat as a normal column. Excluded sub-classes must be explicitly
# specified.
if not has_info_class(col, MixinInfo) or col.__class__ in exclude_classes:
This is a key change: rather than using "being a mixin" as the criterion for entering the serialize code, we now ask the object itself whether it has columns that need to be serialized separately.
Ah, yes OOP to the rescue!
Maybe adjust the comment above this line to reflect the change.
@@ -51,52 +56,44 @@ def _represent_mixin_as_column(col, name, new_cols, mixin_cols,
     # - description: DO store
     # - meta: DO store
     info = {}
-    for attr, nontrivial, xform in (('unit', lambda x: x not in (None, ''), str),
+    for attr, nontrivial, xform in (('unit', lambda x: x is not None and x != '', str),
Work around bug in unit where doing an equality test of None with an invalid unit raises an exception.
Please add a comment in the code for this
Better still, raise an issue! That should be relatively trivial to fix.
New issue #7603.
I'm not adding a comment about this because in retrospect the x not in (None, '') code is non-ideal anyway. It effectively expands to testing x == None instead of the preferred x is None test.
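To make the distinction concrete, here is a minimal, self-contained sketch; FussyUnit is a hypothetical stand-in (not the astropy unit class) whose equality comparison with None raises, mimicking the bug discussed above:

```python
class FussyUnit:
    """Hypothetical stand-in for an unrecognized unit whose equality
    comparison with None raises, mimicking the bug discussed above."""
    def __eq__(self, other):
        if other is None:
            raise TypeError("cannot compare unit with None")
        return NotImplemented

u = FussyUnit()

# Membership expands to equality tests (u == None, u == ''), so it raises:
try:
    u in (None, '')
    membership_raised = False
except TypeError:
    membership_raised = True

# The identity-based form never calls __eq__ with None:
identity_ok = u is not None and u != ''
```

The membership test is the trap: `x in (None, '')` checks `x == None` under the hood, which hands the comparison to the object's `__eq__`, while `x is None` is a pure identity check that no object can intercept.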
data_attrs = [key for key in ordered_keys if key in obj_attrs and
              getattr(obj_attrs[key], 'shape', ())[:1] == col.shape[:1]]

for data_attr in data_attrs:
    data = obj_attrs[data_attr]
    if len(data_attrs) == 1 and not has_info_class(data, MixinInfo):
Replace this ad-hoc rule (and mostly repeated code) with an explicit test that uses col.info._represent_as_dict_primary_data.
Much better indeed (even if not completely sold on profusion of names in info - at some point, we should have a nice, explicit API, but right now I think we're still finding out what needs to be in it!).
p.s. Having looked further down now: Do very much like the added comments.
for col in tbl.itercols():
    _represent_mixin_as_column(col, col.info.name, new_cols, mixin_cols,
                               exclude_classes=exclude_classes)

# If no metadata was created then just return the original table.
if not mixin_cols:
For code clarity this should probably be named serialized_cols now.
Yes, if you have the energy, that would be a good change. But perhaps easier to do in a follow-up PR, as this one is large enough already!
Agreed to make this change a follow-up action.
Disclaimer: I am quite jet lagged so someone else should review this too!
Looks good to me, just a few small comments below - in addition I wonder whether we might want to think about actually having extensions to the FITS format for this kind of thing, so that users could easily write files from other languages that follow this serialization convention. For instance we could have TMASK4 = 5 mean that column 5 is a mask for column 4. I'm not suggesting we solve this now since we are already using the YAML for other things, but it would be nice to provide ways to have better interoperability with other packages than Astropy.
Maybe the right thing to do is to think about cases like this which could even belong in the FITS standard itself and propose them.
assert np.all(t2['a'] == [1, 2, 3])
assert np.all(t2['b'][:2] == [1.0, 2.0])  # OK
assert np.isnan(t2['b'][2])  # WRONG
assert np.all(t2['c'] == ['c', 'N', 'e'])  # WRONG ('N' is first char of 'N/A')
I think I'd rather have an xfail test with the right asserts than a test with the wrong asserts.
Agree with @astrofrog here.
Also, for someone who comes across this later, it would probably be good to put a link to the relevant GitHub issue.
Alternatively to both (and perhaps better), briefly explain why things go wrong here.
In the end I think it makes more sense to take this "failure regression" test out and open an issue that can be tracked. XFAILing tests is not the most sensible / useful way to track bugs. I have edited the original #4708 issue to reflect these additional problems and removed this test.
Sounds good; just be sure there is a link to that issue in the test file.
docs/io/unified.rst (outdated)
3 -- e True

>>> for col in t.itercols():
...     col.info.serialize_method['fits'] = 'data_mask'
Can we add a keyword argument to the Table write function for FITS files (or maybe even at a higher level so it works for HDF5 and ECSV) so that one can do:
t.write('myfile.fits', serialize_method='data_mask')
I think this would be cleaner for users than having to loop as above.
Agreed the interface is a little clunky and maybe we can come up with something better. The problem is that we are dealing with different formats that have different requirements and format-specific rules and defaults. tl;dr - it's a little complicated, and basically instead of trying to invent something elegant I just shoe-horned this onto the existing interface.
serialize_method came into being as a way to specify that, by default, a Time object should be serialized to most ASCII table formats (including ECSV) as the formatted value, but for FITS and HDF5 it should be a jd1/jd2 tuple. The code doing the conversion would not necessarily know the context (FITS or ECSV), so we put in a context method to provide that. So:

- serialize_method defines default reasonable behaviors for column types that depend on the format (FITS, HDF5, etc.). Previously there was only Time, but now also MaskedColumn.
- serialize_method exists as an Info attribute that can be customized.

Within a single table you can have Time and MaskedColumn objects, so any high-level serialize_method needs to allow a dict keyed by column name. Not a problem, though if you have a masked table with Time columns you will need to fill in the dict appropriately for every column anyway.

One way forward is to open an issue to define a better high-level interface. That is effectively independent of this PR, because no matter what you will need to retain the Info.serialize_method attribute for the internal code. This provides a very convenient way to get this information down from the user interface into the low-level routines that need it. (And of course it is already in the public API for Time.) If there is a better UI, that could be what people see in the user docs.
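As a rough sketch of the lookup just described (the function name, method strings, and default here are illustrative, not astropy internals): each column carries a serialize_method dict keyed by output format, and the writer supplies that format context when resolving the behavior:

```python
def pick_serialize_method(serialize_method, context, default='null_value'):
    # Resolve the per-format behavior for one column; `context` is the
    # output format the writer knows about ('fits', 'hdf5', 'ecsv', ...).
    return serialize_method.get(context, default)

# Illustrative per-column defaults in the spirit of the discussion above:
time_methods = {'fits': 'jd1_jd2', 'hdf5': 'jd1_jd2', 'ecsv': 'formatted_value'}
```

The design point is that the low-level conversion code never needs to know which writer invoked it; it only consults the dict with the context it is handed.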
This makes sense. Just to clarify, we could implement this keyword argument just for the FITS writer though.
@astrofrog - I agree that working for broader interoperability is worthwhile. Having said that, I think so far we have adopted a well-defined two-pronged approach:
I fully appreciate that the YAML serialized meta is not in the spirit of FITS, and one can argue we shouldn't be going down this path. But my impression is that the user base of research astronomers do want the convenience that it brings. The issue I see with trying to establish new FITS conventions to chip away at (2) is that you quickly run into problems and it turns into a slippery slope. One thing that would certainly be a good thing is to carefully document the
I agree with your points - I think what we really need is a binary equivalent of ECSV, which at this point is basically ASDF. So we might want to recommend that if people want to save to a binary format that preserves all the Astropy-ness, the best solution is ASDF?
@astrofrog You won't get an argument from me. But I do think we need other compelling examples to motivate use of asdf. I'll try to work on that in the coming months.
I don't understand all the serialization implementation, but at least the idea implemented here makes sense so 👍 from my side (modulo @astrofrog's comments). And it would be good to have an easier API, as discussed above (this can be for a follow-up PR of course).
docs/io/unified.rst (outdated)
sentinel values according to the FITS standard:

- ``NaN`` for float columns
- Value of ``TNULLn`` for integer columns
Maybe mention Column.fill_value here?
Done.
👍 for adding the warning; this would close the original issue. I can have a look if you want.
What about adding a keyword argument serialize_method to the write function?
So an example would be t.write('myfile.fits', serialize_method='data_mask'). This would set all the appropriate serialize_method attributes. With this it would make sense to de-emphasize (or remove?) refs to the
Sounds good but can we also make it accept a string for the case when the same serialization method can be applied to all columns?
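A sketch of what accepting both forms could look like; the helper name is hypothetical, not the actual astropy implementation. A single string fans out to every column, while a dict keyed by column name passes through:

```python
def normalize_serialize_method(serialize_method, colnames):
    # Hypothetical normalization helper: accept either one string applied
    # to all columns, or a dict keyed by column name.
    if serialize_method is None:
        return {}
    if isinstance(serialize_method, str):
        return {name: serialize_method for name in colnames}
    return dict(serialize_method)
```

Downstream code then only ever has to deal with the dict form.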
(force-pushed from 5ec9778 to cd34393)
I've added a couple of commits that implement the suggestion to allow a serialize_method keyword argument in Table.write.
(force-pushed from dcea5cc to 199a2a4)
This is really ready for final review now.
@taldcroft - I finally had time to review; this is great! It makes the whole serialization part even clearer. Apart from my question about using TableColumns in the latter, my comments are mostly nitpicks, asking for a line or two of extra comments, etc. Do also address the few leftover comments from @astrofrog.
CHANGES.rst (outdated)
@@ -72,6 +77,10 @@ astropy.io.fits

- Add an ``ignore_hdus`` keyword to ``FITSDiff`` to allow ignoring HDUs by
  NAME when diffing two FITS files [#7538]
- Optionally allow writing masked columns to FITS with the mask explicitly
rebase artefact -> add new line.
astropy/io/ascii/ui.py (outdated)
if isinstance(table, Table):
    # Note that making a copy of the table here is inefficient but
    # without this copy a number of tests break (e.g. in test_fixedwidth).
    # TODO: investigate.
Can you raise an issue? (perhaps pointing also to the problem below)
astropy/table/info.py (outdated)
# if an exception occurred.
if serialize_method:
    for col in tbl.itercols():
        # Go through every column and if it has a serialize_method info
This comment is a copy/paste artefact, I think. At least I expected it to note that this would restore the original method.
Also, why not directly iterate over name, original_value in original_sms.items() and do tbl[name].info.serialize_method = original_value?
Good idea. I think I just did the copy/paste without thinking.
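The save/apply/restore pattern under discussion can be sketched as a context manager, with plain dicts standing in for the column .info objects (all names here are illustrative, not the astropy code):

```python
from contextlib import contextmanager

@contextmanager
def serialize_method_as(cols, method):
    # Save the original per-column values up front.
    original_sms = {name: info['serialize_method'] for name, info in cols.items()}
    try:
        for info in cols.values():
            info['serialize_method'] = method
        yield cols
    finally:
        # Restore by iterating directly over the saved originals,
        # even if an exception occurred inside the with-block.
        for name, original_value in original_sms.items():
            cols[name]['serialize_method'] = original_value
```

The try/finally guarantees the originals are restored on both the normal and exceptional exit paths, which is the behavior the review comment is asking the code comment to describe.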
@@ -150,6 +154,32 @@ def _construct_mixin_from_obj_attrs_and_info(obj_attrs, info):
     return mixin


class _TableLite(OrderedDict):
Might it be an idea to use TableColumns instead? It would mean either adding a few properties to it, or slightly reworking _construct_mixin_from_columns.
The methods/properties needed here are a strictly orthogonal set to TableColumns. The only thing in common is the OrderedDict base and the concept of containing columns. So I think it makes sense to not clutter up the core TableColumns class for this purpose.
OK, let's leave it for now. In the end we'd probably want Table to be fast enough that it doesn't matter...
When Table is changed to allow a mixture of MaskedColumn and Column in the same table, then we can probably revert this. Performance wasn't the driver, but the issue in the comment for this class.
astropy/table/table.py (outdated)
-    passed through to the underlying data reader (e.g. `~astropy.io.ascii.write`).
+    The arguments and keywords (other than ``format`` and
+    ``serialize_methods``) provided to this function are passed through to
+    the underlying data reader (e.g. `~astropy.io.ascii.write`).
Could we add a Parameters section here that describes format and serialize_method?
Excellent idea, and one that I applied to Table.read as well.
and the ``mc1`` column written as a data / mask pair and
write to ECSV:

.. doctest-skip::
I'm surprised the doctest fails; normally it happily ignores whitespace differences.
(force-pushed from a2e67c3 to 4306b3c)
@mhvk - thanks for the review. I just got back to this and addressed your comments in the final 2 commits. I squashed the previous 19 to make the rebase manageable, and probably a fine idea anyway.
@taldcroft - looks great, only one post-rebase cleanup and a nitpick. Also what is probably a general question: I wasn't sure why the writing tests in the documentation are skipped.
@astrofrog - I think your comments have all been addressed, could you check/change your review status?
CHANGES.rst (outdated)
@@ -59,6 +59,10 @@ astropy.io.ascii
- Emit a warning when reading an ECSV file without specifying the ``format``
  and without PyYAML installed. Previously this silently fell through to
  parsing as a basic format file and the file metadata was lost. [#7580]
- Optionally allow writing masked columns to ECSV with the mask explicitly
Needs empty line above.
out : `Table`
    Table corresponding to file contents

Notes
Why the notes section without actual notes? Was this meant to have something added for both read and write after the class is defined?
Yes, I got completely confused by this. Turns out it was some past "cleverness" [my fault!] and these docstrings get munged to add a table of available formats. So without the "Notes" section the content gets dumped into the Returns or Parameters section and the sphinx output is broken. And if you do a Table.read? you'll see the table of formats.
OK, that is good to know -- but to ensure confusion doesn't recur, let's add a comment!
that tells the ECSV reader to interpret the ``'c'`` and ``'c.masked'``
columns as components of one `~astropy.table.MaskedColumn` object:

.. doctest-skip::
Does the test really not work?
docs/io/unified.rst (outdated)
of later unmasking the values.

Astropy provides a work-around for this limitation that users can choose to
use. The key part using the ``serialize_method='data_mask'`` keyword argument
Reword: The key part is to use the
2 2.0 -- False
3 -- e True

.. doctest-skip::
Perhaps we are skipping because this writes in the wrong place? I'm still a bit confused.
IIRC (maybe not) some of the skipping was because PyYAML is not installed for docs generation / testing. There was, at some point, a reason I put in the doctest-skip. Also, in general I assume that docstring examples which demonstrate writing to a local file should be skipped.
Let me ping @astrofrog and @bsipocz - my own sense would be to have as many doctests as feasible, and it would seem OK to require yaml for them (or treat it like scipy?)
Anyway, fine to treat this as out of scope for this PR!
And basically I don't have a good relationship with sphinx and doc testing. For instance in that unified.rst file I changed a line of code within a not-skipped section to obviously fail, and my local testing (e.g. python setup.py test --package=io --args='-k rst') showed no problem. So without having a way to reliably test locally (and not having time / desire to understand our sphinx testing), I don't want to risk burning more CI time.
It's perfectly fine to require all optional dependencies for the docs build, and it looks like we have added pyyaml at some point to the docs build.
However doctests run along with the normal tests, so the incantation you need (most probably) is .. doctest-requires:: pyyaml before all code snippets that require pyyaml (or any other optional dependency). An example for scipy is in docs/stats/lombscargle.rst.
Thanks @bsipocz, nice to know! Since this is passing tests now, maybe I'll let this slide for now and clean it up later in some other context to avoid burning CI.
Sure. That docs cleanup seems to me an ideal first PR issue, so if you could open a reminder for it we can label it accordingly to be picked up by someone else.
(force-pushed from 64967c9 to 2b6d861)
@mhvk - good to merge? I also believe that I addressed all the comments from @astrofrog.
@taldcroft - if you can, could you add/adjust the comments (see first two queries in my last review). That can be
Squashed commit messages:
- Generalize detection of col types requiring serialization
- Rework construct_mixins_from_columns to make Table at end
- Allow round-trip of MaskedColumn in ecsv, FITS, HDF5
- Fix issue with invalid unit in serialized columns
- Add serialize_method attr to MaskedColumn for stability of FITS, ECSV
- Revert to always copying table for io.ascii writing
- Update changelog [skip ci]
- Add narrative docs
- Fix errors and improve narrative docs
- Add serialize_methods kwarg to Table.write interface [skip ci]
- Allow for single string as serialize_method
- Handle corner case of ECSV with blank string in data
- Update ECSV narrative docs
- Update FITS docs for serialize_method
- Handle exceptions correctly
- Change default MaskedColumn serialize method from None to 'null_value'
- More narrative docs
- Add another changelog entry
- More docs tweaks / fixes
(force-pushed from 2b6d861 to 366c103)
@bsipocz - can you (or someone) restart the ci/circleci 32bit tests? This one never got off the ground. The changes since last time passing are trivial, but I still feel uncomfortable merging without seeing at least that set of tests passing.
@taldcroft - Done.
This provides a workaround for #4708. It allows fully robust round-trip of MaskedColumn in ECSV, FITS, and HDF5 by explicitly writing the mask as a separate column (if there are any masked elements). However, the new machinery is only enabled by default for HDF5, where writing a masked column is currently entirely unsupported.

For FITS the standard is mostly adequate, with the exception that it may not always be possible to choose a TNULL value that is not represented within the data. In this case it would be useful to issue a warning when writing, and point to a new opt-in feature to use this MaskedColumn serialization and store the data and mask separately.

For ECSV the current standard of using "" is robust for all cases except storing a blank entry in a string column. Like FITS, this situation could generate a warning on write and allow for explicitly choosing to separate the data and mask.

To get MaskedColumn to work I had to introduce some generalizations in the table serialization machinery. I added some more inline docs, though this really would deserve a longer treatment in some design docs somewhere (in a perfect world).
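The trade-off between sentinel encoding and the data/mask pair can be sketched in plain Python; a column is modeled here as (value, masked) pairs, and the function names and sentinel are illustrative, not the astropy implementation:

```python
def write_with_sentinel(col, sentinel='N/A'):
    # Sentinel encoding is lossy: a genuine sentinel-valued entry becomes
    # indistinguishable from a masked entry on read-back.
    return [sentinel if masked else value for value, masked in col]

def write_data_mask(col):
    # Writing the data and mask as separate columns keeps both, so the
    # round-trip is lossless.
    data = [value for value, _ in col]
    mask = [masked for _, masked in col]
    return data, mask

def read_data_mask(data, mask):
    return list(zip(data, mask))

col = [('a', False), ('N/A', False), ('b', True)]
assert write_with_sentinel(col) == ['a', 'N/A', 'N/A']  # second entry now ambiguous
assert read_data_mask(*write_data_mask(col)) == col     # lossless round-trip
```

The second assert is the point of the PR: once the mask travels as its own column, a legitimate sentinel-like value (or a blank string in ECSV) in the data can no longer be confused with a masked element.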