Support record array columns in astropy.table.Table #3759
@taldcroft: Any objections to this in principle?
@@ -44,7 +44,7 @@ def descr(col):
     This returns a 3-tuple (name, type, shape) that can always be
     used in a structured array dtype definition.
     """
-    col_dtype_str = col.dtype.str if hasattr(col, 'dtype') else 'O'
+    col_dtype_str = col.dtype if hasattr(col, 'dtype') else 'O'
Just a niggle, but the var name should probably be called col_dtype now to correspond to the actual contents.
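As a rough illustration of why the diff passes the dtype object rather than its string form, the sketch below builds the (name, type, shape) tuple both ways using plain numpy. The variable names are hypothetical and the column is a stand-in for the "<i4,S1" case from the PR description; this is not the code in astropy.table.

```python
import numpy as np

# A record-array column like the "<i4,S1" dtype mentioned in the PR description.
rec = np.array([(7, b'a'), (8, b'b'), (7, b'c')], dtype='<i4,S1')

# dtype.str collapses a structured dtype into an opaque void type of the same size.
dtype_str = rec.dtype.str

# Build an outer structured dtype from a (name, type, shape) tuple in both ways.
from_str = np.dtype([('e', rec.dtype.str, ())])   # field layout is lost
from_dtype = np.dtype([('e', rec.dtype, ())])     # nested fields are kept

print(dtype_str)
print(from_str['e'].names)
print(from_dtype['e'].names)
```

With the string form the nested field names are gone, which is the information a record-array column needs to keep.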
@mdboom - sorry, I looked at this a few days ago and was going to say "great!". The initial questions I had were:
One idea would be to add a recarray column into the list of
@@ -19,6 +19,8 @@ def setup_method(self, method):
         self.c = MaskedColumn(name='c', data=[7, 8, 9], mask=False)
         self.d_mask = np.array([False, True, False])
         self.d = MaskedColumn(name='d', data=[7, 8, 7], mask=self.d_mask)
+        self.e = MaskedColumn(name='e', data=[(7, 'a'), (8, 'b'), (7, 'c')],
+                              mask=[[False, True], [True, False], [False, False]])
Do you need a dtype=.. here? The dtype that comes out seems to be string:
In [20]: e = MaskedColumn(name='e', data=[(7, 'a'), (8, 'b'), (7, 'c')],
mask=[[False, True], [True, False], [False, False]])
In [21]: e
Out[21]:
<MaskedColumn name='e' dtype='string168' shape=(2,) length=3>
7 .. --
-- .. b
7 .. c
Oops. Good catch.
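The behaviour spotted in this review thread can be reproduced with numpy alone. The sketch below is an assumption-laden stand-in for the test fixture: it uses numpy.ma.array instead of MaskedColumn, bytes values for the S1 field, and a mask given as a list of tuples (one tuple per record) rather than the list of lists in the diff.

```python
import numpy as np

data = [(7, b'a'), (8, b'b'), (7, b'c')]
mask = [(False, True), (True, False), (False, False)]

# Without a dtype, numpy coerces the mixed tuples to a 2-d array of strings.
plain = np.array(data)

# With an explicit structured dtype, each tuple becomes one record with two fields.
structured = np.ma.array(data, dtype='<i4,S1', mask=mask)

print(plain.shape, plain.dtype.kind)
print(structured.shape, structured.dtype.names)
print(structured.mask['f1'])
```

The first array has a trailing dimension of 2 and a string dtype, matching the shape=(2,) string column shown in the session above; the second is a 1-d array of records with a per-field mask.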
I suspect table reading/writing with io.ascii may also not work (though strictly speaking it doesn't have to for this to be useful).
@mdboom - I've been playing around with additional tests etc but got pulled away. I will try to have a PR for your branch this week.
I guess we may have to put something in the
May I ask what the use case for this feature is? [\rant on] Is that really worth the added code complexity, added test cases, added developer time for maintenance and, most importantly, added complexity that the user sees when he/she reads the documentation?
@mdboom: Sorry, feel free to ignore the rant. This is not particularly directed against your specific feature; I'm just warning of feature-creep in general.
This is a way forward to serialize the kind of mixin columns (coordinates etc.) you describe to/from disk in a packed binary format. I wouldn't necessarily point to this as an example of feature creep -- outside of the testing and the pretty-printing, this is a 3-line bugfix that lets us properly use numpy's generality.
@hamogu - the biggest long-term worry about feature creep is that it grows the code complexity to the point where it becomes unmaintainable. So we need to manage that carefully, and I don't think this change impacts overall complexity. (But I do note that this PR may be currently incomplete in terms of preventing / warning the user from doing things like writing an ASCII table that won't work quite as expected, but that's something I'm still reviewing.) The other worry about feature-creep is draining resources, as you mentioned. That's a valid point, but with open-source software you often get features in an area where people are excited to develop. I'm excited about table functionality (and we have a GSoC student who is similarly motivated). Alas I don't personally care about plotting a spectrum. 😄
@mdboom - check out the BTW, this relies on a change that is outside the scope of this PR, namely allowing initialization of Table with a Column object that doesn't have a defined
(So I'm likely going to put in a PR to make the
Thanks for drilling down on the tests. I see three classes of failures on
Implementing this as a mixin column is an interesting idea. That would effectively solve the first point about problems in read/write. For masked arrays the current mixin implementation has a pseudo-mask which is a 1-d mask that is there for API compatibility but which raises an exception if any element is set to True. So again this would only allow operations that are known to work or else raise an informative exception. The downside is that in order to make it a mixin you need to be able to set attributes, which (I think) means taking the input ndarray column and making an ndarray-subclass that has the In the end it might be easier to make a new Table method The business about the nameless columns was mostly an artifact of a particular mixin test, so you're right that it is not really directly relevant here. Anyway though, #3781 is merged.
@mdboom - here is a proof of concept for viewing any structured ndarray as a mixin column. It works for the simple test of adding a structured array column and confirming that it is indeed a mixin. Unfortunately it fails 21 tests in table. I don't have time today to look further and see if it is simple or a deep problem with the new failing tests.
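The point about needing to set attributes can be sketched with numpy alone: a bare ndarray rejects new attributes, while a view onto the same data through an ndarray subclass accepts them. The class name AttrArray and the dict stored under `info` are hypothetical, chosen only for illustration; this is not the proof-of-concept branch referred to above.

```python
import numpy as np

class AttrArray(np.ndarray):
    """Hypothetical ndarray subclass whose instances accept new attributes."""

rec = np.array([(7, b'a'), (8, b'b'), (7, b'c')], dtype='<i4,S1')

# A bare ndarray has no instance dict, so attribute assignment fails.
try:
    rec.info = {'name': 'e'}
    plain_accepts_attrs = True
except AttributeError:
    plain_accepts_attrs = False

# Viewing the same memory as the subclass makes attribute assignment possible.
col = rec.view(AttrArray)
col.info = {'name': 'e'}

print(plain_accepts_attrs)
print(col.info['name'], col.dtype.names, np.shares_memory(rec, col))
```

In this minimal form the attribute is not carried over to slices or copies of `col`; a fuller version would presumably need `__array_finalize__` for that.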
Thanks, @taldcroft. No rush on this -- I'll let you know if I find the time to look into this and find anything.
I share a bit of @hamogu's concern, especially with
@taldcroft: Stepped away from this for a while. Have you done any work on the remaining details here, or should I? (No problem either way, just don't want to duplicate work if possible).
@mdboom - I never got a chance to dig into this and see why so many tests were failing. If you have time to take a look and see if you understand the problem that would be good. It might be worth trying this in the context of #3731, which is a fairly significant overhaul of how mixins are handled. I'm hoping that this will get merged in a week (or two at the latest).
@mdboom - I took a few minutes to rebase my
@mdboom @mhvk - I have something which supports using an arbitrary ndarray as a Mixin column and passes tests. See: master...taldcroft:table-recarray-columns This is a mash of the original implementation by @mdboom and mixins. Note that the fixes that @mdboom made related to pprint'ing the dtype should all be moved into So the question now is whether this is the right direction. I think so, but welcome discussion. This is a variation on another useful trick, namely embedding a Table as a column. Having the
Your branch seems to make sense, and I think it is clearly better than what's in this PR.
@taldcroft - I like the idea too, and the power of But your PR does pose a question about direction, since in some way More specifically: I'm a bit confused about the new
This was driven by the need for a I think that I'm not sure of the real benefit to making Having said all this, having a big picture that works toward convergence of Column and mixins is a good thing to make things more and more seamless.
As a side note, it occurred to me that an object with a MixinInfo
Am not completely sold on the names still, but it's no big deal, and I like the noting that On the But, as said, what I really like is that we now can start using
@mdboom - I'll take this over now with my implementation. We should leave this open until I get some new PRs ready. My current branch is a messy mix right now.
Superseded and closed by #3925.
In the interest of generality over in ASDF (see asdf-format/asdf-standard#83), I thought it would be worth supporting table columns where the data type is a record array. This is not the same as nested Tables, obviously, which would be quite a bit more work, but it seemed like only a couple of minor changes were necessary to have a Column dtype like "<i4,S1".