Support join, hstack, vstack for Quantity #5841
astropy/table/operations.py
Outdated
    idx0 = idx1

    # If col_name_map supplied as a dict input, then update.
    if isinstance(_col_name_map, collections.Mapping):
        _col_name_map.update(col_name_map)

    # Merge column and table metadata
    _merge_col_meta(out, arrays, col_name_map, metadata_conflicts=metadata_conflicts)

This doesn't strictly need to be inside `_vstack` yet, but eventually `empty_like` will merge other meta attributes like `description`, so it will need to have `metadata_conflicts`.
astropy/table/column.py
Outdated
    @@ -88,6 +88,34 @@ def col_copy(col, copy_indices=True):
        return newcol


    def _merge_ndarray_like_cols(cols):

Not clear exactly where this should live, but for now in `column.py` works.
astropy/units/quantity.py
Outdated
    def empty_like(cls, cols, **kwargs):
        from ..table.column import Column
        length = kwargs['length']
        return cls(Column.empty_like(cols, length=length))

Using `Column` like this is a quick hack.
The name |
astropy/table/operations.py
Outdated
            right[right_name].take(right_out))
    cols = [left[left_name], right[right_name]]
    col_cls = _get_out_class(cols)
    out[out_name] = col_cls.empty_like(cols, n_out, metadata_conflicts, out_name)

To do: check whether an `empty_like` method exists and raise a helpful exception if not.
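
The to-do above could be handled along these lines. This is only a minimal sketch, not the code from the PR: `make_output_column`, `WithEmptyLike` and `WithoutEmptyLike` are hypothetical names, and raising `NotImplementedError` is an assumption about what a "helpful exception" might be.

```python
class WithEmptyLike:
    """Toy column type that supports the protocol (hypothetical)."""

    @classmethod
    def empty_like(cls, cols, length):
        # A real implementation would allocate storage for `length` rows.
        return cls()


class WithoutEmptyLike:
    """Toy column type that lacks an ``empty_like`` method (hypothetical)."""


def make_output_column(cols, length):
    """Create an output column, failing clearly if the class cannot."""
    # Simplification: the PR picks the class with _get_out_class(cols).
    col_cls = type(cols[0])
    empty_like = getattr(col_cls, "empty_like", None)
    if empty_like is None:
        raise NotImplementedError(
            "operation unavailable for column type {} "
            "(no empty_like method)".format(col_cls.__name__)
        )
    return empty_like(cols, length)
```

Looking the method up with `getattr` first keeps the error message specific to the missing protocol method, instead of surfacing a bare `AttributeError` from deep inside the operation.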
astropy/table/column.py
Outdated
    @@ -88,6 +88,40 @@ def col_copy(col, copy_indices=True):
        return newcol


    def _merge_ndarray_like_cols(cols, metadata_conflicts, name, attrs):
        """
        Empty like

To do: useful docstring.

    from . import _np_utils
    from .np_utils import fix_column_name, TableMergeError

    __all__ = ['join', 'hstack', 'vstack', 'unique']


    def _merge_col_meta(out, tables, col_name_map, idx_left=0, idx_right=1,

This is now handled "locally" from within `empty_like`.
        if issubclass(obj.__class__, out_class):
            out_class = obj.__class__

    if any(not issubclass(out_class, obj.__class__) for obj in objs):

This now adds an explicit check that all the input objects have consistent inheritance.

Nice
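
For readers following the thread, the inheritance check discussed here can be sketched in isolation. This is a simplified stand-in, not the function from the PR: the name `get_out_class`, the toy classes, and the `ValueError` are assumptions; only the two `issubclass` tests come from the diff above.

```python
def get_out_class(objs):
    """Return the deepest subclass among ``objs``, or raise if inconsistent."""
    out_class = objs[0].__class__
    for obj in objs[1:]:
        # Keep the most derived class seen so far.
        if issubclass(obj.__class__, out_class):
            out_class = obj.__class__

    # Every input class must be an ancestor of (or equal to) the result.
    if any(not issubclass(out_class, obj.__class__) for obj in objs):
        raise ValueError(
            "unmergeable object classes {}".format(
                [obj.__class__.__name__ for obj in objs]
            )
        )
    return out_class


class Base:
    pass


class Child(Base):
    pass


class Sibling(Base):
    pass
```

With these toy classes, `get_out_class([Base(), Child()])` returns `Child`, while `get_out_class([Child(), Sibling()])` raises because neither class derives from the other.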
astropy/table/operations.py
Outdated

    col_cls = _get_out_class(cols)
    try:
        out[out_name] = col_cls.empty_like(cols, n_rows, metadata_conflicts, out_name)

To do: I think it is clearer to check for an `empty_like` method here instead of pre-checking above.
    @@ -673,10 +606,22 @@ def _join(left, right, keys=None, join_type='inner',

        left_name, right_name = col_name_map[out_name]
        if left_name and right_name:  # this is a key which comes from left and right
            out[out_name] = out.ColumnClass(length=n_out, name=out_name, dtype=dtype, shape=shape)
            out[out_name] = np.where(right_mask,

This was a latent bug: the previous statement (setting `out[out_name] = out.ColumnClass(..)`) was actually being entirely ignored because this statement now makes a completely new column instead of in-place assignment. Oops.

Indeed, nice to have solved that!
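
The difference between replacing a column and assigning into it can be illustrated outside of astropy. The sketch below uses a plain `dict` of NumPy arrays as a stand-in for the table, which is an assumption made for illustration; the variable names are hypothetical.

```python
import numpy as np

right_mask = np.array([True, False, True])
left_vals = np.array([1, 2, 3])
right_vals = np.array([10, 20, 30])

out = {}

# Item assignment with a new array rebinds the key: the placeholder
# created first is discarded, so creating it had no effect.
out["a"] = np.zeros(3, dtype=np.int64)
placeholder = out["a"]
out["a"] = np.where(right_mask, left_vals, right_vals)
rebound = out["a"] is not placeholder

# Slice assignment writes into the existing array instead.
out["b"] = np.zeros(3, dtype=np.int64)
target = out["b"]
out["b"][:] = np.where(right_mask, left_vals, right_vals)
kept = out["b"] is target
```

Here `rebound` and `kept` are both `True`: the first pattern throws away the pre-allocated object, the second fills it.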
This is ready for review.
@taldcroft - my day job is catching up with me, so I had only a brief look so far. But enough to know that I like the idea of letting the classes create an empty instance of themselves very much! It seems an obviously superior approach over mine, as it should be relatively easy to extend to My main comment so far is that I think this is even better done on top of p.s. I think I also like the idea of having an
@mhvk - thanks for the quick look, so I'll run with this and get it into final review shape.
@mhvk - I think this is ready for another look. I suggest looking at the last 6 commits one by one to see what happened. I'm thinking next about
This looks very nice indeed! It seems all I could come up with were utter nitpicks.
astropy/table/column.py
Outdated
    @@ -89,6 +88,19 @@ def col_copy(col, copy_indices=True):


    class FalseArray(np.ndarray):
        """
        Class to create a stub ``mask`` property which is a boolean array of

It would be nice for the docstring to have the usual title and then description, i.e.,

    """Always unmasked array for use in columns without a proper mask.

    Description...

Same for other classes. Also, if I recall correctly, periods at end of descriptions of parameters actually matter...

Done (I think). Not sure what you mean about "periods at end of descriptions of parameters actually matter". I did remove one period in a parameter description so they are all consistent (with no period). I also noticed that sphinx takes the first full sentence as the method description in the summary table, so that was kind of working. But yes, I got a little sloppy.
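
As an illustration of the docstring layout requested in this thread (one-line title, blank line, description, then a parameters section), here is a hypothetical class written from scratch. `AlwaysFalseArray` is not the `FalseArray` from the PR and its behaviour is an assumption; it only demonstrates the layout.

```python
import numpy as np


class AlwaysFalseArray(np.ndarray):
    """Boolean array whose elements are always False.

    Illustrative stand-in for a stub ``mask`` on columns that have no
    real mask. Attempts to set any element to True raise an exception.

    Parameters
    ----------
    shape : tuple
        Data shape
    """

    def __new__(cls, shape):
        return np.zeros(shape, dtype=bool).view(cls)

    def __setitem__(self, item, val):
        # Setting False is a no-op because every element is already False.
        if np.any(np.asarray(val)):
            raise ValueError(
                "Cannot set any element of AlwaysFalseArray to True"
            )
```

The first line of the docstring is what summary tables typically display, which is why the thread asks for a short title separated from the longer description.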
astropy/table/operations.py
Outdated
    `tables` is a list of at least one element and that they are all
    Table (subclass) instances. This doesn't handle complicated
    From a list of input objects ``objs`` get merged output object class.
    This is just taken as the deepest subclass. This doesn't handle complicated

Again, add empty line here to split title from description (though I guess it doesn't matter much for a private function)
    @@ -250,10 +186,9 @@ def vstack(tables, join_type='outer', metadata_conflicts='warn'):
            return tables[0]  # no point in stacking a single table
        col_name_map = OrderedDict()

        out = _vstack(tables, join_type, col_name_map)
        out = _vstack(tables, join_type, col_name_map, metadata_conflicts)

It is not important, but to me it has become a bit unobvious why there is this private `_vstack` method -- I think this made more sense when that method dealt with just the recarray, but now that one passes in the tables, there is an argument for just sticking it in here.

See #5843.
astropy/units/quantity.py
Outdated
    # Make an empty quantity filled with Nan using the unit of the last one.
    shape = (length,) + attrs.pop('shape')
    dtype = attrs.pop('dtype')
    data = np.empty(shape=shape, dtype=dtype)

Is it needed to fill it with `nan`? Or are you setting up for having that mean masked? In any case, a short-cut is

    data = np.full(shape, np.nan, dtype=dtype)

This was a left-over from an idea of always returning a "masked" object. But in the end I didn't go there, so this is removed.
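
For reference, the short-cut suggested in this thread is equivalent to allocating and then filling. The snippet below is a small standalone comparison; the concrete `length`, per-row shape and `dtype` values are made up for illustration.

```python
import numpy as np

length = 3
shape = (length,) + (2,)  # rows first, then the per-row shape
dtype = np.float64

# Two-step version: allocate uninitialised memory, then fill it.
data_two_step = np.empty(shape=shape, dtype=dtype)
data_two_step.fill(np.nan)

# One-step version using np.full.
data = np.full(shape, np.nan, dtype=dtype)
```

Note that a NaN fill only makes sense for floating-point dtypes, which may be one more reason the fill was dropped.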
astropy/units/quantity.py
Outdated
    @@ -156,6 +156,48 @@ def _construct_from_dict(self, map):
            value = map.pop('value')
            return self._parent_cls(value, **map)

        def empty_like(self, cols, length, metadata_conflicts='warn', name=None):

I very much like the concept, but am not quite sold on the name, as `np.empty_like` works differently. Maybe `shaped_like`? Or `new_like`? (not a big deal...)

+1 to rename since different than `np.empty_like`

Fair enough, so will go with `new_like`.
        return out

    def getattrs(col):
        return {attr: getattr(col.info, attr) for attr in attrs

The double `getattr` makes this one-liner a bit less pleasant. Though I can also see why you didn't spell it out...

    def getattrs(col):
        out = {}
        for attr in attrs:
            value = getattr(col.info, attr, None)
            if value is not None:
                out[attr] = value
        return out

I think the clarity provided by the functional expression is worth the double getattr.
docs/table/mixin_columns.rst
Outdated
    @@ -211,12 +211,13 @@ that contain mixin columns:
       * - :ref:`grouped-operations`
         - Not implemented yet, but no fundamental limitation
       * - :ref:`stack-vertically`
         - Not implemented yet, pending definition of generic concatenation protocol
         - Available for `~astropy.units.Quantity` and any other mixin classes that provide an
           `empty_like() method`_ in the ``info`` descriptor.

Either here or below (probably better below), it should also be noted that the class has to allow setting elements (though I guess it is somewhat obvious...)

Done.
    t2 = self.t2
    t4 = self.t4

    # Key col 'a', should last value ('km')

comment sentence incomplete.

Oops, missed this one but will let it slide.

Comments addressed, thanks!
Not completely sure what's up with
@mhvk - thanks for merging this! However, I noticed the following issue with this pull request:
Would it be possible to fix this? Thanks! If you believe the above to be incorrect (which I - @astrobot - very much doubt) you can ping @astrofrog
Darn, did we really forget the change log?! Could you make a quick follow-up PR? Sorry for not noticing that...
Ohh, good hack-day project to finally set up astrobot to check for it while the PR is open...
Yes, it could be another test required for merging...
This is a different approach to #5811 that does two things:
`empty_like` mixin protocol method for more generally improving high-level operations support for mixin columns like Time and SkyCoord.

For `vstack` and inner join with Time and SkyCoord, what we need is analogous `empty_like` methods which take a list of objects and make an empty container that is consistent with the inputs. Then of course we need to be able to set elements in the new empty container. Details TBD in a future PR.

cc: @mhvk
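
The protocol described in this summary (build an empty container consistent with a list of inputs, then set its elements) can be sketched with a toy class. Everything below is hypothetical and independent of astropy: `ToyQuantity`, `toy_vstack`, and the unit-consistency rule are assumptions chosen to keep the example short, and the method keeps the `empty_like` name used in this description even though the review settled on `new_like`.

```python
class ToyQuantity:
    """Toy container holding values with a unit (hypothetical)."""

    def __init__(self, values, unit):
        self.values = list(values)
        self.unit = unit

    def __len__(self):
        return len(self.values)

    def __setitem__(self, index, value):
        # Setting elements is required so the empty container can be filled.
        self.values[index] = value

    @classmethod
    def empty_like(cls, cols, length):
        """Make an empty container of ``length`` consistent with ``cols``."""
        units = {col.unit for col in cols}
        if len(units) != 1:
            raise ValueError("inconsistent units: {}".format(sorted(units)))
        return cls([None] * length, cols[0].unit)


def toy_vstack(cols):
    """Stack containers by allocating once and filling slice by slice."""
    out = type(cols[0]).empty_like(cols, sum(len(col) for col in cols))
    start = 0
    for col in cols:
        stop = start + len(col)
        out[start:stop] = col.values
        start = stop
    return out
```

For example, `toy_vstack([ToyQuantity([1, 2], "m"), ToyQuantity([3], "m")])` yields a container with values `[1, 2, 3]` and unit `"m"`. The stacking function never needs to know how the class stores its data, which is the appeal of letting each class create an empty instance of itself.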