Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support join, hstack, vstack for Quantity #5841

Merged
merged 20 commits into from May 9, 2017

Conversation

taldcroft
Copy link
Member

@taldcroft taldcroft commented Feb 24, 2017

This is a different approach to #5811 that does two things:

  • Specifically supports join, hstack, and vstack for Quantity mixin columns
  • Introduces the new empty_like mixin protocol method for more generally improving high-level operations support for mixin columns like Time and SkyCoord.

For vstack and inner join with Time and SkyCoord, what we need is analogous empty_like methods which take a list of objects and make an empty container that is consistent with the inputs. Then of course we need to be able to set elements in the new empty container. Details TBD in a future PR.

cc: @mhvk

idx0 = idx1

# If col_name_map supplied as a dict input, then update.
if isinstance(_col_name_map, collections.Mapping):
_col_name_map.update(col_name_map)

# Merge column and table metadata
_merge_col_meta(out, arrays, col_name_map, metadata_conflicts=metadata_conflicts)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't strictly need to be inside _vstack yet, but eventually empty_like will merge other meta attributes like description, so it that will need to have metadata_conflicts.

@@ -88,6 +88,34 @@ def col_copy(col, copy_indices=True):
return newcol


def _merge_ndarray_like_cols(cols):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear exactly where this should live, but for now in column.py works.

def empty_like(cls, cols, **kwargs):
from ..table.column import Column
length = kwargs['length']
return cls(Column.empty_like(cols, length=length))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Column like this is a quick hack.

@taldcroft taldcroft changed the title WIP: support vstack for Quantity Support join, hstack, vstack for Quantity Feb 26, 2017
@taldcroft taldcroft added this to the v2.0.0 milestone Feb 26, 2017
@taldcroft
Copy link
Member Author

@mhvk - I did a wholesale copy of the excellent test_operations.py and conftest.py changes you made for #5811. Sorry you won't get commit credit for all that work if this PR ends up being merged.

@taldcroft
Copy link
Member Author

The name empty_like is obviously intended to invoke some familiarity from the numpy method, but I can imagine that this could be confusing because it is a bit different. So I would be happy to change the name.

right[right_name].take(right_out))
cols = [left[left_name], right[right_name]]
col_cls = _get_out_class(cols)
out[out_name] = col_cls.empty_like(cols, n_out, metadata_conflicts, out_name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do: check for no empty_like method and raise a helpful exception if not.

@@ -88,6 +88,40 @@ def col_copy(col, copy_indices=True):
return newcol


def _merge_ndarray_like_cols(cols, metadata_conflicts, name, attrs):
"""
Empty like
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do: useful docstring.


from . import _np_utils
from .np_utils import fix_column_name, TableMergeError

__all__ = ['join', 'hstack', 'vstack', 'unique']


def _merge_col_meta(out, tables, col_name_map, idx_left=0, idx_right=1,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled "locally" from within empty_like.

if issubclass(obj.__class__, out_class):
out_class = obj.__class__

if any(not issubclass(out_class, obj.__class__) for obj in objs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now adds an explicit check that all the input objects have consistent inheritance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice


col_cls = _get_out_class(cols)
try:
out[out_name] = col_cls.empty_like(cols, n_rows, metadata_conflicts, out_name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do: I think it is more clear to check for an empty_like method here instead of pre-checking above.

@@ -673,10 +606,22 @@ def _join(left, right, keys=None, join_type='inner',

left_name, right_name = col_name_map[out_name]
if left_name and right_name: # this is a key which comes from left and right
out[out_name] = out.ColumnClass(length=n_out, name=out_name, dtype=dtype, shape=shape)
out[out_name] = np.where(right_mask,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a latent bug: the previous statement (setting out[out_name] = out.ColumnClass(..) was actually being entirely ignored because this statement now makes a completely new column instead of in-place assignment. Oops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, nice to have solved that!

@taldcroft
Copy link
Member Author

This is ready for review.

@mhvk
Copy link
Contributor

mhvk commented Feb 26, 2017

@taldcroft - my day job is catching up with me, so I had only a brief look so far. But enough to know that I like the idea of letting the classes create an empty instance of themselves very much! It seems an obviously superior approach over mine, as it should be relatively easy to extend to Time and SkyCoord, etc., and keeps the knowledge of how mixin columns behave where it should be, in the columns themselves.

My main comment so far is that I think this is even better done on top of info. In part simply because that keeps all table-related stuff together (and thus makes instructions on how to become table-compatible clearer), but also because it seems something quite similar is already being done by the _construct_from_dict method you added to info for the yaml work. If so, the _get_out_class should be adapted.

p.s. I think I also like the idea of having an empty_like best, but obviously, info could also gain a concatenate method; indeed, that would seem sufficiently generically useful, that I wouldn't mind having that on the classes itself... (Of course, Time and SkyCoord allow the simplest form of concatenation already via the constructor.) But one of course does not quite exclude the other; maybe start with working via info and we can always move it down.

@taldcroft
Copy link
Member Author

@mhvk - thanks for the quick look, so I'll run with this and get it into final review shape.

@taldcroft
Copy link
Member Author

@mhvk - I think this is ready for another look. I suggest looking at the last 6 commits one by one to see what happened.

I'm thinking next about Time - adding __setitem__ and TimeInfo.empty_like. More on that later.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice indeed! It seem all I could come up with was utter nitpicks.

@@ -89,6 +88,19 @@ def col_copy(col, copy_indices=True):


class FalseArray(np.ndarray):
"""
Class to create a stub ``mask`` property which is a boolean array of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice for the docstring to have the usual title and then description, i.e.,

"""Always unmasked array for use in columns without a proper mask.

Description...

Same for other classes. Also, if I recall correctly, periods at end of descriptions of parameters actually matter...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (I think). Not sure what you mean about "periods at end of descriptions of parameters actually matter". I did remove one period in a parameter description so they are all consistent (with no period). I also noticed that sphinx takes the first full sentence as the method description in the summary table, so that was kind of working. But yes, I got a little sloppy.

`tables` is a list of at least one element and that they are all
Table (subclass) instances. This doesn't handle complicated
From a list of input objects ``objs`` get merged output object class.
This is just taken as the deepest subclass. This doesn't handle complicated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, add empty line here to split title from description (though I guess it doesn't matter much for a private function)

if issubclass(obj.__class__, out_class):
out_class = obj.__class__

if any(not issubclass(out_class, obj.__class__) for obj in objs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@@ -250,10 +186,9 @@ def vstack(tables, join_type='outer', metadata_conflicts='warn'):
return tables[0] # no point in stacking a single table
col_name_map = OrderedDict()

out = _vstack(tables, join_type, col_name_map)
out = _vstack(tables, join_type, col_name_map, metadata_conflicts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not important, but to me it has become a bit unobvious why there is this private _vstack method -- I think this made more sense when that method dealt with just the recarray, but now that one passes in the tables, there is an argument for just sticking it in here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #5843.

@@ -673,10 +606,22 @@ def _join(left, right, keys=None, join_type='inner',

left_name, right_name = col_name_map[out_name]
if left_name and right_name: # this is a key which comes from left and right
out[out_name] = out.ColumnClass(length=n_out, name=out_name, dtype=dtype, shape=shape)
out[out_name] = np.where(right_mask,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, nice to have solved that!

# Make an empty quantity filled with Nan using the unit of the last one.
shape = (length,) + attrs.pop('shape')
dtype = attrs.pop('dtype')
data = np.empty(shape=shape, dtype=dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed to fill it with nan? Or are you setting up for having that mean masked? In any case, a short-cut is

data = np.full(shape, np.nan, dtype=dtype)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a left-over from an idea of always returning a "masked" object. But in the end I didn't go there, so this is removed.

@@ -156,6 +156,48 @@ def _construct_from_dict(self, map):
value = map.pop('value')
return self._parent_cls(value, **map)

def empty_like(self, cols, length, metadata_conflicts='warn', name=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like the concept, but am not quite sold on the name, as np.empty_like works differently. Maybe shaped_like? Or new_like? (not a big deal...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to rename since different than np.empty_like

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, so will go with new_like.

return out

def getattrs(col):
return {attr: getattr(col.info, attr) for attr in attrs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double getattr makes this one-liner a bit less pleasant. Though I can also see why you didn't spell it out...

def getattrs(col):
    out = {}
    for attr in attrs:
       value = getattr(col.info, attr, None):
       if values is not None:
            out[attr] = value
    return out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the clarity provided by the functional expression is worth the double getattr.

@@ -211,12 +211,13 @@ that contain mixin columns:
* - :ref:`grouped-operations`
- Not implemented yet, but no fundamental limitation
* - :ref:`stack-vertically`
- Not implemented yet, pending definition of generic concatenation protocol
- Available for `~astropy.units.Quantity` and any other mixin classes that provide an
`empty_like() method`_ in the ``info`` descriptor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either here or below (probably better below), it should also be noted that the class has to allow setting elements (though I guess it is somewhat obvious...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

t2 = self.t2
t4 = self.t4

# Key col 'a', should last value ('km')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment sentence incomplete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, missed this one but will let it slide.

Copy link
Member Author

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed, thanks!

@@ -250,10 +186,9 @@ def vstack(tables, join_type='outer', metadata_conflicts='warn'):
return tables[0] # no point in stacking a single table
col_name_map = OrderedDict()

out = _vstack(tables, join_type, col_name_map)
out = _vstack(tables, join_type, col_name_map, metadata_conflicts)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #5843.

# Make an empty quantity filled with Nan using the unit of the last one.
shape = (length,) + attrs.pop('shape')
dtype = attrs.pop('dtype')
data = np.empty(shape=shape, dtype=dtype)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a left-over from an idea of always returning a "masked" object. But in the end I didn't go there, so this is removed.

@@ -156,6 +156,48 @@ def _construct_from_dict(self, map):
value = map.pop('value')
return self._parent_cls(value, **map)

def empty_like(self, cols, length, metadata_conflicts='warn', name=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, so will go with new_like.

@@ -211,12 +211,13 @@ that contain mixin columns:
* - :ref:`grouped-operations`
- Not implemented yet, but no fundamental limitation
* - :ref:`stack-vertically`
- Not implemented yet, pending definition of generic concatenation protocol
- Available for `~astropy.units.Quantity` and any other mixin classes that provide an
`empty_like() method`_ in the ``info`` descriptor.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -89,6 +88,19 @@ def col_copy(col, copy_indices=True):


class FalseArray(np.ndarray):
"""
Class to create a stub ``mask`` property which is a boolean array of
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (I think). Not sure what you mean about "periods at end of descriptions of parameters actually matter". I did remove one period in a parameter description so they are all consistent (with no period). I also noticed that sphinx takes the first full sentence as the method description in the summary table, so that was kind of working. But yes, I got a little sloppy.

Copy link
Member Author

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed, thanks!

@mhvk
Copy link
Contributor

mhvk commented May 9, 2017

Not completely sure what's up with coverage but since this is about the best-tested part of astropy, I'm not going to worry about it. So, merging...

@mhvk mhvk merged commit 2c5964a into astropy:master May 9, 2017
@astrobot
Copy link

astrobot commented May 9, 2017

@mhvk - thanks for merging this! However, I noticed the following issue with this pull request:

  • Changelog entry not present (or pull request number missing) and neither the Affects-dev nor the no-changelog-entry-needed label are set

Would it be possible to fix this? Thanks!

If you believe the above to be incorrect (which I - @astrobot - very much doubt) you can ping @astrofrog

@mhvk
Copy link
Contributor

mhvk commented May 9, 2017

Darn, did we really forget the change log?! Could you make a quick follow-up PR? Sorry for not noticing that...

@bsipocz
Copy link
Member

bsipocz commented May 9, 2017

Ohh, good hack-day project to finally set up astrobot to check for it while the PR is open...

@mhvk
Copy link
Contributor

mhvk commented May 9, 2017

Yes, it could be another test required for merging...

taldcroft added a commit to taldcroft/astropy that referenced this pull request May 9, 2017
bsipocz added a commit that referenced this pull request May 9, 2017
taldcroft added a commit to taldcroft/astropy that referenced this pull request May 13, 2017
@taldcroft taldcroft deleted the table-mixin-ops branch October 1, 2019 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants