Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP: Add proposal for oindex and vindex. #6256

Merged
merged 1 commit into from Jun 21, 2018
Merged

Conversation

seberg
Copy link
Member

@seberg seberg commented Aug 27, 2015

NOTE: A working (but not final) implementation of oindex can be found in PR #6075.

This is meant for discussion more then actually adding it at this point. I will wait a bit longer with inviting the mailing list (I think it is maybe better for the release and governance to settle), but if anyone has more interest, this may be easier to comment on, at least for the actual text. There are still spelling mistakes, etc. but I think it is likely complete enough for some comments for anyone more seriously interested. Or for discussions about some of the less clear points for that matter.

Direct link to rendered version: https://github.com/numpy/numpy/pull/6256/files?short_path=01e4dd9 (can also be reached by clicking on files changed -> Display rich diff icon top right)

@jaimefrio
Copy link
Member

Some thoughts...

  • The proposal seems to lead to a situation where the exact same behavior that has been the default for years is no longer available. Is this a good thing? We might as well add a "legacy" .lindex attribute to help with backwards compatibility.
  • I would favor as short a name for indexable attributes as possible, e.g. prefer .oix and .vix over the fully spelled out index, as this often come up in complicated expressions involving more than one array, and we don't want to make 79 character lines impossible to achieve.
  • I need to mature it a little more, but I think that single integer indices (and perhaps even single item 1D indexing arrays) should no longer be considered indexing arrays, so that something like arr.vindex[0, :, 1, :, [2, 3, 4]], with arr.shape = (4, 5, 6, 7, 8) would result in a (5, 7, 3) shape, which seems much less surprising than the current (and proposed) (3, 5, 7). This would actually make even more sense in the default indexer, as it will make outer and vectorized indexing produce identical results for many common use cases.
  • What benefits do you see in the "never transpose" rule? I can see some consistency benefits, although I am not sure they are of the right kind. ;-) And your life as the person that ends up coding thiswould be a little easier. But it seems like a "fancy indexing, only it surprises you in even simpler cases" approach. Would at least single indexing arrays be transposed into position, or something like arr[:, [2, 3, 5]] have shape (3, rows) rather than (rows, 3)?

@jaimefrio
Copy link
Member

Regardless of my objections, wonderful write-up Sebastian, thanks for putting this together. And you even managed to keep the word "subspace" out of the explanation! ;-)

@seberg
Copy link
Member Author

seberg commented Aug 27, 2015

  1. Yes, having a legacy indexer is likely a good idea, then we can support it indefinitely and leaving actual removal for later (could document it as "forr backwards" compatibility without warnings, maybe).
  2. I think the names are completely open, I will add the suggestions when I edit it next time (and other suggestions should they come in).
  3. Yes it is annoying, but for legacy indexing changing anything seems impossible to me. For some far future version of normal indexing I think enabling this should be possible (I thought I have this in open points, but it is probably written in a non-understandable way). Also see 4.
  4. My argument against this: Use oindex (and later maybe plain indexing for simple cases) in this case! It obviously does not do the transposing. So in my opinion, you can already decide the transpose behaviour by choosing between vindex/oindex and there is just no reason to ever choose vindex if you only have one array!

I am sure I can squeeze in subspace somewhere ;).

@seberg seberg force-pushed the indexing-nep branch 3 times, most recently from 9dd0aed to f7417c5 Compare August 30, 2015 20:58
@pitrou
Copy link
Member

pitrou commented Sep 2, 2015

Some quick comments from a Numpy near-beginner :

  1. Fancy indexing looks confusing but this PEP doesn't make it any more understandable to me. Some terminology is a bit weird ("transposed" doesn't seem to allude to matrix transposition, for instance; and why "vectorized"?).
  2. The Unix naming convention of choosing cryptic shorthands has proved very bad, so is vindex vs. oindex. This will only speak to the initiated.
  3. I don't understand why both vindex and oindex are needed. It seems the real benefit of the PEP would be to provide a "safe indexing" API that doesn't broadcast stuff silently, so a single .something additional API would be sufficient.
  4. More generally, adding "new ways to index Numpy arrays" as the title suggests doesn't promise to make things simpler for new users - especially when reading other people's code.

@seberg
Copy link
Member Author

seberg commented Sep 2, 2015

Yeah, I should call it, "making indexing understandable" ;) (kidding, but the title is really bad). Did you check the examples? Without them it is unfortunatly impossible to understand things unless you are extremely well versed in "fancy indexing". Even with them it might be (please tell me if it is, I would like a final version that is understandable...).

Naming is difficult, and no vectorized only makes half sense. Actually I think broadcasted indexing might be more to the point, maybe I should add/change that.
Short hands can be good and bad, in some sense I would defend relatively short name in that it is clear it is indexing and to actually understand the details, you will have to read the docs in any case :(. (I doubt you will find even long names descriptive enough, but if you do, I owe you a beer).

The reason to add both is, I hope explained. Fancy indexing which is much like the proposed "vindex" has random transposing which is confusing as hell, plus the boolean part of it is useless. About the transposing, note that numpy uses np.transpose not just for inverting dimensions, but also for any permutation of them (arr.transpose(-2, -3, -1) works).

@jaimefrio
Copy link
Member

See, I told you, you should have used "subspace", it makes things soooo much more understandable... ;-)

@njsmith
Copy link
Member

njsmith commented Sep 2, 2015

The way I think about it is:
.
oindex is intended to be a second chance at implementing how default
indexing should have worked (in retrospect). I think a good design goal is
--whether or not we actually do this-- to pretend that we are going to make
default indexing warn in all the cases where it differs from oindex
indexing, and that well maintained code will stop using those cases.
.
But, there are some advanced cases where you need the power of
indexing-like-it-was-a-ufunc (e.g. getting/setting the upper triangular
part of a matrix). So vindex is intended to be the expert tool for use in
the 1% of cases where oindex is insufficient, and to do this in a more
consistent and less confusing way than the current broadcasting. Also given
that these are the rare expert cases it's nice to have there be an explicit
marker that hey, something confusing is happening here.
.
Note that there are already duck arrays in the wild where default indexing
is oindex-like, so another reason why vindex is useful is that it makes it
explicit that you really want the vectorised behavior and will get an error
if this is not available.
On Sep 2, 2015 11:07 AM, "Jaime" notifications@github.com wrote:

See, I told you, you should have used "subspace", it makes things soooo
much more understandable... ;-)


Reply to this email directly or view it on GitHub
#6256 (comment).

@njsmith
Copy link
Member

njsmith commented Sep 2, 2015

@seberg: Also, I haven't had a chance to look at the NEP yet (sorry!), but
pursuant to our conversation at SciPy about how unclear it is what boolean
indexing should do, here's an example to ponder that feels like it ought
to work...

select all rows that contain no NaNs

arr[np.any(np.isnan(arr), axis=0, keepdims=True)]
On Sep 2, 2015 4:10 PM, "Nathaniel Smith" njs@vorpus.org wrote:

The way I think about it is:
.
oindex is intended to be a second chance at implementing how default
indexing should have worked (in retrospect). I think a good design goal is
--whether or not we actually do this-- to pretend that we are going to make
default indexing warn in all the cases where it differs from oindex
indexing, and that well maintained code will stop using those cases.
.
But, there are some advanced cases where you need the power of
indexing-like-it-was-a-ufunc (e.g. getting/setting the upper triangular
part of a matrix). So vindex is intended to be the expert tool for use in
the 1% of cases where oindex is insufficient, and to do this in a more
consistent and less confusing way than the current broadcasting. Also given
that these are the rare expert cases it's nice to have there be an explicit
marker that hey, something confusing is happening here.
.
Note that there are already duck arrays in the wild where default indexing
is oindex-like, so another reason why vindex is useful is that it makes it
explicit that you really want the vectorised behavior and will get an error
if this is not available.
On Sep 2, 2015 11:07 AM, "Jaime" notifications@github.com wrote:

See, I told you, you should have used "subspace", it makes things soooo
much more understandable... ;-)


Reply to this email directly or view it on GitHub
#6256 (comment).

@seberg
Copy link
Member Author

seberg commented Sep 3, 2015

@njsmith, yet another one of those thinngs.... Actually have not even thought about that one too much. The biggest problem is that only in 1.10 we started deprecating the weird behaviour it used to use (plus implementation wise, it requires some extra logic at another place, but oh well). One of those things we could in principle change for new indexing attributes though I guess (and then later after deprecation also for standard indexing).

@seberg
Copy link
Member Author

seberg commented Sep 3, 2015

Actually, rethinking what I thought about vindex+boolean back then. Maybe it does make sense to have fancy boolean indexing, have to think about it. On the other hand the whole combination buisness (unless oindex) is likely a corner caes of a corner case.

@shoyer
Copy link
Member

shoyer commented Sep 3, 2015

I agree, there is probably a case for vectorized boolean indexing -- but not for combined integer/boolean indexing or more than one boolean indexer.

@seberg
Copy link
Member Author

seberg commented Sep 26, 2015

@shoyer, was that a case for a boolean indexing? Something like a bindex as well ;)? Right now I have a tendency that the vectorized boolean is maybe really weird enough. For it to make sense all boolean indices have to have the exact some nonzero elements, and I am not sure knowing the number of nonzero elements is a common sight when you work with boolean arrays.

@njsmith, I understand your case and would not mind it to work. For the moment we are at least making it an error. Making it work right will be annoying with all the transposing logic (unless you make it "works only if there is no other index" which is may be the only place where it makes a lot of sense). Anyway, I think it might be ugly to get there unfortunately, and I would prefer to delay it for the time being since the time table for that change would be at best in two years.... (though maybe we could mention it)

For anyone interested, I added a silly "motivational example" which features an Anna trying to do data analysis ;). You can bash it all you like, honestly! Thought an example like this might make it possible to understand at least the basic ideas for someone who is not me.

@seberg
Copy link
Member Author

seberg commented Sep 26, 2015

OK, it is way too silly ;). I will reword it some time, but I think the examples itself are useful.

Again, however, the columns need to be handled more individually (but in
groups), and the ``oindex`` attribute works well::

>>> arr.oindex[bad_data, [2, 5]] = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the RHS should be 0.

@shoyer
Copy link
Member

shoyer commented Sep 26, 2015

@seberg

For the moment we are at least making it an error. Making it work right will be annoying with all the transposing logic (unless you make it "works only if there is no other index" which is may be the only place where it makes a lot of sense)

Agreed. I don't think this is an essential feature, so I'm fine leaving it out -- there are lots of other ways to do it.

converted to integer array indices and then broadcast), and hardly
useful.
There is no well defined broadcast for booleans, so that boolean
indices are logically always "``outer``" type indices.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't entirely agree with this. Boolean indexing with a >1d indexer that flattens (e.g., like nonzero, x[x > 0]) is not outer indexing, either, because it does not follow the same rules for preserving the size of dimensions. Unfortunately, this sort of indexing is very popular. I would still allow it without a warning in normal indexing, but not with either oindex or vindex

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean outer indexing with respect to what to do with the "new" dimension. Broadcast it to the rest like vindex or keep it where it was like oindex. They are outer in this sense (maybe should add the example):

arr[arr[:, :, 0, 0] > 1, arr[0, 0, :, :] < 1]

returning a 2D array, not 1D like fancy indexing. Or you mean we should just not allow that? To me at least in oindex it makes sense, vindex is what I am not sure of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should allow it. It breaks the rule that outer indexing is always scalars or 1d arrays. I guess it is still technically one index per axis, but it's not at all what I would expect for outer indexing. In particular, whereas outer indexing with at most 1d arrays can be easily done with labeled array libraries like pandas and xray, this cannot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I never anticipated a limitation to 1d arrays for outer indexing (just not much a difference from my point of view)! Is it really so bad if numpy supports a bit more than others in this regard?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we want multidimensional outer indexing. It took me some
thinking to work out what it is, but it does turn out to be pretty simple I
think. It should be fine for labeled arrays too, I think? Each dimension
of the index contributes one dimension to the output, so tick labels at
least are fine. (Axis names of course are another matter -- maybe call them
"{}-{}".format(indexed_axis_name, indexer_axis_name)?) Among other things,
arr[multidimlookup] is a common and non-confusing usage, so according to
the rule that we'd like to be able to deprecate places where getitem and
oindex disagree, we should support it in oindex...
On Sep 27, 2015 2:53 AM, "seberg" notifications@github.com wrote:

In doc/neps/indexing.rst
#6256 (comment):

+1. Most new users will usually expect outer indexing (consistent with

  • slicing). This is also the most common way of handling this in other
  • packages or languages.
    +2. The axes introduced by the array indices are at the front, unless
  • all array indices are consecutive, in which case one can deduce where
  • the user "expects" them to be:
  • * arr[:, [0, 1], :, [0, 1]] will have the first dimension shaped 2.
  • * arr[:, [0, 1], [0, 1]] will have the second dimension shaped 2.

+3. When a boolean array index is mixed with another boolean or integer

  • array, the result is very hard to understand (the boolean array is
  • converted to integer array indices and then broadcast), and hardly
  • useful.
  • There is no well defined broadcast for booleans, so that boolean
  • indices are logically always "outer" type indices.

Oh, I never anticipated a limitation to 1d arrays for outer indexing (just
not much a difference from my point of view)! Is it really so bad if numpy
supports a bit more than others in this regard?


Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/6256/files#r40503106.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tick labels resulting from indexing are actually taken from the array being indexed rather than the array doing the indexing in xray and pandas. So indexing along multiple dimensions at once means that we need to collapse to something like a multi index / tuples. Likewise, it's really not obvious what to name the new axis name (I can think of several alternatives to your suggestion).

This doesn't mean we can't allow multidimensional indexing with oindex for numpy, but it is something unlikely to ever be supported by xray and pandas (or a hypothetical future version of numpy arrays with axis labels, for that matter). Of course, it's quite straightforward for xray/pandas to simply raise an error in these situations. We already do something similar when we get np.newaxis, which definitely should be OK to use with oindex.

Does it ever make sense to do multidimensional indexing with anything other than a Boolean array? If not, we should say so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ever? Definitely. Though it might be more from the use case where you want to use the indexing labels for the dimensions more likely than those being indexed.

I.e. the old example look_up_table[image].

@shoyer
Copy link
Member

shoyer commented Sep 27, 2015

@jaimefrio

What benefits do you see in the "never transpose" rule? I can see some consistency benefits, although I am not sure they are of the right kind. ;-)

Another benefit is that it makes implementation of vectorized indexing also much simpler for authors of duck array types. I think this example from dask.array is pretty compelling: https://github.com/blaze/dask/pull/439/files#diff-0dc21516366e88095fc3dbf2e9e62f9aR2358

@seberg
Copy link
Member Author

seberg commented Sep 27, 2015

Well, we could also provide code do to the transpose for you possibly. The question is, what is less surprising ;).
I am sure transposing will confuse users. But there is the problem of scalars toggling the transposing, which seemed even more confusing and which cannot really be avoided since scalars and 0d arrays are too close. I think it is mostly a tradeoff to avoid this:

arr[0, :, ind]

which I think has to transpose also because:

arr[n_dim_array, :, ind]

transposes for any N-Dimensional array including 0D and scalars and 0D are pretty much identical.

@seberg
Copy link
Member Author

seberg commented Oct 13, 2015

I think I will just post this to the list in a day or two (possibly closing this and opening a new issue). If anyone else has time to see if it is not way too hard to understand (which I am sure it is at least to some degree), please tell me :).

@seberg
Copy link
Member Author

seberg commented Nov 24, 2015

Considering the enthusiastic response on the mailing list ;) (I know, nobody cares for anything that sounds too preliminary...), @njsmith when you have some spare cycles, mind looking over this? You are much better at writing this kind of stuff and maybe I can bring it into a state where people actually comment on ;).
On the other hand, I probably just have to expand a lot of the text, so merely pointing that out is fine as well. Unfortunately, I think it will be almost christmas before I find some serious spare cycles.

@seberg
Copy link
Member Author

seberg commented Nov 29, 2015

Note that the current state of gh-6075 should (hopefully bug free) implement the NEP with the exception of deprecating non-obvious advanced indexing from plain indexing. The examples should all run fine.

@seberg
Copy link
Member Author

seberg commented Dec 4, 2015

OK, going to be a bit pushy, I know you guys probably follow things and have no time ;). but it really is now in a state where feedback and discussion is needed to do anything more on it. Of course I could refactor, but really the only huge pile of work would be the changed necessary because of deprecations and I am not even going to think about doing anything on it until we know relatively clear what the path is.

So, I would like to send this out again soon, or lets say before the end of the year, possibly to more than just the numpy list. If you got suggestions where to send it (scipy, xray?, dask?), otherwise helping to polish the NEP up would be appreciated, too. I am sure it can use it, but I am not about to do any larger changes.

@ahaldane
Copy link
Member

ahaldane commented Dec 6, 2015

I think this PR is subject to a reverse-bikeshedding effect: It's hard to evaluate such an important set of changes!

I was concerned about multidimensional integer arrays as indices, since that's my main use of fancy indexing. In summary, I think it all makes sense, though it took me a little bit to figure out what oindex with multidim indices does.

In case my examples are useful to anyone, here's what I tried out:

Here's my most common use case of fancy indexing at work: I have a 2d array, which I index with 2 2d int-index arrays (one for each axis), producing a new 2d array with the shape of the index arrays. (Basically a 2d lookup table).

>>> eps = np.random.rand(30,4)
>>> seq = np.random.randint(0,4,size=(100,30))
>>> eps.vindex[np.arange(30), seq].shape    # this is old behavior
(100, 30)
>>> eps.oindex[np.arange(30), seq].shape    # a funny new array
(30, 100, 30)

It took me some thought but I get what is going on now in the oindex case, though it's not useful to me here.

Second, I would guess a big use of fancy indexing is in lookup tables in image processing. Oindex does what is expected for a simple lut (this is with @seberg's PR):

>>> lut = np.random.rand(10,3)
>>> img = np.random.randint(0,10,size=(256,256))
>>> lut[img].shape
(256, 256, 3)

Overall this I like the way this looks, although it is going to be a little annoying (but perhaps clearer) to add vindex throughout my code!

@shoyer
Copy link
Member

shoyer commented Dec 6, 2015

So, I would like to send this out again soon, or lets say before the end of the year, possibly to more than just the numpy list. If you got suggestions where to send it (scipy, xray?, dask?), otherwise helping to polish the NEP up would be appreciated, too.

I can speak for xray and probably dask as well on this point (though @mrocklin may also have thoughts to share). I would suggest posting to the scipy-users and pydata lists, though these do have a lot of overlap with numpy-discussion. I can also tweet about this stuff :).

In my opinion, these changes will be extremely useful for xray, dask and other projects with similar goals (e.g., pandas):

  1. Outer indexing is often easier to implement then full vectorized indexing. If using vindex becomes the norm for vectorized indexing, then it will be much more obvious to users that we don't support it.
  2. Internally, xray could use oindex for numpy arrays, instead of our slower reimplementation.
  3. We could also test for the existence of these attributes on custom array types passed in to xray objects, so we could handle them properly without needing to coerce them. Basically, library code could always use the attributes to avoid ambiguity.

And of course there are all the advantages these would bring to users even if they only use numpy.

To me, these changes seem like a clear win with little downside. I'm happy with the proposed functionality.

My only thought is that perhaps we should use a longer, more explicit name for the legacy indexing attribute, e.g., legacy_index. Such longer names are much more self-evident, and explicit legacy indexing should be rare enough that this should not be an undue burden.

@seberg seberg force-pushed the indexing-nep branch 7 times, most recently from ab124c7 to fd09c61 Compare June 20, 2018 17:19
@shoyer
Copy link
Member

shoyer commented Jun 20, 2018

I wonder if we can cleanup/simplify boolean indexing as well. I would propose:

  • vindex prohibits booleans, only allowing integers
  • oindex allows 1d booleans -- these are not ambiguous
  • normal indexing [] warns/errors on any boolean index if (1) there is more than one boolean index or (2) that boolean index does not have the shape of the original array

I don't think we need a separate "boolean index", because a full boolean array is equivalent to oindex in the 1d case and prohibited by vindex.

@seberg
Copy link
Member Author

seberg commented Jun 20, 2018

@shoyer, yeah, I think you are right. I was trying to be unnecessarily general back then. It is a good idea to start of with a more minimal version. Making it allow more at some point would still be possible in any case...
Want to just change it and add yourself (and whoever else) as author?

just work. Subclasses that *do* provide it must be updated accordingly
and should preferably not subclass working versions of these attributes.

All subclasses will inherit the attributes, however, it seems possible
Copy link
Contributor

@mhvk mhvk Jun 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tendency would be to require that subclasses override the new methods if they need to; I don't like the extra code.

But one possible solution to ensure things "just break" might be to go through the __getitem__ call with an extra keyword set, say index_method='outer|vector|fancy| - if a subclass cannot take this extra keyword, it will immediately fail. (Obviously, this keyword needs to have a default of None that can then change from meaning fancy to outer).

Indeed, a keyword argument would be super-nice if PEP 472 were accepted...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, the that was hanging me was that getting masked arrays right was damn annoying, plus we also need to have a way for things like memmap. I am not sure why a kwarg is super nice if that PEP is accepted, we get a mix of different meanings for kwargs, so trying to not use kwargs because of the PEP might be a point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, on closer reading of the PEP, my suggestion would require the kwarg strategy; https://www.python.org/dev/peps/pep-0472/#strategy-kwargs-argument

Now it doesn't look like the PEP is actually going anywhere...

Anyway, my overall sense is that this should not hold up the NEP, or, indeed, the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially call self.__getitem__(indices, index_method='outer') directly. We just can't write self[indices, index_method='outer'].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is indeed what I had in mind, and I thought the PEP might eventually make that possible as well, but it turns out that is true only for one of its options. Anyway, at some level that is a separate concern. The main advantage of the index_method approach is that subclasses would fail automatically (and that one could change the default at some point).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misread the PEP, my gut feeling is, if arr[indices, index_method='outer'] would work, that would not be nice. Unless you want to use it instead of the arr.oindex[indices] syntax maybe. Simply because index_method='outer' could also mean get me "outer" from the axis "index_method"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would preclude having an axis called index_method!

Anyway, right now we cannot use keywords inside [...] at all, so that part is not so relevant. The main question is whether it is worth solving the subclass issue by having __getitem__ acquire an extra argument that we use internally (and oindex and vindex just being shortcuts to it).

I should add that I don't really have an opinion one way or another; I do think it is fine to expect subclasses to pay attention and adjust to new ndarray features (we just have to try our best not to break existing features).

(2, 5, 8)
>>> arr.vindex[:, [0], :, [0, 1]].shape
(2, 5, 7)
>>> arr.vindex[:, [0], 0, :].shape
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I echo @jaimefrio's comment from way back that having [0] remove a dimension and 0 keep it is extremely confusing. But it may just mean I still do not understand fancy indexing...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I am a bit confused what you mean :). Any index "removes" the dimension in some sense (for vindex, oindex in some sense replaces it), the question is just where to put the new dims. Jaime had a comment suggesting to maybe keep the axis swapping in a less confusing manner then it used to exist.
Right now, I disagree with this. vindex should be as consistent as possible, there is nothing special enough about [0] compared to [0, 1]. There might be a "less confusing then currently fancy indexing like transposing" thing, but I personally believe you can use oindex then, it already does what you want, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't understand why here a leading 1 appears. But as said, my understanding of fancy indexing is quite hopeless. I do agree with the always-transpose-to-the-front rule.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the thing is, there is a 1 sized dimension created by the [1] so it has to go somewhere? And in vindex that somewhere is the front?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is logical, but then why is it not happening in the examples right above? Or are those mistakes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, its vindex, not oindex. All indexes (except slices I guess) are broadcast together for the result (just like legacy fancy indexing) and iterated together. So the extra dimension at the start has size 2 because it has the size of [0] and [0, 1] broadcast together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A, see, I knew I was missing something. Though this is particularly silly. Sorry for the noise.

@mhvk
Copy link
Contributor

mhvk commented Jun 20, 2018

p.s. I wish we could just change __getitem__ to behave like oindex...

@seberg
Copy link
Member Author

seberg commented Jun 20, 2018

I dunno if that is in there, but I think we can add that as a Further steps thing or so. PS: I will not do anything here implementation wise at least for a few months, was just procrastinating...

@mattip
Copy link
Member

mattip commented Jun 20, 2018

Not sure why the build fails to find cython, but probably should be rebased off master anyway. Would be nice to merge this so it doesn't suffer bitrot

@seberg
Copy link
Member Author

seberg commented Jun 20, 2018

Rebased on master, not that I think it matters, probably just some fluke. I did not yet go through all of it to see it suffered bitrot anyway, and Stephan wants to change a few things? But could also squash it, we just merge it and then he/anyone else can update it too.

@shoyer
Copy link
Member

shoyer commented Jun 20, 2018 via email

@eric-wieser eric-wieser changed the title WIP,NEP: Add proposel for oindex and vindex. WIP,NEP: Add proposal for oindex and vindex. Jun 20, 2018
@mattip
Copy link
Member

mattip commented Jun 20, 2018

@shoyer any reason not to merge this as a draft?

@shoyer
Copy link
Member

shoyer commented Jun 20, 2018

Indeed, let's merge this. I'll revise it in a follow-on PR.

@shoyer
Copy link
Member

shoyer commented Jun 20, 2018

Want to just change it and add yourself (and whoever else) as author?

👍

@mattip
Copy link
Member

mattip commented Jun 20, 2018

squashed commits

@mattip mattip merged commit 4bca094 into numpy:master Jun 21, 2018
@mattip mattip changed the title WIP,NEP: Add proposal for oindex and vindex. NEP: Add proposal for oindex and vindex. Jun 21, 2018
@mattip
Copy link
Member

mattip commented Jun 21, 2018

Thanks @shoyer, @seberg

@seberg seberg deleted the indexing-nep branch June 21, 2018 14:20
@shoyer
Copy link
Member

shoyer commented Jun 24, 2018

See #11414 for my follow-on edits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet