Skip to content

Issue 365 getitem none#366

Merged
abergeron merged 3 commits intoTheano:masterfrom
kohr-h:issue-365__getitem_none
Mar 2, 2017
Merged

Issue 365 getitem none#366
abergeron merged 3 commits intoTheano:masterfrom
kohr-h:issue-365__getitem_none

Conversation

@kohr-h
Copy link
Copy Markdown
Contributor

@kohr-h kohr-h commented Mar 2, 2017

I ran this manually plus the old getitem tests, works for me. You'll notice that I moved quite a bit of logic from __cgetitem__ to __getitem__ simply because I am faster when I can do this kind of stuff in Python. If you feel this is gonna cost too much efficiency (which I cannot say for sure, but probably it's no issue), you may want to push it back to C. Although that would likely mean all of it.

Closes #365

@kohr-h
Copy link
Copy Markdown
Contributor Author

kohr-h commented Mar 2, 2017

Another remark: I changed the behavior of indexing with lists and sequences of lists that contain only integers from "same as tuple" to "raise exception". The reason is that in Numpy, this type of indexing is fundamentally different from indexing with tuples (or sequences containing slices or Ellipsis):

>>> x = np.array([1, 2])
>>> x[[0, 1, 1, 0, 1]]
array([1, 2, 2, 1, 2])
x[[slice(None)]]  # same as x[:]
array([1, 2])

Comment thread pygpu/gpuarray.pyx Outdated
# Add remaining entries from sliced.shape if existing (happens
# for 1 index or less if ndim >= 2).
new_shape.extend(sliced.shape[i:])
return sliced.reshape(new_shape)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid the reshape if there where no None. This is a performance issue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, avoiding unnecessary reshaping now.

Comment thread pygpu/gpuarray.pyx
# objects, not counting None entries and the Ellipsis itself
num_slcs = self.ga.nd - (len(key) - key.count(None) - 1)
fill_slices = (slice(None),) * num_slcs
key = key[:ell_idx] + fill_slices + key[ell_idx + 1:]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the Ellipsis processing outside of __cgetitem__ breaks a[...] = 1 or similar things.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I wasn't expecting that it uses __cgetitem__. That needs to change then, I'll check.

Comment thread pygpu/gpuarray.pyx
key = key[:ell_idx] + fill_slices + key[ell_idx + 1:]

# Remove the None entries for indexing
getitem_idcs = tuple(k for k in key if k is not None)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need a similar thing in the __setitem__ code (yes, numpy apparently just ignores the None for that case).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now use the view returned in __getitem__ to do the same thing as before, so no need for new code.

Comment thread pygpu/gpuarray.pyx
# a[()], which simply is a view in Numpy.
if len(getitem_idcs) <= 1:
getitem_idcs = (getitem_idcs +
(slice(None),) * (self.ga.nd - len(getitem_idcs)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a[0] = [1, 2, 3, 4] (where a is shape (3, 4)) still works after moving this here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works

Comment thread pygpu/gpuarray.pyx Outdated
# Slice into array, then reshape, accommodating for None entries in key
sliced = self.__cgetitem__(getitem_idcs)
new_shape = []
i = 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to add cdef unsigned int iat the top.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@abergeron
Copy link
Copy Markdown
Member

I'm ok with the behaviour change wrt indexing with lists. It's better to not behave slightly differently.

Comment thread pygpu/gpuarray.pyx Outdated

def __setitem__(self, idx, v):
cdef GpuArray tmp = self.__cgetitem__(idx)
cdef GpuArray tmp = self.__getitem__(idx)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do that. The point of __cgetitem__ was to avoid python overhead here (which is significant).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, alright. Let me find another way.

@kohr-h
Copy link
Copy Markdown
Contributor Author

kohr-h commented Mar 2, 2017

I'm ok with the behaviour change wrt indexing with lists.

Actually, I saw that there's the take1 function which implements a part of that behavior. I think you can use this function to implement indexing with lists or sequences of lists, by converting the input to an array, then flatten it using index strides and feed the flat array into take1.
In Numpy I would do

def __getitem__(self, idx):
    # Assuming idx is a list or sequence of lists
    idx_arr = np.asarray(idx, dtype=int)
    if self.ndim == 1:
        if idx_arr.ndim == 2:
            # Given as single-element list of index list
            flat_idx_arr = idx_arr[0]
        else:
            flat_idx_arr = idx_arr
    else:
        strides = np.concatenate(np.cumprod(self.shape[1:]), [1])
        flat_idx_arr = np.sum(idx_arr * strides[:, None], axis=0)
    return self.take1(flat_idx_arr)

Is that easily doable?

@abergeron
Copy link
Copy Markdown
Member

That would be doable, yes. But let's keep this for another PR.

It might be better to write different kernels to handle the multi dimension advanced indexing though. That is somewhere on my todo list.

Also, I would like to add support for setting elements with a list or multiple lists, which cannot use the current code.

@kohr-h
Copy link
Copy Markdown
Contributor Author

kohr-h commented Mar 2, 2017

That would be doable, yes. But let's keep this for another PR.

It might be better to write different kernels to handle the multi dimension advanced indexing though. That is somewhere on my todo list.

Also, I would like to add support for setting elements with a list or multiple lists, which cannot use the current code.

Sure, I was just wondering since I stumbled upon the function. A poor man's version with index flattening would be quite nice, better than nothing. On the other hand, as you say, __setitem__ cannot use the current code since the __getitem__ does not return a view in that case, so users may be surprised if setting is not supported.

@kohr-h
Copy link
Copy Markdown
Contributor Author

kohr-h commented Mar 2, 2017

So, old __cgetitem__ code back in place, and __setitem__ uses it again. I added the same checks for fancy indexing attempts to make sure that both getters and setters behave the same way. Are there further bottlenecks to worry about?

@abergeron abergeron merged commit 56b2df4 into Theano:master Mar 2, 2017
@kohr-h kohr-h deleted the issue-365__getitem_none branch March 3, 2017 08:55
@nouiz
Copy link
Copy Markdown
Member

nouiz commented Mar 3, 2017

Before having this PR merged, I would have liked to get some profiling. @abergeron you did speed up with cgetitem of code that was slow down otherwise in the new back-end. Do you remember that use case? We could profile it again before and after this PR. There is much python code in getitem before cgetitem is called. This could slow down computation a lot.

@abergeron
Copy link
Copy Markdown
Member

First, it's already been merged.

Second, there is no python code. All the code is Cython and will get converted to C. The only issue is that if you call python methods that can't be converted you pay a performance penalty.

I'm reasonably confident that this isn't the case here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants