
Avoid np.prod in make_shared_array #2621

Merged
alejoe91 merged 6 commits into SpikeInterface:main from h-mayorquin:patch_numpy_error
Mar 24, 2024

Conversation

@h-mayorquin
Collaborator

@h-mayorquin h-mayorquin commented Mar 22, 2024

np.prod produces a numpy scalar (not a Python integer or float!).

Numpy scalars have a fixed bit width and can overflow. Example:

In [1]: a_billion_int = 1_000_000_000

In [2]: np.prod((a_billion_int, a_billion_int, a_billion_int, a_billion_int))
Out[2]: -5527149226598858752

Python integers, on the other hand, have arbitrary precision and do not overflow.

This PR avoids np.prod so that these overflow errors do not happen.
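The difference can be sketched as follows (an illustrative sketch, assuming the prod used in the diff is Python's math.prod):

```python
import math
import numpy as np

a_billion = 1_000_000_000
shape = (a_billion, a_billion, a_billion, a_billion)

# np.prod accumulates in a fixed-width numpy integer and silently wraps around
wrong = np.prod(shape)

# math.prod multiplies Python ints, which have arbitrary precision
right = math.prod(shape)

print(right == 10**36)       # True
print(int(wrong) == 10**36)  # False: the numpy result overflowed
```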

This error appeared for a user here:
#1871 (comment)

@h-mayorquin h-mayorquin requested a review from alejoe91 March 22, 2024 18:34

  dtype = np.dtype(dtype)
- nbytes = int(np.prod(shape) * dtype.itemsize)
+ nbytes = prod(shape) * dtype.itemsize
Member

This still needs to be wrapped as an int, no?

Member

Related to Windows failing.

Collaborator Author

itemsize is an int and the shape elements are ints, so that's unnecessary. Am I wrong?

Member

@zm711 zm711 Mar 22, 2024

When I check, they come up as ints. But given that the tests are failing only on Windows, and all you did was remove the int() around the operation, maybe something is being converted... Not sure.

Collaborator Author

The thing doing the conversion is np.prod itself, and that is where the overflow happens. Let me show you:

In [1]: a_billion_int = 1_000_000_000

In [2]: np.prod((a_billion_int, a_billion_int, a_billion_int, a_billion_int))
Out[2]: -5527149226598858752
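To make the conversion explicit (an illustrative check, not part of the PR):

```python
import numpy as np

result = np.prod((2, 3, 4))  # Python ints in, numpy scalar out
print(type(result))          # a numpy integer scalar, e.g. numpy.int64 on most platforms

print(isinstance(result, np.integer))  # True
print(isinstance(result, int))         # False: a numpy scalar is not a Python int
```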

Collaborator Author

Yeah, the only thing I can think of is that fork (the multiprocessing start method) does not exist on Windows, which implies some re-instantiation, with some attributes estimated instead of accessed... But I am out of ideas.

Where else is a possible point of overflow?

Member

Windows and Linux numpy handle overflow differently. See this. Based on that discussion the behavior will likely change in NumPy 2.0, but in short

np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).sum(axis=0)

shows similar behavior: array([4294967294, 4294967294]) on Ubuntu and array([-2, -2]) on Windows.

However, since this returns an array rather than a scalar, we can inspect the dtype: on Ubuntu it is int64, but on Windows it is int32.

So maybe this was due to the difference in how overflow occurs on each platform.
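A quick way to see the platform default in play (an illustrative sketch; the exact dtypes depend on OS and numpy version):

```python
import numpy as np

# The dtype numpy picks for a plain Python int hints at the platform default:
# historically int32 on Windows and int64 on Linux/macOS; NumPy 2.0 moved to
# int64 everywhere.
default = np.array(1).dtype
print(default)

# The sum from the discussion: int32 inputs may or may not be promoted to the
# platform default, so inspecting .dtype on the result shows where overflow can hide.
summed = np.full((2, 2), np.iinfo(np.int32).max, dtype=np.int32).sum(axis=0)
print(summed, summed.dtype)
```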

Collaborator Author

Thanks a bunch for sharing the issue!

Collaborator Author

So NumPy 2.0 should align this behavior across platforms.

Member

@zm711 zm711 Mar 22, 2024

Anything that makes debugging easier across the operating systems is welcome. Much less surface area if one fix works for all of them. :) (Although the final message didn't sound completely confident...)

Comment thread src/spikeinterface/core/core_tools.py Outdated

  dtype = np.dtype(dtype)
  nbytes = int(np.prod(shape) * dtype.itemsize)
+ shape = (int(x) for x in shape) # We need to be sure that shape comes in int and not number scalars
Member

Suggested change
- shape = (int(x) for x in shape) # We need to be sure that shape comes in int and not number scalars
+ shape = tuple(int(x) for x in shape) # We need to be sure that shape comes in int and not numpy scalars
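As an aside, a sketch of why both the int() cast and the tuple(...) wrapper matter (illustrative, not part of the PR):

```python
import math
import numpy as np

# (a) If the shape elements are numpy scalars, math.prod returns a numpy scalar
# again, so casting each element with int() keeps the arithmetic in Python ints.
shape_np = (np.int64(1000), np.int64(1000))
print(type(math.prod(shape_np)))                  # a numpy integer scalar
print(type(math.prod(int(x) for x in shape_np)))  # <class 'int'>

# (b) A generator can only be consumed once; a tuple can be reused and indexed,
# which is why the suggestion wraps the expression in tuple(...).
gen = (int(x) for x in shape_np)
list(gen)         # first pass consumes it
print(list(gen))  # []: the generator is already exhausted
```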

Member

Cool you saw it :)

Collaborator Author

Yes, thanks, I realized it just now.

You know, this generator notation has bitten me a couple of times. Check this out:

is_one_equal_to_0 = np.all((1 == 0 for _ in range(10)))
if is_one_equal_to_0:
    print("This is great, 1 is equal to 0")

Syntactic sugar is nice until it is not.
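What happens there (an illustrative breakdown): np.all cannot iterate a generator, so np.asarray wraps it in a 0-d object array, and a generator object is always truthy.

```python
import numpy as np

# np.all sees a single 0-d object array holding the generator itself;
# the generator object is truthy, so the "all" is vacuously True.
gen_result = np.all(1 == 0 for _ in range(10))
print(bool(gen_result))   # True, surprisingly

# Materializing the elements first gives the intended answer.
list_result = np.all([1 == 0 for _ in range(10)])
print(bool(list_result))  # False
```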

Member

That's a cool example. The tuple/generator ambiguity with (), and (even I sometimes forget) {} meaning set as well as dictionary. Those mess with me. But yeah, the accidental generator happens way too often.

@alejoe91 alejoe91 merged commit 263c1ec into SpikeInterface:main Mar 24, 2024
@h-mayorquin h-mayorquin deleted the patch_numpy_error branch March 24, 2024 15:37
@samuelgarcia
Member

Well played! (bien joué!)

alejoe91 added a commit to alejoe91/spikeinterface that referenced this pull request Apr 30, 2024