Numpy Performance Improvements #2063

Open
kain88-de opened this Issue Sep 5, 2018 · 3 comments

@kain88-de
Member

kain88-de commented Sep 5, 2018

I recently came across two small numpy tricks for better performance. I think we use similar constructs in a lot of places in the code.

Faster norm2
Given two vectors we tend to use np.sum(a * b) in most places. I found that np.einsum('i,i', a, b) is about twice as fast on a recent numpy version (1.14.5, conda-forge). For two random vectors with shape (3,):

np.sum(a * b): 3.3 microseconds
np.einsum('i,i', a, b): 1.5 microseconds
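For reference, the two expressions are numerically equivalent (np.dot is a third option worth benchmarking); a minimal sanity check:

```python
import numpy as np

# Two small random vectors, as in the timings above.
a = np.random.random(3)
b = np.random.random(3)

# The dot product three ways; all give the same value.
s1 = np.sum(a * b)            # allocates a temporary for a * b
s2 = np.einsum('i,i', a, b)   # avoids the temporary
s3 = np.dot(a, b)             # BLAS-backed alternative

assert np.isclose(s1, s2) and np.isclose(s1, s3)
```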

Faster Copying of Arrays
A search shows that we use a.copy() around 180 times in the code. Often we have already allocated an array with the appropriate dimensions and would ideally like to copy directly into it, skipping the reallocation. This happens often when we update TimeStep objects.

To illustrate take the following simplified example.

# preallocate
a = np.empty(b.shape)
print(id(a))
# copy
a = b.copy()
print(id(a))

Here we intend to preallocate a and copy b into it. But a = b.copy() binds the name a to a brand-new array, discarding the preallocated one, so this code performs two allocations; the changing id shows the rebinding. Instead look at

a = np.empty(b.shape)
print(id(a))
np.copyto(a, b)
print(id(a))

The id of a did not change and no extra allocation occurred.

@richardjgowers

Member

richardjgowers commented Sep 5, 2018

@kain88-de your example with preallocation doesn't make sense in Python, I think you mean something like a[:] = b.copy().

WRT reallocating Timesteps, I'm not sure it's actually well defined whether a Reader creates a new Timestep or reuses an old one. This affects things like [ts.positions for ts in u.trajectory]: if we reuse the Timestep object then every entry of this list is (unintuitively) the same object.
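The pitfall can be sketched with a toy reader (the class and names here are hypothetical, not the actual MDAnalysis API):

```python
import numpy as np

class Reader:
    """Toy trajectory reader that reuses a single array for every frame."""
    def __init__(self, frames):
        self._frames = frames
        self._buf = np.empty_like(frames[0])  # one reused buffer

    def __iter__(self):
        for frame in self._frames:
            np.copyto(self._buf, frame)  # overwrite in place
            yield self._buf              # every frame is the SAME object

frames = [np.full((2, 3), i, dtype=float) for i in range(3)]
positions = [ts for ts in Reader(frames)]

# All list entries alias one buffer, so they all show the last frame.
print(all(p is positions[0] for p in positions))  # True
print(positions[0][0, 0])                         # 2.0, not 0.0
```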

@kain88-de

Member

kain88-de commented Sep 5, 2018

I think you mean something like a[:] = b.copy().

This will likely still allocate the copy and then update a in place. The copyto call is still cheaper when benchmarking with timeit. Surprisingly, a = b.copy() is faster than a[:] = b.copy() in my benchmarks.

In [2]: b = np.ones((100, 100000)) # ~70 MB in size

In [3]: a = np.empty(b.shape)

In [4]: %timeit a[:] = b.copy()
47.5 ms ± 933 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit np.copyto(a, b)
10 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit a = b.copy()
37.8 ms ± 516 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
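Note that the extra time in In [4] comes from the temporary array created by b.copy(); plain slice assignment with no copy writes into a directly, much like np.copyto. A quick sketch:

```python
import numpy as np

b = np.ones((100, 1000))
a = np.empty(b.shape)

a_id = id(a)
a[:] = b                      # in-place copy: no temporary, no rebinding
assert id(a) == a_id          # a is still the preallocated array
assert np.array_equal(a, b)   # contents were copied
```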

WRT reallocating Timesteps

True we should consider this carefully.

@kain88-de

Member

kain88-de commented Sep 6, 2018

I looked a bit further into the code. I haven't seen any 100% obvious cases for copyto in my quick grep search. I did notice though that we often use the pattern a = b[:]. This is interesting because it creates a new array object that is a view onto the same data, meaning changes through a appear in b. The sharing is more obvious with a = b, where both names refer to the very same object.

In [17]: a = b
In [19]: id(a) == id(b)
Out[19]: True
In [20]: a = b[:]
In [21]: id(a) == id(b)
Out[21]: False
In [22]: b
Out[22]: 
array([[1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]])
In [23]: a[0, 0] = 5
In [24]: b
Out[24]: 
array([[5., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]])

There is also a performance difference, roughly a factor of eight here. But this is in the nanosecond range, so the overall effect should still be small.

In [11]: %timeit a = b[:]
187 ns ± 6.66 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit a = b
23.8 ns ± 2.59 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
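The three cases (same object, view on shared memory, independent copy) can be told apart with `is` and np.shares_memory; a small sketch:

```python
import numpy as np

b = np.ones((4, 4))
alias = b          # same object, same memory
view = b[:]        # new object, same memory
copy = b.copy()    # new object, new memory

print(alias is b)                   # True
print(view is b)                    # False
print(np.shares_memory(view, b))    # True
print(np.shares_memory(copy, b))    # False
```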
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment