# Numpy Performance Improvements #2063

Open · opened this Issue Sep 5, 2018 · 3 comments
### kain88-de commented Sep 5, 2018

I recently came across two small numpy tricks for better performance. I think we use similar constructs in a lot of places in the code.

**Faster norm2**

Given two vectors we tend to use `np.sum(a * b)` in most places. I found that `np.einsum('i,i', a, b)` is about twice as fast on a recent numpy version (1.14.5, conda-forge). For two random vectors with shape `(3,)`:

- `np.sum(a * b)`: 3.3 µs
- `np.einsum('i,i', a, b)`: 1.5 µs

**Faster copying of arrays**

A search shows that we use `a.copy()` around 180 times in the code. Often we have already allocated an array with the appropriate dimensions and would ideally like to copy directly into it, skipping the reallocation. This happens often when we update `Timestep` objects. To illustrate, take the following simplified example:

```python
# preallocate
a = np.empty(b.shape)
print(id(a))
# copy
a = b.copy()
print(id(a))
```

Here we preallocate `a` and copy `b` into it. In the second step Python creates a new object and rebinds the name `a` to it, so the preallocated array is discarded. This can be seen from the changing id; the code therefore performs two allocations. Instead, look at:

```python
a = np.empty(b.shape)
print(id(a))
np.copyto(a, b)
print(id(a))
```

The id of `a` did not change and no extra allocation occurred.
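To make both tricks concrete, here is a minimal self-contained sketch (variable names are illustrative, not from the MDAnalysis codebase): `einsum` computes the same dot product as `np.sum(a * b)`, and `np.copyto` fills a preallocated buffer without rebinding the name.

```python
import numpy as np

rng = np.random.RandomState(42)
a = rng.rand(3)
b = rng.rand(3)

# einsum with signature 'i,i' contracts over the shared index,
# i.e. it computes the same dot product as np.sum(a * b)
s1 = np.sum(a * b)
s2 = np.einsum('i,i', a, b)
assert np.isclose(s1, s2)

# copy into a preallocated buffer; the array object stays the same
out = np.empty(b.shape)
before = id(out)
np.copyto(out, b)
assert id(out) == before       # no reallocation, same buffer
assert np.array_equal(out, b)
```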


### richardjgowers commented Sep 5, 2018

@kain88-de your example with preallocation doesn't make sense in Python; I think you mean something like `a[:] = b.copy()`. WRT reallocating Timesteps, I'm not sure it's actually well defined whether a Reader creates a new Timestep or reuses an old one. This affects things like `[ts.positions for ts in u.trajectory]`: if we reuse the Timestep object, then every entry in this list is (unintuitively) identical.
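The pitfall above can be shown with a toy generator (this is purely illustrative and not the MDAnalysis Reader API): a reader that overwrites one shared buffer per frame yields a list whose entries are all the same object, holding only the last frame's data.

```python
import numpy as np

def frames_reusing_buffer(n_frames):
    """Toy 'reader' that overwrites one shared buffer per frame."""
    buf = np.empty(3)
    for i in range(n_frames):
        buf[:] = i          # in-place overwrite of the shared buffer
        yield buf

collected = [frame for frame in frames_reusing_buffer(3)]
# Every list entry is the SAME object, holding only the last frame.
assert all(frame is collected[0] for frame in collected)
assert np.array_equal(collected[0], [2.0, 2.0, 2.0])

# Copying per frame gives independent snapshots.
snapshots = [frame.copy() for frame in frames_reusing_buffer(3)]
assert np.array_equal(snapshots[0], [0.0, 0.0, 0.0])
```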

### kain88-de commented Sep 5, 2018

> I think you mean something like `a[:] = b.copy()`.

This will likely still allocate the copy and then update `a` in place. The `copyto` call is still cheaper when benchmarking with `timeit`. Surprisingly, `a = b.copy()` is faster than `a[:] = b.copy()` in my benchmarks.

```python
In [2]: b = np.ones((100, 100000))  # ~70 MB in size

In [3]: a = np.empty(b.shape)

In [4]: %timeit a[:] = b.copy()
47.5 ms ± 933 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit np.copyto(a, b)
10 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit a = b.copy()
37.8 ms ± 516 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

> WRT reallocating Timesteps

True, we should consider this carefully.

### kain88-de commented Sep 6, 2018

I looked a bit further in the code. I haven't seen any 100% obvious cases for `copyto` in my quick grep search. I did notice, though, that we often use the pattern `a = b[:]`. This is interesting because it creates a new array object that is a view onto the same underlying data, meaning changes in `a` appear in `b`. The sharing is more obvious when writing `a = b`.

```python
In [17]: a = b

In [19]: id(a) == id(b)
Out[19]: True

In [20]: a = b[:]

In [21]: id(a) == id(b)
Out[21]: False

In [22]: b
Out[22]:
array([[1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]])

In [23]: a[0, 0] = 5

In [24]: b
Out[24]:
array([[5., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       ...,
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.],
       [1., 1., 1., ..., 1., 1., 1.]])
```

There is also a performance difference of a factor of 5, but this is in the nanosecond range, so the overall effect should still be small.

```python
In [11]: %timeit a = b[:]
187 ns ± 6.66 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit a = b
23.8 ns ± 2.59 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
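For anyone auditing these patterns, a quick sketch of how the three idioms differ, using `np.shares_memory` to test for shared data (names are illustrative):

```python
import numpy as np

b = np.ones((4, 4))

alias = b          # same object, same data
view = b[:]        # new array object, same underlying data
copy = b.copy()    # new object, independent data

assert alias is b
assert view is not b and np.shares_memory(view, b)
assert not np.shares_memory(copy, b)

# Writing through the view mutates b; writing to the copy does not.
view[0, 0] = 5.0
assert b[0, 0] == 5.0
copy[0, 0] = 7.0
assert b[0, 0] == 5.0
```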