## Notes on numpy and optimization

A nice numpy tutorial can be found in Chapter 2 of this <a href="https://jakevdp.github.io/PythonDataScienceHandbook/">online book</a> and some basics are also covered in this <a href="https://github.com/HarvardOpenData/numpy-pandas-bootcamp/blob/master/numpy_pandas_tutorial.ipynb">short tutorial.</a>

Below are examples of the numpy functions I often use. 

In [1]:
# import numpy
import numpy as np

### Array initialization

In [11]:
N1, N2 = 100, 100
# initialize a vector of size N1 filled with zeroes
z1 = np.zeros(N1)
z2 =np.zeros_like(z1) # arreay with the same dimensions as o1 filled with 0.
# the same but filled with ones
o1 = np.ones(N1)
o2 = np.ones_like(o1) # arreay with the same dimensions as o1 filled with 1.0
# initialize a 2D array of size (N1, N2)
a1 = np.empty((N1,N2))
a2 = np.empty_like(a1)

In [12]:
# size and shape of the array
print(np.size(a2), np.shape(a2))
# print 10 elements of the array
print(a2[:10,0])

10000 (100, 100)
[9.06102694e-312 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000]


In [2]:
# initialize an array of given dimensions filled with specified number
f = np.full((3, 3), 9.9999)
print(f)

[[9.9999 9.9999 9.9999]
 [9.9999 9.9999 9.9999]
 [9.9999 9.9999 9.9999]]


In [54]:
# initialize a2 array of 64-bit integer type
i2 = np.empty((N1,N2), dtype=np.int64)

In [3]:
# numpy vectors can be reshaped, which is often useful in initialization
r = np.arange(9).reshape(3,3)
print(r)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


See more on numpy data types <a href="https://numpy.org/devdocs/user/basics.types.html">here</a>. Full list of array initialization routines can be found <a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.array-creation.html">here.</a> 

### Efficient indexing (aka slicing) of numpy array and lists

See <a href="https://numpy.org/devdocs/user/basics.indexing.html">for a concise introduction</a>

In [9]:
x = np.arange(100)
print(x[0], x[-1]) # the first and last elements
print(x[:10]) # first 10 elements
print(x[10:]) # 10th and all elements after 
print(x[-10:]) # last 10 elements
print(x[::5]) # every 5th element
print(x[::-5]) # every 5th element starting from last down to the first
print(x[1:50:5]) # every 5th element starting from 2nd and until 50th

0 99
[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
[90 91 92 93 94 95 96 97 98 99]
[ 0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]
[99 94 89 84 79 74 69 64 59 54 49 44 39 34 29 24 19 14  9  4]
[ 1  6 11 16 21 26 31 36 41 46]


For multi-dimensional array, each individual dimension can be indexed like in the examples above. 

The following slicing operation is particularly useful in many situations, as it reverses the order of elements in the array. 

In [10]:
print(x[::-1])

[99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76
 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52
 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28
 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4
  3  2  1  0]


### Generate a sequence or grid of numbers

In [69]:
# generate a sequence of N equally spaced numbers from xmin to xmax
xmin, xmax = 0., 100.
N = 1000
xg = np.linspace(xmin, xmax, N+1)
print(xg[:10])
#generate numbers equally spaced on log scale
expg = np.linspace(0.,10, 10)
xg = 10.**expg
print(xg)
# this is equivalent to 
xg = np.logspace(0.,10,10)
print(xg)

# generate a sequence of integers from N1 to N2-1
ig = np.arange(10,100)
print(ig[:10])
# the same but generating numbers with increment of 2
ig = np.arange(10,100,2)
print(ig[:10])

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[1.00000000e+00 1.29154967e+01 1.66810054e+02 2.15443469e+03
 2.78255940e+04 3.59381366e+05 4.64158883e+06 5.99484250e+07
 7.74263683e+08 1.00000000e+10]
[1.00000000e+00 1.29154967e+01 1.66810054e+02 2.15443469e+03
 2.78255940e+04 3.59381366e+05 4.64158883e+06 5.99484250e+07
 7.74263683e+08 1.00000000e+10]
[10 11 12 13 14 15 16 17 18 19]
[10 12 14 16 18 20 22 24 26 28]


In [66]:
# you can always check help info for a given function
print(help(np.arange))

Help on built-in function arange in module numpy.core.multiarray:

arange(...)
    arange([start,] stop[, step,], dtype=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
    but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use ``linspace`` for these cases.
    
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and

### A note on array assignments and copying 

In [20]:
print(a[1][0])
b = a
print(b[1][0])
b[1][0] = 0.
print(b[1][0])
print(a[1][0])

0.6006913464593936
0.6006913464593936
0.0
0.0


When numpy array is copied in this way no copy is created, a pointer to memory location is passed on and b array simply points to the same memory location as a. As you can see modifying b above modified the same element in a. 

In [21]:
%timeit b = a
%timeit b = np.copy(a)

100000000 loops, best of 3: 19.5 ns per loop
10 loops, best of 3: 36.5 ms per loop


The second operation, which involves copying, is 1000 times more expensive! 
but now b and a are independent arrays. 

**Lesson:** pass pointer to array where appropriate for speed, but use explicit copy if you want a truly independent array.

In [23]:
b = np.copy(a)
b[1][0] = 1
print(b[1][0])
print(a[1][0])

1.0
0.0


### Checking all elements of array

In [4]:
# generate a vector of uniformly distributed pseudo-random numbers in the interval [0,1)
r = np.random.uniform(0.,1.0,size=10000)
# np.all checks whether *all* elements of an array conform to specified condition
print(np.all(r>0.5))
print(np.all(r<1.0))
#np.any instead evaluates whether any of the elements satisfy the condition
print(np.any(r>0.5))


False
True
True


In [5]:
# any array is an object (class) has many attributes, including functions such as min() and max()
print(r.min(), r.max())

4.980443971347448e-06 0.9999553105039661


In [6]:
#if you want to see all attributes, run array name with a question mark as below
r?

### Vectorizing a function

We can vectorize a function, that was not designed for vectorized execution using <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html">numpy.vectorize</a>

### Optimization issues 

In [5]:
import numpy as np

Ng = 100
a = np.random.rand(Ng, Ng, Ng); b = np.random.rand(Ng, Ng, Ng);

from timeit import default_timer

tstart = default_timer()
for i in range(Ng):
    for j in range(Ng):
        for k in range(Ng):
            a[i,j,k] = b[i,j,k]**(2./3)

print("explicit loop takes %.2f seconds"%(default_timer()-tstart))

# vector operation on numpy arrays
tstart = default_timer()

a = b**(2./3)
print("numpy vector operation takes %.2f seconds"%(default_timer()-tstart))


explicit loop takes 0.46 seconds
numpy vector operation takes 0.03 seconds


**Lesson:** use numpy instead of explicit loops whenever possible (unless loops have small iteration count and inexpensive), but be aware of some pitfalls. 

Here a few examples illustrating how seeminly similiar choices can affect the speed of your calculations. 

In [15]:
N = 10000000
%timeit a = np.ones(N); a *= 2

10 loops, best of 3: 40.7 ms per loop


In [12]:
%timeit a = np.ones(N); b = a * 2

10 loops, best of 3: 66.5 ms per loop


In [13]:
%timeit a = np.ones(N); a = a * 2

10 loops, best of 3: 67.2 ms per loop


The first calculation (a *= 2) is fastest. 

**Lesson:** calculate "in-place" (like in the first example) whenever possible. 

In [16]:
n, d = 100000, 100
# create a 2-dimension array of dimensions nxd and fill it with random numbers
a = np.random.random_sample((n, d));

# select every 10th array item
# 1st using direct indexing (array view)
# then using "fancy indexing" with a numpy function arange
a1 = a[::10]; a2 = a[np.arange(0, n, 10)]
print("Are a1 and a2 the same?") 
if np.array_equal(a1, a2): 
    print("yes!")
else:
    print("no!")
    
print("timing direct index slicing")
%timeit a[::10]
print("timing indirect index slicing")
%timeit a[np.arange(0, n, 10)]

Are a1 and a2 the same?
yes!
timing direct index slicing
The slowest run took 11.90 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 168 ns per loop
timing indirect index slicing
100 loops, best of 3: 4.09 ms per loop


Direct indexing performs selection operation in place without creating a copy of array. Hence, it is much faster than fancy indexing selection which does create a copy.

**Lesson:** use direct slicing like <tt>a[::10]</tt> whenever possible. 

The flatten and ravel methods of an array reshape it into a 1D vector (flattened array). The former method always returns a copy, whereas the latter returns a copy only if necessary. So ravel performs flattening "in place" in memory location of array without copying it elsewhere. 

In [18]:
%timeit a.flatten()

10 loops, best of 3: 37.6 ms per loop


In [19]:
%timeit a.ravel()

The slowest run took 29.80 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 164 ns per loop


**Lesson:** ravel is much faster than flatten. 

Just like with arrays in other languages scanning different indices of array incurs very different costs:

In [24]:
a = np.random.rand(5000, 5000)
%timeit a[0,:].sum()
%timeit a[:,0].sum()

The slowest run took 2207.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.12 µs per loop
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 38.8 µs per loop


That's because scanning the first index is contiguous in memory and thus most data is brought into cache for fast access, while scanning the second is not. 

More details on numpy and numpy performance tricks can be found <a href="http://www.scipy-lectures.org/advanced/advanced_numpy/">here</a>.

In [25]:
from math import cos
import numpy as np

def func(t, y, **kwargs):
    return y*cos(t)

def func_exact(t, **kwargs):
    return np.exp(np.sin(t) - np.sin(kwargs["t0"]))



Solver routines below are adapted to solve a single ODE or a system of ODEs.
The input is can be a scalar or a vector
if y is a vector, function f should be able to handle a vector input and output. Keyword arguments can also be passed to f via kwargs dictionary. See <a href="https://pythontips.com/2013/08/04/args-and-kwargs-in-python-explained/"></a> if you are not familiar with this concept.  

In [26]:
# 1st order Runge-Kutta method (forward Euler) with constant step
# routine using list append function to accumulate solution and time vector
def rk1(f, y_start, t_start, t_end, dt, **kwargs):
    t = np.copy(t_start); y = np.copy(y_start); 
    tout = [t]; yout = [y]
    while t < t_end:
        y += dt * f(t, y, **kwargs)
        t += dt
        tout.append(t); yout.append(y)
    return tout, yout

# the same function but using vstack instead of append 
def rk1npstack(f, y_start, t_start, t_end, dt, **kwargs):
    t = np.copy(t_start); y = np.copy(y_start); 
    tout = np.copy(t_start); yout = np.copy(y_start); 
    while t < t_end:
        y += dt * f(t, y, **kwargs)
        t += dt
        np.vstack((tout,t)); np.vstack((yout,y))
    return tout, yout


In [28]:
from time import time

t0 = 0.; 
tf = 100.; dt = 0.0001

kwargs = {"t0": 0.} # argument for func_stiff
y0 = func_exact(t0, **kwargs); 

tpast = time()
t1, y1 = rk1(func, y0, t0, tf, dt, **kwargs)
texec_rk1 = time() - tpast
print("RK1 with list appends solved in %.4f s"%(texec_rk1))

tpast = time()
t1s, y1s = rk1npstack(func, y0, t0, tf, dt, **kwargs)
texec_rk1s = time() - tpast
print("RK1 with np stack functions solved in %.4f s"%(texec_rk1s))

RK1 with list appends solved in 5.3958 s
RK1 with np stack functions solved in 14.2069 s


Function using numpy arrays for everything is ~3 times slower. The culprit is np.vstack calls. vstack and hstack methods in numpy involve copying of arrays, while list method [].append just appends array to existing memory location without copying. 

**Lesson:** avoid using hstack and vstack operations in long sequences involving large arrays. Use lists and append instead. Convert list to numpy array in the end.