This chapter explains the basic anatomy of NumPy arrays: memory layout, views, copies, and data types. These notions are critical to understand if you want your computations to benefit from the NumPy philosophy.

Let’s consider a simple example where we want to clear all the values of an array whose data type is np.float32. How does one write it to **maximize speed**? The syntax below is rather obvious (at least for those familiar with NumPy), but the question is which operation is the fastest.

In [1]:
import numpy as np
Z = np.ones(4*1000000, np.float32)  # create an array of 4*1000000 ones
print(Z)
Z[...] = 0  # clear the array: set every value to 0
print(Z)
print(Z.dtype)  # print the data type of Z

[1. 1. 1. ... 1. 1. 1.]
[0. 0. 0. ... 0. 0. 0.]
float32


If you look more closely at both the <font color='red'>dtype</font> and the size of the array, you can observe that this array can be cast (i.e. viewed) into many other “compatible” data types. By compatible, I mean that <font color='red'>Z.size * Z.itemsize</font>, the total number of bytes, is evenly divisible by the new dtype's <font color='red'>itemsize</font>.
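The compatibility rule can be checked directly. The following is a minimal sketch, not part of the original notebook; the names `total_bytes`, `sizes`, `small` and `incompatible_raised` are introduced here for illustration only.

```python
import numpy as np

Z = np.ones(4 * 1000000, np.float32)
total_bytes = Z.size * Z.itemsize  # 4,000,000 items * 4 bytes each

sizes = {}
for dtype in (np.int8, np.float16, np.int32, np.float64, np.complex128):
    name = np.dtype(dtype).name
    itemsize = np.dtype(dtype).itemsize
    # The view is possible because the byte count divides evenly.
    assert total_bytes % itemsize == 0
    sizes[name] = Z.view(dtype).size
    print(name, "itemsize:", itemsize, "view size:", sizes[name])

# Counter-example: 3 float32 items occupy 12 bytes, not a multiple of 8,
# so NumPy refuses to view them as float64.
small = np.ones(3, np.float32)
try:
    small.view(np.float64)
    incompatible_raised = False
except ValueError:
    incompatible_raised = True
print("incompatible view rejected:", incompatible_raised)
```

The number of items in each view is the total byte count divided by the new item size, so the same buffer looks longer as bytes and shorter as np.complex128.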

In [3]:
import numpy as np
from tools import timeit  # timeit comes from tools.py (custom module)
Z = np.ones(4*1000000, np.float32)  # create an array of 4*1000000 np.float32 ones

print("np.float16:")
# time required to clear the array through an np.float16 view
timeit("Z.view(np.float16)[...] = 0", globals())

print("np.int16:")
# time required to clear the array through an np.int16 view
timeit("Z.view(np.int16)[...] = 0", globals())

print("np.int32:")
# time required to clear the array through an np.int32 view
timeit("Z.view(np.int32)[...] = 0", globals())

print("np.float32:")
# time required to clear the array through an np.float32 view
timeit("Z.view(np.float32)[...] = 0", globals())

print("np.int64:")
# time required to clear the array through an np.int64 view
timeit("Z.view(np.int64)[...] = 0", globals())

print("np.float64:")
# time required to clear the array through an np.float64 view
timeit("Z.view(np.float64)[...] = 0", globals())

print("np.complex128:")
# time required to clear the array through an np.complex128 view
timeit("Z.view(np.complex128)[...] = 0", globals())

print("np.int8:")
# time required to clear the array through an np.int8 view
timeit("Z.view(np.int8)[...] = 0", globals())

np.float16:
100 loops, best of 3: 483 usec per loop
np.int16:
100 loops, best of 3: 476 usec per loop
np.int32:
100 loops, best of 3: 480 usec per loop
np.float32:
100 loops, best of 3: 480 usec per loop
np.int64:
100 loops, best of 3: 478 usec per loop
np.float64:
100 loops, best of 3: 476 usec per loop
np.complex128:
100 loops, best of 3: 519 usec per loop
np.int8:
1000 loops, best of 3: 244 usec per loop


Here, timeit is a custom helper function from tools.py. Interestingly enough, the obvious way of clearing all the values is not the fastest. Note that the loop count in each result (100 or 1000) is the number of repetitions chosen by the timer, not a number of CPU cycles; the figure to compare is the time per loop. By casting the array into a larger data type such as <font color='red'>np.float64</font>, a gain of about 25% can be obtained on some setups, although in the timings above it performs about the same as <font color='red'>np.float32</font> (roughly 470 to 480 usec per loop). By viewing the array as a byte array <font color='red'>(np.int8)</font>, we gained a factor of about 50% (roughly 245 usec per loop). The reason for such a speedup is to be found in the internal NumPy machinery and compiler optimizations.
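Clearing through a view is valid for two reasons: a view shares the memory of the original array, and zero is stored as all-zero bytes in every data type used here. The sketch below illustrates both points on a tiny array; the names `V` and `Y` are introduced for illustration only and do not appear in the original notebook.

```python
import numpy as np

Z = np.ones(4, np.float32)
V = Z.view(np.int8)  # 16 bytes seen as 16 int8 items, no copy made

print(np.shares_memory(Z, V))  # the view and Z use the same buffer
V[...] = 0                     # zero every byte through the view
print(Z)                       # Z itself is now cleared

# Caveat: the trick is specific to zero. Writing the byte value 1
# everywhere does not produce the float32 value 1.0.
Y = np.ones(4, np.float32)
Y.view(np.int8)[...] = 1
print(Y)
```

In other words, the byte-level shortcut is a way to clear an array, not a general way to fill it with an arbitrary value.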

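Q. How can you speed up clearing an array (setting all of its values to 0)?

A. Assign through a view of another data type, such as <font color='red'>Z.view(np.float64)[...] = 0</font> or, fastest in the timings shown here, <font color='red'>Z.view(np.int8)[...] = 0</font>.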

In [4]:
timeit("Z.view(np.float64)[...] = 0", globals())

100 loops, best of 3: 489 usec per loop


In [5]:
timeit("Z.view(np.float16)[...] = 0", globals())

100 loops, best of 3: 488 usec per loop


In [6]:
timeit("Z.view(np.int8)[...] = 0", globals())

100 loops, best of 3: 253 usec per loop
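Since the `timeit` helper used above comes from a custom tools.py that is not shown here, the comparison can also be reproduced with the standard library `timeit` module. This is a rough sketch under that substitution: `clear_as` and `results` are hypothetical names, the repeat counts are arbitrary, and absolute timings will differ from one machine and NumPy version to another.

```python
import timeit
import numpy as np

Z = np.ones(4 * 1000000, np.float32)

def clear_as(dtype):
    """Clear Z in place through a view of the given dtype."""
    Z.view(dtype)[...] = 0

results = {}
for dtype in (np.float32, np.float64, np.int8):
    Z[...] = 1  # reset the array before each measurement
    # Best of 3 repeats of 20 calls each, converted to usec per call.
    best = min(timeit.repeat(lambda: clear_as(dtype), number=20, repeat=3))
    results[np.dtype(dtype).name] = best / 20 * 1e6
    assert not Z.any()  # every variant really cleared the array

for name, usec in results.items():
    print(f"{name}: {usec:.0f} usec per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" reporting in the outputs above and reduces the influence of other activity on the machine.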
