# Demonstration of NumPy

NumPy is a well-known library for numeric computation with Python. The core of the library is written in C, so its array operations are typically much faster than equivalent code written in pure Python. Despite this, it has a convenient Python interface that makes advanced numeric processing easy (well, easier). Some of the code snippets below are taken from *Learning Python by Building Data Science Applications* by Philipp Kats and David Katz.

First, let's look at a pythonic way of doing a large-ish numeric computation:

In [2]:
# A, B, and C are three long lists (arrays) of integers, each containing 5000 elements.
A, B, C = [1, 2, 3, 4, 5] * 1000, [2, 3, 4, 5, 6] * 1000, [3, 4, 5, 6, 7] * 1000

# zipped yields tuples (triples), e.g., (1, 2, 3), (2, 3, 4), etc.
# Note that zipped is a "zip object" and not an actual list. It can be iterated, but only once.
zipped = zip(A, B, C)

# Now let us create a list (array) of integers by summing each tuple, e.g., [6, 9, etc...]
%timeit result = [sum(row) for row in zip(A, B, C)]  # But... try this using zipped instead!


1.6 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Now let us try this with NumPy:

In [3]:
import numpy as np
Anp, Bnp, Cnp = np.array(A), np.array(B), np.array(C)

%timeit resultnp = Anp + Bnp + Cnp

16 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [9]:
1620.0 / 17.4

93.10344827586208

So it appears that NumPy is over 90 times faster than raw Python, although the exact ratio varies from run to run. However, the reality is more complicated than that. A good part of the time spent by the Python code goes into the zip operation that combines the three lists into a sequence of tuples, and the timing above includes that operation in the measurement. The NumPy code skips that operation entirely: it adds the arrays element by element in compiled code, without first creating a temporary zipped sequence (filled with tuples). The NumPy code is more "primitive" and dispenses with the overhead of creating a large number of Python tuple objects.
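The comparison can also be reproduced outside IPython with the standard timeit module. The following is a minimal sketch, not a rigorous benchmark: the helper names python_sum and numpy_sum and the repeat count are assumptions made for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

# Same data as in the cells above.
A, B, C = [1, 2, 3, 4, 5] * 1000, [2, 3, 4, 5, 6] * 1000, [3, 4, 5, 6, 7] * 1000
Anp, Bnp, Cnp = np.array(A), np.array(B), np.array(C)


def python_sum():
    # A fresh zip object is built on every call, as in the %timeit cell.
    return [sum(row) for row in zip(A, B, C)]


def numpy_sum():
    # Element-wise addition performed by NumPy's compiled loops.
    return Anp + Bnp + Cnp


# Both approaches must agree before a timing comparison means anything.
assert python_sum() == numpy_sum().tolist()

runs = 200  # arbitrary repeat count, chosen only to keep the demo quick
t_python = timeit.timeit(python_sum, number=runs) / runs
t_numpy = timeit.timeit(numpy_sum, number=runs) / runs
print(f"pure Python: {t_python * 1e6:8.1f} us per call")
print(f"NumPy      : {t_numpy * 1e6:8.1f} us per call")
```

Checking that both versions return the same values first guards against timing two computations that are not actually equivalent.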

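It is tempting to improve the Python code by computing the zipped sequence once and timing only the list comprehension that sums the tuples. Doing this with zipped reports a time considerably lower than the NumPy version, but that figure is misleading. Because zipped is a one-shot iterator, only the first pass of the timing loop sees any data; every later pass iterates over an exhausted iterator and builds an empty list. Materializing the tuples first with list(zip(A, B, C)) gives a fair measurement of the summing step alone. Either way, a real program that does these things just once still has to build the tuples, so the NumPy version would likely be better again.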
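One detail matters when reusing zipped: a zip object is a one-shot iterator. The following minimal sketch (with shortened lists, an assumption made to keep the output readable) shows that a second pass over the same zip object yields nothing, which is what happens on every repetition after the first inside a timing loop.

```python
A, B, C = [1, 2, 3], [2, 3, 4], [3, 4, 5]

zipped = zip(A, B, C)
first_pass = [sum(row) for row in zipped]
second_pass = [sum(row) for row in zipped]  # zipped is already exhausted here

print(first_pass)   # [6, 9, 12]
print(second_pass)  # []

# Materializing the tuples once gives a sequence that can be reused safely.
zipped_list = list(zip(A, B, C))
reused_once = [sum(row) for row in zipped_list]
reused_twice = [sum(row) for row in zipped_list]
print(reused_once == reused_twice)  # True
```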

Let's look at some other things NumPy can do...

In [12]:
m1 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print("Dimensions  : {}\nShape       : {}\nElement Type: {}\nSize        : {}\nItem Size   : {}".format(
     m1.ndim, m1.shape, m1.dtype, m1.size, m1.itemsize))
m1

Dimensions  : 2
Shape       : (2, 5)
Element Type: int32
Size        : 10
Item Size   : 4


array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

The above shows a basic two-dimensional array of integers, along with some of the attributes all arrays support. Since arrays in NumPy must have elements that are all the same type, what happens when we try to create an array with heterogeneous types? First, mixing integers and floating-point values does the expected thing:

In [14]:
m2 = np.array([[1.0, 2.0, 3.0, 4.0, 5.0], [6, 7, 8, 9, 10]])
print("Dimensions  : {}\nShape       : {}\nElement Type: {}\nSize        : {}\nItem Size   : {}".format(
     m2.ndim, m2.shape, m2.dtype, m2.size, m2.itemsize))
m2

Dimensions  : 2
Shape       : (2, 5)
Element Type: float64
Size        : 10
Item Size   : 8


array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.]])

As you can see, the values were all "up converted" to floats. Since the same attributes will be printed again below, let's make a helper function for printing array information.

In [16]:
def print_array_info(m):
    print("Dimensions  : {}\nShape       : {}\nElement Type: {}\nSize        : {}\nItem Size   : {}".format(
         m.ndim, m.shape, m.dtype, m.size, m.itemsize))
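
The conversion of m2 to floats is an instance of NumPy's type promotion: when the inputs have different numeric types, the result uses a type that can represent all of them. As a small sketch, np.result_type reports the type NumPy would pick for a given combination (the variable name mixed is only illustrative).

```python
import numpy as np

# A single float among integers is enough to make the whole array float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# result_type answers the same question without building an array.
print(np.result_type(np.int32, np.float64))  # float64
print(np.result_type(np.int64, np.float64))  # float64
```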


Now let's try mixing a more "interesting" combination of types.

In [19]:
m3 = np.array([[1, 2, "Hello"], [3.14, 2.78, "World"]])
print_array_info(m3)
m3

Dimensions  : 2
Shape       : (2, 3)
Element Type: <U32
Size        : 6
Item Size   : 128


array([['1', '2', 'Hello'],
       ['3.14', '2.78', 'World']], dtype='<U32')
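
The dtype and item size reported above can be unpacked with a short sketch. It assumes nothing beyond NumPy itself; the variable names are illustrative. A dtype of the form U followed by a number is a fixed-width Unicode string with that many characters, and NumPy stores each character in 4 bytes.

```python
import numpy as np

dt = np.dtype("U32")  # a Unicode string of at most 32 characters
print(dt.itemsize)    # 128, i.e. 32 characters * 4 bytes each

# The width is chosen to fit the longest element in the array.
words = np.array(["a", "bcd"])
print(words.dtype == np.dtype("U3"))  # True

# Numbers mixed with strings are stored in their text form.
mixed = np.array([1, "Hello"])
print(repr(mixed[0]))    # the string '1', no longer an integer
print(mixed.dtype.kind)  # U
```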

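Everything is converted to a string as a kind of "universal" data type. The dtype <U32 denotes a little-endian Unicode string of at most 32 characters. NumPy stores each character in 4 bytes (UTF-32), which is why the item size is 32 * 4 = 128 bytes. It gets even more interesting when you try to include more complex object types in the arrays: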

In [20]:
m4 = np.array([[1, 2, (3, 4)], [(5, 6), 7, 8]])

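Older NumPy releases respond to this cell with a deprecation warning about creating an array from ragged nested sequences, and newer releases raise an error instead. Requesting the object dtype explicitly makes the intent clear: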


In [22]:
m4 = np.array([[1, 2, (3, 4)], [(5, 6), 7, 8]], dtype='object')
print_array_info(m4)
m4

Dimensions  : 2
Shape       : (2, 3)
Element Type: object
Size        : 6
Item Size   : 8


array([[1, 2, (3, 4)],
       [(5, 6), 7, 8]], dtype=object)
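
To see what such an array actually holds, the following sketch inspects individual elements. Each slot of an object array stores a reference to an ordinary Python object, so the elements keep their original Python types.

```python
import numpy as np

m4 = np.array([[1, 2, (3, 4)], [(5, 6), 7, 8]], dtype=object)

print(m4.shape)        # (2, 3)
print(type(m4[0, 0]))  # <class 'int'>
print(type(m4[0, 2]))  # <class 'tuple'>
print(m4[1, 0])        # (5, 6)
```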

Note: *This is an atypical way to use NumPy!* The purpose of NumPy is to process arrays of *numbers*. However, this illustrates that NumPy insists every element of its arrays has the same type, "up converting" everything to "object" if it has to.

What happens if the nested sequences aren't ragged, that is, if every position holds a tuple and all the tuples have the same length?

In [24]:
m5 = np.array([[(1, 2), (3, 4), (5, 6)], [(7, 8), (9, 10), (11, 12)]])
print_array_info(m5)
m5

Dimensions  : 3
Shape       : (2, 3, 2)
Element Type: int32
Size        : 12
Item Size   : 4


array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]])

Here NumPy actually creates a 3-dimensional array. The first dimension (size 2) comes from the outer list, the second dimension (size 3) comes from the number of tuples in each inner list, and the third dimension (size 2) comes from the tuples themselves, which all have the same length.
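
The way the three dimensions map onto the nesting can be checked by indexing one level at a time, as in this small sketch using the same m5:

```python
import numpy as np

m5 = np.array([[(1, 2), (3, 4), (5, 6)], [(7, 8), (9, 10), (11, 12)]])

print(m5.shape)     # (2, 3, 2)
print(m5[1])        # the second inner list, as a 3 x 2 array
print(m5[1, 2])     # the third tuple of that list: 11 and 12
print(m5[1, 2, 0])  # the first element of that tuple: 11
```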

It is possible to create a large array either initialized to zeros (or ones) or uninitialized. This is useful when you know the size, but not yet the values that need to be put into the array. Although arrays can be resized dynamically, doing so is slow, so it's best to avoid that when possible.

In [26]:
m6 = np.zeros((100, 100), dtype=complex)   # The first argument is the shape: here 2-dimensional, 100 x 100 (= 10,000) elements.
print_array_info(m6)
m6

Dimensions  : 2
Shape       : (100, 100)
Element Type: complex128
Size        : 10000
Item Size   : 16


array([[0.+0.j, 0.+0.j, 0.+0.j, ..., 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, ..., 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, ..., 0.+0.j, 0.+0.j, 0.+0.j],
       ...,
       [0.+0.j, 0.+0.j, 0.+0.j, ..., 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, ..., 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, ..., 0.+0.j, 0.+0.j, 0.+0.j]])
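
The other constructors mentioned above work the same way. The sketch below uses np.ones and np.empty and then shows the preallocate-and-fill pattern; the array names and the values written into them are made up for illustration.

```python
import numpy as np

ones = np.ones((2, 3))      # every element is 1.0 (float64 by default)
scratch = np.empty((2, 3))  # contents are arbitrary until assigned
scratch[:] = 7.5            # fill in place once the values are known

# Preallocate the full array, then fill it row by row instead of growing it.
powers = np.zeros((4, 3), dtype=np.int64)
for i in range(4):
    powers[i] = [i, i * i, i * i * i]
print(powers)
```

Note that np.empty only reserves memory, so its contents should never be read before they have been assigned.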

Math operations are done element-wise, including multiplication. The '@' operator can be used to do matrix multiplication.

In [46]:
a = np.array([[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9]], dtype=float)
b = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19]], dtype=float)
print("\na + b =")
print(a + b)
print("\n2 * a =")
print(2 * a)
print("\na * b =")
print(a * b)
print("\na @ b =")
print(a @ b)


a + b =
[[12. 14. 16.]
 [18. 20. 22.]
 [24. 26. 28.]]

2 * a =
[[ 2.  4.  6.]
 [ 8. 10. 12.]
 [14. 16. 18.]]

a * b =
[[ 11.  24.  39.]
 [ 56.  75.  96.]
 [119. 144. 171.]]

a @ b =
[[ 90.  96. 102.]
 [216. 231. 246.]
 [342. 366. 390.]]
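
The difference between '*' and '@' can be verified by hand for a single entry. This sketch recomputes entry (0, 1) of both results from the same a and b (the name manual is only illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19]], dtype=float)

# Element-wise: each entry is the product of the two matching entries.
print((a * b)[0, 1] == a[0, 1] * b[0, 1])  # True, 2 * 12 = 24

# Matrix product: entry (0, 1) is row 0 of a dotted with column 1 of b.
manual = sum(a[0, k] * b[k, 1] for k in range(3))  # 1*12 + 2*15 + 3*18 = 96
print((a @ b)[0, 1] == manual)  # True

# The '@' operator is shorthand for np.matmul.
print(np.array_equal(a @ b, np.matmul(a, b)))  # True
```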


Let us now create an array filled with random values in the half-open range [0.0, 1.0) and access various parts of it.

In [53]:
generator = np.random.default_rng(None)
values = generator.random((5, 5))
print_array_info(values)
values

Dimensions  : 2
Shape       : (5, 5)
Element Type: float64
Size        : 25
Item Size   : 8


array([[0.73670193, 0.16771559, 0.55766573, 0.6516379 , 0.62271542],
       [0.44877273, 0.95539251, 0.49400179, 0.04968141, 0.47193075],
       [0.60575024, 0.52599709, 0.42231291, 0.27625504, 0.52952039],
       [0.75769367, 0.0702347 , 0.92410521, 0.59823278, 0.54465728],
       [0.87362086, 0.8091215 , 0.81580954, 0.15806203, 0.67271339]])
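
Because the generator above is created with None as its seed, it is seeded from fresh system entropy and the values differ on every run. If reproducible values are wanted, an integer seed can be passed instead. A minimal sketch, where the seed value is an arbitrary choice:

```python
import numpy as np

seed = 12345  # arbitrary value chosen for this sketch
first = np.random.default_rng(seed).random((5, 5))
second = np.random.default_rng(seed).random((5, 5))

# The same seed produces the same sequence of values.
print(np.array_equal(first, second))  # True

# All values lie in the half-open range [0.0, 1.0).
print(bool(((first >= 0.0) & (first < 1.0)).all()))  # True
```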

In [54]:
values[0]

array([0.73670193, 0.16771559, 0.55766573, 0.6516379 , 0.62271542])

In [55]:
values[0:3]

array([[0.73670193, 0.16771559, 0.55766573, 0.6516379 , 0.62271542],
       [0.44877273, 0.95539251, 0.49400179, 0.04968141, 0.47193075],
       [0.60575024, 0.52599709, 0.42231291, 0.27625504, 0.52952039]])

In [56]:
values[0:4:2]

array([[0.73670193, 0.16771559, 0.55766573, 0.6516379 , 0.62271542],
       [0.60575024, 0.52599709, 0.42231291, 0.27625504, 0.52952039]])

In [57]:
values[-1]

array([0.87362086, 0.8091215 , 0.81580954, 0.15806203, 0.67271339])

In [58]:
values[-1:-3:-1]

array([[0.87362086, 0.8091215 , 0.81580954, 0.15806203, 0.67271339],
       [0.75769367, 0.0702347 , 0.92410521, 0.59823278, 0.54465728]])

In [59]:
values[:, 0:2]

array([[0.73670193, 0.16771559],
       [0.44877273, 0.95539251],
       [0.60575024, 0.52599709],
       [0.75769367, 0.0702347 ],
       [0.87362086, 0.8091215 ]])

In [60]:
values[:, -1]

array([0.62271542, 0.47193075, 0.52952039, 0.54465728, 0.67271339])

In [61]:
values[:, -1:-3:-1]

array([[0.62271542, 0.6516379 ],
       [0.47193075, 0.04968141],
       [0.52952039, 0.27625504],
       [0.54465728, 0.59823278],
       [0.67271339, 0.15806203]])

In [62]:
values[1:3, 2:]

array([[0.49400179, 0.04968141, 0.47193075],
       [0.42231291, 0.27625504, 0.52952039]])
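
The same indexing expressions are easier to follow on an array whose contents are predictable. The sketch below repeats them on a 5 x 5 grid built with np.arange (the name grid is an assumption made for illustration), so that each value reveals the row and column it came from.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)  # row r holds the values 5*r .. 5*r + 4

print(grid[0])             # first row
print(grid[0:4:2])         # rows 0 and 2 (start 0, stop before 4, step 2)
print(grid[-1:-3:-1])      # rows 4 and 3, in that order
print(grid[:, 0:2])        # all rows, columns 0 and 1
print(grid[:, -1])         # last column, returned as a 1-dimensional array
print(grid[:, -1:-3:-1])   # columns 4 and 3, in that order
print(grid[1:3, 2:])       # rows 1 and 2, columns 2 through 4
```

The slice before the comma selects rows and the slice after it selects columns. A plain integer index such as -1 drops that dimension, while a slice keeps it.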