# Introduction to NumPy

NumPy is a package for scientific computing with Python. It provides a powerful N-dimensional array object called ndarray (or NumPy array) and routines to manipulate it. Compared with built-in Python lists, these arrays provide much more efficient storage and data operations, and the advantage increases as the arrays grow larger.

## 1. The Basics of NumPy Arrays

In [1]:
import numpy as np

In [2]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x1

array([9, 6, 4, 3, 4, 7])

In [3]:
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x2

array([[6, 8, 6, 9],
       [0, 4, 2, 4],
       [4, 1, 0, 3]])

In [4]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
x3

array([[[6, 5, 1, 6, 9],
        [9, 0, 1, 1, 0],
        [4, 0, 7, 0, 1],
        [5, 5, 7, 4, 3]],

       [[4, 6, 9, 0, 9],
        [9, 1, 2, 4, 9],
        [8, 5, 3, 4, 9],
        [1, 2, 1, 7, 3]],

       [[2, 9, 4, 2, 8],
        [3, 0, 3, 6, 1],
        [6, 6, 6, 7, 9],
        [2, 8, 1, 6, 1]]])

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

In [5]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


Another useful attribute is the dtype, the data type of the array:

In [6]:
print("dtype:", x3.dtype)

dtype: int32


Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:

In [7]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 4 bytes
nbytes: 240 bytes
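
As a quick sanity check, the size-related attributes are tied together: nbytes should equal itemsize times size. The sketch below fixes the dtype explicitly (an assumption made here so that the numbers do not depend on the platform's default integer type, which may be int32 or int64):

```python
import numpy as np

# Fix the dtype explicitly so the byte counts are platform-independent
arr = np.zeros((3, 4, 5), dtype=np.int32)

print("itemsize:", arr.itemsize)   # bytes per element
print("size:    ", arr.size)       # total number of elements
print("nbytes:  ", arr.nbytes)     # itemsize * size
```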


## Array Indexing: Accessing Single Elements

In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [8]:
x1

array([9, 6, 4, 3, 4, 7])

In [9]:
x1[0]

9

In [10]:
x1[4]

4

We can also index from the end of the array using negative indices:

In [11]:
x1[-1]

7

In [12]:
x1[-2]

4

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [13]:
x2

array([[6, 8, 6, 9],
       [0, 4, 2, 4],
       [4, 1, 0, 3]])

In [14]:
x2[0, 0]

6

In [15]:
x2[1, 2]

2

In [16]:
x2[2, -1]

3

Values can also be modified using any of the above index notation:

In [17]:
x2[0, 0] = 12
x2

array([[12,  8,  6,  9],
       [ 0,  4,  2,  4],
       [ 4,  1,  0,  3]])

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [18]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 6, 4, 3, 4, 7])
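
One way to avoid the truncation is to convert the array to a floating-point type before assigning. The following is a minimal sketch; the names `ints` and `floats` are illustrative and do not appear elsewhere in this notebook:

```python
import numpy as np

ints = np.array([9, 6, 4, 3, 4, 7])

# Assigning a float into an integer array drops the fractional part
ints[0] = 3.14159
print(ints[0])            # 3

# astype returns a new floating-point array and leaves `ints` unchanged,
# so the assigned value is preserved in the copy
floats = ints.astype(float)
floats[0] = 3.14159
print(floats[0])          # 3.14159
```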

## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

x[start:stop:step]

If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.

### One-dimensional subarrays

In [19]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [21]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [22]:
x[4:7]  # middle sub-array

array([4, 5, 6])

In [23]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [24]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:

In [25]:
x[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [26]:
x[5::-2] # reversed every other from index 5

array([5, 3, 1])

### Multi-dimensional subarrays

In [27]:
x2

array([[12,  8,  6,  9],
       [ 0,  4,  2,  4],
       [ 4,  1,  0,  3]])

In [28]:
x2[:2, :3]  # first two rows, first three columns

array([[12,  8,  6],
       [ 0,  4,  2]])

In [29]:
x2[:3, ::2]  # all rows, every other column

array([[12,  6],
       [ 0,  2],
       [ 4,  0]])

Finally, subarray dimensions can even be reversed together:

In [30]:
x2[::-1, ::-1]

array([[ 3,  0,  1,  4],
       [ 4,  2,  4,  0],
       [ 9,  6,  8, 12]])

In [31]:
x2[::-1] # reverse the rows

array([[ 4,  1,  0,  3],
       [ 0,  4,  2,  4],
       [12,  8,  6,  9]])

**Accessing array rows and columns**

One commonly needed routine is accessing of single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

In [32]:
x2[:, 0] # first column of x2

array([12,  0,  4])

In [33]:
x2[0, :] # first row of x2

array([12,  8,  6,  9])

In [34]:
x2[0] # equivalent to x2[0, :]

array([12,  8,  6,  9])

In [35]:
x2[:, 2]

array([6, 2, 0])

### Subarrays as no-copy views

One important and extremely useful thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies. Consider our two-dimensional array from before:

In [36]:
x2

array([[12,  8,  6,  9],
       [ 0,  4,  2,  4],
       [ 4,  1,  0,  3]])

Let's extract a $2 \times 2$ subarray from this:

In [37]:
x2_sub = x2[:2, :2]
x2_sub

array([[12,  8],
       [ 0,  4]])

Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [38]:
x2_sub[0, 0] = 99
x2_sub

array([[99,  8],
       [ 0,  4]])

In [39]:
x2

array([[99,  8,  6,  9],
       [ 0,  4,  2,  4],
       [ 4,  1,  0,  3]])

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
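
To check whether two arrays share the same underlying buffer, np.shares_memory can be used. The sketch below contrasts an array slice with a list slice; the variable names are illustrative only:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))
sub = arr[:2, :2]                  # slicing an array returns a view
print(np.shares_memory(arr, sub))  # True

sub[0, 0] = -1                     # writes through to arr
print(arr[0, 0])                   # -1

lst = list(range(12))
lst_sub = lst[:2]                  # slicing a list returns a copy
lst_sub[0] = -1                    # the original list is unaffected
print(lst[0])                      # 0
```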

### Creating copies of arrays

Despite the benefits of array views, it is sometimes useful to explicitly copy the data within an array or a subarray. This can be done with the copy() method:

In [40]:
x2_sub_copy = x2[:2, :2].copy()
x2_sub_copy

array([[99,  8],
       [ 0,  4]])

If we now modify this subarray, the original array is not touched:

In [41]:
x2_sub_copy[0, 0] = 1
x2_sub_copy

array([[1, 8],
       [0, 4]])

In [42]:
x2

array([[99,  8,  6,  9],
       [ 0,  4,  2,  4],
       [ 4,  1,  0,  3]])

### Reshaping of Arrays

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. 

For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [43]:
grid = np.arange(1, 10).reshape((3, 3))
grid

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Note that for this to work, the size of the initial array must match the size of the reshaped array. Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
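
Both points can be checked directly. The sketch below uses illustrative names (`flat`, `grid3`, `flipped`) and relies on np.shares_memory to tell a view from a copy:

```python
import numpy as np

flat = np.arange(1, 10)            # 9 elements

# 9 elements cannot fill a 2 x 5 grid, so reshape raises ValueError
try:
    flat.reshape((2, 5))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)                     # True

# For a contiguous array, the reshaped result is a view on the same data
grid3 = flat.reshape((3, 3))
print(np.shares_memory(flat, grid3))       # True

# Flattening a non-contiguous array (here, a transpose) forces a copy
flipped = grid3.T.reshape(9)
print(np.shares_memory(grid3, flipped))    # False
```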

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or more concisely by using the np.newaxis keyword within a slice operation.


In [44]:
x = np.array([1, 2, 3])
print(x)
# reshape this into a two-dimensional row vector
x.reshape((1, 3))

[1 2 3]


array([[1, 2, 3]])

In [45]:
x[np.newaxis, :]

array([[1, 2, 3]])

In [46]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [47]:
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

### Array Concatenation and Splitting

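All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one and, conversely, to split a single array into multiple arrays. We'll take a look at those operations here. Concatenation, or joining of two arrays, is primarily accomplished with the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument: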

In [48]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:


In [49]:
z = np.array([99, 99, 99])
np.concatenate([x, y, z])

array([ 1,  2,  3,  3,  2,  1, 99, 99, 99])

np.concatenate can also be used for two-dimensional arrays:

In [50]:

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [51]:
# concatenate along the first axis

np.concatenate([grid, grid], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [52]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

If you are working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

In [53]:
x = np.array([1, 2, 3])

grid = np.array([[9, 8, 7],
                 [6, 5, 64]])

np.vstack([x, grid])

array([[ 1,  2,  3],
       [ 9,  8,  7],
       [ 6,  5, 64]])

In [54]:
y = np.array([[99],
              [99]])

np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5, 64, 99]])
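
To see why the stacking functions are convenient for mixed dimensions, it may help to compare them with np.concatenate directly. This is a small sketch with illustrative names (`row`, `block`); np.concatenate is expected to reject inputs whose number of dimensions differs:

```python
import numpy as np

row = np.array([1, 2, 3])
block = np.array([[9, 8, 7],
                  [6, 5, 4]])

# np.concatenate requires inputs with the same number of dimensions
try:
    np.concatenate([row, block])
    mixed_ok = True
except ValueError:
    mixed_ok = False
print(mixed_ok)                                   # False

# np.vstack treats the 1-D input as a single row, which matches
# adding the leading axis by hand before concatenating
stacked = np.vstack([row, block])
by_hand = np.concatenate([row[np.newaxis, :], block], axis=0)
print(np.array_equal(stacked, by_hand))           # True
```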

### Splitting of arrays


The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:

In [55]:
x = [1, 2, 3, 99, 99,  3, 2, 1]

# we can pass a list of indices giving the split points:
x1, x2, x3 = np.split(x, [3, 5])
x1, x2, x3

(array([1, 2, 3]), array([99, 99]), array([3, 2, 1]))

Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:

In [56]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [57]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print()
print(lower)

[[0 1 2 3]
 [4 5 6 7]]

[[ 8  9 10 11]
 [12 13 14 15]]


In [58]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
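
Since splitting is the opposite of concatenation, joining the pieces back together should recover the original array. The sketch below checks this, and also shows that np.split accepts an integer (the number of equal-sized pieces) in place of a list of split points:

```python
import numpy as np

data = np.array([1, 2, 3, 99, 99, 3, 2, 1])

# Two split points give three pieces
pieces = np.split(data, [3, 5])
print(len(pieces))                        # 3

# Concatenating the pieces in order restores the original array
restored = np.concatenate(pieces)
print(np.array_equal(restored, data))     # True

# An integer asks for that many equal-sized pieces
halves = np.split(data, 2)
print([h.tolist() for h in halves])       # [[1, 2, 3, 99], [99, 3, 2, 1]]
```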


# Computation on NumPy Arrays: Universal Functions

Up until now, we have been discussing some of the basic nuts and bolts of NumPy; in the next few sections, we will dive into the reasons that NumPy is so important in the Python data science world. Namely, it provides an easy and flexible interface to optimized computation with arrays of data.

Computation on NumPy arrays can be very fast, or it can be very slow. The key to making it fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs). This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient. It then introduces many of the most common and useful arithmetic ufuncs available in the NumPy package.

In [59]:
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

The Python loop is slow, as benchmarking it on a large input with IPython's %timeit magic shows:

In [60]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

1.45 s ± 40.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


It takes more than a second to compute these million operations and to store the result!

##  Introducing UFuncs

For many types of operations, NumPy provides a convenient interface to statically typed, compiled routines. This is known as a vectorized operation: you simply perform an operation on the array, and it is applied to each element. The vectorized approach pushes the loop into the compiled layer that underlies NumPy, leading to much faster execution.

Compare the results of the following two:

In [61]:
print(compute_reciprocals(values))
print(1.0 / values)

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


Looking at the execution time for our big array, we see that it completes orders of magnitude faster than the Python loop:

In [62]:
%timeit (1.0 / big_array)

3.75 ms ± 227 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
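
The %timeit magic is specific to IPython and Jupyter. In a plain Python script, a similar comparison can be sketched with the standard-library timeit module. The array size and the single repetition used below are arbitrary choices, and the measured times will vary from machine to machine:

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    # Explicit Python loop over the elements
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.RandomState(0)
big = rng.randint(1, 100, size=100_000)

# Time one call of each version
loop_time = timeit.timeit(lambda: compute_reciprocals(big), number=1)
ufunc_time = timeit.timeit(lambda: 1.0 / big, number=1)

print(f"loop:  {loop_time:.4f} s")
print(f"ufunc: {ufunc_time:.4f} s")

# Both versions compute the same values
print(np.allclose(compute_reciprocals(big), 1.0 / big))
```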


Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. 

Ufuncs are extremely flexible. Earlier we saw an operation between a scalar and an array, but we can also operate between two arrays:

In [63]:
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [64]:
x = np.arange(9).reshape((3, 3))
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]], dtype=int32)

## Array arithmetic

NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators. The standard addition, subtraction, multiplication, and division can all be used:


In [65]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


There is also a unary ufunc for negation, a ** operator for exponentiation, and a % operator for modulus:

In [66]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In addition, these can be strung together however you wish, and the standard order of operations is respected:

In [67]:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])

Each of these arithmetic operations is simply a convenient wrapper around a specific function built into NumPy; for example, the + operator is a wrapper for the add function:

In [68]:
np.add(x, 2)

array([2, 3, 4, 5])
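
The same correspondence holds for the other operators shown above. The following sketch pairs each operator with the NumPy ufunc it wraps and checks that both give identical results (the dictionary `pairs` is only a device for this illustration):

```python
import numpy as np

v = np.arange(4)

# operator expression  vs.  equivalent ufunc call
pairs = {
    "+":       (v + 5,  np.add(v, 5)),
    "-":       (v - 5,  np.subtract(v, 5)),
    "unary -": (-v,     np.negative(v)),
    "*":       (v * 2,  np.multiply(v, 2)),
    "/":       (v / 2,  np.divide(v, 2)),
    "//":      (v // 2, np.floor_divide(v, 2)),
    "**":      (v ** 2, np.power(v, 2)),
    "%":       (v % 2,  np.mod(v, 2)),
}

for op, (via_operator, via_ufunc) in pairs.items():
    print(op, np.array_equal(via_operator, via_ufunc))   # True for every row
```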

## Absolute value

Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function, abs. The corresponding NumPy ufunc is np.absolute, which is also available under the alias np.abs:

In [69]:
x = np.array([-2, -1, 0, 1, 2])
np.absolute(x)

array([2, 1, 0, 1, 2])

In [70]:
np.abs(x)

array([2, 1, 0, 1, 2])

###  Exponents and logarithms

Another common type of operation available as NumPy ufuncs is exponentiation:

In [71]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]


In [72]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))


x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


### Specifying output

For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored. Rather than creating a temporary array, this can be used to write computation results directly to the memory location where you'd like them to be. For all ufuncs, this can be done using the out argument of the function:

In [73]:
x = np.arange(5)
y = np.empty(5)

np.multiply(x, 10, out=y)
print(y)

[ 0. 10. 20. 30. 40.]
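
The out argument can also be an array view, which makes it possible to write results into part of a larger array. A minimal sketch, with illustrative names `src` and `dest`:

```python
import numpy as np

src = np.arange(5)
dest = np.zeros(10)

# Write 2 ** src into every other element of dest, in place
np.power(2, src, out=dest[::2])
print(dest.tolist())
# [1.0, 0.0, 2.0, 0.0, 4.0, 0.0, 8.0, 0.0, 16.0, 0.0]
```

Writing `dest[::2] = 2 ** src` would give the same values, but it would first create a temporary array for `2 ** src` and then copy it into `dest`.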


### Aggregates

For binary ufuncs, there are some interesting aggregates that can be computed directly from the object. 

For example, if we'd like to reduce an array with a particular operation, we can use the reduce method of any ufunc.

A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.

For example, calling reduce on the add ufunc returns the sum of all elements in the array:

In [74]:
x = np.arange(1, 6)
np.add.reduce(x)

15

In [75]:
np.multiply.reduce(x)

120

If we'd like to store all the intermediate results of the computation, we can instead use accumulate:

In [76]:
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15], dtype=int32)
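
For these particular cases NumPy also offers dedicated functions (np.sum, np.prod, np.cumsum, np.cumprod) that compute the same results. The sketch below checks the correspondence:

```python
import numpy as np

seq = np.arange(1, 6)

# reduce collapses the array to a single value
print(np.add.reduce(seq) == np.sum(seq))                          # True
print(np.multiply.reduce(seq) == np.prod(seq))                    # True

# accumulate keeps every intermediate result
print(np.array_equal(np.add.accumulate(seq), np.cumsum(seq)))     # True
print(np.multiply.accumulate(seq).tolist())                       # [1, 2, 6, 24, 120]
```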

### Aggregations: Min, Max, and Everything In Between

Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question. Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the "typical" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).

NumPy has fast built-in aggregation functions for working on arrays; we'll discuss and demonstrate some of them here.

### Summing the Values in an Array

In [77]:
L = np.random.random(100)
np.sum(L)

50.46175845319564

### Minimum and Maximum

In [78]:
big_array = np.random.rand(1000000)

In [79]:
min(big_array), max(big_array)

(7.071203171893359e-07, 0.9999997207656334)

NumPy's corresponding functions have similar syntax and operate much more quickly:

In [80]:
np.min(big_array), np.max(big_array)

(7.071203171893359e-07, 0.9999997207656334)

In [81]:
%timeit min(big_array)
%timeit np.min(big_array)

55.2 ms ± 8.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
308 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:

In [82]:
print(big_array.min(), big_array.max(), big_array.sum())

7.071203171893359e-07 0.9999997207656334 500216.8034810001


### Multi dimensional aggregates

One common type of aggregation operation is an aggregate along a row or column. Say you have some data stored in a two-dimensional array:



In [83]:
M = np.random.random((3, 4))
print(M)

[[0.79832448 0.44923861 0.95274259 0.03193135]
 [0.18441813 0.71417358 0.76371195 0.11957117]
 [0.37578601 0.11936151 0.37497044 0.22944653]]


Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. For example, we can find the sum of each column by specifying axis=0:

In [84]:
# The function returns four values, corresponding to the four
# columns of numbers.

M.sum(axis=0) 

array([1.35852862, 1.2827737 , 2.09142498, 0.38094904])

In [85]:
M.sum(axis=1)

array([2.23223702, 1.78187484, 1.09956449])

Similarly, we can find the maximum value within each row:

In [86]:
M.max(axis=1)

array([0.95274259, 0.76371195, 0.37578601])
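
The axis keyword names the dimension that is collapsed, not the one that is returned, which can be confusing at first. The sketch below uses a small array with fixed values (chosen here for illustration) so the results are easy to verify by hand:

```python
import numpy as np

table = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])   # shape (3, 4)

col_sums = table.sum(axis=0)   # collapses the 3 rows: one value per column
row_sums = table.sum(axis=1)   # collapses the 4 columns: one value per row

print(col_sums.shape, col_sums.tolist())   # (4,) [15, 18, 21, 24]
print(row_sums.shape, row_sums.tolist())   # (3,) [10, 26, 42]
```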

# Computation on Arrays: Broadcasting

Recall that for arrays of the same size, binary operations are performed on an element-by-element basis:

In [87]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

Broadcasting allows these types of binary operations to be performed on arrays of different sizes. For example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array:

In [88]:
a + 5

array([5, 6, 7])

We can think of this as an operation that stretches or duplicates the value 5 into the array [5, 5, 5], and adds the results. The advantage of NumPy's broadcasting is that this duplication of values does not actually take place, but it is a useful mental model as we think about broadcasting.

We can similarly extend this to arrays of higher dimension. Observe the result when we add a one-dimensional array to a two-dimensional array:

In [89]:
M = np.ones((3, 3))
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

When we add a to M, the one-dimensional array a is stretched, or broadcast, across the second dimension in order to match the shape of M:

In [90]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

While these examples are relatively easy to understand, more complicated cases can involve broadcasting of both arrays. Consider the following example:

In [91]:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]

print(a)
print(b)

[0 1 2]
[[0]
 [1]
 [2]]


In [92]:
a + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

Just as before we stretched or broadcast one value to match the shape of the other, here we've stretched both a and b to match a common shape, and the result is a two-dimensional array!
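
The stretching can be made visible with np.broadcast_to, which returns a read-only view of an array expanded to a given shape. This sketch only illustrates the mental model; NumPy does not need to materialize the stretched arrays to compute a + b:

```python
import numpy as np

a = np.arange(3)                    # shape (3,)
b = np.arange(3)[:, np.newaxis]     # shape (3, 1)

# What each operand looks like once stretched to the common shape (3, 3)
a_stretched = np.broadcast_to(a, (3, 3))
b_stretched = np.broadcast_to(b, (3, 3))
print(a_stretched.tolist())   # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(b_stretched.tolist())   # [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

# Adding the stretched arrays gives the same result as a + b
print(np.array_equal(a_stretched + b_stretched, a + b))   # True
```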

# Comparisons, Masks, and Boolean Logic

This section covers the use of Boolean masks to examine and manipulate values within NumPy arrays. 

Masking comes up when you want to extract, modify, count, or otherwise manipulate values in an array based on some criterion: for example, you might wish to count all values greater than a certain value, or perhaps remove all outliers that are above some threshold. In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.

In [93]:
x = np.array([1, 2, 3, 4, 5])

In [94]:
x < 3

array([ True,  True, False, False, False])

The result of these comparison operators is always an array with a Boolean data type. All six of the standard comparison operations are available.

In [95]:
x > 3

array([False, False, False,  True,  True])

In [96]:
x <= 3

array([ True,  True,  True, False, False])

In [97]:
x != 3

array([ True,  True, False,  True,  True])

In [98]:
x == 3

array([False, False,  True, False, False])

It is also possible to do an element-wise comparison of two arrays, and to include compound expressions:

In [99]:
(2 * x) == (x ** 2)

array([False,  True, False, False, False])

These comparisons work on arrays of any size and shape. Here is a two-dimensional example:

In [100]:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

To count the number of True entries in a Boolean array, np.count_nonzero is useful:

In [101]:
np.count_nonzero(x < 6)

8

If we're interested in quickly checking whether any or all the values are true, we can use (you guessed it) np.any or np.all:

In [102]:
np.any(x > 8)

True

In [103]:
np.all(x < 10)

True
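
Because True is treated as 1 and False as 0, np.sum can also count matches, and like other aggregates it accepts an axis argument. The sketch below repeats the values of x in a separate array (named `vals` here) so that it runs on its own:

```python
import numpy as np

vals = np.array([[5, 0, 3, 3],
                 [7, 9, 3, 5],
                 [2, 4, 7, 6]])

# Summing a Boolean mask counts the True entries
print(np.sum(vals < 6))                      # 8

# How many values less than 6 are in each row?
print(np.sum(vals < 6, axis=1).tolist())     # [4, 2, 2]

# Are all values in each row less than 8?
print(np.all(vals < 8, axis=1).tolist())     # [True, False, True]
```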

## Boolean Arrays as Masks

A more powerful pattern is to use Boolean arrays as masks, to select particular subsets of the data themselves. Returning to our x array from before, suppose we want an array of all values in the array that are less than, say, 5:

In [104]:
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

We can obtain a Boolean array for this condition easily, as we've already seen:

In [105]:
x < 5

array([[False,  True,  True,  True],
       [False, False,  True, False],
       [ True,  True, False, False]])

Now to select these values from the array, we can simply index on this Boolean array; this is known as a masking operation:

In [106]:
x[x < 5]

array([0, 3, 3, 3, 2, 4])
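
Masks can be combined before indexing. The sketch below uses Python's bitwise operators & (and), | (or), and ~ (not), which NumPy applies element-wise to Boolean arrays. These operators are not introduced elsewhere in this section, and the parentheses are needed because they bind more tightly than comparisons:

```python
import numpy as np

vals = np.array([[5, 0, 3, 3],
                 [7, 9, 3, 5],
                 [2, 4, 7, 6]])

# Values strictly between 2 and 7
print(vals[(vals > 2) & (vals < 7)].tolist())    # [5, 3, 3, 3, 5, 4, 6]

# Values that are NOT less than 5
print(vals[~(vals < 5)].tolist())                # [5, 7, 9, 5, 7, 6]

# Values at either extreme
print(vals[(vals < 1) | (vals > 8)].tolist())    # [0, 9]
```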

# Fancy Indexing

In this section, we'll look at another style of array indexing, known as fancy indexing. Fancy indexing is like the simple indexing we've already seen, but we pass arrays of indices in place of single scalars. This allows us to very quickly access and modify complicated subsets of an array's values.

In [107]:
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)

[51 92 14 71 60 20 82 86 74 74]


Suppose we want to access three different elements. We could do it like this:

In [108]:
[x[3], x[7], x[2]]

[71, 86, 14]

Alternatively, we can pass a single list or array of indices to obtain the same result:

In [109]:
ind = [3, 7, 2]
x[ind]

array([71, 86, 14])

Fancy indexing also works in multiple dimensions. Consider the following array:

In [110]:
X = np.arange(12).reshape((3, 4))
X

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

As with standard indexing, the first index refers to the row, and the second to the column:

In [111]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

array([ 2,  5, 11])
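
The index arrays are paired up following the broadcasting rules described earlier. As a sketch of what that implies, turning the row indices into a column vector makes every row index combine with every column index, which yields a two-dimensional result:

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Same-shape index arrays are paired element by element
print(X[row, col].tolist())                 # [2, 5, 11]

# A (3, 1) row index broadcast against a (3,) column index
# selects the chosen columns from every chosen row
picked = X[row[:, np.newaxis], col]
print(picked.tolist())                      # [[2, 1, 3], [6, 5, 7], [10, 9, 11]]
```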

### Combined Indexing

For even more powerful operations, fancy indexing can be combined with the other indexing schemes we've seen:

In [112]:
print(X)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [113]:
X[2, [2, 0, 1]] # row with index 2, columns with index 2, 0 and 1

array([10,  8,  9])

We can also combine fancy indexing with slicing:



In [114]:
X[1:, [2, 0, 1]]

array([[ 6,  4,  5],
       [10,  8,  9]])

### Modifying Values with Fancy Indexing

Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array. For example, imagine we have an array of indices and we'd like to set the corresponding items in an array to some value:

In [115]:
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)

[ 0 99 99  3 99  5  6  7 99  9]


We can use any assignment-type operator for this. For example:

In [116]:
x[i] -= 10
print(x)

[ 0 89 89  3 89  5  6  7 89  9]
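
One caveat worth knowing: when the index array contains repeated indices, an in-place operation such as += is applied only once per position, not once per occurrence. The sketch below shows this and uses the at method of the np.add ufunc for the case where repeated application is wanted (the names `counts` and `idx` are illustrative):

```python
import numpy as np

idx = [2, 3, 3, 4, 4, 4]

# With +=, each selected position is incremented only once
counts = np.zeros(10)
counts[idx] += 1
print(counts.tolist())
# [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# np.add.at applies the addition once for every occurrence of an index
tallies = np.zeros(10)
np.add.at(tallies, idx, 1)
print(tallies.tolist())
# [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```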


# Sorting Arrays

This section covers algorithms related to sorting values in NumPy arrays.

By default, np.sort uses an $\mathcal{O}[N\log N]$ quicksort algorithm, though mergesort and heapsort are also available. For most applications, the default quicksort is more than sufficient.

To return a sorted version of the array without modifying the input, you can use np.sort:

In [117]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

A related function is argsort, which instead returns the indices of the sorted elements:

In [118]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

[1 0 3 2 4]
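
These indices can then be used, via fancy indexing, to build the sorted array or to reorder a second array in the same way. In the sketch below, the `labels` array is a made-up companion to the values:

```python
import numpy as np

values = np.array([2, 1, 4, 3, 5])
labels = np.array(['b', 'a', 'd', 'c', 'e'])   # hypothetical companion data

order = np.argsort(values)

# Indexing with the argsort result gives the sorted values
print(values[order].tolist())    # [1, 2, 3, 4, 5]

# The same ordering can be applied to a related array
print(labels[order].tolist())    # ['a', 'b', 'c', 'd', 'e']
```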


### Sorting along rows or columns

A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the axis argument. For example:

In [119]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]


In [120]:
# sort each column of X
np.sort(X, axis=0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [121]:
# sort each row of X
np.sort(X, axis=1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

# Structured Data: NumPy's Structured Arrays

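This section demonstrates the use of NumPy's structured arrays and record arrays, which provide efficient storage for compound, heterogeneous data. Imagine that we have several categories of data on a number of people (say, name, age, and weight). One option is to store them in three separate lists: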

In [122]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

But this is a bit clumsy. There's nothing here that tells us that the three lists are related; it would be more natural if we could use a single structure to store all of this data. NumPy can handle this through structured arrays, which are arrays with compound data types.

In [123]:
# Recall that previously we created a simple array using an expression like this:
x = np.zeros(4, dtype=int)

We can similarly create a structured array using a compound data type specification:

In [124]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


Now that we've created an empty container array, we can fill the array with our lists of values:

In [125]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


The handy thing with structured arrays is that you can now refer to values either by index or by name:

In [126]:
# Get all names
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [127]:
# Get first row of data
data[0]

('Alice', 25, 55.)

Using Boolean masking, this even allows you to do some more sophisticated operations such as filtering on age:

In [128]:
# Get names where age is under 30
data[data['age'] < 30]['name']

array(['Alice', 'Doug'], dtype='<U10')
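
The compound data type can also be given as a list of (name, format) tuples, and the array can be built directly from a list of records. The sketch below uses this alternative form (the variable name `people` is illustrative) and combines field access with the aggregation and sorting tools from earlier sections:

```python
import numpy as np

people = np.array(
    [('Alice', 25, 55.0), ('Bob', 45, 85.5),
     ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')],
)

# Name from the last record
print(people[-1]['name'])                    # Doug

# Aggregate over a single field
print(people['weight'].mean())               # 67.5

# Names ordered by age, youngest first
by_age = people[np.argsort(people['age'])]['name']
print(by_age.tolist())                       # ['Doug', 'Alice', 'Cathy', 'Bob']
```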