In [1]:
import numpy as np

# 4 NumPy Basics: Arrays and Vectorized Computation

One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are a number of reasons for this:

* NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.


* NumPy operations perform complex computations on entire arrays without the need for Python for loops.

To give you an idea of the performance difference, consider a NumPy array of one million integers, and the equivalent Python list:

In [1]:
import numpy as np

In [2]:
my_arr = np.arange(1000000)

In [3]:
my_list = list(range(1000000))

In [5]:
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: user 25.2 ms, sys: 15.4 ms, total: 40.6 ms
Wall time: 47.1 ms


In [6]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 759 ms, sys: 184 ms, total: 944 ms
Wall time: 967 ms


NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.
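To check the memory claim rather than take it on faith, the rough sketch below compares the array's data buffer (`nbytes`) with the list's own pointer storage plus its int objects, as reported by `sys.getsizeof`. It assumes CPython, and the list figure is only an approximation because small ints are shared objects; `dtype=np.int64` is fixed so the array size does not depend on the platform's default integer type.

```python
import sys

import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)  # fixed dtype: 8 bytes per element
my_list = list(range(n))

# The array stores raw 8-byte integers in one contiguous buffer
array_bytes = my_arr.nbytes

# The list stores pointers to separate int objects, so count both
# (approximate: CPython caches small ints, so a few objects are shared)
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(array_bytes)               # 8000000
print(list_bytes > array_bytes)  # True
```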

## 4.1 The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

To give you a flavor of how NumPy enables batch computations with similar syntax to scalar values on built-in Python objects, I first import NumPy and generate a small array of random data:

In [7]:
import numpy as np

In [8]:
data = np.random.randn(2, 3)

In [9]:
data

array([[ 0.59055361, -0.01473664,  0.11025476],
       [ 1.10394016, -1.01167162,  0.38683953]])

Then I write mathematical operations with data:

In [10]:
data * 10

array([[  5.90553608,  -0.14736635,   1.10254756],
       [ 11.03940161, -10.11671625,   3.86839527]])

In [11]:
data + data

array([[ 1.18110722, -0.02947327,  0.22050951],
       [ 2.20788032, -2.02334325,  0.77367905]])

In the first example, all of the elements have been multiplied by 10. In the second, the corresponding values in each “cell” in the array have been added to each other.

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array:

In [12]:
data.shape

(2, 3)

In [13]:
data.dtype

dtype('float64')
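As a small illustration of the homogeneity rule, using only NumPy itself: mixing integers and floats in one list makes `np.array` choose a single dtype that can represent every element, and the shape tuple together with the element size determines how many bytes the data buffer occupies.

```python
import numpy as np

# Mixing ints and floats: NumPy infers one dtype that can hold every element
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64

# shape has one entry per dimension; ndim is the number of dimensions
grid = np.zeros((2, 3))
print(grid.shape, grid.ndim)  # (2, 3) 2

# The data buffer holds size * itemsize bytes (6 elements x 8 bytes each)
print(grid.nbytes == grid.size * grid.itemsize)  # True
```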

## Creating ndarrays

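The easiest way to create an array is to use the np.array function. It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion: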

In [14]:
data1 = [6, 7.5, 8, 0, 1]

In [15]:
arr1 = np.array(data1)

In [16]:
arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [17]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

In [18]:
arr2 = np.array(data2)

In [19]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape inferred from the data. We can confirm this by inspecting the ndim and shape attributes:

In [20]:
arr2.ndim

2

In [21]:
arr2.shape

(2, 4)

Unless explicitly specified, np.array tries to infer a good data type for the array that it creates. The data type is stored in a special dtype metadata object; for example, in the previous two examples we have:

In [22]:
arr1.dtype

dtype('float64')

In [23]:
arr2.dtype

dtype('int64')

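In addition to np.array, there are a number of other functions for creating new arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an array without initializing its values to any particular value, so it is not safe to assume that np.empty will return all zeros; it may return uninitialized “garbage” values. To create a higher dimensional array with these methods, pass a tuple for the shape. arange is an array-valued version of the built-in Python range function: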

In [24]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [25]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [26]:
np.empty((2, 3, 2))

array([[[2.00000000e+000, 1.29074003e-231],
        [3.95252517e-323, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

In [27]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
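A few more constructors follow the same shape conventions. This sketch uses `np.ones`, `np.full`, `np.zeros_like`, and a float-step `np.arange`; the particular shapes and fill value are arbitrary choices for illustration.

```python
import numpy as np

ones = np.ones((2, 3))         # 2 x 3 array of 1.0 (float64 by default)
sevens = np.full((2, 2), 7)    # every element set to the fill value
like = np.zeros_like(ones)     # zeros with the same shape and dtype as `ones`
steps = np.arange(0, 1, 0.25)  # like range(), but float steps are allowed

print(ones.shape, like.shape, like.dtype)  # (2, 3) (2, 3) float64
print(steps)
```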

## Data Types for ndarrays

In [28]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

In [29]:
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [30]:
arr1.dtype

dtype('float64')

In [31]:
arr2.dtype

dtype('int32')

You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:

In [32]:
arr = np.array([1, 2, 3, 4, 5])

In [33]:
arr.dtype

dtype('int64')

In [34]:
float_arr = arr.astype(np.float64)

In [35]:
float_arr.dtype

dtype('float64')

In this example, integers were cast to floating point. If I cast some floating-point numbers to be of integer dtype, the decimal part will be truncated:

In [36]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In [37]:
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [38]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)
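Note that the cast truncates toward zero rather than flooring, which matters for negative numbers such as -1.2. If a different rounding rule is wanted, one option (sketched here, not the only one) is to round explicitly with `np.floor` or `np.rint` before casting.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = arr.astype(np.int32)          # drops the fraction: toward zero
floored = np.floor(arr).astype(np.int32)  # toward negative infinity
rounded = np.rint(arr).astype(np.int32)   # to nearest, ties to even

print(truncated)  # [ 3 -1 -2  0 12 10]
print(floored)    # [ 3 -2 -3  0 12 10]
print(rounded)    # [ 4 -1 -3  0 13 10]
```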

If you have an array of strings representing numbers, you can use astype to convert them to numeric form:

In [39]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0

In [40]:
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])
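The conversion only succeeds if every string parses as a number; otherwise astype raises a ValueError. In the sketch below, the string 'n/a' is a made-up bad value, and `np.bytes_` is used for the byte-string dtype because the older `np.string_` alias no longer exists in NumPy 2.0.

```python
import numpy as np

numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
values = numeric_strings.astype(float)
print(values)

# A single unparseable string makes the whole conversion fail
bad = np.array(['1.25', 'n/a'])  # hypothetical bad input
failed = False
try:
    bad.astype(float)
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```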

You can also use another array’s dtype attribute:

In [41]:
int_array = np.arange(10)

In [42]:
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In [43]:
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

There are shorthand type code strings you can also use to refer to a dtype:

In [45]:
empty_uint32 = np.empty(8, dtype='u4')

In [48]:
empty_uint32

array([         0, 1075314688,          0, 1075707904,          0,
       1075838976,          0, 1072693248], dtype=uint32)

* By default, calling astype creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.
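A short check of this copy behavior, assuming a NumPy version whose astype accepts the `copy` keyword (it has for many releases). Writing to the result leaves the original untouched, while `copy=False` lets NumPy hand back the input itself when no conversion is needed.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

same_dtype = arr.astype(np.int64)  # same dtype, but still a new array
same_dtype[0] = 100                # modifying the copy...
print(arr[0])                      # ...leaves the original alone: 1

# copy=False skips the copy when dtype and memory layout already match
maybe_same = arr.astype(np.int64, copy=False)
print(maybe_same is arr)           # True
```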

## Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [52]:
arr = np.array([[1.,2.,3.], [4.,5.,6.]])

In [53]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [54]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [55]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])
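To make “element-wise” concrete, the sketch below computes arr * arr once with explicit Python loops and once as a vectorized expression. The two results are identical, but only the second avoids a Python-level loop over every element.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops over every (row, column) position
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# The vectorized expression performs the same multiplication per element
vectorized = arr * arr
print(np.array_equal(looped, vectorized))  # True
```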

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [56]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [57]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparisons between arrays of the same size yield boolean arrays:

In [58]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

In [59]:
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [60]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

## Basic Indexing and Slicing

In [61]:
arr = np.arange(10)

In [62]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [63]:
arr[5]

5

In [64]:
arr[5:8]

array([5, 6, 7])

In [65]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast, as we will call it from now on) to the entire selection. An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.
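The notebook cells in this section skip the scalar assignment described above. The minimal sketch below performs it on a fresh array, uses `np.shares_memory` to confirm that a slice is a view, and uses `.copy()` to get an independent array when a view is not wanted.

```python
import numpy as np

arr = np.arange(10)
arr[5:8] = 12                       # the scalar is broadcast to the whole slice
print(arr)                          # [ 0  1  2  3  4 12 12 12  8  9]

view = arr[5:8]                     # a view: no data is copied
print(np.shares_memory(arr, view))  # True

copied = arr[5:8].copy()            # an explicit copy owns its own data
copied[:] = -1
print(arr[5:8])                     # [12 12 12], unchanged
```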

In [66]:
arr_slice = arr[5:8]

In [67]:
arr_slice

array([5, 6, 7])

Now, when I change values in arr_slice, the mutations are reflected in the original array arr:

In [69]:
arr_slice[1] = 12345

In [70]:
arr

array([    0,     1,     2,     3,     4,     5, 12345,     7,     8,
           9])

The “bare” slice [:] will assign to all values in an array:

In [71]:
arr_slice[:] = 64

In [72]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [73]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [74]:
arr2d[2]

array([7, 8, 9])

In [75]:
arr2d[0][2]

3
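The chained form arr2d[0][2] first builds the row arr2d[0] and then indexes into it. Passing a comma-separated list of indices selects the same element in a single step:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

chained = arr2d[0][2]  # two steps: take row 0, then element 2 of that row
direct = arr2d[0, 2]   # one step: row 0, column 2
print(chained, direct)  # 3 3
```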

In [76]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

In [77]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

arr3d[0] is a 2 × 3 array:

In [78]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to arr3d[0]:

In [79]:
old_values = arr3d[0].copy()

In [80]:
arr3d[0] = 42

In [81]:
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [82]:
arr3d[0] = old_values

In [83]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [84]:
arr3d[1, 0]

array([7, 8, 9])

This expression is the same as though we had indexed in two steps:

In [85]:
x = arr3d[1]

In [86]:
x

array([[ 7,  8,  9],
       [10, 11, 12]])

In [87]:
x[0]

array([7, 8, 9])

## Indexing with Slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [88]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:

In [89]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [90]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

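Because arr2d[:2] uses a single slice, it selects along axis 0, the first axis, and returns the first two rows. You can pass multiple slices just like you can pass multiple indexes: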

In [91]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

When slicing like this, you always obtain array views of the same number of dimensions. By mixing integer indexes and slices, you get lower dimensional slices.

In [95]:
arr2d[1, :2]

array([4, 5])

In [96]:
arr2d[:2,2]

array([3, 6])

In [97]:
arr2d[:,:1]

array([[1],
       [4],
       [7]])
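One way to see the rule about dimensions is to compare the shape of each result. In this sketch, a length-1 slice such as 1:2 keeps its axis, whereas the integer index 1 removes it.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

both_slices = arr2d[:2, 1:]   # slices only: result stays 2-D
int_and_slice = arr2d[1, :2]  # the integer index drops axis 0
length_one = arr2d[1:2, :2]   # a length-1 slice keeps axis 0

print(both_slices.shape)    # (2, 2)
print(int_and_slice.shape)  # (2,)
print(length_one.shape)     # (1, 2)
```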

In [98]:
arr2d[:2, 1:] = 0

In [99]:
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

## Boolean Indexing

In [114]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [115]:
data = np.random.randn(7,4)

In [116]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [117]:
data

array([[-0.97021831, -0.47834603, -0.29984588, -0.07791196],
       [-2.57926121, -0.31225694, -0.74851638,  1.10186048],
       [-0.95189413,  0.56015005,  0.07580671,  0.79067415],
       [-0.15561284, -0.23534573,  1.19814486,  0.52455521],
       [-0.30192072,  0.56916127, -1.2014923 , -0.47335005],
       [-1.87285988, -0.32306263, -1.31150185,  1.69826139],
       [-0.63130189,  1.26133042, -2.71249664,  1.40010792]])

In [118]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [119]:
data[names == 'Bob']

array([[-0.97021831, -0.47834603, -0.29984588, -0.07791196],
       [-0.15561284, -0.23534573,  1.19814486,  0.52455521]])

In [120]:
data[names == 'Bob', 2:]

array([[-0.29984588, -0.07791196],
       [ 1.19814486,  0.52455521]])

In [121]:
data[names == 'Bob', 3]

array([-0.07791196,  0.52455521])

In [122]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [123]:
data[~(names == 'Bob')]

array([[-2.57926121, -0.31225694, -0.74851638,  1.10186048],
       [-0.95189413,  0.56015005,  0.07580671,  0.79067415],
       [-0.30192072,  0.56916127, -1.2014923 , -0.47335005],
       [-1.87285988, -0.32306263, -1.31150185,  1.69826139],
       [-0.63130189,  1.26133042, -2.71249664,  1.40010792]])

In [124]:
cond = names == 'Bob'

In [125]:
data[~cond]

array([[-2.57926121, -0.31225694, -0.74851638,  1.10186048],
       [-0.95189413,  0.56015005,  0.07580671,  0.79067415],
       [-0.30192072,  0.56916127, -1.2014923 , -0.47335005],
       [-1.87285988, -0.32306263, -1.31150185,  1.69826139],
       [-0.63130189,  1.26133042, -2.71249664,  1.40010792]])

In [126]:
mask = (names == 'Bob') | (names == 'Will')

In [127]:
mask

array([ True, False,  True,  True,  True, False, False])

In [128]:
data[mask]

array([[-0.97021831, -0.47834603, -0.29984588, -0.07791196],
       [-0.95189413,  0.56015005,  0.07580671,  0.79067415],
       [-0.15561284, -0.23534573,  1.19814486,  0.52455521],
       [-0.30192072,  0.56916127, -1.2014923 , -0.47335005]])

In [129]:
data[data < 0] = 0

In [130]:
data

array([[0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 1.10186048],
       [0.        , 0.56015005, 0.07580671, 0.79067415],
       [0.        , 0.        , 1.19814486, 0.52455521],
       [0.        , 0.56916127, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 1.69826139],
       [0.        , 1.26133042, 0.        , 1.40010792]])

In [131]:
data[names != 'Joe'] = 7
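Two details of boolean indexing are easy to trip over, sketched here with a hypothetical deterministic data array (np.arange instead of random numbers). First, Python’s `and` and `or` keywords do not work element-wise on boolean arrays, so conditions are combined with `&` and `|`, with each comparison wrapped in parentheses. Second, selecting with a boolean mask always returns a copy, so writing to the selection does not change the source.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape((7, 4))  # hypothetical stand-in for random data

# | binds more tightly than ==, so each comparison needs parentheses
mask = (names == 'Bob') | (names == 'Will')

# `or` asks for the truth value of a whole array, which NumPy refuses
raised = False
try:
    (names == 'Bob') or (names == 'Will')
except ValueError:
    raised = True

# Boolean indexing copies the selected rows
selected = data[mask]
selected[:] = -1
print(raised)                            # True
print(np.shares_memory(data, selected))  # False
print(data[0])                           # [0 1 2 3], unchanged
```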

## Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays. Suppose we had an 8 × 4 array:

In [132]:
arr = np.empty((8,4))

In [133]:
for i in range(8):
    arr[i] = i

In [134]:
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [135]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [136]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [137]:
arr = np.arange(32).reshape((8,4))

In [138]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [139]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
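Passing both index arrays inside one pair of brackets behaves differently from the two-step selection above: the arrays are paired up element by element, producing a 1-D result. To get the rectangular block in one step, `np.ix_` builds index arrays that broadcast against each other.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Pairs (1, 0), (5, 3), (7, 1), (2, 2): one element per pair
points = arr[[1, 5, 7, 2], [0, 3, 1, 2]]
print(points)  # [ 4 23 29 10]

# Rows x columns block in a single indexing operation
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(np.array_equal(block, arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]))  # True
```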

## Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute:

In [140]:
arr = np.arange(15).reshape((3,5))

In [141]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [142]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

When doing matrix computations, you may do this very often, for example when computing the inner matrix product of arr.T and arr using np.dot:

In [143]:
arr = np.random.randn(6,3)

In [144]:
arr

array([[ 0.08631211, -0.35326525,  0.5001174 ],
       [ 0.47375671,  1.38610836,  0.87921714],
       [ 2.12903909, -0.82504707, -1.94834317],
       [ 0.97399259, -0.10133224, -1.00091466],
       [-2.7788188 , -0.57701134, -0.49571307],
       [-0.36188773,  0.91544101,  1.42409333]])

In [145]:
np.dot(arr.T, arr)

array([[13.56616083,  0.04305583, -3.8011461 ],
       [ 0.04305583,  3.90803794,  4.34062138],
       [-3.8011461 ,  4.34062138,  8.0947847 ]])
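Because randn output changes on every run, this sketch swaps in a seeded generator from `np.random.default_rng` (seed 0 is an arbitrary choice). It checks that `np.dot(arr.T, arr)` matches the `@` matrix-multiplication operator and that the result is a symmetric 3 × 3 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable
arr = rng.standard_normal((6, 3))

gram = np.dot(arr.T, arr)       # (3 x 6) times (6 x 3) gives 3 x 3
print(gram.shape)                      # (3, 3)
print(np.allclose(gram, arr.T @ arr))  # True: @ is matrix multiplication
print(np.allclose(gram, gram.T))       # True: arr.T @ arr is symmetric
```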

For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending):

In [146]:
arr = np.arange(16).reshape((2,2,4))

In [147]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [148]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
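Here the axes were reordered with the second axis first, the first axis second, and the last axis unchanged. The sketch below checks that mapping element by element and confirms that, like .T, the result is a view of the same data.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
t = arr.transpose((1, 0, 2))  # new axis 0 = old axis 1, new axis 1 = old axis 0

# Element (i, j, k) of the result comes from element (j, i, k) of the input
for i in range(2):
    for j in range(2):
        for k in range(4):
            assert t[i, j, k] == arr[j, i, k]

print(np.shares_memory(arr, t))  # True: no data was copied
```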

Simple transposing with .T is a special case of swapping axes. ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:

In [149]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [150]:
arr.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
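swapaxes(1, 2) is equivalent to transposing with the permutation (0, 2, 1), and for a 2-D array .T is just swapaxes(0, 1). A quick check of both equivalences:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(1, 2)

print(swapped.shape)                                      # (2, 4, 2)
print(np.array_equal(swapped, arr.transpose((0, 2, 1))))  # True

m = np.arange(6).reshape((2, 3))
print(np.array_equal(m.T, m.swapaxes(0, 1)))              # True
```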

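## 4.2 Universal Functions: Fast Element-Wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. Many ufuncs are simple element-wise transformations, like np.sqrt or np.exp: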

In [151]:
arr = np.arange(10)

In [152]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [153]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [154]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:

In [155]:
x = np.random.randn(8)

In [156]:
y = np.random.randn(8)

In [157]:
x

array([-1.46045053, -1.90958696,  0.73288984, -0.55501242, -1.01481428,
        0.11370813,  1.24775686, -1.13436835])

In [158]:
y

array([-1.37296853, -0.7373905 ,  0.46062114, -1.66567007,  0.94183275,
        1.67255355,  0.02774114,  1.56999295])

In [159]:
np.maximum(x,y)

array([-1.37296853, -0.7373905 ,  0.73288984, -0.55501242,  0.94183275,
        1.67255355,  1.24775686,  1.56999295])

Here, numpy.maximum computed the element-wise maximum of the elements in x and y.
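np.maximum is easy to confuse with np.max. The sketch below, on small made-up arrays, contrasts the element-wise binary ufunc with the reduction, and shows that a scalar argument is broadcast and that the + operator on arrays gives the same result as the add ufunc.

```python
import numpy as np

x = np.array([-1.5, 2.0, 0.5])
y = np.array([0.0, 1.0, 3.0])

elementwise = np.maximum(x, y)  # same shape as x and y
overall = np.max(x)             # a single value: reduction over x
clipped = np.maximum(x, 0)      # the scalar 0 is broadcast against x

print(elementwise)              # [0. 2. 3.]
print(overall)                  # 2.0
print(clipped)
print(np.array_equal(np.add(x, y), x + y))  # True
```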

While not common, a ufunc can return multiple arrays. np.modf is one example. A vectorized version of Python’s math.modf (not of the built-in divmod, whose vectorized counterpart is np.divmod), it returns the fractional and integral parts of a floating-point array:

In [160]:
arr = np.random.randn(7) * 5

In [161]:
arr

array([-1.50489188,  5.01241384, -6.97769747, -1.67766857, -4.4145417 ,
        3.54221811,  1.04343551])

In [162]:
remainder, whole_part = np.modf(arr)

In [163]:
remainder

array([-0.50489188,  0.01241384, -0.97769747, -0.67766857, -0.4145417 ,
        0.54221811,  0.04343551])

In [164]:
whole_part

array([-1.,  5., -6., -1., -4.,  3.,  1.])
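On small hand-picked values the difference between np.modf and np.divmod becomes visible. modf keeps the sign of the input in both parts, so the parts add back to the original; divmod floors, so -1.5 splits into a quotient of -2 and a remainder of 0.5.

```python
import numpy as np

arr = np.array([-1.5, 5.25, 3.0])

# modf: both parts carry the input's sign
fractional, integral = np.modf(arr)
print(np.array_equal(fractional + integral, arr))  # True

# divmod by 1: floor quotient and a non-negative remainder
quotient, remainder = np.divmod(arr, 1)
print(quotient, remainder)
```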

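Ufuncs accept an optional out argument that lets them write their results into an existing array, including operating in-place on the input. The cell below applies np.sqrt to arr without out; because arr contains negative values, those positions become nan (NumPy also emits a RuntimeWarning):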

In [165]:
arr

array([-1.50489188,  5.01241384, -6.97769747, -1.67766857, -4.4145417 ,
        3.54221811,  1.04343551])

In [166]:
np.sqrt(arr)

array([       nan, 2.23884207,        nan,        nan,        nan,
       1.88207814, 1.02148691])
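Since the cell above does not actually pass out, here is a minimal sketch of it on a non-negative array (which avoids the nan results). The ufunc writes into the array given as out and returns that same object, and out can equally be a separate preallocated buffer.

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0])
result = np.sqrt(arr, out=arr)  # square roots written into arr itself

print(arr)            # [1. 2. 3.]
print(result is arr)  # True: the ufunc returns the out array

# out can also be a separate buffer of a compatible shape
buffer = np.empty(3)
np.add(arr, 10, out=buffer)
print(buffer)         # [11. 12. 13.]
```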