# The NumPy ndarray

One of the key features of NumPy is its N-dimensional array object, or ndarray,
which is a fast, flexible container for large datasets in Python. Arrays enable you to
perform mathematical operations on whole blocks of data using similar syntax to the
equivalent operations between scalar elements.


In [1]:
import numpy as np
data = np.random.randn(2, 3)
data

array([[ 0.57970995,  0.96560127, -0.46138148],
       [ 0.02451126,  0.6514051 ,  0.21259157]])

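Mathematical operations apply to every element of data. In the first example below, each element is multiplied by 10; in the second, the corresponding elements are added to each other: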

In [2]:
data * 10

array([[ 5.79709949,  9.65601266, -4.61381476],
       [ 0.24511262,  6.514051  ,  2.12591574]])

In [3]:
data + data

array([[ 1.1594199 ,  1.93120253, -0.92276295],
       [ 0.04902252,  1.3028102 ,  0.42518315]])

An ndarray is a generic multidimensional container for homogeneous data; that is, all
of the elements must be the same type. Every array has a shape, a tuple indicating the
size of each dimension, and a dtype, an object describing the data type of the array:

In [4]:
data.shape

(2, 3)

In [5]:
data.dtype

dtype('float64')
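
To make the homogeneity rule concrete, here is a minimal sketch (the names `ints` and `mixed` are made up for this illustration). It suggests that when the inputs have different types, NumPy picks one dtype able to hold all of them:

```python
import numpy as np

# Illustrative names only; they are not used elsewhere in these notes.
ints = np.array([1, 2, 3])       # every element is an integer
mixed = np.array([1, 2.5, 3])    # a single float forces a floating-point dtype

print(ints.dtype.kind)           # 'i' (signed integer; the width depends on the platform)
print(mixed.dtype)               # float64
print(mixed.shape, mixed.ndim)   # (3,) 1
```

The integer width (int32 or int64) varies between platforms, which is why the sketch checks only the dtype kind.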

## Creating ndarrays

In [6]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [7]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape
inferred from the data. We can confirm this by inspecting the ndim and shape
attributes:

In [8]:
arr2.ndim

2

In [9]:
arr2.shape

(2, 4)

In addition to np.array, there are a number of other functions for creating new
arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a
given length or shape. empty creates an array without initializing its values to any particular value.
To create a higher dimensional array with these functions, pass a tuple
for the shape:

In [10]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [11]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [12]:
np.empty((2, 3, 2))

array([[[1.01855798e-312, 1.10343781e-312],
        [1.01855798e-312, 9.54898106e-313],
        [1.06099790e-312, 1.01855798e-312]],

       [[1.23075756e-312, 1.08221785e-312],
        [1.12465777e-312, 9.76118064e-313],
        [1.14587773e-312, 1.90979621e-312]]])

__It’s not safe to assume that np.empty will return an array of all
zeros. In some cases, it may return uninitialized “garbage” values.__
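
As a short sketch of the point above (the names `ones_2d` and `buf` are hypothetical), ones takes a shape tuple just like zeros, and an array from empty should be filled explicitly before its values are read:

```python
import numpy as np

ones_2d = np.ones((2, 3))    # 2 x 3 array filled with 1.0

buf = np.empty((2, 3))       # contents are arbitrary at this point
buf[:] = 0.0                 # initialize every element before reading it

print(ones_2d.sum())         # 6.0
print((buf == 0).all())      # True
```

A common reason to reach for empty is that every element is about to be overwritten anyway, so the initialization step can be skipped.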

arange is an array-valued version of the built-in Python range function:

In [13]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
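
Like range, arange also accepts start, stop, and step arguments. The sketch below uses made-up variable names and assumes a step that is exactly representable in floating point:

```python
import numpy as np

evens = np.arange(2, 11, 2)                   # start=2, stop=11 (exclusive), step=2
from_range = np.array(list(range(2, 11, 2)))  # the same values built from range
halves = np.arange(0, 2, 0.5)                 # unlike range, a float step is allowed

print(evens)    # [ 2  4  6  8 10]
print(halves)   # [0.  0.5 1.  1.5]
```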

You can explicitly convert or cast an array from one dtype to another using ndarray’s
astype method:

In [14]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int32')

In [15]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

If you cast floating-point numbers to an integer dtype, the decimal part will be truncated:

In [16]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [17]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])

If you have an array of strings representing numbers, you can use astype to convert
them to numeric form:

In [18]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

__Calling astype always creates a new array (a copy of the data), even
if the new dtype is the same as the old dtype.__
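
One way to check the copy behavior is with np.shares_memory. This is a minimal sketch using the default arguments of astype; the variable names are illustrative:

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])
converted = original.astype(np.float64)   # same dtype as before, still a new array

converted[0] = 99.0                       # modify only the converted array

print(original[0])                             # 1.0
print(np.shares_memory(original, converted))   # False
```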

## Arithmetic with NumPy Arrays

Any arithmetic
operation between equal-size arrays applies the operation element-wise:


In [19]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [20]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [21]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

Arithmetic operations with scalars propagate the scalar argument to each element in
the array:

In [22]:
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [23]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [24]:
arr ** 2

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

Comparisons between arrays of the same size yield boolean arrays:

In [25]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [26]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])
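
To see what "element-wise" means in practice, the following sketch compares the array expression with an explicit Python loop over the same values. The names `looped`, `vectorized`, and `mask` are made up for this example:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Square each element by hand, then with a single array expression
looped = np.array([[x * x for x in row] for row in arr])
vectorized = arr * arr

print(np.array_equal(looped, vectorized))   # True

# A comparison against a scalar also works element by element
mask = arr > 3
print(mask.dtype)   # bool
print(mask.sum())   # 3 (True counts as 1)
```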

## Basic Indexing and Slicing

One-dimensional arrays are simple; on
the surface they act similarly to Python lists:


In [27]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [28]:
arr[5]

5

In [29]:
arr[5:8]

array([5, 6, 7])

In [30]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is
propagated (or broadcast, as it is called from here on) to the entire selection.
An important first distinction from Python’s built-in lists is that array slices are views on the original array.
This means that the data is not copied, and any modifications to the view will be
reflected in the source array.

In [31]:
arr_slice = arr[5:8]
arr_slice

array([12, 12, 12])

Now, when I change values in arr_slice, the mutations are reflected in the original
array arr:

In [32]:
arr_slice[1] = 12345
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

The “bare” slice [:] will assign to all values in an array:


In [33]:
arr_slice[:] = 64
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

__If you want a copy of a slice of an ndarray instead of a view, you
will need to explicitly copy the array—for example,
arr[5:8].copy().__
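
The difference between a view, an explicit copy, and a list slice can be sketched side by side. The variable names are illustrative only:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]              # a view: shares memory with arr
copied = arr[5:8].copy()     # an independent copy

copied[:] = -1               # does not touch arr
view[0] = 100                # writes through to arr[5]
print(arr[5])                # 100

# For comparison, slicing a Python list always produces a new list
lst = list(range(10))
lst_slice = lst[5:8]
lst_slice[0] = 100
print(lst[5])                # 5
```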

With higher dimensional arrays, you have many more options. In a two-dimensional
array, the elements at each index are no longer scalars but rather one-dimensional
arrays:

In [35]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

array([7, 8, 9])

Individual elements can be accessed recursively, one index at a time. That is a bit too much
work, so you can instead pass a comma-separated list of indices to select an individual element.
The following two expressions are equivalent:

In [36]:
arr2d[0][2]

3

In [37]:
arr2d[0, 2]

3
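
The same idea extends to more dimensions. In this sketch, `arr3d` is a hypothetical 2 × 2 × 3 array; each index that is supplied removes one dimension from the result:

```python
import numpy as np

arr3d = np.arange(12).reshape((2, 2, 3))

print(arr3d[0].shape)    # (2, 3): one index leaves a 2-D array
print(arr3d[1, 0])       # [6 7 8]: two indices leave a 1-D array
print(arr3d[1, 0, 2])    # 8: three indices select a single element
```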

### Indexing with slices

In [38]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [39]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

In [40]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [41]:
# select the first two rows of arr2d
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

You can pass multiple slices just like you can pass multiple indexes:

In [42]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

A colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:

In [43]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

Assigning to a slice expression assigns to the whole selection:

In [44]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])
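
Integer indexes and slices can also be mixed. The sketch below uses a fresh array named `grid` (a made-up name, chosen so it does not interfere with arr2d) to show that an integer index drops its axis while a slice keeps it:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_part = grid[1, :2]     # integer on axis 0 -> 1-D result
col_part = grid[:2, 2]     # integer on axis 1 -> 1-D result
col_2d = grid[:2, 2:]      # slices on both axes -> 2-D result

print(row_part)            # [4 5]
print(col_part)            # [3 6]
print(col_2d.shape)        # (2, 1)
```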

### Boolean Indexing

In [45]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)

In [46]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [47]:
data

array([[ 0.57587871, -0.96054844, -1.80112814, -0.52330554],
       [-1.53592071, -1.09186493,  1.17565671, -0.46708226],
       [ 0.10106781,  0.12712255, -1.06316659, -0.54279639],
       [ 1.49183943,  0.54156035, -0.56698271,  0.57058497],
       [-2.00228314,  0.18957217, -0.60499269,  0.68012445],
       [ 2.18616023,  1.47389489,  1.12811454, -1.13350162],
       [-0.95563486,  0.76947963,  0.06908691,  1.54412095]])

In [48]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [49]:
data[names == 'Bob']

array([[ 0.57587871, -0.96054844, -1.80112814, -0.52330554],
       [ 1.49183943,  0.54156035, -0.56698271,  0.57058497]])

__The boolean array must be of the same length as the array axis it’s indexing.__

In [50]:
data[names == 'Bob', 2:]

array([[-1.80112814, -0.52330554],
       [-0.56698271,  0.57058497]])

In [51]:
data[names == 'Bob', 3]

array([-0.52330554,  0.57058497])
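
To illustrate the length requirement, the following sketch indexes a stand-in array (`table`, a made-up name) first with a mask of the right length and then with one that is too short. It assumes a recent NumPy version, where the mismatch raises an IndexError:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
table = np.arange(28).reshape((7, 4))   # stand-in for data: 7 rows, 4 columns

mask = names == 'Bob'                   # length 7, matches axis 0 of table
print(table[mask].shape)                # (2, 4)

short_mask = mask[:5]                   # length 5, no longer matches axis 0
mismatch_error = None
try:
    table[short_mask]
except IndexError as exc:               # raised by recent NumPy versions
    mismatch_error = exc
print(type(mismatch_error).__name__)
```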

To select everything but 'Bob', you can either use != or negate the condition using ~:


In [52]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [53]:
data[~(names == 'Bob')]

array([[-1.53592071, -1.09186493,  1.17565671, -0.46708226],
       [ 0.10106781,  0.12712255, -1.06316659, -0.54279639],
       [-2.00228314,  0.18957217, -0.60499269,  0.68012445],
       [ 2.18616023,  1.47389489,  1.12811454, -1.13350162],
       [-0.95563486,  0.76947963,  0.06908691,  1.54412095]])

The ~ operator can be useful when you want to invert a general condition:

In [54]:
cond = names == 'Bob'
data[~cond]

array([[-1.53592071, -1.09186493,  1.17565671, -0.46708226],
       [ 0.10106781,  0.12712255, -1.06316659, -0.54279639],
       [-2.00228314,  0.18957217, -0.60499269,  0.68012445],
       [ 2.18616023,  1.47389489,  1.12811454, -1.13350162],
       [-0.95563486,  0.76947963,  0.06908691,  1.54412095]])

To select two of the three names by combining multiple boolean conditions, use
boolean arithmetic operators like & (and) and | (or):

In [55]:
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])

_The Python keywords and and or do not work with boolean arrays.
Use __&__ (and) and __|__ (or) instead._
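
A small sketch of the difference, using illustrative names (`bob`, `will`, `either`, `neither`). The parentheses around each comparison matter because & and | bind more tightly than ==:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

bob = names == 'Bob'
will = names == 'Will'

either = bob | will       # element-wise "or"
neither = ~either         # element-wise "not"
print(either.sum())       # 4
print(neither.sum())      # 3

# The keyword form asks for the truth value of a whole array, which is ambiguous
keyword_error = None
try:
    bob and will
except ValueError as exc:
    keyword_error = exc
print(type(keyword_error).__name__)   # ValueError
```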

Selecting data from an array by boolean indexing always creates a copy of the data,
even if the returned array is unchanged.
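
This can be checked with a mask that selects every element. In the sketch below (variable names are made up), the result has the same values as the source but does not share its memory:

```python
import numpy as np

values = np.arange(6)
selected = values[values >= 0]    # the mask is all True, yet the result is a copy

selected[0] = 999                 # modify only the selection

print(values[0])                            # 0
print(np.shares_memory(values, selected))   # False
```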


Setting values with boolean arrays works in a common-sense way. To set all of the
negative values in data to 0 we need only do:

In [56]:
data[data < 0] = 0
data

array([[0.57587871, 0.        , 0.        , 0.        ],
       [0.        , 0.        , 1.17565671, 0.        ],
       [0.10106781, 0.12712255, 0.        , 0.        ],
       [1.49183943, 0.54156035, 0.        , 0.57058497],
       [0.        , 0.18957217, 0.        , 0.68012445],
       [2.18616023, 1.47389489, 1.12811454, 0.        ],
       [0.        , 0.76947963, 0.06908691, 1.54412095]])

Setting whole rows or columns using a one-dimensional boolean array is also easy:

In [57]:
data[names != 'Joe'] = 7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 1.17565671, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [2.18616023, 1.47389489, 1.12811454, 0.        ],
       [0.        , 0.76947963, 0.06908691, 1.54412095]])

### Fancy Indexing


In [58]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

To select out a subset of the rows in a particular order, you can simply pass a list or
ndarray of integers specifying the desired order:

In [59]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

Using negative indices selects rows from
the end:

In [60]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:

In [61]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [62]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])
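
If the goal is instead the rectangular block formed by those rows and columns, two options are to index in two steps or to use np.ix_. This is a sketch with illustrative variable names, not the only way to do it:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]            # elements (1, 0), (5, 3), (7, 1), (2, 2)
block = arr[rows][:, cols]         # pick the rows, then reorder the columns
block_ix = arr[np.ix_(rows, cols)] # same block built in one indexing step

print(pairs)                       # [ 4 23 29 10]
print(block[0])                    # [4 7 5 6]
print(np.array_equal(block, block_ix))   # True
```

Unlike slicing, fancy indexing copies the selected data into a new array.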

## Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. 
Arrays have the transpose method and also the
special T attribute:

In [63]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [64]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
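
Because the transpose is a view, writing through it changes the original array. A minimal sketch, with `t` as a made-up name for the transposed view:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
t = arr.T                          # shape (5, 3), no data copied

t[0, 1] = -1                       # element (0, 1) of t is element (1, 0) of arr

print(arr[1, 0])                   # -1
print(np.shares_memory(arr, t))    # True
```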

The transpose comes up often in matrix computations, for example when computing the inner matrix product with np.dot:

In [65]:
arr = np.random.randn(6, 3)
arr

array([[-0.63836273, -0.93063095,  0.15818949],
       [ 0.51526449, -0.39477328,  1.332353  ],
       [ 0.4341722 , -1.20435669,  0.09904081],
       [-0.1232145 ,  0.72476486, -0.24307986],
       [ 1.0715036 , -0.28111401, -2.31792807],
       [ 0.84542993,  0.67887873, -0.95425324]])

In [66]:
np.dot(arr.T, arr)

array([[ 2.7395635 ,  0.05119746, -2.63193886],
       [ 0.05119746,  3.53758047, -0.96486979],
       [-2.63193886, -0.96486979,  8.15247513]])
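
For two-dimensional arrays, the @ operator gives the same product as np.dot. The sketch below (with `x` and `gram` as illustrative names) also checks two properties of this particular product, its shape and its symmetry:

```python
import numpy as np

x = np.random.randn(6, 3)

gram = np.dot(x.T, x)                  # (3, 6) times (6, 3) -> (3, 3)

print(gram.shape)                      # (3, 3)
print(np.allclose(gram, x.T @ x))      # True: @ is matrix multiplication
print(np.allclose(gram, gram.T))       # True: the product of x.T and x is symmetric
```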

For higher dimensional arrays, transpose will accept a tuple of axis numbers to
permute the axes (for extra mind bending):

In [67]:
arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [68]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

_Here, the axes have been reordered with the second axis first, the first axis second,
and the last axis unchanged._
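
One way to read the permutation is as an index rule: with axes (1, 0, 2), element [i, j, k] of the result is element [j, i, k] of the original. The sketch below checks this on a hypothetical 2 × 3 × 4 array, where the change in shape is easier to see:

```python
import numpy as np

src = np.arange(24).reshape((2, 3, 4))
out = src.transpose((1, 0, 2))

print(src.shape)   # (2, 3, 4)
print(out.shape)   # (3, 2, 4): the first two axes have traded places

# Verify the index rule for every element
rule_holds = all(
    out[i, j, k] == src[j, i, k]
    for i in range(3) for j in range(2) for k in range(4)
)
print(rule_holds)  # True
```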

ndarray also has the method
swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data. Like transpose, it returns a view on the data without making a copy:


In [69]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [70]:
arr.swapaxes(1, 2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
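
Swapping two axes is the same as a transpose whose permutation exchanges just those two positions. A minimal sketch comparing the two calls, with `swapped` as a made-up name:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)
print(swapped.shape)                                        # (2, 4, 2)
print(np.array_equal(swapped, arr.transpose((0, 2, 1))))    # True
print(np.shares_memory(arr, swapped))                       # True: a view, not a copy
```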