NumPy is an open-source Python library for mathematical, scientific, engineering, and data science programming.
Its core is a multi-dimensional array type, used for vectors, matrices, and higher-dimensional data. Arrays are similar to lists in Python, except that every element of an array must be of the same type, typically a numeric type like float or int. Arrays make operations on large amounts of numeric data very fast and are generally much more efficient than lists.

On top of arrays and matrices, NumPy supports a large number of mathematical operations.

# Importing

In [38]:
 %pip install numpy

Note: you may need to restart the kernel to use updated packages.


In [6]:
# import numpy
import numpy as np

In [41]:
np.__version__

'1.22.1'

To display NumPy's built-in documentation, you can use this:

In [42]:
#?np

# Python List vs NumPy Array
Because of Python's dynamic typing, we can create heterogeneous lists:

In [43]:
L = [True, "2", 3.0, 4]
l1 = []
for item in L:
    l1.append(type(item))

In [None]:
l1

In [4]:
L = [True, "2", 3.0, 4]
print(L[0])
print([type(item) for item in L])
L[-1]

True
[<class 'bool'>, <class 'str'>, <class 'float'>, <class 'int'>]


4

This flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information–that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

![alt text](https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png)

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object such as a Python integer. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
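
A rough way to see this difference is to compare how much memory each container needs for the same integers. The sketch below is only illustrative: the variable names are made up for the example, and the figure reported by `sys.getsizeof` varies between Python versions, so only the array numbers are stated in the comments.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The list stores n pointers; each one refers to a separate Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array stores n raw 8-byte integers in one contiguous buffer.
print("array itemsize:", arr.itemsize)      # 8 bytes per int64 element
print("array nbytes:  ", arr.nbytes)        # 8000 bytes of data
print("list total (approximate):", list_bytes)
```

The list total counts the pointer block plus every integer object, which is why it comes out several times larger than the array buffer.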

# NumPy Arrays
The central feature of NumPy is the array object class, `ndarray`. As described above, every element of an array must be of the same type, which is what makes operations on large amounts of numeric data fast.

## Creating Arrays

In [1]:
np.array([])

array([], dtype=float64)

In [7]:
# Manual construction of 1-D arrays
one_D = np.array([0, 1, 2, 3])
print(one_D)
type(one_D)

[0 1 2 3]


numpy.ndarray

In [3]:
# Check number of array dimensions
one_D.ndim

1

Unlike Python lists, NumPy arrays must contain elements of a single type. If types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [8]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
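
The same upcasting applies to other type mixes. The following sketch, with made-up variable names, inspects the `dtype` that NumPy picks for a few mixed inputs:

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])       # int + float
with_complex = np.array([1, 2.0, 3 + 0j])     # float + complex
with_string = np.array([1, 2.5, "a"])         # numbers + str

print(ints_and_floats.dtype)    # float64
print(with_complex.dtype)       # complex128
print(with_string.dtype.kind)   # U (a Unicode string type)
```

Note that a single string in the input turns every element into a string, which is rarely what numeric code wants.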

If we want to explicitly set the data type of the resulting array, we can use the **dtype** keyword:

In [9]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

In [48]:
# Manual construction of 2 and 3-D arrays
two_D = np.array([[0, 1, 2],
                  [3, 4, 5]])
print(two_D.ndim)
three_D = np.array([
                    [[1], [2]],
                    [[3], [4]]
                    ])
print(three_D.ndim)

2
3


In [51]:
# Shape of array
print(two_D.shape)
print(three_D.shape)
print(three_D)
# print(type(three_D))

(2, 3)
(2, 2, 1)
[[[1]
  [2]]

 [[3]
  [4]]]


In [52]:
# returns the size of the first dimension
len(two_D)

2

## Functions for creating arrays
In practice, we rarely enter items one by one. NumPy provides functions that build arrays of a given shape directly:

In [56]:
np.zeros(5)
np.zeros((2,2))

array([[0., 0.],
       [0., 0.]])

In [54]:
np.zeros(5, dtype=int)

array([0, 0, 0, 0, 0])

In [57]:
np.ones((2, 4), dtype=float)

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [11]:
np.full((2, 4), 3.14)

array([[3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14]])

### Evenly spaced

In [12]:
np.arange(10) # 0 .. n-1 RANGE

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [60]:
np.arange(1, 9, 2) # start, end (exclusive), step

array([1, 3, 5, 7])

In [None]:
# start is inclusive, end is exclusive

### By number of points

In [10]:
np.linspace(0, 1, 6)  # start, end, num-points

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])

In [None]:
# ?np.linspace

In [65]:
np.linspace(0, 1, 6, endpoint=False)

array([0.        , 0.16666667, 0.33333333, 0.5       , 0.66666667,
       0.83333333])
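
To make the contrast between the two functions concrete, here is a small sketch (variable names are illustrative). `np.arange` fixes the step and excludes the end value, while `np.linspace` fixes the number of points and includes the end value by default:

```python
import numpy as np

# arange: fixed step, end value excluded
by_step = np.arange(0, 10, 2.5)       # values 0, 2.5, 5, 7.5

# linspace: fixed number of points, end value included by default
by_count = np.linspace(0, 10, 5)      # values 0, 2.5, 5, 7.5, 10

print(by_step)
print(by_count)

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(0, 1, 6, retstep=True)
print(step)                           # 0.2
```

With a non-integer step, `np.linspace` is often the safer choice because the number of points is stated explicitly.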

In [66]:
np.ones((3, 3))  # reminder: (3, 3) is a tuple

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [67]:
np.zeros((2, 2))

array([[0., 0.],
       [0., 0.]])

In [71]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [16]:
np.diag(np.array([1, 2, 3, 4, 5, 8]))

array([[1, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 0],
       [0, 0, 3, 0, 0, 0],
       [0, 0, 0, 4, 0, 0],
       [0, 0, 0, 0, 5, 0],
       [0, 0, 0, 0, 0, 8]])

In [17]:
# Create an uninitialized 2x3 array; its values are whatever was already in memory
np.empty([2, 3], dtype = int)

array([[1, 2, 3],
       [4, 5, 8]])

# Random numbers

In [18]:
# create a NumPy array of 4 uniformly distributed numbers
np.random.rand(4)       # uniform in [0, 1)

array([0.05028644, 0.27853313, 0.38553728, 0.90299319])

In [21]:
# create a NumPy array of 5 normally distributed numbers
np.random.randn(5)      # Gaussian

array([ 1.1081987 , -0.02484801, -0.85848381,  0.04040281, -0.0327394 ])

In [22]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.2440696 , 0.73088994, 0.7272409 ],
       [0.06058888, 0.09789026, 0.79232486],
       [0.65346337, 0.66533473, 0.66151075]])

In [23]:
# Create a 3x3 array of normally distributed random values
# with mean 10 and standard deviation 5
np.random.normal(10, 5, (3, 3))

array([[13.33914199,  6.72020821, 13.0685127 ],
       [ 6.47589174, 10.64797535,  3.50712612],
       [ 9.5145263 ,  4.05944847, 12.67874074]])

In [87]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[7, 1, 4],
       [7, 5, 8],
       [9, 7, 4]])
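
The calls above return different numbers on every run. One way to make results repeatable is to create a seeded generator with `np.random.default_rng`. This is a minimal sketch: the seed value 42 is arbitrary, and the variable names are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary

uniform = rng.random((3, 3))                        # uniform in [0, 1)
normal = rng.normal(loc=10, scale=5, size=(3, 3))   # mean 10, standard deviation 5
integers = rng.integers(0, 10, size=(3, 3))         # integers in [0, 10)

# A second generator created with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
same_uniform = rng_again.random((3, 3))
print(np.array_equal(uniform, same_uniform))        # True
```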

# NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the table below. When constructing an array, a type can be specified either as a string or as the corresponding NumPy object:

In [88]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [89]:
# or
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (alias removed in NumPy 2.0)| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (alias removed in NumPy 2.0)| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 
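
The ranges in the table do not need to be memorized: `np.iinfo` and `np.finfo` report them for integer and floating-point types. A short sketch, with illustrative variable names:

```python
import numpy as np

# Integer limits, as listed in the table
for int_type in (np.int8, np.int16, np.uint8):
    info = np.iinfo(int_type)
    print(np.dtype(int_type).name, info.min, info.max)

# Floating-point properties: total bits and machine epsilon
f32 = np.finfo(np.float32)
print(f32.bits, f32.eps)

# Converting to a smaller type reduces the bytes used per element
small = np.array([100, 200, 300]).astype(np.int16)
print(small.dtype, small.itemsize)    # int16 2
```

Choosing a smaller type saves memory, but values outside its range cannot be represented, so the limits are worth checking first.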

# NumPy Array Attributes

In [92]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
print(x3)

[[[5 7 8 6 1]
  [8 5 6 8 6]
  [7 9 1 8 6]
  [0 1 7 6 6]]

 [[5 0 2 6 7]
  [5 3 2 5 3]
  [6 9 4 9 0]
  [4 1 5 0 7]]

 [[5 0 4 8 6]
  [3 0 5 1 7]
  [7 2 6 9 3]
  [4 7 4 8 5]]]


In [93]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [94]:
print("dtype:", x3.dtype)

dtype: int64


In [None]:
# 64 bit int - 8 byte 
# 32 bit int - 4 byte

Other attributes include `itemsize`, which lists the size (in bytes) of each array element, and `nbytes`, which lists the total size (in bytes) of the array:

In [None]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

In general, we expect that `nbytes` is equal to `itemsize` times `size`.
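
This relationship can be checked directly. The sketch below fixes the dtype explicitly, because the default integer type depends on the platform and NumPy version:

```python
import numpy as np

x = np.zeros((3, 4, 5), dtype=np.int64)
print(x.size, x.itemsize, x.nbytes)       # 60 8 480
assert x.nbytes == x.size * x.itemsize

# Halving the element size halves the total
x32 = x.astype(np.int32)
print(x32.itemsize, x32.nbytes)           # 4 240
```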

# Indexing and slicing
The items of an array can be accessed and assigned in the same way as the items of other Python sequences, such as lists.

In [95]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
a[0], a[2], a[-1]

For multidimensional arrays, indexes are tuples of integers.

In [96]:
np.arange(3)

array([0, 1, 2])

In [97]:
a = np.diag(np.arange(3))
a

array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])

In [None]:
a[2][-1] = 55
a

In [None]:
a[2, 1] = 77 # third row, second column (for nested lists: a[2][1])
a

Keep in mind that if we attempt to insert a floating-point value into an integer array, the value will be silently truncated.

In [None]:
a[0, 1] =  3.14
a
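
A small sketch of this behaviour, with illustrative variable names. The fractional part is dropped rather than rounded, and converting the array with `astype` is one way to keep it:

```python
import numpy as np

ints = np.zeros(3, dtype=np.int64)
ints[0] = 3.99      # stored as 3
ints[1] = -3.99     # stored as -3: truncation toward zero, not rounding
print(ints)

# Convert to a floating-point array first to keep the fractional part
floats = ints.astype(np.float64)
floats[2] = 3.14
print(floats)
```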

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

`x[start:stop:step]`

## One-dimensional subarrays

In [None]:
a = np.arange(10)
a

In [None]:
a[2:9:3] # [start:end:step]

Note that the end index is not included.

In [None]:
a[:4]

In [None]:
a[1:3]

In [None]:
a[::2]

In [None]:
a[3:]

In [None]:
a[::-1]  # all elements, reversed

It is possible to combine assignment and slicing:

In [None]:
a[5:] = 10
a

In [None]:
b = np.arange(5)
b

In [None]:
a[5:] = b[::-1]
a

## Multi-dimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by commas. 

In [None]:
x3 = [[2, 5, 4, 4, 90],
      [6, 7, 16, 17, 13],
      [43, 52, 61, 71, 81]]

In [None]:
x3

In [None]:
# x2 = x3[0]
# print(x2)

In [None]:
x2 = np.array([[1, 3, 4, 6, 7],
               [2, 5, 16, 16, 13],
               [4, 5, 6, 7, 8]])

In [None]:
x2[:2][:3]  # chained slices both act on rows, so this is the same as x2[:2]

In [None]:
x2[:2, :3] # two rows, three columns

In [None]:
x2[:3, ::2]  # all rows, every other column

Finally, subarray dimensions can even be reversed together:

In [None]:
x2[::-1, ::-1]

## Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

In [None]:
print(x2)

In [None]:
print(x2[:, 0])  # first column of x2

In [None]:
print(x2[0, :])  # first row of x2

In [None]:
print(x2[0])  # equivalent to x2[0, :]

## Subarrays as no-copy views
One important and extremely useful thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.

In [None]:
print(x2)

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

In [24]:
zzz = np.diag(np.arange(3))

In [25]:
zzz

array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])

In [27]:
zzz_sub = zzz[:2, :2]
zzz_sub

array([[0, 0],
       [0, 1]])

In [28]:
zzz_sub[0, 0] = 55
zzz_sub

array([[55,  0],
       [ 0,  1]])

In [None]:
zzz

Now if we modify the `x2_sub` subarray in the same way, we'll see that the original array `x2` is changed as well!

In [26]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  3]
 [ 2  5]]

In [None]:
print(x2)

In [None]:
# checks whether zzz and zzz_sub may share the same memory block
np.may_share_memory(zzz, zzz_sub) 

### Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:

In [None]:
x2 = zzz

In [None]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

In [None]:
x2_sub_copy[0, 0] = 777
print(x2_sub_copy)

In [None]:
# checks whether x2 and x2_sub_copy may share the same memory block
np.may_share_memory(x2, x2_sub_copy) 

In [None]:
print(x2)
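
The difference between list slices, array views, and explicit copies can be summarized in one self-contained sketch. The variable names are illustrative; the `base` attribute is used here to tell whether an array borrows its data from another array:

```python
import numpy as np

py_list = [0, 1, 2, 3, 4]
list_slice = py_list[:2]     # list slices are copies
list_slice[0] = 99
print(py_list)               # the original list is unchanged

arr = np.arange(5)
view = arr[:2]               # array slices are views
view[0] = 99
print(arr)                   # arr[0] is now 99

copy = arr[:2].copy()        # an explicit copy owns its own data
copy[1] = -1
print(arr)                   # arr[1] is still 1

print(view.base is arr)      # True: the view borrows arr's data
print(copy.base is None)     # True: the copy does not
```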

# Modifying Arrays

## Reshaping Arrays

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:

In [98]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, **the size of the initial array must match the size of the reshaped array**. Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
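
Both points can be checked in a few lines. In this sketch (names are illustrative), `-1` asks NumPy to infer one dimension, `np.shares_memory` shows whether the result is a view, and a size mismatch raises a `ValueError`:

```python
import numpy as np

a = np.arange(6)

grid = a.reshape((2, -1))                # -1 lets NumPy infer the missing dimension
print(grid.shape)                        # (2, 3)
print(np.shares_memory(a, grid))         # True: reshape returned a view

flat_t = grid.T.reshape(6)               # the transposed data is not contiguous
print(np.shares_memory(grid, flat_t))    # False: NumPy had to copy

try:
    a.reshape((4, 2))                    # 8 slots for 6 elements
except ValueError as err:
    print("reshape failed:", err)
```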

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or more easily done by making use of the `newaxis` keyword within a slice operation:

In [29]:
np.array([1, 2, 3])

array([1, 2, 3])

In [30]:
x = np.array([1, 2, 3])
print(x)
print(x.shape)


# row vector via reshape
x = x.reshape((1, 3))
print(x)
print(x.shape)

[1 2 3]
(3,)
[[1 2 3]]
(1, 3)


In [102]:
a = np.array([[1,2,3],
              [2,3,4]])

a = a.flatten()
print(a)
a = a.reshape((3, 2))
print(a)

[1 2 3 2 3 4]
[[1 2]
 [3 2]
 [3 4]]


In [103]:
np.array([1, 2, 3]).shape

(3,)

In [104]:
x = np.array([1, 2, 3])

# row vector via newaxis
x = x[np.newaxis, :]
print(x.shape)

(1, 3)


In [105]:
np.array([1, 2, 3])

array([1, 2, 3])

In [106]:
np.array([1, 2, 3])[np.newaxis, :]

array([[1, 2, 3]])

In [107]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [108]:
x = np.array([1, 2, 3])

# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

## Resizing
The size of an array can be changed in place with `ndarray.resize`; the function `np.resize` instead returns a new array.

In [115]:
type((1,))

tuple

In [32]:
a = np.arange(4)
print(a)
a.resize((16,))  # the new shape is given as a tuple
print(a)

[0 1 2 3]
[0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0]


In [33]:
a = np.resize(a, 15)
a

array([0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [113]:
a[10:] = 5
a

array([0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5])
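
The two resize variants fill new positions differently when the array grows. A minimal sketch, assuming an array that owns its data; `refcheck=False` is passed only so the in-place call does not fail when other references to the array exist, as is common in notebooks:

```python
import numpy as np

a = np.arange(4)

# np.resize returns a new array and repeats the input to fill it
repeated = np.resize(a, 6)
print(repeated)                     # [0 1 2 3 0 1]

# ndarray.resize works in place and pads with zeros
a.resize((6,), refcheck=False)
print(a)                            # [0 1 2 3 0 0]
```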

In [None]:
#?np.resize

In [36]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

c = np.dot(a, b)  # dot product: 1*4 + 2*5 + 3*6
c

32

## Concatenating Arrays
Concatenation, or joining of two arrays in **NumPy**, is primarily accomplished using the routines `np.concatenate`, `np.vstack`, and `np.hstack`. 

`np.concatenate` takes a tuple or list of arrays as its first argument, as we can see here:

In [116]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [117]:
# concatenate more than two arrays
z = [99, 88, 77]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 88 77]


It can also be used for 2-D arrays:

In [119]:
grid1 = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [120]:
grid2 = np.array([[7, 8, 9],
                 [6, 6, 6]])

In [121]:
# concatenate along the first axis
# if the axis argument is not given, it defaults to axis=0
np.concatenate([grid1, grid2], axis = 0) 

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [6, 6, 6]])

In [122]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid1, grid2], axis=1)

array([[1, 2, 3, 7, 8, 9],
       [4, 5, 6, 6, 6, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

Similarly, `np.dstack` will stack arrays along the third axis.
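
Comparing result shapes is a quick way to see what each stacking function does. The arrays `a` and `b` below are illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [6, 6, 6]])

print(np.vstack([a, b]).shape)    # (4, 3): rows are appended
print(np.hstack([a, b]).shape)    # (2, 6): columns are appended

stacked = np.dstack([a, b])
print(stacked.shape)              # (2, 3, 2): a new third axis of length 2
print(stacked[0, 0])              # [1 7]: one element from a, one from b
```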


## Splitting Arrays
The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points:

In [37]:
np.array([1, 2, 3, 99, 'a', 3, 2, 1])

array(['1', '2', '3', '99', 'a', '3', '2', '1'], dtype='<U21')

In [38]:
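x = [1, 2, 3, 99, 'a', 3, 2, 1]  # the string 'a' makes NumPy convert every element to a string
x1, x2, x3 = np.split(x, [3, 5])  # split points: x[:3], x[3:5], x[5:]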
print(x1, x2, x3)

['1' '2' '3'] ['99' 'a'] ['3' '2' '1']


In [39]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [128]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [129]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, `np.dsplit` will split arrays along the third axis.
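
Besides a list of split points, `np.split` also accepts an integer number of equal sections, and `np.array_split` allows sections of unequal size. A short sketch with an illustrative array `x`:

```python
import numpy as np

x = np.arange(10)

# A list of indices gives the split points: x[:3], x[3:5], x[5:]
first, middle, last = np.split(x, [3, 5])
print(first, middle, last)

# An integer asks for that many equal sections
halves = np.split(x, 2)
print([len(part) for part in halves])     # [5, 5]

# array_split tolerates sections of unequal size
uneven = np.array_split(x, 3)
print([len(part) for part in uneven])     # [4, 3, 3]

# np.split raises an error when an equal division is impossible
try:
    np.split(x, 3)
except ValueError as err:
    print("split failed:", err)
```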

# Computation on Arrays: Broadcasting
Recall that for arrays of the same size, binary operations are performed on an element-by-element basis:

In [133]:
import numpy as np

In [134]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

Broadcasting allows these types of binary operations to be performed on arrays of different sizes. For example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array:

In [135]:
a + 5

array([5, 6, 7])

We can think of this as an operation that stretches or duplicates the value `5` into the array `[5, 5, 5]`, and adds the results. The advantage of NumPy's broadcasting is that this duplication of values does not actually take place, but it is a useful mental model as we think about broadcasting.

The same holds for other binary operations, such as multiplying by a scalar:

In [132]:
a * 3

array([0, 3, 6])

We can similarly extend this to arrays of higher dimension. Observe the result when we add a one-dimensional array to a two-dimensional array:

In [136]:
M = np.ones((3, 3))
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [137]:
a

array([0, 1, 2])

In [138]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

## Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

1. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
2. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
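
The three rules can be written out as a small function that works on shape tuples. This is only a sketch for illustration: `broadcast_shape` is a hypothetical helper written for this example, not part of NumPy, and its result is cross-checked against the shape NumPy computes via `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples (illustrative helper)."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)     # sizes agree, or b is stretched (rule 2)
        elif size_a == 1:
            result.append(size_b)     # a is stretched (rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))    # (2, 3)
print(broadcast_shape((3, 1), (3,)))    # (3, 3)

# Cross-check against the shape NumPy itself computes
numpy_shape = np.broadcast(np.empty((3, 1)), np.empty((3,))).shape
print(numpy_shape)                      # (3, 3)

try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    print("incompatible:", err)
```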

![alt text](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)

The light boxes represent the broadcasted values: again, this extra memory is not actually allocated in the course of the operation.

### Broadcasting example 1 

In [139]:
M = np.ones((2, 3))
a = np.arange(3)

In [141]:
M

array([[1., 1., 1.],
       [1., 1., 1.]])

In [142]:
a

array([0, 1, 2])

In [143]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.]])

Let's consider an operation on these two arrays. The shapes of the arrays are:

- ``M.shape = (2, 3)``
- ``a.shape = (3,)``

We see by rule 1 that the array ``a`` has fewer dimensions, so we pad it on the left with ones:

- ``M.shape -> (2, 3)``
- ``a.shape -> (1, 3)``

By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to match:

- ``M.shape -> (2, 3)``
- ``a.shape -> (2, 3)``

The shapes match, and we see that the final shape will be ``(2, 3)``:

In [140]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.]])

### Broadcasting example 2

Let's take a look at an example where both arrays need to be broadcast:

In [40]:
a = np.arange(3).reshape((3, 1))
b = np.arange(3)

In [41]:
a

array([[0],
       [1],
       [2]])

In [42]:
b

array([0, 1, 2])

In [43]:
a + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

Again, we'll start by writing out the shape of the arrays:

- ``a.shape = (3, 1)``
- ``b.shape = (3,)``

Rule 1 says we must pad the shape of ``b`` with ones:

- ``a.shape -> (3, 1)``
- ``b.shape -> (1, 3)``

And rule 2 tells us that we upgrade each of these ones to match the corresponding size of the other array:

- ``a.shape -> (3, 3)``
- ``b.shape -> (3, 3)``

Because the result matches, these shapes are compatible. We can see this here:

In [44]:
a + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

### Broadcasting example 3

Now let's take a look at an example in which the two arrays are not compatible:

In [50]:
M = np.ones((3, 2))
a = np.arange(3)

In [51]:
M

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [52]:
a

array([0, 1, 2])

In [53]:
M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 

This is only a slightly different situation from the first example: the matrix ``M`` is transposed.
How does this affect the calculation? The shapes of the arrays are:

- ``M.shape = (3, 2)``
- ``a.shape = (3,)``

Again, rule 1 tells us that we must pad the shape of ``a`` with ones:

- ``M.shape -> (3, 2)``
- ``a.shape -> (1, 3)``

By rule 2, the first dimension of ``a`` is stretched to match that of ``M``:

- ``M.shape -> (3, 2)``
- ``a.shape -> (3, 3)``

Now we hit rule 3: the final shapes do not match, so these two arrays are incompatible, as we can observe by attempting this operation:

In [54]:
M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
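
Because rule 1 only pads shapes on the left, the two arrays can be made compatible by adding the new axis to ``a`` on the right by hand. A minimal sketch using `np.newaxis`:

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

# a[:, np.newaxis] has shape (3, 1), which broadcasts against (3, 2)
column = a[:, np.newaxis]
result = M + column
print(column.shape)    # (3, 1)
print(result.shape)    # (3, 2)
print(result)          # rows are [1, 1], [2, 2], [3, 3]
```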

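If ``M`` instead has shape ``(3, 3)``, its trailing dimension matches the length of ``a`` and the operation succeeds:

In [55]: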
M = np.ones((3, 3))
a = np.arange(3)

In [56]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

Note that while we've been focusing on the `+` operator here, these broadcasting rules apply to any binary `ufunc`.

### Practical Example: Centering an Array

We saw that ufuncs allow a NumPy user to remove the need to explicitly write slow Python loops. Broadcasting extends this ability.
One commonly seen example is when centering an array of data.
Imagine you have an array of 10 observations, each of which consists of 3 values. We'll store this in a $10 \times 3$ array:

In [77]:
X = np.random.random((10, 3))
print(X)

[[0.00329795 0.12101653 0.89917996]
 [0.45522319 0.01996995 0.63234474]
 [0.71517148 0.58752482 0.88558215]
 [0.83454448 0.463803   0.35316971]
 [0.2974878  0.70858077 0.30245255]
 [0.02914455 0.16937224 0.40549414]
 [0.13267903 0.72006285 0.99979888]
 [0.96274342 0.32339115 0.55894815]
 [0.80275563 0.50216978 0.36487611]
 [0.00601397 0.12873789 0.1477405 ]]


We can compute the mean of each feature using the ``mean`` aggregate across the first dimension:

In [80]:
Xmean = X.mean(axis=0)
Xmean

array([0.42390615, 0.3744629 , 0.55495869])

And now we can center the ``X`` array by subtracting the mean (this is a broadcasting operation):

In [70]:
X_centered = X - Xmean

In [71]:
X_centered

array([[ 0.39072461, -0.15061629, -0.12984062],
       [ 0.34350386,  0.23800222, -0.2750331 ],
       [-0.25202504, -0.08478044,  0.47125066],
       [-0.02995656,  0.43617073, -0.25909353],
       [-0.23727982, -0.27679448,  0.00324419],
       [ 0.25889723,  0.53569756, -0.12613659],
       [-0.47014577, -0.284654  , -0.15028957],
       [-0.34625547, -0.08441529,  0.56266497],
       [ 0.12858694, -0.16250482,  0.01871465],
       [ 0.21395002, -0.16610519, -0.11548106]])

To double-check that we've done this correctly, we can check that the centered array has a near-zero mean:

In [72]:
X_centered.mean(axis=0)

array([ 9.99200722e-17, -5.55111512e-17,  2.77555756e-17])

To within machine precision, the mean is now zero.
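
The same idea extends to centering along the other axis or to full standardization. The sketch below is illustrative: it uses a seeded generator so the data are repeatable, and `keepdims=True` so that the per-row means keep a shape of `(10, 1)` that broadcasts against `X`.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, for repeatable data
X = rng.random((10, 3))

# Center each column (feature): shapes (10, 3) - (3,)
col_centered = X - X.mean(axis=0)

# Center each row (observation): keepdims gives shape (10, 1) instead of (10,)
row_means = X.mean(axis=1, keepdims=True)
row_centered = X - row_means

# Standardize each column to zero mean and unit standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(row_means.shape)                               # (10, 1)
print(np.allclose(col_centered.mean(axis=0), 0))     # True
print(np.allclose(row_centered.mean(axis=1), 0))     # True
print(np.allclose(Z.std(axis=0), 1))                 # True
```

Without `keepdims=True`, the row means would have shape `(10,)`, which cannot be broadcast against `(10, 3)` under the rules above.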