# Module 12.1: NumPy

Recall that we always import NumPy using the standard `np` abbreviation.

NumPy is a Python package for numerical calculations. It is mainly used to construct arrays and to perform mathematical operations on them.

In [4]:
import numpy as np

## Arrays and Python lists

A NumPy array is different from a Python list. The data types stored in a Python list can all be different.

In [2]:
python_list = [ 1, -0.038, 'gear', True]
print(python_list)

[1, -0.038, 'gear', True]


The Python list above contains four different data types: `1` is an integer, `-0.038` is a float, `gear` is a string, and `True` is a boolean.

The values stored in a NumPy array **must all share the same data type**. Consider the NumPy array below:

In [3]:
A = np.array([1.0, 3.1, 5e-04, 0.007])
print(A)

[1.0e+00 3.1e+00 5.0e-04 7.0e-03]
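
To see the "same data type" rule in action, here is a minimal sketch (the name `mixed` is only for illustration). When the inputs have different numeric types, NumPy converts them all to one common type rather than keeping them separate.

```python
import numpy as np

# An int, a float, and a bool go in; a single common dtype comes out.
mixed = np.array([1, 2.5, True])

print(mixed.dtype)     # float64
print(mixed.tolist())  # [1.0, 2.5, 1.0]
```

Every entry, including `True`, has been converted to a float.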


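Multiplying a Python list by an integer does not multiply its elements; it gives **list repetition**. To double every element of a list, we need something like the for-loop in the second cell below.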

In [4]:
lst = [1, 2, 3, 4]
lst*2

[1, 2, 3, 4, 1, 2, 3, 4]

In [5]:
lst = [1, 2, 3, 4]
for i in range(len(lst)):
    lst[i] = lst[i]*2
lst

[2, 4, 6, 8]
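
For comparison, here is a short sketch of the same multiplication applied to a NumPy array built from the list (the name `arr` is chosen just for this example). With an array, `*` acts on each element, so no loop is needed.

```python
import numpy as np

lst = [1, 2, 3, 4]
arr = np.array(lst)

print(lst * 2)  # list repetition: [1, 2, 3, 4, 1, 2, 3, 4]
print(arr * 2)  # element-wise multiplication: [2 4 6 8]
```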

## Regular arrays

In this section we will learn some different methods for creating NumPy arrays.

### Example 1

* [0, 0, 0, ..., 0] (length 10)

In [3]:
# create a constant array; the default element data type is float
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [4]:
# check its size/shape: 1 dimension, 13 elements
np.zeros(13).shape

(13,)

In [5]:
# you can also choose the data type
np.zeros(13,dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [7]:
np.zeros((2, 3, 4))

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

In [8]:
# 3 dimensions: 2 elements in D1, 3 elements in D2, 4 elements in D3
np.zeros((2, 3, 4)).shape

(2, 3, 4)

We can also build a sequence of zeros using Python lists. This next method uses something called **list comprehension**, which we will see a lot more of later this week.

In [3]:
[0 for _ in range(10)]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Notice I used `_` instead of something like `i` or `j`, because it isn’t actually used anywhere in the code.

This next example uses something called list concatenation.

In [6]:
[0] + [0]

[0, 0]

So if we want to concatenate ten copies of `[0]`, we can use `*`.

In [5]:
[0]*10

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

### Example 2

* A five-by-seven matrix containing all 4s.

Here we're trying to make a 5x7 NumPy array of all zeros (then we will add 4 to it).  

**The following is a common mistake.**

In [6]:
np.zeros(5,7)

TypeError: Cannot interpret '7' as a data type

Notice in the documentation below that the first parameter, `shape`, is supposed to be an integer or a tuple of integers.  The second positional parameter is called `dtype`, so NumPy tries to interpret our second input `7` as a `dtype`.  That was our mistake: we should have passed a single tuple, not two separate arguments.

In [7]:
help(np.zeros)

Help on built-in function zeros in module numpy:

zeros(...)
    zeros(shape, dtype=float, order='C', *, like=None)
    
    Return a new array of given shape and type, filled with zeros.
    
    Parameters
    ----------
    shape : int or tuple of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`.  Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: 'C'
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this arg

Notice, the second argument to `np.zeros` needs to be the data type. The error is telling us that 7 cannot be interpreted as a data type. We need to pass the dimensions of our array as a tuple.

In [8]:
np.zeros((5,7))

array([[0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.]])

Once we have the array of all zeros, it's easy to get the array of all 4s.  We just add 4, and NumPy automatically "broadcasts" the 4 (like the "element-wise operation" in MATLAB) to all of the entries in the array.

In [7]:
np.zeros((5,7))+4

array([[4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.]])

Here is a similar strategy, using the `ones` function instead of the `zeros` function.

In [8]:
np.ones((5,7))

array([[1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.]])

In [9]:
np.ones((5,7))*4

array([[4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4., 4., 4.]])

If we would rather have integers on the inside (notice how the decimal point goes away), we can specify `int` as the `dtype`.

In [12]:
np.ones((5,7), int)*4

array([[4, 4, 4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4, 4, 4]])
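
NumPy also provides `np.full`, which is not used elsewhere in these notes but builds a constant array in one step. The sketch below assumes only the shape and fill value from this example; the variable names are illustrative.

```python
import numpy as np

fours = np.full((5, 7), 4)          # integer fill value gives an integer array
fours_float = np.full((5, 7), 4.0)  # float fill value gives a float array

# Same result as the ones-based approach above.
print(np.array_equal(fours, np.ones((5, 7), int) * 4))  # True
print(fours_float.dtype)                                # float64
```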

### Example 3

* [0, ..., 100] (length 5, evenly distributed).

In MATLAB, this would be made using `linspace(0,100,5)`, and NumPy has its own version of `linspace`.

In [13]:
np.linspace(0,100,5)

array([  0.,  25.,  50.,  75., 100.])

Unlike in many Python functions, the right endpoint **is included!**
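
As a small sketch of what "included" means, `np.linspace` accepts an `endpoint` keyword argument. Setting it to `False` excludes the right endpoint and changes the spacing accordingly (the variable names here are illustrative).

```python
import numpy as np

with_end = np.linspace(0, 100, 5)                     # 0, 25, 50, 75, 100
without_end = np.linspace(0, 100, 5, endpoint=False)  # 0, 20, 40, 60, 80

print(with_end)
print(without_end)
```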

### Example 4

* The 3x3 matrix $\begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 2 & 2 & 2 \end{pmatrix}$

In [13]:
arr = np.zeros((3,3),dtype=int)
arr

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [14]:
# get the entire index-1 row (2nd row)
arr[1]

array([0, 0, 0])

In [15]:
for i in range(3):
    arr[i] = i

In [16]:
arr

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

In [17]:
# transpose of the arr
arr.T

array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])

### Example 5

* The 2x5 matrix $
\begin{pmatrix} 2 & 5 & 8 & 11 & 14 \\ 17 & 20 & 23 & 26 & 29 \end{pmatrix}
$

Here you will see two different ways of creating this array. The first method will be very similar to MATLAB. The second method will be using techniques unique to NumPy.

**Method 1:** We notice that the difference between consecutive elements is 3. We’ll use a for-loop to create this array.

In [18]:
arr = np.zeros((2,5),dtype=int)
for i in range(2):
    for j in range(5):
        arr[i,j] = 2 + 3*j

In [19]:
arr

array([[ 2,  5,  8, 11, 14],
       [ 2,  5,  8, 11, 14]])

It is close but not correct. Notice that inside the for-loop, our assignment only depends on `j`. It should depend on `i` as well.

==================================================

_**<font color = blue>In-class Exercise 1</font>**_: Correct the above code. 

In [20]:
arr = np.zeros((2,5),dtype=int)
for i in range(2):
    for j in range(5):
        arr[i,j] = 2 + 3*j + 15*i

==================================================

In [21]:
arr

array([[ 2,  5,  8, 11, 14],
       [17, 20, 23, 26, 29]])

**Method 2:** Recall the built-in Python type `range`. NumPy has a very similar function, `np.arange`.

In [None]:
help(np.arange)

In [22]:
list(range(2,29,3))

[2, 5, 8, 11, 14, 17, 20, 23, 26]

In [23]:
np.arange(2,29,3)

array([ 2,  5,  8, 11, 14, 17, 20, 23, 26])

When using `np.arange`, the step size **does not have to be an integer.**

In [25]:
np.arange(1,5,0.5)

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [24]:
range(1,5,0.5) # step needs to be an integer

TypeError: 'float' object cannot be interpreted as an integer

Remember that the right endpoint is usually not included in Python.  Since we want our array to end at 29, we should list some number bigger than 29 (and less than or equal to 32) as our right endpoint.

In [19]:
np.arange(2,30,3)

array([ 2,  5,  8, 11, 14, 17, 20, 23, 26, 29])

Now we will use the `reshape` method to turn this length-10 one-dimensional NumPy array into a 2x5 two-dimensional NumPy array.

In [20]:
np.arange(2,30,3).reshape((2,5))

array([[ 2,  5,  8, 11, 14],
       [17, 20, 23, 26, 29]])
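
As a quick check, the sketch below rebuilds the array with both methods and confirms that they agree. The names `loop_arr` and `reshaped` are chosen just for this comparison.

```python
import numpy as np

# Method 1: fill the entries one at a time.
loop_arr = np.zeros((2, 5), dtype=int)
for i in range(2):
    for j in range(5):
        loop_arr[i, j] = 2 + 3*j + 15*i

# Method 2: build the ten values in order, then reshape row by row.
reshaped = np.arange(2, 30, 3).reshape((2, 5))

print(np.array_equal(loop_arr, reshaped))  # True
```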

## Random numbers

* Make a length 10 NumPy array of random integers between 0 (inclusive) and 40 (exclusive).

You will often see code like the following in code examples, but this is actually the "old" way of getting random numbers in NumPy.

In [2]:
np.random.randint(0, 40, size=10)

array([30, 35, 37,  3,  0, 18, 12, 17, 38, 16])

In [3]:
help(np.random.randint)

Help on built-in function randint:

randint(...) method of numpy.random.mtrand.RandomState instance
    randint(low, high=None, size=None, dtype=int)
    
    Return random integers from `low` (inclusive) to `high` (exclusive).
    
    Return random integers from the "discrete uniform" distribution of
    the specified dtype in the "half-open" interval [`low`, `high`). If
    `high` is None (the default), then results are from [0, `low`).
    
    .. note::
        New code should use the ``integers`` method of a ``default_rng()``
        instance instead; please see the :ref:`random-quick-start`.
    
    Parameters
    ----------
    low : int or array-like of ints
        Lowest (signed) integers to be drawn from the distribution (unless
        ``high=None``, in which case this parameter is one above the
        *highest* such integer).
    high : int or array-like of ints, optional
        If provided, one above the largest (signed) integer to be drawn
        from the distributi

Look at the note in the help text above! It tells us to use the `integers` method of a `default_rng()` instance instead. This is an excellent example of something called Object-Oriented Programming (OOP):
* Instead of calling a function that will generate random numbers, we will create an object that has methods on it which will generate random numbers.
* Here is the link to the `default_rng` help page: https://numpy.org/doc/stable/reference/random/generator.html

Here’s how I actually want you to generate random numbers:

In [5]:
# Creating a random number generator object using np.random.default_rng
rng = np.random.default_rng()

In [6]:
type(rng)

numpy.random._generator.Generator

The note above said to use the `integers` method, which is what we do here.  This is the *modern* way (as of Summer 2022) to make random numbers in NumPy.

In [7]:
# Generating random numbers using the RNG object
arr = rng.integers(0, 40, size=10)

In [11]:
arr

array([34, 24, 37, 16,  4,  1,  8, 21, 19, 39])
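
If you need the same "random" numbers every time the code runs, `default_rng` accepts a seed. The sketch below uses an arbitrary seed value, `2022`, and illustrative names; it only shows that two generators created with the same seed produce identical draws.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed.
rng_a = np.random.default_rng(seed=2022)
rng_b = np.random.default_rng(seed=2022)

draw_a = rng_a.integers(0, 40, size=10)
draw_b = rng_b.integers(0, 40, size=10)

print(np.array_equal(draw_a, draw_b))  # True
```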

For the rest of this section, we will see a variety of examples of different sorts of methods we can use from our random number generator `rng` object.

* Choose 6 of those numbers (with replacement) and put them into a NumPy array.

In [10]:
rng.choice(arr, size=6)

array([23, 30, 23, 30, 30, 39])
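
By default `rng.choice` samples with replacement, which is why values can repeat in the output. A brief sketch of the alternative, using the `replace` keyword argument and an illustrative array `values` of ten distinct numbers:

```python
import numpy as np

rng = np.random.default_rng()
values = np.arange(0, 40, 4)  # ten distinct values: 0, 4, ..., 36

with_repeats = rng.choice(values, size=6)               # repeats are allowed
no_repeats = rng.choice(values, size=6, replace=False)  # all six are distinct

print(with_repeats)
print(no_repeats)
```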

* Make a three-by-five NumPy array of random real numbers between -1 and 4.

`rng.random` always returns random numbers in the half-open interval `[0,1)`.

In [11]:
rng.random(-1, 4, size=(3,5))

TypeError: random() got multiple values for keyword argument 'size'

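The call fails because `rng.random` has no parameters for the endpoints: its first positional parameter is `size`, so `size` ends up being specified twice. As the help text below states, the outputs of `rng.random` always lie in `[0,1)`.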

In [12]:
help(rng.random)

Help on built-in function random:

random(...) method of numpy.random._generator.Generator instance
    random(size=None, dtype=np.float64, out=None)
    
    Return random floats in the half-open interval [0.0, 1.0).
    
    Results are from the "continuous uniform" distribution over the
    stated interval.  To sample :math:`Unif[a, b), b > a` multiply
    the output of `random` by `(b-a)` and add `a`::
    
      (b - a) * random() + a
    
    Parameters
    ----------
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  Default is None, in which case a
        single value is returned.
    dtype : dtype, optional
        Desired dtype of the result, only `float64` and `float32` are supported.
        Byteorder must be native. The default value is np.float64.
    out : ndarray, optional
        Alternative output array in which to place the result. If size is not None,
        it

In [15]:
# Check the "Parameters" section in the above documentation. Here (3,5) is the size.
arr = rng.random((3,5))
arr

array([[0.14373169, 0.04151085, 0.85160258, 0.5625753 , 0.77434184],
       [0.09323801, 0.63691547, 0.83783101, 0.60707909, 0.69066845],
       [0.65647018, 0.45466071, 0.77161217, 0.53020513, 0.71808796]])

The random numbers in `arr` lie in `[0,1)`.  We want `[-1,4)`, which is an interval of width 5.  That is why we multiply by 5.

In [16]:
arr*5

array([[0.71865844, 0.20755425, 4.25801288, 2.81287651, 3.87170919],
       [0.46619004, 3.18457733, 4.18915507, 3.03539543, 3.45334223],
       [3.28235092, 2.27330357, 3.85806084, 2.65102563, 3.59043979]])

The above array takes values in `[0,5)`, but we want values in `[-1,4)`, so we subtract 1.

In [17]:
arr*5-1

array([[-0.28134156, -0.79244575,  3.25801288,  1.81287651,  2.87170919],
       [-0.53380996,  2.18457733,  3.18915507,  2.03539543,  2.45334223],
       [ 2.28235092,  1.27330357,  2.85806084,  1.65102563,  2.59043979]])

Here is a way to make this array all at once (of course the specific random numbers will be different).

In [18]:
# We can write the above steps in one line
5*rng.random(size=(3,5)) - 1

array([[-0.01973026,  1.61155096,  2.9866675 ,  1.71899874, -0.01933733],
       [ 1.36928355,  2.32786313, -0.00448403, -0.82777187,  0.89141526],
       [ 3.13028253, -0.38060258,  3.05444416,  0.42173001,  1.3847107 ]])
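
The same steps can be written for general endpoints using the formula `(b - a) * random() + a` from the help text. The sketch below also uses `rng.uniform`, a `Generator` method not covered in these notes, which takes the endpoints directly; the names `a`, `b`, `scaled` and `direct` are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
a, b = -1, 4

scaled = (b - a) * rng.random(size=(3, 5)) + a  # formula from the help text
direct = rng.uniform(a, b, size=(3, 5))         # endpoints passed directly

print(scaled.min() >= a, scaled.max() <= b)
print(direct.min() >= a, direct.max() <= b)
```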

* Make a length 10 NumPy array of random numbers following a normal distribution with mean 2 and standard deviation 0.1.

As a last example in this section, here is how we can make normally distributed random numbers.

In [19]:
help(rng.normal)

Help on built-in function normal:

normal(...) method of numpy.random._generator.Generator instance
    normal(loc=0.0, scale=1.0, size=None)
    
    Draw random samples from a normal (Gaussian) distribution.
    
    The probability density function of the normal distribution, first
    derived by De Moivre and 200 years later by both Gauss and Laplace
    independently [2]_, is often called the bell curve because of
    its characteristic shape (see the example below).
    
    The normal distributions occurs often in nature.  For example, it
    describes the commonly occurring distribution of samples influenced
    by a large number of tiny, random disturbances, each with its own
    unique distribution [2]_.
    
    Parameters
    ----------
    loc : float or array_like of floats
        Mean ("centre") of the distribution.
    scale : float or array_like of floats
        Standard deviation (spread or "width") of the distribution. Must be
        non-negative.
    size : int o

Notice that the result of the following does seem to be clustered around 2.  That makes sense because we had mean 2 with a relatively low standard deviation (0.1).

In [20]:
rng.normal(2, 0.1, size=10)

array([1.97074003, 2.10991763, 1.98318881, 2.00287628, 2.10707454,
       2.02365598, 1.89343959, 1.97254061, 1.97359416, 1.96238154])

The array is displayed almost like it's two-dimensional, but notice that there is only one set of brackets.  It really is one-dimensional, as we can verify by checking its `shape` attribute.

In [12]:
rng.normal(2, 0.1, size=10).shape

(10,)
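
With only 10 samples it is hard to judge the mean and standard deviation by eye. As a rough sketch, we can draw a much larger sample (the size `100_000` is an arbitrary choice) and compute both with the array methods `mean` and `std`.

```python
import numpy as np

rng = np.random.default_rng()
sample = rng.normal(2, 0.1, size=100_000)

print(sample.mean())  # close to 2
print(sample.std())   # close to 0.1
```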

## Changing rows and columns

Here, as an example, we use a for-loop to generate a 4-by-4 matrix.

In [14]:
arr = np.zeros((4,4), dtype=int)
for i in range(4):
    arr[i] = i

In [15]:
arr

array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])

If we use indexing of the form `arr[i]` where `arr` is a two-dimensional array, this `arr[i]` will represent the i-th row of `arr`.

In [16]:
arr[2]

array([2, 2, 2, 2])

Here we set the row with index 2 (the third row) to be all 10s.

In [17]:
arr[2] = 10

In [18]:
arr

array([[ 0,  0,  0,  0],
       [ 1,  1,  1,  1],
       [10, 10, 10, 10],
       [ 3,  3,  3,  3]])

Accessing a column is slightly harder.  You should read `arr[:,2]` as indicating "every row, the column with index 2".

In [19]:
arr[:,2] = -4

In [20]:
arr

array([[ 0,  0, -4,  0],
       [ 1,  1, -4,  1],
       [10, 10, -4, 10],
       [ 3,  3, -4,  3]])

The same sort of indexing with a colon could be used with rows.  Here `arr[1,:]` is the same as `arr[1]`.  The `:` in this case can be read as, "every column".

In [21]:
# this is the same as arr[1]
arr[1,:] = 1

In [22]:
arr

array([[ 0,  0, -4,  0],
       [ 1,  1,  1,  1],
       [10, 10, -4, 10],
       [ 3,  3, -4,  3]])

In the following, we store the column with index 2 under the variable name `v`.  This column is a one-dimensional NumPy array.

In [24]:
# Although arr[:, 2] is a column of arr, v is a one-dimensional array (neither a row nor a column).
v = arr[:, 2]
print(v)

[-4  1 -4 -4]


Notice how `v` is displayed horizontally.  Looking at `v`, there is no evidence that this corresponds to a column in our array.

In [57]:
v.shape

(4,)
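
If you do want to keep the column as a two-dimensional array, one option is to index with a length-1 slice instead of an integer. This is a minimal sketch using an illustrative 4x4 array rather than the `arr` from the cells above.

```python
import numpy as np

arr = np.arange(16).reshape((4, 4))

v = arr[:, 2]      # integer index: that axis is dropped
col = arr[:, 2:3]  # length-1 slice: that axis is kept

print(v.shape)    # (4,)
print(col.shape)  # (4, 1)
```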

The following code tries to assign the values `[2,10]` to the row with index 3 (the fourth row).  We get an error message, because a list of length 2 cannot fill a row of length 4.

In [58]:
arr[3] = [2,10]

ValueError: cannot copy sequence with size 2 to array axis with dimension 4

A list of length 4 does work, because it matches the length of the row.  The same list can also be broadcast to the whole array: `[1,3,4,7]` has shape `(4,)` and `arr[:]` stands for all the entries in the entire array `arr`, which has shape `(4,4)`.  These two shapes are compatible with respect to broadcasting, so every row receives a copy of the list.

In [59]:
arr[3] = [1,3,4,7]

In [60]:
arr

array([[ 0,  0, -4,  0],
       [ 1,  1,  1,  1],
       [10, 10, -4, 10],
       [ 1,  3,  4,  7]])

In [62]:
# Take everything in arr
arr[:] = [1,3,4,7]
arr

array([[1, 3, 4, 7],
       [1, 3, 4, 7],
       [1, 3, 4, 7],
       [1, 3, 4, 7]])

In [64]:
w = np.array([1,3,4,7])
w.shape

(4,)

In [65]:
arr.shape

(4, 4)

If you want it to be broadcast differently (along the horizontal direction), you can change the shape.

In [67]:
v = np.array([1,3,4,7]).reshape(4,1) # v is a 4-by-1 array
arr[:] = v
arr

array([[1, 1, 1, 1],
       [3, 3, 3, 3],
       [4, 4, 4, 4],
       [7, 7, 7, 7]])

Here is a for-loop version of creating an array `arr`.

In [68]:
arr = np.zeros((3,3),dtype=int)
for i in range(3):
    arr[i] = i

In [69]:
arr

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Here is a way that uses broadcasting followed by taking the matrix transpose.

In [2]:
arr = np.zeros((3,3),dtype=int)
arr[:] = np.arange(3)
arr.T

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Here is what `arr` looked like before we took the matrix transpose.  We broadcast from `np.arange(3)` to the 3x3 array `arr`.  If we want to actually change `arr`, we need to reassign `arr` after taking the transpose, using `arr = arr.T`.

In [72]:
arr

array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])

In [76]:
# Recall the method of generating a 3-by-3 array using (vertical) broadcasting
arr = np.zeros((3,3),dtype=int)
arr[:] = np.arange(3)
arr = arr.T
arr

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Here is the code for another way (that avoids taking the transpose).  It is a little complicated, so we will break down below what was happening.

In [75]:
arr = np.zeros((3,3),dtype=int)
arr[:] = np.arange(3).reshape((3,1)) # RHS is a 3-by-1 array
arr

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

Let's look in particular at the right-hand side of `arr[:] = np.arange(3).reshape((3,1))`.

In [32]:
x = np.arange(3).reshape((3,1))

In [33]:
x

array([[0],
       [1],
       [2]])

Notice that these dimensions are compatible: for the right-most dimensions, we have 1 and 3, which are compatible (since one of them is equal to 1), and in the next dimension, we have 3 and 3, which are compatible (since they are equal).

In [34]:
x.shape

(3, 1)

In [35]:
arr.shape

(3, 3)
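
The compatibility rule can also be checked in code. The sketch below relies on `np.broadcast_shapes`, which is assumed to be available (it was added in NumPy 1.20); the helper `can_broadcast` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if the two shapes are compatible for broadcasting."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False

print(np.broadcast_shapes((3, 1), (3, 3)))  # (3, 3)
print(can_broadcast((4,), (4, 4)))          # True
print(can_broadcast((2,), (4, 4)))          # False
```

The last line corresponds to the earlier error when we tried to assign a length-2 list to a row of length 4.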

If we tried to reshape using `(2,1)`, we would get an error, because we can't fit 3 numbers into an array of shape `(2,1)`.

In [77]:
y = np.arange(3).reshape((2,1))

ValueError: cannot reshape array of size 3 into shape (2,1)

A very convenient abbreviation to specify "just 1 column" is to use `.reshape((-1,1))`.  Then NumPy is automatically going to replace the `-1` with **whatever positive integer is necessary to fill in all the numbers.**  In this case, the `-1` gets replaced by `3`.

In [78]:
z = np.arange(3).reshape((-1,1))

In [79]:
z

array([[0],
       [1],
       [2]])

In [80]:
z.shape

(3, 1)

In [81]:
arr

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

As another example of this `-1` syntax, we get the following length-9 one-dimensional NumPy array.  NumPy automatically replaced the `-1` with a `9` in this case, because there were nine numbers in `arr`.

In [82]:
arr.reshape(-1)

array([0, 0, 0, 1, 1, 1, 2, 2, 2])
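
Here is one more sketch of the `-1` syntax, using an illustrative length-12 array `a`. The `-1` can go in either position, but the other dimensions must divide the total number of entries evenly.

```python
import numpy as np

a = np.arange(12)

print(a.reshape((-1, 3)).shape)  # (4, 3)
print(a.reshape((2, -1)).shape)  # (2, 6)

try:
    a.reshape((-1, 5))  # 12 is not a multiple of 5
except ValueError as err:
    print("reshape failed:", err)
```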