# TI3130: NumPy Basics Lab &mdash; Tutorial

**Julián Urbano &mdash; November 2021**

## 1. Introduction

Python is designed as a general-purpose programming language, which means that it is not particularly well suited for data programming out of the box. Packages such as [NumPy](https://numpy.org/) add functionality that makes it easy to work with data. NumPy in particular offers a powerful multidimensional array implementation in the form of the [`ndarray` class](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html). This will be the main subject of this lab.

If you use Anaconda, NumPy is already included, but you can always install it from the console:

```
$ conda install jupyter numpy 
```

Let's first import NumPy with the standard alias `np`.

In [3]:
import sys
import numpy as np
print("python ", sys.version, 
      "\nnumpy ", np.__version__)

python  3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)] 
numpy  1.21.2


## 2. NumPy Arrays

We know that a regular Python list can be instantiated as follows:

In [4]:
list1 = [1, 2, 3, 4]
type(list1)

list

We can easily turn this list into a NumPy array:

In [5]:
array1 = np.array(list1)
array1

array([1, 2, 3, 4])

In [6]:
type(array1)

numpy.ndarray

In [7]:
print(array1)

[1 2 3 4]


Note that the type of `array1` is no longer `list` but `ndarray`, that is, an n-dimensional array. We can also define arrays of more than one dimension by passing nested lists (lists of lists):

In [8]:
list2 = [list1, [11, 22, 33, 44]]
list2

[[1, 2, 3, 4], [11, 22, 33, 44]]

In [9]:
array2 = np.array(list2)
array2

array([[ 1,  2,  3,  4],
       [11, 22, 33, 44]])

One of the properties of a NumPy array we are most often interested in is its shape. The shape is a tuple of integers indicating the length of the array in each dimension. A shape of `(3, 4, 7)` corresponds to a 3D array of 3 elements along the first dimension, 4 along the second dimension, and 7 along the third dimension.

In [10]:
print("array1: ", array1.shape)
print("array2: ", array2.shape)

array1:  (4,)
array2:  (2, 4)
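
Besides `shape`, a few closely related attributes are often handy. The following is a small sketch; the variable name `m` is only illustrative and simply mirrors the contents of `array2`.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

print(m.shape)  # length along each dimension
print(m.ndim)   # number of dimensions, i.e. len(m.shape)
print(m.size)   # total number of elements
print(len(m))   # length along the first dimension only
```

Note that `len` only reports the first entry of the shape, so it should not be confused with the total number of elements.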


The best way to think about NumPy arrays is that they consist of two parts: 1) a _data buffer_ which is just the block of raw data, and 2) a _view_ which describes the shape of the array. By changing the shape of an array, we can have a different view of the same data, which is useful for instance to take the transpose of a vector or access a subset of elements. To illustrate, we can `reshape` the 2x4 2D array `array2` into a 2x2x2 3D array:

In [11]:
array3 = array2.reshape((2, 2, 2))
print(array3)

[[[ 1  2]
  [ 3  4]]

 [[11 22]
  [33 44]]]


In [12]:
array3.shape

(2, 2, 2)
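
To make the "same data, different view" idea concrete, the sketch below checks that a reshaped array shares its data buffer with the original, so that writing through one is visible through the other. The names `a` and `b` are illustrative; `np.shares_memory` is used here only as a convenient check.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])
b = a.reshape((2, 2, 2))        # a different view of the same data

print(np.shares_memory(a, b))   # do both arrays use the same buffer?

b[0, 0, 0] = 100                # write through the reshaped view
print(a[0, 0])                  # the original array sees the change
```

`reshape` returns a view whenever the memory layout allows it, as is the case here; when it cannot, it falls back to returning a copy.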

## 3. Initialization Shortcuts

NumPy has a series of shortcuts to initialize frequently used arrays with constants or sequences, such as:

- An array of 5 uninitialized elements (the values shown are whatever happened to be in memory, so they will differ between runs):

In [13]:
np.empty(5)

array([9.23975983e-312, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
       0.00000000e+000])

- An array of 5 zeros:

In [14]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

- An array of 5 ones:

In [13]:
np.ones(5)

array([1., 1., 1., 1., 1.])

- An array of 5 integers in increasing order:

In [14]:
np.arange(5)

array([0, 1, 2, 3, 4])

- An array of integers from 5 up to (but not including) 20, in steps of 2:

In [15]:
np.arange(5, 20, 2)

array([ 5,  7,  9, 11, 13, 15, 17, 19])

- A 5x5 identity matrix:

In [16]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
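
These shortcuts are not limited to one dimension. As a brief sketch, most of them accept a shape tuple instead of a single length, and an optional `dtype` argument to choose the element type; the variable names below are illustrative.

```python
import numpy as np

z = np.zeros((2, 3))              # 2x3 array of floats (the default type)
o = np.ones((2, 3), dtype=int)    # 2x3 array of integers
r = np.arange(6).reshape((2, 3))  # 0..5 arranged in 2 rows and 3 columns

print(z.shape, z.dtype)
print(o.dtype)
print(r)
```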

## 4. Mathematical Operations

We can do basic arithmetic with arrays. These operations are applied element-wise, meaning that the operation is performed on each individual element of the array. They may involve combinations of (shape-compatible) arrays and scalars.

In [17]:
array4 = np.array([[1, 2, 3, 4], [8, 9, 10, 11]])
array4

array([[ 1,  2,  3,  4],
       [ 8,  9, 10, 11]])

In [18]:
array2 * array4

array([[  1,   4,   9,  16],
       [ 88, 198, 330, 484]])

In [19]:
array4 - array1

array([[0, 0, 0, 0],
       [7, 7, 7, 7]])

In [20]:
1 / array4

array([[1.        , 0.5       , 0.33333333, 0.25      ],
       [0.125     , 0.11111111, 0.1       , 0.09090909]])

In [21]:
array4 ** 2

array([[  1,   4,   9,  16],
       [ 64,  81, 100, 121]], dtype=int32)
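
What "shape-compatible" means deserves a short illustration. NumPy calls this mechanism broadcasting: roughly, shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below uses illustrative names (`m`, `row`, `col`) and also shows what happens when shapes do not match.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [8, 9, 10, 11]])  # shape (2, 4)
row = np.array([1, 2, 3, 4])                   # shape (4,)
col = np.array([[10], [20]])                   # shape (2, 1)

diff = m - row    # row is subtracted from every row of m
total = m + col   # col is added to every column of m
print(diff)
print(total)

try:
    m + np.array([1, 2, 3])  # shape (3,) does not match the last dimension (4)
except ValueError as err:
    print("incompatible shapes:", err)
```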

Linear algebra operations, like matrix multiplication or the dot product, are performed with dedicated NumPy functions, like `np.matmul` or `np.dot`. The official NumPy documentation contains extensive lists of operations that can be performed on arrays, such as [mathematical operations](https://numpy.org/doc/stable/reference/routines.math.html) or [algebraic operations](https://numpy.org/doc/stable/reference/routines.linalg.html).

In [22]:
print(array2.T)
print(np.matmul(array2.T, array2))
print(array4.sum())

[[ 1 11]
 [ 2 22]
 [ 3 33]
 [ 4 44]]
[[ 122  244  366  488]
 [ 244  488  732  976]
 [ 366  732 1098 1464]
 [ 488  976 1464 1952]]
48
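
Since `*` is element-wise, it is easy to mix it up with matrix multiplication. The following sketch contrasts the two, and also uses the `@` operator, which for 2D arrays gives the same result as `np.matmul`. The names `a` and `v` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])  # shape (2, 4)
v = np.array([1, 2, 3, 4])

elementwise = a * a           # shape (2, 4): each element squared
product = a @ a.T             # shape (2, 2): matrix product
same = np.matmul(a, a.T)      # equivalent to the line above
inner = np.dot(v, v)          # dot product of two vectors: a scalar

print(elementwise.shape, product.shape)
print(product)
print(inner)
```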


## 5. Indexing Arrays

NumPy allows us to index arrays in three different ways: slicing, integer indexing, and boolean indexing.

Let us use the following 2D array to illustrate.

In [23]:
array5 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array5

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

When using **slicing**, we specify what portion of the array we want to access in each dimension. For instance, we can view the first and second rows, second and third columns:

In [24]:
array5[0:2, 1:3]

array([[2, 3],
       [5, 6]])

We can of course access individual items, or even use slicing to assign a new value:

In [25]:
array5[1, 0:2] = [-1, -2]
array5

array([[ 1,  2,  3],
       [-1, -2,  6],
       [ 7,  8,  9]])

One important thing to note is that a slice is just another view of the underlying data buffer. If we change the data in the slice, we are actually changing the data in the underlying data buffer and thus in the original array. This is advantageous, but may easily lead to mistakes if overlooked. Consider the following code, and how `array5` is updated when updating the slice `s`:

In [26]:
print(array5)
s = array5[1, 0:2]
s[:] = [4, 5] # update all elements of the slice
print(s)
print(array5)

[[ 1  2  3]
 [-1 -2  6]
 [ 7  8  9]]
[4 5]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


To prevent this, we would need to make a `copy` of the slice, thus creating a new data buffer.
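
As a minimal sketch of this, the example below takes the same slice but calls `.copy()` on it, so that later updates no longer reach the original array. The name `a` is illustrative and mirrors the initial contents of `array5`.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

s = a[1, 0:2].copy()   # independent data buffer
s[:] = [-1, -2]        # update all elements of the copy

print(s)                        # the copy has changed
print(a)                        # the original array has not
print(np.shares_memory(a, s))   # the two no longer share a buffer
```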

When using **integer indexing**, we index arrays by the indices stored in another array, thus allowing us to view non-consecutive portions of the array. The shape of the result is determined by the shape of the indices.

In [27]:
print(array5[[1, 2, 0]])
print(array5[[1, 2, 0], 0:2]) # integer indices for the rows, a slice for the columns
print(array5[[1, 2, 0], [1, 0, 2]])

[[4 5 6]
 [7 8 9]
 [1 2 3]]
[[4 5]
 [7 8]
 [1 2]]
[5 7 3]
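
To illustrate how the shape of the indices determines the shape of the result, the sketch below picks the four corners of a 3x3 array using two 2x2 index arrays. It also shows that, unlike slicing, integer indexing returns a copy rather than a view. The names `rows`, `cols` and `corners` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rows = np.array([[0, 0], [2, 2]])   # row index of each requested element
cols = np.array([[0, 2], [0, 2]])   # column index of each requested element

corners = a[rows, cols]             # result has the shape of the indices
print(corners.shape)
print(corners)

corners[0, 0] = 100                 # modify the result...
print(a[0, 0])                      # ...the original array is unaffected
```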


When using **boolean indexing**, we index arrays by boolean indicators, thus allowing us to view the portions of the array that meet some condition. For instance, we can first check which values are smaller than 7, and then which of those are also even,

In [28]:
print(array5 < 7)
idx = (array5 < 7) & (array5 % 2 == 0)
print(idx)

[[ True  True  True]
 [ True  True  True]
 [False False False]]
[[False  True False]
 [ True False  True]
 [False False False]]


and use these booleans as indicators to index the array.

In [29]:
print(array5[idx])
array5[idx] = 0
print(array5)

[2 4 6]
[[1 0 3]
 [0 5 0]
 [7 8 9]]
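
Conditions on arrays are combined with the element-wise operators `&` (and), `|` (or) and `~` (not), with each condition wrapped in parentheses. Python's own `and`/`or` keywords do not work here, as the sketch below shows; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

both = (a < 7) & (a % 2 == 0)   # smaller than 7 AND even
either = (a < 3) | (a > 7)      # smaller than 3 OR larger than 7
neither = ~either               # element-wise negation

print(a[both])
print(a[either])
print(a[neither])

try:
    (a < 7) and (a % 2 == 0)    # 'and' needs a single True/False value
except ValueError as err:
    print("cannot use 'and' on arrays:", err)
```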


There are many more details about indexing NumPy arrays. Please refer to the [documentation on array indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html) for further information and examples.

## 6. Other Operators

NumPy contains a series of [indexing routines](https://numpy.org/doc/stable/reference/routines.indexing.html), [logic functions](https://numpy.org/doc/stable/reference/routines.logic.html) and [set routines](https://numpy.org/doc/stable/reference/routines.set.html) that will help us further in checking which elements of an array meet certain conditions, and use them for indexing.

When we want to find which elements of an array meet some condition, we can use `np.where`:

In [30]:
array = np.array([1, 2, 3, 2, 5, 3])
idx = np.where(array < 4)
print(idx)
print(array[idx])

(array([0, 1, 2, 3, 5], dtype=int64),)
[1 2 3 2 3]
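
Note that `np.where` returned a tuple with a single array inside. This is because it returns one array of indices per dimension. A small sketch with a 2D array (the names `m`, `rows` and `cols` are illustrative) makes this clearer:

```python
import numpy as np

m = np.array([[1, 0, 3], [0, 5, 0], [7, 8, 9]])

rows, cols = np.where(m == 0)   # one index array per dimension
print(rows)                     # row index of each match
print(cols)                     # column index of each match
print(m[rows, cols])            # integer indexing recovers the matches
```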


If we want to check whether all elements of the array meet the condition, or if any of them do, we can use `np.all` and `np.any`, respectively:

In [31]:
print(np.all(array < 4))
print(np.all(array < 10))

print(np.any(array > 4))
print(np.any(array < 0))

False
True
True
False
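
With arrays of more than one dimension, both functions accept an `axis` argument to run the check along one dimension instead of over the whole array. A brief sketch, with illustrative names:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

per_row = np.all(m < 7, axis=1)      # for each row: are all values < 7?
per_column = np.any(m > 7, axis=0)   # for each column: is any value > 7?

print(per_row)
print(per_column)
```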


Sometimes we are interested in the unique values present in an array,

In [32]:
np.unique(array)

array([1, 2, 3, 5])
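
If we also want to know how often each unique value occurs, `np.unique` can return the counts as well through its `return_counts` argument. A minimal sketch, with illustrative variable names:

```python
import numpy as np

data = np.array([1, 2, 3, 2, 5, 3])

values, counts = np.unique(data, return_counts=True)
print(values)   # the sorted unique values
print(counts)   # how many times each of them appears
```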

or in quickly checking which elements of an array are contained in a second array:

In [33]:
np.in1d([0, 1, 2, 3, 4, 5], array)

array([False,  True,  True,  True, False,  True])
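
A closely related function is `np.isin`, which performs the same membership test but preserves the shape of its first argument. To the best of my knowledge, recent NumPy releases (2.0 onwards) deprecate `np.in1d` in its favor, so the sketch below may be the safer choice in new code. The name `candidates` is illustrative.

```python
import numpy as np

array = np.array([1, 2, 3, 2, 5, 3])
candidates = np.array([0, 1, 2, 3, 4, 5])

mask = np.isin(candidates, array)   # which candidates appear in array?
print(mask)
print(candidates[mask])             # use the result for boolean indexing
```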