# 4 NumPy

## Introduction

Now that we are used to how Python works, it is time to get our hands dirty with data analysis. We will study two packages. [NumPy](http://www.numpy.org/) is the fundamental package for numeric computing and linear algebra in Python, and it already allows for decent data analysis. We will learn it not only for data analysis but, more importantly, because it will almost always be present in our `import` section as scientists. After NumPy we will move on to [Pandas](http://pandas.pydata.org/), a dedicated data analysis package with much more functionality than NumPy, which makes data manipulation and visualization much easier. The full power of Pandas will be unleashed in Section 5, where we will see how to visualize information in plots.

As usual, we begin by importing the necessary packages.

In [None]:
import numpy as np
print('NumPy:', np.__version__)

NumPy: 1.18.1


## Numeric Python (NumPy)

NumPy is an open-source add-on module for Python that provides common mathematical and numerical routines as fast, pre-compiled functions. It has grown into a highly mature package whose functionality meets, or perhaps exceeds, that associated with common commercial software like MATLAB. The [NumPy](http://www.numpy.org/) (Numeric Python) package provides basic routines for manipulating large arrays and matrices of numeric data. The main object NumPy works with is the *homogeneous multidimensional array*. Despite its intimidating name, it is nothing but a table of elements (usually numbers), all of the same type, each labelled by a tuple of indices.

We will now explore some capabilities of NumPy that will prove very useful not only for data analysis, but throughout our whole life with Python.

### Creating Arrays

As mentioned before, the main object in NumPy is the *array*. Creating one is as easy as calling the command `array`

In [None]:
# Create a list
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Create a numpy array from a list
my1darr = np.array(mylist)

# Data type of a numpy array
print(type(my1darr))

my1darr

The same applies to multidimensional arrays

In [None]:
# Create a numpy array with three elements, each containing three elements
my2darr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my2darr

In [None]:
# Create a numpy array with three elements, each containing three elements, each containing a single element
my3darr = np.array([  [ [1], [2], [3] ], [ [4], [5], [6] ], [ [7], [8], [9] ]  ])
my3darr

A NumPy array has a number of dimensions (or *axes*). To obtain the size of each axis (and, from the length of the result, the number of axes) you use the command `shape`. For 2-dimensional arrays (matrices), the order corresponds to (rows, columns).

There are two different ways of calling the `shape` command, either with `np.shape(arr)` or `arr.shape`. This is not the only command that works in both forms, and we will meet more of them along the way.

In [None]:
# Shape using `.shape`
print(my1darr.shape)
print(my2darr.shape)
print(my3darr.shape)

# Shape using `np.shape()`
print(np.shape(my1darr))
print(np.shape(my2darr))
print(np.shape(my3darr))
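
As a small complement to `shape`, the sketch below also prints the attributes `ndim` (number of axes) and `size` (total number of elements). The array `demo` is a hypothetical example introduced only for this illustration.

```python
import numpy as np

# Hypothetical example array with 3 axes of sizes 2, 3 and 4
demo = np.zeros((2, 3, 4))

print(demo.ndim)   # number of axes
print(demo.shape)  # size along each axis
print(demo.size)   # total number of elements (2 * 3 * 4)

# The number of axes is the length of the shape tuple
print(len(demo.shape) == demo.ndim)
```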

There is one restriction compared with lists: while a list can hold data of different types, all the data in an array must be of the same type, so NumPy automatically converts the elements to a common type.

In [None]:
# Create a list with floats and strings
lst = [1., 'cat']
print((lst[0]))

# Create a numpy array from a list with mixed data types
arr = np.array(lst)
print((arr[0]))

# Is the data type of the 1. in the list the same as the data type of the 1. in the array?
print( type((lst[0])) == type((arr[0])) )
print(type(lst[0]))
print(type(arr[0]))

(We will go deeper into indexing in a while)
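
To see which common type NumPy has chosen, you can inspect the `dtype` attribute of the array. The following is a minimal sketch; the variable names are made up for this illustration, and the exact integer and string types printed may depend on your platform and NumPy version.

```python
import numpy as np

# Hypothetical example arrays
ints = np.array([1, 2, 3])            # only integers
mixed_num = np.array([1, 2.5, 3])     # integers and a float
mixed_str = np.array([1.0, 'cat'])    # a float and a string

print(ints.dtype)       # an integer type
print(mixed_num.dtype)  # everything is promoted to float
print(mixed_str.dtype)  # everything is converted to a string type

# An explicit conversion is done with `astype`, which returns a new array
as_float = ints.astype(np.float64)
print(as_float.dtype)
```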

### Special Arrays

Now we review some built-in functions that create commonly used arrays. `ones` and `zeros` return arrays of a given shape and type (the default is float64), filled with ones or zeros, respectively.

In [None]:
# Create a zero-filled numpy array of shape 3 x 2
np.zeros((3, 2))

In [None]:
# Create a one-filled numpy array of shape 3 x 2 x 4
np.ones((3, 2, 4), dtype=np.int8)

`eye(d)` returns a 2-D, dimension-$d$ array with ones on the diagonal and zeros elsewhere.

In [None]:
# Create a numpy array of shape 3 x 3 with ones on the diagonal
np.eye(3)

`eye` can also create arrays with ones in upper and lower diagonals. To achieve this, you must call `eye(d, d, k)` where $k$ denotes the diagonal (positive for above the center diagonal, negative for below), or `eye(d, k=num)`

In [None]:
# Create a numpy array of shape 5 x 5 with ones on the diagonal shifted up 2 positions
print(np.eye(5, 5, 2))
print('\n')

# Create a numpy array of shape 4 x 4 with ones on the diagonal shifted down 1 position
print(np.eye(4, k=-1))

`diag`, depending on the input, either constructs a diagonal array (if the input is a 1-D array) or extracts a diagonal from a matrix (if the input is a 2-D array).

In [None]:
# Create a 2-D array with the array `my1darr` shifted up 1 position with respect to the diagonal
np.diag(my1darr, 1)

In [None]:
# Extract the diagonal from the 2-D array `my2darr`
np.diag(my2darr)

In [None]:
# Create a 2-D array with the diagonal of the 2-D array `my2darr` shifted down 1 position with respect to the diagonal
np.diag(np.diag(my2darr), -1)

`arange(begin, end, step)` returns evenly spaced values within a given interval. Note that the beginning point is included, but not the ending.

In [None]:
# Create range with start at 0, stop before 16, counting up by 2
myrange = np.arange(0, 16, 2)
myrange

**Exercise 1**: Create an array of the first million odd numbers, both with `arange` and using loops. Time both methods with `%timeit` to see which one is faster.

In [None]:
# The first million odd numbers are 1, 3, ..., 1999999
%timeit np.arange(1, 2000000, 2)

%timeit [i for i in range(2000000) if i % 2 == 1]

Similarly, `linspace(begin, end, points)` returns evenly spaced numbers over a specified interval. Here, instead of specifying the step, you specify the number of points you want. Unlike `arange`, `linspace` includes the end of the interval by default.

In [None]:
# Create line with start at 0, stop at 14 (included), getting 8 evenly spaced points
myline = np.linspace(0, 14, 8)
myline

In [None]:
# Print the length of `myrange` and `myline`
print(len(myrange))
print(len(myline))
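
The difference between the two functions can be made explicit with the optional arguments `retstep` and `endpoint` of `linspace`. The sketch below is only illustrative, and the variable names are chosen for this example.

```python
import numpy as np

# `retstep=True` also returns the spacing between consecutive points
pts, step = np.linspace(0, 14, 8, retstep=True)
print(pts)
print(step)

# With `endpoint=False` the end of the interval is excluded, as in `arange`
half_open = np.linspace(0, 14, 7, endpoint=False)
print(half_open)
print(np.arange(0, 14, 2))
```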

`reshape` changes the shape of an array, but not its data. This is another command that can be called either as a function (`np.reshape(arr, shape)`) or as a method (`arr.reshape(shape)`).

In [None]:
# Reshape `myrange` using `np.reshape()`
print(np.reshape(myrange, (2, 4)))
print(np.reshape(myrange, (4, 2)))

print('\n')

# Reshape `myrange` using `.reshape()`
print(myrange.reshape(2, 4))
print(myrange.reshape(4, 2))

Note, however, that `reshape` returns a new array object and leaves the shape of the original unchanged. To keep the result, you must assign it to a variable.

In [None]:
# After the reshapings above, the original array keeps its shape
myrange  

In [None]:
# Assign the reshaped array to a variable to keep the new shape
mynewrange = myrange.reshape(2, 4)
mynewrange
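
Two further details about `reshape` are worth a quick sketch. First, one of the dimensions can be given as `-1`, and NumPy infers it from the total number of elements. Second, for a simple array like the one below, the reshaped array shares its data with the original (it is a *view*), so modifying one modifies the other. The names `base` and `grid` are hypothetical and used only here.

```python
import numpy as np

base = np.arange(0, 16, 2)   # 8 elements
grid = base.reshape(2, -1)   # -1 is inferred as 4
print(grid.shape)

# In this case the reshaped array is a view on the same data
print(np.shares_memory(base, grid))

# Modifying the view also modifies the original array
grid[0, 0] = 100
print(base)
```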

### Combining Arrays

The most general command for combining arrays is `concatenate(arrs, d)`. It takes a list of arrays and concatenates them along axis $d$. Remember your 3-D array:

In [None]:
# Remember the `my3darr` 3-D numpy array
print(my3darr)
print(my3darr.shape)

In [None]:
# Concatenate `my3darr` with itself multiplied by 10 along dimension 0
conc_along0 = np.concatenate([my3darr, 10 * my3darr], 0)
print(conc_along0)
print(conc_along0.shape)

In [None]:
# Concatenate `my3darr` with itself multiplied by 10 along dimension 1
conc_along1 = np.concatenate([my3darr, 10 * my3darr], 1)
print(conc_along1)
print(conc_along1.shape)

In [None]:
# Concatenate `my3darr` with itself multiplied by 10 along dimension 2
conc_along2 = np.concatenate([my3darr, 10 * my3darr], 2)
print(conc_along2)
print(conc_along2.shape)

However, for common combinations there exist special commands. Use `vstack` to stack arrays in sequence vertically (row wise), `hstack` to stack arrays in sequence horizontally (column wise).

In [None]:
# Remember the `my1darr` 1-D numpy array from above
print(my1darr)
print(my1darr.shape)

In [None]:
# Stack `my1darr` vertically and horizontally
print(np.vstack([my1darr, 10 * my1darr]))
print('\n')
print(np.hstack([my1darr, 10 * my1darr]))

In [None]:
# Remember the `my2darr` 2-D numpy array from above
print(my2darr)
print(my2darr.shape)

In [None]:
# Stack `my2darr` vertically and horizontally
print(np.vstack([my2darr, 10 * my2darr]))
print('\n')
print(np.hstack([my2darr, 10 * my2darr]))
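
For 2-D arrays, `vstack` and `hstack` give the same result as `concatenate` along axes 0 and 1, respectively. The following sketch checks this on a hypothetical 3 x 3 array `m` defined only for the example.

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

v = np.vstack([m, 10 * m])   # stack rows: shape (6, 3)
h = np.hstack([m, 10 * m])   # stack columns: shape (3, 6)
print(v.shape, h.shape)

# Same results using the general `concatenate` command
print(np.array_equal(v, np.concatenate([m, 10 * m], axis=0)))
print(np.array_equal(h, np.concatenate([m, 10 * m], axis=1)))
```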

### Operations

You can easily perform element-wise operations on arrays of any shape. Use the usual symbols +, -, \*, / and \*\* to perform element-wise addition, subtraction, multiplication, division and exponentiation.

In [None]:
x = np.array([1, 2, 3])
print(x)
print(x + 10)
print(3 * x)
print(1 / x)
print(x ** (-2 / 3))
print(2 ** x)

These symbols can also be used to operate between two arrays, which must have the same shape (or shapes that NumPy can make compatible through *broadcasting*). In that case they also perform element-wise operations.

In [None]:
y = np.arange(4, 7, 1)
print(x + y)     # [1+4, 2+5, 3+6]
print(x * y)     # [1*4, 2*5, 3*6]
print(x / y)     # [1/4, 2/5, 3/6]
print(x ** y)    # [1**4, 2**5, 3**6]
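
Strictly speaking, the shapes only need to be compatible under NumPy's *broadcasting* rules, which is also what makes operations between an array and a single number work. The sketch below is a minimal illustration with hypothetical arrays `col` and `row`: an axis of size 1 is stretched to match the other array, while genuinely incompatible shapes raise an error.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)

# The size-1 axis of `col` is stretched, giving a (3, 3) result
table = col + row
print(table)

# Shapes (3,) and (2,) cannot be broadcast together
failed = False
try:
    row + np.array([1, 2])
except ValueError as err:
    failed = True
    print('Error:', err)
```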

For doing vector or matrix multiplication, the command to be used is `dot`

In [None]:
print(x)
print(y)
x.dot(y)  # 1*4 + 2*5 + 3*6

Since Python 3.5, matrix multiplication has its own operator, `@`.

In [None]:
print(x)
print(y)
x @ y  # 1*4 + 2*5 + 3*6

In [None]:
X = np.array([[i + j for i in range(3, 6)] for j in range(3)])
Y = np.diag([1, 1], 1) + np.diag([1], -2)

print(X)
print('\n')
print(Y)

The operator `*` returns the element-wise product of arrays with the same shape.

In [None]:
X * Y

The operator `@` returns the matrix product of arrays with compatible shapes.

In [None]:
X @ Y

**Exercise 2**: Take a 10x2 matrix representing $(x1,x2)$ coordinates and transform them into polar coordinates $(r,\theta)$.

*Hint 1: the inverse transformation is given by $x1 = r\cos\theta$, $x2 = r\sin\theta$*

*Hint 2: generate random numbers with the functions in numpy.random*

In [None]:
z = np.random.random((10, 2))
x1, x2 = z[:, 0], z[:, 1]
R = np.sqrt(x1 ** 2 + x2 ** 2)
T = np.arctan2(x2, x1)
print(R)
print(T)

#### Transposing

Transposition is a very important operation in linear algebra. NumPy computes matrix-vector products correctly regardless of the orientation of the vector, because a 1-D array has no row or column orientation (transposing it has no effect). This is not the case for products of two matrices, where the inner dimensions must match.

In [None]:
Z = np.arange(0, 12, 1).reshape((4, 3))

In [None]:
print(Z)
print(y)
Z @ y

In [None]:
print(Z)
print(y.T)
Z @ y.T

In [None]:
print(Z)
print(X)
Z @ X  # (4 rows and 3 cols) @ (3 rows and 3 cols) yields (4 rows and 3 cols)

In [None]:
print(Z.T)
print(X)
#Z.T @ X  # (3 rows and 4 cols) @ (3 rows and 3 cols) yields error
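
The shapes involved explain this behaviour. In the sketch below (with hypothetical arrays `v` and `M` defined just for the example), a 1-D array keeps the same shape after transposition, an explicit column vector is obtained with `reshape`, and a matrix product with mismatched inner dimensions raises a `ValueError`.

```python
import numpy as np

v = np.arange(4, 7)                # 1-D array of shape (3,)
M = np.arange(12).reshape(4, 3)    # matrix of shape (4, 3)

# Transposing a 1-D array does nothing
print(v.shape, v.T.shape)

# An explicit column vector has shape (3, 1)
col = v.reshape(-1, 1)
print((M @ v).shape)    # 1-D result
print((M @ col).shape)  # 2-D result with a single column

# (3 rows, 4 cols) @ (3 rows, 3 cols): inner dimensions do not match
mismatch = False
try:
    M.T @ np.ones((3, 3))
except ValueError:
    mismatch = True
print(mismatch)
```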

#### Array Methods

So that you do not have to convert back and forth between NumPy arrays and lists, NumPy arrays come with methods that compute common properties of their contents. In fact, there are more of these than in standard Python.

In [None]:
a = np.array([-4, -2, 1, 3, 5])
print(a.max())
print(a.min())
print(a.sum())
print(a.mean())
print(a.std())

Some interesting functions are `argmax` and `argmin`, which return the index of the maximum and minimum values in the array.

In [None]:
print(a.argmax())
print(a.argmin())
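
For multidimensional arrays, most of these methods accept an `axis` argument, and `argmax` without arguments returns an index into the flattened array. The following sketch uses a small hypothetical matrix `m`; `np.unravel_index` converts the flat index back to (row, column) coordinates.

```python
import numpy as np

m = np.array([[3, 7, 1],
              [9, 2, 8]])

print(m.sum(axis=0))   # sum over rows, one value per column
print(m.sum(axis=1))   # sum over columns, one value per row

# Index of the maximum in the flattened array
flat = m.argmax()
print(flat)

# Convert the flat index to (row, column) coordinates
pos = np.unravel_index(flat, m.shape)
print(pos)
```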

### Indexing/Slicing

We have already seen briefly that to access individual elements you use the bracket notation: `array[ax_0, ax_1, ...]`, where the `ax_i` denotes the coordinate in the `i`-th axis. You can even use this to assign new values to your elements.

In [None]:
r = np.array([4, 5, 6, 7])
print(r[2])
# Reassigning the value stored in index 0
r[0] = 198
r

To select a range of rows or columns you can use a colon `:`. A second `:` can be used to indicate the step size: `array[start:stop:stepsize]`. If you leave `start` (`stop`) blank, the selection will go from the very beginning (until the very end) of the array. The element at index `stop` is not included.

In [None]:
s = np.arange(13)**2
print(s)

# Starting at index 3, up to (but not including) index 10, pick each element
print(s[3:10])

# Starting at index 2, up to (but not including) index 10, pick every 3rd element
print(s[2:10:3])

# Starting at index -5, until the end, pick every 2nd element
print(s[-5::2])

# Starting at index -5, going backwards to the beginning, pick every 2nd element
print(s[-5::-2])

The same applies to matrices or higher-dimensional arrays

In [None]:
r = np.arange(36).reshape((6, 6))
r

In [None]:
# Pick rows 2 to 5, and cols 1 to 3 (remember, row 5 and col 3 not included)
r[2:5, 1:3]

You can also select specific rows and columns by passing a list of indices

In [None]:
# Pick rows 1, 3 and 4, and cols 1 to 3 (remember, col 3 not included)
r[[1, 3, 4], 1:3]

A very useful tool is *conditional indexing*, where we apply an operation (a function, an assignment, etc.) only to those elements of an array that satisfy some condition

In [None]:
r[r > 30] = 30
r
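
Under the hood, the condition produces an array of booleans (a *mask*) with the same shape as the original array, and indexing with it selects the elements where the mask is `True`. The sketch below uses a hypothetical array `vals`; it also shows `np.where`, which builds a new array instead of modifying the original in place.

```python
import numpy as np

vals = np.arange(10)

# Conditions can be combined with & (and) and | (or); parentheses are required
mask = (vals % 2 == 0) & (vals > 3)
print(mask)
print(vals[mask])

# `np.where(condition, a, b)` picks from `a` where True and from `b` elsewhere
clipped = np.where(vals > 6, 6, vals)
print(clipped)
print(vals)   # the original array is unchanged
```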

**Exercise 3**: Create a random 1-dimensional array, and find which element is closest to 0.7

In [None]:
Z = np.random.uniform(0,1,100)
z = 0.7
m = Z[np.abs(Z - z).argmin()]
print(m)

#### Copying Data

**Be very careful with copying and modifying arrays in NumPy!** You will see the reason right now. Let's begin by defining `r2` as a slice of `r`

In [None]:
r2 = r[:3,:3]
r2

And now let's set all its elements to zero

In [None]:
r2[:] = 0
r2

When looking at `r`, we see that it has also been changed!

In [None]:
r

The proper way of handling selections without modifying the original arrays is through the `copy` command.

In [None]:
r_copy = r.copy()
r_copy

Now we can safely modify `r_copy` without affecting `r`.

In [None]:
r_copy[:] = 10
print(f'{r_copy}\n')
r
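
If you are unsure whether two arrays share their data, `np.shares_memory` can tell you. The sketch below, with hypothetical names, illustrates that a basic slice is a view on the original data, whereas indexing with a list of indices and calling `copy` both produce independent arrays.

```python
import numpy as np

base = np.arange(36).reshape(6, 6)

view = base[:3, :3]             # basic slicing gives a view
picked = base[[0, 1, 2], :3]    # indexing with a list gives a copy
dup = base[:3, :3].copy()       # explicit copy

print(np.shares_memory(base, view))
print(np.shares_memory(base, picked))
print(np.shares_memory(base, dup))

# Modifying the copies leaves `base` untouched
picked[:] = 0
dup[:] = 0
print(base[:3, :3])
```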

### Iterating Over Arrays

Finally, you can iterate over arrays in the same way as you iterate over lists

In [None]:
test = np.random.randint(0, 10, (4,3))
test

You can iterate by row:

In [None]:
for row in test:
    print(row)

Or by row index

In [None]:
for i in range(len(test)):
    print(test[i])

Or by row and index:

In [None]:
for i, row in enumerate(test):
    print(f'Row {i} is {row}')

In the same way as with lists, you can use `zip` to iterate over multiple iterables.

In [None]:
test2 = test**2
test2

In [None]:
for i, j in zip(test, test2):
    print(f'{i} + {j} = {i+j}')

**Exercise 4**: Create a function that iterates over the columns of a 2-dimensional array

In [None]:
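def iterate(arr):
    # Iterating over the transpose yields the columns of `arr` one by one
    for i, col in enumerate(arr.T):
        # Reshape to (n, 1) so that the column is printed vertically
        print(f'Column {i} is {col.reshape(-1, 1)}')

iterate(test)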

### Loading and Saving Data

To load and save data, NumPy has the `loadtxt` and `savetxt` commands. However, they only work for one- and two-dimensional arrays

In [None]:
np.savetxt('numpytest.txt', test)
np.loadtxt('numpytest.txt')
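
The following sketch shows a round trip through a text file with an explicit format and delimiter and, as one possible alternative for arrays with more than two dimensions, NumPy's binary format via `np.save` and `np.load`. The file names and arrays are hypothetical, and the files are written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype=np.int64).reshape(4, 3)   # 2-D array
cube = np.arange(24).reshape(2, 3, 4)                # 3-D array

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'demo.txt')
    npy_path = os.path.join(tmp, 'demo.npy')

    # Text file: integers separated by commas
    np.savetxt(txt_path, data, fmt='%d', delimiter=',')
    back = np.loadtxt(txt_path, delimiter=',', dtype=np.int64)

    # Binary .npy file: works for any number of dimensions
    np.save(npy_path, cube)
    cube_back = np.load(npy_path)

print(np.array_equal(data, back))
print(np.array_equal(cube, cube_back))
```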