# NumPy Basics

NumPy provides a powerful interface to arrays in one or more dimensions.

In [1]:
import numpy as np    # np is a common abbreviation

## Creating arrays

In [2]:
a = np.arange(10)   # like range() function but returns a NumPy array
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
b = np.arange(1, 5, 0.5)   # arguments are (start, stop, step)
b

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [4]:
c = np.linspace(-np.pi, np.pi, 5)    # (start, stop, num_points)
c

array([-3.14159265, -1.57079633,  0.        ,  1.57079633,  3.14159265])
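
The difference between `arange` and `linspace` is easiest to see side by side. The following is a minimal sketch (the variable names are illustrative and not part of the notes): `arange` excludes the stop value, while `linspace` includes it by default and fixes the number of points up front.

```python
import numpy as np

# arange excludes the stop value, like range()
half_steps = np.arange(1, 5, 0.5)
print(half_steps[-1])          # last element is 4.5, not 5.0

# linspace includes both endpoints by default ...
inclusive = np.linspace(0.0, 1.0, 5)
print(inclusive)

# ... unless endpoint=False is passed
exclusive = np.linspace(0.0, 1.0, 5, endpoint=False)
print(exclusive)
```

For non-integer steps, `linspace` is often the more predictable choice, since the length of the result does not depend on floating-point rounding.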

In [5]:
# a diagonal matrix
np.diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [6]:
# diagonal with offset from the main diagonal
np.diag([1,2,3], k=2)

array([[0, 0, 1, 0, 0],
       [0, 0, 0, 2, 0],
       [0, 0, 0, 0, 3],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [7]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [8]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [9]:
c.dtype

dtype('float64')

In [10]:
cc = np.arange(20, dtype=np.float128)   # extended precision; float128 exists only on some platforms

In [11]:
cc.dtype

dtype('float128')
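
As a small illustration of working with dtypes, the sketch below converts an existing array with `astype` and sets the dtype at creation time. The names are hypothetical. It also uses `np.longdouble`, which is the portable spelling of the platform's extended-precision float.

```python
import numpy as np

ints = np.arange(5)                 # integer dtype by default
floats = ints.astype(np.float64)    # astype returns a converted copy
print(ints.dtype, floats.dtype)

# the dtype can also be given when the array is created
small = np.zeros(3, dtype=np.int8)
print(small.itemsize)               # bytes per element

# np.longdouble is the portable name for extended precision;
# its actual width depends on the platform
extended = np.arange(3, dtype=np.longdouble)
print(extended.dtype)
```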

In [12]:
np.eye(10)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

## Converting sequences (lists etc.) into arrays

In [13]:
l = range(10)
d = np.array(l)
d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The array constructor can also take a sequence of sequences (e.g. list of
lists):

In [14]:
l2 = [[1, 3, 5], [2, 4, 6], [3, 30, 300]]
d2 = np.array(l2)
d2

array([[  1,   3,   5],
       [  2,   4,   6],
       [  3,  30, 300]])

In [15]:
d2.dtype

dtype('int64')

In [16]:
d2.shape

(3, 3)

In [17]:
d2.shape[0]

3

## Simple indexing and slicing

NumPy, like Python, uses 0-based indexing, so the first row or column has index
0.

In [18]:
d2[1, 2]   # row 1, column 2  ... i.e. second row, third column

6

Slicing:

In [19]:
d2[1, :]   # row 1, all columns

array([2, 4, 6])

In [20]:
d2[:, 2]   # column 2, all rows

array([  5,   6, 300])

Notice that the result is not oriented as a "column vector" (it is not
vertical). The sliced array has *dimension* (rank) 1, whereas the original had
dimension 2. We will use *matrix* objects in NumPy later to represent column
vectors.

Note that in slicing, we are not making copies, but rather producing views on
the original array.

In [21]:
c = d2[:,2]
c

array([  5,   6, 300])

Now if we change one of the values in d2, the change shows up in c as well:

In [22]:
d2[2,2] = 5000

In [23]:
d2

array([[   1,    3,    5],
       [   2,    4,    6],
       [   3,   30, 5000]])

In [24]:
c

array([   5,    6, 5000])
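
When an independent array is wanted instead of a view, one option is an explicit `.copy()`. The sketch below uses its own illustrative names (`grid`, `view`, `copy`) and `np.shares_memory` to check whether two arrays overlap in memory.

```python
import numpy as np

grid = np.array([[1, 3, 5], [2, 4, 6], [3, 30, 300]])

view = grid[:, 2]           # slicing gives a view on grid's data
copy = grid[:, 2].copy()    # .copy() gives independent data

grid[2, 2] = 5000

print(view)                 # reflects the change made to grid
print(copy)                 # still holds the old value
print(np.shares_memory(grid, view), np.shares_memory(grid, copy))
```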

We can use reshape() to change array dimensions:

In [25]:
d3 = np.arange(12)
d3

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [26]:
d4 = d3.reshape(3,4)
d4

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
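
Two further details of `reshape` may be useful here. This is a short sketch with illustrative names: passing `-1` lets NumPy infer one dimension, and for a contiguous array such as this one the reshaped result is a view on the same data.

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
table = flat.reshape(3, -1)
print(table.shape)

# for a contiguous array like this one, reshape returns a view,
# so writing through table also changes flat
table[0, 0] = 99
print(flat[0])

# ravel() flattens back to one dimension
print(table.ravel().shape)
```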

## Fancy indexing

"Fancy indexing" uses an array of indices to index into another array. Here's an
example:

In [27]:
d_squared = d**2
d_squared

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Suppose we want to extract the first, third, and last items from this:

In [28]:
ind = [0, 2, -1]
d_squared[ind]

array([ 0,  4, 81])

Fancy indexing can be used with multiple dimensions too, like this:

In [29]:
d2_squared = d2**2
d2_squared

array([[       1,        9,       25],
       [       4,       16,       36],
       [       9,      900, 25000000]])

In [30]:
rows = [0, 2]
d2_squared[rows, :]   # the given rows, all columns

array([[       1,        9,       25],
       [       9,      900, 25000000]])

We use a similar approach to extract only certain columns:

In [31]:
cols = [0, 1]
d2_squared[:, cols]

array([[  1,   9],
       [  4,  16],
       [  9, 900]])

## Fancy indexing in multiple dimensions

To combine these and keep only the given rows and the given columns, we can
index in two steps:

In [32]:
d2_squared[rows, :][:, cols]

array([[  1,   9],
       [  9, 900]])

The following has a different meaning:

In [33]:
d2_squared[rows, cols]

array([  1, 900])

This extracts items (rows[0], cols[0]) = (0, 0) and (rows[1], cols[1]) = (2, 1)
instead, as a 1d array.

Arrays and other sequences can also be used as indices for "fancy indexing", not
just lists.
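
To make the two meanings concrete, the sketch below contrasts pairwise fancy indexing with `np.ix_`, a NumPy helper that builds index arrays for selecting every combination of the given rows and columns. The small array `m` is an illustrative stand-in, not one of the arrays above.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)
rows = [0, 2]
cols = [0, 1]

pairs = m[rows, cols]           # elements (0, 0) and (2, 1)
block = m[np.ix_(rows, cols)]   # every combination of rows and cols

print(pairs)
print(block)
```

Here `pairs` is a 1-D array with one entry per (row, column) pair, while `block` is the 2 x 2 sub-array.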

## Boolean indexing for selection

Suppose we want to pluck out all array values that match certain criteria. We
can do that using "Boolean indexing".

In [34]:
d_squared

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [35]:
d_squared > 10

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

This operates elementwise and gives a Boolean array. We can use this as an
index to select the elements that match our criteria:

In [36]:
d_squared[d_squared > 10]

array([16, 25, 36, 49, 64, 81])
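
Conditions can also be combined. The following is a minimal sketch with its own illustrative array `squares`. It uses the elementwise operators `&` and `|`, which need parentheses around each comparison because they bind more tightly than `>` and `<`.

```python
import numpy as np

squares = np.arange(10) ** 2

# combine conditions with & (and), | (or), ~ (not);
# the parentheses are required because of operator precedence
middle = squares[(squares > 10) & (squares < 50)]
print(middle)

outside = squares[(squares <= 10) | (squares >= 50)]
print(outside)

# a Boolean array can be counted directly
n_large = np.count_nonzero(squares > 10)
print(n_large)
```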

## Ufuncs

NumPy defines 'ufuncs' that operate elementwise on entire arrays and other
sequences (hence 'universal'). Example: sin()

In [37]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [38]:
10*np.sin(a)

array([ 0.        ,  8.41470985,  9.09297427,  1.41120008, -7.56802495,
       -9.58924275, -2.79415498,  6.56986599,  9.89358247,  4.12118485])

In [39]:
x = np.arange(3)
x

array([0, 1, 2])

In [40]:
x[:,np.newaxis]

array([[0],
       [1],
       [2]])

In [41]:
x[np.newaxis,:]

array([[0, 1, 2]])
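
One common reason to add an axis with `np.newaxis` is broadcasting: combining a column-shaped array with a row-shaped array produces a full 2-D table. A short sketch, with illustrative names:

```python
import numpy as np

x = np.arange(3)

column = x[:, np.newaxis]    # shape (3, 1)
row = x[np.newaxis, :]       # shape (1, 3)

# broadcasting stretches both operands to shape (3, 3),
# so table[i, j] == x[i] + x[j]
table = column + row
print(table)
```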

Array reductions such as sum() can operate along any axis (dimension) of the
array:

In [42]:
d4

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [43]:
d4.sum(axis=0)

array([12, 15, 18, 21])

In [44]:
d4.sum(axis=1)

array([ 6, 22, 38])
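
The `axis` argument names the dimension that is collapsed. The sketch below (illustrative names, same values as the 3 x 4 array above) also shows that `sum` corresponds to reducing the `add` ufunc along an axis, and that other reductions such as `max` accept the same argument.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

col_sums = m.sum(axis=0)     # collapse rows: one value per column
row_sums = m.sum(axis=1)     # collapse columns: one value per row
total = m.sum()              # no axis: sum of all elements

# sum is the add ufunc reduced along an axis
same = np.add.reduce(m, axis=0)

# other reductions take the same axis argument
row_max = m.max(axis=1)
print(col_sums, row_sums, total, row_max)
```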

## Linear Algebra

In [45]:
v1 = np.array([5,7])
v1

array([5, 7])

In [46]:
v1 * 2

array([10, 14])

In [47]:
v1 + 2

array([7, 9])

In [48]:
A = np.array( [[1,1], [0,1]] )
A

array([[1, 1],
       [0, 1]])

In [49]:
A * 2

array([[2, 2],
       [0, 2]])

In [50]:
A + 2

array([[3, 3],
       [2, 3]])

In [51]:
B = np.array( [[2,0], [3,4]] )
B

array([[2, 0],
       [3, 4]])

In [52]:
A * B           # elementwise product

array([[2, 0],
       [0, 4]])

In [53]:
A.shape, v1.shape

((2, 2), (2,))

In [54]:
A

array([[1, 1],
       [0, 1]])

In [55]:
v1

array([5, 7])

In [56]:
A*v1

array([[5, 7],
       [0, 7]])

In [57]:
A.dot(B)

array([[5, 4],
       [3, 4]])
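
The cells above mix elementwise and matrix operations, so a compact comparison may help. This is a sketch that redefines the same small arrays under local names. It uses the `@` operator, which performs the same matrix product as `dot` for these 1-D and 2-D arrays, and `np.linalg.solve` for a linear system.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
v = np.array([5, 7])

print(A @ B)        # matrix product, same as A.dot(B)
print(A @ v)        # matrix-vector product
print(A * v)        # elementwise: v is broadcast across the rows of A

# solve the linear system A x = v for x
x = np.linalg.solve(A, v)
print(x)
```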

## Stacking and repeating arrays

Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate`, we
can create larger vectors and matrices from smaller ones:

### tile and repeat

In [58]:
a = np.array([[1, 2], [3, 4]])
a

array([[1, 2],
       [3, 4]])

In [59]:
# repeat each element 3 times
np.repeat(a, 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [60]:
# tile the matrix 3 times
np.tile(a, 3)

array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])
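
Both functions accept further arguments that control the direction of repetition. A brief sketch with an illustrative array `m`: `repeat` takes an `axis`, and `tile` takes a tuple of repetitions per dimension.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

rows_repeated = np.repeat(m, 2, axis=0)   # each row twice
cols_repeated = np.repeat(m, 2, axis=1)   # each column twice
tiled = np.tile(m, (2, 3))                # 2 copies down, 3 across

print(rows_repeated)
print(cols_repeated)
print(tiled.shape)
```

Without `axis`, `repeat` flattens its input first, which is why the earlier result was 1-D.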

In [61]:
A.dot(A.T)

array([[2, 1],
       [1, 1]])

In [62]:
v1.dot(v1)

74

### concatenate

In [63]:
b = np.array([[5, 6]])
b

array([[5, 6]])

In [64]:
np.concatenate((a, b), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [65]:
np.concatenate((a, b.T), axis=1)

array([[1, 2, 5],
       [3, 4, 6]])

### hstack and vstack

In [66]:
np.vstack((a,b))

array([[1, 2],
       [3, 4],
       [5, 6]])

In [67]:
np.hstack((a,b.T))

array([[1, 2, 5],
       [3, 4, 6]])
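
The stacking functions also accept 1-D arrays, where their behaviour is slightly different. A minimal sketch, with hypothetical vectors `u` and `w`, including `np.column_stack` for building columns from 1-D inputs:

```python
import numpy as np

u = np.array([1, 2, 3])
w = np.array([4, 5, 6])

stacked_rows = np.vstack((u, w))         # shape (2, 3): each input becomes a row
joined = np.hstack((u, w))               # shape (6,): joined end to end
stacked_cols = np.column_stack((u, w))   # shape (3, 2): each input becomes a column

print(stacked_rows)
print(joined)
print(stacked_cols)
```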

## NumPy exercises

### Creating arrays

Refer to the NumPy Basics chapter in the course notes to answer these questions:

(1a) Create the following array:
$$
A = \left( \begin{array}{cccc}
0 & 1 & 2 & 3 \\
4 & 5 & 6 & 7 \\
8 & 9 & 10 & 11 \end{array} \right)
$$

(1b) Use ``np.tile`` to create an 8x8 array of booleans in a checker-board
pattern:
$$
B = \left( \begin{array}{cccccccc}
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
\end{array} \right)
$$

### Indexing into arrays

(2a) Use slicing to create an array referring to the values in the top $2 \times
2$ corner of $A$.

(2b) Use fancy indexing to create another array containing columns 0, 1, and 3
of $A$.

(2c) Try changing the values of the new arrays from questions (2a) and (2b).
Which of these arrays share the same data in memory as $A$?

### Array attributes

(3a) Look up the following attributes of an array: data type, number of rows,
number of columns, total number of bytes.

(3b) Look at the ``flags`` attribute of an array. Try setting ``writeable`` to
``False``.

### Boolean indexing and compound selection

(4a) Create an array out of these people's heights in centimetres:

In [68]:
h = {'Fred': 165.02,
     'Guido': 192.72,
     'Raymond': 175.26,
     'Tim': 182.88,
     'Nick': 177.8,
     'Jacob': 180,
     'Armin': 196,
     'Kenneth': 182.88,
     'Ed': 176,
     'Richard': 178,
     'Russell': 172.72,
     'Lennart': 175.2,
     'Larry': 177.8,
     'Brett': 177.8}

(4b) How many people are taller than 175 cm? (Use NumPy boolean indexing.)

(4c) Find the average height of all people who are outside the range of 170 -
180 cm. (Tip: Use boolean indexing with | and the mean() method.)

In [69]:
# %load solutions/numpy_height_selection.py

### More NumPy exercises

See here for more NumPy exercises, graded by difficulty (currently 46):
http://www.loria.fr/~rougier/teaching/numpy.100/