# Numpy Tutorial

`numpy` is the core Python package for array computations on the CPU. Its array type is the common currency that other packages such as `pytorch`, `tensorflow` and `scipy` build on or interoperate with.

For the official Numpy tutorial, see https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

In [5]:
import numpy as np  # standard practice is to import the package as `np`

## Creating Numpy Arrays

Create arrays from a (possibly nested) list using `np.array`. Each level of nesting becomes one dimension.

In [6]:
x = np.array([4,5,6,7])
print(x)

[4 5 6 7]


In [7]:
x = np.array([[3,4,2], [5,6,7]])
print(x)

[[3 4 2]
 [5 6 7]]


Generate random data using functions in `np.random`. It is best practice to seed the random number generator so that results are reproducible.

In [8]:
np.random.seed(123)
print(np.random.random((9,))) # uniform random 0-1
print(np.random.randint(0,3,(10,))) # random integers [0,1,2]

[ 0.69646919  0.28613933  0.22685145  0.55131477  0.71946897  0.42310646
  0.9807642   0.68482974  0.4809319 ]
[1 0 2 0 1 2 1 0 0 0]
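
The seeding idea above can also be sketched with a local generator object instead of the global seed. This is a minimal sketch, assuming NumPy 1.17 or later, where `np.random.default_rng` is available; the variable names are illustrative only. The values it draws will differ from the output shown above because it uses a different underlying generator.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides np.random.default_rng.
rng = np.random.default_rng(123)      # a local, seeded generator
a = rng.random(9)                     # uniform floats in [0, 1)
b = rng.integers(0, 3, size=10)       # integers drawn from {0, 1, 2}

# Re-creating the generator with the same seed reproduces the same draws.
rng2 = np.random.default_rng(123)
same = np.array_equal(a, rng2.random(9))

print(a.shape, b.shape, same)
```

A local generator keeps the seed from leaking into unrelated code that also draws random numbers.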


Create special arrays using numpy functions

In [9]:
print(np.eye(4))  # Identity matrix

[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]


In [10]:
print(np.zeros((2,4)))  # n-d zeros matrix

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]


In [11]:
print(np.ones((3,2)))  # n-d ones matrix

[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]


In [12]:
print(np.arange(6))  # 1-d matrix of a range

[0 1 2 3 4 5]
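
A few other constructors follow the same pattern as the ones above. The sketch below is illustrative rather than exhaustive; the shapes and fill values are arbitrary choices.

```python
import numpy as np

full = np.full((2, 3), 7.5)          # constant-filled array
lin = np.linspace(0.0, 1.0, 5)       # 5 evenly spaced points, both endpoints included
like = np.zeros_like(full)           # zeros with the same shape and dtype as `full`

print(full)
print(lin)
print(like.shape, like.dtype)
```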


Every array has a `dtype` (its element type) and a `shape` (its size along each axis). You will probably use `np.int32` or `np.float32` in most situations.

In [13]:
x = np.zeros(shape=(2,4), dtype=np.int32)
print(x.shape)
print(x.dtype)

(2, 4)
int32
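
When no `dtype` is passed, NumPy infers one from the input, and `astype` converts between types. A small sketch of that behaviour, with arbitrary example values:

```python
import numpy as np

a = np.array([1, 2, 3])          # integer input -> platform default integer dtype
b = np.array([1.0, 2.0, 3.0])    # float input   -> float64
c = b.astype(np.float32)         # explicit conversion returns a new array

print(a.dtype, b.dtype, c.dtype)
print((a + c).dtype)             # mixed dtypes are promoted to a common type
```

Because `astype` returns a new array, `b` keeps its original `float64` dtype.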


## Indexing Arrays

Numpy arrays may be indexed by a single number, a slice, or an array of integers.

In [14]:
# Create our matrix
x = np.arange(10) ** 2
print(x)

[ 0  1  4  9 16 25 36 49 64 81]


In [15]:
# Index a single value
print(x[3])

9


In [16]:
# Index using slices
print(x[1:5])  # range
print(x[:4])  # implicit start
print(x[5:]) # implicit end
print(x[::-1]) # all reversed
print(x[8:1:-1]) # reversed slice
print(x[::2]) # every other
print(x[:]) # no-op (but useful syntax when dealing with n-d arrays)

[ 1  4  9 16]
[0 1 4 9]
[25 36 49 64 81]
[81 64 49 36 25 16  9  4  1  0]
[64 49 36 25 16  9  4]
[ 0  4 16 36 64]
[ 0  1  4  9 16 25 36 49 64 81]


In [17]:
# Index using an array
i = np.random.randint(0, 10, (5,)) # random ints [0-9]
print(i)
print(x[i])  # index by array

[4 1 7 3 2]
[16  1 49  9  4]


N-d arrays may be indexed by combinations of the above options.

In [18]:
x=np.random.random((2,5))
y=x[::-1, [3,4,2]] # reverse first dimension and take second dimension at 3, 4 and 2
print(x)
print(x.shape)
print(y)
print(y.shape)

[[ 0.08372648  0.71233018  0.42786349  0.2977805   0.49208478]
 [ 0.74029639  0.35772892  0.41720995  0.65472131  0.37380143]]
(2, 5)
[[ 0.65472131  0.37380143  0.41720995]
 [ 0.2977805   0.49208478  0.42786349]]
(2, 3)


The same indexing syntax can be used for getting and setting values.

In [19]:
# set values using indexing syntax
x = np.zeros((4,2))
x[0,1] = 1
x[3,:] = 2
print(x)

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  2.]]


In [20]:
# when setting multiple values, shapes must be the same (or broadcastable, see below)
x = np.zeros((3,5))
x[1,:] = np.arange(5)
print(x)

[[ 0.  0.  0.  0.  0.]
 [ 0.  1.  2.  3.  4.]
 [ 0.  0.  0.  0.  0.]]
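
One detail worth knowing when setting values: a basic slice returns a view onto the original data, while indexing with an integer array returns a copy. The sketch below illustrates the difference with arbitrary values.

```python
import numpy as np

x = np.arange(10) ** 2

s = x[2:5]            # basic slice: a view onto x's memory
f = x[[2, 3, 4]]      # integer-array index: an independent copy

s[0] = -1             # writes through to x
f[1] = -2             # does not affect x

print(x[:5])
print(np.shares_memory(x, s), np.shares_memory(x, f))
```

Call `.copy()` on a slice when an independent array is needed.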


## Conditional Indexing

There are several ways to search or index an array using conditions.

In [21]:
# Simple syntax
np.random.seed(123)
x=np.random.random((10,))
print(x)
print(x>0.5)
print(x[x>0.5])

[ 0.69646919  0.28613933  0.22685145  0.55131477  0.71946897  0.42310646
  0.9807642   0.68482974  0.4809319   0.39211752]
[ True False False  True  True False  True  True False False]
[ 0.69646919  0.55131477  0.71946897  0.9807642   0.68482974]


In [22]:
# Explicit syntax: find the matching indices, then gather them (necessary in some situations)
np.random.seed(123)
x=np.random.random((10,))
print(x)
print(np.nonzero(x>0.5)[0])
print(np.take(x, np.nonzero(x>0.5)[0], axis=0))

[ 0.69646919  0.28613933  0.22685145  0.55131477  0.71946897  0.42310646
  0.9807642   0.68482974  0.4809319   0.39211752]
[0 3 4 6 7]
[ 0.69646919  0.55131477  0.71946897  0.9807642   0.68482974]
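
Conditions can also be used to choose between values or to assign in place. A minimal sketch using `np.where` and boolean-mask assignment, with hand-picked values so the result is easy to follow:

```python
import numpy as np

x = np.array([0.7, 0.2, 0.9, 0.4])

clipped = np.where(x > 0.5, 1.0, x)   # take 1.0 where the condition holds, else x
y = x.copy()
y[y <= 0.5] = 0.0                     # assign through a boolean mask
count = np.count_nonzero(x > 0.5)     # number of elements matching the condition

print(clipped)
print(y)
print(count)
```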


## Special Indexing

Numpy provides helpers for certain common indexing operations

In [23]:
# Create a random matrix
x = np.random.random((5,5)) 
print(x)
# Get diagonal elements
print(np.diag(x))

[[ 0.34317802  0.72904971  0.43857224  0.0596779   0.39804426]
 [ 0.73799541  0.18249173  0.17545176  0.53155137  0.53182759]
 [ 0.63440096  0.84943179  0.72445532  0.61102351  0.72244338]
 [ 0.32295891  0.36178866  0.22826323  0.29371405  0.63097612]
 [ 0.09210494  0.43370117  0.43086276  0.4936851   0.42583029]]
[ 0.34317802  0.18249173  0.72445532  0.29371405  0.42583029]


In [24]:
# Create a random matrix
x = np.random.random((5,5)) 
print(x)
# Get lower-triangular indices
tril = np.tril_indices(5, -1)  # lower triangle indices for 5x5 matrix, 1 below diagonal
print(tril)
# Get lower triangle
print(x[tril])
# Set lower triangle
x[tril] = 3
print(x)

[[ 0.31226122  0.42635131  0.89338916  0.94416002  0.50183668]
 [ 0.62395295  0.1156184   0.31728548  0.41482621  0.86630916]
 [ 0.25045537  0.48303426  0.98555979  0.51948512  0.61289453]
 [ 0.12062867  0.8263408   0.60306013  0.54506801  0.34276383]
 [ 0.30412079  0.41702221  0.68130077  0.87545684  0.51042234]]
(array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4]), array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3]))
[ 0.62395295  0.25045537  0.48303426  0.12062867  0.8263408   0.60306013
  0.30412079  0.41702221  0.68130077  0.87545684]
[[ 0.31226122  0.42635131  0.89338916  0.94416002  0.50183668]
 [ 3.          0.1156184   0.31728548  0.41482621  0.86630916]
 [ 3.          3.          0.98555979  0.51948512  0.61289453]
 [ 3.          3.          3.          0.54506801  0.34276383]
 [ 3.          3.          3.          3.          0.51042234]]


In [25]:
# Use mgrid to create one-hot variables
x = np.zeros((5,4,3)) # matrix of shape (5,4,3)
i = np.random.randint(0,3,(5,4,))  # ints of shape (5,4)
mgrid = np.mgrid[:5,:4] # indices for first two dimensions
x[mgrid[0], mgrid[1], i] = 1  # for every (r, c), set x[r, c, i[r, c]] = 1
print(mgrid)
print(x)

[[[0 0 0 0]
  [1 1 1 1]
  [2 2 2 2]
  [3 3 3 3]
  [4 4 4 4]]

 [[0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]
  [0 1 2 3]]]
[[[ 0.  1.  0.]
  [ 0.  0.  1.]
  [ 1.  0.  0.]
  [ 0.  0.  1.]]

 [[ 0.  0.  1.]
  [ 0.  1.  0.]
  [ 0.  0.  1.]
  [ 0.  0.  1.]]

 [[ 0.  1.  0.]
  [ 0.  0.  1.]
  [ 1.  0.  0.]
  [ 0.  0.  1.]]

 [[ 0.  0.  1.]
  [ 0.  1.  0.]
  [ 0.  1.  0.]
  [ 0.  0.  1.]]

 [[ 0.  0.  1.]
  [ 1.  0.  0.]
  [ 0.  1.  0.]
  [ 0.  1.  0.]]]
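
As an alternative sketch of the one-hot construction above, rows of an identity matrix can be looked up by label. The names `labels` and `num_classes` are hypothetical and the label values are arbitrary; the comparison with the `mgrid` approach is included only to show that both give the same array.

```python
import numpy as np

labels = np.array([[1, 2, 0, 2],
                   [2, 1, 2, 2]])          # example integer labels, shape (2, 4)
num_classes = 3                            # hypothetical class count

one_hot = np.eye(num_classes)[labels]      # row lookup -> shape (2, 4, 3)

# Same result via the mgrid approach
rows, cols = np.mgrid[:2, :4]
expected = np.zeros((2, 4, num_classes))
expected[rows, cols, labels] = 1

print(one_hot.shape)
print(np.array_equal(one_hot, expected))
```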


## Reshaping

In [26]:
# Arrays are reshaped keeping row-major order
x=np.random.random((6,))
print(x)
print(np.reshape(x, (3,2)))  # total size must be unchanged
print(np.reshape(x, (2,-1)))  # determine last axis based on total size and first axis

[ 0.59443188  0.55678519  0.15895964  0.15307052  0.69552953  0.31876643]
[[ 0.59443188  0.55678519]
 [ 0.15895964  0.15307052]
 [ 0.69552953  0.31876643]]
[[ 0.59443188  0.55678519  0.15895964]
 [ 0.15307052  0.69552953  0.31876643]]


In [27]:
# Arrays can be transposed easily
x=np.random.random((2,3))
print(x)
print(x.T) # simple transpose
print(np.transpose(x, (1,0))) # syntax for multiple dimensions

[[ 0.6919703   0.55438325  0.38895057]
 [ 0.92513249  0.84167     0.35739757]]
[[ 0.6919703   0.92513249]
 [ 0.55438325  0.84167   ]
 [ 0.38895057  0.35739757]]
[[ 0.6919703   0.92513249]
 [ 0.55438325  0.84167   ]
 [ 0.38895057  0.35739757]]


In [28]:
# Special functions for adding and removing dims
x=np.random.random((2,3,1))
print(np.expand_dims(x, 1).shape)  # add a new dimension
print(np.squeeze(x,2).shape)  # remove a dimension (it must have size 1)

(2, 1, 3, 1)
(2, 3)
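
Dimensions can also be added directly in an index expression with `np.newaxis`, which is an alias for `None`. A short sketch with an arbitrary array:

```python
import numpy as np

x = np.zeros((2, 3))
a = x[:, np.newaxis, :]     # same shape as np.expand_dims(x, 1)
b = x[..., None]            # adds a trailing axis
c = x.reshape(-1)           # flatten to 1-d in row-major order

print(a.shape, b.shape, c.shape)
```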


## Combining Arrays

In [29]:
# concatenation
# - arbitrary number of arrays
# - all dimensions except `axis` must match
print(np.concatenate([np.zeros((3,1)), np.ones((3,2)), np.zeros((3,2))], axis=1))

[[ 0.  1.  1.  0.  0.]
 [ 0.  1.  1.  0.  0.]
 [ 0.  1.  1.  0.  0.]]


In [30]:
# stacking
# - new axis is created
# - all input shapes must match
print(np.stack([np.zeros((2,)), np.random.random((2,))], axis=0))

[[ 0.          0.        ]
 [ 0.76254781  0.59317692]]
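
The relationship between the two operations can be sketched as follows: stacking behaves like adding a new axis to each input and then concatenating along it. The arrays here are arbitrary examples.

```python
import numpy as np

a = np.zeros((2,))
b = np.ones((2,))

stacked = np.stack([a, b], axis=0)                                    # shape (2, 2)
via_concat = np.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)   # same result
side_by_side = np.stack([a, b], axis=1)                               # new axis placed last

print(stacked.shape, side_by_side.shape)
print(np.array_equal(stacked, via_concat))
```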


## Math and Linear Algebra

Basic math operators (`+`, `-`, etc.) are overloaded to operate elementwise on arrays. Numpy also provides a variety of basic math and linear algebra functions (see docs for specific functions).

In [31]:
x = np.random.random((3,5))
y = np.random.random((5,2))
print(x)
print(y)
print(x / np.sum(x))
print(np.mean(x))
print(np.square(x))
print(np.sqrt(x))
print(np.dot(x, y))

[[ 0.6917018   0.15112745  0.39887629  0.2408559   0.34345601]
 [ 0.51312815  0.66662455  0.10590849  0.13089495  0.32198061]
 [ 0.66156434  0.84650623  0.55325734  0.85445249  0.38483781]]
[[ 0.3167879   0.35426468]
 [ 0.17108183  0.82911263]
 [ 0.33867085  0.55237008]
 [ 0.57855147  0.52153306]
 [ 0.00268806  0.98834542]]
[[ 0.1007552   0.02201364  0.05810142  0.03508374  0.05002875]
 [ 0.07474367  0.09710238  0.01542692  0.01906652  0.04690059]
 [ 0.09636529  0.12330444  0.08058899  0.12446191  0.05605654]]
0.457678160509
[[ 0.47845138  0.02283951  0.1591023   0.05801156  0.11796203]
 [ 0.2633005   0.44438829  0.01121661  0.01713349  0.10367151]
 [ 0.43766737  0.71657279  0.30609369  0.73008905  0.14810014]]
[[ 0.83168612  0.38875114  0.63156654  0.49077072  0.5860512 ]
 [ 0.71632964  0.81647079  0.32543584  0.36179407  0.56743335]
 [ 0.81336605  0.92005773  0.74381271  0.92436599  0.62035297]]
[[ 0.52033646  1.05574201]
 [ 0.38906322  1.1794848 ]
 [ 1.03714875  2.06779859]]
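
Reductions such as `sum` and `mean` also accept an `axis` argument, and the `@` operator performs matrix multiplication. A small sketch with values chosen so the results can be checked by hand:

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
y = np.ones((3, 2))

col_sums = x.sum(axis=0)                   # collapse rows     -> shape (3,)
row_sums = x.sum(axis=1, keepdims=True)    # keep reduced axis -> shape (2, 1)
row_normalized = x / row_sums              # each row now sums to 1
product = x @ y                            # matrix product; matches np.dot(x, y) for 2-d arrays

print(col_sums)
print(row_normalized.sum(axis=1))
print(product)
```

Passing `keepdims=True` keeps the reduced axis with size 1, so the result lines up with `x` in the division.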


## Broadcasting

Broadcasting rules determine how arrays with different shapes combine in elementwise operations, so they are important to understand.

In [32]:
# shapes are compared from the last dimension backwards; each pair of sizes must be equal, or one of them must be 1
print(np.arange(2)+0.3)
print(np.arange(12).reshape((3,4))+0.01*np.arange(4))
print(np.arange(8).reshape((2,4))+0.01*np.arange(24).reshape((3,2,4)))
print(np.arange(6).reshape((3,1,2))+0.01*np.arange(18).reshape((3,3,2)))

[ 0.3  1.3]
[[  0.     1.01   2.02   3.03]
 [  4.     5.01   6.02   7.03]
 [  8.     9.01  10.02  11.03]]
[[[ 0.    1.01  2.02  3.03]
  [ 4.04  5.05  6.06  7.07]]

 [[ 0.08  1.09  2.1   3.11]
  [ 4.12  5.13  6.14  7.15]]

 [[ 0.16  1.17  2.18  3.19]
  [ 4.2   5.21  6.22  7.23]]]
[[[ 0.    1.01]
  [ 0.02  1.03]
  [ 0.04  1.05]]

 [[ 2.06  3.07]
  [ 2.08  3.09]
  [ 2.1   3.11]]

 [[ 4.12  5.13]
  [ 4.14  5.15]
  [ 4.16  5.17]]]


In [33]:
# mismatched dimensions cause an error
try:
  x=np.random.random((3,3,4))+np.random.random((3,2,4))
except ValueError as e:
  print(e)

operands could not be broadcast together with shapes (3,3,4) (3,2,4) 
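
The alignment rule can be checked without doing any arithmetic. The sketch below uses `np.broadcast` to inspect the resulting shape and `np.newaxis` to make two 1-d arrays compatible; the shapes are arbitrary examples.

```python
import numpy as np

a = np.zeros((3, 1, 2))
b = np.zeros((3, 2))

# Shapes are aligned from the right:
#   a: 3 1 2
#   b:   3 2
# The size-1 axis of `a` stretches to 3, and `b` gains a leading axis of size 3.
result_shape = np.broadcast(a, b).shape          # shape of a + b, without computing it
stretched = np.broadcast_to(b, (3, 3, 2))        # read-only view of b at the full shape

# Adding explicit axes makes otherwise incompatible shapes compatible.
u = np.arange(3)
v = np.arange(4)
outer = u[:, np.newaxis] * v[np.newaxis, :]      # (3, 1) * (1, 4) -> (3, 4)

print(result_shape, stretched.shape, outer.shape)
```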


## Saving and Loading

Numpy provides functions for saving and loading arrays

In [34]:
# Save single array
x = np.random.random((5,))
print(x)
np.save('tmp.npy', x)
y=np.load('tmp.npy')
print(y)

[ 0.56221838  0.12224355  0.2013995   0.81164435  0.46798757]
[ 0.56221838  0.12224355  0.2013995   0.81164435  0.46798757]


In [35]:
# Save dictionary of arrays
x1 = np.random.random((2,))
y1 = np.random.random((3,))
print(x1)
print(y1)
np.savez('tmp.npz', x=x1, y=y1)
data=np.load('tmp.npz')
print(data['x'])
print(data['y'])

[ 0.80793821  0.00742638]
[ 0.55159273  0.93193215  0.58217546]
[ 0.80793821  0.00742638]
[ 0.55159273  0.93193215  0.58217546]
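
For larger arrays, `np.savez_compressed` writes the same kind of archive with compression. The sketch below writes to a temporary directory so that no files are left behind; the file name `arrays.npz` is a hypothetical choice.

```python
import os
import tempfile

import numpy as np

x1 = np.arange(4)
y1 = np.linspace(0.0, 1.0, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "arrays.npz")     # hypothetical file name
    np.savez_compressed(path, x=x1, y=y1)
    with np.load(path) as data:                   # closes the archive on exit
        names = sorted(data.files)                # names of the stored arrays
        x_loaded = data["x"]
        y_loaded = data["y"]

print(names)
print(x_loaded, y_loaded)
```

Opening the archive in a `with` block closes the underlying file as soon as the arrays have been read.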
