# Numpy Tutorial

NumPy is the core Python package for array computations on the CPU. Packages such as SciPy are built on its arrays, and PyTorch and TensorFlow can exchange data with it.

See https://docs.scipy.org/doc/numpy-1.15.0/user/quickstart.html for a comprehensive tutorial.

In [2]:
import numpy as np          # Standard practice to name numpy as np

## Creating Numpy Arrays
Create arrays from a list (nested for more than one dimension) using np.array.

In [3]:
x = np.array([1, 2, 3, 4, 5, 6])
print(x)

[1 2 3 4 5 6]


In [4]:
x = np.array([[1,2,3], [4,5,6]])
print(x)

[[1 2 3]
 [4 5 6]]


Generate random data using functions in np.random. It is best practice to seed numpy for consistent results.

In [5]:
np.random.seed(0)
print(np.random.random((10,))) # array of 10 uniform random values in [0, 1)
print(np.random.randint(0,3,(10,))) # 10 random integers from {0, 1, 2}; the upper bound 3 is excluded

[ 0.5488135   0.71518937  0.60276338  0.54488318  0.4236548   0.64589411
  0.43758721  0.891773    0.96366276  0.38344152]
[2 2 0 1 1 1 1 0 1 0]
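
NumPy 1.17 and later also provide a Generator-based interface, which keeps the seed inside one object instead of in global state. The following is a minimal sketch, assuming such a version is installed; the names `rng_a` and `rng_b` are only illustrative.

```python
import numpy as np

# Two generators created with the same seed produce identical streams.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)

a = rng_a.random(10)              # 10 uniform floats in [0, 1)
b = rng_b.random(10)
ints = rng_a.integers(0, 3, 10)   # 10 integers from {0, 1, 2}; 3 is excluded

print(np.array_equal(a, b))       # True: same seed, same values
print(ints)
```

Note that the values differ from those produced by np.random.seed(0) above, because the two interfaces use different underlying generators.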


Create special arrays using numpy functions

In [6]:
print(np.eye(4))  # Identity matrix

[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]


In [7]:
print(np.zeros((2,4)))  # n-d array of zeros

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]


In [8]:
print(np.ones((2,4)))  # n-d array of ones

[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]


In [9]:
print(np.arange(9))  # 1-d array holding the range 0..8

[0 1 2 3 4 5 6 7 8]


All arrays have the attributes dtype and shape. You will probably use np.int32 or np.float32 for most situations. Note that functions such as np.zeros create float64 arrays unless a dtype is given.

In [10]:
x = np.zeros(shape=(2,4), dtype=np.int32)
print(x.shape)
print(x.dtype)

(2, 4)
int32
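
An existing array can be converted to another dtype with astype. The sketch below reuses the array from the cell above and also prints two related attributes, ndim and size.

```python
import numpy as np

x = np.zeros(shape=(2, 4), dtype=np.int32)

# astype returns a new array with the requested dtype; x itself is unchanged.
y = x.astype(np.float32)

print(x.dtype, y.dtype)   # int32 float32
print(y.shape)            # (2, 4)
print(x.ndim, x.size)     # 2 8  (number of axes, total number of elements)
```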


## Indexing Arrays
Numpy arrays may be indexed by a single number, a slice, or an array of integers

In [13]:
# Create a 1-d array of squares
x = np.arange(10) ** 2
print(x)

[ 0  1  4  9 16 25 36 49 64 81]


In [14]:
# Index a single value
print(x[3])

9


In [17]:
# Index using slices
print(x[1:5])  # range
print(x[:4])  # implicit start
print(x[5:]) # implicit end
print(x[::-1]) # all reversed
print(x[8:1:-1]) # reversed slice
print(x[::2]) # every other

[ 1  4  9 16]
[0 1 4 9]
[25 36 49 64 81]
[81 64 49 36 25 16  9  4  1  0]
[64 49 36 25 16  9  4]
[ 0  4 16 36 64]


In [18]:
# Index using an array
i = np.random.randint(0, 10, (5,)) # random ints [0-9]
print(i)
print(x[i])  # index by array

[3 0 3 5 0]
[ 9  0  9 25  0]


N-d arrays may be indexed by combinations of the above options.

In [19]:
x=np.random.random((2,5))
y=x[::-1, [3,4,2]] # reverse first dimension and take second dimension at 3, 4 and 2
print(x)
print(x.shape)
print(y)
print(y.shape)

[[ 0.97861834  0.79915856  0.46147936  0.78052918  0.11827443]
 [ 0.63992102  0.14335329  0.94466892  0.52184832  0.41466194]]
(2, 5)
[[ 0.52184832  0.41466194  0.94466892]
 [ 0.78052918  0.11827443  0.46147936]]
(2, 3)


The same indexing syntax can be used for both getting and setting values.

In [20]:
# set values using indexing syntax
x = np.zeros((4,2))
x[0,1] = 1
x[3,:] = 2
print(x)

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  2.]]


In [21]:
# when setting multiple values, shapes must be the same (or broadcastable, see below)
x = np.zeros((3,5))
x[1,:] = np.arange(5)
print(x)

[[ 0.  0.  0.  0.  0.]
 [ 0.  1.  2.  3.  4.]
 [ 0.  0.  0.  0.  0.]]
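
The "broadcastable" case mentioned in the comment above can be sketched as follows. This is only an illustration of the basic rule (shapes are compared from the last axis backwards, and an axis of length 1 is stretched to match); the variable names are arbitrary.

```python
import numpy as np

x = np.zeros((3, 5))

# A scalar broadcasts to every selected element.
x[0, :] = 7

# A length-5 row broadcasts across both remaining rows: (5,) -> (2, 5).
x[1:, :] = np.arange(5)
print(x)

# A length-3 vector does not match the last axis (length 5), so it is
# reshaped to (3, 1) before it can broadcast across the columns.
col = np.array([10, 20, 30]).reshape(3, 1)
y = np.zeros((3, 5)) + col
print(y)
```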


## Conditional Indexing
There are several ways to search an array or index an array using conditions

In [22]:
# Simple syntax
np.random.seed(123)
x=np.random.random((10,))
print(x)
print(x>0.5)
print(x[x>0.5])

[ 0.69646919  0.28613933  0.22685145  0.55131477  0.71946897  0.42310646
  0.9807642   0.68482974  0.4809319   0.39211752]
[ True False False  True  True False  True  True False False]
[ 0.69646919  0.55131477  0.71946897  0.9807642   0.68482974]


Other indexing helpers, such as np.diag() and np.tril_indices(), are also available and can be useful in certain scenarios.
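
A short sketch of a few of these options follows, using small hand-written arrays so the results are easy to check. np.where is not mentioned above, but it is a common companion to boolean masks.

```python
import numpy as np

x = np.array([0.7, 0.2, 0.9, 0.4])

# With one argument, np.where returns a tuple of index arrays (one per axis).
idx = np.where(x > 0.5)[0]
print(idx)                          # [0 2]

# With three arguments, it picks element-wise between two alternatives.
flags = np.where(x > 0.5, 1, 0)
print(flags)                        # [1 0 1 0]

# A boolean mask can also appear on the left-hand side of an assignment.
x[x < 0.5] = 0.0
print(x)                            # entries below 0.5 are now zero

m = np.arange(9).reshape(3, 3)
print(np.diag(m))                   # main diagonal: [0 4 8]

rows, cols = np.tril_indices(3)     # lower triangle, diagonal included
print(m[rows, cols])                # [0 3 4 6 7 8]
```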

## Reshaping
Reshaping is often required when working with matrices, for example to make a tensor's shape compatible with a torch module.

In [23]:
# Arrays are reshaped keeping row-major order
x=np.random.random((6,))
print(x)
print(np.reshape(x, (3,2)))  # total size must be unchanged
print(np.reshape(x, (2,-1)))  # determine last axis based on total size and first axis

[ 0.34317802  0.72904971  0.43857224  0.0596779   0.39804426  0.73799541]
[[ 0.34317802  0.72904971]
 [ 0.43857224  0.0596779 ]
 [ 0.39804426  0.73799541]]
[[ 0.34317802  0.72904971  0.43857224]
 [ 0.0596779   0.39804426  0.73799541]]


In [24]:
# Arrays can be transposed easily
x=np.random.random((2,3))
print(x)
print(x.T) # simple transpose
print(np.transpose(x, (1,0))) # syntax for multiple dimensions

[[ 0.18249173  0.17545176  0.53155137]
 [ 0.53182759  0.63440096  0.84943179]]
[[ 0.18249173  0.53182759]
 [ 0.17545176  0.63440096]
 [ 0.53155137  0.84943179]]
[[ 0.18249173  0.53182759]
 [ 0.17545176  0.63440096]
 [ 0.53155137  0.84943179]]
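
For arrays with more than two dimensions, the second argument of np.transpose gives the new order of the axes. A small sketch with an arbitrarily chosen 3-d shape:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# New axis i is taken from old axis axes[i]; here the last axis moves to the front.
y = np.transpose(x, (2, 0, 1))
print(y.shape)                    # (4, 2, 3)

# The element at x[i, j, k] is now found at y[k, i, j].
print(x[1, 2, 3] == y[3, 1, 2])   # True

# -1 in reshape can flatten everything except the first axis.
print(x.reshape(2, -1).shape)     # (2, 12)
```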


## Basic Math Operations
Basic math operators (+, -, etc.) are overloaded to operate element-wise on arrays. NumPy also provides a variety of basic math and linear algebra functions.

In [25]:
x = np.random.random((3,5))
y = np.random.random((5,2))
print(x)
print(y)
print(x / np.sum(x))
print(np.mean(x))
print(np.square(x))
print(np.sqrt(x))
print(np.dot(x, y))

[[ 0.72445532  0.61102351  0.72244338  0.32295891  0.36178866]
 [ 0.22826323  0.29371405  0.63097612  0.09210494  0.43370117]
 [ 0.43086276  0.4936851   0.42583029  0.31226122  0.42635131]]
[[ 0.89338916  0.94416002]
 [ 0.50183668  0.62395295]
 [ 0.1156184   0.31728548]
 [ 0.41482621  0.86630916]
 [ 0.25045537  0.48303426]]
[[ 0.11127628  0.09385316  0.11096725  0.04960646  0.05557071]
 [ 0.03506121  0.04511445  0.09691788  0.01414731  0.06661647]
 [ 0.06618049  0.07582999  0.0654075   0.0479633   0.06548753]]
0.434027998836
[[ 0.52483552  0.37334973  0.52192444  0.10430246  0.13089103]
 [ 0.0521041   0.08626794  0.39813087  0.00848332  0.18809671]
 [ 0.18564272  0.24372498  0.18133144  0.09750707  0.18177544]]
[[ 0.85114941  0.78167993  0.84996669  0.56829474  0.6014887 ]
 [ 0.47776901  0.54195392  0.79434006  0.30348796  0.65855992]
 [ 0.65640137  0.70262728  0.65255673  0.55880338  0.65295582]]
[[ 1.26196602  1.74901105]
 [ 0.57110715  0.88826421]
 [ 0.91822734  1.32640648]]
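
Reductions such as np.sum and np.mean act on the whole array by default, as in the cell above, but they also accept an axis argument. The sketch below uses small fixed arrays so the results can be verified by hand, and shows the @ operator as an alternative spelling of the matrix product.

```python
import numpy as np

x = np.arange(6, dtype=np.float64).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
y = np.ones((3, 2))

print(np.sum(x, axis=0))    # column sums: 3, 5, 7
print(np.sum(x, axis=1))    # row sums: 3, 12
print(np.mean(x, axis=1))   # row means: 1, 4

# For 2-d arrays, the @ operator computes the same matrix product as np.dot.
print(np.allclose(x @ y, np.dot(x, y)))   # True

# The * operator, in contrast, multiplies element by element.
print((x * x).shape)        # (2, 3)
```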


When using PyTorch, we will be dealing with tensors, which are conceptually similar to NumPy arrays, so the operations above (and many more) apply to tensors as well. For most operations, the syntax for NumPy arrays and for tensors is almost the same.

In [29]:
import torch
print(torch.zeros((2, 3)))

tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [30]:
print(np.zeros((2, 3)))

[[ 0.  0.  0.]
 [ 0.  0.  0.]]


For some operations, the syntax might change a little bit, but the idea remains the same. For example,

In [35]:
x=np.random.random((9,))
print(x)
print(x.reshape((3, 3)))

[ 0.08319499  0.76368284  0.24366637  0.19422296  0.57245696  0.09571252
  0.88532683  0.62724897  0.72341636]
[[ 0.08319499  0.76368284  0.24366637]
 [ 0.19422296  0.57245696  0.09571252]
 [ 0.88532683  0.62724897  0.72341636]]


In [36]:
x=torch.randn((9,))
print(x)
print(x.view((3, 3)))

tensor([ 0.1901,  1.4221, -1.8460, -0.2221, -0.4631,  0.9286,  0.2544, -0.4203,
         0.5020])
tensor([[ 0.1901,  1.4221, -1.8460],
        [-0.2221, -0.4631,  0.9286],
        [ 0.2544, -0.4203,  0.5020]])
