# __NumPy__

__References:__
- NumPy UltraQuick Tutorial: https://colab.research.google.com/github/google/eng-edu/blob/master/ml/cc/exercises/numpy_ultraquick_tutorial.ipynb
- NumPy Tutorial: https://github.com/ageron/handson-ml2/blob/master/tools_numpy.ipynb
- 20 NumPy Operations That Every Data Scientist Should Know: https://towardsdatascience.com/20-numpy-operations-that-every-data-scientist-should-know-fb44bb52bde5


__Introduction:__

NumPy (https://numpy.org/) is the fundamental package for scientific computing with Python. Among other things, it provides a powerful N-dimensional array object, sophisticated (broadcasting) functions, and useful linear algebra, Fourier transform, and random number capabilities.

__Common terms:__

In NumPy, each dimension is called an __axis__.

The number of axes is called the __rank__.

For example, a 3x4 matrix is an array of rank 2 (it is 2-dimensional).

The first axis has length 3, the second has length 4.

An array's list of axis lengths is called the __shape__ of the array.

For example, the above matrix's shape is (3, 4).

The rank is equal to the shape's length.

The __size__ of an array is the total number of elements, which is the product of all axis lengths (e.g., 3*4=12).


## np.array
Use __np.array__ to create a NumPy array from a Python list (or a nested list).

The following call to np.array creates an 8-element vector:

In [4]:
import numpy as np
one_dimensional_array = np.array([1.2, 2.4, 3.5, 4.7, 6.1, 7.2, 8.3, 9.5])
print(one_dimensional_array)

[1.2 2.4 3.5 4.7 6.1 7.2 8.3 9.5]


To create a two-dimensional matrix, specify an extra layer of square brackets.

The following call creates a 3x2 matrix:

In [2]:
two_dimensional_array = np.array([[6, 5], [11, 7], [4, 8]])
print(two_dimensional_array)

[[ 6  5]
 [11  7]
 [ 4  8]]


## np.zeros np.ones np.full np.empty
To populate an array with all zeroes, call __np.zeros__.

To create a 2D array (i.e., a matrix), provide a tuple with the desired number of rows and columns.

In [3]:
all_zeros_onedimension = np.zeros(5)
print(all_zeros_onedimension)
all_zeros_twodimension = np.zeros((5,2))
print(all_zeros_twodimension)

[0. 0. 0. 0. 0.]
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]


To populate a matrix with all ones, call __np.ones__

In [4]:
all_ones_onedimension = np.ones(5)
print(all_ones_onedimension)
all_ones_twodimension = np.ones((5,2))
print(all_ones_twodimension)

[1. 1. 1. 1. 1.]
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


Examples for shape, ndim (rank), and size:

In [5]:
all_ones_twodimension.shape

(5, 2)

In [6]:
all_ones_twodimension.ndim

2

In [7]:
all_ones_twodimension.size

10

__np.full__  : Creates an array of the given shape initialized with the given value.

In [8]:
np.full((3,4), np.pi) 

array([[3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265]])

__np.empty__  : Creates an uninitialized array of the given shape. The content of the 2x3 array below is not predictable, as it is whatever is in memory at that point:

In [11]:
empt = np.empty((2,3))
print(empt)
empt[1,1]

[[0.0e+000 1.5e-323 0.0e+000]
 [0.0e+000 0.0e+000 0.0e+000]]


0.0

## np.arange

Use __np.arange__ to generate a sequence of evenly spaced numbers, and the functions in __np.random__ to generate random numbers:

In [12]:
sequence_of_integers = np.arange(5, 12)
print(sequence_of_integers)

random_integers_between_50_and_100 = np.random.randint(low=50, high=101, size=(6)) # highest generated integer is one less than the high argument
print(random_integers_between_50_and_100)

random_floats_between_0_and_1 = np.random.random([6]) # values between 0.0 and 1.0
print(random_floats_between_0_and_1) 

float_values = np.arange(1.0, 5.0) # float values
print(float_values) 

float_values = np.arange(1.0, 5.0, 0.5) # you can provide a step parameter
print(float_values) 


[ 5  6  7  8  9 10 11]
[95 89 83 58 62 64]
[0.13540765 0.60346664 0.87322472 0.12829712 0.55219474 0.6302995 ]
[1. 2. 3. 4.]
[1.  1.5 2.  2.5 3.  3.5 4.  4.5]


## np.linspace
Prefer the linspace function over arange when working with floats: with a non-integer step, rounding can make the number of elements returned by arange hard to predict. The linspace function returns an array containing a specific number of points evenly distributed between two values (note that the maximum value is included, contrary to arange):

In [13]:
print(np.linspace(0, 5/3, 6))

[0.         0.33333333 0.66666667 1.         1.33333333 1.66666667]
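
As a minimal sketch of the endpoint behavior described above (the variable names are illustrative only), `linspace` takes the number of points rather than a step, and its `endpoint` parameter controls whether the stop value is included:

```python
import numpy as np

# linspace takes the number of points, so the length is always predictable
with_endpoint = np.linspace(0.0, 1.0, 5)                     # stop value included
without_endpoint = np.linspace(0.0, 1.0, 5, endpoint=False)  # stop value excluded

print(with_endpoint)     # 5 evenly spaced points from 0.0 to 1.0
print(without_endpoint)  # 5 evenly spaced points, stopping before 1.0

# arange takes a step instead and never includes the stop value
stepped = np.arange(0.0, 1.0, 0.25)
print(stepped)
```

The step of 0.25 is exactly representable in binary floating point, so `arange` behaves predictably here; that is not guaranteed for steps such as 0.1.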


For a 1D array, the shape would be (n,) where n is the number of elements in your array.

For a 2D array, the shape would be (n,m) where n is the number of rows and m is the number of columns in your array.

Note that in the 1D case the shape is simply (n,), not (1, n) or (n, 1); those are the shapes of 2D row and column vectors respectively.

This follows the convention that the shape tuple has one element per axis:

- For a 1D array, the shape tuple has 1 element, i.e. (n,)
- For a 2D array, the shape tuple has 2 elements, i.e. (n,m)
- For a 3D array, the shape tuple has 3 elements, i.e. (n,m,k)
- For a 4D array, the shape tuple has 4 elements, i.e. (n,m,k,j)

and so on.

In [9]:
# sample array
u = np.arange(10)

# get its shape
np.shape(u)    # u.shape
#(10,)

# get array dimension using `np.ndim`
np.ndim(u)
#1

np.shape(np.mean(u))
#()       # empty tuple (to indicate that a scalar is a 0D array).

# check using `numpy.ndim`
np.ndim(np.mean(u))
#0

0

## broadcasting
If you want to add or subtract two vectors or matrices, linear algebra requires that the two operands have the same dimensions. Furthermore, if you want to multiply two vectors or matrices, linear algebra imposes strict rules on the dimensional compatibility of operands. Fortunately, NumPy uses a trick called __broadcasting__ to virtually expand the smaller operand to dimensions compatible for linear algebra. 

In [14]:
random_floats_between_2_and_3 = random_floats_between_0_and_1 + 2.0 #uses broadcasting to add 2.0 to the value of every item in the vector
print(random_floats_between_2_and_3)

[2.13540765 2.60346664 2.87322472 2.12829712 2.55219474 2.6302995 ]


In [16]:
random_integers_between_150_and_300 = random_integers_between_50_and_100 * 3 #  relies on broadcasting to multiply each cell in a vector by 3
print(random_integers_between_150_and_300)

[285 267 249 174 186 192]


__First rule__: If the arrays do not have the same rank, then a 1 will be prepended to the smaller ranking arrays until their ranks match.

In [17]:
h = np.arange(5).reshape(1, 1, 5)
print(h)
h + [10, 20, 30, 40, 50]  # same as: h + [[[10, 20, 30, 40, 50]]]

[[[0 1 2 3 4]]]


array([[[10, 21, 32, 43, 54]]])

__Second rule__:
Arrays with a 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is repeated along that dimension.

In [20]:
k = np.arange(6).reshape(2, 3)
print(k)
print(k + [[100], [200]])  # same as: k + [[100, 100, 100], [200, 200, 200]]

[[0 1 2]
 [3 4 5]]
[[100 101 102]
 [203 204 205]]


In [21]:
k + [100, 200, 300]  # after rule 1: [[100, 200, 300]], and after rule 2: [[100, 200, 300], [100, 200, 300]]

array([[100, 201, 302],
       [103, 204, 305]])

In [2]:
k + 1000  # same as: k + [[1000, 1000, 1000], [1000, 1000, 1000]]

array([[1000, 1001, 1002],
       [1003, 1004, 1005]])

In [6]:
a = np.array([[1,2,3]])
a.shape

(1, 3)

In [8]:
a = np.array([1,2,3])
a.shape

(3,)

In [7]:
b = np.array([[1],[2],[3]]) # a 'column vector'
b.shape

(3, 1)

__Third rule__: After rules 1 & 2, the sizes of all arrays must match.

In [23]:
try:
    test = k + [33, 44]
    print (test)
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (2,3) (2,) 


In [24]:
try:
    test = k + [33]
    print (test)
except ValueError as e:
    print(e)

[[33 34 35]
 [36 37 38]]


In [25]:
try:
    test = k + [[33],[44]]
    print (test)
except ValueError as e:
    print(e)

[[33 34 35]
 [47 48 49]]


In [26]:
try:
    test = k + [[33,44]]
    print (test)
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (2,3) (1,2) 


In [27]:
try:
    test = k + [[33,44],[33,44]]
    print (test)
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (2,3) (2,2) 


In [28]:
try:
    test = k + [[33,44],[33,44],[33,44]]
    print (test)
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (2,3) (3,2) 


In [29]:
try:
    test = k + [[33,44,55],[33,44,55]]
    print (test)
except ValueError as e:
    print(e)

[[33 45 57]
 [36 48 60]]
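
The three rules can be sketched in plain Python. The helper `broadcast_shape` below is hypothetical (it is not part of NumPy); it only illustrates how the resulting shape could be derived from two input shapes, and it is checked against what NumPy actually produces:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: prepend 1s to the shorter shape until both ranks match
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)   # Rule 2: a length-1 axis stretches to match
        elif x == 1:
            result.append(y)
        else:
            # Rule 3: after rules 1 and 2, the sizes must match
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))     # (2, 3)
print(broadcast_shape((2, 3), (2, 1)))   # (2, 3)
print(broadcast_shape((1, 1, 5), (5,)))  # (1, 1, 5)

# The result agrees with the shape NumPy produces for a real operation
numpy_shape = (np.zeros((2, 3)) + np.zeros((3,))).shape
print(numpy_shape == broadcast_shape((2, 3), (3,)))  # True

try:
    broadcast_shape((2, 3), (2,))
except ValueError as e:
    print(e)
```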


## np.random.rand and np.random.randn
A number of functions are available in NumPy's random module to create ndarrays initialized with random values.

In [30]:
np.random.rand(3,4) 
#3x4 matrix initialized with random floats between 0 and 1 (uniform distribution)

array([[0.4140412 , 0.44716036, 0.71219481, 0.82471148],
       [0.95188033, 0.50110537, 0.8858748 , 0.27287164],
       [0.82408576, 0.85415841, 0.39842833, 0.44376008]])

Here's a 3x4 matrix containing random floats sampled from a univariate normal distribution (Gaussian distribution) of mean 0 and variance 1:

In [31]:
np.random.randn(3,4)

array([[ 1.26990142,  2.30044079, -1.01917088, -0.11369375],
       [-0.50651316,  0.58368117, -0.16175492,  0.33984951],
       [ 0.94770913,  2.60617219, -0.88351474, -0.69602632]])
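
Recent NumPy versions (1.17 and later, which is an assumption about your environment) also offer a `Generator` object created with `np.random.default_rng`. The sketch below shows rough equivalents of the calls above; the seed value is arbitrary and only there to make the run reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen for illustration

uniform = rng.random((3, 4))                       # uniform floats in [0.0, 1.0)
normal = rng.standard_normal((3, 4))               # normal distribution, mean 0, variance 1
integers = rng.integers(low=50, high=101, size=6)  # integers from 50 to 100

print(uniform.shape, normal.shape, integers.shape)

# A generator created with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 4))))  # True
```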

# Reshaping an array
Changing the shape of an ndarray is as simple as setting its shape attribute. However, the array's size must remain the same.

In [10]:
g = np.arange(24)
print(g)
print("Rank:", g.ndim)
g.shape = (6, 4)
print(g)
print("Rank:", g.ndim)
g.shape = (2, 3, 4)
print(g)
print("Rank:", g.ndim)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Rank: 1
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
Rank: 2
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Rank: 3


# reshape
The `reshape` method returns a new ndarray object pointing at the same data (whenever the memory layout allows it). This means that modifying one array will also modify the other.

In [None]:
g2 = g.reshape(4,6)
print(g2)
print("Rank:", g2.ndim)
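
To make the shared-data behavior concrete, here is a small self-contained sketch. It rebuilds `g` so that it runs on its own, and uses `np.shares_memory` to confirm that both arrays point at the same buffer:

```python
import numpy as np

g = np.arange(24).reshape(2, 3, 4)
g2 = g.reshape(4, 6)            # same data, different shape

g2[0, 0] = 999                  # write through the reshaped array
print(g[0, 0, 0])               # 999: the original array sees the change
print(np.shares_memory(g, g2))  # True

# Passing -1 lets NumPy infer one axis length from the array's size
g3 = g.reshape(-1, 8)
print(g3.shape)                 # (3, 8)
```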

# ravel
The `ravel` method returns a new one-dimensional ndarray that also points to the same data (whenever the memory layout allows it).

In [None]:
g.ravel()
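
For comparison, the `flatten` method always returns a copy. The sketch below (with a small array defined locally so it runs on its own) contrasts the two:

```python
import numpy as np

g = np.arange(6).reshape(2, 3)
raveled = g.ravel()      # a view here, because g is contiguous in memory
flattened = g.flatten()  # always an independent copy

raveled[0] = 100         # changes g as well
flattened[1] = 200       # does not touch g
print(g)
print(np.shares_memory(g, raveled), np.shares_memory(g, flattened))  # True False

# For a non-contiguous array such as a transpose, ravel has to copy
transposed = g.T
print(np.shares_memory(transposed, transposed.ravel()))  # False
```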

# Arithmetic operations
Examples of operations are given below. Note that the multiplication is not a matrix multiplication. 

In [None]:
a = np.array([14, 23, 32, 41])
b = np.array([5,  4,  3,  2])
print("a + b  =", a + b)
print("a - b  =", a - b)
print("a * b  =", a * b)
print("a / b  =", a / b)
print("a // b  =", a // b)
print("a % b  =", a % b)
print("a ** b =", a ** b)
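
The difference between elementwise multiplication and matrix multiplication is easiest to see on two small 2D arrays. This is a minimal sketch with made-up values:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # multiplies matching elements
matrix_product = a @ b  # rows of a times columns of b

print(elementwise)     # [[ 5 12] [21 32]]
print(matrix_product)  # [[19 22] [43 50]]
```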

# Conditional operators
The conditional operators also apply elementwise:

In [None]:
m = np.array([20, -5, 30, 40])
m < [15, 16, 35, 36]

In [None]:
m < 25  # equivalent to m < [25, 25, 25, 25]
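
The boolean array returned by a comparison is commonly used as a mask. The following sketch shows a few typical uses; `mask`, `selected` and `replaced` are illustrative names:

```python
import numpy as np

m = np.array([20, -5, 30, 40])
mask = m < 25                       # [ True  True False False]

selected = m[mask]                  # keep only the elements where mask is True
replaced = np.where(mask, 0, m)     # 0 where mask is True, original value elsewhere
count = mask.sum()                  # True counts as 1

print(selected)   # [20 -5]
print(replaced)   # [ 0  0 30 40]
print(count)      # 2
```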

# Mathematical and statistical functions


In [None]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
print(a)
print("mean =", a.mean())

Note that this computes the mean of all elements in the ndarray, regardless of its shape.

In [None]:
for func in (a.min, a.max, a.sum, a.prod, a.std, a.var):
    print(func.__name__, "=", func())

In [None]:
c=np.arange(24).reshape(2,3,4)
c

In [None]:
c.sum(axis=0)  # sum across matrices

In [None]:
c.sum(axis=1)  # sum across rows

In [None]:
c.sum(axis=(0,2))  # sum across matrices and columns
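
One way to keep track of the `axis` argument is to look at the shape of the result: the summed axes disappear. The sketch below recreates `c` locally and also shows `keepdims`, which keeps the summed axis with length 1:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

print(c.sum(axis=0).shape)                 # (3, 4): axis 0 is gone
print(c.sum(axis=1).shape)                 # (2, 4): axis 1 is gone
print(c.sum(axis=(0, 2)))                  # [ 60  92 124]
print(c.sum(axis=1, keepdims=True).shape)  # (2, 1, 4): axis 1 kept with length 1
```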

# Universal functions

In [None]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
print("Original ndarray")
print(a)
for func in (np.abs, np.sqrt, np.square, np.exp, np.log, np.sign, np.ceil, np.modf, np.isnan, np.cos):
    print("\n", func.__name__)
    print(func(a))

# Array indexing

In [None]:
a = np.array([1, 5, 3, 19, 13, 7, 3])
a[3]

In [None]:
a[2:5]

In [None]:
a[2:-1]

In [None]:
a[:2]

In [None]:
a[2::2]

In [None]:
a[::-1]

In [None]:
a[2:5] = [997, 998, 999]
a

In [None]:
b = np.arange(48).reshape(4, 12)
b

In [None]:
b[1, 2]  # row 1, col 2

In [None]:
b[1, :]  # row 1, all columns

In [None]:
b[:, 1]  # all rows, column 1

In [None]:
b[1:2, :]

## Fancy indexing
You may also specify a list of indices that you are interested in. This is referred to as *fancy indexing*.

In [None]:
b[(0,2), 2:5]  # rows 0 and 2, columns 2 to 4 (5-1)

In [None]:
b[:, (-1, 2, -1)]  # all rows, columns -1 (last), 2 and -1 (again, and in this order)

In [None]:
b[(-1, 2, -1, 2), (5, 9, 1, 9)]  # returns a 1D array with b[-1, 5], b[2, 9], b[-1, 1] and b[2, 9] (again)
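
When two index lists are given, NumPy pairs them up element by element, as in the last cell. If you want every combination of the listed rows and columns instead, `np.ix_` is one option. A small sketch, rebuilding `b` so that it runs on its own:

```python
import numpy as np

b = np.arange(48).reshape(4, 12)

# Two index lists are paired up: this selects b[0, 1] and b[2, 4]
pairs = b[[0, 2], [1, 4]]
print(pairs)  # [ 1 28]

# np.ix_ selects the full grid: rows 0 and 2, columns 1, 4 and 7
grid = b[np.ix_([0, 2], [1, 4, 7])]
print(grid)   # [[ 1  4  7] [25 28 31]]
```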

## Higher dimensions

In [None]:
c = b.reshape(4,2,6)
c

## Slices are actually views
ndarray slices are actually views on the same data buffer. This means that if you create a slice and modify it, you are actually going to modify the original ndarray as well!

In [None]:
a_slice = a[2:6]
a_slice[1] = 1000
a  # the original array was modified!

In [None]:
a[3] = 2000
a_slice  # similarly, modifying the original array modifies the slice!
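
If you want an independent copy of the data rather than a view, call the `copy` method. A minimal sketch, using a fresh array so that it runs on its own:

```python
import numpy as np

a = np.array([1, 5, 3, 19, 13, 7, 3])

a_copy = a[2:6].copy()  # independent copy of the slice
a_copy[1] = 1000
print(a)                # the original array is unchanged

a[3] = 2000
print(a_copy)           # the copy is not affected either
```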

# Iterating

In [None]:
c = np.arange(24).reshape(2, 3, 4)  # A 3D array (composed of two 3x4 matrices)
c
for m in c:
    print("Item:")
    print(m)
    
for i in range(len(c)):  # Note that len(c) == c.shape[0]
    print("Item:")
    print(c[i])

If you want to iterate on *all* elements in the `ndarray`, simply iterate over the `flat` attribute:

In [None]:
for i in c.flat:
    if (i>20): 
        print("Item:", i)

# Stacking arrays
It is often useful to stack together different arrays. NumPy offers several functions to do just that. Let's start by creating a few arrays.

In [None]:
q1 = np.full((3,4), 1.0) # fill in a 3 x 4 matrix with 1.0 values 
q2 = np.full((4,4), 2.0) # fill in a 4 x 4 matrix with 2.0 values 
q3 = np.full((3,4), 3.0) # fill in a 3 x 4 matrix with 3.0 values 

## `vstack`
Now let's stack them vertically using `vstack`:

In [None]:
q4 = np.vstack((q1, q2, q3)) # q1, q2 and q3 all have 4 columns, so they can be stacked vertically
q4

In [None]:
q4.shape

## `hstack`
We can also stack arrays horizontally using `hstack`:

In [None]:
q5 = np.hstack((q1, q3)) # q1 and q3 both have 3 rows;  q2 has 4 rows, it cannot be stacked horizontally with q1 and q3
q5

In [None]:
q5.shape 

## `concatenate`
The `concatenate` function stacks arrays along any given existing axis.

In [None]:
q7v = np.concatenate((q1, q2, q3), axis=0)  # Equivalent to vstack
q7v

In [None]:
q7h = np.concatenate((q1, q3), axis=1)  # Equivalent to hstack
q7h

## `stack`
The `stack` function stacks arrays along a new axis. All arrays have to have the same shape.

In [None]:
q8 = np.stack((q1, q3))
q8

In [None]:
q8.shape
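
By default `stack` inserts the new axis at position 0. Its `axis` parameter lets you put the new axis elsewhere, as this sketch shows (the arrays are recreated so the block runs on its own):

```python
import numpy as np

q1 = np.full((3, 4), 1.0)
q3 = np.full((3, 4), 3.0)

print(np.stack((q1, q3)).shape)           # (2, 3, 4): new axis first (default)
print(np.stack((q1, q3), axis=1).shape)   # (3, 2, 4): new axis in the middle
print(np.stack((q1, q3), axis=-1).shape)  # (3, 4, 2): new axis last
```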

# Splitting arrays
Splitting is the opposite of stacking. For example, let's use the `vsplit` function to split a matrix vertically.

First let's create a 6x4 matrix:

In [None]:
r = np.arange(24).reshape(6,4)
r

Now let's split it in three equal parts, vertically:

In [None]:
r1, r2, r3 = np.vsplit(r, 3)
r1

In [None]:
r4, r5 = np.hsplit(r, 2)
r4

There is also a `split` function which splits an array along any given axis. `vsplit` is equivalent to calling `split` with `axis=0`. `hsplit` is equivalent to calling `split` with `axis=1`.
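
Besides a number of equal parts, `split` also accepts a list of indices at which to cut, and `array_split` tolerates parts of unequal size. A short sketch with illustrative variable names:

```python
import numpy as np

r = np.arange(24).reshape(6, 4)

# Cut before row 1 and before row 4: parts of 1, 3 and 2 rows
top, middle, bottom = np.split(r, [1, 4], axis=0)
print(top.shape, middle.shape, bottom.shape)  # (1, 4) (3, 4) (2, 4)

# 6 rows cannot be divided into 4 equal parts, so split would raise an error;
# array_split makes the parts as even as possible instead
parts = np.array_split(r, 4, axis=0)
print([p.shape[0] for p in parts])            # [2, 2, 1, 1]
```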

# Transposing arrays
The `transpose` method creates a new view on an `ndarray`'s data, with axes permuted in the given order.

For example, let's create a 3D array:

In [None]:
t = np.arange(24).reshape(4,2,3)
t

In [None]:
t1 = t.transpose((1,2,0))
t1

In [None]:
t1.shape

The `T` attribute is equivalent to calling `transpose()` when the rank is ≥2:

In [None]:
t.T
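
The rank condition matters in practice: transposing a 1D array does nothing, so it has to be reshaped to 2D first. The following sketch illustrates this; `v` is a made-up example vector:

```python
import numpy as np

t = np.arange(24).reshape(4, 2, 3)
print(t.T.shape)                   # (3, 2, 4): the axes are reversed
print(np.shares_memory(t, t.T))    # True: the transpose is a view

v = np.arange(5)
print(v.T.shape)                   # (5,): no effect on a 1D array
print(v.reshape(1, 5).T.shape)     # (5, 1): a row vector becomes a column vector
```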

## Matrix multiplication
Let's create two matrices and execute a matrix multiplication

In [11]:
n1 = np.arange(10).reshape(2, 5)
n1

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [12]:
n2 = np.arange(15).reshape(5,3)
n2

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [13]:
n1.dot(n2)

array([[ 90, 100, 110],
       [240, 275, 310]])

## Matrix inverse and pseudo-inverse
Many of the linear algebra functions are available in the `numpy.linalg` module, in particular the `inv` function to compute a square matrix's inverse:

In [15]:
import numpy.linalg as linalg

m3 = np.array([[1,2,3],[5,7,11],[21,29,31]])
m3

array([[ 1,  2,  3],
       [ 5,  7, 11],
       [21, 29, 31]])

In [16]:
linalg.inv(m3)

array([[-2.31818182,  0.56818182,  0.02272727],
       [ 1.72727273, -0.72727273,  0.09090909],
       [-0.04545455,  0.29545455, -0.06818182]])

In [17]:
linalg.pinv(m3)

array([[-2.31818182,  0.56818182,  0.02272727],
       [ 1.72727273, -0.72727273,  0.09090909],
       [-0.04545455,  0.29545455, -0.06818182]])
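
For an invertible square matrix, `pinv` gives the same result as `inv`, as seen above. The pseudo-inverse becomes useful for matrices that are not square. A minimal sketch with a made-up 3x2 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3x2, so inv would fail
A_pinv = np.linalg.pinv(A)

print(A_pinv.shape)                          # (2, 3)
print(np.allclose(A_pinv @ A, np.eye(2)))    # True: it acts as a left inverse
print(np.allclose(A @ A_pinv, np.eye(3)))    # False: it is not a right inverse
```

The left-inverse property holds here because the two columns of `A` are linearly independent.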

## Identity matrix
The product of a matrix by its inverse returns the identity matrix (with small floating point errors):

In [18]:
m3.dot(linalg.inv(m3))

array([[ 1.00000000e+00, -1.11022302e-16,  0.00000000e+00],
       [-1.33226763e-15,  1.00000000e+00, -1.11022302e-16],
       [ 2.88657986e-15,  0.00000000e+00,  1.00000000e+00]])

You can create an identity matrix of size NxN by calling `eye`:

In [None]:
np.eye(3)

## Determinant
The `det` function computes the matrix determinant:

In [None]:
linalg.det(m3)  # Computes the matrix determinant
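
The determinant also tells you whether a matrix can be inverted: a determinant of zero means the matrix is singular. The sketch below uses a made-up singular matrix, whose second row is twice the first, to show what happens when `inv` is called on it:

```python
import numpy as np

m3 = np.array([[1, 2, 3], [5, 7, 11], [21, 29, 31]])
det_m3 = np.linalg.det(m3)
print(det_m3)                 # approximately 44

singular = np.array([[1.0, 2.0], [2.0, 4.0]])
det_singular = np.linalg.det(singular)
print(det_singular)           # zero, up to floating-point rounding

inverse_failed = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as e:
    inverse_failed = True
    print("Cannot invert:", e)
```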

## Solving a system of linear scalar equations
The `solve` function solves a system of linear scalar equations, such as:

* $2x + 6y = 6$
* $5x + 3y = -9$

In [19]:
coeffs  = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = linalg.solve(coeffs, depvars)
solution

array([-3.,  2.])

Let's check the solution:

In [20]:
coeffs.dot(solution), depvars  # yep, it's the same

(array([ 6., -9.]), array([ 6, -9]))

# Saving and loading
NumPy makes it easy to save and load `ndarray`s in binary or text format.

In [None]:
a = np.random.rand(2,3)
a

In [None]:
np.save("my_array", a) # save in binary format

In [None]:
np.savetxt("my_array.csv", a) # text format; the default delimiter is a single space

In [None]:
np.savetxt("my_array.csv", a, delimiter=",") # set a different delimiter

In [None]:
a_loaded = np.loadtxt("my_array.csv", delimiter=",")
a_loaded
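
The binary file written by `np.save` gets a `.npy` extension and can be read back with `np.load`. Several arrays can be stored in one `.npz` archive with `np.savez`. The sketch below writes to a temporary directory so that it leaves no files behind; the file names and the keys `first` and `second` are arbitrary choices:

```python
import os
import tempfile

import numpy as np

a = np.random.rand(2, 3)
b = np.arange(5)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Single array in binary .npy format
    npy_path = os.path.join(tmp_dir, "my_array.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # Several arrays in one .npz archive, retrieved by keyword name
    npz_path = os.path.join(tmp_dir, "my_arrays.npz")
    np.savez(npz_path, first=a, second=b)
    with np.load(npz_path) as archive:
        first_loaded = archive["first"]
        second_loaded = archive["second"]

print(np.array_equal(a, a_loaded))      # True
print(np.array_equal(b, second_loaded)) # True
```

Unlike the text format, the binary format preserves the exact values and the data type of the array.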