<img src="assets/logo.png">

Made by **Viktor Varga**, **Gulyás János Adrián**, **Ellák Somfai**, **Balázs Nagy**

[<img src="assets/open_button.png">](https://colab.research.google.com/github/Fortuz/edu_MethodsAndTools/blob/main/practices/P02b_Numpy_2.ipynb)

# Python tutorial - Numpy, chapter 2

## Operations performed along certain axes

These operations compute new values from slices taken along one or more axes of an array. They all have an `axis` parameter, which specifies the axis along which the operation is performed.

For example, calling `np.sum()` on a (2,3) shaped array with `axis=0` sums the elements along axis #0, while calling it with `axis=1` does the same along axis #1. In the former case the result has shape (3,), in the latter case shape (2,). Multiple axes can be given with the `axis` parameter (by listing them in a tuple), in which case none of those axes appear in the result, as summation happens along all the given axes. If we call `np.sum()` without the `axis` parameter, it sums the entire array, resulting in a scalar (the function is then called with the default `axis=None` argument).

In [1]:
import numpy as np

a = np.arange(6, dtype=np.int32).reshape((2,3))
print("The 2D 'a' array:\n", a)
print("   ... its shape is", a.shape)

print("\nSumming array along axis#0: ", np.sum(a, axis=0))
print("Summing array along axis#1: ", np.sum(a, axis=1))
print("Summing array along axis#0 and axis#1: ", np.sum(a, axis=(0,1)))
print("Summing whole array: ", np.sum(a))

The 2D 'a' array:
 [[0 1 2]
 [3 4 5]]
   ... its shape is (2, 3)

Summing array along axis#0:  [3 5 7]
Summing array along axis#1:  [ 3 12]
Summing array along axis#0 and axis#1:  15
Summing whole array:  15


Other similar operations: `np.prod()`, `np.mean()`, `np.std()`, `np.amax()`, `np.amin()`

In [2]:
a = np.arange(6, dtype=np.int32).reshape((2,3))
print("The 2D 'a' array:\n", a)
print("   ... its shape is", a.shape)

print("\nProducts of array along axis#1: ", np.prod(a, axis=1))
print("Mean of array along axis#0: ", np.mean(a, axis=0))
print("Standard deviation of whole array: ", np.std(a))

# np.maximum() is the elementwise maximum of multiple arrays
# np.amax() is the maximum along the axis/axes of a single array

print("\nMaximum and minimum of array along axis#0: ", np.amax(a, axis=0), "and", np.amin(a, axis=0))


The 2D 'a' array:
 [[0 1 2]
 [3 4 5]]
   ... its shape is (2, 3)

Products of array along axis#1:  [ 0 60]
Mean of array along axis#0:  [1.5 2.5 3.5]
Standard deviation of whole array:  1.707825127659933

Maximum and minimum of array along axis#0:  [3 4 5] and [0 1 2]


Logical operations include: `np.all()`, `np.any()`

In [3]:
a = np.arange(6, dtype=np.int32).reshape((2,3))
b = (a % 2 == 1) | (a < 4)
print("The 2D 'b' boolean array:\n", b)
print("   ... its shape is", b.shape)

print("\nLogical OR along axis#0: ", np.any(b, axis=0))
print("Logical OR on whole array: ", np.any(b))
print("Logical AND along axis#1: ", np.all(b, axis=1))
print("Logical AND on whole array: ", np.all(b))



The 2D 'b' boolean array:
 [[ True  True  True]
 [ True False  True]]
   ... its shape is (2, 3)

Logical OR along axis#0:  [ True  True  True]
Logical OR on whole array:  True
Logical AND along axis#1:  [ True False]
Logical AND on whole array:  False


## Advanced indexing

In **advanced indexing**, similarly to **basic indexing**, we reference multiple elements/slices of an array. However, while **basic indexing** can only reference elements in an index range with a fixed step size, **advanced indexing** offers two other methods.

First, we can use sequences (lists or integer arrays) for indexing.

Arrays created using **advanced indexing** are always **copies** of the selected data, not views of the original array.

In [4]:
a = np.arange(6, dtype=np.int32)+10
print("The 'a' array:", a)
print("   ... its shape is", a.shape)

b = a[[1,3,3,-5]]   # indexing with a list of indices, negative indices are counted from the end
print("\nThe 'b' array: ", b)

# writing the 'b' array does not modify the original 'a' array
b[0] = 42
print("\nThe modified 'b' array: ", b)
print("The original 'a' array after modifying 'b': ", a)



The 'a' array: [10 11 12 13 14 15]
   ... its shape is (6,)

The 'b' array:  [11 13 13 11]

The modified 'b' array:  [42 13 13 11]
The original 'a' array after modifying 'b':  [10 11 12 13 14 15]


Overwriting elements with advanced indexing:

In [5]:
a = np.arange(6, dtype=np.int32)+10
print("The 'a' array:", a)
print("   ... its shape is", a.shape)

a[[1,5,2]] = 99
print("\nThe modified 'a' array:", a)

The 'a' array: [10 11 12 13 14 15]
   ... its shape is (6,)

The modified 'a' array: [10 99 99 13 14 99]


We can use an integer type array for indexing.

In [6]:
a = np.arange(6, dtype=np.int32)+10
print("The 'a' array:", a)
print("   ... its shape is", a.shape)

idxs = np.array([0,2,0,-1,-4], dtype=np.int32)
print("\nThe 'a' array indexed with 'idxs' array:", a[idxs])


The 'a' array: [10 11 12 13 14 15]
   ... its shape is (6,)

The 'a' array indexed with 'idxs' array: [10 12 10 15 12]


If we index a 1-dimensional array with a multidimensional index array, the result will also be multidimensional.

In [25]:
a = np.arange(6, dtype=np.int32)+10
print("The 'a' array:", a)
print("   ... its shape is", a.shape)

idxs = np.array([[2,3],[5,2],[1,1]], dtype=np.int32)
print("\nThe 'idxs' array:\n", idxs)
print("   ... its shape is", idxs.shape)

b = a[idxs]
print("\nThe 'a' array indexed with the 2D 'idxs' array:\n", b)
print("   ... its shape is", b.shape)

The 'a' array: [10 11 12 13 14 15]
   ... its shape is (6,)

The 'idxs' array:
 [[2 3]
 [5 2]
 [1 1]]
   ... its shape is (3, 2)

The 'a' array indexed with the 2D 'idxs' array:
 [[12 13]
 [15 12]
 [11 11]]
   ... its shape is (3, 2)


If we index a multidimensional array, we can use a different indexing technique along each axis. If we create a new array using indexing and use advanced indexing along at least one axis, then the new array will be a copy of the selected part of the original array, not a view of it.

If we'd like to use a sequence to index along multiple axes, then the length of our two sequences must be equal. (If we use a multidimensional array for indexing along multiple axes, then the two index arrays must be broadcastable to the same shape).

Full documentation at: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

If we'd like to select a list of elements from a 2-dimensional array, we need two sequences of equal length: one to index axis #0 and one to index axis #1. For example, if we'd like to return the `a[0,1], a[2,2], a[0,5]` elements, then we need to index as follows: `a[[0,2,0], [1,2,5]]`.

In [8]:
a = np.arange(6, dtype=np.int32).reshape((2,3))+10
print("The 'a' array:\n", a)
print("   ... its shape is", a.shape)

idxs0 = [0,1,0]
idxs1 = [1,2,2]

b = a[idxs0, idxs1]  # extracting a[0,1], a[1,2], a[0,2] into the 'b' array


print("\nThe 'b' array:", b)
print("   ... its shape is", b.shape)

# we can also use a tuple of lists/arrays/... to do multi-dimensional indexing
# in fact, this is the same thing as above, due to automatic tuple-packing
idxs = (idxs0, idxs1)
b = a[idxs]
print("\nThe 'b' array:", b)
print("   ... its shape is", b.shape)

# advanced indexing only along a single axis: extracting slices from 'a'
c = a[[0,1,0],:]
print("\nThe 'c' array:\n", c)
print("   ... its shape is", c.shape)

# advanced indexing along one axis, basic indexing along the other
d = a[[0,1,0],::2]
print("\nThe 'd' array:\n", d)
print("   ... its shape is", d.shape)

# e = a[[0,1,0],[0,2]]  # but we can't do this:
#   IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)

# but we can do this: reshape #1 axis index array to (2,1) to be broadcastable to shape (2,3)
#     and match #0 axis index array shape (3,)
# indexed elements this way: [[a[0,0], a[1,0], a[0,0]], [a[0,2], a[1,2], a[0,2]]]
f = a[np.array([0,1,0]), np.array([0,2])[:,None]]
print("\nThe 'f' array:\n", f)
print("   ... its shape is", f.shape)

# NOTE
# Indexing with two slices works differently from indexing with two index arrays.
# With two slices, the direct product of their indices is selected.
# In contrast, two arrays form a list of index pairs, each pair selecting a single element.
a = np.arange(9, dtype=np.int32).reshape((3,3))
print("\nThe 'a' array:\n", a)

d1 = a[0::2, 0::2]
print("The 'd1' array:\n", d1)
d2 = a[[0,2], [0,2]]
print("The 'd2' array:\n", d2)

The 'a' array:
 [[10 11 12]
 [13 14 15]]
   ... its shape is (2, 3)

The 'b' array: [11 15 12]
   ... its shape is (3,)

The 'b' array: [11 15 12]
   ... its shape is (3,)

The 'c' array:
 [[10 11 12]
 [13 14 15]
 [10 11 12]]
   ... its shape is (3, 3)

The 'd' array:
 [[10 12]
 [13 15]
 [10 12]]
   ... its shape is (3, 2)

The 'f' array:
 [[10 13 10]
 [12 15 12]]
   ... its shape is (2, 3)

The 'a' array:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
The 'd1' array:
 [[0 2]
 [6 8]]
The 'd2' array:
 [0 8]


Instead of using multidimensional indexing on a multidimensional array, we can flatten the array to a single dimension (`ndarray.reshape(-1)`) and convert the multidimensional indices into 1-dimensional indices (`np.ravel_multi_index()`). The indexing can then be done on the flattened array. This technique can be useful if we want to perform an operation on our multidimensional array that is implemented only for one-dimensional arrays. Afterwards, the array's original shape can be restored with `ndarray.reshape(<original_shape>)`, and the indices referencing elements of the flattened array can be converted back to multidimensional indices (`np.unravel_index()`).
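
The flatten-and-convert technique described above could look roughly like the sketch below. The array contents and the variable names (`rows`, `cols`, `flat_idxs`) are made up for illustration.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape((2, 3)) + 10   # [[10 11 12], [13 14 15]]

# multidimensional indices of the elements a[0,1], a[1,2], a[0,2]
rows = np.array([0, 1, 0])
cols = np.array([1, 2, 2])

# convert the (row, column) pairs into indices of the flattened array
flat_idxs = np.ravel_multi_index((rows, cols), a.shape)
print("Flat indices:", flat_idxs)                   # [1 5 2]

# index the flattened array with the 1-dimensional indices
a_flat = a.reshape(-1)
print("Selected elements:", a_flat[flat_idxs])      # [11 15 12]

# convert the flat indices back to multidimensional indices
rows_back, cols_back = np.unravel_index(flat_idxs, a.shape)
print("Recovered indices:", rows_back, cols_back)   # [0 1 0] [1 2 2]

# restore the original shape
a_restored = a_flat.reshape(a.shape)
print("Restored shape:", a_restored.shape)          # (2, 3)
```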

Another method of **advanced indexing** is **indexing with a boolean mask**. Masking can also happen along one or more axes. The mask's shape must match the array's shape along the axes it indexes.

In [9]:
a = np.arange(6, dtype=np.int32).reshape((2,3))
print("The 'a' array:\n", a)
print("   ... its shape is", a.shape)

mask = a < 4
print("\nThe 'mask' array:\n", mask)
print("   ... its shape is", mask.shape)
print("   ... its data type is", mask.dtype)

b = a[mask]  # this is a copy, not a view!
print("\nThe 'b' array:\n", b)
print("   ... its shape is", b.shape)



The 'a' array:
 [[0 1 2]
 [3 4 5]]
   ... its shape is (2, 3)

The 'mask' array:
 [[ True  True  True]
 [ True False False]]
   ... its shape is (2, 3)
   ... its data type is bool

The 'b' array:
 [0 1 2 3]
   ... its shape is (4,)


Overwriting elements selected using a mask:

In [10]:
a = np.arange(6, dtype=np.int32).reshape((2,3))
print("The 'a' array:\n", a)
print("   ... its shape is", a.shape)

# Select all elements that are even or larger than 3
a[(a % 2 == 0) | (a > 3)] = 99

print("\nThe modified 'a' array:\n", a)

The 'a' array:
 [[0 1 2]
 [3 4 5]]
   ... its shape is (2, 3)

The modified 'a' array:
 [[99  1 99]
 [ 3 99 99]]


It's possible to apply a mask along selected axes only.

Let's zero out all the rows of array `a` that have at least one negative number!

In [11]:
a = np.array([[1.2,2.5],[1.,-.6],[2.8,1.7],[-1.5,.7]], dtype=np.float32)
print("The 'a' array:\n", a)
print("   ... its shape is", a.shape)

#a[np.any(a < 0., axis=1),:] = 0

#print("\nThe modified 'a' array:\n", a)
#print("   ... its shape is", a.shape)


The 'a' array:
 [[ 1.2  2.5]
 [ 1.  -0.6]
 [ 2.8  1.7]
 [-1.5  0.7]]
   ... its shape is (4, 2)


In [12]:
a

array([[ 1.2,  2.5],
       [ 1. , -0.6],
       [ 2.8,  1.7],
       [-1.5,  0.7]], dtype=float32)

In [13]:
a<0

array([[False, False],
       [False,  True],
       [False, False],
       [ True, False]])

In [14]:
np.any(a<0, axis=1)

array([False,  True, False,  True])

In [15]:
a[np.any(a < 0., axis=1),:] = 0

print("\nThe modified 'a' array:\n", a)
print("   ... its shape is", a.shape)



The modified 'a' array:
 [[1.2 2.5]
 [0.  0. ]
 [2.8 1.7]
 [0.  0. ]]
   ... its shape is (4, 2)


It's possible to use sequences and masks for indexing at the same time on different axes of an array.

A sequence can be created from a mask using the `np.where()` function.
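
As a small sketch of the two statements above, the example below reuses the `a` array from the previous cells. The names `row_mask`, `row_idxs` and `col_idxs` are chosen only for this illustration.

```python
import numpy as np

a = np.array([[1.2, 2.5], [1., -.6], [2.8, 1.7], [-1.5, .7]], dtype=np.float32)

# boolean mask along axis #0: rows containing at least one negative number
row_mask = np.any(a < 0., axis=1)
print("Row mask:", row_mask)        # [False  True False  True]

# np.where() with a single argument returns a tuple with one index array per axis
row_idxs = np.where(row_mask)[0]
print("Row indices:", row_idxs)     # [1 3]

# indexing with the index sequence selects the same rows as the mask
print(np.array_equal(a[row_idxs], a[row_mask]))   # True

# a mask along axis #0 and a sequence along axis #1 at the same time:
# the True positions of the mask are paired with the column indices,
# so this selects the elements a[1,1] and a[3,0]
col_idxs = [1, 0]
print(a[row_mask, col_idxs])

# the same selection written with the index sequence instead of the mask
print(a[row_idxs, col_idxs])
```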

## Concatenating arrays, inserting, deleting.

It's important to know that once a Numpy array is allocated in memory, its size cannot be changed. If we'd like to append or delete an element/row/column/etc., or concatenate multiple arrays, Numpy must always create a new array, which can be a costly operation.

**Concatenating:**

`np.concatenate()`: Multiple arrays are concatenated along an already **existing** axis. All the axes, except the one used for concatenation, must be of equal length.

`np.stack()`: Concatenates multiple arrays along a **new** axis. All arrays must have the same shape.

In [16]:
# CONCATENATE

a = np.arange(6, dtype=np.float32).reshape((2,3))
print("The 2D 'a' array:\n", a)
print("   ... its shape is", a.shape)

b = np.zeros((2,4), dtype=np.float32)
print("\nThe 2D 'b' array:\n", b)
print("   ... its shape is", b.shape)

c = np.concatenate([a, b], axis=-1)
print("\nThe two arrays concatenated along their last axis:",
      '\nc = np.concatenate([a, b], axis=-1)\n', c)
print("   ... the concatenated array shape is", c.shape)

# STACK: all arrays must have the same shape

d = b[:,:3]
print("\nThe 2D 'd' array:",
      "\nd = b[:,:3]\n", d)
print("   ... its shape is", d.shape)

e = np.stack([a, d], axis=0)
print("\nArrays 'a' and 'd' stacked along a new #0 axis:"
      "\ne = np.stack([a, d], axis=0)\n", e)
print("   ... the new shape is", e.shape)

f = np.stack([a, d], axis=-1)
print("\nArrays 'a' and 'd' stacked along a new last (#2) axis:"
      "\nf = np.stack([a, d], axis=-1)\n", f)
print("   ... the new shape is", f.shape)


The 2D 'a' array:
 [[0. 1. 2.]
 [3. 4. 5.]]
   ... its shape is (2, 3)

The 2D 'b' array:
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
   ... its shape is (2, 4)

The two arrays concatenated along their last axis: 
c = np.concatenate([a, b], axis=-1)
 [[0. 1. 2. 0. 0. 0. 0.]
 [3. 4. 5. 0. 0. 0. 0.]]
   ... the concatenated array shape is (2, 7)

The 2D 'd' array: 
d = b[:,:3]
 [[0. 0. 0.]
 [0. 0. 0.]]
   ... its shape is (2, 3)

Arrays 'a' and 'd' stacked along a new #0 axis:
e = np.stack([a, d], axis=0)
 [[[0. 1. 2.]
  [3. 4. 5.]]

 [[0. 0. 0.]
  [0. 0. 0.]]]
   ... the new shape is (2, 2, 3)

Arrays 'a' and 'd' stacked along a new last (#2) axis:
f = np.stack([a, d], axis=-1)
 [[[0. 0.]
  [1. 0.]
  [2. 0.]]

 [[3. 0.]
  [4. 0.]
  [5. 0.]]]
   ... the new shape is (2, 3, 2)


**Inserting and deleting elements:**

If we'd like to use a loop to insert multiple elements/slices into an array, it is usually much more efficient to collect the elements/slices in a list and create the array from the list at the end, for example with concatenation.
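
A minimal sketch of the two patterns, assuming we want to build an array row by row. The loop contents are placeholders made up for the example.

```python
import numpy as np

# slower pattern: every iteration creates a new, larger copy of the array
arr = np.empty((0, 3), dtype=np.float32)
for i in range(4):
    row = np.full((1, 3), i, dtype=np.float32)
    arr = np.concatenate([arr, row], axis=0)

# usually faster pattern: collect the slices in a list, build the array once
rows = []
for i in range(4):
    rows.append(np.full((1, 3), i, dtype=np.float32))
arr_from_list = np.concatenate(rows, axis=0)

print(arr_from_list.shape)                  # (4, 3)
print(np.array_equal(arr, arr_from_list))   # True
```

The difference is hardly noticeable for four rows, but the first pattern copies all previously added data in each iteration, so its cost grows quickly with the number of insertions.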

For inserting a single element/slice we can use the `np.append()` and `np.insert()` operations.

With the `np.pad()` operation, we can add new elements to the edges of an array, along multiple axes if needed.

Deleting elements is usually done with correct indexing instead of `np.delete()`.
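
The operations mentioned above could be used as in the following sketch. The inserted and padded values are arbitrary, and the `keep` mask is just one possible way of deleting a column with indexing.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape((2, 3))   # [[0 1 2], [3 4 5]]

# np.append(): add a new row at the end of axis #0 (returns a new array)
b = np.append(a, [[6, 7, 8]], axis=0)
print(b.shape)    # (3, 3)

# np.insert(): insert a column filled with -1 before the column with index 1
c = np.insert(a, 1, -1, axis=1)
print(c)          # rows: [0 -1 1 2] and [3 -1 4 5]

# np.pad(): one zero row before and after axis #0, two zero columns after axis #1
d = np.pad(a, ((1, 1), (0, 2)), mode="constant", constant_values=0)
print(d.shape)    # (4, 5)

# deleting column #1 with a boolean mask ...
keep = np.ones(a.shape[1], dtype=bool)
keep[1] = False
e = a[:, keep]
print(e)          # rows: [0 2] and [3 5]

# ... gives the same result as np.delete()
print(np.array_equal(e, np.delete(a, 1, axis=1)))   # True
```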

## Matrix and vector operations, linear algebra

Matrix multiplication:

In [17]:
m = np.arange(9, dtype=np.float32).reshape((3,3))   # 3 by 3 matrix
print("The 'm' matrix:\n", m)
print("   ... its shape is", m.shape)

mm = np.matmul(m, m)

print("\nThe matrix product is:\n", mm)
print("   ... its shape is", mm.shape)

mm = np.dot(m, m)    # dot product: when applying on 2D matrices, has the same effect as np.matmul()

print("\nThe matrix product is:\n", mm)
print("   ... its shape is", mm.shape)

mm2 = m @ m    # new in Python 3.5
print("\nThe matrix product is:\n", mm2)

The 'm' matrix:
 [[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
   ... its shape is (3, 3)

The matrix product is:
 [[ 15.  18.  21.]
 [ 42.  54.  66.]
 [ 69.  90. 111.]]
   ... its shape is (3, 3)

The matrix product is:
 [[ 15.  18.  21.]
 [ 42.  54.  66.]
 [ 69.  90. 111.]]
   ... its shape is (3, 3)

The matrix product is:
 [[ 15.  18.  21.]
 [ 42.  54.  66.]
 [ 69.  90. 111.]]


Non-square matrix, $M M^T$

In [18]:
m = np.arange(6, dtype=np.float32).reshape((2,3))   # 2 by 3 matrix
print("The 'm' matrix:\n", m)
print("   ... its shape is", m.shape)

mm = np.matmul(m, m.T)   # transpose: m.T or np.transpose(m)

print("\nThe matrix product is:\n", mm)
print("   ... its shape is", mm.shape)

The 'm' matrix:
 [[0. 1. 2.]
 [3. 4. 5.]]
   ... its shape is (2, 3)

The matrix product is:
 [[ 5. 14.]
 [14. 50.]]
   ... its shape is (2, 2)


Multiplying a vector with a matrix:

In [19]:
m = np.arange(6, dtype=np.float32).reshape((2,3))   # 2 by 3 matrix
print("The 'm' matrix:\n", m)
print("   ... its shape is", m.shape)
v1 = np.arange(3, dtype=np.float32)
print("The 'v1' vector:\n", v1)
print("   ... its shape is", v1.shape)

# (2, 3) x (3,) -> (2,)

r1 = np.dot(m, v1)
print("\nThe result vector:\n", r1)
print("   ... its shape is", r1.shape)

# (2,) x (2, 3) -> (3,)

v2 = v1[:2]
print("The 'v2' vector:\n", v2)
print("   ... its shape is", v2.shape)

r2 = np.dot(v2, m)
print("\nThe result vector:\n", r2)
print("   ... its shape is", r2.shape)

The 'm' matrix:
 [[0. 1. 2.]
 [3. 4. 5.]]
   ... its shape is (2, 3)
The 'v1' vector:
 [0. 1. 2.]
   ... its shape is (3,)

The result vector:
 [ 5. 14.]
   ... its shape is (2,)
The 'v2' vector:
 [0. 1.]
   ... its shape is (2,)

The result vector:
 [3. 4. 5.]
   ... its shape is (3,)


Inner product of two vectors:

In [20]:
v1 = np.arange(3)
v2 = np.arange(3)
print("\nThe 'v1' vector:", v1)
print("   ... its shape is", v1.shape)
print("The 'v2' vector:", v2)
print("   ... its shape is", v2.shape)

print("\nTheir dot product is: ", np.dot(v1, v2))


The 'v1' vector: [0 1 2]
   ... its shape is (3,)
The 'v2' vector: [0 1 2]
   ... its shape is (3,)

Their dot product is:  5


Other important operations:

In [21]:
m = np.array([[2., 1., 3.],[4., 2., 6.],[1., 1., 5.]], dtype=np.float32)
print("The 'm' matrix:\n", m)
print("   ... its shape is", m.shape)

print("\nRank of matrix:", np.linalg.matrix_rank(m)) # not full rank since row#1 == 2*row#0

print("Determinant of matrix:", np.linalg.det(m))  # zero since it is not full rank

v1 = np.arange(3, dtype=np.float32)
print("\nThe 'v1' vector:\n", v1)
print("   ... its shape is", v1.shape)
print("\nLength of vector 'v1' (L2 norm):", np.linalg.norm(v1, ord=2))   # ord=2 is the default for vectors (Euclidean norm)

print("\nLength of row-vectors of 'm' matrix:", np.linalg.norm(m, axis=1, ord=2))

# euclidean distance of two points
two_points = np.array([[1.,2.,-2.],[5., 3., 2.]])
print("\nTwo points:\n", two_points)

vec_length = np.linalg.norm(two_points[0,:] - two_points[1,:], ord=2)
print("  ... their distance:", vec_length)

The 'm' matrix:
 [[2. 1. 3.]
 [4. 2. 6.]
 [1. 1. 5.]]
   ... its shape is (3, 3)

Rank of matrix: 2
Determinant of matrix: 0.0

The 'v1' vector:
 [0. 1. 2.]
   ... its shape is (3,)

Length of vector 'v1' (L2 norm): 2.236068

Length of row-vectors of 'm' matrix: [3.7416575 7.483315  5.196152 ]

Two points:
 [[ 1.  2. -2.]
 [ 5.  3.  2.]]
  ... their distance: 5.744562646538029


In [22]:
a=np.array([[1,2,3], [2.4, 5, 3]])
a  # two vectors, 3 items long each
# L2 norm of each row: square root of the sum of squares along axis #1
np.sqrt(np.sum(a**2,axis=1))

array([3.74165739, 6.30555311])

## Other useful Numpy operations

Sorting, counting, searching:

https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.sort.html
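
A few of the sorting, counting and searching routines are sketched below on a small, made-up array. This is only a selection, the linked page lists more.

```python
import numpy as np

a = np.array([3, 1, 2, 3, 0, 1], dtype=np.int32)

a_sorted = np.sort(a)               # sorted copy, 'a' itself is unchanged
print(a_sorted)                     # [0 1 1 2 3 3]

order = np.argsort(a)               # indices that would sort the array
print(order[0])                     # index of the smallest element: 4

print(np.argmax(a))                 # index of the first largest element: 0
print(np.count_nonzero(a > 1))      # number of elements larger than 1: 3

# position where the value 2 would be inserted into the sorted array
print(np.searchsorted(a_sorted, 2))   # 3
```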

Set operations:

https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.set.html
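
Some of the set operations, sketched on two small arrays invented for the example:

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])
b = np.array([2, 4, 2, 5])

# unique values (sorted) and how many times each of them occurs
values, counts = np.unique(a, return_counts=True)
print(values, counts)          # [1 2 3] [2 1 2]

print(np.union1d(a, b))        # [1 2 3 4 5]
print(np.intersect1d(a, b))    # [2]
print(np.setdiff1d(a, b))      # values of 'a' that are not in 'b': [1 3]

# elementwise membership test: which elements of 'a' appear in 'b'
print(np.isin(a, b))           # [False False  True False False]
```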

`np.apply_along_axis()`: This operation applies a function that is defined for 1-dimensional arrays to the 1-dimensional slices of a multidimensional array, then concatenates the results. Although this function may be useful at times, we must know that it is about as fast as writing a simple Python loop that keeps calling the function on the slices of the array and then concatenates the results. So **if we want to perform the inner operation on many small slices, this method is not efficient.**
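
The comparison can be sketched as follows. `value_range()` is a hypothetical helper written only for this example, and the vectorized variant assumes that the inner operation can be expressed with operations that have an `axis` parameter.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape((2, 3))

def value_range(v):
    # hypothetical helper for illustration, defined for 1-dimensional arrays only
    return v.max() - v.min()

# calls value_range() once for each row of 'a'
r1 = np.apply_along_axis(value_range, 1, a)

# the equivalent explicit Python loop over the rows
r2 = np.array([value_range(row) for row in a])

# vectorized alternative: no Python-level loop over the slices
r3 = np.amax(a, axis=1) - np.amin(a, axis=1)

print(r1, r2, r3)   # each of them is [2. 2.]
```

Where a vectorized formulation like `r3` exists, it is usually the better choice for arrays with many small slices.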