# Module 5 - Numerical Python II


NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays (source: https://en.wikipedia.org/wiki/NumPy).

NumPy is a Python library, so it comes as a collection of Python modules.
- More about NumPy: https://www.w3schools.com/python/numpy/default.asp



In [1]:
import numpy as np
np.set_printoptions(precision=2)

## Other - Selected NumPy Functions

### Sorting

In [2]:
#
# Suppose we have a ndarray
#
data = np.random.rand(10, 4)
data

array([[0.9 , 0.  , 0.01, 0.2 ],
       [0.35, 0.52, 0.53, 0.4 ],
       [0.36, 0.01, 0.2 , 0.27],
       [0.07, 0.02, 0.74, 0.52],
       [0.61, 0.67, 0.48, 0.22],
       [0.48, 0.26, 1.  , 0.77],
       [0.22, 0.98, 0.73, 0.17],
       [0.58, 0.71, 0.9 , 0.16],
       [0.08, 0.17, 0.29, 0.32],
       [0.39, 0.01, 0.07, 0.64]])

- We can sort the data with `numpy.sort`, which returns a sorted copy and leaves `data` unchanged.

In [3]:
#
# Sort each row by sorting along the column axis.
# The default sorting axis is the last axis (axis=-1)
#
np.sort(data)

array([[0.  , 0.01, 0.2 , 0.9 ],
       [0.35, 0.4 , 0.52, 0.53],
       [0.01, 0.2 , 0.27, 0.36],
       [0.02, 0.07, 0.52, 0.74],
       [0.22, 0.48, 0.61, 0.67],
       [0.26, 0.48, 0.77, 1.  ],
       [0.17, 0.22, 0.73, 0.98],
       [0.16, 0.58, 0.71, 0.9 ],
       [0.08, 0.17, 0.29, 0.32],
       [0.01, 0.07, 0.39, 0.64]])

In [4]:
#
# Sort each column by sorting along the row axis
#
np.sort(data, axis=0)

array([[0.07, 0.  , 0.01, 0.16],
       [0.08, 0.01, 0.07, 0.17],
       [0.22, 0.01, 0.2 , 0.2 ],
       [0.35, 0.02, 0.29, 0.22],
       [0.36, 0.17, 0.48, 0.27],
       [0.39, 0.26, 0.53, 0.32],
       [0.48, 0.52, 0.73, 0.4 ],
       [0.58, 0.67, 0.74, 0.52],
       [0.61, 0.71, 0.9 , 0.64],
       [0.9 , 0.98, 1.  , 0.77]])
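
The effect of the `axis` argument is easier to verify on a tiny array. The following is a minimal sketch; the array `a` and the variable names are made up for illustration. It also contrasts `numpy.sort`, which returns a copy, with the in-place method `ndarray.sort`.

```python
import numpy as np

# Small hand-written array (illustrative values, not from the cells above).
a = np.array([[3, 1, 2],
              [0, 7, 1]])

# Default axis=-1: every row is sorted independently.
rows_sorted = np.sort(a)          # [[1, 2, 3], [0, 1, 7]]

# axis=0: every column is sorted independently.
cols_sorted = np.sort(a, axis=0)  # [[0, 1, 1], [3, 7, 2]]

# np.sort returned copies, so `a` itself is unchanged at this point.
# The method form sorts in place and returns None.
b = a.copy()
b.sort(axis=1)                    # b now equals rows_sorted

print(rows_sorted)
print(cols_sorted)
```

Note that sorting rows or columns independently breaks the association between values that originally sat in the same row, which is why `numpy.argsort` is needed when rows must stay intact.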

### Argsort

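- `numpy.argsort` returns the indices that would sort the ndarray: position `k` of the result holds the index of the `k`-th smallest element. Like `numpy.sort`, it works along the last axis by default.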

In [7]:
np.argsort(data)

array([[1, 2, 3, 0],
       [0, 3, 1, 2],
       [1, 2, 3, 0],
       [1, 0, 3, 2],
       [3, 2, 0, 1],
       [1, 0, 3, 2],
       [3, 0, 2, 1],
       [3, 0, 1, 2],
       [0, 1, 2, 3],
       [1, 2, 0, 3]])
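
One way to check what these indices mean is to apply them back to the array. The sketch below uses a small hand-written array (an assumption for illustration) and `numpy.take_along_axis` to show that indexing each row with its own argsort result reproduces `numpy.sort`.

```python
import numpy as np

# Illustrative values; any 2-D array would do.
data = np.array([[0.90, 0.00, 0.20],
                 [0.35, 0.52, 0.40]])

# Per-row indices that would sort each row.
idx = np.argsort(data)            # [[1, 2, 0], [0, 2, 1]]

# Pick elements row by row using those indices.
rebuilt = np.take_along_axis(data, idx, axis=1)

# Same result as sorting each row directly.
print(np.array_equal(rebuilt, np.sort(data)))  # True
```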

- `numpy.argsort` is most useful on a one-dimensional ndarray. Suppose we want to sort the rows of `data` by the **first** column while keeping each row intact.

In [8]:
#
# This is the first column
#
data[:, 0]

array([0.9 , 0.35, 0.36, 0.07, 0.61, 0.48, 0.22, 0.58, 0.08, 0.39])

In [9]:
#
# We can sort it, but keep the index positions instead of the values
#
sorted_idx = np.argsort(data[:, 0])
sorted_idx

array([3, 8, 6, 1, 2, 9, 5, 7, 4, 0])

In [10]:
#
# Now, we can use `sorted_idx` to rearrange data
#
data[sorted_idx, :]

array([[0.07, 0.02, 0.74, 0.52],
       [0.08, 0.17, 0.29, 0.32],
       [0.22, 0.98, 0.73, 0.17],
       [0.35, 0.52, 0.53, 0.4 ],
       [0.36, 0.01, 0.2 , 0.27],
       [0.39, 0.01, 0.07, 0.64],
       [0.48, 0.26, 1.  , 0.77],
       [0.58, 0.71, 0.9 , 0.16],
       [0.61, 0.67, 0.48, 0.22],
       [0.9 , 0.  , 0.01, 0.2 ]])

In [11]:
#
# Let's sort in descending order.  This can be done by argsorting
# the negated column `-data[:, 0]` instead of `data[:, 0]`.
#
rev_sorted_idx = np.argsort(-data[:, 0])
rev_sorted_idx

array([0, 4, 7, 5, 9, 2, 1, 6, 8, 3])

In [12]:
#
# Rearrange the rows of data so that the first column is in descending order
#
data[rev_sorted_idx,:]

array([[0.9 , 0.  , 0.01, 0.2 ],
       [0.61, 0.67, 0.48, 0.22],
       [0.58, 0.71, 0.9 , 0.16],
       [0.48, 0.26, 1.  , 0.77],
       [0.39, 0.01, 0.07, 0.64],
       [0.36, 0.01, 0.2 , 0.27],
       [0.35, 0.52, 0.53, 0.4 ],
       [0.22, 0.98, 0.73, 0.17],
       [0.08, 0.17, 0.29, 0.32],
       [0.07, 0.02, 0.74, 0.52]])
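
Negation only works for values that can be negated. As an alternative sketch (the array `scores` is a made-up example), the ascending indices can simply be reversed with `[::-1]`. For distinct values both approaches give the same order; with ties the two may order the tied elements differently.

```python
import numpy as np

# Illustrative 1-D data with distinct values.
scores = np.array([0.90, 0.35, 0.36, 0.07])

# Descending order by argsorting the negated values.
desc_by_negation = np.argsort(-scores)     # [0, 2, 1, 3]

# Descending order by reversing the ascending indices.
desc_by_reversal = np.argsort(scores)[::-1]  # [0, 2, 1, 3]

print(scores[desc_by_reversal])  # [0.9  0.36 0.35 0.07]
```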

- Check out `numpy.lexsort`, which returns sorting indices like `numpy.argsort` but performs the comparison over multiple columns. This is known as **lexicographical sorting**.
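
A minimal sketch of `numpy.lexsort`, using a small made-up array `table`: the rows are ordered by column 0, and ties are broken by column 1. One detail that is easy to get wrong is that the **last** key passed to `lexsort` is the primary key.

```python
import numpy as np

# Illustrative table with repeated values in column 0.
table = np.array([[2, 9],
                  [1, 5],
                  [2, 3],
                  [1, 7]])

# Keys are given from least to most significant:
# column 1 breaks ties, column 0 is the primary key.
order = np.lexsort((table[:, 1], table[:, 0]))  # [1, 3, 2, 0]

# Rearrange whole rows, exactly as with argsort.
sorted_table = table[order]
print(sorted_table)
# [[1 5]
#  [1 7]
#  [2 3]
#  [2 9]]
```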

### Linear Algebra
In linear algebra, we are primarily interested in matrices (ndarrays with 2 axes) and vectors (ndarrays with one axis).

**Matrix multiplication**

In [13]:
#
# Matrix multiplication
#

M = np.arange(12).reshape(3,4)
M

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [14]:
N = np.random.rand(2, 3)
N

array([[0.12, 0.75, 0.5 ],
       [0.44, 0.27, 0.67]])

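- We can multiply `N` by `M`, but not `M` by `N`: `N` has shape (2, 3) and `M` has shape (3, 4), and the number of columns of the left matrix must equal the number of rows of the right matrix.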


In [15]:
np.dot(N, M)

array([[ 6.99,  8.35,  9.72, 11.09],
       [ 6.47,  7.85,  9.22, 10.6 ]])

In [16]:
np.dot(M, N)

ValueError: shapes (3,4) and (2,3) not aligned: 4 (dim 1) != 2 (dim 0)

- Matrix multiplication has a very convenient shorthand in Python: the `@` operator.

In [17]:
#
# Use the `@` operator.
#
N @ M

array([[ 6.99,  8.35,  9.72, 11.09],
       [ 6.47,  7.85,  9.22, 10.6 ]])

In [18]:
M @ N

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 4)
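
Both errors come from the same shape rule: a product of shapes `(n, k)` and `(k, m)` has shape `(n, m)`, and the inner sizes must agree. The sketch below checks the rule with `.shape` before multiplying; it uses a fixed matrix of ones in place of the random `N` so the result is reproducible.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)   # shape (3, 4)
N = np.ones((2, 3))               # shape (2, 3), stand-in for the random N

# N @ M is valid: columns of N (3) match rows of M (3).
print(N.shape[1] == M.shape[0])   # True
product = N @ M
print(product.shape)              # (2, 4)

# M @ N is not valid: columns of M (4) do not match rows of N (2).
print(M.shape[1] == N.shape[0])   # False

# Transposing both swaps the order: (4, 3) @ (3, 2) -> (4, 2),
# which is the transpose of N @ M.
print(np.allclose(M.T @ N.T, product.T))  # True
```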

**Matrix and Vectors**

In [19]:
v = np.random.rand(4)
v

array([0.15, 0.19, 0.37, 0.96])

In [20]:
M @ v

array([ 3.83, 10.56, 17.28])

In [21]:
N @ v

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 4 is different from 3)
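
The same shape rule explains these two results. As a sketch with made-up vectors of ones: a one-axis ndarray on the right of `@` must match the number of columns of the matrix, while one on the left must match the number of rows. In both cases the result is again a one-axis ndarray.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)   # shape (3, 4)
v = np.ones(4)                    # length matches the columns of M
w = np.ones(3)                    # length matches the rows of M

# Vector on the right: one dot product per row of M.
right = M @ v                     # [ 6. 22. 38.], shape (3,)

# Vector on the left: one dot product per column of M.
left = w @ M                      # [12. 15. 18. 21.], shape (4,)

print(right, right.shape)
print(left, left.shape)
```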

- The dot product of two vectors of the same length is defined as:

    $u \cdot v = \sum_{i}{u_i \times v_i}$


In [22]:
u = np.random.rand(4)
u

array([0.79, 0.6 , 0.91, 0.26])

In [23]:
np.dot(u, v)

0.8283795128391014

In [24]:
u @ v

0.8283795128391014

- Don't forget that we can always perform element-wise multiplication of ndarrays of the same shape with the `*` operator.

In [25]:
u * v

array([0.12, 0.11, 0.34, 0.25])
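
Element-wise multiplication and the dot product are directly related: summing the element-wise products gives the dot product, which is exactly the definition $u \cdot v = \sum_{i}{u_i \times v_i}$. A small sketch with hand-picked vectors (chosen so the arithmetic is easy to follow):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.5, 0.0, -1.0, 2.0])

# Element-wise products: [0.5, 0.0, -3.0, 8.0]
elementwise = u * v

# Summing them gives the dot product: 0.5 + 0.0 - 3.0 + 8.0 = 5.5
manual = elementwise.sum()

print(manual)                     # 5.5
print(np.isclose(manual, u @ v))  # True
```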

## Persistent Storage of NumPy Data

NumPy can save ndarrays to disk and load them back from disk.

### Saving data

In [26]:
#
# Consider an ndarray obtained somehow.
#
data = np.random.rand(6, 3)
data

array([[0.86, 0.34, 0.42],
       [0.5 , 0.27, 0.95],
       [0.73, 0.61, 0.35],
       [0.27, 0.8 , 0.83],
       [0.52, 0.73, 0.45],
       [0.53, 0.01, 0.72]])

In [27]:
#
# We can save it with numpy.save(filename, ndarray).
# The `.npy` extension is appended to the filename if it is missing.
#
np.save('my_data', data)

### Loading Data

In [28]:
#
# We can load the ndarray that is saved in the file.
#
reloaded_data = np.load('my_data.npy')
reloaded_data

array([[0.86, 0.34, 0.42],
       [0.5 , 0.27, 0.95],
       [0.73, 0.61, 0.35],
       [0.27, 0.8 , 0.83],
       [0.52, 0.73, 0.45],
       [0.53, 0.01, 0.72]])
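
The round trip can also be verified programmatically. The sketch below is one possible check, written against a temporary directory so it leaves no files behind; the file names and the array are made up for illustration. It also shows `numpy.savez`, which stores several named ndarrays in a single `.npz` file.

```python
import os
import tempfile

import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends `.npy` because the name has no extension.
    path = os.path.join(tmp, "my_data")
    np.save(path, data)
    reloaded = np.load(path + ".npy")

    # np.savez stores several arrays under the given keyword names.
    bundle = os.path.join(tmp, "bundle.npz")
    np.savez(bundle, first=data, second=data.T)
    with np.load(bundle) as archive:
        names = sorted(archive.files)   # ['first', 'second']
        second = archive["second"]      # read into memory here

# The reloaded array has the same shape, dtype and values.
print(np.array_equal(reloaded, data))   # True
print(names, second.shape)              # ['first', 'second'] (3, 2)
```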