# Introduction to Python and Natural Language Technologies

__Lecture 04, NumPy__

__March 2, 2020__

# NumPy

"NumPy is the fundamental package for scientific computing with Python."

https://numpy.org/

NumPy provides:

* linear algebra, Fourier transform and random numbers,
* easy-to-use matrices, arrays, tensors,
* heavy optimization and
* C/C++/Fortran integration.

"Numpy is the MATLAB of python!"

We import NumPy as `np` by convention:

In [None]:
import numpy as np

NumPy uses an underlying [BLAS](http://www.netlib.org/blas/) library, just as MATLAB does. These libraries provide heavily optimized, vectorized linear algebra routines.
* Anaconda uses Intel MKL (Intel's proprietary math library).
* If you install NumPy manually and have previously installed [OpenBLAS](http://www.openblas.net/) (free, open-source), then NumPy will use that.

In [None]:
np.show_config()

# N-dimensional arrays

The core object of NumPy is `ndarray` (_$n$-dimensional array_).

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print(type(A))
A

## Constructing arrays

An `ndarray` should not be constructed directly with its constructor. Instead, a number of functions exist for creating arrays:

__`np.array`__: manually specify each element

In [None]:
np.array([[1, 2], [3, 4], [5, 6]])

It can also take any other sequence, such as a `range` object (but not a generator):

In [None]:
np.array(range(1, 7))
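A short check of the generator caveat (behavior as of recent NumPy versions): `np.array` does not iterate over a generator, it wraps the generator object in a zero-dimensional `object` array. `np.fromiter` consumes any iterable, given the `dtype`:

```python
import numpy as np

# np.array does not consume a generator: the result is a 0-d object array
wrapped = np.array(x * x for x in range(5))
print(wrapped.shape, wrapped.dtype)  # () object

# np.fromiter iterates over the generator; dtype is required here
squares = np.fromiter((x * x for x in range(5)), dtype=int)
print(squares)  # [ 0  1  4  9 16]
```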

__`np.arange`__: similar to the built-in `range`, but unlike `range`, `np.arange` supports float arguments:

In [None]:
np.arange(1.1, 10.1, 1.5)

__`np.linspace`__: equally divided interval

In [None]:
A = np.linspace(0, 10, 5)
print(A.shape, type(A))
A

This is often useful for creating a fine-grained scale:

In [None]:
np.linspace(0, 10, 101)
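With a non-integer step, the length of an `np.arange` result is computed from a floating-point division, which can give a surprising number of elements; the NumPy documentation recommends `np.linspace` in that case. A small illustration:

```python
import numpy as np

# (1.3 - 1) / 0.1 evaluates to slightly more than 3 in floating point,
# so arange produces 4 elements and the last one is about 1.3,
# even though the stop value is supposed to be excluded
a = np.arange(1, 1.3, 0.1)
print(len(a), a)

# linspace takes the number of points instead, so the length is exact
b = np.linspace(1, 1.2, 3)
print(len(b), b)
```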

__`np.ones`__: fills an array with 1s

In [None]:
np.ones((2, 3, 2))

__`np.zeros`__: fills an array with 0s

In [None]:
np.zeros((2, 3))

__`np.ones_like`__ and __`np.zeros_like`__ copy another array's shape and dtype:

In [None]:
A = np.array([[2, 3, -1], [1, 2, 6]])
np.ones_like(A)

In [None]:
A = np.array([[2.1, 3, -1], [1, 2, 6]])
np.zeros_like(A)

__`np.eye`__: creates an identity matrix, only works in 2D

In [None]:
np.eye(3)

In [None]:
np.eye(4, dtype=bool)

It doesn't have to be a square matrix:

In [None]:
np.eye(5, 2)

There is no `np.eye_like`, but you can use the following:

In [None]:
A = np.array([[2, 3, -1], [1, 2, 6]])
np.eye(*A.shape, dtype=A.dtype)

## Basic properties of ndarrays

`A.shape` is a tuple of the array's dimensions

In [None]:
A.shape

and `dtype` is the type of the elements

In [None]:
A.dtype

In [None]:
A = np.array([1.5, 2])
A.shape, A.dtype

`nbytes` gives us the memory footprint of the array:

In [None]:
A.nbytes

In [None]:
A = np.zeros((1000, ))
A.nbytes

In [None]:
A = np.zeros((1000, ), dtype=bool)
A.nbytes

In [None]:
A = np.zeros((1000, ), dtype=np.uint16)
A.nbytes
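`nbytes` is simply the number of elements times the size of one element in bytes (`itemsize`). Checking this for the dtypes used above:

```python
import numpy as np

nbytes = {}
for dtype in (np.float64, bool, np.uint16):
    A = np.zeros((1000, ), dtype=dtype)
    # the memory footprint is the number of elements times the element size
    assert A.nbytes == A.size * A.itemsize
    nbytes[A.dtype.name] = A.nbytes
print(nbytes)  # {'float64': 8000, 'bool': 1000, 'uint16': 2000}
```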

## Accessing elements

Arrays are zero-indexed.

Accessing one row:

In [None]:
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A.shape)
A[0], A[1], type(A[0])

Accessing one column:

In [None]:
A[:, 0], type(A[:, 0])

Accessing a single element:

In [None]:
A[2, 1], type(A[2, 1])

Accessing a range of rows / columns:

In [None]:
A[:2]  # or A[:2, :]

In [None]:
A[:, 1:]

In [None]:
A[::2]

In general, an $n$-dimensional array requires $n$ indices to access its scalar elements.

In [None]:
B = np.array([[[1, 2, 3],[4, 5, 6]]])
B.shape, B.ndim

In [None]:
B[0].shape

In [None]:
B[0, 1], B[0, 1].shape

Three indices access scalar elements when `ndim` is 3:

In [None]:
B[0, 1, 2], B[0, 1, 2].shape

## Slicing, advanced indexing

`:` retrieves the full size along that dimension.

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)
print(A[0])
print(A[0, :])  # first row
print(A[:, 0])  # first column

These are 1D vectors, neither $1\times n$ nor $n\times1$ matrices!

In [None]:
A[0, :].shape, A[:, 0].shape

In [None]:
B = np.array([[[1, 2, 3],[4, 5, 6]]])
B.shape

In [None]:
print(B[:, 1, :].shape)
B[:, 1, :]

In [None]:
B[0, 1, :], B[0, 1, :].shape

In [None]:
type(B[0, 1, 1]), B[0, 1, 1]

In [None]:
isinstance(B[0, 1, 1], np.integer), isinstance(2, np.integer)

All Python slicing syntax works as well, including negative steps, e.g. for reversing:

In [None]:
print(A[:, ::-1])
print(A[::-1, :])
print(A[:, ::2])
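One consequence worth knowing: basic slicing (with `:` and ranges) returns a _view_ of the original array, not a copy, so writing into a slice modifies the original. `.copy()` gives an independent array:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
first_row = A[0, :]          # a view on A's data
first_row[0] = 100           # writes through to A
print(A[0, 0])               # 100

independent = A[1, :].copy()
independent[0] = -1          # A is unaffected
print(A[1, 0])               # 4
print(np.shares_memory(A, first_row), np.shares_memory(A, independent))  # True False
```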

## Advanced indexing

Advanced indexing takes place when an index is a list (or an integer array).

In [None]:
B = np.array([[[1, 2, 3], [4, 5, 6]]])
print(f"{B.shape = }")
print(f"{B[0, 0, [0, 2]].shape = }")
B[0, 0, [0, 2]]

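This selects the first and third "column". Note that the selected columns appear as the _rows_ of the result: the scalar `0` and the list are both advanced indices, and because they are separated by a colon, NumPy puts the dimension they create first: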

In [None]:
B[0, :, [0, 2]]

or third and first "column":

In [None]:
B[0, :, [2, 0]]

One column can be selected multiple times, and the list of indices does not have to be ordered:

In [None]:
B[0, :, [2, 0, 2, 2]]

In [None]:
print(f"{B[0, 0, [0, 2, 0]].shape = }")
print(f"{B[0, 0, [0, 2, 0, 0]].shape = }")

### Advanced indexing theory

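If all indices are lists, they are not combined as a Cartesian product but paired up element-wise, so they must have the same length. For example, with two lists:

<div align=center>B[[$i_1$, $\ldots$, $i_k$], [$j_1$, $\ldots$, $j_k$]] = [B[$i_1$, $j_1$], $\ldots$, B[$i_k$, $j_k$]], whose shape is ($k$,)</div>

With at most one list index, the shape of the output follows simpler rules:

The size of a particular dimension remains when the corresponding index is a colon (`:`).

If an index is a scalar then that dimension _disappears_ from the shape of the output.

If an index is a list then the dimension remains and its size is the length of the list. One caveat: a scalar combined with a list also counts as an advanced index, and if the two are separated by a colon, as in `B[0, :, [0, 2]]`, the dimension created by the list is moved to the front of the result.

One-length lists can be used as well. The resulting dimension remains but its size becomes one.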

In [None]:
B = np.array([[[1, 2, 3], [4, 5, 6]]])
B[:, :, 2].shape

In [None]:
B[:, :, [2]].shape

In [None]:
B = np.array([[1, 2, 3], [4, 5, 6]])
print(B.shape)
print(B)
B[[0, 1, 0, 0, 0], [1, 2, 0, -1, 1]]
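As the last cell shows, several list indices are paired up rather than combined. When the Cartesian product (a sub-matrix) is wanted, `np.ix_` builds index arrays that select every row/column combination. Unlike slicing, advanced indexing always returns a copy:

```python
import numpy as np

B = np.array([[1, 2, 3], [4, 5, 6]])

# two lists are paired: this selects B[0, 0] and B[1, 2]
paired = B[[0, 1], [0, 2]]
print(paired)                      # [1 6]

# np.ix_ gives the Cartesian product: rows {0, 1} x columns {0, 2}
sub = B[np.ix_([0, 1], [0, 2])]
print(sub)                         # rows [1 3] and [4 6]

# advanced indexing returns a copy, so B is not modified
sub[0, 0] = 100
print(B[0, 0])                     # 1
```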

## Under the hood

By default, arrays are stored in C-style ([row-major](https://en.wikipedia.org/wiki/Row-_and_column-major_order)) order. However, you shouldn't rely on the memory layout; working directly with the low-level C arrays is not recommended.

In [None]:
print(B.strides)
print(B.flags)
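The storage order can also be chosen explicitly, e.g. Fortran-style (column-major) with `np.asfortranarray`. Indexing is unaffected; only the strides and the in-memory order differ. A small sketch (`ravel(order='K')` reads the elements in the order they occur in memory; stride values depend on the platform's default integer size):

```python
import numpy as np

C = np.arange(6).reshape(2, 3)      # C order (row-major) by default
F = np.asfortranarray(C)            # same values, column-major storage

print(C.flags['C_CONTIGUOUS'], F.flags['F_CONTIGUOUS'])  # True True
print(np.array_equal(C, F))         # True: indexing is unaffected
print(C.strides, F.strides)         # bytes to step in each dimension

print(C.ravel(order='K'))           # [0 1 2 3 4 5]
print(F.ravel(order='K'))           # [0 3 1 4 2 5]
```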

# Operations on arrays

## Element-wise arithmetic operators

Arithmetic and comparison operators are overloaded and act element-wise. A comparison returns an array of booleans:

In [None]:
A = np.array([[1, 1], [2, 2]])
P = A >= 2
print(P)
print(P.dtype)

In [None]:
A + A

In [None]:
A * A

In [None]:
np.exp(A)

In [None]:
2 ** A

In [None]:
1 / A

## Matrix algebraic operations

`dot` is the standard matrix product

In [None]:
A = np.array([[1, 2], [3, 4]])
A.dot(A)

The inner dimensions must match:

In [None]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2, 3], [4, 5, 6]])
print(A.shape, B.shape)
A.dot(B)    # (2, 2) x (2, 3) -> (2, 3)
# B.dot(A)  # raises ValueError: inner dimensions 3 and 2 differ
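Since Python 3.5, the `@` operator (`np.matmul`) also computes the matrix product; for 2D arrays it gives the same result as `dot`, with the same requirement on the inner dimensions:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2, 3], [4, 5, 6]])

P = A @ B                             # same as A.dot(B) and np.matmul(A, B)
print(P)                              # rows [9 12 15] and [19 26 33]
print(np.array_equal(P, A.dot(B)))    # True
```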

Inverse can be computed by `np.linalg.inv`:

In [None]:
A_inv = np.linalg.inv(A)
print(A_inv)

A_inv.dot(A)

In [None]:
np.round(A_inv.dot(A), 5)
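The rounding is needed because the inverse is computed in floating point, so `A_inv.dot(A)` is only approximately the identity. Instead of rounding, such results are usually compared with a tolerance using `np.allclose`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# exact equality may fail because of rounding errors ...
print(np.array_equal(A_inv.dot(A), np.eye(2)))
# ... but the product is the identity up to a small tolerance
print(np.allclose(A_inv.dot(A), np.eye(2)))  # True
```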

The pseudo-inverse can be computed with `np.linalg.pinv`:

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A_pinv = np.linalg.pinv(A)

A.dot(A_pinv).dot(A)

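There is a `matrix` class as well, for which `*` acts as the matrix product. However, the NumPy documentation no longer recommends it, even for linear algebra; use regular arrays instead.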

In [None]:
M = np.matrix([[1, 2], [3, 4]])
print(np.multiply(M, M))
print(M * M)

## Transpose

__`transpose`__ rearranges the dimensions of an array:

In [None]:
A = np.array([[[1, 2, 3], [4, 5, 6]]])
print(f"{A.shape = }")
print(f"{A.transpose(1, 0, 2).shape = }")

The parameters of transpose are the axes in their new order. All axes must be listed exactly once:

In [None]:
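# Each of these raises a ValueError:
# print(f"{A.transpose(1, 0).shape = }")        # too few axes
# print(f"{A.transpose(1, 0, 0).shape = }")     # axis 0 repeated, axis 2 missing
# print(f"{A.transpose(1, 0, 2, 1).shape = }")  # too many axes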

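__`T`__ is a shorthand for __`.transpose()`__ without arguments, which reverses the order of all axes. For 2D arrays this is the ordinary matrix transpose, the same as `.transpose(1, 0)`: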

In [None]:
np.array([[1, 2, 3], [4, 5, 6]]).T
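For more than two dimensions the difference matters: `.T` reverses every axis, not just the first two. A quick check on the 3D array used above:

```python
import numpy as np

A = np.arange(6).reshape(1, 2, 3)
print(A.T.shape)                    # (3, 2, 1): all axes reversed
print(A.transpose().shape)          # (3, 2, 1): same as .T
print(A.transpose(1, 0, 2).shape)   # (2, 1, 3): only the first two swapped
```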

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A

In [None]:
A.transpose(1, 0)

## Casting and types

NumPy supports the C numeric types (e.g. `int8`, `uint32`, `float32`), and arrays can be converted between them with `astype`:

In [None]:
P = np.array([[1.2, 1], [-1.5, 0]])
print(P.dtype)
P.astype(int)

In [None]:
P = np.array([[1.2, 1], [-1.5, 0]])
(-P.astype(int)).astype(np.uint32)
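Two details behind the cells above: casting floats to integers truncates toward zero (it does not round), and casting a negative integer to an unsigned type wraps around, here modulo $2^{32}$ for `uint32`:

```python
import numpy as np

P = np.array([[1.2, 1], [-1.5, 0]])
T = P.astype(int)                  # truncation toward zero: -1.5 -> -1
print(T)

U = (-T).astype(np.uint32)         # -1 wraps around to 2**32 - 1
print(U)                           # contains 4294967295 where -T is -1
```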

In [None]:
np.array([[1, 2], [3, 4]], dtype=np.float32)

NumPy scalar types can convert strings to numbers directly:

In [None]:
np.float32('-10')

Strings are converted as well when `dtype` is specified during array creation:

In [None]:
np.array(['10', '20'], dtype=np.float16)
# np.array(['10ab', '20'], dtype=np.float16)  # raises ValueError

__`np.datetime64`__ for dates:

In [None]:
np.datetime64("2018-03-10")

It supports basic date arithmetic:

In [None]:
np.datetime64("2018-03-10") - np.datetime64("2017-12-13")
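Durations are represented by `np.timedelta64`, which can be added to dates, and `np.arange` can produce a range of dates when given a datetime `dtype`. A brief sketch:

```python
import numpy as np

start = np.datetime64("2018-03-10")
later = start + np.timedelta64(30, "D")    # add 30 days
print(later)                               # 2018-04-09

# one date per day; the stop value is excluded, as with numbers
days = np.arange("2018-03-01", "2018-03-05", dtype="datetime64[D]")
print(days)

# the difference of two dates is a timedelta64, convertible to a number of days
gap = (start - np.datetime64("2017-12-13")).astype(int)
print(gap)                                 # 87
```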

String arrays:

In [None]:
T = np.array(['apple', 'plum'])
print(T)
print(T.shape, T.dtype, type(T))

String arrays have a fixed maximum length (here `<U5`, given by the longest initial element); longer strings are silently truncated:

In [None]:
T[1] = "banana"
T

## Changing shape

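The shape of an array can be modified with `reshape`, as long as the total number of elements remains the same. The elements themselves are unchanged, and whenever possible `reshape` returns a _view_ on the same memory instead of copying the data (a copy is made only when the memory layout does not allow a view).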

In [None]:
B = np.array([[[1, 2, 3], [4, 5, 6]]])
B.reshape((2, 3))

In [None]:
B.reshape((3, 2)).shape, B.shape
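Whether `reshape` returned a view can be checked with `np.shares_memory`. Writing into a view changes the original array too. After a transpose, the elements are no longer laid out in row-major order, so flattening has to copy:

```python
import numpy as np

B = np.arange(1, 7).reshape((1, 2, 3))
R = B.reshape((3, 2))
print(np.shares_memory(B, R))      # True: R is a view
R[0, 0] = 100
print(B[0, 0, 0])                  # 100

T = B[0].T                         # shape (3, 2), a non-contiguous view
flat = T.reshape(6)
print(np.shares_memory(T, flat))   # False: a copy was needed
```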

A `ValueError` is raised if the shape is invalid:

In [None]:
B.reshape(7)  # raises ValueError

This gives us a shorthand for creating arrays like `B`:

In [None]:
np.arange(1, 7).reshape((1, 2, 3))

A size of `-1` tells `reshape` to infer the length of that dimension from the total number of elements and the other dimensions:

In [None]:
X = np.arange(12).reshape((2, -1, 2))
print(f"{X.shape = }")
print(X)

But only one dimension can have -1 as its size:

In [None]:
X = np.arange(12).reshape((2, -1, 2, -1))  # raises ValueError

In [None]:
X = np.arange(12).reshape((2, 3, 2))  # the same shape as reshape((2, -1, 2))

The __`resize`__ method truncates the array or pads it with zeros (working on the flattened, row-major elements), but it works only _in place_:

In [None]:
X = np.array([[1, 2], [3, 4]])
X.resize((5, 3))
#X.resize((2, 2))
X

The function `np.resize` (not a method), on the other hand, returns a new array and fills it by repeating the original elements instead of zeros:

In [None]:
X = np.array([[1, 2], [3, 4]])
np.resize(X, (5, 3))

`X` is unchanged:

In [None]:
X
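A side-by-side comparison of the two versions with a smaller target shape, showing how the flattened elements are reused:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
repeated = np.resize(X, (3, 3))    # new array: 1 2 3 4 1 2 3 4 1
print(repeated)

Y = np.array([[1, 2], [3, 4]])
Y.resize((3, 3))                   # in place: 1 2 3 4 followed by zeros
print(Y)
print(X)                           # unchanged by np.resize
```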

## Concatenation

Arrays can be concatenated along any axis as long as their shapes are compatible.

In [None]:
A = np.arange(6).reshape(2, -1)
B = np.arange(10, 18).reshape(2, -1)
print(f"{A.shape = }")
print(f"{B.shape = }")

np.concatenate((A, B), axis=1)

In [None]:
np.concatenate((A, B), axis=-1)  # last dimension

In [None]:
np.concatenate((A, A))  # axis=0 is the default; (A, B) would fail, their numbers of columns differ

In [None]:
np.concatenate([A, B, A, B], axis=1)

Concatenating 2D arrays along the first or the second dimension is very common, so there are shorthands: `np.vstack` (vertical, `axis=0`) and `np.hstack` (horizontal, `axis=1`):

In [None]:
A = np.arange(6).reshape(2, -1)
B = np.arange(8).reshape(2, -1)
A, B

In [None]:
np.hstack((A, B))

In [None]:
A = np.arange(6).reshape(-1, 2)
B = np.arange(8).reshape(-1, 2)
print(A.shape, B.shape)

# np.hstack((A, B))  # raises ValueError: A has 3 rows, B has 4
np.vstack((A, B))

`np.stack` puts the arrays next to each other along a **new** dimension

In [None]:
A.shape, np.stack((A, A, A, A, A, A)).shape

In [None]:
np.stack((A, A, A, A))

A block matrix built from nested concatenations:

In [None]:
np.concatenate([np.concatenate([np.ones((2,2)), np.zeros((2,2))], axis=1),
                np.concatenate([np.zeros((2,2)), np.ones((2,2))], axis=1)], axis=0)
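`np.block` builds the same block matrix from a nested list of blocks, which is usually easier to read than nested `np.concatenate` calls:

```python
import numpy as np

one_blk, zero_blk = np.ones((2, 2)), np.zeros((2, 2))

blocks = np.block([[one_blk, zero_blk],
                   [zero_blk, one_blk]])
nested = np.concatenate([np.concatenate([one_blk, zero_blk], axis=1),
                         np.concatenate([zero_blk, one_blk], axis=1)], axis=0)
print(blocks.shape)                    # (4, 4)
print(np.array_equal(blocks, nested))  # True
```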

## Iteration

Arrays support iteration but it is rarely used. Vectorized operations are much more efficient.

By default, iteration takes place in the first (outermost) dimension.

In [None]:
A = np.arange(6).reshape(2, -1)
for row in A:
    print("Row:", row)

To iterate over another dimension, slice the desired elements first:

In [None]:
B = np.arange(6).reshape(1, 2, 3)

for x in B[0, 0, :]:
    print(x)

We can flatten the matrix and iterate through the elements using `flat`:

In [None]:
for a in B.flat:
    print(a)

In [None]:
for k in range(B.shape[2]):
    print(B[:, :, k])

# Broadcasting

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. 

For example, a $1\times 1$ array can be multiplied with a matrix, just like a scalar times a matrix.

In [None]:
A = np.arange(6).reshape(2, -1)
s = 2.0 * np.ones((1, 1))
print(s)
print(A)
print(f"{A.shape = }, {s.shape =}")
s * A

A one-length vector, or even a zero-dimensional array (shape `()`), can be multiplied with an array of any shape:

In [None]:
print(np.ones((1,)) * A)
np.ones(()) * B

However, element-wise operations fail when the sizes of corresponding dimensions differ and neither of them is one:

In [None]:
np.ones((2, 3)) + np.ones((3, 2))  # raises ValueError

This behavior is defined by _broadcasting_. If an array has a dimension of length one, that dimension can be _broadcast_: it is stretched to whatever size the operation requires (this applies to element-wise operations, not to indexing).

In [None]:
np.arange(3).reshape((1, 3)) + np.zeros((2, 3))

In [None]:
np.arange(3).reshape((3, 1)) + np.zeros((3, 4))

More than one dimension can be broadcast at a time.

In [None]:
np.arange(3).reshape((1, 3, 1)) + np.zeros((2, 3, 6))

## Theory

If an array has the shape `(1, 3, 1)`, its first and third dimensions can be broadcast.

The index triple `[x, y, z]` accesses its elements as `[0, y, 0]` even if $x>0$ and $z>0$. In other words, broadcast dimensions are repeated as many times as the operation needs.

Non-existent dimensions can be broadcast as well.

In terms of shapes, `(k,) + (i, j, k)` means a vector of length $k$ plus a three-dimensional array of size $i \times j \times k$. The element `[a, b, c]` of the result uses the element `[c]` of the broadcast vector.

Denoting false dimensions with `None`, we can illustrate broadcasting:

`(2,) + (3, 2, 2)` results in the broadcast `(None, None, 2) + (3, 2, 2)`.

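False dimensions are __always__ prepended at the front; a dimension that already exists can only be broadcast if its length is one.

False dimensions can also be inserted explicitly with the `None` index (`np.newaxis` is an alias for `None`). This creates a dimension of length one, which is mostly useful for broadcasting.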

`(2,) + (3, 2, 2) -> (None, None, 2) + (3, 2, 2) = (3, 2, 2)`

is OK, but

`(3,) + (3, 2, 2) -> (None, None, 3) + (3, 2, 2)`

is NOT.

In [None]:
def test_broadcast(x, y):
    try:
        A = np.ones(x) + np.ones(y)
        print("Broadcastable")
    except ValueError:
        print("Not broadcastable")

        
test_broadcast((3,), (3, 2, 2))
test_broadcast((2,), (3, 2, 2))
test_broadcast((3, 1, 4), (3, 2, 1))
test_broadcast((3, 1, 4), (3, 2, 2))
test_broadcast((1, 4), (3, 2, 4))
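The rule that `test_broadcast` probes can be written down in a few lines. This is a sketch of the standard broadcasting rule (the helper `broadcast_shape` is a hypothetical name made up for this example), checked against NumPy's own `np.broadcast` on the same shape pairs:

```python
import numpy as np

def broadcast_shape(x, y):
    """Result shape of broadcasting shapes x and y, or None if incompatible."""
    # false (missing) dimensions are prepended as length-one dimensions
    n = max(len(x), len(y))
    x = (1,) * (n - len(x)) + tuple(x)
    y = (1,) * (n - len(y)) + tuple(y)
    result = []
    for a, b in zip(x, y):
        if a == b or a == 1 or b == 1:
            result.append(b if a == 1 else a)  # a length-one dimension is stretched
        else:
            return None                        # sizes differ and neither is one
    return tuple(result)

for x, y in [((3,), (3, 2, 2)), ((2,), (3, 2, 2)), ((3, 1, 4), (3, 2, 1)),
             ((3, 1, 4), (3, 2, 2)), ((1, 4), (3, 2, 4))]:
    try:
        expected = np.broadcast(np.ones(x), np.ones(y)).shape
    except ValueError:
        expected = None
    print(x, y, broadcast_shape(x, y))
    assert broadcast_shape(x, y) == expected
```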

We can force broadcasting by inserting length-one dimensions with `None`:

In [None]:
(np.ones(3)[:, None, None] + np.ones((3, 2, 2))).shape

In terms of shapes: `(3, None, None) + (3, 2, 2) = (3, 2, 2)`

## Example

This one-liner produces a complex "grid":

In [None]:
np.arange(5)[:, None] + 1j * np.arange(5)[None, :]

Because false dimensions are prepended, a vector behaves as a row vector and is added to every row of a matrix:

`(n,) -> (None, n)`

In [None]:
np.arange(5) + np.zeros((5, 5))

We can explicitly reshape it:

In [None]:
np.arange(5).reshape(-1, 1) + np.zeros((5, 5))
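A common use of this behavior is centering data. Subtracting the column means works directly, since the `(3,)` vector of means acts row-wise. Subtracting the row means needs the column-vector form; `keepdims=True` keeps the reduced axis with length one so that broadcasting lines up:

```python
import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)

# (2, 3) - (3,): the column means are subtracted from every row
col_centered = X - X.mean(axis=0)

# (2, 3) - (2, 1): keepdims=True keeps the reduced axis with length one
row_centered = X - X.mean(axis=1, keepdims=True)

print(col_centered.mean(axis=0))   # [0. 0. 0.]
print(row_centered.mean(axis=1))   # [0. 0.]
```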

This behavior does not apply to operations that are not element-wise, such as the `dot` product.

# Reductions

Sum over an axis:

In [None]:
Y = np.arange(24).reshape(2, 3, 4)
Y

In [None]:
Y.sum()  # sum every element

In [None]:
Y.sum(axis=0)  # shape (3, 4)

In [None]:
Y.sum(axis=(1, 2))  # shape (2,)

In [None]:
# Y.sum(axis=(1, 2)) computed with explicit loops
ysum = [0, 0]
for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
        for k in range(Y.shape[2]):
            ysum[i] += Y[i, j, k]
ysum

In [None]:
Y.sum(axis=-1)

`mean`, `std` and `var` work similarly: they compute the mean, standard deviation and variance along the given axes, or over the full array if no axis is given:

In [None]:
Y.mean()

In [None]:
Y.mean(axis=(2, 0))

In [None]:
Y = np.arange(24).reshape(2, 3, 4)
Y.mean(axis=2).shape

In [None]:
Y.std()

# `np.random`

NumPy has a rich random subpackage.

In [None]:
np.random.random()

`np.random.random` samples `float64` numbers uniformly from $[0, 1)$. `np.random.rand` does the same but takes the dimensions as separate arguments, e.g. `np.random.rand(2, 3)`.

In [None]:
np.random.random(size=(2, 3))
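Since NumPy 1.17, the documentation recommends a `Generator` object created with `np.random.default_rng` for new code; the module-level functions used in this lecture still work. A brief sketch of the equivalent calls:

```python
import numpy as np

rng = np.random.default_rng(42)        # seeded, reproducible generator
u = rng.random((2, 3))                 # uniform floats from [0, 1)
dice = rng.integers(1, 7, size=5)      # integers from 1 to 6 (high is excluded)
z = rng.normal(0, 1, size=(3, 2))      # standard normal samples

# a generator with the same seed produces the same numbers
again = np.random.default_rng(42).random((2, 3))
print(np.array_equal(u, again))        # True
```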

How can we generate random numbers between 20 and 30?

In [None]:
# TODO

Other distributions:

In [None]:
np.random.uniform(1, 2, (2, 2))

In [None]:
np.random.standard_normal(size=(3, 2))  # np.random.normal(0, 1, size=(3, 2))

In [None]:
np.random.normal(10, 100, size=(3, 2))

Discrete random sampling:

In [None]:
np.random.choice(["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"], 10, replace=False)

`choice` accepts custom probabilities:

In [None]:
np.random.choice(range(1, 7), 10, p=[0.1, 0.1, 0.1, 0.1, 0.1, 0.5])

In [None]:
print(np.random.permutation(["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]))

`permutation` permutes the first (outermost) dimension.

In [None]:
print(np.random.permutation(np.arange(9).reshape((3, 3))))

# Miscellaneous

## Shuffling

In [None]:
A = np.arange(10)
np.random.shuffle(A)
A
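`shuffle` works in place and returns `None`, while `permutation` returns a shuffled copy and leaves its input unchanged:

```python
import numpy as np

A = np.arange(10)
result = np.random.shuffle(A)        # A itself is reordered
print(result)                        # None

B = np.arange(10)
P = np.random.permutation(B)         # B keeps its original order
print(B)                             # [0 1 2 3 4 5 6 7 8 9]
print(np.array_equal(np.sort(P), B)) # True: same elements, new order
```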

Fixing the seed:

In [None]:
np.random.seed(42)
A = np.arange(10)
np.random.shuffle(A)
A

## Split dataset into train, dev and test

In [None]:
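# generate 100 distinct random "words" of length 3 to 6 over the letters a-f
dataset = set()
letters = list("abcdef")
while len(dataset) < 100:
    word_len = np.random.randint(3, 7)  # the upper bound is exclusive
    word = "".join(np.random.choice(letters, size=word_len))
    dataset.add(word)
dataset = np.array(list(dataset))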

In [None]:
dataset

In [None]:
all_idx = np.arange(len(dataset))
np.random.shuffle(all_idx)
train_idx = all_idx[:80]
dev_idx = all_idx[80:90]
test_idx = all_idx[90:]

train_dataset = dataset[train_idx]
dev_dataset = dataset[dev_idx]
test_dataset = dataset[test_idx]
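The same 80/10/10 split can also be written with `np.split`, which cuts an array at the given indices. In this sketch, `dataset` is a stand-in of 100 placeholder strings (`'w0'` to `'w99'`), not the random words generated above:

```python
import numpy as np

dataset = np.array([f"w{i}" for i in range(100)])   # placeholder data
all_idx = np.random.permutation(len(dataset))

# cut the shuffled indices at positions 80 and 90
train_idx, dev_idx, test_idx = np.split(all_idx, [80, 90])
train_dataset = dataset[train_idx]
dev_dataset = dataset[dev_idx]
test_dataset = dataset[test_idx]
print(len(train_dataset), len(dev_dataset), len(test_dataset))  # 80 10 10
```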

In [None]:
dev_dataset

## Boolean indexing and `np.where`

Selecting elements that satisfy a certain condition:

In [None]:
A = np.random.random((4, 3))
print(A.mean())
A

Selecting elements greater than the mean of the matrix:

In [None]:
A > A.mean()

In [None]:
A[A > A.mean()]

`np.where` returns the advanced indices for which the condition is satisfied (where the boolean array is `True`):

In [None]:
np.where(A > A.mean())

In [None]:
A[np.where(A > A.mean())]

With a single argument, `np.where(cond)` is a shorthand for `np.nonzero(cond)`: it returns the indices of the nonzero elements (`True` counts as nonzero):

In [None]:
np.where([2, -1, 0, 5])
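`np.where` also has a three-argument form, `np.where(condition, x, y)`, which takes elements from `x` where the condition holds and from `y` elsewhere. For example, keeping only the values above the mean:

```python
import numpy as np

A = np.array([[0.2, 0.9], [0.7, 0.1]])   # mean is 0.475
# keep values above the mean, replace the rest with 0
above_mean = np.where(A > A.mean(), A, 0.0)
print(above_mean)
```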

## `np.tile`, `np.repeat`

In [None]:
np.tile([1, 2, 3], reps=(4, 2))

In [None]:
np.repeat([1, 2, 3], repeats=4)
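The difference between the two: `np.tile` repeats the whole array (above: 4 times along the rows and twice along the columns), while `np.repeat` repeats each element separately, optionally along a given axis:

```python
import numpy as np

tiled = np.tile([1, 2, 3], 2)               # the whole sequence twice
repeated = np.repeat([1, 2, 3], 2)          # each element twice
print(tiled)                                # [1 2 3 1 2 3]
print(repeated)                             # [1 1 2 2 3 3]

M = np.array([[1, 2], [3, 4]])
rows_twice = np.repeat(M, 2, axis=0)        # each row repeated twice
print(rows_twice.tolist())                  # [[1, 2], [1, 2], [3, 4], [3, 4]]
print(np.tile([1, 2, 3], reps=(4, 2)).shape)  # (4, 6)
```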