## NumPy: the Dark Side and Its Applications

![numpy-logo](images/NumPy_logo_2020.svg)

**Dboy Liao**

[Medium](https://medium.com/@dboyliao)
[GitHub](https://github.com/dboyliao)
[LinkedIn](https://www.linkedin.com/in/yin-chen-liao-69967188/)
[CakeResume](https://www.cakeresume.com/dboyliao)

# The NumPy `ndarray`

```cpp
int matrix[3][5];
```

- arbitrary shape?

- flexible reshape?

- different data type?

![no-cpp](images/no_cpp.jpg)

```python
matrix = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15]
]
```

Let's write a `reshape`!

I bet you wouldn't dare 😜

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8, 9],
    [10, 11, 12, '13']
]
```

- invalid 2D array
   - invalid data type
   - invalid shape

![better-way](images/better_way.jpg)

In [1]:
import numpy as np

array = np.arange(16, dtype=np.int8).reshape(4, 4).copy()
array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int8)

In [2]:
array.shape

(4, 4)

In [3]:
array.strides

(4, 1)

In [4]:
array.data

<memory at 0x1093f25a0>

In [5]:
array.base is None

True
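The strides above are counted in bytes, so they depend on both the dtype and the memory layout. A minimal sketch (the names `a8` and `a64` are only for illustration) of how the same 4x4 data reports different strides:

```python
import numpy as np

a8 = np.arange(16, dtype=np.int8).reshape(4, 4)
a64 = a8.astype(np.int64)  # same values, 8 bytes per element

print(a8.strides)   # (4, 1): one row = 4 bytes, one column = 1 byte
print(a64.strides)  # (32, 8): one row = 4 * 8 bytes, one column = 8 bytes

# Transposing only swaps the strides; no data is moved.
print(a8.T.strides)               # (1, 4)
print(np.shares_memory(a8, a8.T)) # True
print(a8.T.flags.c_contiguous)    # False
```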

![array-2d](images/array_2d.drawio.svg)

![array-2d-flatten](images/array_2d_flatten.drawio.svg)

In [6]:
arr_flatten = array.ravel()
arr_flatten

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int8)
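Whether flattening copies data depends on the layout. The sketch below assumes the same `array` as above and compares `ravel()` (a view when the data is already C-contiguous) with `flatten()` (always a copy):

```python
import numpy as np

array = np.arange(16, dtype=np.int8).reshape(4, 4).copy()

view = array.ravel()      # C-contiguous input: no copy needed
copy = array.flatten()    # always a fresh copy
from_t = array.T.ravel()  # transposed layout: ravel has to copy

print(np.shares_memory(array, view))    # True
print(np.shares_memory(array, copy))    # False
print(np.shares_memory(array, from_t))  # False

# Writing through the view changes the original array.
view[0] = 100
print(array[0, 0])  # 100
```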

In [7]:
# Dark Magic 
np.lib.stride_tricks

<module 'numpy.lib.stride_tricks' from '/Users/dboyliao/Work/open_source/Taipei.py/TaipeiPy-numpy-talk-2023-Jan/.venv/lib/python3.10/site-packages/numpy/lib/stride_tricks.py'>

In [8]:
def flat_list(ll, acc=None):
    if acc is None:
        acc = []
    for l in ll:
        if not isinstance(l, list):
            acc.append(l)
        else:
            acc = flat_list(l, acc)
    return acc

In [9]:
flat_list([[1, 2, 3], [4, 5, 6]])

[1, 2, 3, 4, 5, 6]

In [10]:
flat_list([[1, 2, 3], [4, 5, 6], [1, 2, [3, 4]]])

[1, 2, 3, 4, 5, 6, 1, 2, 3, 4]

In [11]:
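def reshape_(in_array, new_shape):
    # Flatten the nested list into a 1-D int64 array (8 bytes per element).
    flat_array = np.asarray(flat_list(in_array), dtype=np.int64)
    # Build C-order (row-major) strides in bytes, starting from the innermost axis.
    new_strides = []
    acc = flat_array.itemsize
    for s in new_shape[::-1]:
        new_strides.insert(0, acc)
        acc *= s
    # No bounds check: a shape larger than the data reads out-of-bounds memory.
    return np.lib.stride_tricks.as_strided(flat_array, shape=new_shape, strides=new_strides)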

In [12]:
reshape_(
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]
    ],
    (5, 3)
)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [13]:
reshape_(
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]
    ],
    (6, 3)
)

array([[               1,                2,                3],
       [               4,                5,                6],
       [               7,                8,                9],
       [              10,               11,               12],
       [              13,               14,               15],
       [2251799813685248,                3,       4406147920]])
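The last row above is whatever happened to sit in memory after the buffer: `as_strided` does not check bounds. One possible guard, sketched here under the hypothetical name `reshape_checked`, is to compare the element counts before building the view and to mark the result read-only:

```python
import numpy as np

def reshape_checked(values, new_shape):
    """Reshape via as_strided, refusing shapes that do not match the data size."""
    flat = np.asarray(values).ravel()
    if int(np.prod(new_shape)) != flat.size:
        raise ValueError(
            f"cannot reshape {flat.size} elements into shape {tuple(new_shape)}"
        )
    # C-order strides in bytes, built from the innermost axis outwards.
    strides = []
    acc = flat.itemsize
    for dim in reversed(new_shape):
        strides.insert(0, acc)
        acc *= dim
    return np.lib.stride_tricks.as_strided(
        flat, shape=new_shape, strides=strides, writeable=False
    )

rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
print(reshape_checked(rows, (5, 3)))
```

With this guard, `reshape_checked(rows, (6, 3))` raises `ValueError` instead of returning garbage.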

# NumPy is All You Need

###### Nature 2020 Review Paper

![numpy-nature](images/numpy-nature.webp)

[source](https://www.nature.com/articles/s41586-020-2649-2)

- indexing
    - basic indexing
    - advanced indexing

- broadcasting

- vectorization

[Official Doc](https://numpy.org/doc/stable/user/basics.indexing.html)
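As a quick illustration of the two indexing families listed above: basic indexing (integers and slices) returns a view, while advanced indexing (integer lists or boolean masks) returns a copy. A small sketch, with variable names chosen only for this example:

```python
import numpy as np

array = np.arange(16, dtype=np.int8).reshape(4, 4)

basic = array[1:3, ::2]        # slices only: a view with adjusted strides
fancy = array[[1, 2], :]       # integer list: a copy
masked = array[array % 2 == 0] # boolean mask: a 1-D copy

print(basic.strides)                    # (4, 2)
print(np.shares_memory(array, basic))   # True
print(np.shares_memory(array, fancy))   # False
print(np.shares_memory(array, masked))  # False
```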

![numpy-memorize](images/numpy-memorize.jpg)

![better-way](images/better_way.jpg)

![array-2d-flatten](images/array_2d_flatten.drawio.svg)

In [14]:
print("1 == 4 * 0 + 1 * 1:", 1 == 4*0 + 1*1)
print("6 == 4 * 1 + 1 * 2:", 6 == 4*1 + 1*2)

1 == 4 * 0 + 1 * 1: True
6 == 4 * 1 + 1 * 2: True


In [15]:
print("arr_flatten[1] == array[0, 1]:", arr_flatten[1] == array[0, 1])
print("arr_flatten[6] == array[1, 2]:", arr_flatten[6] == array[1, 2])

arr_flatten[1] == array[0, 1]: True
arr_flatten[6] == array[1, 2]: True


In [16]:
array.strides

(4, 1)

- linear offset: the offset (in bytes) of an element from the start of the array's data buffer; for a 1-byte dtype such as `int8`, it equals the index in the flattened array
- $\mathbf{arr}$: an $m$-dimensional array
    - strides: $(s_0, s_1, ..., s_{m-1})$
    - shape: $(d_0, d_1, ..., d_{m-1})$
- $e = \mathbf{arr}[i_0, i_1, ..., i_{m-1}]$: an element of $\mathbf{arr}$
    - with linear offset $\text{offset}_e$

Then we have:

$$
    \text{offset}_e = \sum\limits_{j=0}^{m-1} s_j \cdot i_j
$$
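The formula can be checked numerically. The sketch below uses a hypothetical helper `byte_offset` and a 4-byte dtype, so the byte offset has to be divided by `itemsize` to get the position in the flattened array:

```python
import numpy as np

def byte_offset(arr, index):
    """Sum of stride * index over all axes (strides are in bytes)."""
    return sum(s * i for s, i in zip(arr.strides, index))

arr = np.arange(24, dtype=np.int32).reshape(2, 3, 4)
idx = (1, 2, 3)

print(arr.strides)       # (48, 16, 4)
offset = byte_offset(arr, idx)
print(offset)            # 92 == 48*1 + 16*2 + 4*3

# Convert bytes to an element position and compare with normal indexing.
flat = arr.ravel()
print(flat[offset // arr.itemsize] == arr[idx])  # True
```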

# Applications

## Shared Memory View

In [29]:
cube = np.arange(3*3*3).reshape((3, 3, 3))
cube

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

![array-3d](images/array-3d.drawio.svg)
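Slicing the cube does not copy anything: the slice is a new `ndarray` header (shape and strides) over the same buffer. A minimal sketch, where `middle` is just an illustrative name:

```python
import numpy as np

cube = np.arange(3 * 3 * 3).reshape((3, 3, 3))
item = cube.itemsize  # bytes per element (platform dependent)

middle = cube[:, 1, :]  # the middle row of every 3x3 layer
print(middle.shape)                     # (3, 3)
print(cube.strides == (9 * item, 3 * item, item))  # True
print(middle.strides == (9 * item, item))          # True: axis 1 is dropped
print(np.shares_memory(cube, middle))   # True

# Writing through the view is visible in the original cube.
middle[0, 0] = -1
print(cube[0, 1, 0])  # -1
```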

### Time Series: Sliding Window with Shared-Memory View

In [17]:
data = np.arange(20, dtype=np.int8)
data

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19], dtype=int8)

In [18]:
windows = np.lib.stride_tricks.as_strided(
    data,
    shape=(16, 5),
    strides=(1, 1)
)

In [19]:
X = windows[:, :4]
Y = windows[:, 4]

In [20]:
X

array([[ 0,  1,  2,  3],
       [ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14],
       [12, 13, 14, 15],
       [13, 14, 15, 16],
       [14, 15, 16, 17],
       [15, 16, 17, 18]], dtype=int8)

In [21]:
Y

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
      dtype=int8)

In [22]:
X.base is windows, Y.base is windows

(True, True)
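For this particular pattern, NumPy 1.20 and later also provide `np.lib.stride_tricks.sliding_window_view`, which computes the shape and strides itself and returns a read-only view by default. A sketch of the same windows built that way:

```python
import numpy as np

data = np.arange(20, dtype=np.int8)

# Window length 5: four inputs plus one target per row.
windows = np.lib.stride_tricks.sliding_window_view(data, window_shape=5)
X = windows[:, :4]
Y = windows[:, 4]

print(windows.shape)                    # (16, 5)
print(windows.strides)                  # (1, 1)
print(np.shares_memory(data, windows))  # True
print(windows.flags.writeable)          # False
```

Because overlapping windows alias the same bytes, the read-only default protects against accidental writes that would change several windows at once.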

## Nested Loop and Vectorization (Broadcasting)

In [23]:
Na, Nb, Nc = 20, 30, 50
a = np.random.rand(Na, 5)
b = np.random.rand(Nb, 5)
c = np.random.rand(Nc, 1)

In [24]:
out = np.zeros((Na, Nb, Nc), dtype=float)
for i in range(Na):
    for j in range(Nb):
        for k in range(Nc):
            a_ = a[i]
            b_ = b[j]
            out[i, j, k] = ((a_ > b_) * c[k]).sum()

![better-way](images/better_way.jpg)

In [30]:
out_ = (
    (a[:, np.newaxis, np.newaxis, :] > b[np.newaxis, :, np.newaxis, :]) * c[np.newaxis, np.newaxis, :, :]
).sum(axis=-1)
out_.shape

(20, 30, 50)

In [26]:
np.allclose(
    out,
    out_
)

True
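The broadcast expression builds a temporary of shape `(Na, Nb, Nc, 5)` before summing. Since each `c[k]` holds a single value, one possible variant (a sketch, not the only way to write it) counts the comparisons first and scales afterwards, which keeps the intermediates at three dimensions or fewer:

```python
import numpy as np

rng = np.random.default_rng(0)
Na, Nb, Nc = 20, 30, 50
a = rng.random((Na, 5))
b = rng.random((Nb, 5))
c = rng.random((Nc, 1))

# Shape of the 4-D temporary created by the full broadcast (NumPy >= 1.20).
print(np.broadcast_shapes((Na, 1, 1, 5), (1, Nb, 1, 5), (1, 1, Nc, 1)))  # (20, 30, 50, 5)

full = (
    (a[:, None, None, :] > b[None, :, None, :]) * c[None, None, :, :]
).sum(axis=-1)

# Count how often a[i] > b[j] holds, then scale by c[k].
counts = (a[:, None, :] > b[None, :, :]).sum(axis=-1)  # shape (Na, Nb)
lean = counts[:, :, None] * c[:, 0]                    # shape (Na, Nb, Nc)

print(np.allclose(full, lean))  # True
```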

# Take Home Messages

- the foundational data structure of `numpy`: `ndarray`
- shared-memory view
- indexing
- broadcasting

**NEVER** write nested loops ever again

![better-way](images/better_way.jpg)

# Learning Resources


- https://github.com/wadetb/tinynumpy
  - a pure-Python, `numpy`-compatible implementation
- https://github.com/dboyliao/numPY
  - my own project
  - an attempt to build a pure-Python implementation of `numpy`
  - for educational purposes
- https://github.com/rougier/numpy-100
  - 100 numpy exercises with solutions