In [4]:
import numpy as np

In [5]:
np.set_printoptions(
    edgeitems=30, 
    linewidth=100000, 
    precision=3,
    suppress=True)

## Visualizing Multi-Dimensional Tensors
For any multi-dimensional tensor, I take the last 2 dims and create a matrix out of them. Let's call these matrices blocks. Working outward, I take the remaining dims two at a time and arrange the blocks (or the groups of blocks built so far) in a matrix of that shape. If the total number of dimensions is odd, one dim is left over at the front, and I simply stack everything in a column vector along it. This is best shown with a bunch of examples.

In [102]:
def ndarray_to_latex(x):
    if len(x.shape) == 2:
        ret = r"\begin{bmatrix}" + "\n"
        for r in range(x.shape[0]):
            ret += " & ".join(str(i) for i in x[r]) + r" \\" + "\n"
        ret += r"\end{bmatrix}"
        return np.array([ret])
    else:
        raise NotImplementedError()
                

In [104]:
print(ndarray_to_latex(np.array([
    [1, 2, 3],
    [4, 5, 6]
]))[0])

\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
\end{bmatrix}
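`ndarray_to_latex` above only handles 2D input and raises `NotImplementedError` otherwise. Below is a minimal sketch, not part of the original helper, of how the block rule from the opening paragraph could be applied recursively: the last two dims become a `bmatrix` block, leading dims are consumed two at a time as a matrix of sub-tensors, and a single leftover leading dim becomes a column. The name `to_latex_blocks` is hypothetical, and unlike `ndarray_to_latex` it returns a plain string rather than a one-element array.

```python
import numpy as np


def to_latex_blocks(x):
    """Hypothetical recursive extension of ndarray_to_latex (a sketch)."""
    x = np.asarray(x)
    if x.ndim == 0:
        return str(x)
    if x.ndim == 1:
        # Treat a 1D vector as a single-row block.
        x = x.reshape(1, -1)
    if x.ndim == 2:
        # Base case: the last two dims form one block.
        rows = [" & ".join(str(v) for v in row) + r" \\" for row in x]
    elif x.ndim % 2 == 1:
        # Odd number of dims: the leading dim stacks sub-tensors in a column.
        rows = [to_latex_blocks(sub) + r" \\" for sub in x]
    else:
        # Even number of dims: the leading two dims form a matrix of sub-tensors.
        rows = [
            " & ".join(to_latex_blocks(x[i, j]) for j in range(x.shape[1])) + r" \\"
            for i in range(x.shape[0])
        ]
    return "\\begin{bmatrix}\n" + "\n".join(rows) + "\n\\end{bmatrix}"


print(to_latex_blocks(np.array([[1, 2, 3], [4, 5, 6]])))
print(to_latex_blocks(np.arange(12).reshape(2, 3, 2)))
```

For a 2D input this produces the same text as `ndarray_to_latex`; for higher dims it nests `bmatrix` environments following the same outward-in rule as the examples below.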


In [6]:
blocks = []
for i in range(24):
    blocks.append(np.full(6, fill_value=i))
blocks    

[array([0, 0, 0, 0, 0, 0]),
 array([1, 1, 1, 1, 1, 1]),
 array([2, 2, 2, 2, 2, 2]),
 array([3, 3, 3, 3, 3, 3]),
 array([4, 4, 4, 4, 4, 4]),
 array([5, 5, 5, 5, 5, 5]),
 array([6, 6, 6, 6, 6, 6]),
 array([7, 7, 7, 7, 7, 7]),
 array([8, 8, 8, 8, 8, 8]),
 array([9, 9, 9, 9, 9, 9]),
 array([10, 10, 10, 10, 10, 10]),
 array([11, 11, 11, 11, 11, 11]),
 array([12, 12, 12, 12, 12, 12]),
 array([13, 13, 13, 13, 13, 13]),
 array([14, 14, 14, 14, 14, 14]),
 array([15, 15, 15, 15, 15, 15]),
 array([16, 16, 16, 16, 16, 16]),
 array([17, 17, 17, 17, 17, 17]),
 array([18, 18, 18, 18, 18, 18]),
 array([19, 19, 19, 19, 19, 19]),
 array([20, 20, 20, 20, 20, 20]),
 array([21, 21, 21, 21, 21, 21]),
 array([22, 22, 22, 22, 22, 22]),
 array([23, 23, 23, 23, 23, 23])]

### Visualizing 3D Tensors
Let's say I have a tensor of shape $2 \times 3 \times 2$. I'll first create blocks of shape $3 \times 2$. Then I'll simply stack them on top of each other.

$$
X = \begin{bmatrix}
\begin{bmatrix}
0 & 0 \\
0 & 0 \\
0 & 0 \\
\end{bmatrix} \\
\begin{bmatrix}
1 & 1 \\
1 & 1 \\
1 & 1 \\
\end{bmatrix} \\
\end{bmatrix}
$$

In [7]:
X = np.concatenate((blocks[0], blocks[1]), axis=0).reshape(2, 3, 2)
print(X)
print(X.shape)

[[[0 0]
  [0 0]
  [0 0]]

 [[1 1]
  [1 1]
  [1 1]]]
(2, 3, 2)


In [8]:
X[0]

array([[0, 0],
       [0, 0],
       [0, 0]])

In [9]:
X[1]

array([[1, 1],
       [1, 1],
       [1, 1]])

In [10]:
X[0,0]

array([0, 0])
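Stacking the $3 \times 2$ blocks on top of each other, as in the $X$ shown above, amounts to merging the first two dims into one. A quick sketch of that reading, rebuilding the same tensor and comparing `np.vstack` with a `reshape`:

```python
import numpy as np

# Rebuild the same 2 x 3 x 2 tensor: block 0 is all zeros, block 1 is all ones.
X = np.concatenate((np.full(6, 0), np.full(6, 1))).reshape(2, 3, 2)

# Stacking the 3 x 2 blocks vertically gives a single 6 x 2 matrix ...
stacked = np.vstack(list(X))
print(stacked)

# ... which is the same as collapsing the first two dims with reshape.
print(np.array_equal(stacked, X.reshape(-1, X.shape[-1])))
```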

### Visualizing 4D Tensors
Now let's say our dims are $2 \times 3 \times 2 \times 3$. As usual, I'll first take the last 2 dims and create matrix blocks out of those, i.e., I'll create 6 blocks, each of shape $2 \times 3$. Then I'll take these six blocks and arrange them row-wise in a $2 \times 3$ matrix.

![4d](./visualizing_4D.png)

In [11]:
X = np.concatenate(
    (
        blocks[0], 
        blocks[1],
        blocks[2],
        blocks[3],
        blocks[4],
        blocks[5]
    ), 
    axis=0
).reshape(2, 3, 2, 3)
print(X)
print(X.shape)

[[[[0 0 0]
   [0 0 0]]

  [[1 1 1]
   [1 1 1]]

  [[2 2 2]
   [2 2 2]]]


 [[[3 3 3]
   [3 3 3]]

  [[4 4 4]
   [4 4 4]]

  [[5 5 5]
   [5 5 5]]]]
(2, 3, 2, 3)


In [12]:
X[0]

array([[[0, 0, 0],
        [0, 0, 0]],

       [[1, 1, 1],
        [1, 1, 1]],

       [[2, 2, 2],
        [2, 2, 2]]])

In [13]:
X[0,0]

array([[0, 0, 0],
       [0, 0, 0]])

In [14]:
X[0,1]

array([[1, 1, 1],
       [1, 1, 1]])

In [15]:
X[0,2]

array([[2, 2, 2],
       [2, 2, 2]])
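The $2 \times 3$ arrangement of blocks in the 4D figure can also be laid out as one flat matrix. A sketch (my own illustration, not from the figure) using `np.block`, which takes a nested list of blocks arranged the same way, plus an equivalent transpose-and-reshape:

```python
import numpy as np

# Same construction as above: block k is a 2 x 3 matrix filled with k.
X = np.concatenate([np.full(6, k) for k in range(6)]).reshape(2, 3, 2, 3)

# X[i, j] is the block in row i, column j of the 2 x 3 arrangement.
grid = np.block([[X[i, j] for j in range(X.shape[1])] for i in range(X.shape[0])])
print(grid)

# Equivalent: put the block-row dim next to the row dim, and the
# block-col dim next to the col dim, then merge each pair.
flat = X.transpose(0, 2, 1, 3).reshape(2 * 2, 3 * 3)
print(np.array_equal(grid, flat))
```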

### Visualizing a 5D Tensor
Let's say I have a tensor with dims $3 \times 2 \times 2 \times 3 \times 2$. As usual, I first break it into blocks of shape $3 \times 2$. Then I take the next two dims, $2 \times 2$, and arrange my blocks in matrices of this shape. The remaining dim, $3$, is a single number, so I just stack these matrices on top of each other.

![5d](./visualizing_5D.png)
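The figure can be checked in code. A sketch, assuming the same `blocks` construction as earlier (block `k` is filled with `k`): the first 12 blocks give the $3 \times 2 \times 2 \times 3 \times 2$ tensor, and `X[k, i, j]` is the block in row `i`, column `j` of the `k`-th matrix in the stack.

```python
import numpy as np

# 24 blocks of 6 elements each, block k filled with k (as in the `blocks` cell).
blocks = [np.full(6, fill_value=k) for k in range(24)]

# 3 stacked matrices, each a 2 x 2 arrangement of 3 x 2 blocks -> 12 blocks.
X = np.concatenate(blocks[:12]).reshape(3, 2, 2, 3, 2)
print(X.shape)

# X[k, i, j] is block number 4*k + 2*i + j.
print(X[2, 1, 0])
```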

### Visualizing a 6D Tensor
In our final example, let's take a tensor with dims $2 \times 3 \times 2 \times 2 \times 3 \times 2$. As usual, we'll take the last two dims and make blocks of matrices out of those, so each block is of shape $3 \times 2$. Next, we take the next 2 dims, $2 \times 2$, and arrange our blocks in a matrix of this shape. Finally, we take the remaining 2 dims, $2 \times 3$, and arrange these super-blocks in a matrix of this shape.

![6D](./visualizing_6D.png)
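Here the $2 \times 3 \times 2 \times 2 = 24$ blocks use up the whole `blocks` list. A sketch under the same assumption as before (block `k` filled with `k`), showing that `X[a, b]` picks a super-block and `X[a, b, i, j]` a block inside it:

```python
import numpy as np

blocks = [np.full(6, fill_value=k) for k in range(24)]

# 2 x 3 super-blocks, each a 2 x 2 arrangement of 3 x 2 blocks: all 24 blocks.
X = np.concatenate(blocks).reshape(2, 3, 2, 2, 3, 2)

# The super-block in row 1, column 2 is a 2 x 2 arrangement of 3 x 2 blocks.
print(X[1, 2].shape)
# Block number 12*1 + 4*2 + 2*1 + 1 = 23 inside it.
print(X[1, 2, 1, 1])
```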

## Generating Random Numbers

A pseudo-random number generator (PRNG) is actually deterministic: it generates a sequence of numbers that only look random. In a simplified picture, the current "random" number in the sequence depends on the last number that was generated, i.e., $r_{t+1} = f(r_t)$ (real generators carry a larger internal state, but the idea is the same). Of course, the first number the PRNG generates needs a starting point, i.e., $r_1 = f(r_0)$, so the PRNG needs $r_0$. This $r_0$ is called the random seed. When no seed is given, a PRNG picks one from some source of entropy, such as the current time or the operating system's entropy pool (NumPy's `np.random.default_rng()` pulls fresh entropy from the OS). This ensures that if the PRNG is run again later, $r_0$ is different, which results in a different sequence of random numbers. For reproducibility, I can set the random seed to some specific value, and that will result in the same sequence every time the PRNG runs. To see this in action see `random_gens.py`.
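A small check of the reproducibility claim using NumPy's `default_rng` (the same API used below). The seed values are arbitrary. Two generators created with the same seed produce identical sequences, a different seed gives a different one, and repeated draws from one generator differ because its state advances after each draw.

```python
import numpy as np

# Two independent generators seeded with the same value ...
seq_a = np.random.default_rng(42).random(5)
seq_b = np.random.default_rng(42).random(5)
# ... produce exactly the same sequence.
print(seq_a)
print(seq_b)

# A different seed gives a different sequence.
seq_c = np.random.default_rng(7).random(5)
print(seq_c)

# Within one generator the state advances, so consecutive draws differ.
rng = np.random.default_rng(42)
first = rng.random(5)   # same as seq_a
second = rng.random(5)  # continues the sequence
print(first, second)
```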

In [16]:
# If I restart the Python process, the same seed will result in the same random numbers.
rng = np.random.default_rng(12345)

### Generate random integers


In [17]:
# Generate random integers in range [10, 1000)
print(rng.integers(10, 1000, 6))
print(rng.integers(10, 1000, (3, 2)))

[702 235 790 323 212 799]
[[646 679]
 [988 397]
 [841 339]]


In [18]:
# Generate random floats
print(rng.random(3))
print(rng.random((3, 2)))

[0.598 0.187 0.673]
[[0.942 0.248]
 [0.949 0.667]
 [0.096 0.442]]


In [19]:
# Generate random floats in range [a, b)
a = 3.2
b = 5.4
(b - a) * rng.random() + a

5.150255822520539
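NumPy's `Generator` also has a `uniform(low, high, size)` method that samples from $[a, b)$ directly, which saves doing the scaling by hand. A sketch comparing the two (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12345)
a, b = 3.2, 5.4

# Manual scaling of samples from [0, 1) ...
manual = (b - a) * rng.random(4) + a
# ... and the built-in equivalent for the half-open interval [a, b).
built_in = rng.uniform(a, b, 4)

print(manual)
print(built_in)
```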

In [20]:
# Sample from a normal distribution with mean μ and standard deviation σ
μ = 3
σ = 5
rng.normal(loc=μ, scale=σ)

6.944221722596004

## Add Columns

I have a matrix A -
$$
    A = \begin{bmatrix}
    1 & 2 \\
    30 & 40 \\
    500 & 600 \\
    \end{bmatrix}
$$

I want to add a column -
$$
    b = \begin{bmatrix}
    9 \\
    90 \\
    900 \\
    \end{bmatrix}
$$

So the resultant matrix will be -
$$
    C = \begin{bmatrix}
    1 & 2 & 9 \\
    30 & 40 & 90 \\
    500 & 600 & 900 \\
    \end{bmatrix}
$$

In [21]:
A = np.array([ [1, 2], [30, 40], [500, 600] ])
b = np.array([9, 90, 900]).reshape(3, 1)
print(A)
print(b)

[[  1   2]
 [ 30  40]
 [500 600]]
[[  9]
 [ 90]
 [900]]


In [22]:
C = np.concatenate((A, b), axis=1)
print(C)

[[  1   2   9]
 [ 30  40  90]
 [500 600 900]]
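`np.column_stack` does the reshaping for you: it treats 1D inputs as columns, so `b` can stay a plain 1D array. A sketch showing it gives the same `C`:

```python
import numpy as np

A = np.array([[1, 2], [30, 40], [500, 600]])
b = np.array([9, 90, 900])  # 1D, shape (3,)

# column_stack turns 1D arrays into columns before stacking them side by side.
C = np.column_stack((A, b))
print(C)

# Same result as reshaping to (3, 1) and concatenating along axis 1.
print(np.array_equal(C, np.concatenate((A, b.reshape(3, 1)), axis=1)))
```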


## Add Rows

I have a matrix A -

$$
    A = \begin{bmatrix}
    1 & 2 \\
    30 & 40 \\
    500 & 600 \\
    \end{bmatrix}
$$

And another row vector b -
$$
    b = \begin{bmatrix}
    7000 & 8000 \\
    \end{bmatrix}
$$

So the resultant matrix will be -
$$
    C = \begin{bmatrix}
    1 & 2 \\
    30 & 40 \\
    500 & 600 \\
    7000 & 8000 \\
    \end{bmatrix}
$$

In [23]:
# Even though I might think of [7000, 8000] as having 1 row and 2 columns,
# i.e., a shape of (1, 2), it actually has a shape of (2,).
# Concatenating it with A as-is will not work because A is 2D and it is 1D,
# so we need to reshape it to (1, 2) first.
A = np.array([ [1, 2], [30, 40], [500, 600] ])
b = np.array([7000, 8000]).reshape(1, 2)
print(A, A.shape)
print(b, b.shape)

[[  1   2]
 [ 30  40]
 [500 600]] (3, 2)
[[7000 8000]] (1, 2)


In [24]:
C = np.concatenate((A, b), axis=0)
print(C)

[[   1    2]
 [  30   40]
 [ 500  600]
 [7000 8000]]
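Similarly, `np.vstack` promotes a 1D array of shape `(N,)` to `(1, N)` before stacking, so the explicit reshape is not needed there. A sketch with the same values:

```python
import numpy as np

A = np.array([[1, 2], [30, 40], [500, 600]])
b = np.array([7000, 8000])  # shape (2,), no reshape

# vstack treats the 1D row as shape (1, 2) and appends it below A.
C = np.vstack((A, b))
print(C, C.shape)

# np.concatenate needs the extra axis added explicitly.
print(np.array_equal(C, np.concatenate((A, b[np.newaxis, :]), axis=0)))
```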


In [25]:
print(A.shape)
print(b.shape)

(3, 2)
(1, 2)


## To Pandas

In [26]:
import pandas as pd

In [27]:
A = np.random.randint(10, 1000, (3, 2))
b = np.array(['like', 'dislike', 'neutral']).reshape(3, 1)
C = np.concatenate((A, b), axis=1)
print(C)

[['502' '232' 'like']
 ['469' '324' 'dislike']
 ['592' '939' 'neutral']]


In [28]:
df = pd.DataFrame(C, columns=['Apples', 'Oranges', 'Rating'])
df.info()  # info() prints its report itself and returns None
print(df.describe())
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Apples   3 non-null      object
 1   Oranges  3 non-null      object
 2   Rating   3 non-null      object
dtypes: object(3)
memory usage: 200.0+ bytes
       Apples Oranges Rating
count       3       3      3
unique      3       3      3
top       502     232   like
freq        1       1      1


  Apples Oranges   Rating
0    502     232     like
1    469     324  dislike
2    592     939  neutral
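Note that every column above has dtype `object`: concatenating the integer matrix with the string column converted everything to strings (see the printed `C`), so `describe()` can only report counts. One way around this, as a sketch, is to build the DataFrame from a dict of columns so each keeps its own dtype (the seed is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = rng.integers(10, 1000, (3, 2))
ratings = np.array(['like', 'dislike', 'neutral'])

# Building from a dict keeps each column's dtype instead of one common dtype.
df = pd.DataFrame({
    'Apples': A[:, 0],
    'Oranges': A[:, 1],
    'Rating': ratings,
})
print(df.dtypes)
# describe() now reports numeric statistics for the numeric columns.
print(df.describe())
```

Alternatively, an existing all-string frame can be converted afterwards with `df.astype({'Apples': int, 'Oranges': int})`.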


## Meshgrids
Best explained by doing.

First, let's look at a simple 2D example.

In [29]:
x = np.array([1, 2, 3])
y = np.array([11, 12])
xx, yy = np.meshgrid(x, y)

In [30]:
xx

array([[1, 2, 3],
       [1, 2, 3]])

In [31]:
yy

array([[11, 11, 11],
       [12, 12, 12]])
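Read together, `xx[r, c]` and `yy[r, c]` are the coordinates of one grid point. A common way to list every $(x, y)$ point, shown here as a sketch, is to flatten both grids and stack them as columns:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([11, 12])
xx, yy = np.meshgrid(x, y)

# One row per grid point: (x, y) pairs in row-major order of the grid.
points = np.column_stack((xx.ravel(), yy.ravel()))
print(points)
```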

Now let's look at a 3D example.

In [32]:
x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 3)
z = np.linspace(0, 1, 2)
xx, yy, zz = np.meshgrid(x, y, z)
print(xx.shape, yy.shape, zz.shape)

(3, 5, 2) (3, 5, 2) (3, 5, 2)


The above code built 3 tensors, each of shape (3, 5, 2). In the xx tensor the x values vary along the second axis (axis 1) and are repeated along the other two; in the yy tensor the y values vary along the first axis (axis 0); and in the zz tensor the z values vary along the last axis (axis 2). Below is how I visualize this.

Think of these as three "cubes" that all have a width of 5 units, a height of 3 units, and a depth of 2 units. Note that the order of the shape is size_y, size_x, size_z. That is because, with NumPy's default `indexing='xy'`, the number of "rows" is listed first, and rows correspond to the "height" of the cube.

Ordinarily I'd visualize this tensor as a 3x5 matrix whose elements are vectors of size 2. However, that view is not very helpful for a meshgrid tensor. It makes more sense to view it as values on the x-y plane, y-z plane, and x-z plane. The sizes of these planes are as follows:

  * x-y plane: 3x5
  * y-z plane: 2x3
  * x-z plane: 2x5

The xx tensor will have the x values repeated in these planes, yy tensor will have the y values, and the zz tensor will have the z values.
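The size_y, size_x, size_z ordering comes from `np.meshgrid`'s default `indexing='xy'`. Passing `indexing='ij'` keeps the order of the inputs instead. A quick sketch comparing the two on the same vectors:

```python
import numpy as np

x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 3)
z = np.linspace(0, 1, 2)

# Default 'xy' indexing: the first two axes are swapped -> (len(y), len(x), len(z)).
xx, yy, zz = np.meshgrid(x, y, z)
print(xx.shape)  # (3, 5, 2)

# 'ij' (matrix) indexing keeps the argument order -> (len(x), len(y), len(z)).
xi, yi, zi = np.meshgrid(x, y, z, indexing='ij')
print(xi.shape)  # (5, 3, 2)

# Same values, just with the first two axes swapped.
print(np.array_equal(xi, xx.transpose(1, 0, 2)))
```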

In [33]:
def print_xy(t, z_ndx):
    print(t[:,:,z_ndx])

def print_yz(t, x_ndx):
    print(t[:,x_ndx,:].T)  # Transposing because I want to view y as cols
    
def print_xz(t, y_ndx):
    print(t[y_ndx, :, :].T)  # Transposing because I want to view x as cols

Let us examine the xx cube. It should have the following sheet (lying on the x-y plane):

```
0.0, 0.25, 0.5, 0.75, 1.0
0.0, 0.25, 0.5, 0.75, 1.0
0.0, 0.25, 0.5, 0.75, 1.0
```

duplicated as another layer on top (along the z-axis). So viewed from the y-z plane we should just see 

```
0.0, 0.0, 0.0
0.0, 0.0, 0.0
```

And on the x-z plane we should just see

```
0.0, 0.25, 0.5, 0.75, 1.0
0.0, 0.25, 0.5, 0.75, 1.0
```

Let's verify that this is indeed the case.

In [34]:
print('View of the xx cube from the top (x-y plane) with the cube sliced at z=0')
print_xy(xx, 0)

print('\nView of the xx cube from the side (y-z plane) with the cube sliced at x=0')
print_yz(xx, 0)

print('\nView of the xx cube from the other side (x-z plane) with the cube sliced at y=0')
print_xz(xx, 0)

View of the xx cube from the top (x-y plane) with the cube sliced at z=0
[[0.   0.25 0.5  0.75 1.  ]
 [0.   0.25 0.5  0.75 1.  ]
 [0.   0.25 0.5  0.75 1.  ]]

View of the xx cube from the side (y-z plane) with the cube sliced at x=0
[[0. 0. 0.]
 [0. 0. 0.]]

View of the xx cube from the other side (x-z plane) with the cube sliced at y=0
[[0.   0.25 0.5  0.75 1.  ]
 [0.   0.25 0.5  0.75 1.  ]]


Let us now examine the yy cube. It should have the following sheet on the x-y plane

```
0.0, 0.0, 0.0, 0.0, 0.0
0.5, 0.5, 0.5, 0.5, 0.5
1.0, 1.0, 1.0, 1.0, 1.0
```

duplicated as another layer on top (along the z-axis). This means that we'll get this same layer whether we slice the cube at z=0 or z=1.

Viewing the cube from the y-z plane we should see

```
0.0, 0.5, 1.0
0.0, 0.5, 1.0
```

Again, we should get the same layer regardless of where we slice the cube, whether at x=0, 1, 2, 3, or 4.

And viewing from the x-z plane we should see

```
0.0, 0.0, 0.0, 0.0, 0.0
0.0, 0.0, 0.0, 0.0, 0.0
```

But if we slice the cube at y=1 instead, we get a different x-z layer

```
0.5, 0.5, 0.5, 0.5, 0.5
0.5, 0.5, 0.5, 0.5, 0.5
```

Let's verify that this is indeed the case.

In [35]:
print('View of the yy cube from the top (x-y plane) with the cube sliced at z=1')
print_xy(yy, 1)

print('\nView of the yy cube from the side (y-z plane) with the cube sliced at x=0')
print_yz(yy, 0)

print('\nView of the yy cube from the other side (x-z plane) with the cube sliced at y=1')
print_xz(yy, 1)

View of the yy cube from the top (x-y plane) with the cube sliced at z=1
[[0.  0.  0.  0.  0. ]
 [0.5 0.5 0.5 0.5 0.5]
 [1.  1.  1.  1.  1. ]]

View of the yy cube from the side (y-z plane) with the cube sliced at x=0
[[0.  0.5 1. ]
 [0.  0.5 1. ]]

View of the yy cube from the other side (x-z plane) with the cube sliced at y=1
[[0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5]]


Finally, let's examine the zz cube.

Looking at it from our usual top-view slicing the cube at z=0 we will see

```
0.0, 0.0, 0.0, 0.0, 0.0
0.0, 0.0, 0.0, 0.0, 0.0
0.0, 0.0, 0.0, 0.0, 0.0
```

But if we slice the cube at z=1 we see
```
1.0, 1.0, 1.0, 1.0, 1.0
1.0, 1.0, 1.0, 1.0, 1.0
1.0, 1.0, 1.0, 1.0, 1.0
```

The best way to visualize this is as sheets in the y-z plane stacked one after another along the x-axis. Here is what a sheet looks like -

```
0.0, 0.0, 0.0
1.0, 1.0, 1.0
```

Because this sheet is duplicated along the x-axis, we get the same layer for different values of x.

Looking at it from the other side (x-z plane) we see -

```
0.0, 0.0, 0.0, 0.0, 0.0
1.0, 1.0, 1.0, 1.0, 1.0
```

In [36]:
print('View of the zz cube from the top (x-y plane) with the cube sliced at z=0')
print_xy(zz, 0)

print('\nSame top view but with the cube sliced at z=1')
print_xy(zz, 1)

print('\nView of the zz cube from the side (y-z plane) with the cube sliced at x=0')
print_yz(zz, 0)

print('\nView of the zz cube from the other side (x-z plane) with the cube sliced at y=0')
print_xz(zz, 0)


View of the zz cube from the top (x-y plane) with the cube sliced at z=0
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

Same top view but with the cube sliced at z=1
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

View of the zz cube from the side (y-z plane) with the cube sliced at x=0
[[0. 0. 0.]
 [1. 1. 1.]]

View of the zz cube from the other side (x-z plane) with the cube sliced at y=0
[[0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1.]]
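The three views above boil down to one rule for the default `'xy'` indexing: `xx[i, j, k] == x[j]`, `yy[i, j, k] == y[i]`, and `zz[i, j, k] == z[k]`. A compact check of this rule, as a sketch, places each 1D vector along its own axis and broadcasts it to the full cube:

```python
import numpy as np

x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 3)
z = np.linspace(0, 1, 2)
xx, yy, zz = np.meshgrid(x, y, z)

shape = xx.shape  # (3, 5, 2)
# x varies along axis 1, y along axis 0, z along axis 2.
ok_x = np.array_equal(xx, np.broadcast_to(x[np.newaxis, :, np.newaxis], shape))
ok_y = np.array_equal(yy, np.broadcast_to(y[:, np.newaxis, np.newaxis], shape))
ok_z = np.array_equal(zz, np.broadcast_to(z[np.newaxis, np.newaxis, :], shape))
print(ok_x, ok_y, ok_z)
```

This is also why `np.meshgrid(..., sparse=True)` exists: it skips the broadcasting and returns the smaller arrays, which broadcast correctly in later arithmetic.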


## Get indexes of some elements

In [37]:
a = np.array([
    [1, 2, 3],
    [4, 5, 0],
    [3, 0, 1]
])

In [38]:
np.argwhere(a == 0)

array([[1, 2],
       [2, 1]])

In [39]:
a = np.array([1, 2, 3, 0, 4, 0, 5])
np.argwhere(a == 0)

array([[3],
       [5]])

In [40]:
np.flatnonzero(a)

array([0, 1, 2, 4, 6])

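There is also a function called `nonzero`, but I found its output non-intuitive: instead of one row per match, it returns a tuple with one array of indices per dimension. `argwhere` gives one row per matching element, which is easier to read, and its output can be reshaped as desired. `flatnonzero`, used above, returns the indices of the non-zero elements of the flattened array.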
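For reference, a sketch of how the two relate on the 3x3 matrix from above: the tuple returned by `np.nonzero` is exactly the form that fancy indexing expects, and transposing it gives the rows of `argwhere`.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 0],
    [3, 0, 1],
])

# Row indices and column indices of the zeros, as two separate arrays.
rows, cols = np.nonzero(a == 0)
print(rows, cols)

# The tuple form plugs straight into fancy indexing.
print(a[rows, cols])

# Transposing the tuple gives the argwhere layout: one (row, col) per match.
print(np.array_equal(np.transpose(np.nonzero(a == 0)), np.argwhere(a == 0)))
```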

## Get some indexes in a tensor
Given a matrix 

$$
\begin{bmatrix}
a_{00} & a_{01} & a_{02} \\
a_{10} & a_{11} & a_{12} \\
a_{20} & a_{21} & a_{22} \\
a_{30} & a_{31} & a_{32} \\
a_{40} & a_{41} & a_{42} \\
\end{bmatrix}
$$

To get the submatrix shown in red
$$
\begin{bmatrix}
a_{00} & a_{01} & a_{02} \\
a_{10} & a_{11} & a_{12} \\
\color{red}{a_{20}} & \color{red}{a_{21}} & a_{22} \\
\color{red}{a_{30}} & \color{red}{a_{31}} & a_{32} \\
a_{40} & a_{41} & a_{42} \\
\end{bmatrix}
$$



In [41]:
a = np.arange(15).reshape(5, 3)
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [42]:
a[[2, 3]][:, [0, 1]]

array([[ 6,  7],
       [ 9, 10]])
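The chained indexing above works, but `a[[2, 3]]` first makes an intermediate copy of rows 2 and 3. Two alternatives, as a sketch: plain slicing when the rows and columns are contiguous, and `np.ix_` for arbitrary lists of rows and columns in a single step.

```python
import numpy as np

a = np.arange(15).reshape(5, 3)

# Contiguous rows 2..3 and cols 0..1: slicing (returns a view).
sliced = a[2:4, 0:2]
print(sliced)

# Arbitrary row/col lists: np.ix_ combines them as a cross product.
crossed = a[np.ix_([2, 3], [0, 1])]
print(crossed)
```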

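However, to get only the elements in pink, I pass both index lists in a single indexing operation. NumPy pairs them up element-wise, so rows `[2, 3]` and cols `[0, 1]` select $a_{20}$ and $a_{31}$.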
$$
\begin{bmatrix}
a_{00} & a_{01} & a_{02} \\
a_{10} & a_{11} & a_{12} \\
\color{pink}{a_{20}} & a_{21} & a_{22} \\
a_{30} & \color{pink}{a_{31}} & a_{32} \\
a_{40} & a_{41} & a_{42} \\
\end{bmatrix}
$$

In [43]:
a[[2, 3], [0, 1]]

array([ 6, 10])

## Einsum
Multiply elements and sum them. `ij, jk -> ik` means $X_{ik} = \sum_j A_{ij} \times B_{jk}$. That is, fix the values of $i$ and $k$, then pick elements from $A$ and $B$ by varying $j$ along the specified dimensions, multiply these pairs of elements, and finally add everything up. This is nothing but the familiar matrix multiplication. Whatever indices do not show up in the output are the ones we need to vary and sum over. The multiplication is implied by the number of input operands (2 in this case).

In [44]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
B = np.array([
    [10, 11],
    [12, 13],
    [14, 15]
])
X = np.einsum("ij, jk -> ik", A, B)
X

array([[ 76,  82],
       [184, 199]])

In [45]:
A @ B

array([[ 76,  82],
       [184, 199]])
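To make the "sum over the missing index" rule concrete, here is the same `ij, jk -> ik` written as explicit loops. This is only an illustration of the semantics, not how `np.einsum` is implemented.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[10, 11], [12, 13], [14, 15]])

n_i, n_j = A.shape
n_k = B.shape[1]
X = np.zeros((n_i, n_k), dtype=A.dtype)
for i in range(n_i):          # i and k appear in the output: keep them
    for k in range(n_k):
        for j in range(n_j):  # j is missing from the output: sum over it
            X[i, k] += A[i, j] * B[j, k]
print(X)
```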

### Permutation
`ij -> ji` simply turns rows into columns and vice versa: $X_{ji} = A_{ij}$. Both indices show up in the output, so there is nothing to sum over. And because there is only one operand, there is nothing to multiply either. This looks like a transpose, but it can be much more general than that. For a multi-dim tensor, `...ij -> ...ji` keeps all the leading dims as-is and swaps the last 2 dims (rows and cols). Similarly, `ij... -> ji...` swaps the first two dims, keeping everything inside as-is.

In [46]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
np.einsum("ij -> ji", A)

array([[1, 4],
       [2, 5],
       [3, 6]])

In [47]:
A = np.arange(12).reshape(2, 2, 3)
A

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [48]:
np.einsum("...ij -> ...ji", A)

array([[[ 0,  3],
        [ 1,  4],
        [ 2,  5]],

       [[ 6,  9],
        [ 7, 10],
        [ 8, 11]]])

In [49]:
np.einsum("ij... -> ji...", A)

array([[[ 0,  1,  2],
        [ 6,  7,  8]],

       [[ 3,  4,  5],
        [ 9, 10, 11]]])

In [50]:
A[0,0]

array([0, 1, 2])

In [51]:
A[0,1]

array([3, 4, 5])

In [52]:
A[1,0]

array([6, 7, 8])

In [53]:
A[1,1]

array([ 9, 10, 11])
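Each of the two ellipsis patterns swaps one specific pair of axes, so, as a quick sketch, they can be checked against `np.swapaxes`:

```python
import numpy as np

A = np.arange(12).reshape(2, 2, 3)

# '...ij -> ...ji' swaps the last two axes.
last_two = np.array_equal(np.einsum("...ij -> ...ji", A), np.swapaxes(A, -1, -2))
# 'ij... -> ji...' swaps the first two axes.
first_two = np.array_equal(np.einsum("ij... -> ji...", A), np.swapaxes(A, 0, 1))
print(last_two, first_two)
```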

### Summation
Sum all the elements of a tensor. Now the output is not another tensor but a scalar, so this is written as `ij ->`. Neither index shows up in the output, so both must be summed over. And there is only one operand, so there is nothing to multiply. $X = \sum_i \sum_j A_{ij}$.

In [54]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
np.einsum("ij ->", A)

21

### Column Sum
Sum down each column of the input tensor. For a 2D input tensor, the output will be a 1D vector with one entry per column. Again, there is nothing to multiply. `ij -> j` is $X_j = \sum_i A_{ij}$.

In [55]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
np.einsum("ij -> j", A)

array([5, 7, 9])

### Row Sum
Similar logic as before `ij -> i` is $X_i = \sum_j A_{ij}$.

In [56]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
np.einsum("ij -> i", A)

array([ 6, 15])
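The three summations above map onto `sum` with and without an `axis` argument. A sketch of the correspondence: summing over `i` (down the rows) is `axis=0`, and summing over `j` (across the columns) is `axis=1`.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

total = np.einsum("ij ->", A)
col_sums = np.einsum("ij -> j", A)
row_sums = np.einsum("ij -> i", A)

print(total, A.sum())              # 21 21
print(col_sums, A.sum(axis=0))     # [5 7 9] [5 7 9]
print(row_sums, A.sum(axis=1))     # [ 6 15] [ 6 15]
```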

### Matrix-Vector Multiplication
`ij, j -> i` is $X_i = \sum_j A_{ij} v_{j}$. The output is a 1D vector. For a fixed $i$, we slide our cursor along row $i$ of the matrix (varying $j$ across the columns) and along $v$ at the same time. There are two operands, so their elements must be multiplied before summing.

In [57]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
v = np.array([7, 8, 9])
print(A @ v)  # v is 1-D, so v.T would be the same array
print(np.einsum("ij, j -> i", A, v))

[ 50 122]
[ 50 122]


### Dot Product
`i,i ->`. The result is a scalar, hence no indices in the output. There are two operands, so their elements must be multiplied. And $i$ does not show up in the output, so the summation happens by varying $i$: $x = \sum_i u_i v_i$.

In [58]:
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(u @ v)  # u is 1-D, so u.T would be the same array
np.einsum("i,i ->", u, v)

32


32

In [59]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
np.einsum("ij,ij ->", A, A)

91

In [60]:
np.sum(A.flatten() ** 2)

91
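`ij,ij ->` multiplies each element by itself and sums everything, i.e., it computes the squared Frobenius norm $\|A\|_F^2 = \sum_i \sum_j A_{ij}^2$. A sketch cross-checking it against `np.linalg.norm`, which returns the Frobenius norm for a 2D array by default:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

sq = np.einsum("ij,ij ->", A, A)
frob = np.linalg.norm(A)  # Frobenius norm for a 2D input
print(sq, frob ** 2)
print(np.isclose(sq, frob ** 2))
```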

### Hadamard Product
This is the elementwise product of two matrices of the same shape: `ij, ij -> ij`. The output is a 2D matrix as well. There are two operands, so their elements must be multiplied. No index shows up in the input without also showing up in the output, so there is no summation. $X_{ij} = A_{ij} B_{ij}$.

In [61]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
B = np.array([
    [7, 8, 9],
    [10, 11, 12]
])
print(A * B)
print(np.einsum("ij, ij -> ij", A, B))

[[ 7 16 27]
 [40 55 72]]
[[ 7 16 27]
 [40 55 72]]


### Outer Product
The outer product of two (column) vectors is $uv^T$: `i, j -> ij`. This tells us that the output is a 2D tensor and the inputs are 1D each. There are two operands, so their elements must be multiplied together. And finally, no input index is missing from the output, so there is nothing to sum over. $X_{ij} = u_i v_j$.

In [62]:
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(u.reshape(-1, 1) @ v.reshape(-1, 1).T)
print(np.outer(u, v))
print(np.einsum("i, j -> ij", u, v))

[[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
[[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
[[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
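A fourth route to the same matrix, shown as a sketch, is broadcasting: make `u` a column and `v` a row, and the elementwise product expands to every pair $u_i v_j$.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# (3, 1) * (1, 3) broadcasts to (3, 3) with entry [i, j] = u[i] * v[j].
X = u[:, np.newaxis] * v[np.newaxis, :]
print(X)
print(np.array_equal(X, np.einsum("i, j -> ij", u, v)))
```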


### Batch Multiply
Let's say I have a batch of photos $m \times h \times w$ and I want to multiply it with an FC layer that is $w \times c$. `mhw, wc -> mhc` is $X_{mhc} = \sum_w A_{mhw} F_{wc}$. A simpler and more intuitive way to say this is `...ij, ...jk -> ...ik`, i.e., keep the outer dims as-is (I don't care how many there are), but take the two innermost dims and do a matrix multiplication of those.

In [63]:
A = np.array([
    [
        [1, 2, 3],
        [4, 5, 6]
    ],
    [
        [7, 8, 9],
        [10, 11, 12]
    ],
])
A.shape

(2, 2, 3)

In [64]:
W = np.array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6]
])
W.shape

(3, 2)

In [65]:
A[0] @ W

array([[2.2, 2.8],
       [4.9, 6.4]])

In [66]:
A[1] @ W

array([[ 7.6, 10. ],
       [10.3, 13.6]])

In [67]:
X = np.array([A[0] @ W, A[1] @ W])
X

array([[[ 2.2,  2.8],
        [ 4.9,  6.4]],

       [[ 7.6, 10. ],
        [10.3, 13.6]]])

In [68]:
np.einsum("...ij, ...jk -> ...ik", A, W)

array([[[ 2.2,  2.8],
        [ 4.9,  6.4]],

       [[ 7.6, 10. ],
        [10.3, 13.6]]])

In [69]:
np.einsum("mhw, wc -> mhc", A, W)

array([[[ 2.2,  2.8],
        [ 4.9,  6.4]],

       [[ 7.6, 10. ],
        [10.3, 13.6]]])
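For this particular case, the `@` operator gives the same result directly: `np.matmul` treats the leading dims of a stack of matrices as batch dims and broadcasts the 2D `W` across them. A sketch of the check:

```python
import numpy as np

A = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])  # shape (2, 2, 3): a batch of two 2 x 3 matrices
W = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])  # shape (3, 2)

X_matmul = A @ W  # W is broadcast across the batch dim
X_einsum = np.einsum("mhw, wc -> mhc", A, W)
print(X_matmul.shape)
print(np.allclose(X_matmul, X_einsum))
```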

In [70]:
A.shape

(2, 2, 3)

In [71]:
*_, a, b, c = (1, 2, 3, 4, 5, 6)

In [None]:
a

4

In [None]:
b

5

In [None]:
c

6