In [1]:
import torch
import numpy as np

In [2]:
# Build a 3x3 float tensor filled with random integers from 1 to 15
t = torch.zeros((3,3))
for i in range(3):
    for j in range(3):
        t[i,j] = np.random.randint(1, 16)
t

tensor([[11., 11.,  1.],
        [ 7., 13.,  2.],
        [ 4.,  3.,  5.]])

In [3]:
s = t.sum(1)  # sum along dim 1: one value per row; keepdim=False by default
s

tensor([23., 22., 12.])

In [4]:
s.shape

torch.Size([3])

In [5]:
o = t / s
o

tensor([[0.4783, 0.5000, 0.0833],
        [0.3043, 0.5909, 0.1667],
        [0.1739, 0.1364, 0.4167]])

According to [broadcasting semantics](https://pytorch.org/docs/stable/notes/broadcasting.html), `o = t / s` undergoes the following steps:
1. In terms of shapes, `t / s` is `[3,3] / [3]`. The shapes are aligned starting at the trailing dimension, so it becomes<br> `[3,3] / [NonExistent, 3]`, which is valid for broadcasting. The missing dimension is treated as size 1, so it becomes `[3,3] / [1,3]`.
2. The size-1 dimension is then expanded to match, so that `s` behaves like a tensor of shape `[3,3]`. Since the mismatched dimension is dimension 0 (rows), `s` is expanded to a `[3,3]` tensor by repeating the original 1D row 3 times, i.e. <br>
$
\begin{bmatrix}
23 & 22 & 12
\end{bmatrix}
$
<br>
&emsp;&emsp;&emsp;$\downarrow$
<br>
$
\begin{bmatrix}
23 & 22 & 12\\
23 & 22 & 12\\
23 & 22 & 12
\end{bmatrix}
$
3. `t` is divided element-wise by the expanded `s`.

But the rows of `t` are not being normalized: because of this unintended broadcasting, the entry at position $(i, j)$ is divided by the sum of row $j$ instead of the sum of row $i$. The result of the division is:

$
\begin{bmatrix}
11/23 & 11/22 & 1/12\\
7/23 & 13/22 & 2/12\\
4/23 & 3/22 & 5/12
\end{bmatrix}
$
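
To make the implicit expansion visible, here is a minimal sketch that hard-codes the values of `t` printed above (so it does not depend on the random draw) and performs the expansion by hand. The name `s_expanded` is introduced only for illustration; `unsqueeze`, `expand` and `torch.broadcast_shapes` are used here to mimic what broadcasting does internally, under the assumption of a reasonably recent PyTorch version.

```python
import torch

# Same values as the tensor `t` printed above, hard-coded for reproducibility.
t = torch.tensor([[11., 11., 1.],
                  [7., 13., 2.],
                  [4., 3., 5.]])
s = t.sum(1)  # shape [3]: one sum per row

# Shape that broadcasting produces for [3,3] and [3].
print(torch.broadcast_shapes(t.shape, s.shape))

# Broadcasting treats [3] as [1,3] and expands it along dim 0.
# `expand` returns a view, so no data is actually copied.
s_expanded = s.unsqueeze(0).expand(3, 3)
print(s_expanded)

# The implicit broadcast matches the explicit expansion.
o = t / s
print(torch.equal(o, t / s_expanded))

# Column j was divided by the sum of row j, so the rows do not sum to 1.
print(o.sum(1))
```

Printing `s_expanded` shows the row sums laid out along the columns, which is why every column, rather than every row, ends up divided by a single value.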

In [6]:
for i in range(3):
    print(o[i].sum()) # row sums are not 1.0

tensor(1.0616)
tensor(1.0619)
tensor(0.7269)


This happened because `keepdim=False` by default, so the summed dimension is removed and `s` has shape `[3]`. With `keepdim=True`, the dimension along which the sum was performed is kept with size 1 instead of being removed, so `s` has shape `[3,1]`.

If we set `keepdim=True`, broadcasting semantics process this operation in the following way:
1. In terms of shapes, `t / s` is `[3,3] / [3,1]`, which is valid for broadcasting.
2. Since the mismatched dimension is dimension 1 (columns), `s` is expanded to a `[3,3]` tensor by repeating the original ***column*** 3 times, i.e. <br>
$
\begin{bmatrix}
23\\
22\\
12
\end{bmatrix}
$
<br>
&emsp;&emsp;&emsp;$\downarrow$
<br>
$
\begin{bmatrix}
23 & 23 & 23\\
22 & 22 & 22\\
12 & 12 & 12
\end{bmatrix}
$

This way, each entry gets divided by its respective row sum:
$
\begin{bmatrix}
11/23 & 11/23 & 1/23\\
7/22 & 13/22 & 2/22\\
4/12 & 3/12 & 5/12
\end{bmatrix}
$
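
The column-wise expansion can be checked in the same way. This is a minimal sketch that again hard-codes the values of `t` printed earlier; `s_expanded` and `s_alt` are illustrative names, and `unsqueeze(1)` is shown only as one equivalent way to obtain the `[3,1]` shape.

```python
import torch

# Same values as the tensor `t` printed earlier, hard-coded for reproducibility.
t = torch.tensor([[11., 11., 1.],
                  [7., 13., 2.],
                  [4., 3., 5.]])

s = t.sum(1, keepdim=True)  # shape [3,1]: the summed dimension is kept with size 1
print(s.shape)

# Broadcasting expands the size-1 dimension along dim 1,
# repeating each row sum across its own row (a view, no copy).
s_expanded = s.expand(3, 3)
print(s_expanded)

o = t / s
print(torch.equal(o, t / s_expanded))           # same result as the explicit expansion
print(torch.allclose(o.sum(1), torch.ones(3)))  # every row now sums to 1

# Equivalent shape without keepdim: add the size-1 dimension back by hand.
s_alt = t.sum(1).unsqueeze(1)
print(torch.equal(s, s_alt))
```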

In [7]:
s = t.sum(1, keepdim=True)
s

tensor([[23.],
        [22.],
        [12.]])

In [8]:
s.shape

torch.Size([3, 1])

In [9]:
o = t / s
o

tensor([[0.4783, 0.4783, 0.0435],
        [0.3182, 0.5909, 0.0909],
        [0.3333, 0.2500, 0.4167]])

In [10]:
for i in range(3):
    print(o[i].sum()) # each row sums up to 1.0

tensor(1.)
tensor(1.)
tensor(1.)
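
The mistake is silent here only because `t` is square. As a rough sketch, the snippet below uses a hypothetical helper `normalize_rows` (not part of PyTorch) and an illustrative `[3,4]` tensor `rect` to show that, for a non-square tensor, dividing by `sum(1)` without `keepdim=True` fails with a shape error instead of returning wrong values.

```python
import torch

def normalize_rows(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: divide each row of a 2D tensor by its sum."""
    return x / x.sum(dim=1, keepdim=True)

# Illustrative non-square tensor with values 1..12.
rect = torch.arange(1., 13.).reshape(3, 4)

# Shapes [3,4] and [3] are not broadcastable: the trailing sizes 4 and 3 differ.
try:
    rect / rect.sum(1)
    raised = False
except RuntimeError:
    raised = True
print(raised)

# With keepdim=True the shapes are [3,4] and [3,1], and each row sums to 1.
print(normalize_rows(rect).sum(1))
```

Checking `o.sum(1)` against a tensor of ones with `torch.allclose`, as in the loop above, is a cheap way to catch the square-matrix case where no error is raised.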
