```python
from IPython.display import Image
import torch
import torch.nn as nn
```

In this notebook we go over some fundamental deep learning concepts in PyTorch:
- Understanding `torch.nn.Linear`
- Understanding `torch.nn.Embedding`
- Axes and dimensions (`dim`) of tensors

# torch.nn.Linear

`nn.Linear(5, 3)` creates a linear transformation that maps $\mathbb{R}^5 \to \mathbb{R}^3$, which means that `nn.Linear(5, 3)` initializes a weight matrix of shape `(3, 5)`, i.e. `(out_features, in_features)`.

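Furthermore, `nn.Linear(5, 3)` creates a linear transformation $A$ (the `(3, 5)` weight matrix) plus a bias $b \in \mathbb{R}^3$. For a single column vector $x \in \mathbb{R}^5$ it computes $Ax + b$; for a batch $X$ of shape `(N, 5)`, with one observation per row, it computes $XA^\top + b$. In other words, it transforms an `(N, 5)` matrix into an `(N, 3)` matrix, where N (the number of observations) can be anything, 5 is the number of input features, and 3 is the number of output features.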

![example](img/Linear_Transformation.png)

Notice that in a linear algebra class we treat a vector in $\mathbb{R}^5$ as a column vector, as above, but PyTorch and TensorFlow treat each observation as a row vector: `[x1, x2, x3, x4, x5]`.
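A quick sketch to check the row-vector convention in code. It uses only standard PyTorch calls; the batch size of 4 and the seed are arbitrary choices. The output of `nn.Linear` should match multiplying each row by the transposed weight and adding the bias:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # arbitrary seed, only for reproducibility
linear = nn.Linear(5, 3)      # weight: (3, 5), bias: (3,)
X = torch.rand(4, 5)          # 4 observations, each a row vector in R^5

Y = linear(X)                                  # shape (4, 3)
Y_manual = X @ linear.weight.T + linear.bias   # (4, 5) @ (5, 3) + (3,) -> (4, 3)

print(linear.weight.shape, linear.bias.shape)  # torch.Size([3, 5]) torch.Size([3])
print(Y.shape)                                 # torch.Size([4, 3])
print(torch.allclose(Y, Y_manual))             # True
```

The bias of shape `(3,)` is broadcast across all 4 rows, which is why one bias vector serves any batch size N.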

```python
import torch.nn as nn
W = nn.Linear(5, 3)  # weight matrix shape (3, 5)

W.weight.shape
# torch.Size([3, 5])
```

```python
# Convert to NumPy; call detach() first because the weight tensor requires grad
W.weight.detach().numpy()
# array([[ 0.42570674,  0.20171326, -0.19392422, -0.06542477, -0.05925357],
#        [ 0.02411893, -0.1895127 ,  0.22421086,  0.01469514, -0.06337062],
#        [-0.00068069, -0.17809209, -0.06550798,  0.43950218,  0.2640894 ]],
#       dtype=float32)
```

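```python
X = torch.rand(100, 5)   # 100 observations, each in R^5

linear = nn.Linear(5, 3)
Y = linear(X)            # apply the layer to the data
print(Y.shape)           # torch.Size([100, 3]): each row is now in R^3
```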

![example](img/nnLinear_mult.png)


---

# nn.Embedding

Given a list of ids, we can "look up" the embedding corresponding to each id. Can you see that some of the resulting vectors are the same?

From the embedding matrix we pick rows `1, 4, 1, 5, 1, 0` and then stack them all together, so row 1 appears three times.
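A minimal sketch of this lookup, assuming a 6 x 3 embedding matrix (6 ids, 3-dimensional vectors; both sizes are arbitrary) and the ids `[1, 4, 1, 5, 1, 0]`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(num_embeddings=6, embedding_dim=3)  # weight: (6, 3)

ids = torch.tensor([1, 4, 1, 5, 1, 0])  # row indices to look up
vectors = embed(ids)                    # shape (6, 3): one row per id

# The lookup is plain row indexing into the weight matrix
print(torch.equal(vectors, embed.weight[ids]))  # True
# Row 1 was picked three times, so those outputs are identical
print(torch.equal(vectors[0], vectors[2]), torch.equal(vectors[0], vectors[4]))  # True True
```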

This trick is heavily used in NLP (word embeddings) and in recommender systems. For example:

## Code snippet

The layer in the original code snippet (not reproduced here) does essentially this:

- creates two lookup tables in `__init__`
- the layer is called with an input of shape `(batch_size, 2)`:
    - the first column contains indices of user embeddings
    - the second column contains indices of movie embeddings
- the two embeddings are multiplied element-wise and summed over the embedding dimension (a row-wise dot product), returning shape `(batch_size,)`. This is different from `nn.Linear`, which would return `(batch_size, out_features)` by multiplying the input with a learned weight matrix.
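The original snippet is not included here, so the following is a hypothetical reconstruction of a layer matching the description above. The class name `DotProductRecommender`, the table sizes, and the example ids are all made up:

```python
import torch
import torch.nn as nn


class DotProductRecommender(nn.Module):
    """Hypothetical user/movie model; not the original snippet."""

    def __init__(self, n_users: int, n_movies: int, dim: int):
        super().__init__()
        # Two lookup tables, created in __init__
        self.user_embed = nn.Embedding(n_users, dim)
        self.movie_embed = nn.Embedding(n_movies, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, 2); column 0 = user ids, column 1 = movie ids
        users = self.user_embed(x[:, 0])    # (batch_size, dim)
        movies = self.movie_embed(x[:, 1])  # (batch_size, dim)
        # Element-wise product, then sum over the embedding dim:
        # one dot product per row -> (batch_size,)
        return (users * movies).sum(dim=1)


model = DotProductRecommender(n_users=10, n_movies=20, dim=8)
batch = torch.tensor([[0, 3], [7, 19], [2, 5]])  # (3, 2) made-up ids
scores = model(batch)
print(scores.shape)  # torch.Size([3])
```

Training both tables against observed ratings would push users and the movies they like toward large dot products, which is how the two representations get learned.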

This is probably used to train both representations (of users and movies) for some recommender-like system.

## The Difference between nn.Linear and nn.Embedding

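> - The former mainly performs a linear weighting; the latter is an embedding layer that supports retrieval by index.
> - The fundamental difference actually lies in the input. The input of `nn.Linear` is a vector and its output is also a vector, and each element of the vector takes continuous values. The input of `nn.Embedding` can only be discrete values; a single discrete value is enough to get a result, and this discrete value is effectively equivalent to its one-hot vector (a look-up table).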

In `nn.Embedding`, backpropagation does not produce a gradient for the entire matrix: only the rows of the embedding matrix whose indices are passed in receive a nonzero gradient.

So less computation is required, because the parameter update only needs to touch the rows that were looked up (the "look-up table" rows). Passing `sparse=True` makes PyTorch store this gradient as a sparse tensor containing only those rows.
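A small sketch showing which rows get a gradient. The toy loss (a plain sum of the looked-up vectors) and the sizes are arbitrary, chosen only to make the gradient easy to read:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(6, 3)      # dense gradients by default
ids = torch.tensor([1, 4, 1])   # rows 1 and 4 are looked up (row 1 twice)

loss = embed(ids).sum()         # toy loss, only used to produce gradients
loss.backward()

grad = embed.weight.grad        # dense (6, 3) tensor, zero outside rows 1 and 4
touched = (grad.abs().sum(dim=1) != 0).tolist()
print(touched)   # [False, True, False, False, True, False]
print(grad[1])   # tensor([2., 2., 2.]): row 1 was used twice, so its gradient accumulates

# With sparse=True the gradient is a sparse tensor holding only the used rows
sparse_embed = nn.Embedding(6, 3, sparse=True)
sparse_embed(ids).sum().backward()
print(sparse_embed.weight.grad.is_sparse)  # True
```

Note that sparse gradients are only supported by some optimizers (for example `optim.SGD`, `optim.SparseAdam`, `optim.Adagrad`).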

At a theoretical level, an embedding layer is a linear layer (without bias) applied to one-hot inputs; there is no difference at all. In practice, however, deep learning software has to treat them differently: it does not make sense to compute an embedding layer with a traditional matrix multiplication, because the one-hot input matrix is very sparse. A look-up is therefore much faster, even though in theory it is equivalent to doing the matrix multiplication.
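A minimal sketch of this equivalence, reusing the ids `[1, 4, 1, 5, 1, 0]` and arbitrary sizes: the look-up gives the same result as multiplying one-hot rows by the embedding matrix, or as a bias-free `nn.Linear` whose weight is the transposed embedding matrix:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_ids, dim = 6, 3
embed = nn.Embedding(num_ids, dim)
ids = torch.tensor([1, 4, 1, 5, 1, 0])

# Look-up: gather rows of the weight matrix directly
lookup = embed(ids)                                     # (6, 3)

# Matrix multiplication on one-hot inputs (mostly zeros)
one_hot = F.one_hot(ids, num_classes=num_ids).float()   # (6, 6)
matmul = one_hot @ embed.weight                         # (6, 6) @ (6, 3) -> (6, 3)

# The same thing as a bias-free linear layer; nn.Linear stores (out, in) = (3, 6)
linear = nn.Linear(num_ids, dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(embed.weight.T)

print(torch.allclose(lookup, matmul))             # True
print(torch.allclose(lookup, linear(one_hot)))    # True
```

The matrix multiplication spends almost all of its work multiplying by zeros, which is exactly the waste the look-up avoids.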

In short, [`torch.nn.Embedding`](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html) is a lookup table; it works essentially like indexing rows of a `torch.Tensor`, but with a few twists, such as the option of sparse gradients (`sparse=True`) or a `padding_idx` whose vector defaults to zeros and is not updated during training.
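A short sketch of the `padding_idx` behaviour, assuming id 0 is reserved for padding (an arbitrary choice here) and a made-up padded sequence:

```python
import torch
import torch.nn as nn

# padding_idx=0 is an arbitrary choice for this sketch
embed = nn.Embedding(5, 3, padding_idx=0)
print(embed.weight[0].tolist())   # [0.0, 0.0, 0.0]: padding row starts as zeros

ids = torch.tensor([[2, 3, 0, 0]])  # one made-up sequence, padded with id 0
embed(ids).sum().backward()
print(embed.weight.grad[0].tolist())  # [0.0, 0.0, 0.0]: no gradient for padding
print(embed.weight.grad[2].tolist())  # [1.0, 1.0, 1.0]
```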

---

# Dimension (axis = 0, 1, 2)

https://towardsdatascience.com/understanding-dimensions-in-pytorch-6edf9972d3be

The key to grasping how *dim* in PyTorch and *axis* in NumPy work was this paragraph from Aerin’s article:

> The way to understand the “axis” of numpy sum is that it collapses the specified axis. So when it collapses the axis 0 (the row), it becomes just one row (it sums column-wise).

NumPy arrays and PyTorch tensors both sum over the specified axis in this way.

`ndarray`s also have several features you’d expect from an n-dimensional array: each ndarray has n axes, indexed from 0, so that the first axis is 0, the second is 1, and so on. In particular, since we often deal with 2D ndarrays, we can think of axis = 0 as the rows and axis = 1 as the columns (see the figure below).

For a 2D array:

![example](img/2d_axis.png)
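A minimal sketch of the "collapse the specified axis" rule on a small 2D array (the values are arbitrary), with the PyTorch `dim` equivalent:

```python
import numpy as np
import torch

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3): 2 rows, 3 columns

col_sums = a.sum(axis=0)    # collapse axis 0 (the rows) -> shape (3,)
row_sums = a.sum(axis=1)    # collapse axis 1 (the columns) -> shape (2,)

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]

# PyTorch uses dim= for the same idea
t = torch.from_numpy(a)
print(np.array_equal(t.sum(dim=0).numpy(), col_sums))  # True
```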

For a 3D array, the axes `(0, 1, 2)` correspond to `(batch_size, sentence_len, embedding_dim)`:

- 0: batch_size, 1: sentence_len, 2: embedding_dim
- Each matrix is the numerical representation of one sentence
- If batch_size = 3, there are 3 sentences in this batch

![example](img/3d_axis.png)
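A minimal sketch of where such a 3D tensor comes from, assuming a made-up batch of 3 sentences that are already converted to 4 token ids each, and a hypothetical vocabulary of 10 ids with 5-dimensional embeddings:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_ids = torch.tensor([[1, 2, 3, 0],
                          [4, 5, 6, 7],
                          [8, 9, 1, 2]])   # (batch_size=3, sentence_len=4), made-up ids
embed = nn.Embedding(num_embeddings=10, embedding_dim=5)

x = embed(batch_ids)            # (batch_size, sentence_len, embedding_dim)
print(x.shape)                  # torch.Size([3, 4, 5])
print(x[0].shape)               # torch.Size([4, 5]): one matrix per sentence

sentence_vecs = x.mean(dim=1)   # average over the words -> one vector per sentence
print(sentence_vecs.shape)      # torch.Size([3, 5])
```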

The examples below sum over each dimension of a tensor `y` with 3 dimensions, shape = (3, 2, 3).
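The tensor itself is not printed in this notebook. The one below is reconstructed so that it reproduces every sum shown in the following sections (three stacked copies of `[[1, 2, 3], [4, 5, 6]]`); treat it as an assumption rather than the exact original:

```python
import torch

# Reconstructed from the sums below (an assumption): three copies of the same 2 x 3 matrix
y = torch.tensor([[[1, 2, 3],
                   [4, 5, 6]],
                  [[1, 2, 3],
                   [4, 5, 6]],
                  [[1, 2, 3],
                   [4, 5, 6]]])
print(y.shape)               # torch.Size([3, 2, 3])

print(torch.sum(y, dim=0))   # shape (2, 3)
print(torch.sum(y, dim=1))   # shape (3, 3)
print(torch.sum(y, dim=2))   # shape (3, 2)
```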



# Sum Over Dim = 0

Since we sum over `dim = 0`, the dimension at index 0 (of size 3) is collapsed, and the result has `shape = (2, 3)`.

---
```python
>>> torch.sum(y, dim=0)
tensor([[ 3,  6,  9],
        [12, 15, 18]])

# dimension index = (0, 1, 2)
# original shape  = (3, 2, 3)
# new shape       = (2, 3)
# dimension index 0 is collapsed, so the output shape is (2, 3)
```


![example](img/chrome-capture.gif)

---

# Sum Over Dim = 1

```python
>>> torch.sum(y, dim=1)  # (3, 2, 3) -> (3, 3)
tensor([[5, 7, 9],
        [5, 7, 9],
        [5, 7, 9]])
```

![example](img/dim1.gif)


# Sum Over Dim = 2

![example](img/dim2.gif)

```python
>>> torch.sum(y, dim=2)
tensor([[ 6, 15],
        [ 6, 15],
        [ 6, 15]])

# (3, 2, 3) -> (3, 2): dimension index 2 (the last one, of size 3) is collapsed,
# so in the end shape = (3, 2)
```
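The three cases follow one rule: summing over `dim=d` removes entry `d` from the shape. A short sketch checking this on a random tensor of the same shape (the random values are arbitrary), plus the negative-index form `dim=-1` for the last dimension:

```python
import torch

t = torch.rand(3, 2, 3)   # same shape as y, random values

for d in range(t.dim()):
    expected = tuple(t.shape[:d]) + tuple(t.shape[d + 1:])   # drop the summed dimension
    assert tuple(torch.sum(t, dim=d).shape) == expected
    print(d, tuple(torch.sum(t, dim=d).shape))
# 0 (2, 3)
# 1 (3, 3)
# 2 (3, 2)

# Negative indices count from the end: dim=-1 is the last dimension (here dim=2)
print(torch.allclose(torch.sum(t, dim=-1), torch.sum(t, dim=2)))  # True
```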

Recall that when we have a word embedding with, say, `feature dimension = 4`, like below:

```
word_embedding = [
    hello   -0.121    -0.431    0.97712   0.3343
    world   .....
    ball     0.12121   0.23223  0.2324    0.5742
]
```

and we use `np.mean(word_embedding, axis=0)`, which means we sum down the rows, $\sum_{i=1}^{n} x_{ij}$ for each column $j$ (along the $i$ axis, going down), and then take the average, $\frac{1}{n}\sum_{i=1}^{n} x_{ij}$, giving a single 4-dimensional vector.
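A minimal sketch of this average, using only the "hello" and "ball" rows from above, since the values of the "world" row are not given:

```python
import numpy as np

# Only the "hello" and "ball" rows; the "world" row's values are elided above
word_embedding = np.array([
    [-0.121,   -0.431,   0.97712, 0.3343],   # hello
    [ 0.12121,  0.23223, 0.2324,  0.5742],   # ball
])                                            # shape (n_words=2, feature_dim=4)

mean_vec = np.mean(word_embedding, axis=0)   # collapse axis 0 -> shape (4,)
print(mean_vec.shape)                        # (4,)

# Same thing written out: (1/n) * sum_i x_ij for each column j
n = word_embedding.shape[0]
manual = word_embedding.sum(axis=0) / n
print(np.allclose(mean_vec, manual))         # True
```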

# Reference Link
- https://towardsdatascience.com/understanding-dimensions-in-pytorch-6edf9972d3be

