# Self-Attention

Paper: `Transformer` Attention Is All You Need (NIPS 2017)

Code:
- [Official TensorFlow implementation](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py)
- [PyTorch implementation](https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/transformer/SubLayers.py)

Reference:
- `Enzo_Mi` [Multi-Head Attention | Algorithm + Code](https://www.bilibili.com/video/BV1qo4y1F7Ep)
- `黑白` [Transformer Code and Walkthrough (PyTorch)](https://zhuanlan.zhihu.com/p/345993564)
- `于建民` [The Illustrated Transformer (Chinese translation)](https://blog.csdn.net/yujianmin1990/article/details/85221271)
- `Jay Alammar` [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)
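
The `SelfAttention` class in the code cell implements the scaled dot-product attention defined in the paper. For one input sequence $X$ with $n$ tokens of size `dim`, the three `nn.Linear` layers produce the projections below (note that `nn.Linear` also adds a bias term by default, which the formula omits):

$$
Q = XW^Q,\quad K = XW^K,\quad V = XW^V,\qquad
W^Q, W^K \in \mathbb{R}^{\mathrm{dim}\times d_k},\; W^V \in \mathbb{R}^{\mathrm{dim}\times d_v}
$$

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V \in \mathbb{R}^{n\times d_v}
$$

The factor $1/\sqrt{d_k}$ corresponds to `self.scale = dk ** -0.5`. The paper motivates it this way: if the components of $q$ and $k$ are independent random variables with mean 0 and variance 1, then

$$
q \cdot k = \sum_{i=1}^{d_k} q_i k_i \quad\text{has mean } 0 \text{ and variance } d_k,
$$

so without scaling, a large $d_k$ pushes the softmax into regions with extremely small gradients.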

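```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention."""
    def __init__(self, dim, dk, dv):
        super().__init__()
        self.scale = dk ** -0.5          # 1 / sqrt(d_k)
        self.q = nn.Linear(dim, dk)      # query projection: [..., dim] -> [..., dk]
        self.k = nn.Linear(dim, dk)      # key projection:   [..., dim] -> [..., dk]
        self.v = nn.Linear(dim, dv)      # value projection: [..., dim] -> [..., dv]

    def forward(self, x):
        # x: [batch, seq_len, dim]
        q = self.q(x)                    # [batch, seq_len, dk]
        k = self.k(x)                    # [batch, seq_len, dk]
        v = self.v(x)                    # [batch, seq_len, dv]

        # Similarity of every query with every key, scaled by 1/sqrt(dk)
        attn = q @ k.transpose(-2, -1) * self.scale   # [batch, seq_len, seq_len]
        attn = attn.softmax(dim=-1)      # each row becomes a distribution over the keys

        x = attn @ v                     # weighted sum of values: [batch, seq_len, dv]
        return x

att = SelfAttention(dim=2, dk=2, dv=3)
x = torch.rand((1, 4, 2))                # batch=1, 4 tokens, 2 features each
output = att(x)
print(x, '\n', output)
```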

```
tensor([[[0.6832, 0.2191],
         [0.3721, 0.6172],
         [0.1940, 0.8315],
         [0.5647, 0.7821]]]) 
 tensor([[[ 0.2901, -0.0998, -0.4378],
         [ 0.2862, -0.0993, -0.4376],
         [ 0.2842, -0.0991, -0.4374],
         [ 0.2801, -0.0991, -0.4383]]], grad_fn=<UnsafeViewBackward0>)
```
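
As a sanity check on the steps in `forward`, the sketch below recomputes scaled dot-product attention with a standalone helper (`manual_attention` is a hypothetical name introduced here) and compares it with `torch.nn.functional.scaled_dot_product_attention`, which ships with PyTorch 2.0 and later and by default also scales by $1/\sqrt{d_k}$ with $d_k$ taken from the last dimension of the query. Random `q`, `k`, `v` tensors stand in for the outputs of the projection layers; the shapes follow the example above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the check is reproducible

def manual_attention(q, k, v):
    """Hypothetical helper: the same steps as SelfAttention.forward, minus the projections."""
    scale = q.size(-1) ** -0.5                                   # 1 / sqrt(d_k)
    weights = (q @ k.transpose(-2, -1) * scale).softmax(dim=-1)  # [batch, seq_len, seq_len]
    return weights @ v, weights                                  # [batch, seq_len, d_v]

# Assumed shapes mirroring the example: batch=1, seq_len=4, d_k=2, d_v=3
q = torch.rand(1, 4, 2)
k = torch.rand(1, 4, 2)
v = torch.rand(1, 4, 3)

out, weights = manual_attention(q, k, v)
print(weights.shape)        # torch.Size([1, 4, 4]): one weight per (query, key) pair
print(weights.sum(dim=-1))  # every row sums to 1 after the softmax
print(out.shape)            # torch.Size([1, 4, 3]): d_v features per token

# PyTorch's built-in version (PyTorch >= 2.0) should agree up to float rounding
ref = F.scaled_dot_product_attention(q, k, v)
print(torch.allclose(out, ref, atol=1e-5))
```

Because the output is a convex combination of the value rows, tokens with similar attention weights end up with similar outputs, which is why the four rows in the printed result above are close to each other.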


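```python
import torch
import torch.nn as nn

# nn.Linear acts only on the last dimension: an input of shape [..., 2] passes through
# a fully connected layer and comes out as [..., 5]; all leading dimensions are unchanged
ll = nn.Linear(2, 5)

x_in = torch.rand((1, 4, 2))
x_out = ll(x_in)   # shape [1, 4, 5]

print(x_in, '\n', x_out)
```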
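
The comment on `nn.Linear` can be checked directly. This sketch, assuming only standard PyTorch, feeds inputs with different numbers of leading dimensions through one layer and confirms that it computes $y = xW^\top + b$ along the last dimension, where `ll.weight` has shape `[5, 2]` and `ll.bias` has shape `[5]`. The `out_shapes` dictionary is an illustrative name introduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                     # reproducible random weights and inputs

ll = nn.Linear(2, 5)                     # in_features=2, out_features=5
print(ll.weight.shape, ll.bias.shape)    # torch.Size([5, 2]) torch.Size([5])

# Any number of leading dimensions works; only the last one must equal in_features
out_shapes = {}
for shape in [(2,), (4, 2), (1, 4, 2), (3, 1, 4, 2)]:
    y = ll(torch.rand(shape))
    out_shapes[shape] = tuple(y.shape)
    print(shape, '->', out_shapes[shape])  # last dimension 2 becomes 5

# The same result written out by hand: y = x @ W^T + b, broadcast over the leading dims
x = torch.rand(1, 4, 2)
manual = x @ ll.weight.T + ll.bias
print(torch.allclose(ll(x), manual, atol=1e-6))
```

This is why `SelfAttention` can apply `self.q`, `self.k` and `self.v` to a `[batch, seq_len, dim]` tensor without reshaping: each token's feature vector is projected independently with shared weights.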