Reference:
https://tiensu.github.io/blog/59_self-attention/
(This resource walks through the method step by step, which makes it easy to visualize)

https://viblo.asia/p/kham-pha-suc-manh-cua-co-che-self-attention-trong-transformers-BQyJKj9R4Me
(Goes into depth on each component and on the concepts behind the algorithm)

https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html
(English-language resource, in-depth and detailed)


In [None]:
sentence = 'Life is short, eat dessert first'

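# Build a word -> index dictionary from the sorted words (comma removed).
# Uppercase letters sort before lowercase, so 'Life' gets index 0.
dc = {s:i for i,s in enumerate(sorted(sentence.replace(',', '').split()))}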
print(dc)

{'Life': 0, 'dessert': 1, 'eat': 2, 'first': 3, 'is': 4, 'short': 5}


In [None]:
import torch

# Encode the sentence as a tensor of dictionary indices, in word order
sentence_int = torch.tensor([dc[s] for s in sentence.replace(',', '').split()])
print(sentence_int)

tensor([0, 4, 5, 2, 1, 3])


Now, using the integer-vector representation of the input sentence, we can use an embedding layer to encode the inputs as real-valued vectors. Here we use a 16-dimensional embedding, so each input word is represented by a 16-dimensional vector. Since the sentence consists of 6 words, the result is a 6×16 embedding matrix.

In [None]:
torch.manual_seed(123)
# 6 dictionary entries, each mapped to a 16-dimensional vector
embed = torch.nn.Embedding(6, 16)
embedded_sentence = embed(sentence_int).detach()

print(embedded_sentence)
print(embedded_sentence.shape)

tensor([[ 0.3374, -0.1778, -0.3035, -0.5880,  0.3486,  0.6603, -0.2196, -0.3792,
          0.7671, -1.1925,  0.6984, -1.4097,  0.1794,  1.8951,  0.4954,  0.2692],
        [ 0.5146,  0.9938, -0.2587, -1.0826, -0.0444,  1.6236, -2.3229,  1.0878,
          0.6716,  0.6933, -0.9487, -0.0765, -0.1526,  0.1167,  0.4403, -1.4465],
        [ 0.2553, -0.5496,  1.0042,  0.8272, -0.3948,  0.4892, -0.2168, -1.7472,
         -1.6025, -1.0764,  0.9031, -0.7218, -0.5951, -0.7112,  0.6230, -1.3729],
        [-1.3250,  0.1784, -2.1338,  1.0524, -0.3885, -0.9343, -0.4991, -1.0867,
          0.8805,  1.5542,  0.6266, -0.1755,  0.0983, -0.0935,  0.2662, -0.5850],
        [-0.0770, -1.0205, -0.1690,  0.9178,  1.5810,  1.3010,  1.2753, -0.2010,
          0.4965, -1.5723,  0.9666, -1.1481, -1.1589,  0.3255, -0.6315, -2.8400],
        [ 0.8768,  1.6221, -1.4779,  1.1331, -1.2203,  1.3139,  1.0533,  0.1388,
          2.2473, -0.8036, -0.2808,  0.7697, -0.6596, -0.7979,  0.1838,  0.2293]])
torch.Size([6, 16])
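
As a small sanity check on the embedding step above, the sketch below shows that an embedding layer behaves like a lookup table: calling the layer on a tensor of indices returns the corresponding rows of its weight matrix. The name `looked_up` is introduced here for illustration only; the indices are copied from the `sentence_int` output above.

```python
import torch

torch.manual_seed(123)
embed = torch.nn.Embedding(6, 16)  # 6 dictionary entries, 16 dimensions each

# Word indices for the sentence, as printed earlier in the notebook
sentence_int = torch.tensor([0, 4, 5, 2, 1, 3])
embedded_sentence = embed(sentence_int).detach()

# Indexing the weight table directly selects the same rows as calling the layer
looked_up = embed.weight[sentence_int].detach()

print(torch.equal(embedded_sentence, looked_up))
print(embedded_sentence.shape)
```

Because the layer is only a table of learnable rows, the values themselves are random at initialization and depend on the seed.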


# Defining the Weight Matrices

In [None]:
# Initialize the weight tensors
torch.manual_seed(123)

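# d is the embedding dimension of each input word (16 here)
d = embedded_sentence.shape[1]

# Output dimensions of the query, key, and value projections.
# d_q must equal d_k because queries and keys are compared with a dot product.
d_q, d_k, d_v = 24, 24, 28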

W_query = torch.nn.Parameter(torch.rand(d_q, d))
W_key = torch.nn.Parameter(torch.rand(d_k, d))
W_value = torch.nn.Parameter(torch.rand(d_v, d))

print(d)

16


In [None]:
print(W_query)
print(W_key)
print(W_value)

Parameter containing:
tensor([[0.2961, 0.5166, 0.2517, 0.6886, 0.0740, 0.8665, 0.1366, 0.1025, 0.1841,
         0.7264, 0.3153, 0.6871, 0.0756, 0.1966, 0.3164, 0.4017],
        [0.1186, 0.8274, 0.3821, 0.6605, 0.8536, 0.5932, 0.6367, 0.9826, 0.2745,
         0.6584, 0.2775, 0.8573, 0.8993, 0.0390, 0.9268, 0.7388],
        [0.7179, 0.7058, 0.9156, 0.4340, 0.0772, 0.3565, 0.1479, 0.5331, 0.4066,
         0.2318, 0.4545, 0.9737, 0.4606, 0.5159, 0.4220, 0.5786],
        [0.9455, 0.8057, 0.6775, 0.6087, 0.6179, 0.6932, 0.4354, 0.0353, 0.1908,
         0.9268, 0.5299, 0.0950, 0.5789, 0.9131, 0.0275, 0.1634],
        [0.3009, 0.5201, 0.3834, 0.4451, 0.0126, 0.7341, 0.9389, 0.8056, 0.1459,
         0.0969, 0.7076, 0.5112, 0.7050, 0.0114, 0.4702, 0.8526],
        [0.7320, 0.5183, 0.5983, 0.4527, 0.2251, 0.3111, 0.1955, 0.9153, 0.7751,
         0.6749, 0.1166, 0.8858, 0.6568, 0.8459, 0.3033, 0.6060],
        [0.9882, 0.8363, 0.9010, 0.3950, 0.8809, 0.1084, 0.5432, 0.2185, 0.3834,
         0.3720

In [None]:
print(W_query.shape)
print(W_key.shape)
print(W_value.shape)

torch.Size([24, 16])
torch.Size([24, 16])
torch.Size([28, 16])
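
To see how these shapes fit together, here is a minimal sketch that projects a 6×16 input through weight matrices of the same shapes as above. It assumes the usual formulation in which each projection is a matrix product with the embedding; the input `x` is a random stand-in for `embedded_sentence`, and the names `queries`, `keys`, `values`, and `scores` are introduced for illustration, not taken from the notebook.

```python
import torch

torch.manual_seed(123)

d = 16                       # embedding dimension, as in the notebook
d_q, d_k, d_v = 24, 24, 28   # projection sizes, as in the notebook

# Plain tensors with the same shapes as W_query, W_key, W_value
W_query = torch.rand(d_q, d)
W_key = torch.rand(d_k, d)
W_value = torch.rand(d_v, d)

# Random stand-in for embedded_sentence (6 words, 16 dimensions each)
x = torch.randn(6, d)

# Each weight matrix has shape (out_dim, d), so we multiply by its transpose
queries = x @ W_query.T   # shape (6, 24)
keys = x @ W_key.T        # shape (6, 24)
values = x @ W_value.T    # shape (6, 28)

# Query-key dot products only work because d_q == d_k
scores = queries @ keys.T  # shape (6, 6): one score per pair of words

print(queries.shape, keys.shape, values.shape, scores.shape)
```

The value dimension `d_v` is free to differ from `d_q` and `d_k`, since values are never multiplied against queries or keys directly.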
