A word sequence is stored as a 2-D matrix of shape [seq_len, d_model].
For the word at row pos, the positional encoding in dimensions 2i and 2i+1 is:
PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
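As a plain-Python check of the two formulas, the sketch below computes the encoding vector for a single position term by term, with no vectorization. The function name `pe_vector` is hypothetical, chosen only for illustration, and it assumes an even d_model so that every sine dimension 2i has a cosine partner 2i+1.

```python
import math

def pe_vector(pos, d_model):
    """Encoding of one position, computed directly from PE(pos, 2i) and PE(pos, 2i+1).

    Assumes d_model is even (hypothetical helper, for illustration only).
    """
    vec = [0.0] * d_model
    for i in range(d_model // 2):
        angle = pos / 10000 ** (2 * i / d_model)
        vec[2 * i] = math.sin(angle)      # even dimension: sine
        vec[2 * i + 1] = math.cos(angle)  # odd dimension: cosine
    return vec

print(pe_vector(0, 4))  # → [0.0, 1.0, 0.0, 1.0]  (sin(0) = 0, cos(0) = 1 in every pair)
print(pe_vector(1, 4))
```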
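Observations:
- Within one column, i.e. the same dimension across all words in the sequence, the encoding uses a single sine or cosine function, so every entry in that column has the same period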
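- Going from low to high dimensions, these periods form a geometric sequence. Dimensions come in pairs (2i, 2i+1) that share one period, 2pi * 10000^(2i/d_model), so the period grows by a factor of 10000^(2/d_model) from one pair to the next: it starts at 2pi for i = 0 and reaches 2pi * 10000^((d_model-2)/d_model) for the last pair, which approaches 10000*2pi as d_model grows
- A sequence with a constant ratio like this is called a geometric progression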
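- Low dimensions have short periods, so their values repeat within a sequence; however, a word's position is encoded jointly by all dimensions, so the full vector still tells positions apart
- Low dimensions with short periods capture local (fine-grained) position information; high dimensions with long periods capture global (coarse) position information. The two complement each other
- The binary encoding of an integer behaves the same way: low-order bits have short periods, high-order bits have long ones, and the ratio between neighbouring bits is 2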
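The claims about periods can be checked numerically. The sketch below assumes d_model = 16, as in the cells that follow; it lists the period of each dimension pair, confirms the constant ratio 10000^(2/d_model) between neighbouring pairs, and, for the binary analogy, measures the period of each bit of a counter (bit k repeats every 2^(k+1) integers). The helper `bit_period` is hypothetical, written only for this comparison.

```python
import math

d_model = 16  # assumed, matching the notebook cells below

# Period (wavelength) of the sin/cos pair occupying dimensions 2i and 2i+1
periods = [2 * math.pi * 10000 ** (2 * i / d_model) for i in range(d_model // 2)]
ratios = [b / a for a, b in zip(periods, periods[1:])]

print(f"shortest period: {periods[0]:.4f}")   # 2pi
print(f"longest period:  {periods[-1]:.1f}")  # 2pi * 10000^((d_model-2)/d_model), below 10000*2pi
print(f"ratio between pairs: {ratios[0]:.4f}, expected {10000 ** (2 / d_model):.4f}")

# Binary analogy: smallest shift p after which bit k of 0, 1, 2, ... repeats
def bit_period(k, limit=64):
    bits = [(n >> k) & 1 for n in range(limit)]
    return next(
        p for p in range(1, limit)
        if all(bits[n] == bits[n + p] for n in range(limit - p))
    )

print([bit_period(k) for k in range(4)])  # → [2, 4, 8, 16]
```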

In [1]:
import tensorflow as tf
import numpy as np

In [10]:
max_seq_len, d_model = 10, 16

In [3]:
np.arange(max_seq_len)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]:
pos = np.arange(max_seq_len)[:,np.newaxis]
pos, pos.shape

(array([[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]]),
 (10, 1))

In [12]:
i = np.arange(d_model)[np.newaxis,:]
i, i.shape

(array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]]),
 (1, 16))

In [13]:
positional_encoding = np.zeros((max_seq_len, d_model))
# i // 2 is the pair index, so dimensions 2k and 2k+1 share the rate 1 / 10000^(2k/d_model)
angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
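Integer division `i // 2` maps dimensions 2k and 2k+1 to the same index, so each sine column and the cosine column next to it share one angle rate. The self-contained check below recomputes `angle_rates` with the same values as this cell (d_model = 16) to make that visible.

```python
import numpy as np

d_model = 16  # same value as in the cell above
i = np.arange(d_model)[np.newaxis, :]
angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))

# i // 2 pairs up the dimensions: [[0 0 1 1 2 2 ... 7 7]]
print(i // 2)

# Every even (sine) column has the same rate as the odd (cosine) column after it
print(np.array_equal(angle_rates[:, 0::2], angle_rates[:, 1::2]))  # → True

# Per pair, the rate drops from 1 by a constant factor of 10000**(2/d_model)
print(angle_rates[0, 0::2])
```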

In [16]:
pos * i

array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0],
       [  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
         13,  14,  15],
       [  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
         26,  28,  30],
       [  0,   3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36,
         39,  42,  45],
       [  0,   4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48,
         52,  56,  60],
       [  0,   5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60,
         65,  70,  75],
       [  0,   6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72,
         78,  84,  90],
       [  0,   7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84,
         91,  98, 105],
       [  0,   8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96,
        104, 112, 120],
       [  0,   9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108,
        117, 126, 135]])
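`pos` has shape (10, 1) and `i` has shape (1, 16), so NumPy broadcasting stretches both to (10, 16) and multiplies element-wise; the same mechanism later produces `pos * angle_rates`. The result is simply an outer product, which the short sketch below confirms, using `np.outer` only as a cross-check.

```python
import numpy as np

max_seq_len, d_model = 10, 16
pos = np.arange(max_seq_len)[:, np.newaxis]  # shape (10, 1): one row per position
i = np.arange(d_model)[np.newaxis, :]        # shape (1, 16): one column per dimension

grid = pos * i                               # broadcast to shape (10, 16)
print(grid.shape)                            # → (10, 16)
print(np.array_equal(grid, np.outer(np.arange(max_seq_len), np.arange(d_model))))  # → True
```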

In [17]:
angle_rates.shape

(1, 16)

In [19]:
positional_encoding[:, 0::2] = np.sin(pos * angle_rates[:, 0::2])  # even dimensions: sine
positional_encoding[:, 1::2] = np.cos(pos * angle_rates[:, 1::2])  # odd dimensions: cosine
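To double-check the slicing (`0::2` for even columns with sine, `1::2` for odd columns with cosine), the sketch below rebuilds the table with explicit loops written straight from the PE(pos, 2i) and PE(pos, 2i+1) formulas and compares it with the vectorized result. It also checks that each (sine, cosine) pair lies on the unit circle. The loop version is meant for verification only and is much slower for large tables.

```python
import math
import numpy as np

max_seq_len, d_model = 10, 16

# Vectorized version, as in the cells above
pos = np.arange(max_seq_len)[:, np.newaxis]
i = np.arange(d_model)[np.newaxis, :]
angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
vectorized = np.zeros((max_seq_len, d_model))
vectorized[:, 0::2] = np.sin(pos * angle_rates[:, 0::2])
vectorized[:, 1::2] = np.cos(pos * angle_rates[:, 1::2])

# Loop version, one entry at a time
looped = np.zeros((max_seq_len, d_model))
for p in range(max_seq_len):
    for k in range(d_model // 2):
        angle = p / 10000 ** (2 * k / d_model)
        looped[p, 2 * k] = math.sin(angle)
        looped[p, 2 * k + 1] = math.cos(angle)

print(np.allclose(vectorized, looped))  # → True

# sin^2 + cos^2 = 1 for every pair, since both use the same angle
print(np.allclose(vectorized[:, 0::2] ** 2 + vectorized[:, 1::2] ** 2, 1.0))  # → True
```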

In [20]:
positional_encoding = tf.cast(positional_encoding, tf.float32)

In [28]:
def get_positional_encoding(max_seq_len, d_model):
    """Sinusoidal positional encoding table of shape (max_seq_len, d_model)."""
    positional_encoding = np.zeros((max_seq_len, d_model))
    pos = np.arange(max_seq_len)[:, np.newaxis]  # (max_seq_len, 1)
    i = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    # dimensions 2k and 2k+1 share the rate 1 / 10000^(2k/d_model)
    angle_rates = 1 / np.power(10000, 2 * (i // 2) / np.float32(d_model))
    positional_encoding[:, 0::2] = np.sin(pos * angle_rates[:, 0::2])  # even: sine
    positional_encoding[:, 1::2] = np.cos(pos * angle_rates[:, 1::2])  # odd: cosine
    positional_encoding = tf.cast(positional_encoding, tf.float32)
    return positional_encoding
new_position_encoding = get_positional_encoding(max_seq_len, d_model)
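`get_positional_encoding` uses TensorFlow only for the final `tf.cast`. When TensorFlow is not otherwise needed, a NumPy-only variant is one possible alternative; the name `get_positional_encoding_np` is hypothetical, and the sketch assumes an even `d_model`. For comparing two such tables, `np.allclose` returns a single True/False instead of a full boolean matrix.

```python
import numpy as np

def get_positional_encoding_np(max_seq_len, d_model):
    """Hypothetical NumPy-only sketch of the same table, returned as float32."""
    pos = np.arange(max_seq_len)[:, np.newaxis]            # (max_seq_len, 1)
    i = np.arange(d_model)[np.newaxis, :]                  # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates                             # (max_seq_len, d_model)
    pe = np.empty((max_seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dimensions: cosine
    return pe.astype(np.float32)

pe = get_positional_encoding_np(10, 16)
print(pe.shape, pe.dtype)  # → (10, 16) float32
print(pe[0, :4])           # position 0: sine columns are 0, cosine columns are 1
```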

In [29]:
tolerance = 1e-3
np.abs(new_position_encoding - positional_encoding) < tolerance

array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  Tru