```python
import tensorflow as tf
# Assumption: K is the Keras backend as re-exported (and extended) by bert4keras,
# whose flatten() accepts a start axis. The stock Keras K.flatten takes one argument.
from bert4keras.backend import K


def sinusoidal_embeddings(pos, dim, base=10000):
    """Compute the sinusoidal embeddings for the position 'pos' in 'dim' dimensions.
    
    Args:
        pos (tf.Tensor): The position(s) for which to compute embeddings.
        dim (int): The number of dimensions for the embeddings (must be even).
        base (int): The base value for computation (default is 10,000).

    Returns:
        tf.Tensor: Sinusoidal embeddings with shape pos.shape + (dim,).

    Explanation:
    This function calculates sinusoidal positional embeddings for a given position 'pos' in 'dim' dimensions.
    It is commonly used in transformers for encoding positional information.
    """
    # Ensure the number of dimensions is even
    assert dim % 2 == 0
    
    # Generate indices 0, 1, ..., dim // 2 - 1
    indices = K.arange(0, dim // 2, dtype=K.floatx())
    
    # Turn the indices into inverse frequencies: base ** (-2 * i / dim)
    indices = K.pow(K.cast(base, K.floatx()), -2 * indices / dim)
    
    # Outer product of positions and inverse frequencies gives the angles
    embeddings = tf.einsum('...,d->...d', pos, indices)
    
    # Stack sin and cos components together: shape (..., dim // 2, 2)
    embeddings = K.stack([K.sin(embeddings), K.cos(embeddings)], axis=-1)
    
    # Flatten the last two axes: shape (..., dim)
    embeddings = K.flatten(embeddings, -2)
    
    return embeddings
```


Now, let's walk through each step of the function:

**Step 1: Ensure the number of dimensions is even**
```python
assert dim % 2 == 0
```
This assertion checks that `dim` is even. The embedding is built from `dim // 2` frequencies, each contributing one sine and one cosine value, so the output has exactly `dim` entries only when `dim` is even. If `dim` is odd, an `AssertionError` is raised.
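
As a small illustration of the guard, the sketch below uses plain Python and a hypothetical helper name, `num_frequencies`, which is not part of the original code. It shows that an even `dim` yields `dim // 2` sine/cosine pairs, while an odd `dim` trips the assertion.

```python
def num_frequencies(dim):
    """Hypothetical helper: number of (sin, cos) pairs for a given dim."""
    # Same guard as in sinusoidal_embeddings
    assert dim % 2 == 0, "dim must be even"
    return dim // 2


print(num_frequencies(4))  # 2

try:
    num_frequencies(5)
except AssertionError as err:
    print("rejected:", err)  # rejected: dim must be even
```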

**Step 2: Generate indices for computation**
```python
indices = K.arange(0, dim // 2, dtype=K.floatx())
```
Here, `indices` is the sequence `0, 1, ..., dim // 2 - 1` (the end point `dim // 2` is excluded), stored in Keras's default float type. There is one index per sine/cosine pair.
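
To make the range concrete, here is a plain-Python stand-in for the `K.arange` call, assuming `dim = 8` purely for illustration:

```python
dim = 8  # illustrative value, not from the original example

# Plain-Python stand-in for K.arange(0, dim // 2, dtype=K.floatx())
indices = [float(i) for i in range(0, dim // 2)]
print(indices)  # [0.0, 1.0, 2.0, 3.0]
```

Note that the list has `dim // 2 = 4` entries and stops at 3, not 4.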

**Step 3: Compute the indices with the base factor**
```python
indices = K.pow(K.cast(base, K.floatx()), -2 * indices / dim)
```
This step turns each index `i` into an inverse frequency `base ** (-2 * i / dim)`. The values start at 1 for `i = 0` and shrink geometrically toward `1 / base` as `i` grows, so the first sine/cosine pair oscillates quickly with position and later pairs oscillate more and more slowly.
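
The same formula can be sketched in plain Python. The values `dim = 4` and `base = 10000` are chosen here only to keep the numbers small:

```python
dim = 4
base = 10000.0
indices = [float(i) for i in range(dim // 2)]  # [0.0, 1.0]

# base ** (-2 * i / dim): one inverse frequency per index
inv_freq = [base ** (-2.0 * i / dim) for i in indices]
print(inv_freq)  # approximately [1.0, 0.01]
```

For `i = 1` the exponent is `-2 * 1 / 4 = -0.5`, so the value is `1 / sqrt(10000) = 0.01`.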

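**Step 4: Multiply positions by the inverse frequencies to get angles**
```python
embeddings = tf.einsum('...,d->...d', pos, indices)
```
Here, each value in `pos` is multiplied by every entry of `indices`. Strictly speaking, this is an outer (broadcast) product rather than a matrix multiplication: the `...` in the einsum string keeps whatever shape `pos` has, and a new trailing axis of length `dim // 2` is appended. The result holds the angles that are passed to sine and cosine in the next step; it does not contain any sine or cosine values yet.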
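
For a one-dimensional `pos`, the einsum reduces to a nested loop. The sketch below assumes three positions and the inverse frequencies for `dim = 4`, `base = 10000`; both lists are illustrative inputs.

```python
positions = [0.0, 1.0, 2.0]  # three illustrative positions
inv_freq = [1.0, 0.01]       # inverse frequencies for dim=4, base=10000

# Equivalent of einsum('...,d->...d', pos, indices) for a 1-D pos:
# angles[p][d] = positions[p] * inv_freq[d]
angles = [[p * f for f in inv_freq] for p in positions]
for row in angles:
    print(row)
```

The result has one row per position and one column per frequency, i.e. shape `(3, 2)`.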

**Step 5: Stack sin and cos components together**
```python
embeddings = K.stack([K.sin(embeddings), K.cos(embeddings)], axis=-1)
```
The previous step produced angles, not sines or cosines. Here we take the sine and the cosine of every angle and stack the two results along a new last axis, so each angle becomes a `[sin, cos]` pair and the shape becomes `(..., dim // 2, 2)`.
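
A plain-Python sketch of the stacking, assuming the two angles `3.0` and `0.03` (the angles for position 3 with `dim = 4` and `base = 10000`):

```python
import math

angles = [3.0, 0.03]  # illustrative angles for position 3, dim=4, base=10000

# Equivalent of K.stack([sin, cos], axis=-1): one [sin, cos] pair per angle
stacked = [[math.sin(a), math.cos(a)] for a in angles]
for pair in stacked:
    print([round(v, 5) for v in pair])
```

Each inner list is one `[sin, cos]` pair, so `stacked` has shape `(2, 2)`.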

**Step 6: Flatten the embeddings**
```python
embeddings = K.flatten(embeddings, -2)
```
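Finally, we flatten the last two axes into one, turning shape `(..., dim // 2, 2)` into `(..., dim)`. The values end up interleaved as `[sin, cos, sin, cos, ...]`. The result is one-dimensional only when `pos` is a scalar. Note that the two-argument call `K.flatten(embeddings, -2)` relies on the `flatten` helper from the referenced bert4keras backend; the stock Keras backend `flatten` takes a single tensor and collapses it entirely.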
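
In plain Python, merging the last two axes of a `(dim // 2, 2)` list looks like this. The input values are rounded and purely illustrative:

```python
# Shape (dim // 2, 2) with dim = 4; rounded illustrative values
stacked = [[0.14112, -0.98999], [0.03000, 0.99955]]

# Merge the last two axes: (dim // 2, 2) -> (dim,)
flat = [value for pair in stacked for value in pair]
print(flat)  # [0.14112, -0.98999, 0.03, 0.99955]
```

The order is preserved, which is why sine and cosine values alternate in the final embedding.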

So, the `sinusoidal_embeddings` function takes a position `pos`, the number of dimensions `dim`, and an optional `base` value, and it computes sinusoidal positional embeddings for that position in `dim` dimensions. These embeddings capture the position information and can be added to the input data in transformer-based models.
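
Putting the six steps together, the following is a minimal plain-Python sketch for a single scalar position. The name `sinusoidal_embeddings_py` is hypothetical, and the function is only a reference for checking values by hand, not a replacement for the TensorFlow version.

```python
import math


def sinusoidal_embeddings_py(pos, dim, base=10000):
    """Plain-Python sketch for one scalar position (hypothetical helper)."""
    assert dim % 2 == 0
    out = []
    for i in range(dim // 2):
        # Angle = position times the i-th inverse frequency
        angle = pos * base ** (-2.0 * i / dim)
        # Interleave sin and cos, matching the stack + flatten steps
        out.extend([math.sin(angle), math.cos(angle)])
    return out


print([round(v, 5) for v in sinusoidal_embeddings_py(0, 4)])  # [0.0, 1.0, 0.0, 1.0]
```

Position 0 gives an angle of 0 for every frequency, so every sine is 0 and every cosine is 1.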

Let's walk through a toy example to illustrate how the `sinusoidal_embeddings` function works. We'll use a small value for `dim` (number of dimensions) and generate embeddings for a single position.

Let's assume:
- `pos` is 3 (the position for which we want embeddings).
- `dim` is 4 (the number of dimensions for the embeddings).
- `base` is 10,000 (the base value for computation, as used in the default).

Now, we'll calculate the embeddings for position 3 with these values:

```python
import tensorflow as tf
import numpy as np

# Define the function (you should have the function definition here)

# Example values
pos = tf.constant(3, dtype=tf.float32)
dim = 4
base = 10000

# Calculate embeddings
embeddings = sinusoidal_embeddings(pos, dim, base)

# Print the result
print(embeddings.numpy())
```

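The output will be approximately as follows (the exact digits depend on float32 rounding and NumPy's print formatting):

```
[ 0.14112 -0.98999  0.03000  0.99955]
```

These are `sin(3)`, `cos(3)`, `sin(0.03)`, and `cos(0.03)`. Note that `cos(3)` is negative because 3 radians lies in the second quadrant.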

Now, let's break down how we obtained these embeddings step by step:

**Step 1: Ensure the number of dimensions is even**
In our example, `dim` is 4, which is even, so we proceed.

**Step 2: Generate indices for computation**
```python
indices = [0, 1]
```
We generate `indices` as a sequence from 0 up to, but not including, `dim // 2`. Since `dim` is 4, `dim // 2` is 2 and the result is `[0, 1]`.

**Step 3: Compute the indices with the base factor**
```python
indices = [1.0, 0.01]
```
We raise `base` to the power `-2 * indices / dim`: `10000 ** (-2 * 0 / 4) = 1.0` and `10000 ** (-2 * 1 / 4) = 10000 ** -0.5 = 0.01`, resulting in `[1.0, 0.01]`.

**Step 4: Multiply positions by the inverse frequencies to get angles**
```python
embeddings = [3.0, 0.03]
```
We multiply `pos` (3) by each entry of `indices` `[1.0, 0.01]`, resulting in the angles `[3.0, 0.03]`.

**Step 5: Stack sin and cos components together**
```python
embeddings = [[0.14112, -0.98999], [0.03000, 0.99955]]
```
We take the sine and cosine of each angle and stack them into pairs: `[sin(3), cos(3)]` and `[sin(0.03), cos(0.03)]` (values rounded to five decimals).

**Step 6: Flatten the embeddings**
```python
embeddings = [0.14112, -0.98999, 0.03000, 0.99955]
```
Finally, we flatten the pairs into a single tensor of length 4. These are the sinusoidal positional embeddings for position 3 in 4 dimensions, matching the printed output up to rounding.

These embeddings can be added to the input data in transformer-based models to encode position information.
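
As a rough sketch of what "added to the input data" means, the example below sums hypothetical token embeddings with the sinusoidal rows for positions 0 and 1 (`dim = 4`, `base = 10000`, values rounded). The token values are invented for illustration; real models work on tensors rather than Python lists.

```python
# Hypothetical token embeddings for a 2-token sequence, dim = 4
token_embeddings = [
    [0.5, 0.1, -0.3, 0.2],
    [0.0, 0.4, 0.7, -0.1],
]

# Sinusoidal rows for positions 0 and 1 (dim=4, base=10000), rounded
position_embeddings = [
    [0.0, 1.0, 0.0, 1.0],
    [0.84147, 0.5403, 0.01, 0.99995],
]

# Element-wise sum: each token vector is shifted by its position vector
inputs = [
    [t + p for t, p in zip(tok_row, pos_row)]
    for tok_row, pos_row in zip(token_embeddings, position_embeddings)
]
for row in inputs:
    print([round(v, 5) for v in row])
```

The shapes of the two arrays must match, which is why `dim` is normally set to the model's hidden size.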

Reference: https://github.com/bojone/bert4keras/blob/master/bert4keras/backend.py