# Introduction to the Encoder and the Attention Mechanism

In this notebook, we will learn the fundamentals of the encoder and the attention mechanism using PyTorch. We will explore key concepts such as projection layers, dot products for attention scores, softmax for normalization, and weighted sums of values. By the end of this notebook, you will have hands-on experience implementing these concepts and a working understanding of how the encoder operates in architectures such as the Transformer.




## 1. Basic Linear Layer Exercise

In the encoder, we use projection layers to project the input embeddings into a new space. This is done through a linear transformation.
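
To make the linear transformation concrete, the sketch below checks that `nn.Linear` computes `x @ W.T + b`. The sizes (5 and 6), the fixed seed, and the variable names are illustrative assumptions, not part of the exercise itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable (illustrative choice)

x = torch.randn(4, 5)    # 4 tokens, each with a 5-dimensional embedding
layer = nn.Linear(5, 6)  # weight has shape [6, 5], bias has shape [6]

# nn.Linear applies y = x @ W^T + b
manual = x @ layer.weight.T + layer.bias

# The built-in layer and the manual formula should agree
same = torch.allclose(layer(x), manual, atol=1e-6)
print(same)
```

The same kind of layer, with separate weights, is what produces the Queries, Keys, and Values from the input embeddings.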

### Task:
Implement a simple linear layer in PyTorch to project an input embedding into a new space (simulating the projection for Queries, Keys, or Values).





In [3]:
import torch
import torch.nn as nn

# Example input: one sequence of 4 tokens, each with a 5-dimensional embedding
input_embeddings = torch.randn(4, 5)  # Shape: [seq_length, embedding_dim]

# Define a projection layer (linear transformation)
projection_layer = nn.Linear(5, 6)  # Input size is 5, output size is 6

# Apply projection layer to the input embeddings (like Q or K)
projected = projection_layer(input_embeddings)
print(projected.shape)  # torch.Size([4, 6])

## 2. Dot Product for Attention Calculation
In the attention mechanism, we calculate the similarity between the queries and keys using the dot product.

### Task:
Compute the similarity score between queries and keys using the dot product.



In [None]:
import torch

# Example query and key vectors (after projection)
queries = torch.randn(4, 6)  # Shape: [seq_length, d_k]
keys = torch.randn(4, 6)     # Shape: [seq_length, d_k]

attention_scores = torch.matmul(queries, keys.T)  # Dot product of every query with every key
print(attention_scores.shape)  # Output: [4, 4]
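
As a quick sanity check, the sketch below confirms that entry `(i, j)` of the score matrix is the dot product of query `i` with key `j`. The indices `i = 1`, `j = 2` and the fixed seed are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable (illustrative choice)

queries = torch.randn(4, 6)  # [seq_length, d_k]
keys = torch.randn(4, 6)     # [seq_length, d_k]

scores = queries @ keys.T    # [seq_length, seq_length]

# Entry (i, j) is the dot product of query i with key j
i, j = 1, 2
single = torch.dot(queries[i], keys[j])
entry_matches = torch.allclose(scores[i, j], single, atol=1e-6)
print(entry_matches)
```

One matrix multiplication therefore computes all pairwise query-key similarities at once.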


## 3. Softmax to Normalize Attention Scores
The softmax function normalizes the attention scores, turning each row into a probability distribution that sums to 1. This step lets each query concentrate its attention on the most relevant positions in the sequence.

### Task:
Apply softmax to the attention scores to get the attention weights.


In [None]:
import torch
import torch.nn.functional as F

# Example attention scores (e.g., dot product result)
attention_scores = torch.randn(4, 4)  # Shape: [seq_length, seq_length]

# Apply softmax to normalize the attention scores
attention_weights = F.softmax(attention_scores, dim=-1)  # Softmax along the last dimension
print(attention_weights.shape)  # Output: [4, 4]
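
The small example below uses hand-picked scores (an assumption made for readability) to show two properties of softmax: each row of weights sums to 1, and equal scores yield equal weights.

```python
import torch
import torch.nn.functional as F

# Hand-picked scores: 2 queries attending over 3 positions
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])

weights = F.softmax(scores, dim=-1)  # normalize each row (over the keys)
row_sums = weights.sum(dim=-1)       # each row should sum to 1

print(weights)
print(row_sums)
```

In the first row the largest score receives the largest weight; in the second row all three positions share the attention equally.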


## 5. Full Attention Mechanism

Now, we combine the previous steps to create the full scaled dot-product attention mechanism. This function computes the attention output by performing the following operations:

1. Compute the dot product between queries and keys, then divide by the square root of the key dimension d_k.
2. Apply softmax to normalize attention scores.
3. Use the attention weights to compute the weighted sum of values.
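
The last step, the weighted sum of values, can be seen in isolation with a small hand-made example. The weights and values below are assumptions chosen so the result is easy to verify by hand.

```python
import torch

# Hand-picked attention weights: 2 queries over 3 positions (each row sums to 1)
weights = torch.tensor([[0.5, 0.5, 0.0],
                        [0.0, 0.0, 1.0]])

# One 2-dimensional value vector per position
values = torch.tensor([[2.0, 0.0],
                       [4.0, 2.0],
                       [6.0, 8.0]])

# Each output row is a weighted average of the value vectors
output = weights @ values
print(output)
```

The first output row is the average of the first two value vectors, `[3, 1]`, and the second row copies the third value vector, `[6, 8]`, because all of its weight sits on that position.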

### Task:
Implement the full attention mechanism by combining the previous operations.



In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Example input
sentence_input = torch.randn(4, 6)  # [seq_len, d_k]
d_k = 6

# Example layers
query_layer = nn.Linear(6, 6)
key_layer = nn.Linear(6, 6)
value_layer = nn.Linear(6, 6)

# Apply the projections
query = query_layer(sentence_input)   # Project input into query space
keys = key_layer(sentence_input)  # Project input into key space
values = value_layer(sentence_input)  # Project input into value space

# Attention scores: scaled dot-product attention (divide by sqrt(d_k))
attention_scores = torch.matmul(query, keys.T) / (d_k ** 0.5)

# Softmax to get attention weights
attention_weights = F.softmax(attention_scores, dim=-1)  # Normalize along the last dimension

# Attention output: weighted sum of the values
attention_output = torch.matmul(attention_weights, values)

print(attention_output)
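
The same steps can be wrapped in a reusable function, including the division by the square root of `d_k` that gives scaled dot-product attention its name. This is a minimal sketch: the function name `simple_attention` is hypothetical, and masking, dropout, and multiple heads are deliberately left out.

```python
import math

import torch
import torch.nn.functional as F


def simple_attention(query, keys, values):
    """Hypothetical helper: single-head scaled dot-product attention, no masking."""
    d_k = query.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = query @ keys.transpose(-2, -1) / math.sqrt(d_k)
    # Normalize each row into attention weights
    weights = F.softmax(scores, dim=-1)
    # Weighted sum of the values
    return weights @ values, weights


torch.manual_seed(0)  # fixed seed so the run is repeatable (illustrative choice)
q = torch.randn(4, 6)  # [seq_len, d_k]
k = torch.randn(4, 6)  # [seq_len, d_k]
v = torch.randn(4, 6)  # [seq_len, d_v]

output, weights = simple_attention(q, k, v)
print(output.shape, weights.shape)
```

Using `transpose(-2, -1)` instead of `.T` means the same function would also accept inputs with a leading batch dimension.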

