# Cloning the TransformerEncoderScratch Repository

To get started, clone the `TransformerEncoderScratch` repository from GitHub. It contains the code and examples needed to understand and implement the Transformer Encoder from scratch.

## Step 1: Clone the Repository

Open your terminal and run the following command:

```bash
git clone https://github.com/atikul-islam-sajib/TransformerEncoderScratch.git
```

This command will create a directory named `TransformerEncoderScratch` in your current working directory, containing all the code and resources you need.


In [None]:
!git clone https://github.com/atikul-islam-sajib/TransformerEncoderScratch.git

## Step 2: Change the Directory

```bash
cd TransformerEncoderScratch
```

In [None]:
cd TransformerEncoderScratch

## Step 3: Install the Requirements

To run the code, you need to install the required dependencies. Run the following command in your terminal:

```bash
pip install -r requirements.txt
```

This command will install all the necessary libraries and packages listed in the `requirements.txt` file.

In [None]:
!pip install -r requirements.txt

## Running Inference with the Transformer Encoder

```python
import torch
from transformer import TransformerEncoder
```

- **torch**: PyTorch library used for tensor operations and neural network modules.
- **TransformerEncoder**: Class imported from a local `transformer` module that implements the Transformer Encoder.

### Script Attributes

```python
"""
This script initializes a Transformer Encoder with specified parameters,
creates a random embedding tensor, and prints the shapes of the embedding
and the output tensors.

Attributes:
    batch_size (int): The batch size for the input tensor.
    sequence_length (int): The sequence length for the input tensor.
    model_dimension (int): The dimension of the model.
    feed_forward (int): The dimension of the feed forward network.
    number_heads (int): The number of attention heads.
    dropout (float): The dropout rate.
    epsilon (float): The epsilon value for numerical stability.
"""
```

- **batch_size**: The number of samples processed together in one forward/backward pass. Here, it is set to 64.
- **sequence_length**: The length of the input sequences. Here, it is set to 512.
- **model_dimension**: The size of each token's embedding vector, which is also the width of the encoder's input and output. Here, it is set to 768.
- **feed_forward**: The dimension of the feed forward network inside the Transformer Encoder. Here, it is set to 2048.
- **number_heads**: The number of attention heads in the multi-head attention mechanism. Here, it is set to 12.
- **dropout**: The dropout rate used to prevent overfitting. Here, it is set to 0.1.
- **epsilon**: A small value added to avoid division by zero during normalization. Here, it is set to 1e-6.
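
The relationship between `model_dimension` and `number_heads` can be checked with a little arithmetic. The sketch below assumes the standard multi-head attention convention, in which the model dimension is split evenly across the heads. The name `head_dimension` is illustrative and is not taken from the repository.

```python
# Hyperparameters as listed above
model_dimension = 768
number_heads = 12

# Standard multi-head attention splits the model dimension evenly across heads,
# so the model dimension should be divisible by the number of heads.
assert model_dimension % number_heads == 0, "model_dimension must be divisible by number_heads"

# head_dimension is an illustrative name, not taken from the repository
head_dimension = model_dimension // number_heads
print(head_dimension)  # 64
```

With these values, each of the 12 heads would attend over a 64-dimensional slice of the 768-dimensional representation.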

### Creating the Embedding Tensor

```python
# Create a random embedding tensor with the specified shape
embedding = torch.randn((batch_size, sequence_length, model_dimension))
```

- **embedding**: A random tensor of shape `(batch_size, sequence_length, model_dimension)` generated using `torch.randn`. This tensor simulates the input embeddings to the Transformer Encoder. In practice, the embeddings would come from an embedding layer such as `nn.Embedding`, applied to integer token IDs.
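
For comparison with the random tensor, here is a minimal sketch of producing embeddings from token IDs with PyTorch's `nn.Embedding`. The `vocabulary_size` value and the smaller batch and sequence sizes are assumptions chosen to keep the example light; they are not taken from the repository.

```python
import torch
import torch.nn as nn

# Illustrative sizes; vocabulary_size is a hypothetical value
vocabulary_size = 1000
batch_size, sequence_length, model_dimension = 4, 16, 768

# Lookup table mapping each token ID to a learnable vector of size model_dimension
embedding_layer = nn.Embedding(num_embeddings=vocabulary_size, embedding_dim=model_dimension)

# Random integer token IDs stand in for a tokenized batch
token_ids = torch.randint(0, vocabulary_size, (batch_size, sequence_length))

# Resulting shape: (batch_size, sequence_length, model_dimension)
embedding = embedding_layer(token_ids)
print(embedding.size())  # torch.Size([4, 16, 768])
```

The resulting tensor has the same layout as the random `embedding` above, so it could be passed to the encoder in the same way.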

### Creating the Padding Mask Tensor

```python
# Create a random padding mask tensor
padding_masked = torch.randn((batch_size, sequence_length))
```

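- **padding_masked**: A random tensor of shape `(batch_size, sequence_length)` generated using `torch.randn`. It stands in for the padding mask, which indicates which positions in each sequence are padding and should be ignored during attention calculations. Random values are used here only to exercise the code path; a real padding mask typically holds binary or boolean values, one per token position.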
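
As a rough illustration of what a non-random padding mask might look like, the sketch below derives one from a small batch of token IDs. The `pad_token_id` value and the convention that 1 marks a real token and 0 marks padding are assumptions; the convention that `TransformerEncoder` expects is not stated here, so check the repository before relying on it.

```python
import torch

pad_token_id = 0  # hypothetical padding ID

# Two sequences of length 5; the first is padded with two trailing pad tokens
token_ids = torch.tensor([
    [5, 8, 2, 0, 0],
    [7, 3, 9, 4, 1],
])

# Assumed convention: 1 for real tokens, 0 for padding positions
padding_mask = (token_ids != pad_token_id).long()
print(padding_mask)
# tensor([[1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 1]])
```

The mask has shape `(batch_size, sequence_length)`, matching the shape of `padding_masked` above.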

### Initializing the Transformer Encoder

```python
# Initialize the Transformer Encoder with the specified parameters
netTransformer = TransformerEncoder(
    dimension=model_dimension,
    heads=number_heads,
    feed_forward=feed_forward,
    dropout=dropout,
    epsilon=epsilon,
    mask=padding_masked,
)
```

- **netTransformer**: An instance of the `TransformerEncoder` class, initialized with the specified parameters.

### Printing the Shapes of Embedding and Output Tensors

```python
# Print the divider line
print("|", "-" * 100, "|")

# Print the shape of the embedding tensor
print("|", "\tThe embedding shape is: ", embedding.size())

# Pass the embedding through the Transformer Encoder and print the output shape
print(
    "|",
    "\tThe output shape is: ",
    netTransformer(embedding).size(),
)  # (batch_size, sequence_length, model_dimension)

# Print the closing divider line
print("|", "-" * 100, "|")
```

- The script prints a divider line for readability.
- It prints the shape of the embedding tensor.
- It passes the embedding tensor through the `netTransformer` (Transformer Encoder) and prints the shape of the resulting output tensor.
- Finally, it prints a closing divider line.
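
As an independent sanity check on the expected output shape, the same hyperparameters can be passed to PyTorch's built-in `nn.TransformerEncoderLayer`. This is a stand-in for comparison only, not the repository's `TransformerEncoder`, and the smaller batch and sequence sizes are assumptions chosen to keep the run fast.

```python
import torch
import torch.nn as nn

# Smaller batch and sequence sizes than the script, to keep the check fast
batch_size, sequence_length = 2, 8
model_dimension, feed_forward, number_heads = 768, 2048, 12
dropout, epsilon = 0.1, 1e-6

# PyTorch's built-in encoder layer, used here only as a reference point
reference_layer = nn.TransformerEncoderLayer(
    d_model=model_dimension,
    nhead=number_heads,
    dim_feedforward=feed_forward,
    dropout=dropout,
    layer_norm_eps=epsilon,
    batch_first=True,  # inputs are (batch, sequence, feature)
)
reference_layer.eval()  # disable dropout for inference

embedding = torch.randn(batch_size, sequence_length, model_dimension)
with torch.no_grad():
    output = reference_layer(embedding)

# An encoder layer preserves the input shape
assert output.shape == embedding.shape
print(output.size())  # torch.Size([2, 8, 768])
```

An encoder layer maps each token vector to a new vector of the same width, which is why the output shape matches the embedding shape.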

In [None]:
import torch
from transformer import TransformerEncoder

"""
This script initializes a Transformer Encoder with specified parameters,
creates a random embedding tensor, and prints the shapes of the embedding
and the output tensors.

Attributes:
    batch_size (int): The batch size for the input tensor.
    sequence_length (int): The sequence length for the input tensor.
    model_dimension (int): The dimension of the model.
    feed_forward (int): The dimension of the feed forward network.
    number_heads (int): The number of attention heads.
    dropout (float): The dropout rate.
    epsilon (float): The epsilon value for numerical stability.
"""

batch_size = 64
sequence_length = 512
model_dimension = 768
feed_forward = 2048
number_heads = 12
dropout = 0.1
epsilon = 1e-6

# Create a random embedding tensor with the specified shape
embedding = torch.randn((batch_size, sequence_length, model_dimension))

# Create a random padding mask tensor
padding_masked = torch.randn((batch_size, sequence_length))

# Initialize the Transformer Encoder with the specified parameters
netTransformer = TransformerEncoder(
    dimension=model_dimension,
    heads=number_heads,
    feed_forward=feed_forward,
    dropout=dropout,
    epsilon=epsilon,
    mask=padding_masked,
)

# Print the divider line
print("|", "-" * 100, "|")

# Print the shape of the embedding tensor
print("|", "\tThe embedding shape is: ", embedding.size())

# Pass the embedding through the Transformer Encoder and print the output shape
print(
    "|",
    "\tThe output shape is: ",
    netTransformer(embedding).size(),
)  # (batch_size, sequence_length, model_dimension)

# Print the closing divider line
print("|", "-" * 100, "|")