This repository contains a PyTorch implementation of the seminal paper "Attention is All You Need" by Vaswani et al. It includes the Transformer model, a tokenizer, and training scripts.
The Transformer model, introduced in the paper "Attention is All You Need," is a novel architecture designed to handle sequential data with self-attention mechanisms. This architecture has achieved state-of-the-art performance in various natural language processing tasks.
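At the core of the model is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, which multi-head attention applies in parallel over projected queries, keys, and values. Below is a minimal PyTorch sketch of this operation; the function name and mask handling are illustrative, not this repository's exact API:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (batch, heads, seq_len, d_k).
    mask:    optional boolean tensor; True marks positions to be ignored.
    """
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```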
- Python 3.11 or higher
- PyTorch 2.3 or higher
- NumPy
- transformers 4.41 or higher
To set up the environment, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/deeplearningcafe/transformer-pytorch.git
  cd transformer-pytorch
  ```

- Create a virtual environment and activate it:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

  Or, if using conda:

  ```bash
  conda create -n transformer_torch
  conda activate transformer_torch
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
We use a BPE tokenizer, the same as in the original paper. To train the tokenizer from scratch we use the `tokenizers` library from Hugging Face. The `vocabulary_size` is set to 37000.

```bash
python train_tokenizer.py
```
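As a rough sketch of what such a script does with the `tokenizers` library, the snippet below trains a BPE model with a 37,000-token vocabulary; the file paths and special tokens are illustrative assumptions, and `train_tokenizer.py` is the authoritative version:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Byte-pair-encoding model with a whitespace pre-tokenizer.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    vocab_size=37000,  # matches the vocabulary size used in this repository
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],  # assumed special tokens
)

# Hypothetical training files; replace with your own corpus.
tokenizer.train(files=["data/train.en", "data/train.de"], trainer=trainer)
tokenizer.save("tokenizer.json")
```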
If you use a dataset other than the original WMT 2014 English-to-German dataset, the number of warmup steps should be adjusted. By overfitting on a single batch, we can test several warmup values. To run the search, set the `hp_search` variable to `True` in the `config.yaml` file; the tolerance, maximum steps, and search interval can also be changed.

```bash
python training.py
```
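For context, the schedule from the paper ties the learning rate to the warmup value: lrate = d_model⁻⁰·⁵ · min(step⁻⁰·⁵, step · warmup_steps⁻¹·⁵). A small sketch of that formula is shown below; it is illustrative only, and the training script may implement it differently:

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate from "Attention is All You Need".

    Increases linearly for the first `warmup_steps` steps, then decays
    proportionally to the inverse square root of the step number.
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The peak learning rate is reached at step == warmup_steps:
# noam_lr(4000) ≈ 7.0e-4 for d_model = 512
```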
To train the Transformer model, use the provided script:

```bash
python training.py
```
Here, `config.yaml` is a configuration file specifying the model parameters, training settings, and dataset paths. The resulting parameter count is 56,754,176, while we try to keep the same configuration as in the paper.
```yaml
transformer:
  hidden_dim: 512
  num_heads: 8
  intermediate_dim: 2048
  eps: 1e-06
  num_layers: 6
  dropout: 0.1
  label_smoothing: 0.1
train:
  warmup_steps: 4000
  max_length: 128
  device: cpu
  train_batch: 128
  eval_batch: 128
  steps: 10
  val_steps: 1
  log_steps: 1
  save_steps: 1
  use_bitsandbytes: False
  save_path: weights/transformer_
  dataset_name: "bentrevett/multi30k"
```
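To check the parameter count against your own settings, you can load `config.yaml` (assuming PyYAML is available) and sum the element counts of all trainable parameters; constructing the model itself is left out because it depends on this repository's class names:

```python
import yaml
import torch

# Load the configuration shown above.
with open("config.yaml") as f:
    config = yaml.safe_load(f)
print(config["transformer"]["hidden_dim"])  # 512

def count_parameters(model: torch.nn.Module) -> int:
    """Total number of trainable parameters (56,754,176 is reported for the config above)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```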
We include the `inference.py` file for inference, which calls the `generate` method of the Transformer class, with the option to use greedy sampling or top-k sampling.
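As a rough illustration of the two decoding strategies, the helper below picks the next token either greedily or by sampling from the k most probable tokens; it is a hypothetical sketch, not the actual `generate` implementation:

```python
import torch

def select_next_token(logits: torch.Tensor, top_k: int = 0) -> torch.Tensor:
    """Pick the next token id from logits of shape (batch, vocab_size).

    top_k == 0 -> greedy decoding (argmax over the vocabulary);
    top_k  > 0 -> sample from the k most probable tokens.
    """
    if top_k <= 0:
        return logits.argmax(dim=-1)
    values, indices = torch.topk(logits, k=top_k, dim=-1)
    probs = torch.softmax(values, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return indices.gather(-1, choice).squeeze(-1)
```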
To evaluate the model on the test set, we include the `test_model.py` file.
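Translation quality is usually reported as BLEU, as in the original paper. Below is a minimal sketch with the `sacrebleu` package; this is an assumption on our part, and `test_model.py` may compute its metrics differently:

```python
import sacrebleu

# Hypothetical example data: model outputs and one reference stream aligned with them.
hypotheses = ["a man is riding a horse ."]
references = [["a man rides a horse ."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```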
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is All You Need. arXiv preprint arXiv:1706.03762.
- Hugging Face Transformers: https://huggingface.co/transformers/
This project is licensed under the MIT license. Details are in the LICENSE file.