This project is a complete implementation of the Transformer architecture in Rust, as detailed in the paper "Attention Is All You Need" by Vaswani et al. (2017). It uses the Candle deep learning framework for building and training the model.
This project is currently under active development. The core architectural components are in place, with future work planned for the following:
- Full dataset loading and preprocessing pipelines.
- Evaluation metrics, such as BLEU score calculation.
- Implementation of the Transformer Big configuration.
- Support for exporting model weights to standard formats (e.g., ONNX).
The primary goal is to create a faithful reproduction of the original Transformer model, adhering to the architecture, components, and training specifications described in the paper. This implementation focuses on the Transformer Base configuration.
The codebase is organized into modular Rust files with clear separation of concerns:
- `src/main.rs`: The main binary entry point for running the training process and inference.
- `src/lib.rs`: The root of the `rustformer` library crate.
- `src/transformer.rs`: Contains the core building blocks of the Transformer, including the `Encoder`, `Decoder`, `EncoderLayer`, and `DecoderLayer` structs.
- `src/attention.rs`: Implements the `ScaledDotProductAttention` and `MultiHeadAttention` mechanisms.
- `src/config.rs`: Defines the model configuration and hyperparameters, such as `d_model`, `d_ff`, `n_heads`, and dropout rates.
- `src/model_args.rs`: Defines the `ModelArgs` struct for holding the model's architectural parameters.
- `src/data.rs`: Handles data loading, tokenization, and batching for training.
- `src/train.rs`: Implements the main training loop, including optimizer setup and learning rate scheduling.
- `src/optimizer.rs`: Defines the Adam optimizer with the specific hyperparameters (β₁, β₂, ε) from the paper.
- `src/generation.rs`: Contains logic for generating sequences during inference.
- `src/metrics.rs`: (Future) Intended for evaluation metrics such as BLEU score.
- `src/monitoring.rs`: (Future) Intended for logging and monitoring during training.
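To illustrate what `src/attention.rs` computes, here is a dependency-free sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, on plain row-major matrices. The actual implementation operates on Candle tensors; the function and type names below are illustrative assumptions, not the crate's API.

```rust
// Illustrative sketch only: the real `src/attention.rs` uses Candle tensors.
type Mat = Vec<Vec<f32>>;

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn transpose(m: &Mat) -> Mat {
    let (rows, cols) = (m.len(), m[0].len());
    (0..cols).map(|j| (0..rows).map(|i| m[i][j]).collect()).collect()
}

// Numerically stable row-wise softmax: each row becomes a probability
// distribution over the keys.
fn softmax_rows(m: &mut Mat) {
    for row in m.iter_mut() {
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0;
        for x in row.iter_mut() {
            *x = (*x - max).exp();
            sum += *x;
        }
        for x in row.iter_mut() {
            *x /= sum;
        }
    }
}

fn scaled_dot_product_attention(q: &Mat, k: &Mat, v: &Mat) -> Mat {
    let d_k = q[0].len() as f32;
    let mut scores = matmul(q, &transpose(k)); // (seq_q × seq_k)
    for row in scores.iter_mut() {
        for s in row.iter_mut() {
            *s /= d_k.sqrt(); // scale by √d_k to keep dot products in range
        }
    }
    softmax_rows(&mut scores); // attention weights, summing to 1 per query
    matmul(&scores, v)         // weighted combination of the values
}

fn main() {
    let q = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let k = q.clone();
    let v = vec![vec![1.0], vec![2.0]];
    println!("{:?}", scaled_dot_product_attention(&q, &k, &v));
}
```

Because the softmax weights sum to 1, each output element lies between the minimum and maximum value entries, and masking (as in the decoder's self-attention) would be applied to `scores` before the softmax.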
This implementation follows the Transformer Base model as specified in the paper.
- Encoder: A stack of N=6 identical layers, each with a multi-head self-attention mechanism and a position-wise feed-forward network.
- Decoder: A stack of N=6 identical layers, featuring masked multi-head self-attention, encoder-decoder attention, and a position-wise feed-forward network.
- Multi-Head Attention: Projects queries, keys, and values into h=8 parallel attention heads, whose outputs are concatenated and linearly projected.
- Position-wise Feed-Forward Network: A two-layer fully connected network with a ReLU activation.
- Positional Encoding: Uses sine and cosine functions to inject position information into the input embeddings.
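The positional encoding can be sketched directly from the paper's formulas, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function below is a minimal dependency-free illustration; the name and return type are assumptions, and the real code would build a Candle tensor to add to the embeddings.

```rust
// Sketch of the sinusoidal positional encoding table (max_len × d_model).
// Paired dimensions (2i, 2i+1) share one frequency: sin in the even slot,
// cos in the odd slot.
fn positional_encoding(max_len: usize, d_model: usize) -> Vec<Vec<f32>> {
    let mut pe = vec![vec![0.0f32; d_model]; max_len];
    for pos in 0..max_len {
        for i in 0..d_model {
            let exponent = (2 * (i / 2)) as f32 / d_model as f32;
            let angle = pos as f32 / 10000f32.powf(exponent);
            pe[pos][i] = if i % 2 == 0 { angle.sin() } else { angle.cos() };
        }
    }
    pe
}

fn main() {
    let pe = positional_encoding(4, 8);
    println!("{:?}", pe[1]); // encoding for position 1
}
```

Because the frequencies form a geometric progression from 2π to 10000·2π, each position gets a unique fingerprint, and relative offsets correspond to linear transformations of the encoding.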
| Parameter | Value |
|---|---|
| Layers (N) | 6 |
| d_model | 512 |
| d_ff | 2048 |
| Heads (h) | 8 |
| d_k / d_v | 64 |
| Dropout | 0.1 |
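The paper pairs these dimensions with a warmup-based learning-rate schedule, lrate = d_model⁻⁰·⁵ · min(step⁻⁰·⁵, step · warmup_steps⁻¹·⁵), with warmup_steps = 4000. A sketch of what `src/train.rs` presumably implements (the function name is illustrative):

```rust
// The paper's learning-rate schedule: the rate increases linearly for the
// first `warmup_steps` training steps, then decays proportionally to the
// inverse square root of the step number.
fn learning_rate(step: u32, d_model: u32, warmup_steps: u32) -> f64 {
    let step = step.max(1) as f64; // guard against step = 0
    let d_model = d_model as f64;
    let warmup = warmup_steps as f64;
    d_model.powf(-0.5) * step.powf(-0.5).min(step * warmup.powf(-1.5))
}

fn main() {
    for &s in &[1u32, 1000, 4000, 8000, 100_000] {
        println!("step {:>6}: lr = {:.6}", s, learning_rate(s, 512, 4000));
    }
}
```

The two branches of the `min` cross exactly at `step == warmup_steps`, where the rate peaks at d_model⁻⁰·⁵ · warmup_steps⁻⁰·⁵ (about 7·10⁻⁴ for the base configuration).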
- Build the project:

  ```sh
  cargo build --release
  ```

- Run the training:

  ```sh
  cargo run --release
  ```

  (Note: the dataset and command-line arguments for training configuration will be specified in `main.rs`.)