TransformerHub

This repository implements several forms of the Transformer model: seq2seq (the original architecture from the "Attention Is All You Need" paper), encoder-only, decoder-only, and unified Transformer models.

These models are not meant to be state of the art on any task. Instead, they serve to sharpen my own programming skills and to provide a reference for people who share a love of deep learning and machine intelligence.

This work is inspired by, and would not be possible without, the open-source repos of NanoGPT, ViT, MAE, CLIP, and OpenCLIP. A huge thanks to their authors for open-sourcing their models!

This repository also maintains a paper list of recent progress in Transformer models.

Features

This repository features a list of designs:

Transformer Architectures:
- Encoder-only
- Decoder-only
- Encoder-Decoder
- Unified (In Progress)
Attention Modules:
- Unmasked Attention (Transformer, BERT)
- Causal Masked Attention (Transformer, GPT)
- Prefix Causal Attention (T5)
- Sliding-Window Attention (Mistral)
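
The attention variants above differ mainly in which key positions each query position may see. The following is a minimal sketch of those masks in plain Python, not the code used in this repository: the function name `build_attention_mask`, its parameters, and the convention that a window of size `window` covers the current token plus the `window - 1` tokens before it are all assumptions made for illustration.

```python
def build_attention_mask(seq_len, kind="causal", prefix_len=0, window=1):
    """Return a seq_len x seq_len boolean matrix (hypothetical helper).

    mask[i][j] is True when query position i may attend to key position j.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            if kind == "full":
                # Unmasked: every position sees every other position.
                allowed = True
            elif kind == "causal":
                # Causal: a position sees itself and earlier positions only.
                allowed = j <= i
            elif kind == "prefix":
                # Prefix causal: the first prefix_len positions are visible
                # to everyone; the rest of the sequence is causal.
                allowed = j <= i or j < prefix_len
            elif kind == "sliding":
                # Sliding window: causal, limited to the last `window` keys
                # (the current position included; this is an assumption).
                allowed = j <= i and i - j < window
            else:
                raise ValueError(f"unknown mask kind: {kind}")
            row.append(allowed)
        mask.append(row)
    return mask


if __name__ == "__main__":
    for kind in ("full", "causal", "prefix", "sliding"):
        print(kind)
        for row in build_attention_mask(4, kind, prefix_len=2, window=2):
            print("".join("x" if allowed else "." for allowed in row))
```

In a real model the `False` entries would typically be turned into a large negative value added to the attention scores before the softmax.
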
Position Embedding:
- Fixed Position Embedding (Transformer)
- Learnable Position Embedding (Transformer, BERT)
- Rotary Position Embedding (RoFormer)
- Extrapolable Position Embedding (Length-Extrapolatable Transformer)
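
As a rough illustration of two of the position embeddings listed above, the sketch below computes a fixed sinusoidal embedding and applies a rotary embedding to a single vector. It uses only the standard library and is not this repository's implementation: the function names, the `base` value of 10000, and the choice to rotate consecutive pairs of dimensions are assumptions.

```python
import math


def sinusoidal_embedding(position, dim, base=10000.0):
    """Fixed embedding: sine on even indices, cosine on odd indices."""
    emb = []
    for i in range(0, dim, 2):
        angle = position / (base ** (i / dim))
        emb.append(math.sin(angle))
        emb.append(math.cos(angle))
    return emb[:dim]


def apply_rotary(vec, position, base=10000.0):
    """Rotate each pair (vec[2k], vec[2k+1]) by the angle position * theta_k."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("rotary embedding needs an even dimension")
    out = []
    for k in range(dim // 2):
        theta = position / (base ** (2 * k / dim))
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * k], vec[2 * k + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 0.5, -0.3, 2.0]
    k = [0.2, -1.0, 0.7, 0.1]
    # The score depends only on the relative offset (2 in both cases).
    print(dot(apply_rotary(q, 5), apply_rotary(k, 3)))
    print(dot(apply_rotary(q, 2), apply_rotary(k, 0)))
```

The two printed scores should agree up to floating-point error, which is the property that makes rotary embeddings encode relative position: rotating the query and the key by their own positions leaves a dot product that depends only on the difference between them.
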
Sampling:
- Temperature-based Sampler
- Top-k Sampler
- Nucleus (top-p) Sampler
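
The three samplers above can be sketched as filters applied to a next-token distribution. This is a minimal standard-library illustration, not the code in the Samplers folder: every function name and parameter here is hypothetical, and the order in which top-k and top-p are applied is an assumption.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def filter_top_k(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token index after temperature scaling and optional filtering."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    print(sample(logits, temperature=0.8, top_k=3, top_p=0.9))
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.
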

Current Progress

Currently working on implementing DINO, a variant of ViT trained in a self-supervised manner.

Model	Implemented	Trained	Evaluated
Transformer	Yes	No	No
GPT	Yes	No	No
BERT	Yes	Yes	No
ViT	Yes	No	No
MAE	No	No	No
CLIP	No	No	No

DISCLAIMER: Because Transformers are popular and versatile, many course assignments involve implementing part or all of a Transformer model. This repository was developed purely for self-training and may serve as a reference for implementing a Transformer model. However, directly copying from this repo is strictly prohibited and violates the code of conduct at most academic institutions.

For those who need a refresher on what the Transformer is or what its detailed architecture looks like, please refer to this well-illustrated blog post: http://nlp.seas.harvard.edu/annotated-transformer/#background

Here is a poem generated by LLaMA2, an open-source LLM released by Meta AI:

Attention is all you need,
To understand what's said and read.
Transformers learn relations,
Through multi-head attentions.
Encoder, decoder architecture,
Learns features for good imagery.
Training on large datasets,
Its performance quickness gets.
Built on top of sequence to sequence,
Its parallel computing saves time to flex.
Understanding language, text and voice,
With deep learning that gave it its poise.
Task agnostic, wide usability,
Driving progress in AI agility.
Pushing NLP to new heights,
Transformers show their might.
