Sequence-to-sequence model implementations using the PyTorch framework.
Note: this repository reuses code written by others. It exists for my daily practice, not for any commercial use.
Clone the project, change into the project directory, and execute

```shell
python setup.py install
```

or

```shell
pip install ./
```

or simply copy the source code. `pip install ./` is recommended, because you can activate a virtual environment first and then install the package into that environment without affecting others.
Install the package, or copy the `seq2seq` folder into your project directory, before using it. See the example.
- `Trainer` supports gradient accumulation, which enables a larger (equivalent) batch size even with limited memory.
- Supports beam search by reusing code from the Transformers library of the HuggingFace team.
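To illustrate what gradient accumulation does (this is a framework-free sketch, not the repository's actual `Trainer` code): gradients are summed over several micro-batches and the optimizer steps only once, which is numerically equivalent to one step on the full batch.

```python
# Sketch: gradient accumulation on a toy linear model y = w * x with
# squared-error loss, using manual gradients (no framework needed).
# Accumulating scaled micro-batch gradients and stepping once is
# equivalent to a single step on the full batch.

def grad(w, xs, ys):
    # d/dw of mean((w*x - y)^2) over the batch
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sgd_full_batch(w, xs, ys, lr=0.1):
    return w - lr * grad(w, xs, ys)

def sgd_accumulated(w, xs, ys, micro_batch=2, lr=0.1):
    acc = 0.0
    n_micro = len(xs) // micro_batch
    for i in range(0, len(xs), micro_batch):
        # scale each micro-batch gradient so the sum equals the full-batch mean
        acc += grad(w, xs[i:i + micro_batch], ys[i:i + micro_batch]) / n_micro
    return w - lr * acc  # optimizer step only after all micro-batches

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(abs(sgd_full_batch(0.0, xs, ys) - sgd_accumulated(0.0, xs, ys)) < 1e-12)  # True
```

In PyTorch the same effect is obtained by dividing each micro-batch loss by the accumulation count, calling `backward()` on each, and calling `optimizer.step()` and `optimizer.zero_grad()` only every N micro-batches.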
- Attention mechanism.
  - I use the Luong attention mechanism instead of the Bahdanau attention mechanism. When computing the attention at time step `t`, the former uses the hidden state from time step `t`, while the latter uses the hidden state from time step `t-1`.
  - Supports multi-head attention.
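The Luong (dot-product) scoring can be sketched as follows; this is an illustrative, framework-free version, not the repository's implementation. Note that the decoder hidden state at the *current* step `t` is scored against the encoder states (Bahdanau-style attention would use the state from `t-1` instead).

```python
import math

# Illustrative dot-product (Luong-style) attention over encoder states.
# score(h_t, h_s) = h_t . h_s  -- the "dot" variant of Luong attention.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def luong_attention(h_t, encoder_states):
    # h_t: decoder hidden state at the current step t
    scores = [sum(a * b for a, b in zip(h_t, h_s)) for h_s in encoder_states]
    weights = softmax(scores)
    # context vector: attention-weighted sum of encoder states
    context = [sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
               for i in range(len(h_t))]
    return context, weights

enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx, w = luong_attention([1.0, 0.0], enc)
print([round(x, 3) for x in w])  # [0.422, 0.155, 0.422]
```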
- Fix `trainer`. When saving a training checkpoint, `trainer` does not save the best-epoch model. So, if training is resumed, the best epoch saved at the end is not the best epoch of the whole training run, only the best epoch since the checkpoint. (Not sure whether `trainer` should save the best-so-far model at every checkpoint, which would make the checkpoint file large.)
- (Not sure if it is necessary.) Support regression.
- Add some utility scripts, such as `create_vocab.py`, `inference.py`, and so on.
- Self-attention.
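One possible approach to the checkpoint issue above (a hypothetical sketch, not the repository's `trainer` code; the function name `update_best` is made up for illustration): carry the best-so-far model state and metric inside the checkpoint dict, so resumed training still compares against the true best of the whole run. The trade-off noted above applies: storing a full model copy roughly doubles the checkpoint size.

```python
import copy

# Hypothetical sketch: keep the best-so-far model state inside the
# checkpoint so that resuming training still tracks the true best epoch.

def update_best(checkpoint, model_state, val_loss):
    # store a copy of the best state alongside the regular training state
    if val_loss < checkpoint.get("best_val_loss", float("inf")):
        checkpoint["best_val_loss"] = val_loss
        checkpoint["best_model_state"] = copy.deepcopy(model_state)
    return checkpoint

ckpt = {}
ckpt = update_best(ckpt, {"w": 1.0}, val_loss=0.9)  # epoch 1: best so far
ckpt = update_best(ckpt, {"w": 2.0}, val_loss=0.5)  # epoch 2: new best
# -- pretend training is resumed from ckpt here --
ckpt = update_best(ckpt, {"w": 3.0}, val_loss=0.7)  # epoch 3: worse, keep old best
print(ckpt["best_model_state"], ckpt["best_val_loss"])  # {'w': 2.0} 0.5
```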