torch-poet: a PyTorch char-rnn implementation

This is a PyTorch implementation following the ideas of Andrej Karpathy's char-rnn, as described in 'The Unreasonable Effectiveness of Recurrent Neural Networks'.

Overview

This Jupyter notebook trains multi-layer LSTMs on a library of texts and then generates new text from the neural model. Color highlighting links passages of the generated text back to their original sources, visualizing how similar the generated and original texts are.

Training and generation can run either on Google Colab or on a standard local or remote Jupyter instance.

Structure

1. Books from Project Gutenberg

The notebook uses the ml_indie_tools project to access the Project Gutenberg library for books by given authors, languages, and/or keywords. A list of matching books is compiled, downloaded, and prepared for the training pipeline. On local Jupyter installations, downloads from Project Gutenberg are cached locally; on Colab, the notebook asks to authorize a connection to the user's Google Drive, which is used to (1) cache downloaded text documents and (2) store training snapshots. All activity stays within the Colab Notebooks/<project name> and gutenberg_cache paths. The project name is defined at the beginning of the notebook and differentiates between training configurations and datasets.

Please refer to ml_indie_tools for more details.
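As an illustration, a Gutenberg query with ml_indie_tools might look like the following sketch. It assumes the Gutenberg_Dataset class with load_index/search/insert_book_texts methods; consult the ml_indie_tools documentation for the exact signatures.

    from ml_indie_tools.Gutenberg_Dataset import Gutenberg_Dataset

    # Build (or load a cached copy of) the Project Gutenberg index:
    gd = Gutenberg_Dataset(cache_dir='gutenberg_cache')
    gd.load_index()

    # Search by author/language/keyword, then attach the full texts:
    books = gd.search({'author': ['brontë'], 'language': ['english']})
    books = gd.insert_book_texts(books)  # each entry now carries a 'text' field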

2. Training pipeline

The selected book library is then converted into a PyTorch Dataset that resides on the GPU (if available) and is fed to the model through a PyTorch DataLoader.
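As a minimal sketch of this idea (not the notebook's exact code), a character-level Dataset can keep its index tensor on the training device and be wrapped in a standard DataLoader:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class CharDataset(Dataset):
        """Sliding-window char dataset whose tensors live on the training device."""
        def __init__(self, text, seq_len, device):
            self.chars = sorted(set(text))
            self.stoi = {c: i for i, c in enumerate(self.chars)}
            self.data = torch.tensor([self.stoi[c] for c in text],
                                     dtype=torch.long, device=device)
            self.seq_len = seq_len

        def __len__(self):
            return len(self.data) - self.seq_len

        def __getitem__(self, idx):
            x = self.data[idx:idx + self.seq_len]
            y = self.data[idx + 1:idx + self.seq_len + 1]  # targets: the next character
            return x, y

    library_text = "hello world " * 200  # stand-in for the concatenated books
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    ds = CharDataset(library_text, seq_len=64, device=device)
    dl = DataLoader(ds, batch_size=128, shuffle=True)  # default num_workers=0 is required for GPU-resident tensors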

3. Training

The model can be configured with an arbitrary number of LSTM layers. By default, training statistics are displayed and a training snapshot is written every 3 minutes, and sample text is generated every 10 minutes (once the loss drops below 1.5).

Training can be interrupted at any point; on restart, the last available snapshot is loaded automatically and training continues. This is especially handy for Colab sessions, which can be terminated at any time. Snapshots of Colab trainings reside on the user's Google Drive and thus persist across session resets.
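A minimal sketch of such a snapshot/resume cycle using plain torch.save/torch.load (file name and saved fields are illustrative, not the notebook's exact snapshot format):

    import os
    import torch

    CKPT = 'model_snapshot.pt'  # illustrative snapshot path

    def save_snapshot(model, optimizer, epoch, loss):
        torch.save({'epoch': epoch, 'loss': loss,
                    'model_state': model.state_dict(),
                    'optimizer_state': optimizer.state_dict()}, CKPT)

    def load_snapshot(model, optimizer):
        """Resume from the last snapshot, if one exists; returns the next epoch."""
        if not os.path.exists(CKPT):
            return 0
        state = torch.load(CKPT, map_location='cpu')
        model.load_state_dict(state['model_state'])
        optimizer.load_state_dict(state['optimizer_state'])
        return state['epoch'] + 1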

4. Generation of Text and 'Dialog'

At any point, training can be interrupted and the best-performing snapshot loaded for text generation.
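As an illustration, temperature-based sampling from a trained char-LSTM could look like the sketch below. The model interface (returning a (logits, hidden) tuple) is an assumption for this sketch, not the notebook's exact API:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def generate(model, stoi, itos, prompt, length=400, temperature=0.8, device='cpu'):
        model.eval()
        idx = torch.tensor([[stoi[c] for c in prompt]], device=device)
        hidden = None
        out = prompt
        for _ in range(length):
            logits, hidden = model(idx, hidden)  # assumed: model returns (logits, hidden)
            probs = F.softmax(logits[:, -1, :] / temperature, dim=-1)
            nxt = torch.multinomial(probs, num_samples=1)  # sample instead of argmax
            out += itos[nxt.item()]
            idx = nxt  # feed the sampled character back in
        return out

Lower temperatures make the output more conservative and repetitive; higher temperatures make it more varied but less coherent.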

Run notebook in Google Colab

Requirements to run with standard Jupyter

  • PyTorch (1.x or later; see History)
  • Python 3
  • Jupyter Notebook

Performance

Model: 2x256 (two LSTM layers, 256 hidden units each), 64 steps (sequence length)

  • Nvidia 1080ti: 0.00011 sec/sample
  • Tesla T4 (Colab): 0.00012 sec/sample
  • Apple M1 (CPU): 0.0032 sec/sample (Note: at the time of testing, no PyTorch version supported the M1 GPU, hence the slow result. Update 09/2022: a GPU-enabled implementation now exists, but it crashes during the LSTM backward pass and thus can't be benchmarked; see below.)

Apple Silicon Errata

Since, at the time of this writing, Apple's implementation of the LSTM backward pass is broken, the model can't be trained on Apple Silicon. For Apple Silicon, have a look at torch-transformer-poet, which uses transformers instead of LSTMs (an easier backward pass to implement :-) ). It works fine with local Apple Silicon, NVIDIA, or Colab GPUs.
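A minimal device-selection sketch reflecting this: prefer CUDA, fall back to CPU, and make MPS opt-in (assuming the backward-pass bug described above still applies to your PyTorch version):

    import torch

    def select_device(allow_mps=False):
        """Pick a training device; MPS is opt-in because LSTM backward crashes there."""
        if torch.cuda.is_available():
            return torch.device('cuda')
        if allow_mps and torch.backends.mps.is_available():
            return torch.device('mps')  # crashed on LSTM backward as of PyTorch 2.0
        return torch.device('cpu')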

History

  • 2022-12-13: PyTorch 2: Apple's MPS support still crashes on the LSTM backward pass; see the corresponding PyTorch GitHub issue (triaged).
  • 2022-09-11: Tested Apple M1 Metal support (MPS), but the LSTM backward pass is still broken and crashes due to confused tensor dimensions (PyTorch 1.12.1 and the 09/2022 nightly both crash). Latest ml_indie_tools enabled.
  • 2022-03-12: ml_indie_tools is used for Project Gutenberg access.
  • ongoing: support for direct Project Gutenberg queries for training-data generation, various optimizations, use of torch DataLoaders.
  • 2020-02-13: PyTorch 1.4, women's literature default texts, saving/restoring of model data.
  • 2019-05-11: PyTorch 1.1, support for running on Google Colab.
  • 2018-10-17: retested with PyTorch 1.0rc1; ok, no changes necessary.
  • 2018-05-13: adapted for PyTorch 0.4.
  • 2018-03-02: adapted for PyTorch 0.3.1.

References

  • Andrej Karpathy, 'The Unreasonable Effectiveness of Recurrent Neural Networks', 2015, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
