A tiny LLM based on GPT-2.
The simplest, fastest repository for training/finetuning medium-sized GPTs. It is a rewrite of nanoGPT that prioritizes teeth over education. Still under active development, but currently the file train.py
reproduces GPT-2 (124M) on OpenWebText, running on a single 8xA100 40GB node in about 4 days of training. The code is divided into separate Python modules.
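As a sketch of what launching such a run might look like, assuming the training script follows nanoGPT's torchrun conventions (this repo's exact flags and config handling are not confirmed here):

```bash
# Hypothetical multi-GPU launch, mirroring nanoGPT's conventions.
# Assumes a standard single-node DDP setup across 8 GPUs.
torchrun --standalone --nproc_per_node=8 train.py
```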
Because the code is so simple, it is very easy to hack to your needs, train new models from scratch, or finetune pretrained checkpoints (e.g. the biggest starting point currently available would be OpenAI's GPT-2 1.5B model).
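As one concrete way to obtain such a checkpoint, the OpenAI GPT-2 weights can be pulled through the HuggingFace transformers library (a generic sketch using the standard transformers API, not a loader specific to this repo):

```python
# Sketch: download OpenAI's released GPT-2 XL (1.5B) weights via the
# HuggingFace transformers API as a finetuning starting point.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")  # largest GPT-2, ~1.5B params
n_params = sum(p.numel() for p in model.parameters())
print(f"loaded gpt2-xl with {n_params / 1e9:.2f}B parameters")
```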
Install the requirements using the following command:

```bash
pip install -r requirements.txt
```
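The exact contents of requirements.txt are defined by the repo itself, but for a nanoGPT-style codebase the dependency list typically looks something like this (an assumption, not the file's verified contents):

```
# Assumed typical dependencies for a nanoGPT-style project; check
# requirements.txt for the authoritative list.
torch
numpy
transformers
datasets
tiktoken
wandb
tqdm
```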
This implementation is logically identical to nanoGPT.
The purpose of this project is to apply and expand my knowledge of LLMs.