FridaLM is a custom implementation of a decoder-only Transformer Language Model built from scratch in PyTorch. It is designed to be lightweight and is trained on the TinyStories dataset to generate coherent, simple narrative English.
This project demonstrates the end-to-end process of building an LLM: creating a BPE tokenizer, implementing the Transformer architecture (Attention, FFN, Positional Encoding), writing a mixed-precision training loop, and implementing an inference script for text generation.
The model is capable of generating consistent short stories after just a few epochs of training.
Sample generations (images): prompt "jack" and prompt "tim liked...".
FridaLM follows the standard GPT-style decoder-only architecture. It utilizes causal masking to predict the next token in a sequence.
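
To make the causal masking step concrete, here is a minimal sketch of masked scaled dot-product attention. It is not taken from `model.py`: the function name `causal_self_attention` and the tensor layout `(batch, heads, seq_len, head_dim)` are assumptions chosen for illustration.

```python
import math
import torch

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask (illustrative helper).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    Returns the attended values and the attention weights.
    """
    seq_len = q.size(-2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # True above the diagonal marks future positions, which must stay hidden.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    scores = scores.masked_fill(future, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

torch.manual_seed(0)
x = torch.randn(1, 8, 16, 64)  # 8 heads of size 64, i.e. 8 * 64 = 512
out, weights = causal_self_attention(x, x, x)
print(out.shape)  # same shape as the input
```

Because every weight above the diagonal is exactly zero, the output at position `t` depends only on tokens `0..t`, which is what lets the model be trained to predict the next token at every position in parallel.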
- Type: Decoder-only Transformer
- Embedding Dimension (`d_model`): 512
- Context Window (`seq_len`): 256 tokens
- Layers: 6 Decoders
- Attention Heads: 8
- Vocabulary: ~32,000 (BPE)
- Optimizations:
  - Weight Tying: The output linear layer shares weights with the input embedding layer.
  - Mixed Precision: Training utilizes `torch.amp.autocast` for memory efficiency.
  - Positional Encoding: Standard Sinusoidal injection.
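
Weight tying and sinusoidal positional encoding can be sketched as follows. This is an illustration under assumptions, not the code in `model.py`: the class name `TiedEmbeddingHead` and the helper `sinusoidal_positions` are hypothetical, the decoder layers are omitted, and whether FridaLM scales embeddings before adding positions is not stated here, so no scaling is applied.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sin/cos position table of shape (seq_len, d_model)."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

class TiedEmbeddingHead(nn.Module):
    """Input embedding and output projection sharing one weight matrix."""

    def __init__(self, vocab_size=32000, d_model=512, seq_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Both weights have shape (vocab_size, d_model), so they can be shared.
        self.lm_head.weight = self.embed.weight
        self.register_buffer("pe", sinusoidal_positions(seq_len, d_model))

    def forward(self, token_ids):
        x = self.embed(token_ids) + self.pe[: token_ids.size(1)]
        # The decoder layers would transform x here; omitted in this sketch.
        return self.lm_head(x)

model = TiedEmbeddingHead()
logits = model(torch.randint(0, 32000, (2, 10)))
print(logits.shape)  # (batch, tokens, vocab_size)
```

With a vocabulary of about 32,000 and `d_model` of 512, the shared matrix holds roughly 16.4 million parameters, so tying avoids storing a second matrix of the same size for the output layer.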
- Clone the repository:

      git clone https://github.com/yourusername/FridaLM.git
      cd FridaLM

- Install dependencies. It is recommended to use a virtual environment.

      pip install -r requirements.txt

Note: Ensure you have a version of PyTorch installed that supports your hardware (CUDA/CPU).
To train the model from scratch, run `train.py`. The script automatically downloads the `roneneldan/TinyStories` dataset, trains a custom BPE tokenizer, and begins the training loop.

Note: This model was trained on a Runpod A40 instance for 4 hours.

    python train.py

- Configuration: You can adjust hyperparameters (batch size, learning rate, epochs) at the top of `train.py`.
- Checkpoints: The model saves checkpoints to `/workspace/output` (or your configured `SAVE_DIR`) at the end of every epoch.
- Resuming: The script can resume from the last saved checkpoint if one is defined in `RESUME_FROM`.
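
The mixed-precision step and the per-epoch checkpoint/resume cycle might look roughly like the sketch below. It is not the actual `train.py`: the helper names, the checkpoint dictionary keys, and the tiny stand-in model are all hypothetical. It also assumes `bfloat16` autocast, which needs no gradient scaling; a `float16` setup would typically add a gradient scaler.

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model, optimizer, inputs, targets, device_type):
    """One optimization step with the forward pass under autocast."""
    optimizer.zero_grad(set_to_none=True)
    with torch.amp.autocast(device_type=device_type, dtype=torch.bfloat16):
        logits = model(inputs)
    # Compute the loss in float32 for numerical stability.
    loss = F.cross_entropy(
        logits.float().reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    loss.backward()
    optimizer.step()
    return loss.item()

def save_checkpoint(path, model, optimizer, epoch):
    # Hypothetical checkpoint layout; the real keys may differ.
    torch.save(
        {"epoch": epoch, "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        path,
    )

def load_checkpoint(path, model, optimizer):
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1  # epoch to resume from

device_type = "cuda" if torch.cuda.is_available() else "cpu"
# Tiny stand-in for the real model: embedding followed by a linear head.
model = nn.Sequential(nn.Embedding(50, 16), nn.Linear(16, 50)).to(device_type)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
inputs = torch.randint(0, 50, (4, 8), device=device_type)
targets = torch.randint(0, 50, (4, 8), device=device_type)
loss = train_step(model, optimizer, inputs, targets, device_type)

with tempfile.TemporaryDirectory() as save_dir:  # stands in for SAVE_DIR
    path = os.path.join(save_dir, "checkpoint_epoch_1.pt")
    save_checkpoint(path, model, optimizer, epoch=1)
    start_epoch = load_checkpoint(path, model, optimizer)
print(start_epoch)  # training would continue from this epoch
```

Saving the optimizer state alongside the model weights matters for resuming: without it, AdamW's moment estimates restart from zero and the first steps after a resume behave differently from an uninterrupted run.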
Once you have a trained model (e.g., `checkpoint_epoch_10.pt` and `tokenizer.json`), you can interact with it.

    python inference.py

- The script loads the model onto the available device (CUDA/CPU).
- Type a prompt to start the story.
- Parameters like `temperature` and `top_k` can be adjusted inside the `main()` function of `inference.py` to control creativity.
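
As a rough illustration of what `temperature` and `top_k` do during generation, the sketch below samples one token from a logits vector. The function `sample_next_token` is a hypothetical helper, not the code in `inference.py`, and the default values are placeholders.

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    """Sample one token id from a 1-D logits tensor (illustrative helper)."""
    # Lower temperature sharpens the distribution; higher flattens it.
    logits = logits / max(temperature, 1e-8)
    if top_k is not None and top_k > 0:
        k = min(top_k, logits.size(-1))
        # torch.topk returns values sorted in descending order.
        kth_value = torch.topk(logits, k).values[-1]
        # Drop everything below the k-th largest logit (ties are kept).
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

torch.manual_seed(0)
logits = torch.tensor([0.0, 5.0, 1.0, 4.0])
token = sample_next_token(logits, temperature=0.8, top_k=2)
print(token)  # either 1 or 3, the two highest-scoring ids
```

Setting `top_k=1` reduces this to greedy decoding, while a larger `top_k` with a higher `temperature` gives more varied but less predictable stories.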
- `model.py`: Contains the PyTorch classes for the Transformer (Self-Attention, FFN, Decoder, FridaLM).
- `train.py`: Handles data loading, tokenization, training loop, scheduler, and checkpoint saving.
- `inference.py`: Loads the saved checkpoint and runs the generation loop.
- `requirements.txt`: Python dependencies.
This model is trained on TinyStories (`roneneldan/TinyStories` on Hugging Face), a synthetic dataset of short stories written with a limited vocabulary. That makes it well suited to training small language models on consumer hardware while still achieving grammatical coherence.


