Redtius/fridaLM

FridaLM

FridaLM is a custom implementation of a decoder-only Transformer Language Model built from scratch in PyTorch. It is designed to be lightweight and is trained on the TinyStories dataset to generate coherent, simple narrative English.

This project demonstrates the end-to-end process of building an LLM: creating a BPE tokenizer, implementing the Transformer architecture (Attention, FFN, Positional Encoding), writing a training loop with mixed-precision, and implementing an inference generation script.
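To make the tokenizer step concrete, here is a minimal pure-Python sketch of how BPE learns merges on a toy corpus. It is an illustration under assumptions, not the code in train.py: the function names (most_frequent_pair, merge_pair, train_bpe) and the toy corpus are hypothetical, and a real tokenizer would also handle bytes, special tokens, and a target vocabulary of about 32,000.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


# Hypothetical toy corpus; the real tokenizer is trained on TinyStories.
toy_corpus = "the cat sat the cat ran the hat"
merges, vocab_words = train_bpe(toy_corpus, num_merges=3)
print(merges)
print(vocab_words)
```

Each merge adds one new symbol to the vocabulary, so the number of merges determines the final vocabulary size.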

📊 Results

The model generates coherent short stories after only a few epochs of training.

Story Generation Examples

Example outputs (shown as images in the repository): Result 1 for the prompt "jack" and Result 2 for the prompt "tim liked...".

🏗️ Architecture

FridaLM follows the standard GPT-style decoder-only architecture. It uses causal masking, so each position attends only to itself and earlier tokens when predicting the next token in a sequence.
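As an illustration of causal masking, the sketch below applies an upper-triangular mask to scaled dot-product attention scores before the softmax. The function name causal_self_attention is hypothetical and the attention code in model.py may be organized differently; the tensor shape assumes 8 heads of size 64, which matches d_model = 512.

```python
import math

import torch


def causal_self_attention(q, k, v):
    """Scaled dot-product attention where position i only sees positions <= i.

    q, k, v have shape (batch, heads, seq_len, head_dim).
    """
    seq_len = q.size(-2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # True above the diagonal marks future positions that must stay hidden.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    scores = scores.masked_fill(future, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
q = k = v = torch.randn(1, 8, 5, 64)  # assumed: 8 heads of size 64
out, weights = causal_self_attention(q, k, v)
print(weights[0, 0])  # lower-triangular: zeros above the diagonal
```

Because masked scores are set to negative infinity, the softmax gives future positions a weight of exactly zero.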

High Level Architecture

Key Model Specs

  • Type: Decoder-only Transformer
  • Embedding Dimension (d_model): 512
  • Context Window (seq_len): 256 tokens
  • Layers: 6 decoder blocks
  • Attention Heads: 8
  • Vocabulary: ~32,000 (BPE)
  • Optimizations:
    • Weight Tying: The output linear layer shares weights with the input embedding layer.
    • Mixed Precision: Training utilizes torch.amp.autocast for memory efficiency.
    • Positional Encoding: Standard sinusoidal encodings added to the token embeddings.
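Two of the items above, weight tying and sinusoidal positional encoding, can be sketched as follows. This is a hedged sketch, not the code in model.py: the class name TiedEmbeddingHead and the helper sinusoidal_positions are hypothetical, the decoder blocks are omitted, and the sizes are taken from the spec list (vocabulary rounded to 32,000).

```python
import math

import torch
import torch.nn as nn


def sinusoidal_positions(seq_len, d_model):
    """Fixed sin/cos position table of shape (seq_len, d_model)."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(seq_len, d_model)
    table[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    table[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return table


class TiedEmbeddingHead(nn.Module):
    """Hypothetical input/output ends of the model; decoder blocks omitted."""

    def __init__(self, vocab_size=32000, d_model=512, seq_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: both layers share one (vocab_size, d_model) matrix.
        self.lm_head.weight = self.embed.weight
        self.register_buffer(
            "positions", sinusoidal_positions(seq_len, d_model), persistent=False
        )

    def encode(self, token_ids):
        # Add the position table to the token embeddings.
        return self.embed(token_ids) + self.positions[: token_ids.size(1)]

    def logits(self, hidden):
        return self.lm_head(hidden)


model = TiedEmbeddingHead()
token_ids = torch.randint(0, 32000, (2, 16))
hidden = model.encode(token_ids)  # the decoder blocks would run here
logits = model.logits(hidden)
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)
```

With these sizes the shared matrix holds 32,000 × 512 = 16,384,000 parameters, so tying avoids storing a second matrix of the same size for the output layer.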

🛠️ Setup & Installation

  1. Clone the repository:

    git clone https://github.com/Redtius/fridaLM.git
    cd fridaLM
  2. Install dependencies: It is recommended to use a virtual environment.

    pip install -r requirements.txt

    Note: Ensure you have a version of PyTorch installed that supports your hardware (CUDA/CPU).

🚀 Usage

1. Training

Run train.py to train the model from scratch. The script automatically downloads the roneneldan/TinyStories dataset, trains a custom BPE tokenizer, and starts the training loop.

Note: This model was trained for 4 hours on a RunPod A40s instance.

python train.py
  • Configuration: You can adjust hyperparameters (Batch size, Learning Rate, Epochs) at the top of train.py.
  • Checkpoints: The model saves checkpoints to /workspace/output (or your configured SAVE_DIR) at the end of every epoch.
  • Resuming: The script can resume from a saved checkpoint when its path is set in RESUME_FROM.
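The sketch below shows one way a mixed-precision training step and an epoch checkpoint could fit together. It rests on several assumptions: the tiny stand-in model, the helper names (train_step, save_checkpoint, load_checkpoint), and the checkpoint dictionary keys are hypothetical. It uses bfloat16 so that no gradient scaler is needed, whereas train.py may use float16 with a scaler.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

# Tiny stand-in model; the real FridaLM class lives in model.py.
vocab_size, d_model = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)


def train_step(inputs, targets):
    """One optimizer step with the forward pass under autocast."""
    optimizer.zero_grad(set_to_none=True)
    with torch.amp.autocast(device_type=device, dtype=torch.bfloat16):
        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()  # backward runs outside the autocast context
    optimizer.step()
    return loss.item()


def save_checkpoint(path, epoch):
    """Store what is needed to continue training (assumed layout)."""
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path):
    """Restore model and optimizer state; return the epoch to resume from."""
    state = torch.load(path, map_location=device)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1


inputs = torch.randint(0, vocab_size, (4, 16), device=device)
targets = torch.randint(0, vocab_size, (4, 16), device=device)
loss = train_step(inputs, targets)

with tempfile.TemporaryDirectory() as save_dir:
    path = os.path.join(save_dir, "checkpoint_epoch_0.pt")
    save_checkpoint(path, epoch=0)
    start_epoch = load_checkpoint(path)
print(loss, start_epoch)
```

Saving the optimizer state alongside the model weights lets a resumed run continue with the same AdamW moment estimates instead of restarting them from zero.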

2. Inference (Generation)

Once you have a trained model (e.g., checkpoint_epoch_10.pt and tokenizer.json), you can interact with it.

python inference.py
  • The script loads the model onto the available device (CUDA/CPU).
  • Type a prompt to start the story.
  • Parameters like temperature and top_k can be adjusted inside the main() function of inference.py to control creativity.
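For intuition about what temperature and top_k do, here is a minimal sketch of sampling one token from next-token logits. The function name sample_next_token and the example logits are hypothetical; the generation loop in inference.py may differ in detail.

```python
import torch


def sample_next_token(logits, temperature=1.0, top_k=50, generator=None):
    """Sample one token id from a 1-D tensor of next-token logits."""
    # Lower temperature sharpens the distribution; higher flattens it.
    logits = logits / max(temperature, 1e-8)
    if top_k is not None and top_k < logits.numel():
        # Keep only the top_k highest-scoring tokens.
        kth_value = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=generator).item()


gen = torch.Generator().manual_seed(0)
example_logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])
samples = [
    sample_next_token(example_logits, temperature=0.8, top_k=2, generator=gen)
    for _ in range(20)
]
print(samples)  # only token ids 0 and 1 can appear with top_k=2
```

Setting top_k=1 reduces this to greedy decoding, while a larger top_k with a temperature near 1.0 produces more varied stories.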

📂 Project Structure

  • model.py: Contains the PyTorch classes for the Transformer (Self-Attention, FFN, Decoder, FridaLM).
  • train.py: Handles data loading, tokenization, training loop, scheduler, and checkpoint saving.
  • inference.py: Loads the saved checkpoint and runs the generation loop.
  • requirements.txt: Python dependencies.

🧠 Dataset

This model is trained on TinyStories (roneneldan/TinyStories on Hugging Face), a synthetic dataset of short stories written with a limited vocabulary. This makes it well suited to training small language models on consumer hardware while still achieving grammatical coherence.
