FridaLM

FridaLM is a custom implementation of a decoder-only Transformer Language Model built from scratch in PyTorch. It is designed to be lightweight and is trained on the TinyStories dataset to generate coherent, simple narrative English.

This project demonstrates the end-to-end process of building an LLM: creating a BPE tokenizer, implementing the Transformer architecture (Attention, FFN, Positional Encoding), writing a mixed-precision training loop, and implementing an inference script for text generation.

📊 Results

The model generates coherent short stories after only a few epochs of training.

Story Generation Examples

Prompt: "jack"	Prompt: "tim liked..."

🏗️ Architecture

FridaLM follows the standard GPT-style decoder-only architecture. It utilizes causal masking to predict the next token in a sequence.
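
As an illustration of causal masking, here is a minimal sketch of masked scaled dot-product attention. It is not taken from model.py; the function name `causal_attention` and the tensor shapes are assumptions chosen to match the specs listed in this README (8 heads of size 64 for a d_model of 512).

```python
import torch
import torch.nn.functional as F


def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    Returns the attended values and the attention weights.
    """
    seq_len, head_dim = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5

    # True above the diagonal marks "future" positions that must be hidden.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    scores = scores.masked_fill(future, float("-inf"))

    # softmax turns the -inf entries into exactly zero weight.
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
q = k = v = torch.randn(1, 8, 5, 64)  # assumed: 8 heads x 64 dims = d_model 512
out, weights = causal_attention(q, k, v)
```

Because every position can only attend to itself and earlier positions, the output at position t depends only on tokens 0..t, which is what lets the model be trained to predict token t+1 at every position in parallel.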

Key Model Specs

Type: Decoder-only Transformer
Embedding Dimension (d_model): 512
Context Window (seq_len): 256 tokens
Layers: 6 decoder blocks
Attention Heads: 8
Vocabulary: ~32,000 (BPE)
Optimizations:
- Weight Tying: The output linear layer shares weights with the input embedding layer.
- Mixed Precision: Training utilizes torch.amp.autocast for memory efficiency.
- Positional Encoding: Standard Sinusoidal injection.
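
The weight tying and sinusoidal positional encoding listed above can be sketched as follows. This is an illustrative sketch, not the code in model.py: the class name `TiedEmbeddingHead` is hypothetical, the vocabulary size is rounded to 32000, and the decoder blocks are omitted.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) table of fixed sinusoidal position encodings."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


class TiedEmbeddingHead(nn.Module):
    """Hypothetical wrapper: token embedding, positional encoding, tied output head."""

    def __init__(self, vocab_size=32000, d_model=512, seq_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: both layers point at the same (vocab_size, d_model) tensor.
        self.lm_head.weight = self.embed.weight
        # A buffer moves with the model across devices but is not trained.
        self.register_buffer("pe", sinusoidal_encoding(seq_len, d_model))

    def forward(self, token_ids):
        x = self.embed(token_ids) + self.pe[: token_ids.size(1)]
        # The decoder blocks would transform x here.
        return self.lm_head(x)


model = TiedEmbeddingHead()
logits = model(torch.randint(0, 32000, (2, 16)))
n_params = sum(p.numel() for p in model.parameters())
```

With these assumed sizes the shared matrix holds 32000 × 512 = 16,384,000 parameters, and tying means it is counted (and stored) once rather than twice.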

🛠️ Setup & Installation

Clone the repository:

git clone https://github.com/yourusername/FridaLM.git
cd FridaLM

Install dependencies: It is recommended to use a virtual environment.
```
pip install -r requirements.txt
```
Note: Ensure you have a version of PyTorch installed that supports your hardware (CUDA/CPU).

🚀 Usage

1. Training

To train the model from scratch, run train.py. The script automatically downloads the roneneldan/TinyStories dataset, trains a custom BPE tokenizer, and begins the training loop.

Note: This model was trained on a RunPod A40 instance for 4 hours.

python train.py

Configuration: You can adjust hyperparameters (Batch size, Learning Rate, Epochs) at the top of train.py.
Checkpoints: The model saves checkpoints to /workspace/output (or your configured SAVE_DIR) at the end of every epoch.
Resuming: The script resumes from a saved checkpoint when one is set in RESUME_FROM.
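
A per-epoch checkpoint and resume cycle along the lines described above might look like the sketch below. The helper names and the dictionary keys (`epoch`, `model_state`, `optimizer_state`) are assumptions for illustration; the actual layout written by train.py may differ. A temporary directory stands in for SAVE_DIR and a small linear layer stands in for the model.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    """Write model and optimizer state plus the finished epoch number."""
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore state in place and return the epoch to continue from."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1


SAVE_DIR = tempfile.mkdtemp()  # stand-in for /workspace/output
model = nn.Linear(4, 2)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

path = os.path.join(SAVE_DIR, "checkpoint_epoch_3.pt")
save_checkpoint(path, model, optimizer, epoch=3)

# Later run: point RESUME_FROM at the file and rebuild model and optimizer first.
RESUME_FROM = path
fresh_model = nn.Linear(4, 2)
fresh_optimizer = torch.optim.AdamW(fresh_model.parameters(), lr=3e-4)
start_epoch = load_checkpoint(RESUME_FROM, fresh_model, fresh_optimizer)
```

Saving the optimizer state alongside the weights matters for AdamW, since its moment estimates would otherwise restart from zero after a resume.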

2. Inference (Generation)

Once you have a trained model (e.g., checkpoint_epoch_10.pt and tokenizer.json), you can interact with it.

python inference.py

The script loads the model onto the available device (CUDA/CPU).
Type a prompt to start the story.
Parameters like temperature and top_k can be adjusted inside the main() function of inference.py to control creativity.
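
To show how temperature and top_k shape the output, here is a minimal sketch of sampling one token from a vector of logits. The function `sample_next_token` is hypothetical and is not claimed to match the code in inference.py.

```python
import torch
import torch.nn.functional as F


def sample_next_token(logits, temperature=1.0, top_k=None):
    """Sample one token id from a 1-D tensor of logits of shape (vocab_size,)."""
    # Lower temperature sharpens the distribution; higher flattens it.
    logits = logits / max(temperature, 1e-8)

    if top_k is not None:
        k = min(top_k, logits.size(-1))
        # Smallest logit that is still among the k largest.
        kth_value = torch.topk(logits, k).values[-1]
        # Everything below that threshold gets zero probability.
        logits = logits.masked_fill(logits < kth_value, float("-inf"))

    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()


torch.manual_seed(0)
logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])

# top_k=1 keeps only the most likely token, so sampling becomes greedy.
greedy_like = sample_next_token(logits, temperature=0.7, top_k=1)

# top_k=2 restricts sampling to the two most likely tokens (ids 0 and 1 here).
samples = {sample_next_token(logits, temperature=1.0, top_k=2) for _ in range(50)}
```

In a generation loop, the logits would come from the model's output at the last position, and the sampled id would be appended to the sequence before the next forward pass.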

📂 Project Structure

model.py: Contains the PyTorch classes for the Transformer (Self-Attention, FFN, Decoder, FridaLM).
train.py: Handles data loading, tokenization, training loop, scheduler, and checkpoint saving.
inference.py: Loads the saved checkpoint and runs the generation loop.
requirements.txt: Python dependencies.

🧠 Dataset

This model is trained on TinyStories (roneneldan/TinyStories on Hugging Face), a synthetic dataset of short stories written with a limited vocabulary. That makes it well suited to training small language models on consumer hardware while still achieving grammatical coherence.
