This repository contains an implementation of a Transformer-based GPT model trained on the OpenWebText dataset. The model follows the architecture principles of the original GPT paper and is built using PyTorch. The goal of this project is to explore language modeling and text generation using modern deep learning techniques.
- Implements a Transformer-based architecture
- Pretrained on OpenWebText dataset
- Supports text generation and fine-tuning
- Built with PyTorch for flexibility and scalability
To use this repository, first clone it and install the required dependencies:
git clone https://github.com/Executer13/gpt.git
cd gpt
pip install -r requirements.txtTo train the GPT model on your dataset, run:
python train.py --config configs/default.yamlModify configs/default.yaml to change model hyperparameters and training settings.
To generate text using a trained model, run:
python generate.py --model-checkpoint path/to/checkpoint.pth --prompt "Your text prompt here"Fine-tune the model on a custom dataset using:
python fine_tune.py --dataset path/to/datasetThe model follows the standard Transformer-based GPT architecture, including:
- Multi-head self-attention mechanism
- Layer normalization
- Position-wise feedforward network
- Causal masking for autoregressive generation
The model is trained on OpenWebText, an open-source dataset designed to replicate OpenAI’s WebText dataset. Users can replace this dataset with their own corpus for custom training.
Contributions are welcome! Feel free to open issues or submit pull requests.
This project is licensed under the MIT License. See LICENSE for details.
For questions or discussions, open an issue or reach out via GitHub Discussions.