Skip to content

TimilsinaBimal/transformer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Transformer Model Documentation

Introduction

This project implements a Transformer-based model using PyTorch for sequence-based tasks. It leverages Byte Pair Encoding (BPE) for tokenization and is designed to handle text generation tasks, such as predicting the next sequence of tokens given an initial prompt. The repository includes scripts for training the model, running inference, and handling tokenization using BPE.

Key Components

  1. BytepairEncoding: This module provides an implementation of Byte Pair Encoding for tokenization. It is used in the preprocessing and tokenization of text data.

  2. Transformer: The core model that consists of an encoder and decoder, built using multi-head attention layers and feed-forward networks.

  3. Inference: Handles the loading of the trained model and executes the generation of text sequences given a prompt.

  4. Training Script: Implements the workflow for training the Transformer model, leveraging the prepared dataset and handling model parameter updates through backpropagation.

  5. Tokenizer: Located in transformer/tokenizer, this part handles command-line based interactions for training models, encoding text, and decoding tokens back into text with visualization options.

Running the Code

Prerequisites

  • Python 3.x
  • PyTorch library
  • tqdm for progress bars
  • colorama for terminal-based color visuals
  • numpy for numerical operations
  • Install any other missing packages as required

Steps to Run

  1. Setup the Environment:

    • Ensure all dependencies are installed. Use pip to install required packages.
    • Prepare your dataset text file, presumably named like tiny_shakespeare.txt.
  2. Train the Model:

    • Edit transformer/config.py to set your configurations such as paths, sequence length, batch size, and other hyperparameters.
    • Run the training script with:
      python main.py
      Then input y to train the model.
    • This will tokenize the dataset, prepare data loaders, and start training the transformer model, updating model weights iteratively.
  3. Perform Inference:

    • The inference.py script allows text sequence generation. Ensure the path to the trained model is correctly configured and run the script using:
      python main.py
      Then input n for inference.
    • Provide a starting prompt, and specify the maximum length of the generation.
  4. Tokenization:

    • You can interact with the BPE tokenizer through transformer/tokenizer/main.py. Example commands include training the BPE, encoding, and decoding operations.
    • Use the command line to run:
      python -m transformer.tokenizer.main --train --file <your-file-path>
  5. Testing:

    • The scripts may include testing components such as transformer/test_model.py to ensure the implementation functions as expected. These can often be run directly to verify particular aspects of the model.

Implementation Notes

  • The code structure includes separation into inference, training, and tokenization components.
  • Training involves standard machine learning workflows with dataset preparation, model definition, configuration management, and iterative training cycles.
  • The model architecture follows a typical Transformer setup, leveraging multi-head attention and layer normalization.
  • The given model paths and configuration should be adjusted as per your local setup and data deployment paths.

This documentation provides a high-level overview and practical guide for using the scripts included in this project for building and utilizing a Transformer-based text processing model.

About

Implementation of Transformer Architecture

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages