Deep Learning Paper Implementations

This repository serves as a personal learning journey through important papers in deep learning, starting with foundational architectures and gradually expanding to more complex models. Each implementation is meant to be a clean, educational reference point with a focus on understanding the core concepts.

Current Implementations

| Paper | Implementation | Key Concepts |
| --- | --- | --- |
| Attention Is All You Need | transformer-implementation/ | Multi-Head Attention, Positional Encoding, Layer Normalization, Label Smoothing, Warmup Learning Rate |
| Neural Machine Translation by Jointly Learning to Align and Translate | BPE/ | Byte Pair Encoding, Subword Tokenization, Vocabulary Building, Special Token Handling |
| Language Models are Unsupervised Multitask Learners | gpt-2/ | Transformer Decoder, Autoregressive Language Modeling, Transfer Learning, Advanced Text Generation |

Transformer Implementation Details

The current implementation includes a complete transformer architecture with:

  • Multi-headed self-attention mechanism
  • Position-wise feed-forward networks
  • Positional encodings
  • Layer normalization
  • Encoder and decoder stacks
  • Label smoothing
  • Learning rate scheduling with warmup (a short sketch follows this list)
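
As a point of reference for the last item, here is a minimal sketch of the warmup schedule described in "Attention Is All You Need". The function name and the d_model / warmup_steps defaults are illustrative, not taken from this repository's code:

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Warmup schedule from 'Attention Is All You Need':
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)

    The rate increases linearly for the first `warmup_steps` steps,
    then decays with the inverse square root of the step number.
    """
    step = max(step, 1)  # guard against division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The peak learning rate is reached at step == warmup_steps.
print(transformer_lr(4000))  # ~7.0e-4 for d_model=512
```

In the paper this schedule is paired with the Adam optimizer.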

BPE Tokenizer Details

The BPE (Byte Pair Encoding) tokenizer implementation is inspired by Sebastian Raschka's work and includes:

  • Complete training algorithm to learn subword tokens from a corpus (see the sketch after this list)
  • Efficient encoding and decoding methods with merge prioritization
  • Full support for special tokens and Unicode characters
  • Space preprocessing using 'Ġ' character (following GPT tokenizer convention)
  • OpenAI-compatible format loader for GPT-2 vocabularies
  • Performance optimizations with caching mechanisms
  • Regex-based tokenization for faster processing
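
At its core, BPE training is a greedy loop: count adjacent symbol pairs across the corpus and merge the most frequent pair until the desired number of merges is reached. The sketch below is a generic, character-level illustration of that loop; the function name and toy corpus are assumptions of mine, and it omits the 'Ġ' space marker, special tokens, and the caching/regex optimizations listed above:

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules from a whitespace-tokenized corpus.

    Every word starts as a tuple of characters; each iteration merges the
    most frequent adjacent pair of symbols across the whole corpus.
    """
    words = Counter(tuple(word) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the corpus.
        new_words = Counter()
        for word, freq in words.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_words[tuple(merged)] += freq
        words = new_words
    return merges


# Toy example: "low", "lower", and "lowest" quickly share the subword "low".
print(train_bpe(["low lower lowest low low"], num_merges=3))
```

Encoding then replays the learned merges in order on new text, which is where the merge prioritization and caching mentioned above come into play.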

GPT-2 Implementation Details

The GPT-2 implementation is inspired by Andrej Karpathy's work, with further optimizations still planned. It features:

  • Transformer decoder architecture
  • Autoregressive language modeling
  • Pre-training and fine-tuning capabilities
  • Text generation with various sampling strategies (temperature, top-k, top-p), sketched after this list
  • Efficient attention patterns for improved training
  • Educational implementation focusing on clarity and understanding
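
To make the sampling item concrete, the sketch below shows how temperature, top-k, and top-p (nucleus) filtering can compose on a single logits vector. It is a minimal PyTorch illustration; the function name and defaults are assumptions, not this repository's API:

```python
import torch


def sample_next_token(logits: torch.Tensor,
                      temperature: float = 1.0,
                      top_k: int = 0,
                      top_p: float = 1.0) -> int:
    """Sample one token id from a 1-D logits tensor of shape (vocab_size,).

    temperature < 1.0 sharpens the distribution, > 1.0 flattens it;
    top_k keeps only the k most likely tokens (0 disables the filter);
    top_p keeps the smallest set of tokens whose cumulative probability
    exceeds p (nucleus sampling; 1.0 disables the filter).
    """
    logits = logits / max(temperature, 1e-8)

    if top_k > 0:
        # Drop everything below the k-th largest logit.
        kth = torch.topk(logits, top_k).values[-1]
        logits[logits < kth] = float("-inf")

    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cumprobs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
        # Mask tokens once the cumulative probability passes top_p,
        # always keeping at least the most likely token.
        cutoff = cumprobs > top_p
        cutoff[1:] = cutoff[:-1].clone()
        cutoff[0] = False
        logits[sorted_idx[cutoff]] = float("-inf")

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()


# Example with random logits over a 50257-token vocabulary (GPT-2's vocab size).
next_id = sample_next_token(torch.randn(50257), temperature=0.8, top_k=50, top_p=0.95)
```

Greedy decoding falls out as the limiting case of very low temperature or top_k=1.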

Note

These implementations are meant for educational purposes and self-reference. While they aim to be correct, they may not be optimized for production use. They serve as a starting point for understanding the underlying concepts and architectures described in the papers.
