Implementations of assorted, fairly random things, written for my own practice.
- Transformer model with multi-head attention: (a) get it working, (b) implement KV caching, (c) implement a mini GPT model (maybe GPT-2 or similar)
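
As a note for step (b), here is a minimal sketch of what KV caching amounts to, in pure Python and for a single attention head only. Everything in it (`softmax`, `attend`, `KVCache`, and the assumption that the query/key/value vectors are already projected) is made up for illustration and is not the code in this repo. The idea: during decoding, keep the keys and values of earlier tokens around so each new token only has to attend over the cache instead of recomputing them.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over a list of keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


class KVCache:
    # Hypothetical single-head cache: stores the key and value of every
    # token seen so far (projections are assumed to happen elsewhere).
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append the new token's key/value, then attend over the whole cache.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Calling `step` once per generated token gives the same output as recomputing causal attention over the full prefix, but the keys and values of old tokens are never rebuilt. A multi-head version would keep one such cache per head (or one batched tensor).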
TODOs:
- Segment Tree with Lazy Evaluation
- Binary Indexed Tree (BIT, a.k.a. Fenwick tree)
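
For reference while the BIT item is still a TODO, a rough sketch of the usual point-update / prefix-sum version. The class and method names (`FenwickTree`, `add`, `prefix_sum`, `range_sum`) and the 0-based, half-open indexing are my own assumptions here, not a finished implementation.

```python
class FenwickTree:
    # Binary Indexed Tree over n integers, all initially 0.
    # Public indices are 0-based; the internal array is 1-based.
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        # Add delta to element i in O(log n).
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # jump to the next node that covers index i

    def prefix_sum(self, i):
        # Sum of elements in [0, i) in O(log n).
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def range_sum(self, lo, hi):
        # Sum of elements in [lo, hi).
        return self.prefix_sum(hi) - self.prefix_sum(lo)
```

Usage would look like building the tree with `add(i, value)` for each element, then mixing `add` and `range_sum` calls freely. The segment tree with lazy evaluation covers the case this one does not: range updates combined with range queries.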