DD2412 project at KTH by Leo Hiselius, Jonas Thunberg and Alfons Heintz, {leohi, jonthu, alfonsh}"at"kth.se
This project combines two components:
- The minGPT model, a lightweight implementation of iGPT by Andrej Karpathy, published under the MIT license
- The BigBird attention masking developed by Zaheer et al.
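BigBird attention combines three sparsity patterns: a sliding window, a few global tokens, and random connections. The helper below is a minimal NumPy sketch of building such a boolean mask; the function name and parameters are illustrative, not taken from this repo or the BigBird code.

```python
import numpy as np

def bigbird_mask(n, window=3, n_global=2, n_random=2, seed=0):
    """Boolean attention mask combining BigBird's three patterns:
    sliding window, global tokens, and random connections.
    (Hypothetical helper for illustration only.)"""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    # Sliding window: each token attends to its nearby neighbours.
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True
    # Global tokens: the first n_global tokens attend everywhere
    # and are attended to by every token.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    # Random connections: each token attends to a few random positions.
    for i in range(n):
        mask[i, rng.choice(n, size=n_random, replace=False)] = True
    return mask

mask = bigbird_mask(16)
```

In the real BigBird implementation this sparsity is realized with blocked/rolled tensor operations rather than an explicit dense mask, which is what makes the attention sub-quadratic.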
- Run train.py to train GPT models.
- Run finetune.py to finetune the model heads on a classification task.
- Run linearprobe.py to train one linear probe model per layer of each model.
- Run accuracy.py to check the accuracy of all of the different models.
- Run generate.py to generate example images from the full vanilla and BigBird models.
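Inside the training and generation scripts, the only change BigBird masking requires of a GPT attention layer is which boolean mask is applied before the softmax. The sketch below shows that step in NumPy; it is a simplified illustration, not the repo's actual attention code.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with a boolean mask:
    disallowed positions are set to -inf before the softmax,
    so they receive zero attention weight.
    (Minimal single-head sketch for illustration only.)"""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the allowed positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Swapping the standard causal (lower-triangular) mask for a BigBird-style sparse mask leaves the rest of the attention computation unchanged.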
- A possibly useful example of autograd on sparse matrices (see comments)
- Video on BigBird, with a timestamp for the block/roll implementation of sparse attention