DD2412 project at KTH by Leo Hiselius, Jonas Thunberg and Alfons Heintz, {leohi, jonthu, alfonsh}"at"kth.se
This project combines two components:
- The minGPT model, a lightweight implementation of iGPT by Andrej Karpathy, published under the MIT license
- The BigBird attention masking developed by Zaheer et al.
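BigBird attention combines three sparsity patterns: a sliding window, a few global tokens, and random connections. The helper below is a minimal NumPy sketch of building such a boolean mask; the function name and parameters are illustrative, not taken from this repo or the BigBird code.

```python
import numpy as np

def bigbird_mask(n, window=3, n_global=2, n_random=2, seed=0):
    """Boolean attention mask combining BigBird's three patterns:
    sliding window, global tokens, and random connections.
    (Hypothetical helper for illustration only.)"""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    # Sliding window: each token attends to its nearby neighbours.
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True
    # Global tokens: the first n_global tokens attend everywhere
    # and are attended to by every token.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    # Random connections: each token attends to a few random positions.
    for i in range(n):
        mask[i, rng.choice(n, size=n_random, replace=False)] = True
    return mask

mask = bigbird_mask(16)
```

In the real BigBird implementation this sparsity is realized with blocked/rolled tensor operations rather than an explicit dense mask, which is what makes the attention sub-quadratic.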
- Run train.py to train GPT models.
- Run finetune.py to finetune the model heads on a classification task.
- Run linearprobe.py to train one linear probe model per layer of each model.
- Run accuracy.py to check the accuracy of all of the different models.
- Run generate.py to generate example images from the full vanilla and BigBird models.
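Inside the training and generation scripts, the only change BigBird masking requires of a GPT attention layer is which boolean mask is applied before the softmax. The sketch below shows that step in NumPy; it is a simplified illustration, not the repo's actual attention code.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with a boolean mask:
    disallowed positions are set to -inf before the softmax,
    so they receive zero attention weight.
    (Minimal single-head sketch for illustration only.)"""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the allowed positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Swapping the standard causal (lower-triangular) mask for a BigBird-style sparse mask leaves the rest of the attention computation unchanged.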
- A possibly useful example of autograd on sparse matrices (see comments)
- Video on BigBird, with a timestamp for the block/roll implementation of sparse attention