minGPT-with-BigBird

DD2412 project at KTH by Leo Hiselius, Jonas Thunberg and Alfons Heintz, {leohi, jonthu, alfonsh}"at"kth.se

This project strives to combine two models: minGPT and BigBird.
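
To make the idea of BigBird masking in a GPT-style model more concrete, here is a minimal sketch of a token-level attention mask. It is not the code used in this repository. The function name and the parameters `window`, `num_global`, `num_random` and `seed` are hypothetical, and intersecting the BigBird pattern with a causal (lower-triangular) mask is an assumption based on minGPT being autoregressive. BigBird itself is usually implemented block-wise; plain Python lists are used here only to keep the sketch dependency-free.

```python
import random


def bigbird_causal_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Return mask[i][j] == True where query i may attend to key j.

    Hypothetical sketch: sliding window + global tokens + random keys,
    restricted to j <= i so that no position sees the future.
    """
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Sliding window: the `window` most recent positions, including i.
        for j in range(max(0, i - window + 1), i + 1):
            mask[i][j] = True
        # Global tokens: every query may attend to the first `num_global`
        # positions that are not in its future.
        for j in range(min(num_global, i + 1)):
            mask[i][j] = True
        # Random keys: sample from past positions not already covered.
        candidates = [j for j in range(i + 1) if not mask[i][j]]
        for j in rng.sample(candidates, min(num_random, len(candidates))):
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    # Print the pattern: "x" marks an allowed (query, key) pair.
    for row in bigbird_causal_mask(12):
        print("".join("x" if allowed else "." for allowed in row))
```

With these settings each late row allows at most `window + num_global + num_random` keys, instead of the `i + 1` keys a full causal mask would allow. In a real model the mask would be a tensor applied to the attention scores before the softmax.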

Instructions to recreate results presented in project report

  • Run train.py to train GPT models.
  • Run finetune.py to fine-tune the model heads for a classification task.
  • Run linearprobe.py to train one linear probe model per layer of each model.
  • Run accuracy to check the accuracy of all of the different models.
  • Run generate to generate example pictures from the full vanilla and BigBird models.
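
As an illustration of the linear-probe step above, here is a minimal sketch of fitting a linear classifier on frozen features. It is not taken from linearprobe.py. The function names, the hyperparameters and the toy features are hypothetical, and binary logistic regression trained with full-batch gradient descent is an assumption made to keep the example small. In the real setting, the features would be activations extracted from one layer of a trained model, with one probe fitted per layer.

```python
import math


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit logistic regression on fixed features with full-batch gradient descent.

    Only the probe's weights and bias are updated; the features are inputs
    and are never changed, which is what makes this a probe.
    """
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y                      # gradient of the log loss w.r.t. z
            for k in range(dim):
                grad_w[k] += err * x[k] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical toy features standing in for activations of two layers.
    labels = [0, 0, 1, 1]
    per_layer = {
        "layer_a": [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]],
        "layer_b": [[-1.0, 0.2], [-2.0, -0.1], [1.0, 0.1], [2.0, -0.2]],
    }
    for name, feats in per_layer.items():
        w, b = train_linear_probe(feats, labels)
        print(name, probe_accuracy(w, b, feats, labels))
```

Comparing probe accuracy across layers gives a rough picture of which layer's representation is most linearly separable for the classification task.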

Notes for devs

Possibly useful: an example of autograd on sparse matrices (see the comments)

Video on BigBird, timestamp on block/roll implementation of sparse attention

About

An implementation of the minGPT architecture using BigBird masking.
