Source code for our ICML 2023 paper *How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding*.
lda_bert_demo.ipynb
: trains a BERT model on LDA (topic modeling) data, plots its attention patterns, and saves other information such as attention score statistics, embedding dot products, and model parameter visualizations.
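For context, LDA data of this kind follows the standard topic-model generative process: each document draws a topic mixture, and each token draws a topic and then a word. Below is a minimal, self-contained sketch of that process in numpy; all hyperparameter names and values are illustrative assumptions, not the repo's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameters (assumptions, not the repo's actual values):
num_topics = 5    # K: number of topics
vocab_size = 100  # V: vocabulary size
doc_len = 32      # tokens per document
alpha = 0.1       # Dirichlet prior over per-document topic mixtures
beta = 0.1        # Dirichlet prior over per-topic word distributions

# One categorical distribution over the vocabulary per topic.
topic_word = rng.dirichlet(beta * np.ones(vocab_size), size=num_topics)

def sample_document():
    """Sample one document from the LDA generative process."""
    # Per-document topic mixture theta ~ Dir(alpha).
    theta = rng.dirichlet(alpha * np.ones(num_topics))
    # For each token: draw a topic z ~ Cat(theta), then a word w ~ Cat(topic_word[z]).
    topics = rng.choice(num_topics, size=doc_len, p=theta)
    return np.array([rng.choice(vocab_size, p=topic_word[z]) for z in topics])

docs = np.stack([sample_document() for _ in range(4)])
print(docs.shape)  # (4, 32): a small batch of synthetic documents
```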
config/
: the config files are auto-generated when you run the notebook above with your chosen hyperparameters.
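For reference, a config file of this kind is just serialized hyperparameters. A hypothetical sketch (the key names and file name are assumptions, not the repo's actual schema) of how a notebook might write one:

```python
import json
import pathlib

# Hypothetical hyperparameters; the actual keys are set inside the notebook.
config = {
    "num_topics": 5,
    "vocab_size": 100,
    "doc_len": 32,
    "num_layers": 1,
    "num_heads": 1,
    "lr": 1e-4,
}

# Write the config under config/, matching the auto-generated layout.
pathlib.Path("config").mkdir(exist_ok=True)
with open("config/example.json", "w") as f:
    json.dump(config, f, indent=2)
```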
The code heavily borrows from dyck-transformer and dyckkm-learning. Thanks to these authors!
If you find our paper or code useful, please cite the paper and star this repo. Thank you!
Feel free to contact yuchenl4@cs.cmu.edu if you have any questions.
@misc{li2023transformers,
  doi = {10.48550/ARXIV.2303.04245},
  url = {https://arxiv.org/abs/2303.04245},
  author = {Li, Yuchen and Li, Yuanzhi and Risteski, Andrej},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), Machine Learning (stat.ML), FOS: Computer and information sciences},
  title = {How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}