This repo provides the source code and data for our paper:
Too Much in Common: Shifting of Embeddings in Transformer Language Models and its Implications
To install the torch version that matches your system, see the official guide.
conda create -n tmic python=3.8 && conda activate tmic
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
conda install transformers numpy scipy scikit-learn pandas matplotlib seaborn -c conda-forge
chmod +x
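After the environment is set up, it can help to confirm that the packages listed above are visible to the interpreter. The following is a minimal sketch and not part of this repo: the function name `missing_packages` and the `REQUIRED` list are hypothetical, and the list uses import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages installed above (assumed list, not taken from the repo).
# Note that scikit-learn is imported as "sklearn".
REQUIRED = [
    "torch", "transformers", "numpy", "scipy",
    "sklearn", "pandas", "matplotlib", "seaborn",
]


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated `tmic` environment; any name it reports can be installed with the conda commands above.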
To replicate the experiments from the paper run:
This will create the directories:
if they do not yet exist. The respective results are written to these directories. Plots are also available to download from here (~75 MB).
The scripts can also be run for a specific model:
python src/ --model <model-name-from-hugging-face> --output <path-to-output-json-file>
python src/ --model <model-name-from-hugging-face> --output <path-to-output-json-file>
By default, if no arguments are provided, the experiments are executed for all models evaluated in the paper.
In that case, the results are saved in experiments/<isotropy|benchmarks>/dmYHMS.json
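The command-line behavior described above (optional `--model` and `--output`, with a timestamped default path) could be sketched with `argparse` as follows. This is an illustration only, not the repo's actual code: the helper names `default_output` and `parse_args` are hypothetical, and reading `dmYHMS` as day, month, year, hour, minute, second is an assumption.

```python
import argparse
from datetime import datetime
from pathlib import Path


def default_output(kind, now=None):
    """Build experiments/<kind>/<timestamp>.json.

    Assumption: "dmYHMS" means day, month, 4-digit year, hour, minute, second.
    """
    now = now or datetime.now()
    stamp = now.strftime("%d%m%Y%H%M%S")
    return Path("experiments") / kind / f"{stamp}.json"


def parse_args(argv=None, kind="isotropy"):
    """Parse --model and --output; both are optional."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=None,
                        help="Hugging Face model name; omit to run all models")
    parser.add_argument("--output", default=None,
                        help="path to the output JSON file")
    args = parser.parse_args(argv)
    if args.output is None:
        # No path given: fall back to the timestamped default location.
        args.output = str(default_output(kind))
    return args


if __name__ == "__main__":
    args = parse_args()
    print(args.model, args.output)
```

Passing `kind="benchmarks"` would place the default file under `experiments/benchmarks/` instead.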
author = {Daniel Biś and Maksim Podkorytov and Xiuwen Liu},
title = {Too Much in Common: Shifting of Embeddings in Transformer Language Models and its Implications},
year = {2021},
booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},