- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada
- Supervised by Professor Danushka Bollegala, head of the NLP and Machine Learning research group at the University of Liverpool
- Co-author: Yi Zhou
- Investigate the impact of word frequency on the cosine similarity between contextualized word embeddings of masked language models such as BERT
- Mitigate the underestimation of cosine similarity between the contextualized embeddings of high-frequency words
part1: BookCorpus processing and WiC experiments
- Download and save the BookCorpus dataset (this and the following steps are sketched after the list)
- Collect word embeddings, word frequencies, and l2-norms over the entire BookCorpus
- Plot word frequency against l2-norm
- Process the WiC (Word-in-Context) dataset by adding word frequencies, default BERT cosine similarities, and BERT word embeddings
- Run Bayesian optimization to find the decision threshold theta of the classifier
- Prepare 5-fold cross-validation splits of WiC
- Compare the cosine similarity predictions of the l2-norm discounting method against default BERT
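
The sketches below illustrate the pipeline steps above. First, downloading BookCorpus and collecting per-token frequencies and l2-norms of contextualized embeddings. This is a minimal sketch assuming the Hugging Face `datasets` and `transformers` libraries with `bert-base-uncased`; the exact checkpoint and save path are assumptions:

```python
from collections import Counter, defaultdict

import torch
from datasets import load_dataset
from transformers import BertModel, BertTokenizerFast

# Download BookCorpus once and persist it locally for the later steps.
bookcorpus = load_dataset("bookcorpus", split="train")
bookcorpus.save_to_disk("data/bookcorpus")  # hypothetical save path

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

freq = Counter()            # corpus frequency of each token
norms = defaultdict(list)   # l2-norms of each token's contextualized vectors

@torch.no_grad()
def collect(sentence: str) -> None:
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    hidden = model(**enc).last_hidden_state[0]              # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    for token, vec in zip(tokens, hidden):
        if token in ("[CLS]", "[SEP]"):                     # skip special tokens
            continue
        freq[token] += 1
        norms[token].append(vec.norm().item())              # l2-norm of this occurrence

for row in bookcorpus:       # iterate over the entire corpus
    collect(row["text"])
```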
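For the frequency-vs-l2-norm plot, a sketch reusing `freq` and `norms` from above; plotting frequency on a log scale is an assumption, though it is the usual choice:

```python
import numpy as np
import matplotlib.pyplot as plt

words = list(norms)
x = np.log10([freq[w] for w in words])             # log word frequency
y = [float(np.mean(norms[w])) for w in words]      # mean l2-norm per word

plt.scatter(x, y, s=2, alpha=0.3)
plt.xlabel("log10 word frequency")
plt.ylabel("mean l2-norm of contextualized embedding")
plt.savefig("frequency_vs_l2norm.png")             # hypothetical output file
```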
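For the WiC processing step, a sketch of extracting the target word's contextualized vector and the default BERT cosine similarity; the subtoken matching and mean pooling shown here are simplifying assumptions:

```python
import torch

def target_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean of the contextualized vectors of the target word's subtokens."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    pieces = set(tokenizer.tokenize(word))
    idx = [i for i, t in enumerate(tokens) if t in pieces]  # crude subtoken match
    return hidden[idx].mean(dim=0)

def default_bert_cosine(s1: str, s2: str, word: str) -> float:
    e1, e2 = target_embedding(s1, word), target_embedding(s2, word)
    return torch.nn.functional.cosine_similarity(e1, e2, dim=0).item()
```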
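For the threshold search, a sketch assuming scikit-optimize's `gp_minimize` and a list of (similarity, gold label) pairs from the WiC training split; treating theta as a single decision threshold on cosine similarity is an assumption:

```python
from skopt import gp_minimize
from skopt.space import Real

train_pairs = [(0.62, True), (0.31, False), (0.55, True)]  # toy stand-in data

def accuracy_at(theta: float, pairs) -> float:
    # WiC is binary: predict "same sense" when similarity exceeds theta.
    return sum((sim > theta) == label for sim, label in pairs) / len(pairs)

result = gp_minimize(
    lambda params: -accuracy_at(params[0], train_pairs),   # minimize -accuracy
    [Real(-1.0, 1.0, name="theta")],
    n_calls=50,
    random_state=0,
)
best_theta = result.x[0]
```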
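For the cross-validation preparation, a sketch using scikit-learn's `KFold` over the processed WiC instances; the variable name `wic_instances` is a placeholder:

```python
from sklearn.model_selection import KFold

wic_instances = list(range(20))   # stand-in for the processed WiC rows

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
folds = [
    ([wic_instances[i] for i in train_idx],
     [wic_instances[i] for i in test_idx])
    for train_idx, test_idx in kfold.split(wic_instances)
]
```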
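Finally, for the comparison step, a sketch of the general shape of l2-norm discounting: the dot product is divided by frequency-discounted norms rather than raw norms, so a discount below 1 for high-frequency words raises their similarity and counteracts the underestimation. The concrete discount function is the method's contribution and is left as a placeholder here:

```python
import torch

def discounted_cosine(e1, e2, f1, f2, discount) -> float:
    # discount(f) < 1 for high-frequency words shrinks the denominator,
    # pushing the similarity estimate upward.
    denom = (discount(f1) * e1.norm()) * (discount(f2) * e2.norm())
    return (torch.dot(e1, e2) / denom).item()

def default_cosine(e1, e2) -> float:
    return torch.nn.functional.cosine_similarity(e1, e2, dim=0).item()
```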
part2: word embedding and BERT properties collection (key measures are sketched after the list)
- Distributions of words
- Word frequency vs. local l2-norm
- Word frequency vs. global l2-norm
- Word frequency vs. l2-norm variance
- Word frequency vs. global isotropy
- Word frequency vs. global self-similarity
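
The sketches below illustrate the measures above. For the norm statistics, a minimal sketch; reading "local" as the mean per-occurrence norm and "global" as the norm of the word's mean embedding is an assumption:

```python
import numpy as np

def word_norm_stats(vectors) -> dict:
    """vectors: (n_occurrences, dim) contextualized embeddings of one word."""
    vectors = np.asarray(vectors, dtype=float)
    per_occurrence = np.linalg.norm(vectors, axis=1)
    return {
        "local_l2": per_occurrence.mean(),                   # mean norm per occurrence
        "l2_variance": per_occurrence.var(),                 # variance of those norms
        "global_l2": np.linalg.norm(vectors.mean(axis=0)),   # norm of the mean vector
    }
```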
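For self-similarity and isotropy, a sketch following the common definitions from Ethayarajh (2019): self-similarity is the average pairwise cosine among a word's contextualized vectors, and anisotropy is estimated as the average cosine between randomly sampled embedding pairs (values near 0 indicate an isotropic space). Treating these as the exact measures used in this project is an assumption:

```python
import numpy as np

def self_similarity(word_vectors) -> float:
    """Average pairwise cosine among one word's contextualized vectors."""
    v = np.asarray(word_vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    return float((sims.sum() - n) / (n * (n - 1)))   # exclude the n self-pairs

def anisotropy(all_vectors, n_pairs=10_000, seed=0) -> float:
    """Average cosine between randomly sampled embedding pairs."""
    rng = np.random.default_rng(seed)
    v = np.asarray(all_vectors, dtype=float)
    i = rng.integers(0, len(v), n_pairs)
    j = rng.integers(0, len(v), n_pairs)
    a = v[i] / np.linalg.norm(v[i], axis=1, keepdims=True)
    b = v[j] / np.linalg.norm(v[j], axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean())
```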
part3: approaches to mitigate the impact of word frequency on the under- and over-estimation of BERT cosine similarity
- BERT baseline
- Approach 1: a single linear adjustment (sketched below)
- Approach 2: a two-way linear adjustment (sketched below)
- Approach 3: word-type-dependent adjustment
- Approach 4: probabilistic word-type-dependent adjustment
- Approach 5: z-score normalization (sketched below)
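
A sketch of the linear-adjustment idea behind approaches 1 and 2. The functional form shown, a learned linear correction in log frequency, is an assumption; the two-way variant fits separate parameters below and above a pivot frequency so that under- and over-estimation are corrected separately:

```python
import numpy as np

def adjusted_cosine(cosine: float, freq: int, a: float, b: float) -> float:
    """Approach 1: single linear correction in log word frequency."""
    return cosine + a * np.log10(freq) + b

def two_way_adjusted(cosine: float, freq: int, low, high, pivot: int) -> float:
    """Approach 2: separate (a, b) parameters below and above a pivot frequency."""
    a, b = low if freq < pivot else high
    return cosine + a * np.log10(freq) + b
```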
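A sketch of the z-score normalization in approach 5, assuming similarities are standardized within frequency bins before thresholding; whether the standardization is applied to similarities or to embedding dimensions is an assumption:

```python
import numpy as np

def zscore_by_bin(cosines, bins):
    """Standardize cosine similarities within each frequency bin."""
    cosines, bins = np.asarray(cosines, dtype=float), np.asarray(bins)
    out = np.empty_like(cosines)
    for b in np.unique(bins):
        mask = bins == b
        mu, sigma = cosines[mask].mean(), cosines[mask].std()
        out[mask] = (cosines[mask] - mu) / (sigma + 1e-12)   # avoid divide-by-zero
    return out
```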