Data and code for "Understanding Linearity of Cross-Lingual Word Embedding Mappings" (TMLR 2022)

Pzoom522/xANLG


xANLG

Data

Please find the cross-lingual word analogy corpus (xANLG) in the /data folder.

Code

  • get_emb.py: Retrieve the vectors for the xANLG lexicon from pre-trained word embeddings, then pre-process them. One language pair is processed at a time.
  • LRCos: Use the implementation in the Vecto library directly.
  • validate_analogy.py: Perform the parallelogram validation algorithm introduced in §4.1.3.
  • linear_map.py: Find the linear mapping using Generalized Procrustes Analysis (GPA).
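As a rough illustration of the kind of work get_emb.py does, the sketch below reads a pre-trained embedding file in the word2vec/fastText text format, keeps only the words that appear in a given lexicon, and pre-processes the resulting matrix. The function names are hypothetical, and the pre-processing recipe (unit-length normalization, mean centering, then unit-length normalization again, as popularized by VecMap) is an assumption; the steps used for the paper may differ.

```python
import numpy as np


def load_subset(lines, vocab):
    """Keep the vectors of words in `vocab` from word2vec/fastText text-format lines."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # Skip the optional "<vocab_size> <dim>" header line.
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        word = parts[0]
        if word in vocab and word not in vectors:  # first occurrence wins
            vectors[word] = np.asarray(parts[1:], dtype=np.float32)
    words = sorted(vectors)
    if not words:
        return words, np.empty((0, 0), dtype=np.float32)
    return words, np.stack([vectors[w] for w in words])


def unit_length(matrix):
    """Scale every row to unit L2 norm (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, 1e-12)


def preprocess(matrix):
    """ASSUMED recipe: unit length -> mean centering -> unit length."""
    matrix = unit_length(matrix)
    matrix = matrix - matrix.mean(axis=0, keepdims=True)
    return unit_length(matrix)
```

For a real file, `load_subset` can take an open file handle directly, since iterating over it yields lines; the source and target sides of a language pair would each be loaded and pre-processed this way.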
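Vecto is the recommended way to run LRCos. For readers who want to see the idea, here is a minimal NumPy-only sketch of the LRCos scoring rule (Drozd et al., 2016): train a logistic-regression classifier that separates words of the target class (positives) from other words (negatives), then rank every candidate by classifier probability times cosine similarity to b. The function names, the hand-rolled gradient-descent classifier, and its hyperparameters are assumptions for illustration; they do not reproduce Vecto's API or settings.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500, l2=1e-3):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        err = _sigmoid(X @ weights + bias) - y  # d(log-loss)/d(logit)
        weights -= lr * (X.T @ err / len(y) + l2 * weights)
        bias -= lr * err.mean()
    return weights, bias


def lrcos(emb, vocab, positives, negatives, b):
    """Answer a:a' :: b:? as argmax of P(target class | w) * cos(w, b)."""
    index = {w: i for i, w in enumerate(vocab)}
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    train = [index[w] for w in positives] + [index[w] for w in negatives]
    labels = np.array([1.0] * len(positives) + [0.0] * len(negatives))
    weights, bias = train_logreg(unit[train], labels)
    scores = _sigmoid(unit @ weights + bias) * (unit @ unit[index[b]])
    scores[index[b]] = -np.inf  # never return the query word itself
    return vocab[int(np.argmax(scores))]
```

The classifier is what distinguishes LRCos from pure offset methods: it asks "is this word a capital?" (say) instead of relying on a single a/a' pair, and the cosine term then picks the class member closest to b.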
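The parallelogram model of analogy says that a:a* :: b:b* holds when a* - a + b lands close to b*. One common way to check this is the 3CosAdd test sketched below: b* must be the cosine nearest neighbour of a* - a + b once the three query words are excluded. This is a generic textbook check and not necessarily the exact validation procedure of §4.1.3; the function name and the exclusion rule are assumptions.

```python
import numpy as np


def parallelogram_holds(emb, vocab, a, a_star, b, b_star):
    """True if b_star is the cosine nearest neighbour of a_star - a + b (3CosAdd)."""
    index = {w: i for i, w in enumerate(vocab)}
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    query = unit[index[a_star]] - unit[index[a]] + unit[index[b]]
    scores = unit @ (query / np.linalg.norm(query))
    for w in (a, a_star, b):  # the query words themselves are not valid answers
        scores[index[w]] = -np.inf
    return vocab[int(np.argmax(scores))] == b_star
```

In a cross-lingual setting the four words may come from different languages, so `emb` would hold both vocabularies in one shared (e.g. already mapped) space.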
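Procrustes-style alignment can be sketched with two pieces: an orthogonal Procrustes solver via SVD, and a generalized Procrustes loop that aligns several spaces to their running mean. This is a textbook formulation that assumes row-aligned matrices of equal dimension (row i of every matrix is a translation of the same word); the function names are hypothetical and it is not necessarily what linear_map.py implements.

```python
import numpy as np


def procrustes(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F, via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def generalized_procrustes(spaces, iters=100, tol=1e-9):
    """Align row-aligned, equal-dimension matrices to a shared mean space.

    Alternates between (1) fitting an orthogonal map from each space to the
    current target and (2) resetting the target to the mean of the mapped spaces.
    """
    target = spaces[0].copy()  # initialise the shared space with the first matrix
    maps = [np.eye(X.shape[1]) for X in spaces]
    prev_loss = np.inf
    for _ in range(iters):
        maps = [procrustes(X, target) for X in spaces]
        mapped = [X @ W for X, W in zip(spaces, maps)]
        target = np.mean(mapped, axis=0)
        loss = sum(float(np.sum((M - target) ** 2)) for M in mapped)
        if prev_loss - loss < tol:  # stop once the loss no longer decreases
            break
        prev_loss = loss
    return maps, target
```

With two languages, `generalized_procrustes([X_src, X_tgt])` returns one orthogonal map per language into a shared space; a direct source-to-target map is `maps[0] @ maps[1].T`, because the second map is orthogonal.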

About

If you like our project or find it useful, please give us a ⭐ and cite our paper:

@article{xANLG,
title={Understanding Linearity of Cross-Lingual Word Embedding Mappings},
author={Xutan Peng and Mark Stevenson and Chenghua Lin and Chen Li},
journal={Transactions on Machine Learning Research},
year={2022},
url={https://openreview.net/forum?id=8HuyXvbvqX}
}
