If you also use XLING data and save the dataset in the same directories as we did, you could directly run the code without modifying anything; you might only need to pay a little attention to (c) and (d) below. When using customised data, you would also need to modify the input/output directories specified in (a) and (b) below.

(a) In ./C1/run_all.py, set the directories for (1) pretrained static word embeddings for source and target languages respectively; (2) test dictionary; (3) training dictionary; (4) saving C1-aligned CLWEs.

(b) In ./C2/run_all.py, set the directories for (1) saving hard negative pairs extracted from C1; (2) saving fine-tuned multilingual LMs; (3) pretrained multilingual LMs; (3) some hyper parameter values for C2; (4) pretrained static word embeddings for source and target languages; (5) test dictionary; and (6) training dictionary.

(c) To run supervised (5k seed pairs) and semi-supervised (1k seed pairs) BLI tasks, please set the size of the training dictionary to '5k' or '1k' respectively in both ./C1/run_all.py and ./C2/run_all.py.

(d) To save C1-aligned embeddings for C2's use, please set args.save_aligned_we = True in ./C1/src/main.py. If you run the code for the first time, you might also need to mkdir directories for saving C1-aligned CLWEs, hard negative pairs extracted for C2, and C2 model checkpints, respectively.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SetDirectories.md

SetDirectories.md

Files

SetDirectories.md

Latest commit

History

SetDirectories.md

File metadata and controls