This code is built on kNN-MT; see https://github.com/urvashik/knnmt for details. Many thanks to the authors for making their code available.
- pytorch version >= 1.5.0
- python version >= 3.6
- faiss-gpu >= 1.6.5
- pytorch_scatter = 2.0.5
- 1.19.0 <= numpy < 1.20.0
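To check an environment against the constraints above, a minimal sketch follows. The helper names (`parse_version`, `satisfies`, `check`) are illustrative and not part of this project, and the comparison only handles plain numeric versions such as `1.19.5`; local suffixes like `+cu101` are dropped before comparing.

```python
import re

def parse_version(text):
    """Turn '1.19.5' or '1.5.0+cu101' into a tuple of ints, e.g. (1, 19, 5)."""
    core = re.split(r"[+\-]", text, maxsplit=1)[0]  # drop local/build suffix
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '3.6' compares like '3.6.0'
        parts.append(0)
    return tuple(parts)

def satisfies(version, lower=None, upper=None, exact=None):
    """Check lower <= version < upper, or version == exact when exact is given."""
    v = parse_version(version)
    if exact is not None:
        return v == parse_version(exact)
    if lower is not None and v < parse_version(lower):
        return False
    if upper is not None and v >= parse_version(upper):
        return False
    return True

# Constraints copied from the requirement list above.
REQUIREMENTS = {
    "pytorch": {"lower": "1.5.0"},
    "python": {"lower": "3.6"},
    "faiss-gpu": {"lower": "1.6.5"},
    "pytorch_scatter": {"exact": "2.0.5"},
    "numpy": {"lower": "1.19.0", "upper": "1.20.0"},
}

def check(installed):
    """Map each requirement name to True/False given {name: version string}."""
    return {name: satisfies(installed[name], **rule)
            for name, rule in REQUIREMENTS.items() if name in installed}
```

The `installed` dictionary can be filled from version strings such as `numpy.__version__` and `torch.__version__`.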
Install this project with:
pip install --editable ./
The following example shows how to use the code.
The pre-trained translation model can be downloaded from this site; we use the De->En Single Model for all experiments. The pre-processed data can be downloaded from this site.
Tip: for convenience, place the pre-trained model at models/wmt19.de-en/ and the pre-processed data at datasets/
The following script creates a datastore (key.npy and val.npy) for the data.
cd revisedkey-scripts
# build the datastore for the koran domain with the pre-trained model (`base` denotes the pre-trained model)
bash build_datastore.sh base koran
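Conceptually, a kNN-MT datastore pairs each decoder hidden state (the key) with the target token at that position (the value). The sketch below builds a toy datastore from random vectors and writes key.npy and val.npy. It assumes the standard `np.save` format, `float16` keys and `int64` token ids; build_datastore.sh may store the arrays differently (for example as raw memory-mapped files), so treat the shapes, dtypes and function names here as illustrative.

```python
import os
import tempfile
import numpy as np

def save_toy_datastore(out_dir, num_entries=1000, dim=16, vocab_size=50, seed=0):
    """Write a random (key, value) datastore to out_dir and return both arrays."""
    rng = np.random.default_rng(seed)
    # One key per target position: stand-in for a decoder hidden state.
    keys = rng.standard_normal((num_entries, dim)).astype(np.float16)
    # One value per key: stand-in for the id of the target token at that position.
    vals = rng.integers(0, vocab_size, size=num_entries, dtype=np.int64)
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, "key.npy"), keys)
    np.save(os.path.join(out_dir, "val.npy"), vals)
    return keys, vals

def load_datastore(out_dir):
    """Load the two arrays and check that they describe the same entries."""
    keys = np.load(os.path.join(out_dir, "key.npy"))
    vals = np.load(os.path.join(out_dir, "val.npy"))
    assert keys.shape[0] == vals.shape[0], "each key needs exactly one value"
    return keys, vals

with tempfile.TemporaryDirectory() as tmp:
    save_toy_datastore(tmp)
    keys, vals = load_datastore(tmp)
    print(keys.shape, keys.dtype, vals.shape, vals.dtype)
```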
To evaluate kNN-MT on the test set:
cd revisedkey-scripts
bash knnmt_inference.sh base koran
To train the reviser:
cd revisedkey-scripts
# step 1. prepare heldout data from base model
bash save_retrieve_result.sh base koran
bash save_valid_kv.sh base koran
bash save_valid_retrieve_result.sh base koran
# step 2. fine-tune base model to prepare heldout data
bash finetune.sh koran
# step 3. prepare heldout data from fine-tuned model
bash build_datastore.sh finetune koran
bash save_retrieve_result.sh finetune koran
bash save_valid_kv.sh finetune koran
bash save_valid_retrieve_result.sh finetune koran
# step 4. train reviser
bash train_reviser.sh koran
Alternatively, you can obtain the revised datastores with the following steps:
- Download checkpoints of fine-tuned models from this site
- Download trained revisers and faiss index from this site
- Refer to datastore-revise.ipynb to revise the original datastore
To evaluate revised datastores in batches:
cd revisedkey-scripts
bash batch_knnmt_inference.sh koran ../save/news_to_koran
To evaluate a single revised datastore:
cd revisedkey-scripts
bash our_inference.sh koran <datastore_path>