amazon-science/multilingual-robust-contrastive-pretraining

Multilingual Robust Contrastive Pretraining

This code is released as part of our EACL 2023 paper, Robustification of Multilingual Language Models to Real-world Noise with Robust Contrastive Pretraining (official link coming up).

Citation

If you use the code or data in this repository, please cite the following work:

@inproceedings{eacl-2023-asa-sailik,
    title = "Robustification of Multilingual Language Models to Real-world Noise with Robust Contrastive Pretraining",
    author = {Stickland, Asa Cooper and Sengupta, Sailik and Krone, Jason and Mansour, Saab and He, He},
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    year = "2023",
    publisher = "Association for Computational Linguistics"
}

Dependencies

Because the code base has several code paths (e.g., Wikipedia data downloading, model pre-training, joint intent classification and slot labeling (IC-SL) evaluation, and XNLI and NER evaluation), we do not provide a single requirements.txt file with all dependencies. We suggest installing dependencies as each code path requires them. The basic building-block packages are:

- python>=3.6
- torch==1.6.0
- transformers==3.0.2
- seqeval==0.0.12
- pytorch-crf==0.7.2
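Since installation is left to the user, it can help to confirm the pinned versions before running a given code path. The following is a minimal sketch, not part of this repository: the helper names `parse_pin` and `check_pins` are hypothetical, and it assumes Python 3.8+ (for `importlib.metadata`), which is stricter than the python>=3.6 floor listed above.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Pinned versions copied from the dependency list above.
PINNED = ["torch==1.6.0", "transformers==3.0.2", "seqeval==0.0.12", "pytorch-crf==0.7.2"]


def parse_pin(spec):
    """Split an exact 'name==version' pin into (name, version)."""
    name, sep, wanted = spec.partition("==")
    if not sep:
        raise ValueError(f"expected an exact '==' pin, got {spec!r}")
    return name.strip(), wanted.strip()


def check_pins(pins, installed_version=version):
    """Return {package: (wanted, found)} for each pin that is missing or differs.

    `found` is None when the distribution is not installed. Versions are
    compared as plain strings, so a local build tag such as '1.6.0+cu101'
    is reported as a mismatch even if it is the right release.
    """
    problems = {}
    for spec in pins:
        name, wanted = parse_pin(spec)
        try:
            found = installed_version(name)  # looked up by distribution name
        except PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

The pins themselves can be installed with, for example, `pip install torch==1.6.0 transformers==3.0.2 seqeval==0.0.12 pytorch-crf==0.7.2`; note that wheels for older releases such as torch 1.6.0 may not exist for recent Python versions.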

LICENSES

This code base is built on top of other code bases, whose licenses can be found in THIRD_PARTY_LICENSES.md. Any amendments made to the code in this code base are licensed under LICENSE.

The data inside paper_data have their own licenses, which override the aforementioned license. More information about the individual data licenses can be found in this README.md file.

Training & Evaluation

Run scripts can be found inside the runner_scripts directory.

About

Robust Contrastive Pre-training for developing robust multilingual language models (EACL 2023).
