This is the code for RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting.
We mitigate spurious correlations between tokens and labels by rewriting the dataset with large language models. For details, please see our paper (arXiv:2412.07675).
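To make the notion of a spurious token-label correlation concrete, the sketch below estimates p(label | token) on a tiny, made-up NLI-style dataset. It only illustrates the kind of bias described above and is not the RAZOR procedure; the function name `token_label_rates`, the whitespace tokenization, and the example sentences are all assumptions introduced for this illustration.

```python
from collections import Counter, defaultdict


def token_label_rates(examples):
    """Estimate p(label | token) from (text, label) pairs.

    Each token is counted at most once per example, and tokenization is a
    plain lowercase whitespace split (an assumption made for brevity).
    """
    token_counts = Counter()
    token_label_counts = defaultdict(Counter)
    for text, label in examples:
        for token in set(text.lower().split()):
            token_counts[token] += 1
            token_label_counts[token][label] += 1
    return {
        token: {label: n / token_counts[token] for label, n in labels.items()}
        for token, labels in token_label_counts.items()
    }


# Toy data invented for this example: "not" only ever appears with one label.
toy = [
    ("the man is not sleeping", "contradiction"),
    ("the dog is not outside", "contradiction"),
    ("the man is sleeping", "entailment"),
    ("the dog is outside", "entailment"),
]
rates = token_label_rates(toy)
print(rates["not"])  # {'contradiction': 1.0}
print(rates["the"])  # {'contradiction': 0.5, 'entailment': 0.5}
```

In this toy set a classifier could predict the label from the token "not" alone, which is the kind of shortcut that rewriting the dataset is meant to remove.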
All of the datasets we used are open-source.
FEVER dataset: https://fever.ai/dataset/adversarial.html
MNLI dataset: https://paperswithcode.com/dataset/multinli
SNLI dataset: https://paperswithcode.com/dataset/snli
Before running our code, please ensure that the following dependencies are met.
| Library | Version |
|---|---|
| torch | 2.3.0 |
| tokenizers | 0.19.1 |
| transformers | 4.40.1 |
| spacy | 3.7.4 |
| shap | 0.46.0 |
| sentence-transformers | 3.0.1 |
| openai | 1.27.0 |
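One way to check an environment against the table above is to compare installed versions with the pinned ones. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` is hypothetical and not part of this repository, and it assumes the distribution names on PyPI match the library names in the table.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the table above.
REQUIRED = {
    "torch": "2.3.0",
    "tokenizers": "0.19.1",
    "transformers": "4.40.1",
    "spacy": "3.7.4",
    "shap": "0.46.0",
    "sentence-transformers": "3.0.1",
    "openai": "1.27.0",
}


def find_mismatches(required):
    """Return (name, wanted, installed) for every package that is missing
    or whose installed version differs from the pinned one."""
    mismatches = []
    for name, wanted in required.items():
        try:
            # Drop local build tags such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches


if __name__ == "__main__":
    for name, wanted, installed in find_mismatches(REQUIRED):
        print(f"{name}: expected {wanted}, found {installed}")
```

The script prints nothing when every listed package is installed at the pinned version. Other versions may also work, but only the versions in the table are stated here.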
To run the program, execute the main.py file in the root directory.
File directories and some commonly used hyperparameters can be passed via the command line.
Note that the hyperparameters used during training must be adjusted manually by editing the relevant sections of main.py.
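As a rough illustration of passing directories and a hyperparameter on the command line, here is a minimal `argparse` sketch. The flag names (`--data_dir`, `--output_dir`, `--seed`) and their defaults are hypothetical and chosen only for this example; the actual arguments accepted by main.py may differ, so consult the script itself.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flags; the real main.py may differ."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    # Hypothetical path arguments.
    parser.add_argument("--data_dir", type=str, default="./data")
    parser.add_argument("--output_dir", type=str, default="./output")
    # Hypothetical hyperparameter.
    parser.add_argument("--seed", type=int, default=42)
    return parser


# Parse an explicit argument list instead of sys.argv so the example is
# reproducible when run as-is.
args = build_parser().parse_args(["--data_dir", "./fever", "--seed", "0"])
print(args.data_dir, args.output_dir, args.seed)  # ./fever ./output 0
```

Arguments that are not given on the command line fall back to their defaults, as `--output_dir` does here.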
If you find our work useful or use our code, please cite our paper as follows.
@article{yang2024razor,
title={RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting},
author={Yang, Shuo and Prenkaj, Bardh and Kasneci, Gjergji},
journal={arXiv preprint arXiv:2412.07675},
year={2024}
}