This framework, forked from Huggingface Transformers, was used to run experiments for Junghyun Min's1 master's thesis, The roots and effects of heuristics in NLI and QA models, produced in collaboration with Tom McCoy1 and Tal Linzen2. It was developed on top of Nafise Sadat Moosavi's3 original implementation of HANS evaluation on Huggingface Transformers.
1Department of Cognitive Science, Johns Hopkins University, Baltimore, MD
2Department of Linguistics; Department of Data Science, New York University, New York, NY
3Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
Please run setup.py and pip install -r requirements.txt to set up the necessary tools and packages.
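Concretely, that setup might look like the following when run from the repository root (a sketch: the editable install runs setup.py for you, and the requirements file may live at experiments/requirements.txt in this fork):
pip install -e .                  # editable install, equivalent to running setup.py
pip install -r requirements.txt   # or experiments/requirements.txt, depending on your checkout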
We recommend that you take advantage of the PyTorch Docker image provided by Huggingface, located at docker/transformers-pytorch-gpu. If you are new to Docker, you're in luck, because I was as well 24 hours ago.
We walk through the Docker setup and installation in this section.
First, download the repository. The Dockerfile needs to be in the root directory of the repository.
git clone https://www.github.com/aatlantise/transformers
cd transformers
cp docker/transformers-pytorch-gpu/Dockerfile ./
Then, we build an image, which we will name transformers-docker.
docker build . -t transformers-docker
Now, for our convenience, we create a shared drive before creating the container, which we name roots-effects.
mkdir ~/container-data
docker run -it --name roots-effects -v ~/container-data:/data transformers-docker
On a multi-GPU machine or server, you may need to specify a GPU:
docker run -e NVIDIA_VISIBLE_DEVICES=[GPU-number] -it -P --name [container-name] -v [shared-drive-local-path]:[shared-drive-container-path] [image name]
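For example, to pin the container from the walkthrough above to GPU 0 (the GPU number here is illustrative):
docker run -e NVIDIA_VISIBLE_DEVICES=0 -it -P --name roots-effects -v ~/container-data:/data transformers-docker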
You are in the container! Now let's finish the setup:
cd transformers
pip install -e . && pip install -r experiments/requirements.txt
You're ready to run your experiments! If you run into a versioning or GPU problem while running jobs, it is likely a PyTorch issue. Visit pytorch.org and install what is appropriate for your device and container.
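For instance, you can check the installed PyTorch build and whether it sees your GPU, and reinstall if needed (the upgrade command below is only an example; pytorch.org will generate the exact command for your CUDA version):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
pip install --upgrade torch   # replace with the command pytorch.org suggests for your setup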
We provide two tasks: natural language inference (NLI) and question answering (QA). We use the MultiNLI (MNLI) dataset for NLI and the BoolQ dataset for QA, both for training and in-distribution evaluation. The BoolQ dataset, cached and in tsv format, can be found in /examples/boolq/boolq_data, while the MNLI dataset must be downloaded using curl https://dl.fbaipublicfiles.com/glue/data/MNLI.zip -o MNLI.zip, then unzipped.
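Putting that together, the MNLI download might look like this (the target directory is illustrative; place the data wherever your mnli.sh expects it):
cd examples/mnli   # hypothetical data location -- adjust to your setup
curl https://dl.fbaipublicfiles.com/glue/data/MNLI.zip -o MNLI.zip
unzip MNLI.zip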
For each task, we provide an out-of-distribution evaluation set that targets three known heuristics in NLI models (McCoy et al., 2019b). We use the Heuristic Analysis for NLI Systems (HANS) dataset to evaluate NLI models, and its QA adaptation, QA-HANS, to evaluate QA models. The datasets are located in /examples/[task-name]/hans/.
We include sample results from one iteration of each task for your reference, located in /examples/[task-name]/[task-name]_save/. [task-name]_[step-count].txt contains the in-distribution accuracy at step [step-count], while hans_[step-count].txt and hansres_[step-count].txt contain the HANS evaluation labels and the HANS accuracy broken down into categories and subcases, respectively.
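For example, for the NLI task a save directory might contain files like these (the step count is illustrative):
ls examples/mnli/mnli_save/
# mnli_2000.txt      in-distribution (MNLI) accuracy at step 2000
# hans_2000.txt      HANS evaluation labels at step 2000
# hansres_2000.txt   HANS accuracy by category and subcase at step 2000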
To run your experiment, simply configure mnli.sh or boolq.sh with your parameters and run bash mnli.sh or bash boolq.sh. Your model will be trained, and its evaluation results will be saved at every checkpoint.
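Because training can take a while, you may want to keep a log and detach from the run, for example:
nohup bash mnli.sh > mnli_run.log 2>&1 &
tail -f mnli_run.log   # follow progress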
If your GPU environment uses Slurm, feel free to use the mnli_finetune.scr or boolq_finetune.scr scripts.
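Submission follows the usual Slurm workflow; you may need to adjust the #SBATCH headers (partition, GPU request, time limit) for your cluster:
sbatch mnli_finetune.scr
squeue -u $USER   # check job status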
On a K80 GPU, BoolQ training will take a couple of hours, while MNLI training may take between 12 and 18 hours. Note that the experiment only produces test set accuracies as [task-name]_[step-count].txt and HANS label outputs as hans_[step-count].txt. Run python evaluate_heur_output.py hans_[step-count].txt && mv formattedFile.txt hansres_[step-count].txt to produce HANS performance on categories and subcases.
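If you have several checkpoints, a short loop can post-process all of them (a sketch, assuming the hans_*.txt files are in the current directory):
for f in hans_*.txt; do
    step="${f#hans_}"; step="${step%.txt}"
    python evaluate_heur_output.py "$f"
    mv formattedFile.txt "hansres_${step}.txt"
done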
If you have any questions, please feel free to contact us at jmin10@jhu.edu, tom.mccoy@jhu.edu, and linzen@nyu.edu.