By: Nam Do (Email, Website, Google Scholar), Ellie Pavlick (Email, Website, Google Scholar)
Important links: Paper, Code and Data
BibTeX*:
@article{dorotten,
title={Are Rotten Apples Edible? Challenging Commonsense Inference Ability with Exceptions},
author={Do, Nam and Pavlick, Ellie}
}
The dataset is located at data/winoventi_bert_large_final.tsv. Note that the file is tab-separated (fields are separated by \t). Example code to load the data:
import pandas as pd
data = pd.read_csv("data/winoventi_bert_large_final.tsv", sep="\t")
Dataset length: There are 4352 rows and 9 fields in the dataset, representing 4352 challenges (2176 adversarial, 2176 stereotypical) to a language model.
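The stated dimensions can be checked right after loading. The following is a minimal sketch, assuming pandas is installed; load_winoventi is a hypothetical helper name, not part of the repository, and its default expected counts are simply the ones quoted above.

```python
import pandas as pd


def load_winoventi(path, expected_rows=4352, expected_cols=9):
    """Load the tab-separated dataset and check its shape.

    Hypothetical helper (not part of the repository); the default
    expected counts are the ones stated in this README.
    """
    data = pd.read_csv(path, sep="\t")
    n_rows, n_cols = data.shape
    if (n_rows, n_cols) != (expected_rows, expected_cols):
        raise ValueError(
            f"expected {expected_rows} rows x {expected_cols} fields, "
            f"got {n_rows} x {n_cols}"
        )
    return data
```

Calling load_winoventi("data/winoventi_bert_large_final.tsv") would then either return the DataFrame or raise a ValueError if the file does not have the expected shape.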
Fields: The fields, their types, and what they represent are as follows:
Word: A string representing the entity of interest (from the THINGS dataset, as mentioned in the paper).
Associative Bias: A string representing adjectives that are associated with the entity regardless of whether the context is positive or negative (e.g., apple is associated with edible regardless of whether the context is "The apple is _____" or "The apple is not _____").
Alternative: A string representing the crowdsourced adjectives that might be true of the entity when the associative bias adjective is not (see paper).
biased_word_context: A string representing the context in which the entity is correctly characterized by the associative bias adjective and not by the alternative adjective. See the paper for a more detailed description.
adversarial_word_context: A string representing, conversely, the context in which the entity is correctly characterized by the alternative adjective and not by the associative bias adjective. See the paper for a more detailed description.
masked_prompt: A string that combines the context and the descriptor of the entity, and masks the correct answer.
target: A string representing the correct answer to the masked_prompt.
incorrect: A string representing the incorrect answer to the masked_prompt.
test_type: A number representing the type of challenge that the schema is testing. 1 represents the "stereotypical challenge", which tests whether a language model correctly predicts the associative bias descriptor when the context is the biased_word_context. 2 represents the "exception challenge", which tests whether the language model correctly predicts the alternative descriptor when the context is the adversarial_word_context.
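To illustrate how these fields fit together, the sketch below groups rows by test_type and computes accuracy for each challenge type, counting a row as correct when a scoring function rates target above incorrect for the masked_prompt. It assumes the column headers match the field names listed above. The names accuracy_by_test_type, score_fn, and always_prefers_edible are hypothetical; score_fn stands in for whatever language model scoring you use, and the two toy rows (including the [MASK] placeholder) are made up for illustration rather than taken from the dataset.

```python
import pandas as pd

# test_type codes as described in the field list above.
CHALLENGE_NAMES = {1: "stereotypical", 2: "exception"}


def accuracy_by_test_type(data, score_fn):
    """Return {challenge name: accuracy} for each test_type present.

    score_fn(masked_prompt, candidate) -> float is a hypothetical
    stand-in for a language model's score of a candidate filler.
    """
    results = {}
    for test_type, rows in data.groupby("test_type"):
        # A row is correct when the target outscores the incorrect answer.
        correct = [
            score_fn(row.masked_prompt, row.target)
            > score_fn(row.masked_prompt, row.incorrect)
            for row in rows.itertuples(index=False)
        ]
        name = CHALLENGE_NAMES.get(int(test_type), str(test_type))
        results[name] = sum(correct) / len(correct)
    return results


# Toy rows made up for illustration; not taken from the dataset.
toy = pd.DataFrame(
    {
        "masked_prompt": [
            "The apple is fresh. The apple is [MASK].",
            "The apple is rotten. The apple is [MASK].",
        ],
        "target": ["edible", "inedible"],
        "incorrect": ["inedible", "edible"],
        "test_type": [1, 2],
    }
)


def always_prefers_edible(prompt, candidate):
    """Dummy scorer that ignores context, mimicking an associative bias."""
    return 1.0 if candidate == "edible" else 0.0


print(accuracy_by_test_type(toy, always_prefers_edible))
```

A scorer that ignores context, like the dummy one here, passes the stereotypical toy row and fails the exception toy row, which is the gap the two challenge types are meant to expose.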
Other relevant files in data/:
data/source/things_concepts.tsv: The original THINGS dataset, from which we derived our entities of interest.
data/assets/associativebias_registry.tsv: The file that records the biases that language models associate with our entities of interest.
data/assets/crowdsourcing: Contains the files that we used to prepare our crowdsourcing tasks, as well as the results we collected.
data/assets/finetune: Contains the train/test splits that we used to perform the finetuning experiments mentioned in the paper.