
MRC Ablation

This is a repository for the paper "Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets" (Sugawara et al., AAAI 2020).

Analyzed Datasets

| | Dataset | year | web | spec | size | paper | misc |
|---|---|---|---|---|---|---|---|
| 1 | CoQA | 2018 | link | dialogue-based QA | 127k | link | |
| 2 | DuoRC | 2018 | link | QA on movie scripts | 186k | link | |
| 3 | HotpotQA | 2018 | link | multi-hop reasoning | 113k | link | |
| 4 | SQuAD1.1 | 2016 | link | QA on Wikipedia | 100k | link | |
| 5 | SQuAD2.0 | 2018 | link | unanswerable QA on Wikipedia | 100k | link | |
| 6 | ARC | 2018 | link | science exam on retrieved docs | 8k | link | |
| 7 | MCTest | 2015 | link | children-level narrative QA | 2.6k | link | |
| 8 | MultiRC | 2018 | link | multi-sentence QA | 6k | link | |
| 9 | RACE | 2017 | link | English exam | 100k | link | |
| 10 | SWAG | 2018 | link | machine-generated commonsense QA | 113k | link | |

Scripts for Ablation

Our codebase extends Hugging Face's BERT implementation (the repository originally named huggingface/pytorch-pretrained-bert, as of Nov. 2018).

Coming soon.

Ablation Methods

Each dataset directory under results contains the following directories:

| | Ablation method | Directory | Description |
|---|---|---|---|
| 0 | original | original | the original data (development set) |
| 1 | Question interrogatives only | drop_question_except_interrogatives | drop question words except interrogatives (wh*, how) |
| 2 | Function words only | drop_content_words | drop content words (verb, noun, ...) |
| 3 | Content words only | drop_function_words | drop function words (= stop words here) |
| 4 | Vocabulary anonymization | vocab_anon | replace tokens with their POS tags |
| 5 | Question-context similarity | drop_except_most_similar_sentences | keep the sentences that are the most similar to the question in terms of unigram overlap and drop the other sentences |
| 6 | Shuffle context words | shuffle_document_words | randomly shuffle all words in the context |
| 7 | Shuffle sentence words | shuffle_sentence_words | randomly shuffle the words in all the sentences except the last token |
| 8 | Shuffle sentence order | shuffle_sentence_order | randomly shuffle the order of the sentences in the context |
| 9 | Dummy numerics | mask_numerics | replace numerical expressions with random numbers |
| 10 | Logical words dropped | drop_logical_words | drop logical terms such as not, every, and if |
| 11 | Pronoun words dropped | mask_pronouns | drop personal and possessive pronouns (PRP and PRP$ tags) |
| 12 | Causal words dropped | drop_causal_words | drop causal terms/clauses such as because and therefore |
| 3' | (trained) content words only | train_content_only | drop function words (= stop words here) (also in training) |
| 6' | (trained) shuffle context words | train_doc_shuff | randomly shuffle all words in the context (also in training) |
| 7' | (trained) shuffle sentence words | train_sent_shuff | randomly shuffle the words in all the sentences except the last token (also in training) |
| x | Context dropped | drop_question_words | drop all question words |
| y | Question dropped | drop_context_words | drop all context words |
| z | Options only | drop_except_options | drop all question and context words (only for multiple choice datasets) |
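
To give a feel for what some of these methods do, here is a minimal sketch of simplified versions of methods 3, 5, 7, and 8 on pre-tokenized text. This is not the code used for the paper (the ablation scripts are not included here). The function names, the tiny stop-word list, the choice to keep a single sentence for method 5, and the tie-breaking rule are all assumptions made for illustration.

```python
import random

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch;
# the list actually used for the paper is not specified in this README).
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "in", "on", "to", "and", "it"}


def drop_function_words(tokens):
    """Method 3 (content words only): remove tokens found in the stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def keep_most_similar_sentences(sentences, question_tokens, k=1):
    """Method 5: keep the k sentences with the highest unigram overlap with the
    question, preserving their original order (k=1 is an assumption)."""
    q = {t.lower() for t in question_tokens}
    scored = [(len(q & {t.lower() for t in s}), i) for i, s in enumerate(sentences)]
    # Highest overlap first; ties are broken by earlier position (an assumption).
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    keep = sorted(i for _, i in top)
    return [sentences[i] for i in keep]


def shuffle_sentence_words(sentences, seed):
    """Method 7: shuffle the words inside each sentence, keeping the last token fixed."""
    rng = random.Random(seed)
    result = []
    for tokens in sentences:
        body, last = list(tokens[:-1]), list(tokens[-1:])
        rng.shuffle(body)
        result.append(body + last)
    return result


def shuffle_sentence_order(sentences, seed):
    """Method 8: shuffle the order of the sentences, leaving each sentence intact."""
    rng = random.Random(seed)
    result = list(sentences)
    rng.shuffle(result)
    return result


# Toy example (made-up text, not taken from any of the datasets).
context = [
    "Tom went to the market .".split(),
    "He bought three apples .".split(),
]
question = "What did Tom buy ?".split()

content_only = drop_function_words(context[0])
most_similar = keep_most_similar_sentences(context, question)
words_shuffled = shuffle_sentence_words(context, seed=1)
order_shuffled = shuffle_sentence_order(context, seed=1)
```

In this toy example, `content_only` is `['Tom', 'went', 'market', '.']`, and `most_similar` contains only the first sentence, because it is the only one that shares a word ("Tom") with the question.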

For the shuffle-based methods, results are provided for five different random seeds (seed1 to seed12345).
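
Since a shuffle depends on the random seed, per-seed results are usually summarized together. The sketch below shows how a fixed seed makes a shuffle reproducible and how scores from several seeds could be averaged. The helper names, the run labels, and the scores are placeholders made up for this example. They are not values or file formats from this repository.

```python
import random
import statistics


def shuffle_document_words(tokens, seed):
    """Shuffle all tokens of a context with a seeded generator (reproducible)."""
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


def summarize(scores_by_run):
    """Return the mean and sample standard deviation over per-seed scores."""
    values = list(scores_by_run.values())
    return statistics.mean(values), statistics.stdev(values)


tokens = "the cat sat on the mat".split()
# The same seed always yields the same permutation.
same_seed_matches = shuffle_document_words(tokens, 1) == shuffle_document_words(tokens, 1)

# Placeholder scores for five hypothetical runs (not real results).
placeholder_scores = {"run_a": 50.0, "run_b": 52.0, "run_c": 54.0, "run_d": 56.0, "run_e": 58.0}
mean_score, std_score = summarize(placeholder_scores)
```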

Each result directory contains an args_log.txt file that records the hyperparameters used.
