
constraint-solvers/benchmark-corruptions


benchmark-corruptions

Public repository containing supplementary material for the work submitted to EMNLP-19.

This repository hosts the corrupted test sets that were generated to benchmark the robustness of the FastText and Bi-LSTM models, as described in the paper "Benchmarking Popular Classification Models' Robustness to Random and Targeted Corruptions". These test sets were derived by applying the two corruption strategies and the different corruption methods to the test sets of the following datasets: SST2, IMDB, YELP and DBPEDIA.

The repository consists of two folders, one for each of the models used. The filenames follow the convention: [DATASET][CORRUPTION STRATEGY][CORRUPTION METHOD][MODEL]_n

  • DATASET: the base dataset that was used
  • CORRUPTION STRATEGY: either Random or LIME-based corruption
  • CORRUPTION METHOD: how the selected word was modified (deleted, spelling error introduced, text noise introduced, or replaced by a synonym)
  • MODEL: the model used for LIME explanations
  • n: the number of words changed as per the strategy

PS: For the FastText model, the files are in the format LABEL[TAB]TEXT. For the Bi-LSTM model, we use a self-explanatory JSON format to be compatible with AllenNLP.
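As a minimal sketch of how the LABEL[TAB]TEXT files could be read, the snippet below splits each line on its first tab. The function name `read_label_text` and the sample lines are hypothetical, made up for illustration; they are not taken from the repository, and the actual label values in the files may differ.

```python
from typing import Iterable, Iterator, Tuple


def read_label_text(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, text) pairs from lines shaped like LABEL<TAB>TEXT."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so any tabs inside the text survive.
        # A line without a tab raises ValueError here.
        label, text = line.split("\t", 1)
        yield label, text


# Hypothetical sample lines (assumption: not real data from the test sets).
sample = ["1\ta gripping and well acted film\n", "0\tdull and overlong\n"]
pairs = list(read_label_text(sample))
```

An open file handle can be passed in place of `sample`, since iterating over a text file yields one line at a time.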
