Diagnostic NLI dataset for evaluating presuppositional reasoning in conditionals (the proviso problem). 8,500 examples with controlled linguistic variations.
Within the scripts folder, the Dataset_Creation subfolder contains the script used to produce the CONFER Challenge Subsets, while the subfolders Zero-Shot Evaluation, Subset 2, Subset 3, and Subset 4 contain the scripts to run the respective experiments. The Interpret folder contains the Interpret PyTorch implementation of Integrated Gradients (from https://github.com/koren-v/Interpret.git) used on Subsets 1-3. The Results_Processing subfolder contains the scripts for processing the results and producing the visualizations.
The filepaths for the datasets will need to be updated accordingly. HuggingFace and OpenAI tokens need to be added.