Skip to content

ddhruvkr/CONTRADOC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

This is the repo for ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models.

Dataset Introduction:

CONTRADOC contains 449 self-contradictory(positive examples) and 442 non-contradictory(negative examples) documents, covering documents sourced from CNN_Dailymain News, Wikipedia and Story Summaries, document length varying from 300 to 2200 tokens. This dataset is created by introducing self-contradiction to documents using GPT-4-Modify => Human Annotate and Verify. This dataset is developed as a benchmark to test model's ability in finding contradiction in long document.

Dataset Format:

The positive examples are in "pos", while negative examples are in "neg", please refer to the paper for more detials of each label.

{"pos":
  {"DOC_ID":
    {"text": DOCUMENT, 
      "evidence": SENTENCE_INTRODUCING_CONTRADICTION,
      "unique id": DOC_ID,
      "doc_type":"story_OR_news_OR_wiki",
      "contra_plug": "Insert_OR_Replace", 
      "contra_type": [contradiction type],
      "scope": "global_OR_local_OR_intra",
      "ref sentences"(optional): [sentences contradict the evidence]
    },
  },
},
{"neg":
  {"DOC_ID":
    {"text": DOCUMENT,
      "doc_type": "story_OR_news_OR_wiki",
      "unique id": DOC_ID
    },
  },
}

We will be releasing the code for evaluation and dataset creation soon.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages