ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

This is the repo for ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models.

Dataset Introduction:

CONTRADOC contains 449 self-contradictory documents (positive examples) and 442 non-contradictory documents (negative examples). The documents are sourced from CNN/DailyMail news, Wikipedia, and story summaries, and range in length from 300 to 2200 tokens. The dataset was created by introducing self-contradictions into documents through a two-step pipeline: modification with GPT-4, followed by human annotation and verification. It is intended as a benchmark for testing a model's ability to find contradictions in long documents.

Dataset Format:

The positive examples are stored under "pos" and the negative examples under "neg". Please refer to the paper for more details on each label.

{"pos":
  {"DOC_ID":
    {"text": DOCUMENT, 
      "evidence": SENTENCE_INTRODUCING_CONTRADICTION,
      "unique id": DOC_ID,
      "doc_type":"story_OR_news_OR_wiki",
      "contra_plug": "Insert_OR_Replace", 
      "contra_type": [contradiction type],
      "scope": "global_OR_local_OR_intra",
      "ref sentences"(optional): [sentences contradict the evidence]
    },
  },
},
{"neg":
  {"DOC_ID":
    {"text": DOCUMENT,
      "doc_type": "story_OR_news_OR_wiki",
      "unique id": DOC_ID
    },
  },
}
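
The layout above can be walked with a few lines of Python. The following is a minimal sketch, not the repository's own loading code: it assumes that ContraDoc.json parses as a single JSON object with top-level "pos" and "neg" keys, and the helper names `load_contradoc` and `flatten_examples`, as well as the sample documents and the `EXAMPLE_TYPE` value, are hypothetical and made up for illustration.

```python
import json
from collections import Counter


def load_contradoc(path):
    """Read the dataset file.

    Assumption: the file is one JSON object with "pos" and "neg" keys.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def flatten_examples(data):
    """Turn the nested pos/neg mapping into a flat list of labeled records."""
    records = []
    for label in ("pos", "neg"):
        for doc_id, entry in data.get(label, {}).items():
            records.append({
                "doc_id": doc_id,
                "label": label,
                "doc_type": entry.get("doc_type"),
                "text": entry["text"],
                # Only positive examples carry contradiction metadata,
                # so these fields are None for negative examples.
                "evidence": entry.get("evidence"),
                "scope": entry.get("scope"),
            })
    return records


# Tiny made-up sample that mirrors the documented fields (not real data).
sample = {
    "pos": {
        "doc_1": {
            "text": "The shop opened in 1990. Years later, the shop opened in 2005.",
            "evidence": "Years later, the shop opened in 2005.",
            "unique id": "doc_1",
            "doc_type": "story",
            "contra_plug": "Insert",
            "contra_type": ["EXAMPLE_TYPE"],  # placeholder, not a real label
            "scope": "global",
        }
    },
    "neg": {
        "doc_2": {
            "text": "The council met on Monday and approved the budget.",
            "doc_type": "news",
            "unique id": "doc_2",
        }
    },
}

records = flatten_examples(sample)
label_counts = Counter(r["label"] for r in records)
print(label_counts)
```

To run this on the real file, replace `sample` with `load_contradoc("ContraDoc.json")`. If the file turns out to be structured differently from the assumption above, `flatten_examples` will need adjusting.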

Repository Files:

ContraDoc.json
LICENSE
README.md
eval_metric.py

We will be releasing the code for evaluation and dataset creation soon.

Repository: ddhruvkr/CONTRADOC