CLinical Information Retrieval Evaluation Collection (CLIREC)
Version 1.0 - July 2010
If you use this dataset, please cite the following articles:
Positional Language Models for Clinical Information Retrieval. Florian Boudin, Jian-Yun Nie, Martin Dawes. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2010.
Clinical Information Retrieval using Document and PICO Structure. Florian Boudin, Jian-Yun Nie, Martin Dawes. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2010.
Deriving a test collection for clinical information retrieval from systematic reviews. Florian Boudin, Jian-Yun Nie, Martin Dawes. Data and Text Mining in Biomedical Informatics (DTMBIO), 2010.
This archive contains three files:
clirec.queries.xml : XML-formatted file containing the queries
Given this example:
Adults Barrett's oesophagus Pharmacological treatment Non-resectional surgical treatment Complete eradication of dysplasia 12 months ...
is the root element
Each clinical query is contained in a element
It has 3 attributes:
- 'id' is the unique identifier of the query
- 'before' is the publication date of the Cochrane review
- 'keywords' is the corresponding keyword query
And up to 6 children:
- is the population/patient query part
- is the problem query part
- is the intervention/exposure query part
- is the comparison query part
- is the clinical outcome query part
- is the duration query part
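As a minimal sketch of how the query file could be read, the Python snippet below collects the three documented attributes ('id', 'before', 'keywords') and the text of each child for every query under the root. It reads element names from the file instead of hard-coding them, so it makes no assumption about the tag names. The function name `parse_queries` and the returned dictionary layout are illustrative choices, not part of the collection.

```python
import xml.etree.ElementTree as ET


def parse_queries(xml_text):
    """Return one dict per element found directly under the root."""
    root = ET.fromstring(xml_text)
    queries = []
    for element in root:
        queries.append({
            # The three documented attributes; .get() returns None if absent.
            "id": element.get("id"),
            "before": element.get("before"),
            "keywords": element.get("keywords"),
            # (tag, text) pairs for the children, in document order.
            "parts": [(child.tag, (child.text or "").strip()) for child in element],
        })
    return queries
```

To apply it to the distributed file, one option is `parse_queries(pathlib.Path("clirec.queries.xml").read_bytes())`, since `ET.fromstring` accepts bytes as well as text.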
collection.pmids : file containing the PMIDs of the document collection used in our experiments. You can use this list of 1,212,042 identifiers to gather your own collection of documents or, alternatively, rerun the search using the following PubMed query:
hasabstract[text] AND "humans"[MeSH Terms] AND (Clinical Trial[ptyp] OR Editorial[ptyp] OR Letter[ptyp] OR Meta-Analysis[ptyp] OR Practice Guideline[ptyp] OR Randomized Controlled Trial[ptyp] OR Review[ptyp]) AND English[lang]
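If you gather the documents from the identifier list, a small helper such as the following Python sketch may be useful to check whether a given document belongs to the collection. It assumes that collection.pmids stores one PMID per line, which is not stated above; the function names `load_pmids` and `in_collection` are hypothetical.

```python
def load_pmids(lines):
    """Collect PMIDs from an iterable of text lines, skipping blank lines.

    Assumption: one PMID per line.
    """
    return {line.strip() for line in lines if line.strip()}


def in_collection(pmid, pmids):
    """Return True if the PMID (given as str or int) is in the loaded set."""
    return str(pmid) in pmids


# Possible usage with the distributed file:
#   with open("collection.pmids") as handle:
#       pmids = load_pmids(handle)
#   in_collection(1724669, pmids)
```

PMIDs are kept as strings so that they compare directly with identifiers read from other text files.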
clirec.qrels : relevance judgments (TREC-like format)
Given this example:
A10.1 1991 1724669 1
A10.1 1998 9540427 1
A10.1 1998 6130330 1
A10.1 2006 16616232 1
...
The first column is the query identifier, the second is the publication date of the relevant document and the third is its PMID.
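The following Python sketch shows one way the relevance judgments could be loaded. It assumes one judgment per line with whitespace-separated fields, uses only the three columns described above, and ignores any further column. The function name `parse_qrels` and the returned structure are illustrative choices.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each query identifier to a list of (publication_date, pmid) pairs."""
    relevant = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        query_id, date, pmid = fields[0], fields[1], fields[2]
        relevant[query_id].append((date, pmid))
    return dict(relevant)


# Possible usage with the distributed file:
#   with open("clirec.qrels") as handle:
#       qrels = parse_qrels(handle)
```

All values are kept as strings, so the PMIDs can be compared directly with those listed in collection.pmids.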