Programs used as a part of the TREC 2017 PM/CDS Track utilizing Python and Elasticsearch.

ajinkyathorve/TREC-2017-PM-CDS-Track

TREC-2017-PM-CDS-Track

Synopsis

The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense, is a series of workshops focused on information retrieval research.
It is organized into research areas called tracks, and each track defines its own challenges built around particular retrieval tasks.
The aim of the TREC 2017 Precision Medicine Track was to provide physicians with useful information for treating cancer patients.
This repository contains a small part of the codebase I used for the TREC 2017 Precision Medicine/Clinical Decision Support Track.

Usage

The repository mainly contains two programs, which should be run in the following order:

  1. extract_xml_to_elastic.py
  • It reads the data from the input XML files and indexes it in Elasticsearch.
  • It is currently configured for the clinical trials dataset, which contains over 241,000 XML files.
  • You will need to modify the path to the input XML files on line 27.
  2. query_elasticsearch.py
  • It queries Elasticsearch with different query topics and writes the output to a file.
  • The output is a text file with retrieved results for each query in the standard trec_eval format.
  • You will need to modify the path to the query XML file on line 25.
  • You may also want to change the name and location of the output text file on line 102.
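
For readers unfamiliar with the trec_eval format mentioned above, here is a minimal sketch of how one query's results could be turned into run-file lines. It is not the code from query_elasticsearch.py. The function name format_trec_run, the run tag and the sample document IDs are assumptions made for illustration. Each line follows the conventional six-column layout: topic ID, the literal Q0, document ID, rank, score and run tag.

```python
def format_trec_run(topic_id, hits, run_tag="my_run"):
    """Return trec_eval-style lines for one topic.

    hits is a list of (doc_id, score) pairs; this helper is hypothetical
    and only illustrates the output layout.
    """
    # Rank documents by descending score, starting at rank 1.
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    lines = []
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        lines.append("{} Q0 {} {} {:.4f} {}".format(
            topic_id, doc_id, rank, score, run_tag))
    return lines


# Made-up document IDs and scores, for illustration only.
sample_hits = [("NCT00000002", 7.25), ("NCT00000001", 12.5)]
for line in format_trec_run(1, sample_hits, run_tag="demo"):
    print(line)
```

Writing these lines to a text file, one per retrieved document, gives a run file that trec_eval can score against the relevance judgments.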

After making the necessary modifications, both programs can be run from the command line as shown below.

python extract_xml_to_elastic.py

and

python query_elasticsearch.py

This project was developed with Elasticsearch 5.5.0 and Python 2.7.12.
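
To illustrate the indexing step described under Usage, here is a rough sketch of how one clinical trial XML record might be converted into a document for Elasticsearch. It is not the code from extract_xml_to_elastic.py. The element names (id_info/nct_id, brief_title, brief_summary/textblock), the index name clinical_trials, the document type trial and the sample record are all assumptions. The resulting dictionaries use the action layout accepted by the bulk helper in the Elasticsearch Python API, and they include a _type field because Elasticsearch 5.x expects one. The sketch uses only the standard library and makes no connection to a cluster.

```python
import xml.etree.ElementTree as ET

# Made-up record; the element names are assumed, not taken from the repository.
SAMPLE_XML = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example trial title</brief_title>
  <brief_summary><textblock>Example summary text.</textblock></brief_summary>
</clinical_study>"""


def trial_to_action(xml_text, index_name="clinical_trials", doc_type="trial"):
    """Build one bulk-index action from a trial's XML (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        # Return stripped element text, or an empty string if it is missing.
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else ""

    nct_id = text_of("id_info/nct_id")
    return {
        "_index": index_name,
        "_type": doc_type,  # required by Elasticsearch 5.x
        "_id": nct_id,
        "_source": {
            "nct_id": nct_id,
            "title": text_of("brief_title"),
            "summary": text_of("brief_summary/textblock"),
        },
    }


action = trial_to_action(SAMPLE_XML)
print(action["_id"], action["_source"]["title"])
```

With a running cluster, a list of such actions could be passed to elasticsearch.helpers.bulk together with a client object. Batching is generally faster than indexing a large number of files one at a time.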

Useful Links

TREC
TREC 2017 Precision Medicine/Clinical Decision Support Track
Elasticsearch
Elasticsearch Python API
