Retrieve the documents most relevant to a query from the TREC-COVID Complete dataset. More details are in project_description.pdf.
Using Python 3.8.6
- Mehdi Saffar
- Hüseyin Can Bölükbaş
- Burak Berk Özer
Normally we would use ./data/metadata.csv for preprocessing,
but we found that reading the CSV file is too slow. We decided to use the Feather
file format instead, which loads much faster and lets us iterate quickly.
Make sure the following files exist inside the ./data folder:
- Faster-loading metadata file:
metadata.feather
- File containing evaluations of queries:
eval.txt
- File containing topics:
topics-rnd5.xml
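Before running the scripts, a quick check like the one below can confirm that the three files are in place. This is only a sketch: the helper name missing_files is made up for illustration and is not part of prep.py, query.py or eval.py.

```python
from pathlib import Path

# File names taken from the list above; ./data is assumed to be relative to
# the directory the scripts are run from.
REQUIRED_FILES = ("metadata.feather", "eval.txt", "topics-rnd5.xml")


def missing_files(data_dir="./data", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in required if not (data_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from ./data:", ", ".join(missing))
    else:
        print("All required files found.")
```

An empty list means everything needed by the commands below is present.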
To preprocess the documents, run the following command:
$ python prep.py
To query the documents, run the following command:
$ python query.py [INSERT_YOUR_QUERY_HERE]
Examples:
$ python query.py "coronavirus"
$ python query.py "virus origin"
To evaluate on the odd-numbered topics, run:
$ python eval.py odd
To evaluate on the even-numbered topics, run:
$ python eval.py even
After either command finishes, the "mAP", "ndcg" and "P.10" metrics are printed to the terminal.
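For readers unfamiliar with the three metrics, the sketch below shows how each is commonly defined for a single query. It is an illustration under assumptions, not the code in eval.py: the function names, the input shapes (a ranked list of document ids plus a dict of relevance grades), the linear gain and the cutoff of 10 for nDCG are all hypothetical, and eval.py may differ in such details. mAP is the mean of average_precision over all evaluated topics.

```python
import math


def precision_at_k(ranked_ids, relevance, k=10):
    """Fraction of the top-k results with a relevance grade above 0."""
    hits = sum(1 for doc_id in ranked_ids[:k] if relevance.get(doc_id, 0) > 0)
    return hits / k


def average_precision(ranked_ids, relevance):
    """Sum of precision at each relevant hit, divided by all relevant docs."""
    total_relevant = sum(1 for grade in relevance.values() if grade > 0)
    if total_relevant == 0:
        return 0.0
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if relevance.get(doc_id, 0) > 0:
            hits += 1
            score += hits / rank
    return score / total_relevant


def ndcg_at_k(ranked_ids, relevance, k=10):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    def dcg(grades):
        # Linear gain with a log2 discount; position 0 is discounted by log2(2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example with made-up ids and grades (not taken from eval.txt).
ranking = ["d1", "d2", "d3", "d4"]
grades = {"d1": 2, "d3": 1, "d9": 1}
print("P.10:", precision_at_k(ranking, grades))
print("AP:  ", average_precision(ranking, grades))
print("ndcg:", ndcg_at_k(ranking, grades))
```

In the toy example, two of the three relevant documents are retrieved, so all three scores fall below 1; a ranking that returns every relevant document first would reach an nDCG of 1.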