MehdiSaffar/cmpe493-term-project-covid-trec-search-engine

CMPE 493 - Term Project 2020

Retrieves the documents most relevant to a query from the TREC-COVID Complete dataset. More details are in project_description.pdf.

The project uses Python 3.8.6.

Members

  • Mehdi Saffar
  • Hüseyin Can Bölükbaş
  • Burak Berk Özer

How to Run

Preprocessing

Normally, preprocessing would read ./data/metadata.csv, but we found that reading the CSV file is too slow. We use the Feather file format instead, which loads much faster and helps us iterate quickly.

Make sure the following files exist inside the ./data folder:

  • Faster-loading metadata file: metadata.feather
  • File containing evaluations of queries: eval.txt
  • File containing topics: topics-rnd5.xml
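
This README does not show how metadata.feather is produced. The following is a minimal sketch of a one-time conversion from the CSV, assuming pandas and pyarrow are installed. The function names feather_path_for and convert_csv_to_feather are hypothetical and are not part of the repository.

```python
from pathlib import Path


def feather_path_for(csv_path):
    """Return the sibling path with a .feather extension (hypothetical helper)."""
    return Path(csv_path).with_suffix(".feather")


def convert_csv_to_feather(csv_path):
    """Read the CSV once and write it back out in Feather format.

    Assumption: pandas and pyarrow are installed. They are imported here
    so that the helper above stays usable without them.
    """
    import pandas as pd

    out_path = feather_path_for(csv_path)
    # low_memory=False makes pandas infer each column's dtype from the whole file.
    frame = pd.read_csv(csv_path, low_memory=False)
    # Writing may fail on columns that mix Python types; cast those to str first.
    frame.to_feather(out_path)
    return out_path
```

Calling convert_csv_to_feather("./data/metadata.csv") once would write ./data/metadata.feather, which pandas.read_feather can then load on later runs.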

To preprocess the documents, run the following command:

$ python prep.py

Query

To query the documents, run the following command:

$ python query.py [INSERT_YOUR_QUERY_HERE]

Examples:

$ python query.py "coronavirus"
$ python query.py "virus origin"

Evaluation

For odd topics use:

$ python eval.py odd

For even topics use:

$ python eval.py even
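
How eval.py splits the topics is not shown here. As a rough sketch, the odd/even selection could be done by parsing the topics file and filtering on the topic number. The <topic number="..."> layout, the sample queries, and the function name select_topics are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for topics-rnd5.xml. The element and attribute names are an
# assumption about the file layout, and the queries are illustrative.
SAMPLE_TOPICS = """<topics>
  <topic number="1"><query>coronavirus origin</query></topic>
  <topic number="2"><query>coronavirus response to weather changes</query></topic>
  <topic number="3"><query>coronavirus immunity</query></topic>
</topics>"""


def select_topics(xml_text, parity):
    """Return {topic number: query text} for the odd or even topics."""
    if parity not in ("odd", "even"):
        raise ValueError("parity must be 'odd' or 'even'")
    remainder = 1 if parity == "odd" else 0
    selected = {}
    for topic in ET.fromstring(xml_text).iter("topic"):
        number = int(topic.get("number"))
        if number % 2 == remainder:
            selected[number] = topic.findtext("query", default="").strip()
    return selected
```

With the sample above, select_topics(SAMPLE_TOPICS, "odd") keeps topics 1 and 3, and "even" keeps topic 2.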

After either command finishes, the "mAP", "ndcg" and "P.10" metrics are printed to the terminal.
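
For readers unfamiliar with these metrics, here is a minimal sketch of the textbook definitions for a single topic. It is not taken from eval.py, which may compute them differently (for example with another DCG gain function). The function names and the toy ranking are assumptions; mAP is the mean of average_precision over all evaluated topics.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked, gains, k=10):
    """nDCG@k with linear gains; `gains` maps doc id to graded relevance."""
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy ranking: d1 is highly relevant (2), d3 is partially relevant (1).
ranked = ["d1", "d2", "d3", "d4"]
gains = {"d1": 2, "d3": 1}
relevant = set(gains)
print(precision_at_k(ranked, relevant, k=2))
print(average_precision(ranked, relevant))
print(ndcg_at_k(ranked, gains, k=4))
```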
