About

This repository is our submission for Assignment 2 of the course Information Retrieval (CS F469), offered in the 2nd semester of 2019-2020 at BITS Pilani, Pilani Campus.

It implements a TF-IDF vector space model that ranks documents with respect to queries, with two additional improvements: spelling correction on queries, and a bigram index to better answer phrasal queries.
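As a rough illustration of the ranking idea, here is a minimal sketch of TF-IDF scoring with cosine-style length normalisation. It is not the code in util.py or test_queries.py: the function names `build_index` and `rank`, the whitespace tokenisation, and the log-weighted TF-IDF scheme are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index and per-document vector lengths.

    docs: dict mapping doc_id -> text (hypothetical input format).
    """
    inv_index = defaultdict(dict)  # term -> {doc_id: term frequency}
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            inv_index[term][doc_id] = tf

    n_docs = len(docs)
    squared = defaultdict(float)
    for term, postings in inv_index.items():
        idf = math.log10(n_docs / len(postings))
        for doc_id, tf in postings.items():
            weight = (1 + math.log10(tf)) * idf
            squared[doc_id] += weight * weight
    doc_lengths = {doc_id: math.sqrt(v) for doc_id, v in squared.items()}
    return dict(inv_index), doc_lengths


def rank(query, inv_index, doc_lengths, n_docs, k=10):
    """Return up to k (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for term, qtf in Counter(query.lower().split()).items():
        postings = inv_index.get(term)
        if not postings:
            continue  # term not in the index
        idf = math.log10(n_docs / len(postings))
        q_weight = (1 + math.log10(qtf)) * idf
        for doc_id, tf in postings.items():
            scores[doc_id] += q_weight * (1 + math.log10(tf)) * idf

    ranked = []
    for doc_id, score in scores.items():
        length = doc_lengths.get(doc_id, 0.0)
        if length > 0:  # skip documents whose vector is all zeros
            ranked.append((doc_id, score / length))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = {1: "the cat sat", 2: "the dog barked", 3: "a cat and a dog"}
    inv_index, doc_lengths = build_index(docs)
    print(rank("cat", inv_index, doc_lengths, len(docs)))
```

The query vector is not normalised here, since dividing every score by the same query length would not change the ordering.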

Use

To create the inverted index and other data structures, run python3 util.py

Enter the path to the corpus file (for example, the wiki_02 file in this repository).
Part-1 and part-2, improvement1 (spelling correction) use the same index, so enter 1 for either.
Part-2, improvement2 (phrasal queries via a bigram index) needs a new index, so enter 2.
All the files are stored in the current directory.
For option 1, the files stored are inv_index.pkl, doc_lengths.pkl and doc_id_2_title.pkl.
For option 2, the files stored are inv_index.pkl, doc_lengths.pkl, doc_id_2_title.pkl and doc_bi_lengths.pkl.
Note that the file names are the same in both cases.
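To give a feel for what a bigram index for phrasal queries can look like, the following is a minimal sketch. It does not reproduce what option 2 of util.py actually stores: the helper names `bigrams`, `build_bigram_index` and `phrase_candidates`, and the whitespace tokenisation, are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def bigrams(tokens):
    """Return adjacent word pairs, e.g. ['new', 'york'] -> ['new york']."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_bigram_index(docs):
    """Map each bigram to {doc_id: frequency of that bigram in the doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(bigrams(text.lower().split()))
        for gram, tf in counts.items():
            index[gram][doc_id] = tf
    return dict(index)


def phrase_candidates(phrase, index):
    """Return the doc ids that contain every bigram of the phrase."""
    grams = bigrams(phrase.lower().split())
    if not grams:  # a single word has no bigrams
        return set()
    posting_sets = [set(index.get(gram, {})) for gram in grams]
    return set.intersection(*posting_sets)


if __name__ == "__main__":
    docs = {
        1: "new york city",
        2: "york new city",
        3: "the new york times",
    }
    index = build_bigram_index(docs)
    print(phrase_candidates("new york", index))
```

Because word order is part of each bigram, document 2 above does not match the phrase "new york" even though it contains both words.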

To query the index, run python3 test_queries.py

Enter the query.
To query against the original index, enter 1 (all the files listed above for option 1 must be in the current directory).
To query against the original index with spelling correction (improvement1), enter 2 (the same files are required).
To query against the combined index, enter 3 (all the files from index construction option 2 must be in the current directory).
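This README does not say which spelling correction method is used. As one possible approach, the sketch below replaces each query term that is missing from the index vocabulary with the closest vocabulary term by Levenshtein edit distance. The names `edit_distance`, `correct_term` and `correct_query`, and the `max_distance` threshold, are hypothetical and not taken from test_queries.py.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def correct_term(term, vocabulary, max_distance=2):
    """Return the closest vocabulary term, or the term itself if none is close."""
    if term in vocabulary:
        return term
    # Ties on distance are broken alphabetically to keep results deterministic.
    best = min(vocabulary, key=lambda w: (edit_distance(term, w), w), default=term)
    return best if edit_distance(term, best) <= max_distance else term


def correct_query(query, vocabulary):
    """Correct every whitespace-separated term of the query."""
    return " ".join(correct_term(t, vocabulary) for t in query.lower().split())


if __name__ == "__main__":
    vocabulary = {"information", "retrieval", "ranking"}
    print(correct_query("informaton retreival", vocabulary))
```

Scanning the whole vocabulary for every unknown term is slow on a large index; it is used here only to keep the sketch short.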

Note

In test_queries.py, the names of the files to be loaded are specified in the load_files() function.
The structure of the corpus file is:

<doc>...</doc>
<doc>...</doc>
...
<doc>...</doc>
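A corpus in this shape can be split into documents with a regular expression. The sketch below is only an illustration and is not how util.py parses the file. The function name `parse_corpus` is hypothetical, and the assumption that the opening tag may carry attributes such as `id` or `title` is a guess that the README does not confirm.

```python
import re

# Matches one <doc ...>...</doc> block; DOTALL lets the body span lines.
DOC_RE = re.compile(r"<doc\b([^>]*)>(.*?)</doc>", re.DOTALL)
# Matches name="value" pairs inside the opening tag (assumed format).
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_corpus(text):
    """Return a list of (attributes, body) pairs, one per document."""
    documents = []
    for raw_attrs, body in DOC_RE.findall(text):
        attrs = dict(ATTR_RE.findall(raw_attrs))
        documents.append((attrs, body.strip()))
    return documents


if __name__ == "__main__":
    sample = '<doc id="1" title="A">\nhello world\n</doc>\n<doc>plain</doc>\n'
    for attrs, body in parse_corpus(sample):
        print(attrs, repr(body))
```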

Kumar-Tarun/document-ranking
