
Simple document search (boolean retrieval or TF-IDF) in Python

AlexP11223/search

Simple document search (boolean retrieval or TF-IDF), created during a university course and based on Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
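For readers unfamiliar with TF-IDF, here is a minimal sketch of the basic weighting from the book: a term's weight in a document is its term frequency multiplied by log10(N / df), where N is the number of documents and df is the number of documents containing the term. This is not the repository's code. The function name tf_idf_scores, the whitespace tokenizer, and the sample documents are assumptions made for illustration, and the actual implementation may use a different variant.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Rank documents by the sum of tf * log10(N / df) over the query terms."""
    n_docs = len(docs)
    # Term counts per document, using a deliberately naive whitespace tokenizer.
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }
    scores = {}
    for term in query.lower().split():
        # Document frequency: number of documents that contain the term.
        df = sum(1 for counts in term_counts.values() if term in counts)
        if df == 0:
            continue  # A term that occurs nowhere contributes nothing.
        idf = math.log10(n_docs / df)
        for doc_id, counts in term_counts.items():
            if term in counts:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample collection, not the repository's data.
sample_docs = {
    1: "machine learning optimization",
    2: "machine learning",
    3: "convex optimization",
    4: "deep learning",
}
ranked = tf_idf_scores(sample_docs, "machine optimization")
print(ranked[0][0])  # 1: the only document containing both query terms
```

Unlike boolean retrieval, this scoring also returns documents that match only some of the query terms, ranked lower.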

Currently only AND queries are supported, e.g. the query machine learning optimization finds documents that contain all three words.
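One common way to evaluate such an AND query is to intersect the sorted postings lists of an inverted index, as described in the book. The sketch below illustrates that technique and is not this repository's code. The names tokenize, build_index, intersect and and_query, the whitespace tokenizer, and the sample documents are all assumptions.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge two sorted postings lists, keeping the IDs present in both."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def and_query(index, query):
    """Return the IDs of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Process the shortest postings lists first to keep intermediate results small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = list(lists[0])
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


# Hypothetical sample collection, not the repository's data.
docs = {
    1: "machine learning optimization methods",
    2: "machine learning for search",
    3: "convex optimization",
}
index = build_index(docs)
print(and_query(index, "machine learning optimization"))  # [1]
```

A query term that is missing from the index has an empty postings list, so the whole AND query returns no documents.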

Usage

Requirements:

Install dependencies by executing pipenv install. Use pipenv shell or pipenv run to run scripts.

Run the tests with pytest: python -m pytest -v test.py, or run them from PyCharm.

Run

python index.py data/data.json output/index.json

to create the index file.

Run

python search.py output/index.json

to execute search queries.
