
An information retrieval system for Boolean queries, proximity queries and wildcard queries, using inverted indexing, biword indexing, positional indexing and Soundex indexing.
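Soundex indexing groups terms that sound alike under a shared four-character code, so a misspelled or phonetically spelled query term can still reach matching vocabulary entries. The sketch below is a minimal version of the standard American Soundex algorithm plus a code-to-terms index. The names `soundex` and `build_soundex_index` are hypothetical and are not taken from Soundex.py or SoundexIndex.py.

```python
from collections import defaultdict

# Standard American Soundex digit groups; vowels, y, h and w get no digit.
_GROUPS = {
    "bfpv": "1", "cgjkqsxz": "2", "dt": "3",
    "l": "4", "mn": "5", "r": "6",
}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of `word` (e.g. 'Robert' -> 'R163')."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its digit still blocks an identical next digit
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it is kept
    return code.ljust(4, "0")  # pad short codes with zeros


def build_soundex_index(terms):
    """Hypothetical helper: map each Soundex code to the set of terms sharing it."""
    index = defaultdict(set)
    for term in terms:
        index[soundex(term)].add(term)
    return index


if __name__ == "__main__":
    index = build_soundex_index(["robert", "rupert", "rubin"])
    print(sorted(index[soundex("Rupert")]))  # ['robert', 'rupert']
```

At query time the user's term is converted with the same function and looked up in the index, so "Rupert" retrieves both "robert" and "rupert".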


Ojas1804/InfoRet-System


DESCRIPTION OF EACH FILE:

There are 12 files in this assignment folder:

  • BooleanOperator.py: defines the AND, OR and NOT operators on lists.
  • Conversion.py: defines infix-to-postfix conversion of Boolean expressions.
  • ExtendedBinaryRetrieval.py: defines the extended binary retrieval model (phrase queries with a biword index).
  • InverseIndex.py: defines basic inverted indexing.
  • Lemmatizer.py: defines tokenization and lemmatization of text.
  • main.py: the main program, from which you can test this assignment.
  • Query.py: defines query processing (both normal and biword query processing).
  • README.md: this file.
  • Stack.py: defines the operations of a stack data structure.
  • PositionalIndex.py: defines positional indexing.
  • SoundexIndex.py: defines Soundex indexing.
  • Soundex.py: defines the Soundex algorithm.

  • ExtendedBinaryRetrieval.py extends InverseIndex.py
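Taken together, Conversion.py, Stack.py and BooleanOperator.py suggest a classic pipeline: convert the infix query to postfix, then evaluate the postfix expression with a stack, applying AND, OR and NOT to posting lists. The following is a minimal sketch of that pipeline under stated assumptions: posting lists are sorted lists of integer document IDs, operators are the lowercase words `and`, `or` and `not`, parentheses are balanced, and a plain Python list stands in for the Stack class. All function names (`tokenize`, `to_postfix`, `intersect`, `evaluate`) are hypothetical.

```python
# Higher number binds tighter: NOT > AND > OR.
PRECEDENCE = {"not": 3, "and": 2, "or": 1}


def tokenize(query):
    """Lowercase the query and split it into terms, operators and parentheses."""
    return query.lower().replace("(", " ( ").replace(")", " ) ").split()


def to_postfix(tokens):
    """Shunting-yard conversion of an infix Boolean query (balanced parentheses assumed)."""
    output, stack = [], []  # a plain Python list serves as the operator stack
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        elif tok in PRECEDENCE:
            # "and"/"or" are left-associative; prefix "not" pops nothing
            while (stack and stack[-1] != "(" and tok != "not"
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # a query term
    while stack:
        output.append(stack.pop())
    return output


def intersect(p1, p2):
    """AND: linear merge of two sorted posting lists."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def evaluate(postfix, index, all_docs):
    """Evaluate a postfix query against {term: sorted doc IDs}; all_docs must be sorted."""
    stack = []
    for tok in postfix:
        if tok == "not":
            excluded = set(stack.pop())
            stack.append([d for d in all_docs if d not in excluded])
        elif tok == "and":
            right, left = stack.pop(), stack.pop()
            stack.append(intersect(left, right))
        elif tok == "or":
            right, left = stack.pop(), stack.pop()
            stack.append(sorted(set(left) | set(right)))
        else:
            stack.append(index.get(tok, []))  # unknown terms match nothing
    return stack.pop()


if __name__ == "__main__":
    index = {"brutus": [1, 2, 4], "caesar": [1, 2, 3, 5], "calpurnia": [2]}
    postfix = to_postfix(tokenize("brutus and caesar and not calpurnia"))
    print(postfix)  # ['brutus', 'caesar', 'and', 'calpurnia', 'not', 'and']
    print(evaluate(postfix, index, all_docs=[1, 2, 3, 4, 5]))  # [1]
```

NOT is computed as the complement against the full document list, which is why `evaluate` takes `all_docs`.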

FOLDERS:

The dataset (corpus) for this assignment is in the Dataset folder. posting_list.txt and biword_index.txt contain the posting lists for single words and for biwords, respectively.
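A biword index stores each pair of consecutive terms as a single dictionary key, so a phrase query can be answered by intersecting the posting lists of the phrase's consecutive pairs. The sketch below builds both a single-word index and a biword index from a small in-memory corpus. It assumes plain whitespace tokenization in place of Lemmatizer.py, and `build_indexes` and `phrase_query` are hypothetical names, not the repository's API.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a single-word inverted index and a biword index from {doc_id: text}."""
    inverted, biword = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # stand-in for real tokenization/lemmatization
        for tok in tokens:
            inverted[tok].add(doc_id)
        for first, second in zip(tokens, tokens[1:]):
            biword[f"{first} {second}"].add(doc_id)
    # Posting lists are returned as sorted lists of document IDs.
    return ({t: sorted(ids) for t, ids in inverted.items()},
            {b: sorted(ids) for b, ids in biword.items()})


def phrase_query(phrase, inverted, biword):
    """Intersect the postings of every consecutive biword in the phrase."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    if len(tokens) == 1:
        return inverted.get(tokens[0], [])  # single words use the ordinary index
    result = None
    for first, second in zip(tokens, tokens[1:]):
        postings = set(biword.get(f"{first} {second}", []))
        result = postings if result is None else result & postings
    return sorted(result)


if __name__ == "__main__":
    docs = {1: "new york university", 2: "york university in new york", 3: "new delhi"}
    inverted, biword = build_indexes(docs)
    print(phrase_query("new york university", inverted, biword))  # [1, 2]
```

In the example, document 2 is returned even though it never contains the full phrase "new york university": it contains both "new york" and "york university", just not contiguously. False positives like this are a known limitation of biword indexes for phrases longer than two words; a positional index avoids them.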

The Indexes folder contains the indexes generated by the program. The indexes are stored as dictionaries, saved to text files.
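The README does not say which text format the saved dictionaries use. One simple option, assumed here purely for illustration, is JSON via Python's standard `json` module; `save_index` and `load_index` are hypothetical helpers, not functions from this repository.

```python
import json
import tempfile
from pathlib import Path


def save_index(index, path):
    """Write a {term: [doc_id, ...]} dictionary to a UTF-8 text file as JSON."""
    Path(path).write_text(json.dumps(index, sort_keys=True, indent=2), encoding="utf-8")


def load_index(path):
    """Read a dictionary written by save_index back into memory."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    postings = {"brutus": [1, 2, 4], "caesar": [1, 2, 3, 5]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "posting_list.txt"  # illustrative file name only
        save_index(postings, path)
        print(load_index(path) == postings)  # True
```

JSON object keys are always strings, so this round-trips term-keyed indexes unchanged, but a dictionary keyed by integer document IDs comes back with string keys after loading.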

THINGS TO ADD:

  • Implementing indexes through B+-trees
  • Better class structure
  • Processing proximity queries
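Proximity query processing is listed above as still to be added. Assuming a positional index that maps each term to `{doc_id: [positions]}`, a query of the form "term1 within k words of term2" can be answered by walking the two sorted position lists of each shared document with two pointers. This is a minimal sketch, not the repository's PositionalIndex.py; `build_positional_index`, `within_k` and `proximity_query` are hypothetical names, and tokens are split on whitespace.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} for a {doc_id: text} corpus."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(text.lower().split()):
            index[tok][doc_id].append(pos)  # positions are appended in increasing order
    return {term: dict(postings) for term, postings in index.items()}


def within_k(pos1, pos2, k):
    """True if some pair of positions from two sorted lists is at most k apart."""
    i = j = 0
    while i < len(pos1) and j < len(pos2):
        if abs(pos1[i] - pos2[j]) <= k:
            return True
        # The smaller position cannot be within k of any later position
        # in the other list, so it is safe to advance past it.
        if pos1[i] < pos2[j]:
            i += 1
        else:
            j += 1
    return False


def proximity_query(term1, term2, k, index):
    """Documents where term1 and term2 occur within k words of each other, in either order."""
    postings1, postings2 = index.get(term1, {}), index.get(term2, {})
    return [doc_id for doc_id in sorted(postings1.keys() & postings2.keys())
            if within_k(postings1[doc_id], postings2[doc_id], k)]


if __name__ == "__main__":
    docs = {1: "to be or not to be", 2: "to live is to be", 3: "be quick"}
    index = build_positional_index(docs)
    print(proximity_query("not", "be", 1, index))  # []
    print(proximity_query("not", "be", 2, index))  # [1]
```

Because positions are kept per document, the same index can also answer exact phrase queries without the false positives of a biword index.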
