poly-ir-toolkit

Information retrieval toolkit for large document collections.

Developed at the Web Exploration and Search Technology Lab at the Polytechnic Institute of NYU, under the advisement of Professor Torsten Suel, PolyIRTK provides tools for indexing and querying large document collections.

The aim of PolyIRTK is to act as a platform for research into new algorithms for compression/decompression, querying, and indexing techniques. It implements a number of techniques from recent research literature that improves on the efficiency of indexing and querying. PolyIRTK is also focused on query optimization and early termination algorithms with rank safe properties (that is, faster querying performance, with absolutely no loss in result quality). Another goal of PolyIRTK is to be a search engine library for applications that require fast full text search.

Features

On the querying side: * BM25 document ranking * DAAT algorithms for AND/OR mode querying * WAND query optimization algorithm * MaxScore query optimization algorithm * MaxScore with block and chunk level score skipping query optimization algorithm * Early termination algorithm based on multi-layered indices with list score upperbounds * On disk index with LRU caching (cache size is configurable) or a completely in-memory index * Support for TREC experiments * ...and more

On the indexing side: * In-memory compression of intermediate postings with adjustable memory consumption * Writes out complete (fully queryable) indices (which are merged later) * Supports many fully configurable compression algorithms for docIDs, frequencies, positions, and the block header (Rice, Turbo-Rice, Variable Byte, S9, S16, PForDelta) * Allows docID remapping based on document URLs for better compression and significantly faster querying performance. * Allows indices to be layered (with several layering algorithms) by partial docID scores, used in various early termination algorithms * ...and more

Name		Name	Last commit message	Last commit date
Latest commit History 174 Commits
scripts		scripts
src		src
Makefile		Makefile
README.md		README.md
irtk-index-gov2		irtk-index-gov2
irtk.conf		irtk.conf
irtk.conf.large		irtk.conf.large

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

poly-ir-toolkit

Features

About

Releases

Packages

Contributors 2

Languages

amallia/poly-ir-toolkit

Folders and files

Latest commit

History

Repository files navigation

poly-ir-toolkit

Features

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages