Pyserini: Usage of the Query Builder API

The querybuilder module provides functionality to construct Lucene queries through Pyserini. These queries can be issued directly through the LuceneSearcher. Instead of passing the query string "hubble space telescope" to the searcher, we can construct the exact same query manually as follows:

from pyserini.search.lucene import querybuilder

# First, create term queries for each individual query term:
term1 = querybuilder.get_term_query('hubble')
term2 = querybuilder.get_term_query('space')
term3 = querybuilder.get_term_query('telescope')

# Then, assemble into a "bag of words" query:
should = querybuilder.JBooleanClauseOccur['should'].value

boolean_query_builder = querybuilder.get_boolean_query_builder()
boolean_query_builder.add(term1, should)
boolean_query_builder.add(term2, should)
boolean_query_builder.add(term3, should)

query = boolean_query_builder.build()
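
The three-step pattern above (create term queries, add each one as a "should" clause, build) generalizes to any list of terms. Here is a minimal sketch of a helper that wraps it in a loop. The name build_bag_of_words_query and its qb parameter are hypothetical and not part of Pyserini; the helper only uses the querybuilder calls shown above.

```python
def build_bag_of_words_query(terms, qb=None):
    """Build a boolean query with one 'should' clause per term.

    Hypothetical helper, not part of Pyserini. `qb` defaults to Pyserini's
    querybuilder module; passing it in explicitly makes the helper easy to
    exercise without a JVM.
    """
    if qb is None:
        # Imported lazily because loading Pyserini starts a JVM.
        from pyserini.search.lucene import querybuilder as qb

    should = qb.JBooleanClauseOccur['should'].value
    builder = qb.get_boolean_query_builder()
    for term in terms:
        # Same calls as above: one term query per term, added as 'should'.
        builder.add(qb.get_term_query(term), should)
    return builder.build()
```

Under these assumptions, build_bag_of_words_query(['hubble', 'space', 'telescope']) would produce the same query as the code above.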

Then issue the query:

from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index('robust04')

# Generate your query, per above...

hits = searcher.search(query)

# Print up to the top 10 hits (fewer if the searcher returns fewer).
for i in range(min(10, len(hits))):
    print(f'{i+1:2} {hits[i].docid:15} {hits[i].score:.5f}')

The results should be exactly the same as:

hits = searcher.search('hubble space telescope')
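
To check this equivalence programmatically rather than by eye, we can compare the two hit lists position by position. The sketch below assumes only that each hit exposes docid and score attributes, as in the print loop above. The function same_ranking is a hypothetical helper, and the stand-in hits use made-up docids and scores, not real results from robust04.

```python
import math
from types import SimpleNamespace


def same_ranking(hits_a, hits_b, tol=1e-5):
    """Return True if both lists hold the same docids, in the same order,
    with scores that agree within `tol`."""
    if len(hits_a) != len(hits_b):
        return False
    return all(
        a.docid == b.docid and math.isclose(a.score, b.score, abs_tol=tol)
        for a, b in zip(hits_a, hits_b)
    )


# Stand-in hits for illustration only; with Pyserini these would be the
# lists returned by searcher.search(query) and searcher.search('...').
built = [SimpleNamespace(docid='doc-a', score=10.5),
         SimpleNamespace(docid='doc-b', score=9.25)]
parsed = [SimpleNamespace(docid='doc-a', score=10.5),
          SimpleNamespace(docid='doc-b', score=9.25)]

print(same_ranking(built, parsed))  # True
```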

By manually constructing queries, it is possible to define the boost for each query term individually. For example:

boost1 = querybuilder.get_boost_query(term1, 2.)
boost2 = querybuilder.get_boost_query(term2, 1.)
boost3 = querybuilder.get_boost_query(term3, 1.)

should = querybuilder.JBooleanClauseOccur['should'].value

boolean_query_builder = querybuilder.get_boolean_query_builder()
boolean_query_builder.add(boost1, should)
boolean_query_builder.add(boost2, should)
boolean_query_builder.add(boost3, should)

query = boolean_query_builder.build()

hits = searcher.search(query)

# Print up to the top 10 hits (fewer if the searcher returns fewer).
for i in range(min(10, len(hits))):
    print(f'{i+1:2} {hits[i].docid:15} {hits[i].score:.5f}')

Note that the results differ from those of the unboosted query, because the term hubble now carries twice the weight of the other two terms.
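
To see why a boost can reorder results, here is a toy model in plain Python. It is a simplification, not Lucene's actual scoring code: it assumes a document's score is the sum of its matching "should" clause scores and that a boost multiplies the score of the clause it wraps. The per-term scores and document names are made up for illustration.

```python
def rank(per_term_scores, boosts):
    """Rank documents by the boost-weighted sum of their per-term scores.

    Terms missing from `boosts` get the neutral boost 1.0.
    """
    totals = {
        doc: sum(boosts.get(term, 1.0) * score for term, score in scores.items())
        for doc, scores in per_term_scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up per-term scores for two hypothetical documents.
scores = {
    'doc-a': {'hubble': 2.0, 'space': 0.5},
    'doc-b': {'space': 1.5, 'telescope': 1.5},
}

print(rank(scores, {}))               # [('doc-b', 3.0), ('doc-a', 2.5)]
print(rank(scores, {'hubble': 2.0}))  # [('doc-a', 4.5), ('doc-b', 3.0)]
```

In this toy example, doubling the weight of hubble moves the document that matches hubble from second place to first, which is the kind of change we see in the boosted results above.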