Adding multi-threads support with multiple params to SearchCollection #470

Peilin-Yang · 2018-11-05T03:31:43Z

In this PR I fundamentally change how SearchCollection works.

With this PR one can provide multiple params for base models, for example -b 0.2 0.75 for BM25 or -mu 200 2000 for QL. One can also provide multiple params for reranking models at the same time, for example -rm3.fbDocs 10 20 -rm3.fbTerms 50 100 for RM3. As a result, N1*N2*...*Nn run files are generated, where N1..Nn are the numbers of values given for each param.
Now SearchCollection runs every retrieval in its own thread by default. Together with the newly introduced -inmem option, this makes running many retrievals much cheaper, since the index only needs to be loaded once (optionally into memory, for better multi-threaded performance) and is then shared by all of them.
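To make the N1*N2*...*Nn count concrete, here is a minimal sketch of expanding per-param value lists into every combination (a Cartesian product). `ParamGrid`, `expand`, `tag`, and the `name=value` run-tag format are hypothetical names for illustration, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not Anserini code): expands per-param value lists into
// every combination, i.e. the Cartesian product that yields N1*N2*...*Nn runs.
final class ParamGrid {
  private ParamGrid() {}

  // Pass a LinkedHashMap so params are expanded (and tagged) in a predictable order.
  static List<Map<String, String>> expand(Map<String, List<String>> params) {
    List<Map<String, String>> combos = new ArrayList<>();
    combos.add(new LinkedHashMap<>()); // seed: one empty combination
    for (Map.Entry<String, List<String>> param : params.entrySet()) {
      List<Map<String, String>> next = new ArrayList<>();
      for (Map<String, String> partial : combos) {
        for (String value : param.getValue()) {
          Map<String, String> extended = new LinkedHashMap<>(partial);
          extended.put(param.getKey(), value);
          next.add(extended);
        }
      }
      combos = next;
    }
    return combos;
  }

  // Hypothetical run-tag scheme: "name=value" pairs in insertion order, comma-separated.
  static String tag(Map<String, String> combo) {
    StringJoiner joiner = new StringJoiner(",");
    for (Map.Entry<String, String> e : combo.entrySet()) {
      joiner.add(e.getKey() + "=" + e.getValue());
    }
    return joiner.toString();
  }
}
```

With the example flags above (-b 0.2 0.75 -rm3.fbDocs 10 20 -rm3.fbTerms 50 100) this yields 2*2*2 = 8 combinations, each of which would get its own run file.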

… list of params, construct the Lucene searcher in a for loop, and write the output to different files. This avoids reading the index from disk every time one would like to run another set of params. Reranking is supported in a similar way.
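The load-once, run-many idea can be sketched with a fixed thread pool whose tasks all read one shared, read-only index. This is an assumption-laden illustration, not the PR's `SearcherThread`: `SharedIndex` is a hypothetical stand-in for the Lucene reader/searcher (Lucene documents `IndexSearcher` as safe to use from multiple threads), and the toy "retrieval" just counts matching docs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a read-only index (in Anserini this role would be played
// by a Lucene IndexReader/IndexSearcher). Loads are counted to show it is opened once.
final class SharedIndex {
  static final AtomicInteger LOADS = new AtomicInteger();
  private final List<String> docs;

  SharedIndex(List<String> docs) {
    LOADS.incrementAndGet();
    this.docs = List.copyOf(docs); // immutable, so safe to read from many threads
  }

  // Toy "retrieval": number of docs containing the term.
  int countMatches(String term) {
    int n = 0;
    for (String d : docs) {
      if (d.contains(term)) n++;
    }
    return n;
  }
}

final class ParallelRuns {
  private ParallelRuns() {}

  // Submits one task per param setting; every task reads the same SharedIndex.
  // A real task would configure its own similarity from `setting`; here it is only echoed.
  static List<String> runAll(SharedIndex index, List<String> settings, String term, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String setting : settings) {
        futures.add(pool.submit(() -> setting + ":" + index.countMatches(term)));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        try {
          results.add(f.get()); // results are collected in submission order
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        } catch (ExecutionException e) {
          throw new IllegalStateException(e.getCause());
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

The design point is that the expensive step (opening the index) happens once outside the pool, while the cheap per-setting work fans out across threads.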

lintool · 2018-11-07T16:04:14Z

src/main/java/io/anserini/search/similarity/AuxSimilarity.java

+
+import org.apache.lucene.search.similarities.Similarity;
+
+public class AuxSimilarity {


How about we name this class TaggedSimilarity?

lintool · 2018-11-07T16:04:23Z

src/main/java/io/anserini/search/similarity/AuxSimilarity.java

+package io.anserini.search.similarity;
+
+import org.apache.lucene.search.similarities.Similarity;
+


Add top-level javadoc?
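A sketch of what the renamed class with a top-level javadoc could look like. The fields, accessor names, and null checks are assumptions for illustration; the type parameter `S` stands in for Lucene's `org.apache.lucene.search.similarities.Similarity` so the example compiles without Lucene on the classpath.

```java
import java.util.Objects;

/**
 * Pairs a similarity with a short, stable tag used to name the run it produces.
 *
 * <p>Hypothetical sketch of the class under review (AuxSimilarity, proposed name
 * TaggedSimilarity). {@code S} stands in for Lucene's {@code Similarity}.
 *
 * @param <S> the similarity type
 */
final class TaggedSimilarity<S> {
  private final S similarity;
  private final String tag;

  TaggedSimilarity(S similarity, String tag) {
    this.similarity = Objects.requireNonNull(similarity, "similarity");
    this.tag = Objects.requireNonNull(tag, "tag");
  }

  /** Returns the wrapped similarity. */
  S getSimilarity() {
    return similarity;
  }

  /** Returns the tag, kept explicit rather than derived from toString(). */
  String getTag() {
    return tag;
  }
}
```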

lintool · 2018-11-07T16:04:55Z

src/main/java/io/anserini/search/SearchCollection.java

    searcher.close();
    final long durationMillis = TimeUnit.MILLISECONDS.convert(System.nanoTime() - start, TimeUnit.NANOSECONDS);
-    LOG.info("Total " + numTopics + " topics searched in "
-        + DurationFormatUtils.formatDuration(durationMillis, "HH:mm:ss"));
+    LOG.info("Total run time: " + DurationFormatUtils.formatDuration(durationMillis, "HH:mm:ss"));


Why don't we print out number of topics anymore?

Because it is now printed inside each retrieval thread.
Search for LOG.info("Run " + topics.size() in this PR
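Since the topic count moved into the per-thread log line, a sketch of such a summary message might look like this. The message wording and `RunSummary` are hypothetical (the PR's actual string is only partially shown above); `hms` mimics the "HH:mm:ss" pattern passed to `DurationFormatUtils.formatDuration` in the diff, using only the JDK.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical per-thread summary; each retrieval thread would log one of these.
final class RunSummary {
  private RunSummary() {}

  // Formats milliseconds as HH:mm:ss using only the JDK.
  static String hms(long millis) {
    long h = TimeUnit.MILLISECONDS.toHours(millis);
    long m = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
    long s = TimeUnit.MILLISECONDS.toSeconds(millis) % 60;
    return String.format("%02d:%02d:%02d", h, m, s);
  }

  // Hypothetical message format: run tag, topic count, elapsed time for this thread only.
  static String message(String runTag, int numTopics, long millis) {
    return "[" + runTag + "] " + numTopics + " topics searched in " + hms(millis);
  }
}
```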

lintool

Lots of code change, but fairly straightforward, actually...

lintool · 2018-11-07T16:15:08Z

src/main/java/io/anserini/search/SearchCollection.java

+
+    private SearcherThread(IndexSearcher searcher, SortedMap<K, Map<String, String>> topics, AuxSimilarity auxSimilarity,
+                           String cascadeTag, RerankerCascade cascade, String outputPath, String runTag) throws IOException {
+      this.searcher = searcher;


Why do we need a separate cascade tag? Each reranker has its own tag, right? So can't the RerankerCascade reconstruct the tag by joining each individual tag?

Most of the time we will probably have just one reranker, and I think it is better to have a single tag.
Also, TieBreaker is the default last reranker, and I am not sure it is worth concatenating all the tags.
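For comparison, the reviewer's alternative (reconstructing the cascade tag by joining each reranker's tag) could be sketched as follows, skipping the default tie-breaker to address the concern above. `TaggedReranker`, `CascadeTag`, the `"tiebreaker"` marker, the `+` separator, and the `"none"` fallback are all hypothetical choices, not Anserini's API.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical minimal interface: each reranker exposes an explicit tag.
interface TaggedReranker {
  String tag();
}

final class CascadeTag {
  private CascadeTag() {}

  // Hypothetical tag for the default trailing tie-breaker.
  static final String TIE_BREAKER_TAG = "tiebreaker";

  // Joins member tags with "+", skipping the tie-breaker so a cascade with a single
  // real reranker still gets a short tag; an empty cascade tag becomes "none".
  static String of(List<? extends TaggedReranker> cascade) {
    StringJoiner joined = new StringJoiner("+").setEmptyValue("none");
    for (TaggedReranker r : cascade) {
      if (!TIE_BREAKER_TAG.equals(r.tag())) {
        joined.add(r.tag());
      }
    }
    return joined.toString();
  }
}
```

With one RM3 reranker plus the tie-breaker, the derived tag is just the RM3 tag, so the common case stays a single tag without a separate cascadeTag argument.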

lintool · 2018-11-07T16:17:38Z

High-level comments, for discussion:

Instead of depending on toString, maybe use explicit tag method?
Would it make sense to introduce a new abstraction called RankingModel that is a combination of the similarity and the rerankers?
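One possible shape for the proposed RankingModel, as a minimal sketch: a first-stage scorer plus an ordered reranker chain, carrying one explicit tag for the whole combination. Everything here is hypothetical; `ToDoubleFunction<String>` stands in for a Lucene similarity, and rerankers are modeled as functions over a ranked list of doc ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

// Hypothetical RankingModel: first-stage scorer + ordered reranker chain + one tag.
final class RankingModel {
  private final String tag;
  private final ToDoubleFunction<String> scorer; // stand-in for a Lucene Similarity
  private final List<UnaryOperator<List<String>>> rerankers;

  RankingModel(String tag, ToDoubleFunction<String> scorer, List<UnaryOperator<List<String>>> rerankers) {
    this.tag = tag;
    this.scorer = scorer;
    this.rerankers = List.copyOf(rerankers);
  }

  String tag() {
    return tag;
  }

  // Sorts candidate doc ids by descending first-stage score, then applies each reranker in order.
  List<String> rank(List<String> docIds) {
    List<String> ranked = new ArrayList<>(docIds);
    ranked.sort(Comparator.comparingDouble(scorer).reversed());
    for (UnaryOperator<List<String>> reranker : rerankers) {
      ranked = reranker.apply(ranked);
    }
    return ranked;
  }
}
```

Bundling the two pieces this way would let a parameter sweep iterate over RankingModel instances, each already knowing its own run tag.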

Peilin-Yang · 2018-11-11T15:53:22Z

Instead of depending on toString, maybe use explicit tag method?
Will do
Would it make sense to introduce a new abstraction called RankingModel that is a combination of the similarity and the rerankers?
Yes, this makes sense. But let's make it in another PR?

lintool · 2018-11-11T16:58:01Z

Can you create a separate issue for above so we don't lose track of it?

Also, please make sure regressions pass before you merge?

Peilin-Yang · 2018-11-11T23:39:19Z

All regression tests passed on tuna; going to merge.

token test bug fix

Peilin-Yang added 3 commits November 3, 2018 22:21

make searching multi-threads

9578858

add javadoc to SearchCollection

fdd8afc

Peilin-Yang requested a review from lintool November 6, 2018 15:04

lintool reviewed Nov 7, 2018

lintool requested changes Nov 7, 2018

lintool reviewed Nov 7, 2018

addressed CRs

8e9720c

lintool approved these changes Nov 11, 2018

Merge branch 'master' into multi_paras_retrieval_test

68b8ef1

Peilin-Yang merged commit 4d81578 into master Nov 11, 2018

Peilin-Yang deleted the multi_paras_retrieval_test branch November 16, 2018 19:58

crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022

tokentest fix for issue (castorini#470)

c28d9d3

token test bug fix
