Implement Sequential Dependency Model query constructor #359

Peilin-Yang · 2018-07-22T16:49:31Z

It turns out that we can combine different query semantics from Lucene to construct an SDM query.
For example, for the ordered-window and unordered-window queries, we can use SpanQuery.
We then use BooleanQuery together with BoostQuery to construct the complete query.

Please see the unit test for examples.
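
As a rough illustration of that composition, the sketch below enumerates the clauses an SDM-style query would contain, in plain Java without Lucene. The class `SdmSketch`, its `Clause` type, the window sizes (1 and 8), and the 0.1 and 0.05 window weights are assumptions made for illustration; only the 0.85 term weight echoes the `-sdm.tw` default in this PR. In Lucene terms, each "term" clause would roughly correspond to a TermQuery, each window clause to a SpanNearQuery (in order or not), each weight to a BoostQuery, and the whole list to a BooleanQuery of SHOULD clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SdmSketch {
    /** One weighted clause of the combined query (hypothetical type, illustration only). */
    static final class Clause {
        final String kind;   // "term", "ordered", or "unordered"
        final String text;   // a single term, or an adjacent pair "a b"
        final int slop;      // window slack; 0 for single terms
        final float weight;  // boost applied to this clause

        Clause(String kind, String text, int slop, float weight) {
            this.kind = kind;
            this.text = text;
            this.slop = slop;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ", slop=" + slop + ")^" + weight;
        }
    }

    /** Lists unigram clauses first, then an ordered and an unordered window per adjacent pair. */
    static List<Clause> build(List<String> terms, float termW, float orderedW,
                              float unorderedW, int orderedSlop, int unorderedSlop) {
        List<Clause> clauses = new ArrayList<>();
        for (String t : terms) {
            clauses.add(new Clause("term", t, 0, termW));
        }
        for (int i = 0; i + 1 < terms.size(); i++) {
            String pair = terms.get(i) + " " + terms.get(i + 1);
            clauses.add(new Clause("ordered", pair, orderedSlop, orderedW));
            clauses.add(new Clause("unordered", pair, unorderedSlop, unorderedW));
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Weights and window sizes below are assumed values, not taken from the PR.
        List<String> terms = Arrays.asList("international", "organized", "crime");
        for (Clause c : build(terms, 0.85f, 0.1f, 0.05f, 1, 8)) {
            System.out.println(c);
        }
    }
}
```

A query of n terms yields n unigram clauses plus 2(n - 1) window clauses, so the three-term example prints seven lines.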

Test on Gov2 701-750: MAP - 0.2922 (0.2832 was reported in the original paper)

lintool · 2018-07-22T17:06:44Z

Would it make sense to have a generic QueryBuilder API instead of these static methods?

Peilin-Yang · 2018-07-22T17:12:51Z

You mean subclass Query and create an inner Builder class?

Peilin-Yang · 2018-07-23T15:05:11Z

After thinking about it and looking at the Query class code, I think we do not necessarily need SDM as a query class.
The Query class and its subclasses mainly implement primitive query functions, and an SDM query can be constructed by leveraging the existing query semantics.
(From the discussion in the Slack channel: Jimmy mentioned also making query expansion a subclass of Query. I don't think this is suitable either. Query mainly deals with raw text queries without interacting with other components, e.g. the index, but query expansion involves reading the index, which is complicated to reason about.)

I think we could make a separate Query package under Search and wrap this up?

lintool · 2018-07-23T15:18:20Z

sg

lintool · 2018-07-24T15:30:47Z

src/main/java/io/anserini/search/query/QueryBase.java


 import java.io.IOException;
 import java.io.StringReader;
 import java.util.ArrayList;
 import java.util.List;

-public class AnalyzerUtils {
+public abstract class QueryBase {


I'd still like to keep AnalyzerUtils for the tokenize method, which is used beyond just query formulation.

lintool · 2018-07-24T15:33:10Z

How about QueryGenerator as an abstract class? I don't quite like how QueryBase sounds.
Then we can have {X,Y,Z}QueryGenerator.
How about folding the field and the analyzer into the constructor? I think that'd make the API cleaner?
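
One possible shape of that suggestion, sketched in plain Java: the abstract base class takes the field and the analysis step once in its constructor, so `buildQuery` only needs the query text. Everything here is hypothetical. The `Function<String, List<String>>` tokenizer stands in for a Lucene Analyzer, and the string result stands in for a Lucene Query; it is not the code in this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class QueryGeneratorSketch {
    /** Base class holding the configuration shared by every generator. */
    abstract static class QueryGenerator {
        protected final String field;
        // Stand-in for a Lucene Analyzer (assumption for this sketch).
        protected final Function<String, List<String>> tokenizer;

        protected QueryGenerator(String field, Function<String, List<String>> tokenizer) {
            this.field = field;
            this.tokenizer = tokenizer;
        }

        /** Returns a textual query here; a real generator would return a Lucene Query. */
        abstract String buildQuery(String queryText);
    }

    /** Emits one field:term clause per token, separated by spaces. */
    static final class BagOfWordsQueryGenerator extends QueryGenerator {
        BagOfWordsQueryGenerator(String field, Function<String, List<String>> tokenizer) {
            super(field, tokenizer);
        }

        @Override
        String buildQuery(String queryText) {
            StringBuilder sb = new StringBuilder();
            for (String token : tokenizer.apply(queryText)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(field).append(':').append(token);
            }
            return sb.toString();
        }
    }

    /** Toy analysis step: trim, lowercase, split on whitespace. */
    static List<String> whitespaceLowercase(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    public static void main(String[] args) {
        QueryGenerator generator =
            new BagOfWordsQueryGenerator("contents", QueryGeneratorSketch::whitespaceLowercase);
        System.out.println(generator.buildQuery("Sequential Dependence Model"));
    }
}
```

With this shape, an SDM generator would be another subclass sharing the same constructor, and call sites would only pass the query string.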

lintool · 2018-07-25T10:46:19Z

src/main/java/io/anserini/search/query/BagOfTermsQueryGenerator.java

+/*
+ * Bag of Terms query builder
+ */
+public class BagOfTermsQueryGenerator extends QueryGenerator {


Can we rename it BagOfWords since that's commonly known?

lintool · 2018-07-25T10:47:36Z

src/main/java/io/anserini/search/query/TermDependencyQueryGenerator.java

+/* Build the Term Dependency query. See:
+ * D. Metzler and W. B. Croft. A markov random field model for term dependencies. In SIGIR ’05.
+ */
+public class TermDependencyQueryGenerator extends QueryGenerator {


"TermDependency" is vague. How about SdmQueryGenerator to make it consistent with the test case below?

lintool · 2018-08-12T14:16:11Z

@Peilin-Yang I know we were going to defer this until v0.3.0, but should we just move this up to v0.2.0 and get it merged in?

We'll need regressions in a separate PR?

Peilin-Yang · 2018-08-12T14:18:04Z

Actually, we do not need to move this to v0.2.0 in order to get it merged...
We just merge it and it will be marked as done in the project page.

lintool · 2018-08-12T14:25:55Z

Sure, either way is fine with me.
Fix conflicts and I'll do a CR now?

lintool · 2018-08-12T14:26:52Z

src/main/java/io/anserini/search/SearchArgs.java

+  public boolean sdm = false;
+
+  @Option(name = "-sdm.tw", metaVar = "[value]", usage = "term weight in sdm")
+  public float sdm_tw = 0.85f;


"term weight in sdm" -> "SDM term weight"?
And below.

lintool · 2018-08-12T14:27:13Z

src/main/java/io/anserini/search/SearchCollection.java

+    BagOfTerms,
+    SequentialDependenceModel
+  }
+  private final QueryConstructor qc;


add empty line?

lintool · 2018-08-12T14:28:08Z

src/test/java/io/anserini/search/SdmQueryTest.java

+    Query termQuery = new BagOfWordsQueryGenerator().buildQuery(field, analyzer, sdmQueryStr);
+    TopDocs rsTerm = searcher.search(termQuery, 1);
+    assertEquals(rs1.scoreDocs[0].score, rsTerm.scoreDocs[0].score, 1e-6f);
+


kill extra empty line?

lintool · 2018-08-12T14:28:18Z

src/test/java/io/anserini/search/SdmQueryTest.java

+    q = new SpanNearQuery(new SpanQuery[]{t2, t1}, 16, false);
+    rs = searcher.search(q, 1);
+    assertEquals(rs.scoreDocs.length, 1);
+


kill extra empty line?

lintool

Minor comments, otherwise lgtm.

lintool · 2018-08-12T14:30:41Z

@Peilin-Yang you're changing the BoW code path, right? So please run all regressions before merging?

Peilin-Yang added 6 commits July 20, 2018 10:54

add Sequential Dependence Model query builder

92f3e81

small fixes

7ca123f

tmp

3a1f99b

Merge branch 'master' into sdm

43e156a

update unit tests

b29d38f

remove unnecessary comments

231c07e

Merge branch 'master' into sdm

0a69229

Peilin-Yang and others added 4 commits July 23, 2018 22:13

refactor: put query constructors (currently BagOfTermsQuery and TermDependencyQuery) into search package

49a1c59

Merge branch 'master' into sdm

cda8a84

remove unnecessary function toString

3513bbd

Merge branch 'master' into sdm

ee17955

Peilin-Yang requested a review from lintool July 24, 2018 15:11

Peilin-Yang added this to To do in v0.3.0 Jul 24, 2018

Peilin-Yang moved this from To do to In progress in v0.3.0 Jul 24, 2018

lintool reviewed Jul 24, 2018

Peilin-Yang added 3 commits July 24, 2018 18:00

Merge branch 'master' into sdm

e3facdc

rename QueryBase to QueryGenerator; revert AnalyzerUtils

549a644

Merge branch 'sdm' of https://github.com/castorini/Anserini into sdm

a58a982

lintool reviewed Jul 25, 2018

rename query generator classes

6252abf

lintool reviewed Aug 12, 2018

lintool approved these changes Aug 12, 2018

Peilin-Yang added 3 commits August 12, 2018 07:51

merge master

f72c877

minor tweak

14a1da1

Merge branch 'master' into sdm

6550c53

Peilin-Yang merged commit 84d077d into master Aug 13, 2018

v0.3.0 automation moved this from In progress to Done Aug 13, 2018

Peilin-Yang deleted the sdm branch August 13, 2018 17:24

Peilin-Yang mentioned this pull request Aug 14, 2018

Implement term dependence model #5

Closed

lintool removed this from Done in v0.3.0 Sep 10, 2018

crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022

Rename dense index (castorini#359)

9936cfc

* rename dense index * underscores to dashes
