-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Sequential Dependency Model query constructor #359
Conversation
Would it make sense to have a generic |
You mean subclass |
After thinking about it and looking at the I think we could make a separated |
sg |
|
||
import java.io.IOException; | ||
import java.io.StringReader; | ||
import java.util.ArrayList; | ||
import java.util.List; | ||
|
||
public class AnalyzerUtils { | ||
public abstract class QueryBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to keep AnalyzerUtil
for the tokenize
method - which is used beyond just query formulation.
How about |
/* | ||
* Bag of Terms query builder | ||
*/ | ||
public class BagOfTermsQueryGenerator extends QueryGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename it BagOfWords
since that's commonly known?
/* Build the Term Dependency query. See: | ||
* D. Metzler and W. B. Croft. A markov random field model for term dependencies. In SIGIR ’05. | ||
*/ | ||
public class TermDependencyQueryGenerator extends QueryGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"TermDependency" is vague. How about SdmQueryGenerator to make consistent with test case below?
@Peilin-Yang I know we were going to defer this until v0.3.0, but should we just move this up to v0.2.0 and get it merged in? We'll need regressions in a separate PR? |
Actually, we do not need to move this to 2 in order to get it merged.... |
Sure, either way is fine with me. |
public boolean sdm = false; | ||
|
||
@Option(name = "-sdm.tw", metaVar = "[value]", usage = "term weight in sdm") | ||
public float sdm_tw = 0.85f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"term weight in sdm" -> "SRM term weight"?
And below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDM?
BagOfTerms, | ||
SequentialDependenceModel | ||
} | ||
private final QueryConstructor qc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add empty line?
Query termQuery = new BagOfWordsQueryGenerator().buildQuery(field, analyzer, sdmQueryStr); | ||
TopDocs rsTerm = searcher.search(termQuery, 1); | ||
assertEquals(rs1.scoreDocs[0].score, rsTerm.scoreDocs[0].score, 1e-6f); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill extra empty line?
q = new SpanNearQuery(new SpanQuery[]{t2, t1}, 16, false); | ||
rs = searcher.search(q, 1); | ||
assertEquals(rs.scoreDocs.length, 1); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill extra empty line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, otherwise lgtm.
@Peilin-Yang you're changing the BoW code path, right? So please run all regressions before merging? |
* rename dense index * underscores to dashes
It turns out that we can combine different query semantics from Lucene to construct SDM query.
For example, for ordered window and unordered window query, we can use SpanQuery.
We then use
BooleanQuery
together withBoostQuery
to construct the complete query.Please see the unit test for examples.
Test on Gov2 701-750: MAP - 0.2922 (0.2832 was reported in the original paper)