Axiomatic Reranking

How to use

To use it with BM25:

Anserini/target/appassembler/bin/SearchCollection -topicreader Trec -index /path/to/index/ -hits 1000 -topics Anserini/src/main/resources/topics-and-qrels/topics.51-100.txt -bm25 -axiom -axiom.beta 0.4 -output run_axiom_beta_0.4.txt

To use it with the Dirichlet language model (-ql):

Anserini/target/appassembler/bin/SearchCollection -topicreader Trec -index /path/to/index/ -hits 1000 -topics Anserini/src/main/resources/topics-and-qrels/topics.51-100.txt -ql -axiom -axiom.beta 0.4 -output run_axiom_beta_0.4.txt
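Because beta is a tunable parameter, it can help to generate one command per setting. The Python sketch below builds the same argument lists as the two commands above, using only the flags they show. The helper name `axiom_command`, the beta values in the loop, and the output-file naming (which adds the model name so runs do not overwrite each other) are hypothetical choices for illustration. The paths are the README's own placeholders.

```python
import shlex
from typing import List

# Placeholder paths copied from the example commands in this README.
SEARCH_BIN = "Anserini/target/appassembler/bin/SearchCollection"
TOPICS = "Anserini/src/main/resources/topics-and-qrels/topics.51-100.txt"


def axiom_command(index: str, model: str, beta: float, hits: int = 1000) -> List[str]:
    """Hypothetical helper: argv for one axiomatic reranking run."""
    if model not in ("bm25", "ql"):
        raise ValueError("model must be 'bm25' or 'ql'")
    return [
        SEARCH_BIN,
        "-topicreader", "Trec",
        "-index", index,
        "-hits", str(hits),
        "-topics", TOPICS,
        f"-{model}",                        # base ranking model: -bm25 or -ql
        "-axiom", "-axiom.beta", str(beta),  # enable axiomatic reranking
        "-output", f"run_axiom_{model}_beta_{beta}.txt",
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for a small, hypothetical beta sweep.
    for model in ("bm25", "ql"):
        for beta in (0.3, 0.4, 0.5):
            print(shlex.join(axiom_command("/path/to/index/", model, beta)))
```

Returning a list rather than a single string keeps each flag and value separate, so the result can be passed directly to `subprocess.run` without extra quoting.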

Algorithm

  1. Rank the documents with the base retrieval model and take the top M documents as the reranking pool RP.
  2. Randomly select (R-1)*M documents from the index and add them to RP, so that the reranking pool contains R*M documents.
  3. Build the inverted term-to-documents list RTL for RP.
  4. For each term t in RTL and each query term q, compute the reranking score s(q,t) as the mutual information between q and t over RP: s(q,t) = I(X_q; X_t | RP) = SUM over x_q, x_t in {0,1} of p(X_q=x_q, X_t=x_t | RP) * log( p(X_q=x_q, X_t=x_t | RP) / (p(X_q=x_q | RP) * p(X_t=x_t | RP)) ), where X_q and X_t are binary random variables denoting the presence (1) or absence (0) of query term q and term t in a document.
  5. Compute the final reranking score of each term t in RTL by summing its scores over all query terms: s(t) = SUM over q of s(q,t).
  6. Select the top K terms in RTL by reranking score, keeping s(t) as each term's weight.
  7. Rerank the documents with a query built from the K reranking terms and their weights. In Lucene query syntax, this looks like (term1^0.2 term2^0.01 ...).
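The scoring steps (3 to 7) can be sketched in Python as follows, assuming the reranking pool RP has already been built. This is a minimal illustration, not Anserini's implementation. The function names are hypothetical, documents are reduced to sets of terms (only presence or absence matters for X_q and X_t), and the probabilities are plain maximum-likelihood document frequencies over RP with no smoothing, which is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def build_rtl(pool: Iterable[str], doc_terms: Mapping[str, Set[str]]) -> Dict[str, Set[str]]:
    """Step 3: inverted list mapping each term to the pool documents containing it."""
    rtl: Dict[str, Set[str]] = defaultdict(set)
    for doc in pool:
        for term in doc_terms[doc]:
            rtl[term].add(doc)
    return dict(rtl)


def mutual_information(docs_q: Set[str], docs_t: Set[str], n: int) -> float:
    """Step 4: I(X_q; X_t | RP) with maximum-likelihood estimates over n pool docs."""
    n_q, n_t = len(docs_q), len(docs_t)
    n_qt = len(docs_q & docs_t)
    joint = {  # document counts for each (x_q, x_t) presence/absence combination
        (1, 1): n_qt,
        (1, 0): n_q - n_qt,
        (0, 1): n_t - n_qt,
        (0, 0): n - n_q - n_t + n_qt,
    }
    marg_q = {1: n_q, 0: n - n_q}
    marg_t = {1: n_t, 0: n - n_t}
    score = 0.0
    for (x_q, x_t), count in joint.items():
        if count == 0:
            continue  # 0 * log 0 is treated as 0
        p_joint = count / n
        p_indep = (marg_q[x_q] / n) * (marg_t[x_t] / n)
        score += p_joint * math.log(p_joint / p_indep)
    return max(score, 0.0)  # MI is non-negative; clamp tiny rounding error


def top_k_terms(query: List[str], rtl: Mapping[str, Set[str]], n: int, k: int) -> List[Tuple[str, float]]:
    """Steps 5-6: s(t) = sum over query terms q of s(q, t); keep the K best terms."""
    scores = {
        t: sum(mutual_information(rtl.get(q, set()), docs_t, n) for q in query)
        for t, docs_t in rtl.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def to_lucene_query(weighted_terms: List[Tuple[str, float]]) -> str:
    """Step 7: boosted terms in Lucene query syntax, e.g. 'term1^0.2 term2^0.01'."""
    return " ".join(f"{term}^{weight:.4g}" for term, weight in weighted_terms)


if __name__ == "__main__":
    # Toy reranking pool RP (hypothetical data): each document is its set of terms.
    doc_terms = {
        "d1": {"axiom", "retrieval", "ranking"},
        "d2": {"axiom", "retrieval"},
        "d3": {"retrieval", "index"},
        "d4": {"cooking", "recipe"},
        "d5": {"garden"},
        "d6": {"travel"},
    }
    pool = list(doc_terms)
    rtl = build_rtl(pool, doc_terms)
    weighted = top_k_terms(["axiom"], rtl, n=len(pool), k=3)
    print(to_lucene_query(weighted))
```

Working from document sets keeps step 4 to a handful of counts: the four joint cells are derived from |q|, |t|, |q and t|, and |RP|, so no per-document loop is needed once RTL exists.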

Notes

The Axiomatic Reranking algorithm is non-deterministic, because it randomly picks (R-1)*M documents as part of the reranking pool (see step 2 of the algorithm above). Here we only list reference performance figures for the major TREC collections. The ranking model used is BM25, and beta is set to 0.4 for all collections, although this is definitely not the optimal value for each individual collection. We report MAP for all collections except the ClueWeb collections, for which nDCG@20 is reported.
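The randomness comes from step 2. The minimal Python sketch below builds the pool from a first-stage ranking and shows that two unseeded runs draw different random documents, while a fixed seed makes the pool repeatable. The function name `reranking_pool` and the toy document ids are hypothetical. The sketch assumes the random documents are drawn without replacement from index documents outside the top M. It does not describe how Anserini itself seeds its sampling.

```python
import random
from typing import List, Sequence


def reranking_pool(ranked: Sequence[str], index_docs: Sequence[str],
                   m: int, r: int, rng: random.Random) -> List[str]:
    """Steps 1-2: top-M docs plus (R-1)*M docs drawn at random from the rest of the index."""
    top = list(ranked[:m])
    in_top = set(top)
    rest = [d for d in index_docs if d not in in_top]
    # Sample without replacement; cap at the number of remaining documents.
    return top + rng.sample(rest, min((r - 1) * m, len(rest)))


if __name__ == "__main__":
    index_docs = [f"d{i}" for i in range(1000)]  # toy index
    ranked = index_docs[:50]                       # toy first-stage ranking
    # Unseeded RNGs: two runs almost always draw different random documents.
    a = reranking_pool(ranked, index_docs, m=20, r=5, rng=random.Random())
    b = reranking_pool(ranked, index_docs, m=20, r=5, rng=random.Random())
    print("same pool without seed:", a == b)
    # A fixed seed makes the pool, and so this sketch's reranking, repeatable.
    c = reranking_pool(ranked, index_docs, m=20, r=5, rng=random.Random(42))
    d = reranking_pool(ranked, index_docs, m=20, r=5, rng=random.Random(42))
    print("same pool with seed 42:", c == d)
```

Passing the random generator in explicitly, rather than using the module-level `random` functions, is what lets a caller choose between fresh randomness and a reproducible run.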

For more details, please refer to [Yang et al., 2013]: Yang, P., and Fang, H. (2013). Evaluating the Effectiveness of Axiomatic Approaches in Web Track. In TREC 2013.