# INFS7410 Week 4 Practical 

##### version 1.0

###### The INFS7410 Teaching Team

##### Tutorial Etiquette
Please refrain from loud noises, irrelevant conversations and use of mobile phones during tutorial activities. Be respectful of everyone's opinions and ideas during the tutorial activities. You will be asked to leave if you disturb. Remember the tutor is there to help you understand and learn, not to provide debugging of your code or solutions to assignments. 



## Exercise 1: 

Below, you have been provided some sample Python code with partial implementations for two rank fusion algorithms -- Borda and CombSUM. Your task in this exercise is to complete these partially implemented algorithms, then implement from scratch another algorithm called CombMNZ, and finally perform an analysis of the results of these algorithms.

Rank fusion (also called ranking fusion or runs fusion) algorithms aim to leverage two or more document rankings returned in answer to a query to produce a new ranking that is formed by combining the rankings provided as input. This fusion can be interpreted and understood using a variety of metaphors. For example, you can think that each individual ranking, or run, expresses a vote towards the documents it contains -- and these votes could be weighted by the rank positions. Then, fusing together $k$ runs could be interpreted as an election process, where the rank fusion algorithm is talling together the votes for the different documents expressed by the individual rankings. This is for example the metaphor used by Borda; the open question is: how do we count the votes, and consider the weights with respect to the rank position? This tutorial will provide you with an intuition for some of the rank fusion approaches; we will see more details as usual in the lecture.

You can find a good description of rank fusion algorithms in: Benham & Culpepper, _Risk-reward trade-offs in rank fusion_, ADCS'17. (note, our notation slightly differs from theirs to make things clearer).

In [3]:
from sklearn.preprocessing import minmax_scale
from collections import defaultdict
import pytrec_eval
import numpy as np
from pprint import pprint

def print_results(run, qrel_file='qrel.txt', measures=["map", "ndcg_cut_10", "recall_1000"]):
    # Open the qrels file.
    with open(qrel_file, "r") as f:
        msmarco_qrels = pytrec_eval.parse_qrel(f)

    evaluator = pytrec_eval.RelevanceEvaluator(query_relevance=msmarco_qrels, measures=measures)
    results = evaluator.evaluate(run)
    for measure in sorted(measures):
        print('{:25s}{:8s}{:.4f}'.format(measure, 'all', pytrec_eval.compute_aggregated_measure(measure,
                                  [query_measures[measure]for query_measures in results.values()])))  
        
def normalise_run(run):
    for k, v in run.items():
        #print(k,v)
        r = [(docid, score) for docid, score in v.items()]
        scores = minmax_scale([x[1] for x in r])
        run[k] = dict(zip([x[0] for x in r], scores))
    return run

In this practical, you will use the run files you created in the previous week, and apply rank fusion to them. This means that you need to go back and complete last weeks prac if you have not yet done so.

In [4]:
with open("week3-tf.run", "r") as f:
    week3_tf_run = normalise_run(pytrec_eval.parse_run(f))
    
with open("week3-idf.run", "r") as f:
    week3_idf_run = normalise_run(pytrec_eval.parse_run(f))
    
with open("week3-tfidf.run", "r") as f:
    week3_tfidf_run = normalise_run(pytrec_eval.parse_run(f))

with open("week3-bm25.run", "r") as f:
    week3_bm25_run = normalise_run(pytrec_eval.parse_run(f))
    
runs = [week3_tf_run, week3_idf_run, week3_tfidf_run, week3_bm25_run]

### Borda

Borda count is a voting algorithm that sums the difference in rank position from the total number of document candidates in each list, or ranking. It can be expressed formally as:
$$
\frac{n-r(d)+1}{n}
$$
Where $n$ is the total number of documents in a rank list and $r(d)$ is the rank position of a document.


In [5]:
def borda(runs):
    seen = {}
    for run in runs:
        for topic, results in run.items():
            # print(topic, results)
            if topic not in seen:
                seen[topic] = {}
            # print(results.keys())
            for i, docid in enumerate(results.keys()):
                # print(i, docid)
                n = len(results)
                rd = i
                score = (n-i+1)/n
                if docid not in seen[topic]:
                    seen[topic][docid] = score
                else:
                    seen[topic][docid] += score
    return seen

fused_borda_run = borda(runs)
print_results(fused_borda_run)

map                      all     0.2697
ndcg_cut_10              all     0.4370
recall_1000              all     0.6881


### CombSUM

CombSUM adds the retrieval scores of documents contained in more than one list and rearranges the order. It can be expressed formally as:
$$
\sum_{d\in D}s(d)
$$
Where $D$ is the set of documents (produced by the union of the runs) and $s(d)$ is the score of a single document.

In [6]:
def combsum(runs):
    seen = {}
    for run in runs:
        for topic, results in run.items():
            if topic not in seen:
                seen[topic] = {}
            for docid, score in results.items():
                #TODO fill in the gap.
                if docid not in seen[topic]:
                    seen[topic][docid] = score
                else:
                    seen[topic][docid] += score
                
                
    return seen

fused_combsum_run = combsum(runs)
print_results(fused_combsum_run)

map                      all     0.3237
ndcg_cut_10              all     0.4457
recall_1000              all     0.6881


### CombMNZ

CombMNZ adds the retrieval scores of documents contained in more than one list and rearranges the order, and multiplies their sum by the number of lists where the document occurs. It can be expressed formally as:
$$
|d\in R > 0|\cdot\sum_{d\in D}s(d)
$$
Where $D$ is the set of documents and $s(d)$ is the score of a single document, and $|d\in R > 0|$ is the number of runs $R$ in which $d$ occurs with a score greater than zero.

In [7]:
def combmnz(runs):
    # TODO your implementation here.
    seen = {}
    num = {}
    for run in runs:
        count = []
        for topic, results in run.items():
            if topic not in seen:
                seen[topic] = {}
            for docid, score in results.items():
                if docid not in count:
                    count.append(docid)
                if docid not in seen[topic]:
                    seen[topic][docid] = score
                else:
                    seen[topic][docid] += score
        for docid in count:
            if docid not in num:
                num[docid] = 1
            else:
                num[docid] += 1
    
    # print(num)
    
    for topic in seen:
        for docid in seen[topic]:
            seen[topic][docid] *= num[docid]
    
    return seen
        

fused_combmnz_run = combmnz(runs)
print_results(fused_combmnz_run)

map                      all     0.3248
ndcg_cut_10              all     0.4564
recall_1000              all     0.6881


### Analysis

Find the most effective combination of runs to fuse by trying all combinations, evaluating using MAP and nDCG, and performing statistical significance tests. Complete the table and questions below with the most effective combination only.


|         | MAP  | nDCG |
| ------- | ---- | ---- |
| Borda   |0.2697|0.4370|
| CombSUM |0.3237|0.4457|
| CombMNZ |0.3248|0.4564|

_What is the most effective combination of runs to use, and which fusion method lead to this result?_ 

### Questions

Using all of the data you have collected about different combinations of runs with different fusion methods, complete the following questions.

_What happens when you remove the normalisation of runs? Which fusion method(s) does this change? Why?_


_Can you think of any other ways to fuse runs together? Use a publication database (e.g., Google Scholar) to search for research about "rank fusion". Summarise one (or more) other method(s) in the space provided below._