# Assignment 2A, Part 1: Evaluation

You are given two sample files, `data/sample_ranking.csv` and `data/sample_qrels.csv`, to test your solution.

This notebook is to be used for evaluating the rankings generated in [Part 2](2_Retrieval.ipynb) and [Part 3](3_Multifield_retrieval.ipynb).

In [1]:
RANKING_FILE = "data/sample_ranking.csv"  # file with the document rankings
QRELS_FILE = "data/sample_qrels.csv"  # file with the relevance judgments (ground truth)
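
The two files use different layouts, which can be read off the header checks in `eval` further down: the qrels file has one line per query with space-separated relevant document IDs, while the ranking file has one line per (query, document) pair in rank order. The sketch below illustrates this with made-up in-memory contents; `parse_qrels` and `parse_ranking` are hypothetical helper names, and the sample rows are invented rather than taken from the actual sample files.

```python
import io

# Made-up file contents for illustration; only the header lines are taken
# from the checks in eval().
QRELS_TEXT = "queryID,docIDs\nQ1,d2 d4 d9\nQ2,d1\n"
RANKING_TEXT = "QueryId,DocumentId\nQ1,d1\nQ1,d2\nQ2,d1\n"


def parse_qrels(fin):
    """Maps each query ID to its list of relevant document IDs."""
    if fin.readline().strip() != "queryID,docIDs":
        raise ValueError("Incorrect file format!")
    gt = {}
    for line in fin:
        if line.strip():  # skip blank lines
            qid, docids = line.strip().split(",")
            gt[qid] = docids.split()  # document IDs are space-separated
    return gt


def parse_ranking(fin):
    """Maps each query ID to its ranked list of document IDs (file order)."""
    if fin.readline().strip() != "QueryId,DocumentId":
        raise ValueError("Incorrect file format!")
    output = {}
    for line in fin:
        if line.strip():  # skip blank lines
            qid, docid = line.strip().split(",")
            output.setdefault(qid, []).append(docid)
    return output


gt = parse_qrels(io.StringIO(QRELS_TEXT))
output = parse_ranking(io.StringIO(RANKING_TEXT))
print(gt)
print(output)
```

Note that the order of lines in the ranking file determines the rank of each document, so the file must already be sorted by descending score within each query.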

**TODO**: Complete the function that calculates evaluation metrics for a given ranking (`ranking`) against the ground truth (`gt`). It should return the results as a dictionary keyed by the name of the retrieval metric.

(Hint: see [Exercises #1 and #2 from Lecture 8](https://github.com/kbalog/uis-dat640-fall2019/tree/master/exercises/lecture_08).)
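
As a sanity check for the metric definitions, it can help to work a tiny example by hand. The sketch below uses a hypothetical helper `toy_metrics` and made-up document IDs; it is one possible way to compute P@10, AP, and RR for a single query, not necessarily the intended solution.

```python
def toy_metrics(ranking, relevant):
    """Hypothetical helper: P@10, AP, and RR for one ranked list."""
    relevant = set(relevant)
    hits, sum_prec, rr = 0, 0.0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            sum_prec += hits / rank  # precision at the rank of this hit
            if rr == 0:
                rr = 1 / rank  # reciprocal rank of the first relevant doc
    # P@10 always divides by 10, even if fewer than 10 docs were returned
    p10 = sum(1 for d in ranking[:10] if d in relevant) / 10
    # AP divides by the total number of relevant docs, retrieved or not
    ap = sum_prec / len(relevant) if relevant else 0.0
    return {"P10": p10, "AP": ap, "RR": rr}


# Made-up data: three relevant documents, two of which are retrieved.
scores = toy_metrics(["d1", "d2", "d3", "d4"], ["d2", "d4", "d9"])
print(scores)
```

Here the relevant documents appear at ranks 2 and 4, so P@10 = 2/10 = 0.2, RR = 1/2 = 0.5, and AP = (1/2 + 2/4) / 3 = 1/3. The unretrieved document `d9` still counts in the AP denominator.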

In [2]:
def eval_query(ranking, gt):
    """Computes P@10, AP, and RR for a single query's ranking against the ground truth."""
    p10, ap, rr, num_rel = 0, 0, 0, 0

    for i, doc_id in enumerate(ranking):
        if doc_id in gt:
            num_rel += 1
            ap += num_rel / (i + 1)  # precision at this rank, accumulated for AP

            if i < 10:  # relevant document within the top 10
                p10 += 1

            if rr == 0:  # reciprocal rank of the first relevant document
                rr = 1 / (i + 1)

    p10 /= 10
    if gt:  # avoid division by zero when a query has no relevant documents
        ap /= len(gt)

    return {"P10": p10, "AP": ap, "RR": rr}

**TODO**: Complete the function that evaluates an output file, which contains rankings for a set of queries. It is almost complete; you only need to add the computation of the mean scores (over the entire query set).

In [3]:
def eval(gt_file, output_file):
    """Prints evaluation scores for each query as well as the means over the query set."""
    # load data from ground truth file
    gt = {}  # holds a list of relevant documents for each queryID
    with open(gt_file, "r") as fin:
        header = fin.readline().strip()
        if header != "queryID,docIDs":
            raise Exception("Incorrect file format!")
        for line in fin.readlines():
            qid, docids = line.strip().split(",")
            gt[qid] = docids.split()
            
    # load data from output file
    output = {}
    with open(output_file, "r") as fin:
        header = fin.readline().strip()
        if header != "QueryId,DocumentId":
            raise Exception("Incorrect file format!")
        for line in fin.readlines():
            qid, docid = line.strip().split(",")
            if qid not in output:
                output[qid] = []
            output[qid].append(docid)
    
    # evaluate each query that is in the ground truth
    print("  QID  P@10   (M)AP  (M)RR")
    sum_p10, sum_ap, sum_rr = 0, 0, 0
    for qid in sorted(gt.keys()):
        res = eval_query(output.get(qid, []), gt.get(qid, []))
        print("%5s %6.3f %6.3f %6.3f" % (qid, res["P10"], res["AP"], res["RR"]))
        sum_p10 += res["P10"]
        sum_ap += res["AP"]
        sum_rr += res["RR"]
    
    # compute averages over the entire query set
    num_queries = len(gt)
    mean_p10, mean_ap, mean_rr = sum_p10 / num_queries, sum_ap / num_queries, sum_rr / num_queries

    # print averages
    print("%5s %6.3f %6.3f %6.3f" % ("ALL", mean_p10, mean_ap, mean_rr))

### Main

In [4]:
eval(QRELS_FILE, RANKING_FILE)

  QID  P@10   (M)AP  (M)RR
   Q1  0.200  0.467  1.000
   Q2  0.500  0.925  1.000
   Q3  0.100  0.500  0.500
  ALL  0.267  0.631  0.833
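
The `ALL` row above is the arithmetic mean of each column over the three queries, which gives mean P@10, MAP, and MRR. A minimal sketch that recomputes it from the printed per-query values follows; `mean_scores` is a hypothetical helper, and because the inputs are already rounded to three decimals, the result only agrees with the notebook output up to that precision.

```python
def mean_scores(per_query):
    """Hypothetical helper: averages each metric over all queries."""
    if not per_query:
        return {}
    n = len(per_query)
    metrics = next(iter(per_query.values())).keys()
    return {m: sum(res[m] for res in per_query.values()) / n for m in metrics}


# Per-query scores copied from the sample output above.
per_query = {
    "Q1": {"P10": 0.200, "AP": 0.467, "RR": 1.000},
    "Q2": {"P10": 0.500, "AP": 0.925, "RR": 1.000},
    "Q3": {"P10": 0.100, "AP": 0.500, "RR": 0.500},
}

means = mean_scores(per_query)
print("%5s %6.3f %6.3f %6.3f" % ("ALL", means["P10"], means["AP"], means["RR"]))
```

For example, MRR is (1.000 + 1.000 + 0.500) / 3 ≈ 0.833, which matches the last column of the `ALL` row.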


In [5]:
eval("data/qrels2.csv", "bm25_singlefield.csv")

  QID  P@10   (M)AP  (M)RR
  303  0.600  0.313  1.000
  307  0.000  0.003  0.031
  310  0.100  0.028  0.250
  314  0.000  0.000  0.000
  322  0.100  0.006  0.200
  325  0.000  0.000  0.000
  330  0.000  0.000  0.000
  336  0.200  0.181  0.500
  341  0.000  0.005  0.021
  344  0.000  0.000  0.000
  347  0.700  0.137  1.000
  353  0.000  0.000  0.000
  354  0.200  0.021  0.500
  362  0.200  0.101  0.143
  363  0.400  0.046  0.333
  367  0.100  0.010  0.200
  372  0.000  0.000  0.000
  374  0.100  0.172  0.143
  383  0.100  0.016  0.125
  389  0.000  0.000  0.000
  393  0.300  0.062  0.200
  399  0.600  0.049  1.000
  401  0.100  0.007  0.500
  404  0.300  0.092  0.500
  408  0.100  0.005  0.200
  409  0.100  0.007  0.143
  416  0.000  0.005  0.026
  419  0.100  0.086  1.000
  426  0.600  0.117  0.333
  427  0.800  0.552  1.000
  433  0.200  0.091  0.500
  435  0.000  0.009  0.048
  436  0.100  0.006  0.100
  439  0.000  0.001  0.053
  443  0.000  0.012  0.040
  448  0.000  0.002  0.062
 

In [6]:
eval("data/qrels2.csv", "lm_singlefield.csv")

  QID  P@10   (M)AP  (M)RR
  303  0.000  0.000  0.000
  307  0.000  0.000  0.000
  310  0.000  0.000  0.000
  314  0.000  0.000  0.000
  322  0.100  0.005  0.143
  325  0.000  0.000  0.000
  330  0.000  0.000  0.000
  336  0.100  0.071  0.500
  341  0.000  0.000  0.000
  344  0.000  0.000  0.000
  347  0.100  0.021  0.200
  353  0.000  0.000  0.000
  354  0.300  0.006  0.333
  362  0.000  0.000  0.000
  363  0.000  0.000  0.000
  367  0.100  0.011  0.200
  372  0.000  0.011  0.023
  374  0.000  0.000  0.000
  383  0.000  0.014  0.034
  389  0.000  0.000  0.000
  393  0.000  0.000  0.000
  399  0.100  0.001  0.111
  401  0.100  0.002  0.111
  404  0.100  0.048  0.143
  408  0.000  0.001  0.017
  409  0.000  0.006  0.050
  416  0.000  0.000  0.000
  419  0.000  0.000  0.000
  426  0.100  0.074  0.125
  427  0.000  0.000  0.000
  433  0.000  0.000  0.000
  435  0.000  0.010  0.056
  436  0.000  0.006  0.038
  439  0.000  0.000  0.000
  443  0.000  0.000  0.016
  448  0.100  0.003  0.143
 

In [7]:
eval("data/qrels2.csv", "bm25f_multifield.csv")

  QID  P@10   (M)AP  (M)RR
  303  0.600  0.291  1.000
  307  0.000  0.008  0.029
  310  0.100  0.014  0.125
  314  0.000  0.000  0.000
  322  0.000  0.007  0.034
  325  0.000  0.008  0.015
  330  0.000  0.000  0.000
  336  0.200  0.156  0.333
  341  0.000  0.011  0.024
  344  0.000  0.000  0.000
  347  0.500  0.065  1.000
  353  0.000  0.000  0.000
  354  0.400  0.015  0.500
  362  0.800  0.225  0.500
  363  0.500  0.053  1.000
  367  0.100  0.014  0.125
  372  0.000  0.000  0.000
  374  0.600  0.286  0.333
  383  0.200  0.036  0.250
  389  0.000  0.000  0.000
  393  0.200  0.061  0.167
  399  0.500  0.055  0.500
  401  0.100  0.014  1.000
  404  0.100  0.029  0.500
  408  0.100  0.005  0.200
  409  0.100  0.011  0.333
  416  0.100  0.019  0.111
  419  0.000  0.012  0.077
  426  0.100  0.090  0.125
  427  0.800  0.594  1.000
  433  0.300  0.196  1.000
  435  0.000  0.030  0.059
  436  0.000  0.014  0.091
  439  0.000  0.002  0.067
  443  0.000  0.016  0.091
  448  0.000  0.009  0.083
 

In [8]:
eval("data/qrels2.csv", "mlm_multifield.csv")

  QID  P@10   (M)AP  (M)RR
  303  0.000  0.000  0.000
  307  0.000  0.000  0.000
  310  0.100  0.019  0.167
  314  0.000  0.000  0.000
  322  0.000  0.005  0.067
  325  0.000  0.002  0.012
  330  0.000  0.000  0.000
  336  0.200  0.041  0.167
  341  0.000  0.000  0.000
  344  0.000  0.000  0.000
  347  0.500  0.096  0.250
  353  0.000  0.000  0.000
  354  0.200  0.007  1.000
  362  0.300  0.124  1.000
  363  0.300  0.033  1.000
  367  0.100  0.016  0.167
  372  0.100  0.167  0.333
  374  0.300  0.126  0.250
  383  0.200  0.024  0.167
  389  0.000  0.000  0.000
  393  0.000  0.048  0.071
  399  0.700  0.062  1.000
  401  0.100  0.002  0.167
  404  0.100  0.039  0.143
  408  0.000  0.001  0.037
  409  0.100  0.005  0.167
  416  0.000  0.022  0.071
  419  0.000  0.001  0.012
  426  0.100  0.141  0.111
  427  0.900  0.604  1.000
  433  0.100  0.016  0.250
  435  0.000  0.014  0.083
  436  0.000  0.016  0.038
  439  0.000  0.002  0.071
  443  0.100  0.007  0.100
  448  0.000  0.001  0.040
 