In this notebook we explore the `fuse` and `optimize_fusion` functions and the `OptimizationReport` class offered by `ranx`.

First of all, we need to install [ranx](https://github.com/AmenRa/ranx).

Note that the first time you call a `ranx` function it may take a while, because it must be compiled first.

In [None]:
!pip install -U ranx

Download the data we need

In [None]:
import os
import requests

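# Create the target directory once, before downloading anything
os.makedirs("notebooks/data", exist_ok=True)

for file in ["qrels", "run_4", "run_5"]:
    url = f"https://raw.githubusercontent.com/AmenRa/ranx/master/notebooks/data/{file}.trec"
    response = requests.get(url)
    response.raise_for_status()  # Fail early instead of saving an error page

    with open(f"notebooks/data/{file}.trec", "w") as f:
        f.write(response.text)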

Load data

In [3]:
from ranx import Qrels, Run

# Let's load qrels and runs from files
qrels = Qrels.from_file("notebooks/data/qrels.trec", kind="trec")

run_4 = Run.from_file("notebooks/data/run_4.trec", kind="trec")
run_4.name = "System A"
run_5 = Run.from_file("notebooks/data/run_5.trec", kind="trec")
run_5.name = "System B"

## Fuse

The table below lists all the fusion algorithms provided by `ranx`, along with their aliases.  
The **Optim.** column indicates whether an algorithm requires an optimization phase.

| **Algorithm**                                              | **Alias** | **Optim.** | **Algorithm**                            | **Alias**   | **Optim.** |
| ---------------------------------------------------------- | --------- | :--------: | ---------------------------------------- | ----------- | :--------: |
| CombMIN                                         | min       |     No     | CombMAX                       | max         |     No     |
| CombMED                                         | med       |     No     | CombSUM                       | sum         |     No     |
| CombANZ                                         | anz       |     No     | CombMNZ                       | mnz         |     No     |
| CombGMNZ                                       | gmnz      |     No     | ISR                               | isr         |     No     |
| Log_ISR                                         | log_isr   |     No     | LogN_ISR                     | logn_isr    |    Yes     |
| Reciprocal Rank Fusion (RRF) | rrf       |    Yes     | PosFuse                       | posfuse     |    Yes     |
| ProbFuse                                       | probfuse  |    Yes     | SegFuse                       | segfuse     |    Yes     |
| SlideFuse                                     | slidefuse |    Yes     | MAPFuse                       | mapfuse     |    Yes     |
| BordaFuse                                     | bordafuse |     No     | Weighted BordaFuse | w_bordafuse |    Yes     |
| Condorcet                                     | condorcet |     No     | Weighted Condorcet | w_condorcet |    Yes     |
| BayesFuse                                     | bayesfuse |    Yes     | Mixed                           | mixed       |    Yes     |
| WMNZ                                               | wmnz      |    Yes     | Weighted Sum               | wsum        |    Yes     |
| Rank-Biased Centroids (RBC)   | rbc       |    Yes     |                                          |             |            |

Let's try some _unsupervised_ fusion algorithms!

In [7]:
from ranx import fuse, evaluate

print(run_4.name, evaluate(qrels, run_4, "ndcg@100"))
print(run_5.name, evaluate(qrels, run_5, "ndcg@100"))

for method in [
    "min",  # Alias for CombMIN
    "max",  # Alias for CombMAX
    "med",  # Alias for CombMED
    "sum",  # Alias for CombSUM
    "anz",  # Alias for CombANZ
    "mnz",  # Alias for CombMNZ
]:

    combined_run = fuse(
        runs=[run_4, run_5],
        norm="min-max",  # Default normalization strategy
        method=method,
    )

    print(combined_run.name, evaluate(qrels, combined_run, "ndcg@100"))

System A 0.45236291280341645
System B 0.501471970057649
comb_min 0.4484841464559678
comb_max 0.5279870104050673
comb_med 0.4835126884606188
comb_sum 0.5434398431491004
comb_anz 0.4835126884606188
comb_mnz 0.5326546303230408
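
To make the aliases above more concrete, here is a minimal sketch of what CombSUM, CombMAX and CombMNZ compute for a single query, assuming the scores have already been normalized. The `comb_fuse` helper and the toy scores are hypothetical and are not part of `ranx`, whose own implementation is compiled and works on `Run` objects.

```python
from collections import defaultdict


def comb_fuse(results, method="sum"):
    """Fuse result lists for one query, each given as {doc_id: normalized_score}."""
    # Collect, for every document, the scores it received from each system
    scores = defaultdict(list)
    for result in results:
        for doc_id, score in result.items():
            scores[doc_id].append(score)

    fused = {}
    for doc_id, doc_scores in scores.items():
        if method == "sum":  # CombSUM: sum of the scores
            fused[doc_id] = sum(doc_scores)
        elif method == "max":  # CombMAX: highest score
            fused[doc_id] = max(doc_scores)
        elif method == "mnz":  # CombMNZ: sum times the number of systems that returned the doc
            fused[doc_id] = sum(doc_scores) * len(doc_scores)
        else:
            raise ValueError(f"Unsupported method: {method}")
    return fused


# Toy, already-normalized scores for a single query (made up for illustration)
system_a = {"d1": 1.0, "d2": 0.5, "d3": 0.0}
system_b = {"d2": 1.0, "d3": 0.75}

for method in ["sum", "max", "mnz"]:
    print(method, comb_fuse([system_a, system_b], method))
```

In this toy example `d1` is retrieved by only one system, so CombMNZ leaves its score unchanged while doubling the summed scores of `d2` and `d3`, which both systems retrieved.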


## Normalization

Let's try out other normalization strategies!

In [8]:
from ranx import fuse, evaluate

print(run_4.name, evaluate(qrels, run_4, "ndcg@100"))
print(run_5.name, evaluate(qrels, run_5, "ndcg@100"))

for norm in ["min-max", "max", "sum", "zmuv", "rank", "borda"]:
    combined_run = fuse(
        runs=[run_4, run_5],
        norm=norm,
        method="sum",  # Alias for CombSUM
    )

    print(norm, evaluate(qrels, combined_run, "ndcg@100"))

System A 0.45236291280341645
System B 0.501471970057649
min-max 0.5434398431491004
max 0.516739634212384
sum 0.5520741847476792
zmuv 0.5436270286338574
rank 0.5177678013739043
borda 0.5150156642597997
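
As a rough illustration of why the choice of normalization matters, the sketch below applies the textbook definitions of min-max and max normalization to the scores of a single query. The helper names and the toy scores are hypothetical, and the exact formulas and edge-case handling used inside `ranx` may differ, so treat this as an assumption-laden sketch rather than the library's code.

```python
def min_max_norm(result):
    """Rescale scores to [0, 1]: (score - min) / (max - min)."""
    lo, hi = min(result.values()), max(result.values())
    span = hi - lo
    if span == 0:
        # Assumption: when all scores are equal, map every document to 1.0
        return {doc_id: 1.0 for doc_id in result}
    return {doc_id: (score - lo) / span for doc_id, score in result.items()}


def max_norm(result):
    """Divide every score by the maximum score."""
    hi = max(result.values())
    if hi == 0:
        # Assumption: leave the scores untouched if the maximum is zero
        return dict(result)
    return {doc_id: score / hi for doc_id, score in result.items()}


# Toy raw scores for a single query (made up for illustration)
raw = {"d1": 12.0, "d2": 9.0, "d3": 6.0}

print("min-max", min_max_norm(raw))
print("max", max_norm(raw))
```

Min-max normalization pushes the lowest-ranked document to zero, whereas max normalization keeps it at half the top score here, so the same document contributes differently to a subsequent CombSUM.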


## Optimize Fusion

Let's try a fusion algorithm that requires optimization!

WARNING: here we use the same runs both to optimize the algorithm and to produce the final combination.  
In a real-world scenario, you should use non-test data for the optimization phase,  
for example the qrels and runs of a dev set, or a few hundred or a few thousand training samples.

In [9]:
from ranx import fuse, evaluate, optimize_fusion

print(run_4.name, evaluate(qrels, run_4, "ndcg@100"))
print(run_5.name, evaluate(qrels, run_5, "ndcg@100"))

# Optimize a given fusion method
best_params = optimize_fusion(
    qrels=qrels,
    runs=[run_4, run_5],
    norm="min-max",  # Default value
    method="wsum",  # Alias for Weighted Sum
    metric="ndcg@100",  # Metric we want to maximize
)

combined_run = fuse(
    runs=[run_4, run_5],
    norm="min-max",  # Default value
    method="wsum",  # Alias for Weighted Sum
    params=best_params,
)

print(combined_run.name, evaluate(qrels, combined_run, "ndcg@100"))

weighted_sum 0.555987399857977
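
One way to follow the warning about not optimizing on test data is to split the queries into a dev part and a test part before optimizing. The sketch below does this on plain dictionaries of the form `{query_id: {doc_id: score}}`; the helpers `split_query_ids` and `subset`, the toy data, and the 50/50 split are all hypothetical choices made for illustration.

```python
import random


def split_query_ids(query_ids, dev_fraction=0.5, seed=42):
    """Randomly split query IDs into disjoint dev and test sets."""
    ids = sorted(query_ids)  # Sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * dev_fraction)
    return set(ids[:cut]), set(ids[cut:])


def subset(data, query_ids):
    """Keep only the entries of a {query_id: {doc_id: value}} dict for the given queries."""
    return {q_id: docs for q_id, docs in data.items() if q_id in query_ids}


# Toy data (made up for illustration)
qrels_dict = {
    "q1": {"d1": 1},
    "q2": {"d2": 1},
    "q3": {"d3": 1},
    "q4": {"d4": 1},
}
run_dict = {
    "q1": {"d1": 0.9, "d2": 0.1},
    "q2": {"d2": 0.8, "d3": 0.2},
    "q3": {"d1": 0.7, "d3": 0.3},
    "q4": {"d4": 0.6, "d1": 0.4},
}

dev_ids, test_ids = split_query_ids(qrels_dict)

dev_qrels, test_qrels = subset(qrels_dict, dev_ids), subset(qrels_dict, test_ids)
dev_run, test_run = subset(run_dict, dev_ids), subset(run_dict, test_ids)

print("dev queries:", sorted(dev_ids))
print("test queries:", sorted(test_ids))
```

Assuming the dictionaries follow the layout `ranx` expects, the dev subsets could be wrapped with `Qrels(dev_qrels)` and `Run(dev_run)` and passed to `optimize_fusion`, while `fuse` and `evaluate` would then be applied to the test subsets with the parameters found on the dev queries.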


The hyper-parameter search space can be altered as in the next cell, which sets the `step` argument.  
Please refer to the official documentation for a complete list of the  
search space parameters of each algorithm.

In [10]:
from ranx import fuse, evaluate, optimize_fusion

print(run_4.name, evaluate(qrels, run_4, "ndcg@100"))
print(run_5.name, evaluate(qrels, run_5, "ndcg@100"))

# Optimize a given fusion method
best_params = optimize_fusion(
    qrels=qrels,
    runs=[run_4, run_5],
    norm="min-max",  # Default value
    method="wsum",  # Alias for Weighted Sum
    metric="ndcg@100",  # Metric we want to maximize
    step=0.01,
)

combined_run = fuse(
    runs=[run_4, run_5],
    norm="min-max",  # Default value
    method="wsum",  # Alias for Weighted Sum
    params=best_params,
)

print(combined_run.name, evaluate(qrels, combined_run, "ndcg@100"))

weighted_sum 0.5565951483669139
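
The small gain obtained with `step=0.01` can be pictured as a finer grid search over the run weights. The sketch below is only an illustration under stated assumptions: it assumes that candidate weights for two runs are pairs `(w, 1 - w)` on a regular grid, and it replaces the real metric with a made-up `toy_objective`. The actual search space explored by `ranx` may be defined differently, and `weight_grid` and `grid_search` are hypothetical names.

```python
def weight_grid(step):
    """Candidate weight pairs (w, 1 - w) for two runs, spaced by `step`."""
    n = round(1 / step)
    return [(i / n, 1 - i / n) for i in range(n + 1)]


def grid_search(step, objective):
    """Return the weight pair that maximizes `objective` on the grid."""
    return max(weight_grid(step), key=lambda weights: objective(*weights))


def toy_objective(w_a, w_b):
    # Stand-in for "fuse, then evaluate": it peaks when the first weight is 0.37.
    # Purely illustrative, not derived from the runs used in this notebook.
    return -((w_a - 0.37) ** 2)


for step in [0.1, 0.01]:
    best = grid_search(step, toy_objective)
    print(f"step={step}: {len(weight_grid(step))} candidates, best first weight = {best[0]}")
```

A smaller step evaluates more candidates and can land closer to the best weights, at the cost of proportionally more evaluations per run pair.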


The `optimize_fusion` function can also return a report of all the evaluated configurations.

In [11]:
from ranx import fuse, evaluate, optimize_fusion

best_params, optimization_report = optimize_fusion(
    qrels=qrels,
    runs=[run_4, run_5],
    norm="min-max",
    method="wsum",
    metric="ndcg@100",
    return_optimization_report=True,
)

# The optimization results are saved in an OptimizationReport instance,
# which provides handy functionalities such as tabular formatting
optimization_report.to_table()

In [12]:
# You can change the number of shown digits as follows
optimization_report.rounding_digits = 4
optimization_report.to_table()

In [13]:
# You can show percentages instead of decimal values
# Note that the number of shown digits is based on
# the `rounding_digits` attribute, try changing it
optimization_report.rounding_digits = 3
optimization_report.show_percentages = True
optimization_report.to_table()

In [None]:
# `rounding_digits` and `show_percentages` can be passed directly when
# calling `optimize_fusion`
best_params, optimization_report = optimize_fusion(
    qrels=qrels,
    runs=[run_4, run_5],
    norm="min-max",
    method="wsum",
    metric="ndcg@100",
    return_optimization_report=True,
    rounding_digits=4,
    show_percentages=True,
)

optimization_report.to_table()