[Feature Request] custom fusion method in optimize_fusion #32

Closed
PaulLerner opened this issue Nov 28, 2022 · 14 comments
Labels
enhancement New feature or request

Comments

@PaulLerner

Is your feature request related to a problem? Please describe.
Hi, you’ve done a great job implementing plenty of different fusion algorithms, but any fixed set of built-in methods will always be a bottleneck for experimentation.
What would you think about letting the user define their own fusion method?

Describe the solution you'd like
For example, in optimize_fusion, allow method to be a callable; in that case, has_hyperparams and optimization_switch would not be called.
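
To make the idea concrete, the callable could look something like this (just a sketch working on plain dictionaries; the exact signature ranx would expect is of course up to you, not an existing API):

```python
# Sketch of a user-defined fusion method on plain {q_id: {doc_id: score}} dicts.
# The signature is only illustrative, not something ranx currently supports.
def my_fusion(runs, weights):
    fused = {}
    for run, weight in zip(runs, weights):
        for q_id, results in run.items():
            fused.setdefault(q_id, {})
            for doc_id, score in results.items():
                fused[q_id][doc_id] = fused[q_id].get(doc_id, 0.0) + weight * score
    return fused
```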

Describe alternatives you've considered

  • Open a feature request every time I want to try out something new :)
  • Fork ranx and implement new fusion methods there

My use case / Ma et al.
By the way, at the moment, my use case is the default-minimum trick of Ma et al.: when combining results from systems A and B, it consists of assigning a document the minimum score among A's results if it was only retrieved by system B, and vice versa.

Maybe this is already possible in ranx via some option/method named differently? Or maybe you’d like to add it in the core ranx fusion algorithms?

@PaulLerner PaulLerner added the enhancement New feature or request label Nov 28, 2022
@PaulLerner PaulLerner changed the title [Feature Request] custom fusion methode [Feature Request] custom fusion method in optimize_fusion Nov 28, 2022
@AmenRa
Owner

AmenRa commented Nov 28, 2022

Hi Paul,

Are you referring to this passage?
Finally, there are a few more details of exactly how to combine BM25 and DPR scores worth exploring. As a baseline, we tried using the raw scores directly in the linear combination (exactly as above). However, we noticed that the range of scores from DPR and BM25 can be quite different. To potentially address this issue, we tried the following normalization technique: If a document from sparse retrieval is not in the dense retrieval results, we assign to it the minimum dense retrieval score among the retrieved documents as its dense retrieval score, and vice versa for the sparse retrieval score.

If so, as the authors say, this is a normalization technique, not a fusion method.
You can easily implement it and run it before passing the runs to ranx.
Also, you can bypass the normalization step of fuse and optimize_fusion by passing norm=None.
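
For example, something along these lines (a sketch assuming you already built run_a and run_b as ranx Run objects and normalized their scores yourself; the weights are just illustrative):

```python
from ranx import fuse

# Sketch: run_a and run_b are ranx Run objects whose scores have already been
# normalized externally (e.g. with the default-minimum trick applied first).
# norm=None skips ranx's own normalization step before fusion.
combined_run = fuse(
    runs=[run_a, run_b],
    norm=None,
    method="wsum",
    params={"weights": (0.5, 0.5)},  # illustrative weights
)
```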

I did not have the time to read the entire paper, but this "normalization" method does not seem very sound to me.
Normalization should make relevance scores computed by different systems comparable, and this is not the case here.
Is there any comparison between their method and simple min-max normalization in the paper?

@PaulLerner
Author

> Are you referring to this passage?

Yes

> If so, as the authors say, this is a normalization technique, not a fusion method. You can easily implement it and run it before passing the runs to ranx.

Ok, then I’ll close this issue for now, thanks for the tips!

> I did not have the time to read the entire paper, but this "normalization" method does not seem very sound to me. Normalization should make relevance scores computed by different systems comparable, and this is not the case here.

I think both techniques can be combined. The default-minimum technique helps because of the top-K cutoff.

> Is there any comparison between their method and simple min-max normalization in the paper?

No

@AmenRa
Owner

AmenRa commented Nov 28, 2022

Would you mind computing the results for wsum with that normalization method vs. min-max and max norms and posting them here? Thanks.

@PaulLerner
Author

What do you mean? Reproducing the experiments of Ma et al.? Or on some dummy runs?

@PaulLerner
Author

PaulLerner commented Nov 28, 2022

> By the way, at the moment, my use case is the default-minimum trick of Ma et al.: when combining results from systems A and B, it consists of assigning a document the minimum score among A's results if it was only retrieved by system B, and vice versa.

> I did not have the time to read the entire paper, but this "normalization" method does not seem very sound to me. Normalization should make relevance scores computed by different systems comparable, and this is not the case here.

I think both techniques can be combined. The default-minimum technique helps because of the top-K cutoff.

P.S. Especially if you use ZMUV normalization, a document absent from A’s results would effectively have a score of 0, i.e. an average score instead of a bad one.
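
To make that concrete, here is a toy example with made-up scores (just to illustrate the effect, not code from ViQuAE):

```python
import statistics

# Hypothetical scores retrieved by system A for one query.
a_scores = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
mu = statistics.mean(a_scores.values())
sigma = statistics.pstdev(a_scores.values())

# ZMUV normalization: z = (score - mean) / std.
a_zmuv = {d: (s - mu) / sigma for d, s in a_scores.items()}  # ~{'d1': 1.22, 'd2': 0.0, 'd3': -1.22}

# A document retrieved only by B contributes 0 to A's side of the weighted sum,
# i.e. exactly the ZMUV mean (an "average" score), whereas with the
# default-minimum trick it would get min(a_zmuv.values()) instead.
```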

@AmenRa
Owner

AmenRa commented Nov 28, 2022

Sorry, I assumed you were reproducing the results from that paper...
If you try that normalization method on whatever non-dummy runs, could you please check whether it reaches better results than the normalization methods implemented in ranx?

@PaulLerner
Author

No problem :)
I am not reproducing their results but I used their technique in https://github.com/PaulLerner/ViQuAE

I’ll let you know once I’ve compared it with the other methods.

@PaulLerner
Author

So, for what it’s worth, combined with a global ZMUV normalization (mean and std computed over the whole dataset instead of being query-dependent) and wsum fusion, the default-minimum technique helps to fuse DPR and CLIP (as described in https://hal.archives-ouvertes.fr/hal-03650618), on ViQuAE’s dev set:
With:

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.323 |
| (0.2, 0.8) | 0.327 |
| (0.3, 0.7) | 0.335 |
| (0.4, 0.6) | 0.340 |
| (0.5, 0.5) | 0.342 |
| (0.6, 0.4) | 0.333 |
| (0.7, 0.3) | 0.293 |
| (0.8, 0.2) | 0.242 |
| (0.9, 0.1) | 0.168 |
| (1.0, 0.0) | 0.127 |

Without:

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.295 |
| (0.1, 0.9) | 0.313 |
| (0.2, 0.8) | 0.316 |
| (0.3, 0.7) | 0.315 |
| (0.4, 0.6) | 0.310 |
| (0.5, 0.5) | 0.299 |
| (0.6, 0.4) | 0.276 |
| (0.7, 0.3) | 0.259 |
| (0.8, 0.2) | 0.238 |
| (0.9, 0.1) | 0.215 |
| (1.0, 0.0) | 0.165 |
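
For reference, the "global" ZMUV normalization I mention is roughly the following (a sketch of the idea, not the exact code from ViQuAE):

```python
import statistics

def global_zmuv(runs):
    # For each run, compute the mean and std over the scores of the whole
    # dataset (all queries pooled) instead of per query, then standardize.
    for run in runs:
        scores = [s for results in run.run.values() for s in results.values()]
        mu, sigma = statistics.mean(scores), statistics.pstdev(scores)
        for results in run.run.values():
            for d_id in results:
                results[d_id] = (results[d_id] - mu) / sigma
    return runs
```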

I implemented it in pure Python, but I guess you would like it in Numba:

```python
def default_minimum(runs):
    # union of the documents retrieved for each query across all runs
    all_documents = {}
    for run in runs:
        for q_id, results in run.run.items():
            all_documents.setdefault(q_id, set())
            all_documents[q_id] |= results.keys()

    # assign each run's per-query minimum score to the documents it did not retrieve
    for run in runs:
        for q_id, results in run.run.items():
            minimum = min(results.values())
            for d_id in all_documents[q_id]:
                results.setdefault(d_id, minimum)

    return runs
```

@AmenRa
Owner

AmenRa commented Nov 28, 2022

I can convert it to Numba, don't worry.
Could you please post the results for min-max and max norms with and without that approach?
Thank you!

@PaulLerner
Author

Mmh that brings me to another feature request (opening another issue)

@PaulLerner
Author

Hi, so it turns out that it depends a lot on the normalization method. Your zmuv (query-dependent) works worse than my custom ZMUV computed over the whole dataset (results above), but the overall best is max normalization without default-minimum. Maybe this depends on the fusion method, though. See results below.

With default minimum

Norm: zmuv, Method: wsum. Best parameters: {'weights': (0.3, 0.7)}.
Weighted SUM

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.323 |
| (0.2, 0.8) | 0.323 |
| (0.3, 0.7) | 0.324 |
| (0.4, 0.6) | 0.322 |
| (0.5, 0.5) | 0.306 |
| (0.6, 0.4) | 0.248 |
| (0.7, 0.3) | 0.176 |
| (0.8, 0.2) | 0.148 |
| (0.9, 0.1) | 0.142 |
| (1.0, 0.0) | 0.127 |

Norm: min-max, Method: wsum. Best parameters: {'weights': (0.4, 0.6)}.
Weighted SUM

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.323 |
| (0.2, 0.8) | 0.324 |
| (0.3, 0.7) | 0.328 |
| (0.4, 0.6) | 0.329 |
| (0.5, 0.5) | 0.269 |
| (0.6, 0.4) | 0.169 |
| (0.7, 0.3) | 0.153 |
| (0.8, 0.2) | 0.146 |
| (0.9, 0.1) | 0.141 |
| (1.0, 0.0) | 0.127 |

Norm: max, Method: wsum. Best parameters: {'weights': (0.3, 0.7)}.

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.329 |
| (0.2, 0.8) | 0.338 |
| (0.3, 0.7) | 0.339 |
| (0.4, 0.6) | 0.320 |
| (0.5, 0.5) | 0.280 |
| (0.6, 0.4) | 0.242 |
| (0.7, 0.3) | 0.191 |
| (0.8, 0.2) | 0.160 |
| (0.9, 0.1) | 0.146 |
| (1.0, 0.0) | 0.127 |

Without default minimum

Norm: zmuv, Method: wsum. Best parameters: {'weights': (0.2, 0.8)}.
Weighted SUM

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.323 |
| (0.2, 0.8) | 0.323 |
| (0.3, 0.7) | 0.322 |
| (0.4, 0.6) | 0.318 |
| (0.5, 0.5) | 0.294 |
| (0.6, 0.4) | 0.251 |
| (0.7, 0.3) | 0.196 |
| (0.8, 0.2) | 0.159 |
| (0.9, 0.1) | 0.147 |
| (1.0, 0.0) | 0.135 |

Norm: min-max, Method: wsum. Best parameters: {'weights': (0.4, 0.6)}.
Weighted SUM

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.323 |
| (0.2, 0.8) | 0.324 |
| (0.3, 0.7) | 0.328 |
| (0.4, 0.6) | 0.329 |
| (0.5, 0.5) | 0.313 |
| (0.6, 0.4) | 0.169 |
| (0.7, 0.3) | 0.154 |
| (0.8, 0.2) | 0.145 |
| (0.9, 0.1) | 0.141 |
| (1.0, 0.0) | 0.127 |

Norm: max, Method: wsum. Best parameters: {'weights': (0.2, 0.8)}.
Weighted SUM

| Weights | MRR@100 |
|---|---|
| (0.0, 1.0) | 0.322 |
| (0.1, 0.9) | 0.351 |
| (0.2, 0.8) | 0.351 |
| (0.3, 0.7) | 0.350 |
| (0.4, 0.6) | 0.350 |
| (0.5, 0.5) | 0.327 |
| (0.6, 0.4) | 0.173 |
| (0.7, 0.3) | 0.172 |
| (0.8, 0.2) | 0.173 |
| (0.9, 0.1) | 0.172 |
| (1.0, 0.0) | 0.127 |

@AmenRa
Owner

AmenRa commented Nov 29, 2022

Thank you very much, Paul!

I am happy to see that max norm outperforms default-minimum.
To give you some context, I added/invented max norm because the minimum score is often unknown.
We usually fuse only the top retrieved documents from each model, which makes min-max (in this specific context) not very sound to me.
I did not do extensive experimentation, but in my experience max norm outperforms min-max very often.
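
For reference, the idea behind max norm is simply to divide each score by the per-query maximum, so no estimate of the (often unknown) minimum is needed, unlike min-max. A sketch of the two, not the exact ranx implementation:

```python
def max_norm(results):
    # Divide each score by the per-query maximum; no minimum estimate needed.
    maximum = max(results.values())
    return {d_id: score / maximum for d_id, score in results.items()}

def min_max_norm(results):
    # Needs both the minimum and the maximum, but the true minimum is unknown
    # when only the top-k documents of each system are retrieved.
    minimum, maximum = min(results.values()), max(results.values())
    return {d_id: (score - minimum) / (maximum - minimum) for d_id, score in results.items()}
```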

@PaulLerner
Author

Note that the results above depend on the models. With other models I found default-minimum to be essential to ZMUV normalization, which really makes sense to me, as I’ve said above.

What’s your opinion about "global" normalization, e.g., for ZMUV, computing the mean and std over the whole dataset instead of per query?

By the way, I originally went for ZMUV + weighted sum because of https://doi.org/10.1145/502585.502657

@AmenRa
Owner

AmenRa commented Nov 30, 2022

I never used ZMUV, to be honest. I implemented it for completeness and tried it for comparison purposes, but it never gave me better results than min-max, max, or sum (which sometimes works best).

In general, I prefer local normalization schemes because they are "unsupervised" and can be used out of the box.
Without strong empirical evidence that default-minimum (w/ or w/o ZMUV) works better than min-max, max, or sum, I would not use it.

Also, without a standardized way of normalizing/fusing results, it is often difficult to understand what brings improvements over the state of the art. Conducting in-depth ablation studies is costly, and we often lack the space in conference papers to write about them.
