[Feature Request] custom fusion method in optimize_fusion #32
Comments
Hi Paul, are you referring to this passage? If so, as the authors say, it is a normalization technique, not a fusion method. I did not have time to read the entire paper, but this "normalization" method does not seem very sound to me.
Yes
Ok, then I’ll close this issue for now, thanks for the tips!
I think both techniques can be combined. The default-minimum technique helps because of the top-K cutoff.
No
Would you mind computing the results for
What do you mean? Reproducing the experiments of Ma et al.? Or on some dummy runs?
P.S. Especially if you use ZMUV normalization: a document absent from A's results would effectively have a score of 0, i.e. an average score rather than a bad one.
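As an aside, here is a minimal sketch of per-query ZMUV (zero mean, unit variance) normalization, for illustration only, not ranx's actual implementation:

```python
import statistics

def zmuv(results):
    """ZMUV: subtract the mean score, divide by the standard deviation."""
    scores = list(results.values())
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return {doc: (score - mean) / std for doc, score in results.items()}

run_a = zmuv({"d1": 10.0, "d2": 6.0, "d3": 2.0})
# Normalized scores are centered on 0, so a document absent from run A
# implicitly contributes 0 to a weighted sum, i.e. an "average" score.
```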
Sorry, I assumed you were reproducing the results from that paper...
No problem :) I'll let you know once I compare it with other methods.
So, for what it’s worth, combined with a global ZMUV normalization (mean and std computed over the whole dataset instead of being query-dependent) and
Without:
I implemented it in pure Python, but I guess you would like it in Numba:

```python
def default_minimum(runs):
    # Union of retrieved documents per query
    all_documents = {}
    for run in runs:
        for q_id, results in run.run.items():
            all_documents.setdefault(q_id, set())
            all_documents[q_id] |= results.keys()
    # Give each missing document the run's per-query minimum score
    for run in runs:
        for q_id, results in run.run.items():
            minimum = min(results.values())
            for d_id in all_documents[q_id]:
                results.setdefault(d_id, minimum)
    return runs
```
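For a quick sanity check, the function can be exercised on toy data. Here it is repeated in a self-contained form, together with a minimal stand-in for ranx's `Run` object (I simply assume `.run` is a dict mapping query ids to `{doc_id: score}` dicts, as in the snippet above):

```python
class FakeRun:
    """Minimal stand-in for a ranx Run: .run maps query ids to {doc_id: score}."""
    def __init__(self, run):
        self.run = run

def default_minimum(runs):  # same logic as the snippet above
    all_documents = {}
    for run in runs:
        for q_id, results in run.run.items():
            all_documents.setdefault(q_id, set())
            all_documents[q_id] |= results.keys()
    for run in runs:
        for q_id, results in run.run.items():
            minimum = min(results.values())
            for d_id in all_documents[q_id]:
                results.setdefault(d_id, minimum)
    return runs

a = FakeRun({"q1": {"d1": 0.9, "d2": 0.5}})
b = FakeRun({"q1": {"d3": 0.8}})
a, b = default_minimum([a, b])
# d3 was absent from A, so it gets A's q1 minimum (0.5);
# d1 and d2 were absent from B, so they get B's q1 minimum (0.8).
```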
I can convert it to Numba, don't worry.
Mmh, that brings me to another feature request (I'll open another issue).
Hi, so it turns out that it depends a lot on the normalization method.

With default minimum:
- Norm: zmuv, Method: wsum. Best parameters: {'weights': (0.3, 0.7)}.
- Norm: min-max, Method: wsum. Best parameters: {'weights': (0.4, 0.6)}.
- Norm: max, Method: wsum. Best parameters: {'weights': (0.3, 0.7)}.

Without default minimum:
- Norm: zmuv, Method: wsum. Best parameters: {'weights': (0.2, 0.8)}.
- Norm: min-max, Method: wsum. Best parameters: {'weights': (0.4, 0.6)}.
- Norm: max, Method: wsum. Best parameters: {'weights': (0.2, 0.8)}.
Thank you very much, Paul! I am happy to see that
Note that the results above depend on the models. With other models I found default-minimum to be essential for ZMUV normalization, which really makes sense to me, as I said above. What's your opinion on "global" normalization? E.g., for ZMUV, computing the mean and std over the whole dataset instead of per query? By the way, I originally went for ZMUV + weighted sum because of https://doi.org/10.1145/502585.502657
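For concreteness, a sketch of what I mean by global ZMUV; this is just an illustration of the idea, and `global_zmuv` is not a ranx function:

```python
import statistics

def global_zmuv(run):
    """ZMUV with mean/std computed over the whole run, not per query."""
    all_scores = [s for results in run.values() for s in results.values()]
    mean = statistics.mean(all_scores)
    std = statistics.pstdev(all_scores) or 1.0  # guard against zero variance
    return {
        q_id: {d_id: (s - mean) / std for d_id, s in results.items()}
        for q_id, results in run.items()
    }

norm = global_zmuv({"q1": {"d1": 3.0, "d2": 1.0}, "q2": {"d1": 8.0}})
# Scores stay comparable across queries: q2's single above-average result
# keeps a positive score instead of being forced to 0 within its own query.
```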
I never used it. In general, I prefer local normalization schemes because they are "unsupervised" and can be used out of the box. Also, without a standardized way of normalizing/fusing results, it is often difficult to understand what brings improvements over the state of the art. Conducting in-depth ablation studies is costly, and we often lack the space in conference papers to write about them.
**Is your feature request related to a problem? Please describe.**
Hi, you've done a great job implementing plenty of different fusion algorithms, but I think it will always be a bottleneck. What would you think about letting the user define their own fusion method?

**Describe the solution you'd like**
For example, in `optimize_fusion`, allow `method` to be a callable, and in that case do not call `has_hyperparams` and `optimization_switch`.

**Describe alternatives you've considered**
**My use case / Ma et al.**
By the way, at the moment, my use case is the default-minimum trick of Ma et al.: when combining results from systems A and B, a document retrieved only by system B is given the minimum score among A's results, and vice versa.
Maybe this is already possible in ranx via some option/method named differently? Or maybe you'd like to add it to the core ranx fusion algorithms?
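To make the request concrete, here is a purely hypothetical sketch of a user-defined fusion callable; the name `custom_fuse`, its signature, and the commented call at the end are illustrative assumptions, not ranx's actual API:

```python
def custom_fuse(runs, weights):
    """Hypothetical user-defined fusion: weighted sum of (normalized) scores."""
    fused = {}
    for run, weight in zip(runs, weights):
        for q_id, results in run.items():
            fused.setdefault(q_id, {})
            for d_id, score in results.items():
                fused[q_id][d_id] = fused[q_id].get(d_id, 0.0) + weight * score
    return fused

runs = [{"q1": {"d1": 1.0, "d2": 0.0}}, {"q1": {"d1": 0.5, "d2": 1.0}}]
fused = custom_fuse(runs, weights=(0.3, 0.7))
# Hypothetical usage: optimize_fusion(qrels, runs, method=custom_fuse, ...)
```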