add an option to disable sort_dict_of_dict_by_value when adding results to a run #9

PaulLerner · 2022-01-19T16:55:00Z

Hi -- guy with the weird feature requests here 😅 --

Motivation

You don’t want to ask, but I have a use case where all the documents returned by my system have the same score, yet their order matters!
And when you add_and_sort documents to a run, you end up applying sort_dict_of_dict_by_value, which might reverse or completely shuffle the order of document IDs:

In [1]: from ranx import Qrels, Run, evaluate

In [2]: run = Run()
   ...: run.add_multi(
   ...:     q_ids=["q_1", "q_2"],
   ...:     doc_ids=[
   ...:         ["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"],
   ...:         ["doc_12", "doc_11", "doc_25", "doc_36", "doc_2",  "doc_35"],
   ...:     ],
   ...:     scores=[
   ...:         [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
   ...:         [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
   ...:     ],
   ...: )
In [3]: list(run.run['q_1'].keys())
Out[3]: ['doc_35', 'doc_32', 'doc_36', 'doc_25', 'doc_23', 'doc_12']

Solution

Obviously, my system could add a small negative offset to each score to preserve the order of documents. However, this is more of a pain to me than commenting out the line that does the sorting.
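
For reference, a minimal sketch of that offset workaround in plain Python. The helper `break_ties` and its `eps` parameter are hypothetical names, not part of ranx, and the sketch assumes `eps` is far smaller than any real gap between distinct scores, otherwise the offsets could reorder documents that were not tied.

```python
def break_ties(doc_ids, scores, eps=1e-6):
    """Return {doc_id: score} with each score nudged down by its position.

    Hypothetical helper, not part of ranx. Assumes eps is much smaller
    than any genuine difference between two distinct scores.
    """
    return {
        doc_id: score - position * eps
        for position, (doc_id, score) in enumerate(zip(doc_ids, scores))
    }


doc_ids = ["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"]
adjusted = break_ties(doc_ids, [0.9] * len(doc_ids))

# Sorting by descending score now recovers the original document order.
ranked = sorted(adjusted, key=adjusted.get, reverse=True)
```

The adjusted scores are no longer the scores the system produced, so this only makes sense when the evaluation depends on rank and not on score values.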

The request

Would you be willing to add an option to disable sort_dict_of_dict_by_value when calling add_multi?

Thanks for the quick response on my other issues :)


AmenRa · 2022-01-19T17:43:21Z

Hi Paul,

The rationale behind forcing sorting is to prevent users from forgetting about it, which could cause a wrong evaluation.

I thought about adding an option to avoid sorting to add_multi to avoid useless computation.
You could add queries to your Run / Qrels by batch, causing ranx to perform sorting even when it's not needed.
Because of that, I suggest using .from_dict to create Run / Qrels at the moment.

However, your problem raises the question of how to evaluate your lists, since they are not ranked by score.
If you are sure everything is fine with your data/model, you should manage the issue for your specific case.
Otherwise, you could run into reproducibility issues, in my opinion.

Sorry if what I'm about to say seems obvious.
If you have a sorted list of document IDs without meaningful scores, you could generate those as simply as follows:

scores = list(range(len(doc_ids)))[::-1]

It seems pretty feasible to me. What do you think?
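
A slightly fuller sketch of the same idea, applied to the two queries from the original report. The helper `rank_scores` and the variables `ranked_lists` and `run_dict` are illustrative names only. The resulting nested dict has the `{q_id: {doc_id: score}}` shape seen in `run.run` above, which is presumably what `Run.from_dict` expects, though that is an assumption here.

```python
def rank_scores(doc_ids):
    """Descending integer scores, from len(doc_ids) - 1 down to 0."""
    return list(range(len(doc_ids)))[::-1]


# Document IDs already in the intended rank order, one list per query.
ranked_lists = {
    "q_1": ["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"],
    "q_2": ["doc_12", "doc_11", "doc_25", "doc_36", "doc_2", "doc_35"],
}

# Nested dict of the form {q_id: {doc_id: score}}.
run_dict = {
    q_id: dict(zip(doc_ids, rank_scores(doc_ids)))
    for q_id, doc_ids in ranked_lists.items()
}
```

Because every document in a query gets a distinct score, any descending sort by value reproduces the intended order, whether or not the sort is stable.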

Best,

Elias

PaulLerner · 2022-01-21T07:32:19Z

Hi,

Thanks for your answer.
I won’t get into the details but my use case is actually a little bit more tricky than this.

I’ll consider using from_dict!

AmenRa · 2022-01-21T08:22:32Z

Mind that from_dict still triggers sorting.

PaulLerner · 2022-01-21T14:04:08Z

Oh, ok, I misunderstood your first answer. So are you still considering "adding an option to avoid sorting to add_multi to avoid useless computation"?

AmenRa · 2022-01-21T14:25:03Z

I am, but I will probably make changes that do not solve your issue.
My idea is to postpone the sorting operation to the first time a Qrels or Run is used for evaluation, following the lazy evaluation paradigm, but not to make sorting completely optional.
As I told you before, I want ranx to take care of everything so that the user doesn't have to worry about sorting and other operations.
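
To illustrate the deferred-sorting idea in general terms, here is a hypothetical sketch in plain Python. The class `LazySortedRun`, its methods, and the `sort_calls` counter are invented for illustration and are not ranx's implementation: adding results only marks the container as unsorted, and a single sort happens the first time a ranking is read.

```python
class LazySortedRun:
    """Hypothetical illustration of deferred sorting; not ranx's actual code."""

    def __init__(self):
        self._results = {}      # {q_id: {doc_id: score}}
        self._is_sorted = True  # an empty run is trivially sorted
        self.sort_calls = 0     # counter kept only for demonstration

    def add(self, q_id, doc_ids, scores):
        # Adding results only stores them and marks the run as unsorted.
        self._results[q_id] = dict(zip(doc_ids, scores))
        self._is_sorted = False

    def _ensure_sorted(self):
        if self._is_sorted:
            return
        for q_id, docs in list(self._results.items()):
            ordered = sorted(docs.items(), key=lambda kv: kv[1], reverse=True)
            self._results[q_id] = dict(ordered)
        self._is_sorted = True
        self.sort_calls += 1

    def ranking(self, q_id):
        # The first read after any add triggers one sort over all queries.
        self._ensure_sorted()
        return list(self._results[q_id])


run = LazySortedRun()
run.add("q_1", ["doc_a", "doc_b", "doc_c"], [0.1, 0.9, 0.5])
run.add("q_2", ["doc_d", "doc_e"], [0.2, 0.7])
top_q1 = run.ranking("q_1")
top_q2 = run.ranking("q_2")
```

Two batches are added but the sort runs only once, which is the saving described above. Python's `sorted` is stable, so tied scores in this sketch keep insertion order; that is a property of the sketch and says nothing about how ranx orders ties.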

PaulLerner · 2022-01-21T15:31:33Z

Ok, I understand, thanks for your quick answers :)

AmenRa added the enhancement New feature or request label Jan 19, 2022

PaulLerner closed this as completed Jan 21, 2022

PaulLerner reopened this Jan 21, 2022

PaulLerner closed this as completed Jan 21, 2022
