add an option to disable sort_dict_of_dict_by_value when adding results to a run #9

Closed
PaulLerner opened this issue Jan 19, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@PaulLerner

Hi -- guy with the weird feature requests here 😅 --

Motivation

You don’t want to ask, but I have a use case where all the documents returned by my system have the same score, yet their order matters!
When you add_and_sort documents to a run, you end up applying sort_dict_of_dict_by_value, which might reverse or completely shuffle the order of the document IDs:

In [1]: from ranx import Qrels, Run, evaluate

In [2]: run = Run()
   ...: run.add_multi(
   ...:     q_ids=["q_1", "q_2"],
   ...:     doc_ids=[
   ...:         ["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"],
   ...:         ["doc_12", "doc_11", "doc_25", "doc_36", "doc_2",  "doc_35"],
   ...:     ],
   ...:     scores=[
   ...:         [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
   ...:         [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
   ...:     ],
   ...: )
In [3]: list(run.run['q_1'].keys())
Out[3]: ['doc_35', 'doc_32', 'doc_36', 'doc_25', 'doc_23', 'doc_12']

Solution

Obviously, my system could add a slightly negative number to each score to preserve the order of documents. However, this is more of a pain to me than commenting out this line.
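
A minimal sketch of that workaround, in plain Python with no ranx calls: subtract a tiny, position-dependent offset from each score so that ties are broken by the original position. The helper name `break_ties_by_position` and the `eps` value are assumptions made for illustration; neither is part of ranx.

```python
def break_ties_by_position(scores, eps=1e-6):
    """Return scores with a small penalty that grows with list position.

    Hypothetical helper (not part of ranx). Assumes `scores` is already in
    the desired order and that `eps * len(scores)` is smaller than any real
    gap between distinct scores.
    """
    return [score - eps * position for position, score in enumerate(scores)]


doc_ids = ["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"]
adjusted = break_ties_by_position([0.9] * len(doc_ids))

# Sorting by descending adjusted score reproduces the original order.
ranked = [
    doc_id
    for doc_id, _ in sorted(
        zip(doc_ids, adjusted), key=lambda pair: pair[1], reverse=True
    )
]
```

The adjusted scores would then be passed in place of the tied ones. The offset has to stay small enough that it never reorders documents whose scores genuinely differ.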

The request

Would you be willing to add an option to disable sort_dict_of_dict_by_value when calling add_multi?

Thanks for the quick response on my other issues :)

@AmenRa AmenRa added the enhancement New feature or request label Jan 19, 2022
@AmenRa
Owner

AmenRa commented Jan 19, 2022

Hi Paul,

The rationale behind forcing sorting is to prevent users from forgetting about it, which could cause a wrong evaluation.

I thought about adding an option to avoid sorting to add_multi to avoid useless computation.
You could add queries to your Run / Qrels by batch, causing ranx to perform sorting even when it's not needed.
Because of that, I suggest using .from_dict to create Run / Qrels at the moment.

However, your problem poses a question about evaluating your lists as they are not ranked.
If you are sure everything is fine with your data/model, you should manage the issue for your specific case.
Otherwise, you could run into reproducibility issues, in my opinion.

Sorry if what I'm about to say seems obvious.
If you have a sorted list of document IDs without meaningful scores, you could generate scores as simply as follows:

scores = list(range(len(doc_ids)))[::-1]
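
For the nested, per-query lists used in the add_multi example earlier in this thread, the same idea could be sketched as below. The helper name `scores_from_positions` is hypothetical and not part of ranx. It only builds descending position-based scores, so that the first document of each query gets the highest value.

```python
def scores_from_positions(doc_ids_per_query):
    """Build one descending score list per query from list positions.

    Hypothetical helper (not part of ranx). A query with n documents gets
    the scores n-1, n-2, ..., 0, so the first document ranks highest.
    """
    return [
        list(range(len(doc_ids) - 1, -1, -1)) for doc_ids in doc_ids_per_query
    ]


doc_ids = [
    ["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"],
    ["doc_12", "doc_11", "doc_25", "doc_36", "doc_2", "doc_35"],
]
scores = scores_from_positions(doc_ids)
```

Because every score within a query is distinct, sorting by descending score can only reproduce the given order. Whether integer scores are accepted as-is, or should be cast to float first, is something to check against the ranx version in use.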

It seems pretty feasible to me. What do you think?

Best,

Elias

@PaulLerner
Author

Hi,

Thanks for your answer.
I won’t get into the details but my use case is actually a little bit more tricky than this.

I’ll consider using from_dict!

@AmenRa
Owner

AmenRa commented Jan 21, 2022

Mind that from_dict still triggers sorting.

@PaulLerner
Author

Oh, ok, I misunderstood your first answer. So are you still considering "adding an option to avoid sorting to add_multi to avoid useless computation"?

@PaulLerner PaulLerner reopened this Jan 21, 2022
@AmenRa
Owner

AmenRa commented Jan 21, 2022

I am, but I will probably make changes that do not solve your issue.
My idea is to postpone the sorting operation to the first time a Qrels or Run is used for evaluation, following the lazy evaluation paradigm, but not to make sorting completely optional.
As I told you before, I want ranx to take care of everything so that the user doesn't have to worry about sorting and other operations.
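
The lazy approach described above could look roughly like the following sketch. `LazyRun` is a hypothetical class written for illustration and is not ranx's `Run`. It only shows the general pattern: adding results marks the container as unsorted, and the sort runs once, the first time ranked results are requested.

```python
class LazyRun:
    """Hypothetical container illustrating deferred sorting (not ranx's Run)."""

    def __init__(self):
        self._results = {}   # q_id -> {doc_id: score}
        self._sorted = True  # an empty run is trivially sorted

    def add(self, q_id, doc_ids, scores):
        self._results[q_id] = dict(zip(doc_ids, scores))
        self._sorted = False  # mark as dirty instead of sorting right away

    def ranked(self, q_id):
        """Return doc IDs by descending score, sorting only when needed."""
        if not self._sorted:
            self._results = {
                q: dict(sorted(docs.items(), key=lambda kv: kv[1], reverse=True))
                for q, docs in self._results.items()
            }
            self._sorted = True
        return list(self._results[q_id])


run = LazyRun()
run.add("q_1", ["doc_12", "doc_23", "doc_25"], [0.1, 0.9, 0.9])
run.add("q_2", ["doc_11", "doc_2"], [0.3, 0.7])  # still no sorting performed
ranked = run.ranked("q_1")  # the single sort happens here
```

In this sketch, tied documents keep their insertion order because Python's built-in `sorted` is stable, including with `reverse=True`. That is a property of this sketch only and says nothing about how ranx orders ties.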

@PaulLerner
Copy link
Author

Ok, I understand, thanks for your quick answers :)
