Problem with r-precision #2

Closed
osf9018 opened this issue Nov 30, 2021 · 8 comments

osf9018 commented Nov 30, 2021

Hi,

I tested your code and found it easy to use and integrate. Moreover, the results I got are fully coherent with those I previously obtained with a personal implementation of trec_eval, and the computation of the measures is fast. This is clearly an interesting piece of software, and its presentation at the ECIR 2022 demo session is a good thing.

My only problem was with the R-precision measure. If you replace "ndcg@5" with "r-precision" in the 4th cell of the overview.ipynb notebook, you get:

```
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_28676/2318072837.py in <module>
      1 # Compute NDCG@5
----> 2 evaluate(qrels, run, "r-precision")

/vol/data/ferret/tools-distrib/_research_code/rank_eval/rank_eval/meta_functions.py in evaluate(qrels, run, metrics, return_mean, threads, save_results_in_run)
    149         for m, scores in metric_scores_dict.items():
    150             for i, q_id in enumerate(run.get_query_ids()):
--> 151                 run.scores[m][q_id] = scores[i]
    152     # Prepare output -----------------------------------------------------------
    153     if return_mean:

TypeError: 'numpy.float64' object does not support item assignment
```

I first detected the problem through the integration of your code into my own tools and obtained the same error there. Looking at the file meta_functions.py where the problem arises:

143     if type(run) == Run and save_results_in_run:
144         for m, scores in metric_scores_dict.items():
145             if m not in ["r_precision", "r-precision"]:
146                 run.mean_scores[m] = np.mean(scores)
147             else:
148                 run.scores[m] = np.mean(scores)
149         for m, scores in metric_scores_dict.items():
150             for i, q_id in enumerate(run.get_query_ids()):
151                 run.scores[m][q_id] = scores[i]

I saw your recent update of this part of the code, but there is still a problem: for R-precision, the mean of the scores is stored in run.scores and not in run.mean_scores. As a consequence, using run.scores to store the per-query scores raises an error when both the return_mean and save_results_in_run flags are set to True. More generally, I am not sure I understand why you differentiate R-precision from the other measures when computing the mean score.
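To make the expected behaviour concrete, here is a minimal sketch of what I would expect this part of evaluate to do, using a dummy stand-in for Run (the attribute names follow the traceback above; this is only an illustration, not your actual code):

```python
import numpy as np

# Dummy stand-in for Run, just to make the sketch self-contained.
class DummyRun:
    def __init__(self, query_ids):
        self._query_ids = query_ids
        self.scores = {}       # metric name -> {query_id: score}
        self.mean_scores = {}  # metric name -> float

    def get_query_ids(self):
        return self._query_ids


def save_metric_scores(run, metric_scores_dict):
    # Treat every metric the same way, R-precision included:
    # per-query scores go into run.scores, the mean into run.mean_scores.
    for m, scores in metric_scores_dict.items():
        run.scores[m] = {
            q_id: float(s) for q_id, s in zip(run.get_query_ids(), scores)
        }
        run.mean_scores[m] = float(np.mean(scores))


run = DummyRun(["q_1", "q_2"])
save_metric_scores(run, {"r-precision": np.array([0.5, 1.0])})
print(run.scores)       # {'r-precision': {'q_1': 0.5, 'q_2': 1.0}}
print(run.mean_scores)  # {'r-precision': 0.75}
```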

Thank you in advance for your efforts in fixing this issue.

Olivier

AmenRa (Owner) commented Nov 30, 2021

Hi Olivier,

Thanks for your interest in rank_eval and the kind words.

Yesterday, when I made the last commit, I noticed something was off in that part of the code!
I'm going to address the problem in the next few days and get back to you.

Thanks for your feedback.

Have a good one,

Elias

AmenRa (Owner) commented Dec 1, 2021

@osf9018 the issue is now fixed.

Thanks again for your feedback!

Closing.

AmenRa closed this as completed Dec 1, 2021
osf9018 (Author) commented Dec 5, 2021 via email

AmenRa (Owner) commented Dec 6, 2021

Sorry Olivier, I do not understand what ***@***.*** means.
Could you clarify, please?

osf9018 (Author) commented Dec 6, 2021 via email

AmenRa (Owner) commented Dec 9, 2021

Sorry Olivier, I am confused: I see ***@***.*** in your messages on four different browsers on two devices.

I think I probably called that measure Hits because it is the (mean) number of relevant documents retrieved for each query.
It is an integer value for each query, not a boolean one.
I am sure I saw it in some paper, but I do not recall which one at the moment.

It is a sub-function of other metrics, so I decided to expose it in case someone wants to use it, but maybe I should hide it, as it is not that useful on its own.

For example, precision is something like:

def precision(qrels, run, k):
    # If k is 0 use the number of retrieved documents
    k = k if k != 0 else len(run)

    return hits(qrels, run, k) / k

while recall is something like:

def recall(qrels, run, k):
    # If k is 0 use the number of retrieved documents
    # In this case k is used just to avoid useless computations
    # as we divide the number of retrieved relevant documents (hits)
    # by the number of relevant documents later
    k = k if k != 0 else len(run)

    return hits(qrels, run, k) / len(qrels)
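For completeness, hits itself can be sketched along these lines for a single query, assuming qrels holds only the relevant document ids and run is the ranked list of retrieved document ids (the actual implementation may differ):

```python
def hits(qrels, run, k):
    # Number of relevant documents among the top-k retrieved ones.
    relevant = set(qrels)
    return sum(1 for doc_id in run[:k] if doc_id in relevant)
```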

You can use it for analysis purposes if you want, but I suggest you stick to the other metrics for scientific evaluation and comparison.

osf9018 (Author) commented Dec 10, 2021 via email

AmenRa (Owner) commented Dec 10, 2021

Hi Olivier,

Thanks again for your feedback.
I will take your suggestion into consideration.

I also want to let you know that my tool is going to change its name soon because of naming similarities with other tools.
The new name is ranx. You can already install the library with pip install ranx.
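Under the new name, usage should look roughly like this (a quick sketch assuming the Qrels, Run, and evaluate names carry over unchanged from rank_eval):

```python
from ranx import Qrels, Run, evaluate

# Toy data: relevance judgements and system scores for a single query.
qrels = Qrels({"q_1": {"doc_a": 1, "doc_b": 1}})
run = Run({"q_1": {"doc_a": 0.9, "doc_c": 0.7, "doc_b": 0.5}})

print(evaluate(qrels, run, "r-precision"))
```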

Best regards,

Elias
