feature request: hits (or accuracy?) #7

Closed
PaulLerner opened this issue Jan 13, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@PaulLerner

Hi,

@osf9018 mentioned it in #2 but I guess it’s better to create a specific issue.

Motivation

It is often difficult to estimate the total number of relevant documents for a query.
For example, in Question Answering, given a large enough Knowledge Base, the answer to a question can be found in a surprisingly large number of documents, far too many to annotate in advance. Because of this, relevance is often estimated on the fly, by checking whether the answer string appears in a document retrieved by the system.

Since the total number of relevant documents is unknown, recall is not an appropriate metric. One way to circumvent this is to compute recall "as if" there were only a single relevant document. Averaged over the whole dataset, this corresponds to the proportion of questions for which the system retrieved at least one relevant document in the top-K. This is what @osf9018 and I call "hits@K" (I can't remember where, but I've seen it in a paper) and what others, such as Karpukhin et al., call "accuracy". Accuracy is a confusing term IMO.
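For concreteness, here is a minimal sketch of the metric over plain dicts (illustrative only, not ranx's implementation; it assumes qrels maps each query ID to its relevant document IDs and run maps each query ID to document scores):

```python
def hit_rate(qrels: dict, run: dict, k: int) -> float:
    """Proportion of queries with at least one relevant document in the top-k."""
    n_hits = 0
    for q_id, relevant in qrels.items():
        # Rank this query's retrieved documents by descending score.
        top_k = sorted(run[q_id], key=run[q_id].get, reverse=True)[:k]
        # Recall "as if" there were a single relevant document:
        # 1 if any relevant document appears in the top-k, else 0.
        n_hits += min(1, sum(doc_id in relevant for doc_id in top_k))
    return n_hits / len(qrels)
```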

The request

Would you be interested in implementing or integrating this feature into your library?
It might take some renaming, but it could be implemented very easily using the _hits function: it is simply min(1, _hits(qrels, run, k)).

@AmenRa AmenRa added the enhancement New feature or request label Jan 14, 2022
@AmenRa
Owner

AmenRa commented Jan 14, 2022

Hi, I can add it to the pool of provided metrics for sure! :)

I'm just not sure what I should call it.
Would success_rate or hit_rate be appropriate?
I could even call it hits and rename or hide the current hits metric.

What do you think?

@PaulLerner
Author

hit_rate seems fine, but whatever you prefer really :)

@PaulLerner
Author

Hi,

I will need this feature pretty soon; do you plan to implement it shortly?
Otherwise, could you provide instructions so that I can implement it myself?

Best,

Paul

@AmenRa
Owner

AmenRa commented Feb 1, 2022

Hi,

Added in 0.1.9 as hit_rate.
It supports the @k cutoff as usual.
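For reference, usage should look roughly like this (a minimal sketch, assuming dict-based qrels/run inputs to evaluate; the exact 0.1.9 API may differ):

```python
from ranx import evaluate

# Toy data (hypothetical): one query with a single annotated relevant document.
qrels = {"q_1": {"doc_a": 1}}
run = {"q_1": {"doc_a": 0.9, "doc_b": 0.8, "doc_c": 0.7}}

print(evaluate(qrels, run, "hit_rate@5"))  # -> 1.0
```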

Closing.

@AmenRa AmenRa closed this as completed Feb 1, 2022
@PaulLerner
Author

You should probably update report too:

Traceback (most recent call last):
  File "/gpfswork/rech/fih/usl47jg/miniconda3/envs/meerqat/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/gpfswork/rech/fih/usl47jg/miniconda3/envs/meerqat/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfsdswork/projects/rech/fih/usl47jg/meerqat/meerqat/ir/metrics.py", line 165, in <module>
    compare(args['--qrels'], args['<run>'], output_path=args['--output'], **kwargs)
  File "/gpfsdswork/projects/rech/fih/usl47jg/meerqat/meerqat/ir/metrics.py", line 136, in compare
    print(report)
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 202, in __str__
    return self.to_table()
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 75, in to_table
    for x in list(list(self.results.values())[0].keys())
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 75, in <listcomp>
    for x in list(list(self.results.values())[0].keys())
  File "/gpfsdswork/projects/rech/fih/usl47jg/ranx/ranx/report.py", line 59, in get_metric_label
    return f"{metric_labels[m_splitted[0]]}@{m_splitted[1]}"
KeyError: 'hit_rate'

PaulLerner added a commit to PaulLerner/ranx that referenced this issue Feb 2, 2022
@PaulLerner
Author

This fixes it. I can open a PR: PaulLerner@f9a6751
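For context, the KeyError comes from report.py's metric_labels mapping not knowing the new metric. A minimal sketch of the lookup and the one-line fix (the dict's actual contents and the display label are assumptions; the lookup line is taken from the traceback):

```python
# Sketch of ranx/report.py's label lookup; the real metric_labels dict
# contains more entries, and "Hit Rate" is an assumed display label.
metric_labels = {
    "hits": "Hits",
    "map": "MAP",
    "mrr": "MRR",
    "ndcg": "NDCG",
    "hit_rate": "Hit Rate",  # the fix: register the new metric
}

def get_metric_label(m):
    m_splitted = m.split("@")
    if len(m_splitted) == 2:
        # e.g. "hit_rate@5" -> "Hit Rate@5"
        return f"{metric_labels[m_splitted[0]]}@{m_splitted[1]}"
    return metric_labels[m_splitted[0]]
```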

@AmenRa
Owner

AmenRa commented Feb 2, 2022

Fixed in 0.1.10. Sorry for the inconvenience.
