
Problems with MAP #22

Closed
Perenz opened this issue Aug 1, 2022 · 2 comments
Perenz commented Aug 1, 2022

I understood that, when evaluating MAP@k, relevance judgment scores equal to 0 are ignored.
In my case, I get somewhat odd behaviour.

I'm working on a balanced dataset with binary relevance and define the qrels to include documents with both 1 and 0 labels.
While ndcg@10 gives me results around 0.7, map@10 is extremely low, around 0.10.

Could this be because the model performs poorly beyond the very first documents, or am I doing something wrong in the evaluation?

qrels = Qrels.from_df(
    df=test_loaded_pdf,
    q_id_col="user_id",
    doc_id_col="run_session_id",
    score_col="target_binary",
)

run = Run.from_df(
    df=test_loaded_pdf,
    q_id_col="user_id",
    doc_id_col="run_session_id",
    score_col="predictions",
)

evaluate(qrels, run, ["map@10", "mrr", "ndcg@10"])

Note that predictions in test_loaded_pdf is not a binary relevance label but a float relevance score.

AmenRa (Owner) commented Aug 2, 2022

Hi Stefano,

Almost all the metrics ignore qrels with zero scores, including NDCG,
so the difference you get is not caused by that.

However, I think your results are entirely possible if you have many relevance judgments for each query.
Note that the Average Precision denominator equals the number of relevant documents, regardless of the cut-off.
Conversely, DCG, and therefore NDCG, depends on the cut-off and not on the number of relevant documents.

Here is a toy example:

from ranx import Qrels, Run, evaluate

# 100 relevant docs
qrels = Qrels({
    "q1": {f"d{i}": 1 for i in range(100)}
})

# Only one relevant doc is returned
run = Run({
    "q1": {**{"d1":1000}, **{f"dd{i}": i for i in range(99)}}
})

evaluate(qrels, run, ["map", "map@10", "ndcg@10", "ndcg@100"])
>>>
{
    "map": 0.01,
    "map@10": 0.01,
    "ndcg@10": 0.22009176629808017,
    "ndcg@100": 0.047758523260819974,
}
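For reference, the two key numbers above can be reproduced by hand from the standard metric definitions. This is a sketch using only the standard library; the relevance list mirrors the toy run above (one relevant document at rank 1, 100 relevant documents in total):

```python
from math import log2

num_relevant = 100                 # total relevant docs in the qrels
rels_at_rank = [1] + [0] * 99      # relevance of the run's results, by rank

# Average Precision: sum of precision@rank at each relevant hit,
# divided by the TOTAL number of relevant docs (ignores any cut-off).
hits = 0
ap = 0.0
for rank, rel in enumerate(rels_at_rank, start=1):
    if rel:
        hits += 1
        ap += hits / rank
ap /= num_relevant
print(ap)  # 0.01

# NDCG@10: the ideal DCG only considers the top-10 positions,
# so a single relevant doc at rank 1 still scores fairly high.
dcg_at_10 = sum(rel / log2(rank + 1)
                for rank, rel in enumerate(rels_at_rank[:10], start=1))
idcg_at_10 = sum(1 / log2(rank + 1) for rank in range(1, 11))
ndcg_at_10 = dcg_at_10 / idcg_at_10
print(ndcg_at_10)  # ~0.2201
```

This makes the asymmetry concrete: AP is diluted by the 99 relevant documents the run never retrieves, while NDCG@10 only compares against an ideal top-10 ranking.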

In my experience, MAP is commonly used with larger cut-offs than 10 (usually 100) or with no cut-off at all.

Best,

Elias

AmenRa (Owner) commented Aug 29, 2022

Closing for inactivity.
Feel free to re-open if needed.

AmenRa closed this as completed Aug 29, 2022