A profile's average precision is 1 if it contains any NaNs #59
Comments
100% agree
Yep
Agreed:

```python
# Check for NaN values in feats DataFrame
if feats.isna().any(axis=None):
    raise ValueError("Feature columns should not have NaN values.")
```
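A minimal, self-contained sketch of how that check behaves (the DataFrames and the `validate_feats` wrapper here are hypothetical, for illustration only):

```python
import numpy as np
import pandas as pd

def validate_feats(feats: pd.DataFrame) -> None:
    # Fail fast if any feature value anywhere in the table is NaN
    if feats.isna().any(axis=None):
        raise ValueError("Feature columns should not have NaN values.")

clean = pd.DataFrame({"f1": [0.1, 0.2], "f2": [1.0, 2.0]})
dirty = pd.DataFrame({"f1": [0.1, np.nan], "f2": [1.0, 2.0]})

validate_feats(clean)  # passes silently
try:
    validate_feats(dirty)
except ValueError as err:
    print("caught:", err)
```

A single NaN in any row or column is enough to trigger the error, which matches the conservative behavior discussed above.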
This is certainly more complicated. I think we should take a conservative and simple approach, albeit not the most user-friendly one.
Thanks for catching this @jessica-ewald! I agree on raising ValueError for any NaNs. Please let me know if you want to author the PR for this change.
I've just submitted a pull request to check for NaNs. I haven't looked into behavior surrounding Infs, but I'll be working with mAP more this week and can add it to the list!
The problem
All results come from this notebook
If a profile contains even a single NaN, every pairwise similarity will also be NaN. Here is an example:
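To see why, consider plain cosine similarity (a standalone sketch, not copairs code): a NaN in either vector propagates through the dot product and the norms, so the similarity is always NaN.

```python
import numpy as np

def cosine_sim(a, b):
    # Plain cosine similarity; a NaN in either vector propagates to the result
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

profile = np.array([0.5, np.nan, 1.2])  # one NaN feature is enough
other = np.array([0.4, 0.9, 1.1])       # fully valid profile

print(cosine_sim(profile, other))  # nan
```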
Here is an example where the profile that we are computing AP for has no NaN, but some of its pairs do:
When all pairs (pos and neg) have NaN similarities, copairs returns an AP of 1. This results in any profile with even a single NaN getting an AP of 1 (see ZMYND10_Arg340Gln and UBQLN2_Pro506Thr rows):
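One plausible mechanism for this (a toy sketch, not copairs' actual implementation): if AP is derived by counting how many negatives score above each positive, then NaN comparisons always evaluate to False, so no negative ever appears to outrank a positive and the profile looks perfect.

```python
import numpy as np

def toy_ap(pos_sims, neg_sims):
    """Toy AP: for each positive, precision = positives at or above its rank
    divided by total items at or above its rank. Illustrative only."""
    neg_sims = np.asarray(neg_sims)
    precisions = []
    for i, p in enumerate(sorted(pos_sims, reverse=True), start=1):
        negs_above = np.sum(neg_sims > p)  # NaN > NaN is False
        precisions.append(i / (i + negs_above))
    return float(np.mean(precisions))

print(toy_ap([0.9, 0.7], [0.8, 0.1]))              # < 1.0, as expected
print(toy_ap([np.nan, np.nan], [np.nan, np.nan]))  # 1.0: all-NaN looks "perfect"
```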
What to do?
As @alxndrkalinin pointed out, we probably don't want to enforce any NaN-handling strategy inside the pairwise similarity calculations, because this can get complex.
But maybe this also shouldn't happen without any warning? Users could easily have a small number of NaNs in their data and never notice, because the mAP values would look normal (between 0 and 1) but would be biased upward whenever one or more of the individual AP values are 1.
I suggest we add one simple check for NaNs in the feats input in the validate_pipeline_input function: give users a warning describing this behavior and let them know they should resolve the NaNs. Another (more complicated) solution would be to change the tie strategy when computing average precision from the ranked list, so that the average precision is NaN instead of 1 when everything is NaN and therefore tied.
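The simpler, warning-based option could look like this (a sketch; the warning text and the `check_nans` helper name are assumptions, not copairs API):

```python
import warnings
import pandas as pd

def check_nans(feats: pd.DataFrame) -> None:
    # Sketch of a NaN check that warns instead of raising
    if feats.isna().any(axis=None):
        warnings.warn(
            "feats contains NaN values; pairwise similarities involving these "
            "profiles will be NaN, which currently yields an AP of 1. "
            "Please resolve NaNs before computing mAP."
        )

dirty = pd.DataFrame({"f1": [0.1, float("nan")], "f2": [1.0, 2.0]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_nans(dirty)
print(len(caught))  # prints 1
```

Raising a ValueError instead (as agreed in the comments) is the same check with a hard failure rather than a warning.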