
A profile's average precision is 1 if it contains any NaNs #59

Closed
jessica-ewald opened this issue Mar 1, 2024 · 3 comments · Fixed by #60

Comments

@jessica-ewald
Contributor

The problem

All results come from this notebook

If a profile contains even a single NaN, every pairwise similarity involving it will also be NaN. Here is an example:
[screenshot: pairwise similarities are all NaN when one profile contains a NaN]
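To see why, here is a minimal sketch (not copairs' actual implementation) of how a single NaN poisons any dot-product-based similarity such as cosine:

```python
import numpy as np

# Two profiles; profile `a` has a single NaN feature.
a = np.array([0.5, np.nan, 1.0])
b = np.array([0.2, 0.3, 0.9])

# Any similarity built on dot products inherits the NaN:
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isnan(cosine))  # True: one NaN makes the whole similarity NaN
```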

Here is an example where the profile we are computing AP for has no NaNs, but some of its pairs do:
[screenshot: some pairs have NaN similarities even though the query profile has none]

When all pairs (positive and negative) have NaN similarities, copairs returns an AP of 1. As a result, any profile with even a single NaN gets an AP of 1 (see the ZMYND10_Arg340Gln and UBQLN2_Pro506Thr rows):
[screenshot: AP results, with ZMYND10_Arg340Gln and UBQLN2_Pro506Thr showing AP = 1]

What to do?

As @alxndrkalinin pointed out, we probably don't want to enforce any NaN-handling strategy inside the pairwise similarity calculations, because this can get complex.

But maybe this also shouldn't happen without any warning? Users could easily have a small number of NaNs in their data and never notice, because the mAP values would look normal (between 0 and 1) but would be biased upwards whenever one or more of the individual AP values is 1.

I suggest we add one simple check for NaNs in the `feats` input in the `validate_pipeline_input` function, give users a warning describing this behavior, and let them know they should resolve the NaNs. Another (more complicated) solution would be to change the tie strategy when computing average precision from the ranked list, so that the average precision is NaN instead of 1 when every similarity is NaN and therefore tied.
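A sketch of what the warning-based variant of that check could look like (the function and argument names here follow this discussion, not the actual copairs code):

```python
import warnings

import numpy as np
import pandas as pd


def check_nan_feats(feats: pd.DataFrame) -> None:
    """Warn if the feature matrix contains NaNs (hypothetical helper)."""
    n_nan = int(feats.isna().sum().sum())
    if n_nan > 0:
        warnings.warn(
            f"feats contains {n_nan} NaN value(s); any profile with a NaN "
            "will get an average precision of 1. Resolve NaNs before "
            "computing mAP."
        )


# Example: one NaN in a two-profile feature matrix triggers the warning.
check_nan_feats(pd.DataFrame({"f1": [0.1, np.nan], "f2": [0.4, 0.5]}))
```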

@shntnu
Member

shntnu commented Mar 1, 2024

> But maybe this also shouldn't happen without any warning?

100% agree

> Users could easily have a small number of NaNs in their data and never notice, because the mAP values would look normal (between 0 and 1) but would be biased upwards whenever one or more of the individual AP values is 1.

Yep

> I suggest we add one simple check for NaNs in the `feats` input in the `validate_pipeline_input` function, give users a warning describing this behavior, and let them know they should resolve the NaNs.

agreed

```python
# Check for NaN values in the feats DataFrame
if feats.isna().any(axis=None):
    raise ValueError("Feature columns should not have NaN values.")
```

> Another (more complicated) solution would be to change the tie strategy when computing average precision from the ranked list, so that the average precision is NaN instead of 1 when every similarity is NaN and therefore tied.

This is certainly more complicated. I think we should take a conservative and simple approach, albeit not the most user-friendly one.

@johnarevalo
Member

Thanks for catching this, @jessica-ewald! I agree on raising a ValueError for any NaN. It'd be good to check whether np.inf also leads to unexpected behavior.
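If infs turn out to be a problem too, a single `np.isfinite` check would cover both cases. A sketch (not the actual fix):

```python
import numpy as np
import pandas as pd

# np.isfinite is False for NaN, +inf, and -inf, so one check covers both.
feats = pd.DataFrame({"f1": [0.1, np.inf], "f2": [np.nan, 0.3]})

has_bad_values = not np.isfinite(feats.to_numpy()).all()
print(has_bad_values)  # True: this frame would fail the stricter check
```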

Please let me know if you want to author the PR for this change.

@jessica-ewald
Contributor Author

I've just submitted a pull request to check for NaNs. I haven't looked into the behavior around infs yet, but I'll be working with mAP more this week and can add it to the list!
