Handling of NaNs #80

Open
naoise-h opened this issue Mar 20, 2023 · 0 comments
Labels: bug, enhancement

Comments

@naoise-h
Member

Currently, a NaN anywhere in a dataset produces a distinguishing event: for most functions (other than nanmean, nanvar and nanstd), the output is always NaN whenever one is present. Is the best solution for all functions to simply ignore NaNs (as nanmean etc. do)?
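
To make the behaviour concrete, here is a minimal sketch using plain NumPy rather than the library's own differentially private functions (the array `data` is made up for illustration). It only shows how a single NaN propagates through an ordinary mean, and how the NaN-ignoring variant behaves instead:

```python
import numpy as np

# Hypothetical 1-D dataset containing a single NaN.
data = np.array([1.0, 2.0, np.nan, 4.0])

# An ordinary mean propagates the NaN, so the output alone reveals
# that a NaN was present somewhere in the input.
plain = np.mean(data)

# The NaN-ignoring variant averages only the remaining values (1, 2, 4).
ignoring = np.nanmean(data)

print(plain, ignoring)
```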

For single-dimensional problems, there should be no issue, as removing NaNs is a simple deterministic pre-processing step. For multi-dimensional problems, removing every row that contains a NaN may prove problematic for utility. Is there justification here for doing something fancier, like mapping each NaN to a value within the range?
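
A rough sketch of the trade-off, again in plain NumPy. The arrays, the bounds `lower`/`upper`, and the choice of the midpoint as the replacement value are all assumptions made for illustration; the midpoint is just one possible "value within the range", not a recommendation:

```python
import numpy as np

# 1-D case: dropping NaNs is a simple deterministic filter.
x = np.array([0.5, np.nan, 1.5, 2.0])
x_clean = x[~np.isnan(x)]

# Multi-dimensional case: one NaN discards the whole row.
X = np.array([
    [0.5, 1.0, 2.0],
    [np.nan, 3.0, 4.0],
    [1.5, np.nan, 6.0],
    [2.0, 7.0, 8.0],
])
row_ok = ~np.isnan(X).any(axis=1)
X_rows_dropped = X[row_ok]  # half of the rows are lost here

# Alternative (assumption): map each NaN to the midpoint of
# data-independent per-column bounds, so no rows are removed.
lower = np.array([0.0, 0.0, 0.0])
upper = np.array([4.0, 8.0, 8.0])
midpoint = (lower + upper) / 2
X_mapped = np.where(np.isnan(X), midpoint, X)

print(x_clean.shape, X_rows_dropped.shape, X_mapped.shape)
```

In this toy example, two NaNs cost two of the four rows when dropping, while the mapping keeps all four rows at the price of injecting made-up values.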

inf may also need special consideration, although this can usually be handled when clipping the data, since inf clips to the upper bound and -inf to the lower bound. NaN, by contrast, has no obvious value to map to. When the algorithm requires the norm of each row to be clipped (as in LogisticRegression), mapping inf to a value is no longer trivial. Do we map inf to a value that makes the row's norm match the clip, or do we also scale the rest of the row?
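
The following plain-NumPy sketch illustrates both halves of this. The bounds, `max_norm`, and the example row are hypothetical, and `option_a` is only one possible reading of "map inf to a value that ensures the row's norm matches the clip"; it is not how the library currently behaves:

```python
import numpy as np

# Value clipping: +/-inf land on the bounds, but NaN stays NaN.
values = np.array([-np.inf, -3.0, 0.5, 7.0, np.inf])
clipped = np.clip(values, -1.0, 1.0)
nan_clipped = np.clip(np.array([np.nan]), -1.0, 1.0)

# Norm clipping: a row containing inf has an infinite norm, and
# naive rescaling turns the inf entry into NaN (inf * 0).
row = np.array([0.3, np.inf, 0.4])
max_norm = 1.0
norm = np.linalg.norm(row)
with np.errstate(invalid="ignore"):
    scaled = row * (max_norm / norm)

# One possible option (assumption): keep the finite entries and
# replace inf with whatever magnitude brings the norm up to max_norm.
# This only works if the finite part already fits within max_norm.
finite = np.isfinite(row)
rest_sq = np.sum(row[finite] ** 2)
fill = np.sqrt(max(max_norm**2 - rest_sq, 0.0))
option_a = np.where(finite, row, np.sign(row) * fill)

print(clipped, nan_clipped, scaled, option_a)
```

If the finite entries alone already exceed `max_norm`, `fill` collapses to zero and the rest of the row still has to be scaled, which is the second alternative raised in the question.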

naoise-h added the bug and enhancement labels on Mar 20, 2023