Handling of NaN #12

Open
kmuehlbauer opened this issue May 16, 2018 · 2 comments
Comments

@kmuehlbauer

Short question, is it somehow possible to extend this to handle NaN, like numpy nanmedian?

@ajcr
Owner

ajcr commented May 17, 2018

Hi @kmuehlbauer, that's a good idea. I'll have to give some thought to how this could be implemented for each rolling iterator without affecting complexity.

For now, it should be straightforward to do this for some of the functions, just by using a generator with an appropriate fill-value. For example, Sum, filling NaN with 0:

>>> import math
>>> import rolling
>>> array = [1, 2, math.nan, 7, math.nan, 3, 2]
>>> array_fill_nan = (0 if math.isnan(x) else x for x in array) # generator, fills NaN values
>>> list(rolling.Sum(array_fill_nan, 3))
[3, 9, 7, 10, 5]

This approach doesn't work for Median, however, because the fill value required at each step is not necessarily constant. I'll see whether adding support for missing values is feasible here. FWIW I think pandas just considers the whole window to be NaN if it contains at least one NaN value.
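
To illustrate why a constant fill value fails for the median, here is a minimal sketch using only the standard library on a single three-element window. The window values are made up for the example.

```python
import math
from statistics import median

# One window containing a missing value (values chosen for illustration).
window = [2, math.nan, 7]

# Filling NaN with a constant (here 0) shifts the median of the window.
filled_with_zero = [0 if math.isnan(x) else x for x in window]
median_filled = median(filled_with_zero)  # median of [0, 2, 7]

# Dropping NaN instead gives the median of the remaining values.
without_nan = [x for x in window if not math.isnan(x)]
median_ignoring_nan = median(without_nan)  # median of [2, 7]

print(median_filled, median_ignoring_nan)  # 2 4.5
```

No single fill value reproduces the NaN-ignoring result for every window, since the value that would leave the median unchanged depends on the other elements in that window.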

If your window size is small, rolling.Apply(array, window_size, operation=np.nanmedian) should still be quite fast.
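
For reference, the behaviour that suggestion aims for can be sketched without any third-party packages. `rolling_nanmedian` below is a hypothetical helper, not part of the rolling package, and it assumes that a window made up entirely of NaN values should yield NaN.

```python
import math
from collections import deque
from statistics import median


def rolling_nanmedian(iterable, window_size):
    """Yield the median of each full window, ignoring NaN values.

    Hypothetical helper for illustration; not part of the rolling package.
    """
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for value in iterable:
        window.append(value)
        if len(window) == window_size:
            valid = [x for x in window if not math.isnan(x)]
            # Assumption: an all-NaN window has no median, so report NaN.
            yield median(valid) if valid else math.nan


array = [1, 2, math.nan, 7, math.nan, 3, 2]
result = list(rolling_nanmedian(array, 3))
print(result)  # [1.5, 4.5, 7, 5.0, 2.5]
```

Each step re-sorts the whole window, so the cost grows with the window size. That is acceptable for small windows but gives up the efficiency a dedicated rolling median would have.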

@kmuehlbauer
Author

@ajcr Thanks for looking into this. I'll definitely try your suggestion using rolling.Apply(array, window_size, operation=np.nanmedian).
