Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling.Std ValueError: math domain error #20

Closed
syundo0730 opened this issue Jan 19, 2021 · 4 comments
Closed

rolling.Std ValueError: math domain error #20

syundo0730 opened this issue Jan 19, 2021 · 4 comments
Labels

Comments

@syundo0730
Copy link

Environment: OSX 10.15.7, python 3.7.9, rolling=0.2.0

When I run the following code,

values = [
    138,
    136,
    137,
    137,
    135,
    136,
    135,
    135,
    135,
]
std = rolling.Std(values, window_size=3, window_type='variable')
for _ in values:
    next(std)

I got an error.

.venv/lib/python3.7/site-packages/rolling/stats.py", line 166, in current_value
    return sqrt(self._sslm / (self._obs - self.ddof))
ValueError: math domain error

This is because self._sslm is a negative value.
The value of sslm changed as follows.

0.0
2.0
2.0
0.6666666666666572
2.6666666666666288
1.9999999999999147
0.6666666666664867
0.6666666666664867
-2.2737367544323206e-13
@ajcr
Copy link
Owner

ajcr commented Feb 28, 2021

Thanks for the report - this is definitely a bug

@ajcr ajcr added the bug label Feb 28, 2021
@mx781
Copy link

mx781 commented Jun 12, 2021

It seems to me that the bug is in Var._update_window - rewriting it as a delete-then-add operation seems to match results pandas gives for these values:

import pandas as pd
import numpy as np
from rolling import Var

class VarFixed(Var):
    def _update_window(self, new):
        self._remove_old()
        self._add_new(new)

window = 3
values = [138, 136, 137, 137, 135, 136, 135, 135, 135]

def test(VarianceClass):
    std = VarianceClass(values, window_size=window)
    for _ in values:
        try:
            print(next(std))
        except StopIteration:
            break

print("existing")
test(Var)
print("fixed")
test(VarFixed)
print("pandas")
print(pd.Series(values, dtype=np.float64).rolling(window).var())
existing
1.0
0.3333333333333286
1.3333333333333144
0.9999999999999574
0.33333333333324333
0.33333333333324333
-1.1368683772161603e-13
fixed
1.0
0.3333333333333357
1.33333333333335
1.0000000000000213
0.333333333333357
0.333333333333357
2.842170943040401e-14
pandas
0             NaN
1             NaN
2    1.000000e+00
3    3.333333e-01
4    1.333333e+00
5    1.000000e+00
6    3.333333e-01
7    3.333333e-01
8    2.842171e-14
dtype: float64

On a separate note, I wonder if it's worth considering adding in Kahan summation to ensure numerical stability; I noticed this while looking at the pandas source: https://github.com/pandas-dev/pandas/blob/e8dbdb07782437ad592736f6d212d0af5e7324be/pandas/_libs/window/aggregations.pyx#L317

EDIT:
Spoke too soon. This avoided the issue in the given example, but other values trip it up just as before, e.g.

[8648.0, 8650.5, 8650.5, 8653.5, 8655.0, 8652.5, 8652.0, 8652.0, 8652.0, 8652.0, 8652.5, 8652.0, 8652.0, 8652.0, 8652.0, 8652.0, 8652.0, 8652.0, 8652.0, 8652.0]

@belm0
Copy link

belm0 commented Nov 11, 2021

@ajcr there have been several releases since the report-- how about a fix?

using this expression in Std.current_value() has worked for us:

sqrt(max(0, self._sslm / (self._obs - self.ddof)))

@ajcr
Copy link
Owner

ajcr commented Nov 12, 2021

An immediate fix for this issue (variance dropping below zero) has been merged.

I've opened a separate issue to solve the broader precision issue with the float arithmetic (#25).

@ajcr ajcr closed this as completed Nov 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants