Note that numpy.corrcoef clips the values into the correct range. Its documentation states:
"Due to floating point rounding the resulting array may not be Hermitian, the diagonal elements may not be 1, and the elements may not satisfy the inequality abs(a) <= 1. The real and imaginary parts are clipped to the interval [-1, 1] in an attempt to improve on that situation but is not much help in the complex case."
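As a small illustration of the documented NumPy behavior, the sketch below calls numpy.corrcoef on made-up data (the arrays are assumptions for illustration, not the data from this report) and checks that every entry of the result lies within [-1, 1].

```python
import numpy as np

# Hypothetical, perfectly linearly related data (not the data from the report).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# np.corrcoef returns the full 2x2 correlation matrix for the two variables.
r = np.corrcoef(x, y)

# Per the NumPy documentation, the entries are clipped to [-1, 1].
in_range = bool(np.all(np.abs(r) <= 1.0))
print(r)
print("all entries within [-1, 1]:", in_range)
```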
Thanks for the report. It makes sense to me to clip the values here, in alignment with NumPy. However, I think we should not promise full compatibility with NumPy (same output on the same input).
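Until such clipping exists in pandas itself, a user-side workaround could look roughly like the sketch below. The helper name corr_clipped and the sample data are hypothetical; only DataFrame.corr and DataFrame.clip are existing pandas methods, and this is not necessarily how a fix in pandas would be implemented.

```python
import pandas as pd


def corr_clipped(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Hypothetical helper: compute correlations, then clip them into [-1, 1]."""
    return df.corr(method=method).clip(lower=-1.0, upper=1.0)


# Made-up data for illustration only (not the data from the report).
data = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})
result = corr_clipped(data)
print(result)
```

Clipping only hides out-of-range values; it does not reduce the underlying rounding error, so a clipped 1.0 may still be inaccurate.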
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The example above results in 1.1547005383792517.
This is similar to #35135, which was closed as "not an issue with pandas, but just numerical computations". Unlike that issue, which showed minuscule (practically negligible) differences, this example produces a Pearson correlation more than 15% above the maximum of 1.
I was able to reproduce this on multiple machines.
I think this might warrant a mention in the documentation.
Expected Behavior
These two are perfectly correlated, so we would expect 1. 1 is the result for both:
(data + 0.0000000000000002).corr().max().max()
(data - 0.0000000000000002).corr().max().max()
Interestingly, using corrwith or R leads to a different result, which underestimates the correlation (but at least it is not out of range, and the relative error is smaller): data[['x']].corrwith(data['y']) returns 0.948683, and cor in R also returns 0.9486833.
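To compare the two pandas code paths mentioned above on other inputs, a sketch along these lines may help. The DataFrame here is made up for illustration and is not expected to reproduce the discrepancy from this report; substitute the problematic data to see the difference.

```python
import pandas as pd

# Hypothetical data (NOT the data from the report).
data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.1, 5.9, 8.2]})

# Pairwise correlation matrix, then pick the x/y entry.
via_corr = data.corr().loc["x", "y"]

# Column-wise correlation of the one-column frame against the Series y.
via_corrwith = data[["x"]].corrwith(data["y"]).loc["x"]

# On well-conditioned data the two should agree up to rounding.
difference = abs(via_corr - via_corrwith)
print(via_corr, via_corrwith, difference)
```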