The Yeo-Johnson transformation is not surjective for negative lambdas. Therefore, the inverse transformation returns np.nan when inverse-transforming values outside the range of the forward transform. This failure is silent, so it took me quite a while of debugging to understand this behavior.
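To make the range restriction concrete, here is the standard Yeo-Johnson forward map on non-negative inputs with λ < 0, and its inverse (ψ denotes the transformed value):

```math
\begin{aligned}
\psi &= \frac{(x+1)^{\lambda} - 1}{\lambda} \in \left[0,\, -\tfrac{1}{\lambda}\right), && x \ge 0,\ \lambda < 0,\\
x &= (\lambda\psi + 1)^{1/\lambda} - 1, && \text{real-valued only if } \lambda\psi + 1 > 0 \iff \psi < -\tfrac{1}{\lambda}.
\end{aligned}
```

The bound follows from (x+1)^λ ∈ (0, 1] when λ < 0, so no input ever maps to ψ ≥ -1/λ. The negative-value branch, ψ = -((1-x)^{2-λ} - 1)/(2-λ) for x < 0, has the mirror-image problem when λ > 2: its image is bounded below by 1/(2-λ), and its inverse 1 - (1 - (2-λ)ψ)^{1/(2-λ)} needs 1 - (2-λ)ψ > 0.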
To reproduce for positive values (there is a similar problem for negative values):
```python
import numpy as np
import sklearn.preprocessing

trans = sklearn.preprocessing.PowerTransformer(method='yeo-johnson')
x = np.array([1, 1, 1e10]).reshape(-1, 1)  # extreme skew
trans.fit(x)
lmbda = trans.lambdas_[0]
print(lmbda)
assert lmbda < 0  # == -0.096, a negative value

# Any value `psi` for which lambda * psi + 1 < 0 results in nan (and == 0 in
# inf) due to lacking support, since the forward transformation is not
# surjective for negative lambdas. Note that PowerTransformer standardizes by
# default (standardize=True), so inverse_transform first maps psi = 10 back
# through the scaler; it is that un-standardized value that falls outside
# the range, not 10 itself.
psi = np.array([10]).reshape(-1, 1)
x = trans.inverse_transform(psi).item()
print(x)
assert np.isnan(x)
```
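A user-side guard could check transformed values against the interval that the inverse can actually handle. The helper below is hypothetical (not part of scikit-learn) and simply encodes the two Yeo-Johnson branch conditions for a given lambda. One caveat, stated as an assumption: with the default standardize=True, the values to check are the ones after undoing the standardization, not the raw inputs to inverse_transform.

```python
import math


def yeo_johnson_inverse_bounds(lmbda: float) -> tuple[float, float]:
    """Open interval (low, high) of transformed values that the inverse
    Yeo-Johnson map can send back to a real number.

    Hypothetical helper for illustration; not a scikit-learn API.
    """
    # Non-negative branch: the inverse needs lmbda * psi + 1 > 0, which only
    # caps psi from above when lmbda < 0.
    high = -1.0 / lmbda if lmbda < 0 else math.inf
    # Negative branch: the inverse needs 1 - (2 - lmbda) * psi > 0, which only
    # caps psi from below when lmbda > 2.
    low = 1.0 / (2.0 - lmbda) if lmbda > 2 else -math.inf
    return low, high
```

For the lambda of about -0.096 reported here, the upper bound is -1 / lambda, roughly 10.4; lambdas between 0 and 2 give an unbounded interval on both sides.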
Thanks for the report and the analysis. I confirm I can reproduce on main with the provided reproducer.
I think that calling inverse_transform with negative lambda values should at least raise a warning.
Not sure if it would be helpful to raise such a warning at fit time though. Maybe some users only care about the transform without inverse_transform and raising a warning would be annoying for those users.
Short-circuiting means the np.any calls are often not computed at all, so it should be really cheap. tmp1 and tmp2 need to be computed for inverse_transform anyway, so that is no extra work.
EDIT: hopefully with better variable names than tmp1 and tmp2...
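The check suggested above can be sketched as a standalone NumPy function with more descriptive names for the two power bases. This is a hypothetical re-implementation of the textbook inverse for a single feature, not scikit-learn's internal code; the function name and warning text are illustrative only.

```python
import warnings

import numpy as np


def yeo_johnson_inverse_checked(psi, lmbda: float) -> np.ndarray:
    """Inverse Yeo-Johnson for one feature that warns instead of failing
    silently when values lie outside the range of the forward transform.

    Hypothetical helper; not scikit-learn's implementation.
    """
    psi = np.asarray(psi, dtype=float)
    x = np.zeros_like(psi)
    pos = psi >= 0

    # Bases that get raised to 1 / lmbda and 1 / (2 - lmbda) below.
    pos_base = lmbda * psi[pos] + 1
    neg_base = 1 - (2 - lmbda) * psi[~pos]

    # Short-circuit: pos_base can only be <= 0 when lmbda < 0, and neg_base
    # only when lmbda > 2, so np.any is skipped for 0 <= lmbda <= 2.
    if (lmbda < 0 and np.any(pos_base <= 0)) or (lmbda > 2 and np.any(neg_base <= 0)):
        warnings.warn(
            f"Some values lie outside the range of the Yeo-Johnson transform "
            f"for lambda={lmbda:.3g}; their inverse is undefined "
            "(usually nan, inf at the boundary).",
            RuntimeWarning,
        )

    # NumPy's own 'invalid'/'divide' warnings are silenced; the warning above
    # replaces them.
    with np.errstate(invalid="ignore", divide="ignore"):
        if abs(lmbda) < np.spacing(1.0):
            x[pos] = np.expm1(psi[pos])  # lambda == 0: x = exp(psi) - 1
        else:
            x[pos] = np.power(pos_base, 1 / lmbda) - 1
        if abs(lmbda - 2) < np.spacing(1.0):
            x[~pos] = -np.expm1(-psi[~pos])  # lambda == 2: x = 1 - exp(-psi)
        else:
            x[~pos] = 1 - np.power(neg_base, 1 / (2 - lmbda))
    return x
```

A negative base only yields nan when the exponent is not an integer; for lambdas such as -1 or -0.5, np.power returns a finite but meaningless number, which is one more reason to check the base rather than test the output for nan.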
Describe the bug
The problematic lines are scikit-learn/sklearn/preprocessing/_data.py, line 3390 and line 3386 (at commit 8721245), in which we might compute np.power(something_negative, not_integral_value), which of course returns np.nan as per https://numpy.org/doc/stable/reference/generated/numpy.power.html

Steps/Code to Reproduce
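The root cause can be reproduced with NumPy alone, without scikit-learn, using the inverse formula of the non-negative branch. This is a sketch assuming the lambda of -0.096 reported above; the two psi values are chosen just inside and just outside the range bound -1 / lambda (about 10.4).

```python
import numpy as np

lmbda = -0.096  # the negative lambda reported in this issue

# Inside the range: the base lmbda * psi + 1 is positive, so the power is real.
inside = np.power(lmbda * 5.0 + 1, 1 / lmbda) - 1

# Outside the range: the base is negative and the exponent 1 / lmbda is not
# an integer, so np.power returns nan. errstate only suppresses NumPy's
# "invalid value" RuntimeWarning for this demo.
with np.errstate(invalid="ignore"):
    outside = np.power(lmbda * 11.0 + 1, 1 / lmbda) - 1

print(inside, outside)
```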
Expected Results
The code should either:
Actual Results
It just prints
Versions