Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMStandardScaler fails when scale = 0 #124

Closed
AtharvaKhare opened this issue May 26, 2019 · 7 comments

Comments

@AtharvaKhare
Copy link

commented May 26, 2019

A relevant example:

x := PMMatrix rows: #(#(1 0) #(1 0) #(1 1) #(1 1)).
scaler := PMStandardizationScaler new.
scaler fit: x.
scaler scale. "0"
scaler transform: x. "ZeroDivideError"

Possible fix: add a "handleZeroScale" method which changes scale to 1 when it is 0.
Question: Can this be method be added in PMDataTransformer? If we add more scalers in future, all would need this message.

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 27, 2019

Why we should modify the scale to 1 when this is 0 ? 😄

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 27, 2019

@AtharvaKhare

This comment has been minimized.

Copy link
Author

commented May 27, 2019

We are doing (element - mean) / scale. When scale=0, we can't divide by zero. Since all elements are same, the element-mean will be 0 as it should be. So any number in the denominator will be fine. 1 specifically because any number divided by it is itself, but again, any number should be ok.

sklearn is also setting scale to 1.

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 28, 2019

ok great if they use the same trick. I find the Python code here: https://github.com/scikit-learn/scikit-learn/blob/7813f7efb/sklearn/preprocessing/data.py#L62

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 29, 2019

Can you propose a fix, should be easy ?

@AtharvaKhare

This comment has been minimized.

Copy link
Author

commented May 29, 2019

One of the fix is to modify this method:

PMStandardizationScaler >> scale [
^ self variance sqrt
]

To calculate sqrt for every element, compare to 0, and return sqrt or 1.

Other way is to add a handleZeros method, but I do not think we should go there unless we add more types of scaling.

Do you want me to create PR using first suggested fix?

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 29, 2019

Maybe later will have multiple type of scalin, but at the moment, we have only one.
So keep it simple and use first method. Maybe add a test when scale is 0 that it returns 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.