Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMStandardScaler fails when scale = 0 #124

Closed
AtharvaKhare opened this issue May 26, 2019 · 7 comments · Fixed by #128
Closed

PMStandardScaler fails when scale = 0 #124

AtharvaKhare opened this issue May 26, 2019 · 7 comments · Fixed by #128

Comments

@AtharvaKhare
Copy link

AtharvaKhare commented May 26, 2019

A relevant example:

x := PMMatrix rows: #(#(1 0) #(1 0) #(1 1) #(1 1)).
scaler := PMStandardizationScaler new.
scaler fit: x.
scaler scale. "0"
scaler transform: x. "ZeroDivideError"

Possible fix: add a "handleZeroScale" method which changes scale to 1 when it is 0.
Question: Can this be method be added in PMDataTransformer? If we add more scalers in future, all would need this message.

@SergeStinckwich
Copy link
Member

SergeStinckwich commented May 27, 2019

Why we should modify the scale to 1 when this is 0 ? 😄

@SergeStinckwich
Copy link
Member

we should have a look how this is done in scikit-learn : https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

@AtharvaKhare
Copy link
Author

We are doing (element - mean) / scale. When scale=0, we can't divide by zero. Since all elements are same, the element-mean will be 0 as it should be. So any number in the denominator will be fine. 1 specifically because any number divided by it is itself, but again, any number should be ok.

sklearn is also setting scale to 1.

@SergeStinckwich
Copy link
Member

SergeStinckwich commented May 28, 2019

ok great if they use the same trick. I find the Python code here: https://github.com/scikit-learn/scikit-learn/blob/7813f7efb/sklearn/preprocessing/data.py#L62

@SergeStinckwich
Copy link
Member

Can you propose a fix, should be easy ?

@AtharvaKhare
Copy link
Author

One of the fix is to modify this method:

PMStandardizationScaler >> scale [
^ self variance sqrt
]

To calculate sqrt for every element, compare to 0, and return sqrt or 1.

Other way is to add a handleZeros method, but I do not think we should go there unless we add more types of scaling.

Do you want me to create PR using first suggested fix?

@SergeStinckwich
Copy link
Member

Maybe later will have multiple type of scalin, but at the moment, we have only one.
So keep it simple and use first method. Maybe add a test when scale is 0 that it returns 1.

@SergeStinckwich SergeStinckwich changed the title PMStandardScaler fails when scale = 0 PMStandardScaler fails when scale = 0 Aug 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants