[SYSTEMDS-2756] Scale and PCA builtin update #1123
Thanks @Baunsgaard for eliminating the unnecessary colMeans in the case of center and scale. However, please refrain from unnecessary changes to APIs and external behavior. I'll revert to the original script and simplify the scale case accordingly. As minor notes: the tests were failing because we only support right-hand-side broadcasting; all inputs of an ifelse are computed within a single DAG, which led to unnecessary computation; the scale vector replaced NaN and not zeros; and please avoid empty lines before closing braces.
ad 1) Besides the changed overall behavior, the comment referred to
ad 2/3) For local experiments this is fine, but for the hierarchy of builtin functions, we should aim for simple and stable APIs and find other ways to automatically remove redundancy (e.g., by lineage-based reuse). Passing all these intermediates around would quickly become really messy and confusing for both users and developers. Which PCA predict do you refer to? Is there already a builtin function for it? Generally, please don't rewrite the builtin functions just to make them more amenable to compressed operations.
Yes, and that is exactly why I replace all the 0 values in the scale vector: to avoid introducing NaN in the first place, instead of removing the NaN from the full matrix after the division operation.
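To make the point about zeros concrete, here is a minimal sketch in plain Python (not DML, and not the code from this PR). The function name `scale_columns` and the replacement value 1.0 are assumptions chosen for illustration. It replaces zeros in the small per-column scale vector before the division, so nothing has to be cleaned up in the full matrix afterwards. Note that plain Python raises `ZeroDivisionError` on a float division by zero instead of producing NaN, so here the guard is what keeps the division valid at all.

```python
import math


def scale_columns(X, center=True, scale=True):
    """Center and scale the columns of X, given as a list of equally long rows.

    Hypothetical helper for illustration only; requires at least two rows.
    """
    n, m = len(X), len(X[0])
    Y = [row[:] for row in X]
    if center:
        means = [sum(row[j] for row in Y) / n for j in range(m)]
        Y = [[row[j] - means[j] for j in range(m)] for row in Y]
    if scale:
        # Per-column scale factor: sqrt(sum(x^2) / (n - 1)).
        factors = [
            math.sqrt(sum(row[j] ** 2 for row in Y) / (n - 1)) for j in range(m)
        ]
        # Replace zeros in the 1 x m vector (assumed replacement: 1.0),
        # rather than patching invalid values in the n x m result.
        factors = [f if f != 0.0 else 1.0 for f in factors]
        Y = [[row[j] / factors[j] for j in range(m)] for row in Y]
    return Y


# The second column is constant, so its scale factor would be zero.
X = [[1.0, 5.0], [3.0, 5.0]]
Y = scale_columns(X)
```

A column with a zero scale factor contains only zeros at that point (after centering, or because it was all zeros to begin with), so dividing by 1.0 leaves it at zero, which matches replacing NaN by 0 after the division.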
I agree simplicity is of utmost importance, but with the current PCA we are limited to applying the standard version without scaling if we want to be able to reuse the model on unseen data. This results in a limited system overall. The redundancy of the extra return values should already be handled by the system.
I agree that it becomes messy, and it has been for a while, especially if you consider the neural network part of the system, where each layer's weights and bias are passed back. But these returns are necessary for inference on unseen data. I have added a PR with the PCA predict, #1124. Maybe it would be nice to keep all the methods for an algorithm in the same file, such that the predict is located next to the training function?
I would argue that these changes are for the "greater good", and therefore also better for compression. I am avoiding changing things just for compression's sake, but thanks for the reminder 😄.
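As a rough illustration of why the extra return values matter for unseen data, the following plain-Python sketch (again not DML) separates fitting from applying. The names `fit_scaler` and `apply_scaler` are hypothetical and do not correspond to existing builtins. The column means and scale factors are computed once on the training data and then reused unchanged on new rows.

```python
import math


def fit_scaler(X):
    """Return per-column means and scale factors of X (list of rows, n >= 2)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    factors = []
    for j in range(m):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sd = math.sqrt(var)
        # Assumed convention: a constant column gets factor 1.0.
        factors.append(sd if sd != 0.0 else 1.0)
    return means, factors


def apply_scaler(X, means, factors):
    """Apply previously fitted means and factors to (possibly unseen) rows."""
    m = len(means)
    return [[(row[j] - means[j]) / factors[j] for j in range(m)] for row in X]


train = [[1.0, 10.0], [3.0, 10.0]]
means, factors = fit_scaler(train)

# Unseen data is transformed with the training statistics, not its own.
unseen = [[2.0, 11.0]]
scaled_unseen = apply_scaler(unseen, means, factors)
```

Without the means and factors being returned from the fit step, the unseen rows could only be rescaled with their own statistics, which would place them in a different space than the training data.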
ad 1) There is a mismatch between what you wanted to do and what your code actually did; the comment just pointed that out. The PR did this
but you wanted to do this
Oops, logic fine ... execution wrong. Great catch, thanks!
No description provided.