[MLlib][SPARK-2997] Update SVD documentation to reflect roughly square #2070
@@ -11,7 +11,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Dimensionality Reduction
 of reducing the number of variables under consideration.
 It can be used to extract latent features from raw and noisy features
 or compress data while maintaining the structure.
-MLlib provides support for dimensionality reduction on tall-and-skinny matrices.
+MLlib provides support for dimensionality reduction on the RowMatrix class.
Can you add a link to the Scaladoc or the documentation for RowMatrix?
Could you give some high-level guidance on the scalability of our approaches, i.e., for what sorts of matrices and values of k it's appropriate to run MLlib's SVD?
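As background for the scalability question above, here is a sketch of the standard SVD identities that tie cost to the number of columns. It is not quoted from this PR or from the MLlib docs; the symbols m, n, r, k, A, U, Σ, and V are notation assumed here for illustration.

```latex
% Thin SVD of an m x n matrix A of rank r (notation assumed for illustration)
A = U \Sigma V^{T}, \qquad
U \in \mathbb{R}^{m \times r},\quad
\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r),\quad
V \in \mathbb{R}^{n \times r},\quad
\sigma_1 \ge \dots \ge \sigma_r > 0

% The n x n Gramian shares V with A; its nonzero eigenvalues are the squared singular values
A^{T} A = V \Sigma^{2} V^{T}

% Left singular vectors recovered from the right ones
U = A V \Sigma^{-1}

% A rank-k approximation (k <= r) keeps only the leading k columns of U and V
A_k = U_{:,1:k} \, \mathrm{diag}(\sigma_1, \dots, \sigma_k) \, V_{:,1:k}^{T}
```

The Gramian is n × n regardless of m, so forming it explicitly is practical only when n is modest. That is one plausible reading of why the number of columns, rather than the number of rows, drives the guidance being asked for here.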
QA tests have started for PR 2070 at commit
QA tests have finished for PR 2070 at commit
@@ -119,14 +137,13 @@ statistical method to find a rotation such that the first coordinate has the lar
 possible, and each succeeding coordinate in turn has the largest variance possible. The columns of
 the rotation matrix are called principal components. PCA is used widely in dimensionality reduction.

-MLlib supports PCA for tall-and-skinny matrices stored in row-oriented format.
+MLlib supports PCA for matrices stored in row-oriented format.
I think we still need tall-and-skinny matrices for PCA.
QA tests have started for PR 2070 at commit
QA tests have finished for PR 2070 at commit
Aside from the one issue of
QA tests have started for PR 2070 at commit
QA tests have finished for PR 2070 at commit
I've merged this into master and branch-1.1. Thanks!!
Update the documentation to reflect the fact we can handle roughly square matrices.

Author: Reza Zadeh <rizlar@gmail.com>

Closes #2070 from rezazadeh/svddocs and squashes the following commits:

826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square

(cherry picked from commit b1b2030)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Update the documentation to reflect the fact we can handle roughly square matrices.

Author: Reza Zadeh <rizlar@gmail.com>

Closes apache#2070 from rezazadeh/svddocs and squashes the following commits:

826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square