[MLlib][SPARK-2997] Update SVD documentation to reflect roughly square #2070

Closed
wants to merge 7 commits into from


rezazadeh
Contributor

Update the documentation to reflect the fact we can handle roughly square matrices.

@@ -11,7 +11,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Dimensionality Reduction
of reducing the number of variables under consideration.
It can be used to extract latent features from raw and noisy features
or compress data while maintaining the structure.
-MLlib provides support for dimensionality reduction on tall-and-skinny matrices.
+MLlib provides support for dimensionality reduction on the RowMatrix class.
Can you add a link to the scaladoc or the documentation for RowMatrix?

Could you give some high-level guidance on the scalability of our approaches, i.e., what sorts of matrices and values of k it's appropriate to run MLlib's SVD on?
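
For readers following the scalability question, here is a minimal sketch of why matrix shape matters, written in plain Python rather than against the Spark API. It assumes an m x n matrix given as a list of rows with n small: the n x n Gramian A^T A is accumulated one row at a time, and the top singular value is the square root of its largest eigenvalue. The names `gramian` and `top_singular_value` are hypothetical, and power iteration is used only for illustration; this is not MLlib's implementation.

```python
import math

def gramian(rows):
    """Accumulate A^T A (n x n) one row at a time as a sum of outer products."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g

def top_singular_value(rows, iters=200):
    """Power iteration on the Gramian; assumes the matrix is not all zeros."""
    g = gramian(rows)
    n = len(g)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T G v approximates the largest eigenvalue of A^T A.
    lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
    return math.sqrt(lam)

# A 4 x 2 tall-and-skinny example with singular values 3 and 2.
rows = [[3.0, 0.0], [0.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
sigma1 = top_singular_value(rows)
```

The per-row work and the Gramian's storage both grow as n squared, independent of the number of rows m, which is why this style of computation suits matrices with many rows and few columns.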

SparkQA commented Aug 21, 2014

QA tests have started for PR 2070 at commit 6429c59.

  • This patch merges cleanly.

SparkQA commented Aug 21, 2014

QA tests have started for PR 2070 at commit 7ffa2aa.

  • This patch merges cleanly.

SparkQA commented Aug 21, 2014

QA tests have finished for PR 2070 at commit 6429c59.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 21, 2014

QA tests have finished for PR 2070 at commit 7ffa2aa.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -119,14 +137,13 @@ statistical method to find a rotation such that the first coordinate has the lar
possible, and each succeeding coordinate in turn has the largest variance possible. The columns of
the rotation matrix are called principal components. PCA is used widely in dimensionality reduction.

-MLlib supports PCA for tall-and-skinny matrices stored in row-oriented format.
+MLlib supports PCA for matrices stored in row-oriented format.
I think we still need tall-and-skinny matrices for PCA.
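
To illustrate the point about PCA, here is a small sketch in plain Python (not Spark code). It assumes PCA is done by forming the full n x n sample covariance matrix of the columns and decomposing it on a single machine, which is only practical when n is small. The function name `covariance` is hypothetical.

```python
def covariance(rows):
    """Sample covariance (n x n) of the columns of an m x n matrix given as rows."""
    m, n = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                # Outer product of the centered row with itself.
                cov[i][j] += (r[i] - mean[i]) * (r[j] - mean[j])
    # Unbiased estimate: divide by m - 1 (assumes m >= 2).
    return [[c / (m - 1) for c in row] for row in cov]

# Three observations of two perfectly correlated variables.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
cov = covariance(data)
```

The principal components are the eigenvectors of this matrix, so the whole n x n array has to fit in one place no matter how many rows there are.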

SparkQA commented Aug 22, 2014

QA tests have started for PR 2070 at commit 3f34fc6.

  • This patch merges cleanly.

SparkQA commented Aug 22, 2014

QA tests have finished for PR 2070 at commit 3f34fc6.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@atalwalkar
Contributor

Aside from the one issue of $U$ being the left singular vectors, this looks good to me.
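
For reference, the standard convention behind this remark can be written out as follows. The dimensions are assumptions for illustration: $A$ is $m \times n$ and only the top $k$ singular values are kept, so $U$ is $m \times k$, $\Sigma$ is $k \times k$ and diagonal, and $V$ is $n \times k$.

```latex
A \approx U \Sigma V^{T}, \qquad
U^{T} U = I_k, \quad V^{T} V = I_k, \qquad
U = A V \Sigma^{-1} \quad \text{(when all kept singular values are nonzero)}
```

The columns of $U$ are the left singular vectors and the columns of $V$ are the right singular vectors; the approximation is an equality when $k$ equals the rank of $A$.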

SparkQA commented Aug 23, 2014

QA tests have started for PR 2070 at commit 826b8fe.

  • This patch merges cleanly.

SparkQA commented Aug 23, 2014

QA tests have finished for PR 2070 at commit 826b8fe.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Aug 25, 2014

I've merged this into master and branch-1.1. Thanks!!

@asfgit asfgit closed this in b1b2030 Aug 25, 2014
asfgit pushed a commit that referenced this pull request Aug 25, 2014
Update the documentation to reflect the fact we can handle roughly square matrices.

Author: Reza Zadeh <rizlar@gmail.com>

Closes #2070 from rezazadeh/svddocs and squashes the following commits:

826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square

(cherry picked from commit b1b2030)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Update the documentation to reflect the fact we can handle roughly square matrices.

Author: Reza Zadeh <rizlar@gmail.com>

Closes apache#2070 from rezazadeh/svddocs and squashes the following commits:

826b8fe [Reza Zadeh] left singular vectors
3f34fc6 [Reza Zadeh] PCA is still TS
7ffa2aa [Reza Zadeh] better title
aeaf39d [Reza Zadeh] More docs
788ed13 [Reza Zadeh] add computational cost explanation
6429c59 [Reza Zadeh] Add link to rowmatrix docs
1eeab8b [Reza Zadeh] Update SVD documentation to reflect roughly square
4 participants