Use numpy directly for matrix multiply. #687
Using matrix multiply to compute XtX and XtY yields a 5-20x speedup depending on problem size. For example, the following takes 19s locally after this change vs. 5m21s before the change (16x speedup):
bin/pyspark examples/src/main/python/als.py local[8] 1000 1000 50 10 10
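To illustrate where a speedup like this can come from, here is a minimal sketch comparing a row-by-row accumulation of XtX and XtY with a single matrix multiply. It is not the actual diff: the function names `gram_loop` and `gram_matmul`, the array shapes, and the assumption that the earlier version accumulated in a Python loop are all illustrative.

```python
import numpy as np


def gram_loop(X, y):
    """Accumulate X^T X and X^T y one row at a time (hypothetical 'before')."""
    n, k = X.shape
    XtX = np.zeros((k, k))
    Xty = np.zeros(k)
    for j in range(n):
        v = X[j, :]
        XtX += np.outer(v, v)  # rank-1 update per row, driven from Python
        Xty += v * y[j]
    return XtX, Xty


def gram_matmul(X, y):
    """Compute the same quantities with one matrix multiply each."""
    return X.T @ X, X.T @ y


# Small random problem; sizes are arbitrary and chosen only for the demo.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = rng.standard_normal(200)

loop_XtX, loop_Xty = gram_loop(X, y)
mm_XtX, mm_Xty = gram_matmul(X, y)
print(np.allclose(loop_XtX, mm_XtX), np.allclose(loop_Xty, mm_Xty))
```

Both functions return the same values up to floating-point rounding; the matrix-multiply form simply hands the whole computation to NumPy instead of iterating in the interpreter.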
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
examples/src/main/python/als.py
Outdated
@etrain Could you also update this line? We should only touch the diagonals.
This probably won't make a huge difference when K is small, but it's better style.
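The suggestion to touch only the diagonals can be sketched as follows. This is an assumption about what the reviewed line did, not a quote from als.py: the helper names `add_ridge_dense` and `add_ridge_diagonal`, and the parameters `lam` and `n`, are hypothetical. The dense variant builds a full K x K identity matrix just to add a constant to the diagonal, while the second variant updates only the K diagonal entries.

```python
import numpy as np


def add_ridge_dense(XtX, lam, n):
    """Add lam * n to the diagonal by building a full identity matrix."""
    k = XtX.shape[0]
    return XtX + np.eye(k) * lam * n


def add_ridge_diagonal(XtX, lam, n):
    """Add lam * n to the diagonal entries only."""
    out = XtX.copy()
    out[np.diag_indices_from(out)] += lam * n
    return out


# Arbitrary demo data: 50 rows, K = 4 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
XtX = X.T @ X

dense = add_ridge_dense(XtX, lam=0.01, n=50)
diag = add_ridge_diagonal(XtX, lam=0.01, n=50)
print(np.allclose(dense, diag))
```

As the comment notes, the difference is small when K is small; the diagonal-only form mainly avoids allocating and adding a K x K matrix of mostly zeros.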
Merged build triggered.
Merged build started.
LGTM. Thanks!
Thanks. Merged.
Merged build finished. All automated tests passed.
Author: Evan Sparks <evan.sparks@gmail.com>
Closes #687 from etrain/patch-1 and squashes the following commits:
e094dbc [Evan Sparks] Touching only diaganols on update.
d1ab9b6 [Evan Sparks] Use numpy directly for matrix multiply.
(cherry picked from commit 6ed7e2c)
Signed-off-by: Reynold Xin <rxin@apache.org>