Speed improvements #18
Conversation
1. Achieved a 5X speed improvement in `mse` by using `np.square` rather than `np.power(x, 2)`, and by reusing intermediate buffers via the universal functions' `out=` keyword.

   ```
   In [2]: # %load mse.py
      ...: import numpy as np
      ...:
      ...: def mse(x, xhat):
      ...:     """ Calculate mse between vector or matrix x and xhat """
      ...:     sum_ = np.sum(np.power(x - xhat, 2))
      ...:     return sum_ / x.size
      ...:
      ...: def mse2(x, xhat):
      ...:     """ Calculate mse between vector or matrix x and xhat """
      ...:     sum_ = np.sum(np.square(x - xhat))
      ...:     return sum_ / x.size
      ...:
      ...: def mse3(x, xhat):
      ...:     """ Calculate mse between vector or matrix x and xhat """
      ...:     sum_ = x - xhat
      ...:     np.square(sum_, out=sum_)
      ...:     return np.sum(sum_) / x.size
      ...:

   In [3]: v = np.random.uniform(0, 1, size=10**6)

   In [4]: m = np.random.uniform(0, 1, size=(2048, 2048))

   In [5]: %timeit mse(v, 0.5)
   14.7 ms ± 6.88 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

   In [6]: %timeit mse2(v, 0.5)
   4.4 ms ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

   In [7]: %timeit mse3(v, 0.5)
   1.58 ms ± 734 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

   In [8]: %timeit mse(m, 0.5)
   67 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

   In [9]: %timeit mse2(m, 0.5)
   24.7 ms ± 37.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

   In [10]: %timeit mse3(m, 0.5)
   15.1 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

   In [11]: np.equal(mse(v, 0.5), mse2(v, 0.5))
   Out[11]: True

   In [12]: np.equal(mse(v, 0.5), mse3(v, 0.5))
   Out[12]: True

   In [13]: np.all(np.equal(mse(m, 0.5), mse3(m, 0.5)))
   Out[13]: True

   In [14]: np.all(np.equal(mse(m, 0.5), mse2(m, 0.5)))
   Out[14]: True

   In [15]: def mse4(x, xhat):
       ...:     """ Calculate mse between vector or matrix x and xhat """
       ...:     buf_ = x - xhat
       ...:     np.square(buf_, out=buf_)
       ...:     sum_ = np.sum(buf_)
       ...:     sum_ /= x.size
       ...:     return sum_
       ...:

   In [16]: %timeit mse4(m, 0.5)
   14.4 ms ± 9.14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

   In [17]: %timeit mse4(v, 0.5)
   1.6 ms ± 392 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

   In [18]: %timeit mse4(v, 0.5)
   1.59 ms ± 430 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   ```

2. The routine `get_rotation_matrix` was computing the composition of 3 rotations in the YZ plane through dot products of 3x3 matrices. Since this is equivalent to a single rotation in the YZ plane through the sum of the 3 angles, the rotation matrix is now constructed directly. If the routine was intended to construct a rotation matrix from Euler angles, it did not do so correctly; I left the old code commented out.

3. Modified other routines to utilize intermediate buffers to speed them up: less copying, and use of `broadcast_to` to avoid creation of intermediate arrays. Improved the efficiency of `theta_from_eigendecomp` and `x_from_eigendecomp` through more efficient use of broadcasting in NumPy.
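Point 2 rests on the fact that successive rotations in the same plane compose by adding their angles. A minimal numerical check of that claim, assuming the YZ-plane rotation form that appears in the commented-out code (`[[1, 0, 0], [0, c, s], [0, -s, c]]`); the helper name `rot_yz` is illustrative, not the pylocus API:

```python
import numpy as np

def rot_yz(theta):
    # Rotation in the YZ plane (about the x axis) by angle theta,
    # in the same form as the commented-out Rz in the diff.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0],
                     [0, c, s],
                     [0, -s, c]])

a, b, c = 0.3, -1.1, 0.7

# Three rotations composed via matrix products...
composed = rot_yz(a).dot(rot_yz(b)).dot(rot_yz(c))
# ...equal one rotation through the sum of the angles.
direct = rot_yz(a + b + c)

assert np.allclose(composed, direct)
```

Building `direct` replaces two 3x3 matrix products with a single `cos`/`sin` evaluation, which is where the speedup comes from.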
Hi, thank you very much for the contribution! I am reviewing it now, will get back ASAP.
Very valuable contribution, learned a few tricks while reading this! If you could just remove the few lines of code that you commented out to replace them with something more efficient, that'd be great. I added notes for the two places where the comments should be left. Also sorry for the late answer, I was not notified about this pull request (thank you @fakufaku for the notice).
pylocus/basics.py
Outdated
```
# cz, sz = np.cos(theta_z), np.sin(theta_z)
# Rz = np.array([[1, 0, 0], [0, cz, sz], [0, -sz, cz]])
# return Rx.dot(Ry.dot(Rz))
# N.B.:
```
good point. All above "N.B.:" can be deleted
```
@@ -62,20 +72,21 @@ def eigendecomp(G, d):
     N = G.shape[0]
     lamda, u = np.linalg.eig(G)
     # test decomposition of G.
-    G_hat = np.dot(np.dot(u, np.diag(lamda)), u.T)
+    #G_hat = np.dot(np.dot(u, np.diag(lamda)), u.T)
```
leave this comment (for later testing)
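For context, the check being commented out verifies that the eigendecomposition reconstructs `G`. A standalone sketch of the same test (the random symmetric `G` here is illustrative, not pylocus's Gram matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
G = A + A.T                      # symmetric matrix, so the spectrum is real

lamda, u = np.linalg.eig(G)

# Reconstruction check, same expression as the commented-out line:
G_hat = np.dot(np.dot(u, np.diag(lamda)), u.T)

assert np.allclose(G, G_hat)
```

Keeping the line as a comment lets this sanity check be re-enabled cheaply during later testing, while skipping two matrix products in production runs.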
```
if (theta):
    return theta_from_eigendecomp(factor, u)
else:
    return x_from_eigendecomp(factor, u, dim)
```
nice find, didn't notice this ugly duplicate code.
Avoided creation of an unneeded copy of `x_est` in mds.py by performing the subtraction of its minimum in place.
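The pattern is sketched below; the array name `x_est` follows the description above, but the data and surrounding code are illustrative, not the actual mds.py:

```python
import numpy as np

x_est = np.array([[3.0, 4.0],
                  [1.0, 2.0]])

# Copying version: the binary subtraction allocates a new array.
shifted_copy = x_est - np.min(x_est)

# In-place version: augmented assignment writes into the existing buffer,
# avoiding the extra allocation (fine when the caller owns the array).
x_inplace = x_est.copy()
x_inplace -= np.min(x_inplace)

assert np.array_equal(shifted_copy, x_inplace)
```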
Thank you for the contribution, oleksandr-pavlyk. Changes are now merged.
I just stumbled into this project, and noticed that functions in `basics` could be sped up a bit with more efficient use of NumPy. For example, `mse` on vectors is 9X faster, and 5X faster on matrices. Also sped up code in MDS.
Please let me know what you think.
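One of the techniques used throughout, `np.broadcast_to`, can be illustrated in isolation (the example data is hypothetical, unrelated to pylocus): it returns a read-only view with zero strides along the broadcast dimensions, so no tiled copy is ever materialised.

```python
import numpy as np

row = np.arange(4.0)

tiled = np.tile(row, (1000, 1))          # allocates a full 1000x4 array
view = np.broadcast_to(row, (1000, 4))   # no data copied, just a view

assert np.array_equal(tiled, view)       # same values element-wise
assert view.strides[0] == 0              # broadcast axis has stride 0
```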