Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed improvements #18

Merged
merged 5 commits into from
May 7, 2018
Merged

Speed improvements #18

merged 5 commits into from
May 7, 2018

Conversation

oleksandr-pavlyk
Copy link
Contributor

I just stumbled into this project, and notices that functions in basics could be sped-up a bit with more efficient use of NumPy. For example mse on vectors is 9X faster, and 5X faster on matrices.

Also sped-up code in MDS.

Please let me know what you think

1. Achived 5X speed improvement in `mse` by using `np.square`
   rather than `np.power(x, 2)`, and by reusing intermediate buffers
   via use of universal function's out= keyword.

```
In [2]: # %load mse.py
   ...: import numpy as np
   ...:
   ...: def mse(x, xhat):
   ...:     """ Calcualte mse between vector or matrix x and xhat """
   ...:     sum_ = np.sum(np.power(x - xhat, 2))
   ...:     return sum_ / x.size
   ...:
   ...: def mse2(x, xhat):
   ...:     """ Calculate mse between vector or matrix x and xhar """
   ...:     sum_ = np.sum(np.square(x - xhat))
   ...:     return sum_ / x.size
   ...:
   ...: def mse3(x, xhat):
   ...:     """ Calcualte mse between vector or matrix x and xhat """
   ...:     sum_ = x - xhat
   ...:     np.square(sum_, out=sum_)
   ...:     return np.sum(sum_) / x.size
   ...:
   ...:

In [3]: v = np.random.uniform(0,1, size=10**6)

In [4]: m = np.random.uniform(0,1, size=(2048, 2048))

In [5]: %timeit mse(v, 0.5)
14.7 ms ± 6.88 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit mse2(v, 0.5)
4.4 ms ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit mse3(v, 0.5)
1.58 ms ± 734 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: %timeit mse(m, 0.5)
67 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit mse2(m, 0.5)
24.7 ms ± 37.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [10]: %timeit mse3(m, 0.5)
15.1 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [11]: np.equal(mse(v, 0.5), mse2(v, 0.5))
Out[11]: True

In [12]: np.equal(mse(v, 0.5), mse3(v, 0.5))
Out[12]: True

In [13]: np.all(np.equal(mse(m, 0.5), mse3(m, 0.5)))
Out[13]: True

In [14]: np.all(np.equal(mse(m, 0.5), mse2(m, 0.5)))
Out[14]: True

In [15]: def mse4(x, xhat):
    ...:     """ Calcualte mse between vector or matrix x and xhat """
    ...:     buf_ = x - xhat
    ...:     np.square(buf_, out=buf_)
    ...:     sum_ = np.sum(buf_)
    ...:     sum_ /= x.size
    ...:     return sum_
    ...:
    ...:

In [16]: %timeit mse4(m, 0.5)
14.4 ms ± 9.14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: %timeit mse4(v, 0.5)
1.6 ms ± 392 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [18]: %timeit mse4(v, 0.5)
1.59 ms ± 430 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

2. Routine `get_rotation_matrix` was computing composition of 3 rotations
   in the YZ plane through dot product of 3x3 matrices.

   Since it is equivalent to rotation in YZ plane though the sum of 3
   angles, construct the rotation matrix directly.

   If the routine was intended to construct rotation matrix by Euler angles,
   it did not do it right.

   I left the old code commented out.

3. Modified other routines to utilize intermediate buffers to speed them up.
Less copying, use of broadcast_to to avoid creation of intermediate
arrays.

Improved efficiency of `theta_from_eigendecomp` and `x_from_eigendecom`
by means of efficient use of broadcasting in numpy
@oleksandr-pavlyk
Copy link
Contributor Author

ping.

@duembgen
Copy link
Collaborator

duembgen commented May 2, 2018

Hi thank you very much for the contribution! I am reviewing it now, will get back asap

@duembgen
Copy link
Collaborator

duembgen commented May 2, 2018

Very valuable contribution, learned a few tricks while reading this! If you could just remove the few lines of code that you commented out to replace them with something more efficient, that'd be great. I added notes for the two places where the comments should be left.

Also sorry for the late answer, I was not notified about this pull request (thank you @fakufaku for the notice).

# cz, sz = np.cos(theta_z), np.sin(theta_z)
# Rz = np.array([[1, 0, 0], [0, cz, sz], [0, -sz, cz]])
# return Rx.dot(Ry.dot(Rz))
# N.B.:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. All above "N.B.:" can be deleted

@@ -62,20 +72,21 @@ def eigendecomp(G, d):
N = G.shape[0]
lamda, u = np.linalg.eig(G)
# test decomposition of G.
G_hat = np.dot(np.dot(u, np.diag(lamda)), u.T)
#G_hat = np.dot(np.dot(u, np.diag(lamda)), u.T)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this comment (for later testing)

if (theta):
return theta_from_eigendecomp(factor, u)
else:
return x_from_eigendecomp(factor, u, dim)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find, didn't notice this ugly duplicate code.

Avoided creation of unneeded copy of `x_est` in mds.py by
performing subtraction of its minimum in place.
@duembgen duembgen merged commit f08187a into LCAV:master May 7, 2018
@duembgen
Copy link
Collaborator

duembgen commented May 7, 2018

Thank you for the contrubution, oleksandr-pavlyk. Changes are now merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants