Benchmarking experiments for rotation compositions

Aditya Bharti edited this page Jun 11, 2018 · 6 revisions

Benchmarking for rotation composition approaches

This table contains the benchmarks for the rotation composition process. There are 2 cases which we will support:

  • N rotations composed with N rotations. This case is straightforward as converting to dcm will only incur extra computation cost. The already implemented compose_quat function will suffice.
  • N rotations composed with a single rotation. For this we have 2 options:
    • Broadcast the single quaternion to N x 4 and use compose_quat
    • Prevent unnecessary copying by converting the single quaternion to appropriate 4 x 4 matrix and use np.dot() for a matrix vector multiplication.

Replies

  • Updated quaternion benchmarks without literate broadcast
  • Updated matrix approach to handle general case of N by N rotations
  • Will try np.einsum and np.matmul for composition of N by N but using np.matmul for the final implementation is probably not a good idea since it is not supported by numpy v1.8, and is still a preliminary function. (EDIT: given the current implementation where the ith row is being multiplied with the ith slice, I don't think np.matmul can be used)

Here is the updated quaternion approach:

def _compose_quat(p, q):
    product = np.empty_like(p)
    # Scalar part of result
    product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
    # Vector part of result
    product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
                      np.cross(p[:, :3], q[:, :3]))
    return product

def quat_composition(P, Q):
    return R.from_quaternion(_compose_quat(P._quat, Q._quat), normalized=True)

And here is the updated matrix approach to handle the general case of N by N compositions.

def _rep_mat(q):
    if q._single:
        num_quat = 1
        q = q[None, :]
    else:
        num_quat = q.shape[0]
    """
    (x, y, z, w) = Q.as_quaternion()
    q_mat = np.array([
        [w, -z, y, -x],
        [z, w, -x, -y],
        [-y, x, w, -z],
        [x,  y, z,  w],
    ])
    """
    q_mat = np.empty((num_quat, 4, 4))
    q_mat[:, 0, 0] = q_mat[:, 1, 1] = q_mat[:, 2, 2] = q_mat[:, 3, 3] = q[:, 3]  # w
    
    q_mat[:, 3, 0] = q_mat[:, 2, 1] = q[:, 0]  # x
    q_mat[:, 1, 2] = q_mat[:, 0, 3] = -q[:, 0]  # -x
    
    q_mat[:, 0, 2] = q_mat[:, 3, 1] = q[:, 1]  # y
    q_mat[:, 2, 0] = q_mat[:, 1, 3] = -q[:, 1]  # -y
    
    q_mat[:, 1, 0] = q_mat[:, 3, 2] = q[:, 2]  # z
    q_mat[:, 0, 1] = q_mat[:, 2, 3] = -q[:, 2]  # -z
    
    return q_mat

def mat_composition(P, Q, dot=False):  # dot = True
    p = P.as_quaternion()
    q_mat = _rep_mat(Q.as_quaternion())
    if q_mat.shape[0] != 1:
        # Multiply ith row of p with ith slice of q_mat
        result = np.einsum('...ij,...ijk->...ik', p, q_mat)
    else:
        if dot:
            result = np.dot(p, q_mat[0])
        else:
            result = np.einsum('...ij,...jk->...ik', p, q_mat[0])
    return R.from_quaternion(result, normalized=True)

Here are the benchmarks for the n by 1 case. They were generated with the following code:

q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
n_p n_q Quaternion Matrix (np.einsum) Matrix (np.dot)
2 1 79.8 µs ± 2.87 µs 17.8 µs ± 150 ns 15 µs ± 328 ns
6 1 81.9 µs ± 1.82 µs 18.4 µs ± 229 ns 15.2 µs ± 616 ns
10 1 79.6 µs ± 1.53 µs 19 µs ± 631 ns 15.6 µs ± 521 ns
50 1 88.6 µs ± 1.24 µs 20.9 µs ± 486 ns 16.5 µs ± 770 ns
100 1 87.7 µs ± 778 ns 22.2 µs ± 828 ns 17.2 µs ± 685 ns
1000 1 165 µs ± 2.05 µs 53.8 µs ± 1.6 µs 27.3 µs ± 938 ns
10000 1 882 µs ± 21 µs 336 µs ± 12 µs 113 µs ± 1.36 µs
100000 1 15.1 ms ± 649 µs 3.39 ms ± 37.8 µs 1.25 ms ± 171 µs
1000000 1 150 ms ± 6.89 ms 39.9 ms ± 2.08 ms 19.8 ms ± 1.55 ms

RESULT: By removing the literate broadcast, we do shave off a few microseconds but overall np.dot() approach is still better.

Here are the timings for the multiple compositions case

n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
q = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)
n_p n_q Quaternion Matrix (np.einsum)
2 2 82.1 µs ± 5.4 µs 17.8 µs ± 425 ns
6 6 83.7 µs ± 3.37 µs 18.7 µs ± 569 ns
10 10 80.1 µs ± 3.03 µs 19.1 µs ± 603 ns
50 50 86.2 µs ± 2.25 µs 21 µs ± 501 ns
100 100 92.5 µs ± 5.75 µs 25.4 µs ± 1.41 µs
1000 1000 179 µs ± 6.81 µs 97.9 µs ± 769 ns
10000 10000 958 µs ± 5.25 µs 1.01 ms ± 18.3 µs
100000 100000 16.5 ms ± 893 µs 31.4 ms ± 2.26 ms
1000000 1000000 177 ms ± 2.85 ms 351 ms ± 3.15 ms

[OLD]

Comments from a mentor

  • If it's not too much trouble I suggest you to try conversion and np.einsum (compare with np.matmul as well) when we compose N and N rotations.
  • I think doing literate broadcast of 1 quaternion is not a great idea. Try implementation without it, I guess that np.cross can do smart broadcasting inside.

Here I have benchmarked the second case only. Here is the quaternion approach

def quat_composition(P, Q):
    # p contains N rotations
    # q contains a single rotation: q._single == True
    # (Handle 2D single quaternion later)
    p = P.as_quaternion()
    q = np.broadcast_to(Q.as_quaternion(), p.shape)
    return R.from_quaternion(compose_quat(p, q), normalized=True)

def compose_quat(p, q):
    # p and q should have same shape (N, 4)
    product = np.empty_like(p)
    # Scalar part of result
    product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
    # Vector part of result
    product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
                      np.cross(p[:, :3], q[:, :3]))
    return product

and here is the matrix composition approach

def mat_composition(P, Q):
    # p contains N rotations
    # q contains a single rotation: q._single == True
    # (Handle 2D single quaternion later)
    p = P.as_quaternion()
    (x, y, z, w) = Q.as_quaternion()
    q_mat = np.array([
        [w, -z, y, -x],
        [z, w, -x, -y],
        [-y, x, w, -z],
        [x, y, z, w],
    ])
    return R.from_quaternion(np.dot(p, q_mat), normalized=True)

The functions were times using the %timeit command in a jupyter notebook:

q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2  # Variable for testing
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)

Here are the times:

n Quaternion Matrix
2 95.8 µs ± 3.7 µs 8.67 µs ± 149 ns
6 90.9 µs ± 3.1 µs 8.75 µs ± 178 ns
10 89.1 µs ± 2.32 µs 9.02 µs ± 236 ns
50 97.6 µs ± 1.56 µs 10 µs ± 161 ns
100 99.1 µs ± 4.72 µs 10.3 µs ± 106 ns
1000 176 µs ± 2.35 µs 19.9 µs ± 138 ns
10000 932 µs ± 37.6 µs 108 µs ± 408 ns

At least for multiple rotations composed with a single one, the matrix approach appears to be better.

Comments from a mentor

From the looks of it the Matrix approach is consistently and considerably better. Although you should eliminate the literate broadcasting of quaternions and check again.

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.