# Benchmarking experiments for rotation compositions

## Benchmarking for rotation composition approaches

This table contains the benchmarks for the rotation composition process. There are 2 cases which we will support:

• `N` rotations composed with `N` rotations. This case is straightforward as converting to dcm will only incur extra computation cost. The already implemented `compose_quat` function will suffice.
• `N` rotations composed with a single rotation. For this we have 2 options:
• Broadcast the single quaternion to `N x 4` and use `compose_quat`
• Prevent unnecessary copying by converting the single quaternion to appropriate `4 x 4` matrix and use `np.dot()` for a matrix vector multiplication.

### Replies

• Updated quaternion benchmarks without literate broadcast
• Updated matrix approach to handle general case of `N` by `N` rotations
• Will try `np.einsum` and `np.matmul` for composition of `N` by `N` but using `np.matmul` for the final implementation is probably not a good idea since it is not supported by numpy `v1.8`, and is still a preliminary function. (EDIT: given the current implementation where the `ith` row is being multiplied with the `ith` slice, I don't think `np.matmul` can be used)

Here is the updated quaternion approach:

```def _compose_quat(p, q):
product = np.empty_like(p)
# Scalar part of result
product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
# Vector part of result
product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
np.cross(p[:, :3], q[:, :3]))
return product

def quat_composition(P, Q):
return R.from_quaternion(_compose_quat(P._quat, Q._quat), normalized=True)```

And here is the updated matrix approach to handle the general case of `N by N` compositions.

```def _rep_mat(q):
if q._single:
num_quat = 1
q = q[None, :]
else:
num_quat = q.shape[0]
"""
(x, y, z, w) = Q.as_quaternion()
q_mat = np.array([
[w, -z, y, -x],
[z, w, -x, -y],
[-y, x, w, -z],
[x,  y, z,  w],
])
"""
q_mat = np.empty((num_quat, 4, 4))
q_mat[:, 0, 0] = q_mat[:, 1, 1] = q_mat[:, 2, 2] = q_mat[:, 3, 3] = q[:, 3]  # w

q_mat[:, 3, 0] = q_mat[:, 2, 1] = q[:, 0]  # x
q_mat[:, 1, 2] = q_mat[:, 0, 3] = -q[:, 0]  # -x

q_mat[:, 0, 2] = q_mat[:, 3, 1] = q[:, 1]  # y
q_mat[:, 2, 0] = q_mat[:, 1, 3] = -q[:, 1]  # -y

q_mat[:, 1, 0] = q_mat[:, 3, 2] = q[:, 2]  # z
q_mat[:, 0, 1] = q_mat[:, 2, 3] = -q[:, 2]  # -z

return q_mat

def mat_composition(P, Q, dot=False):  # dot = True
p = P.as_quaternion()
q_mat = _rep_mat(Q.as_quaternion())
if q_mat.shape[0] != 1:
# Multiply ith row of p with ith slice of q_mat
result = np.einsum('...ij,...ijk->...ik', p, q_mat)
else:
if dot:
result = np.dot(p, q_mat[0])
else:
result = np.einsum('...ij,...jk->...ik', p, q_mat[0])
return R.from_quaternion(result, normalized=True)```

Here are the benchmarks for the `n by 1` case. They were generated with the following code:

```q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)```
n_p n_q Quaternion Matrix (`np.einsum`) Matrix (`np.dot`)
2 1 `79.8 µs ± 2.87 µs` `17.8 µs ± 150 ns` `15 µs ± 328 ns`
6 1 `81.9 µs ± 1.82 µs` `18.4 µs ± 229 ns` `15.2 µs ± 616 ns`
10 1 `79.6 µs ± 1.53 µs` `19 µs ± 631 ns` `15.6 µs ± 521 ns`
50 1 `88.6 µs ± 1.24 µs` `20.9 µs ± 486 ns` `16.5 µs ± 770 ns`
100 1 `87.7 µs ± 778 ns` `22.2 µs ± 828 ns` `17.2 µs ± 685 ns`
1000 1 `165 µs ± 2.05 µs` `53.8 µs ± 1.6 µs` `27.3 µs ± 938 ns`
10000 1 `882 µs ± 21 µs` `336 µs ± 12 µs` `113 µs ± 1.36 µs`
100000 1 `15.1 ms ± 649 µs` `3.39 ms ± 37.8 µs` `1.25 ms ± 171 µs`
1000000 1 `150 ms ± 6.89 ms` `39.9 ms ± 2.08 ms` `19.8 ms ± 1.55 ms`

RESULT: By removing the literate broadcast, we do shave off a few microseconds but overall `np.dot()` approach is still better.

Here are the timings for the multiple compositions case

```n = 2
p = R.from_quaternion(np.random.normal(size=(n, 4)))
q = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)```
n_p n_q Quaternion Matrix (`np.einsum`)
2 2 `82.1 µs ± 5.4 µs` `17.8 µs ± 425 ns`
6 6 `83.7 µs ± 3.37 µs` `18.7 µs ± 569 ns`
10 10 `80.1 µs ± 3.03 µs` `19.1 µs ± 603 ns`
50 50 `86.2 µs ± 2.25 µs` `21 µs ± 501 ns`
100 100 `92.5 µs ± 5.75 µs` `25.4 µs ± 1.41 µs`
1000 1000 `179 µs ± 6.81 µs` `97.9 µs ± 769 ns`
10000 10000 `958 µs ± 5.25 µs` `1.01 ms ± 18.3 µs`
100000 100000 `16.5 ms ± 893 µs` `31.4 ms ± 2.26 ms`
1000000 1000000 `177 ms ± 2.85 ms` `351 ms ± 3.15 ms`

[OLD]

• If it's not too much trouble I suggest you to try conversion and `np.einsum` (compare with `np.matmul` as well) when we compose `N` and `N` rotations.
• I think doing literate broadcast of 1 quaternion is not a great idea. Try implementation without it, I guess that `np.cross` can do smart broadcasting inside.

Here I have benchmarked the second case only. Here is the `quaternion` approach

```def quat_composition(P, Q):
# p contains N rotations
# q contains a single rotation: q._single == True
# (Handle 2D single quaternion later)
p = P.as_quaternion()
return R.from_quaternion(compose_quat(p, q), normalized=True)

def compose_quat(p, q):
# p and q should have same shape (N, 4)
product = np.empty_like(p)
# Scalar part of result
product[:, 3] = p[:, 3] * q[:, 3] - np.sum(p[:, :3] * q[:, :3], axis=1)
# Vector part of result
product[:, :3] = (p[:, None, 3] * q[:, :3] + q[:, None, 3] * p[:, :3] +
np.cross(p[:, :3], q[:, :3]))
return product```

and here is the matrix composition approach

```def mat_composition(P, Q):
# p contains N rotations
# q contains a single rotation: q._single == True
# (Handle 2D single quaternion later)
p = P.as_quaternion()
(x, y, z, w) = Q.as_quaternion()
q_mat = np.array([
[w, -z, y, -x],
[z, w, -x, -y],
[-y, x, w, -z],
[x, y, z, w],
])
return R.from_quaternion(np.dot(p, q_mat), normalized=True)```

The functions were times using the `%timeit` command in a `jupyter notebook`:

```q = R.from_quaternion(np.random.normal(size=(4,)))
n = 2  # Variable for testing
p = R.from_quaternion(np.random.normal(size=(n, 4)))
%timeit quat_composition(p, q)
%timeit mat_composition(p, q)```

Here are the times:

n Quaternion Matrix
2 `95.8 µs ± 3.7 µs` `8.67 µs ± 149 ns`
6 `90.9 µs ± 3.1 µs` `8.75 µs ± 178 ns`
10 `89.1 µs ± 2.32 µs` `9.02 µs ± 236 ns`
50 `97.6 µs ± 1.56 µs` `10 µs ± 161 ns`
100 `99.1 µs ± 4.72 µs` `10.3 µs ± 106 ns`
1000 `176 µs ± 2.35 µs` `19.9 µs ± 138 ns`
10000 `932 µs ± 37.6 µs` `108 µs ± 408 ns`

At least for multiple rotations composed with a single one, the matrix approach appears to be better.