perf: Tile 8×8 covariance matrix multiplication (#1181)
Currently, we multiply an 8×8 covariance matrix by an 8×8 transport matrix, and Eigen fails to optimize this properly because it calls a generalized GEMM method rather than an optimized small-matrix method. To resolve this, we change the code to use a tiled multiplication which splits the matrices into 4×4 sub-matrices; these are multiplied and summed to give the same result. This has two advantages:

  1. It allows Eigen to use its hand-rolled optimized 4x4 matrix multiplication methods.
  2. It allows us to perform some trickery with matrix identities to reduce the number of floating point operations.
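The tiling idea described above can be sketched in plain C++. This is only an illustration under stated assumptions: it uses `std::array` instead of Eigen types, and every name in it (`Mat8`, `naiveMult`, `addTileProduct`, `tiledMult`, `makeMatrix`, `maxAbsDiff`) is hypothetical and not part of Acts or Eigen. It shows that summing 4×4 tile products reproduces the full 8×8 product.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Plain C++ stand-in for an 8x8 Eigen matrix; all names are illustrative.
using Mat8 = std::array<std::array<double, 8>, 8>;

// Reference implementation: one triple loop over the full 8x8 matrices.
Mat8 naiveMult(const Mat8& a, const Mat8& b) {
  Mat8 c{};  // value-initialised to all zeros
  for (std::size_t i = 0; i < 8; ++i) {
    for (std::size_t k = 0; k < 8; ++k) {
      for (std::size_t j = 0; j < 8; ++j) {
        c[i][j] += a[i][k] * b[k][j];
      }
    }
  }
  return c;
}

// Accumulate the 4x4 tile product A(bi, bk) * B(bk, bj) into tile C(bi, bj).
void addTileProduct(const Mat8& a, const Mat8& b, Mat8& c, std::size_t bi,
                    std::size_t bk, std::size_t bj) {
  assert(bi < 2 && bk < 2 && bj < 2);
  for (std::size_t i = 0; i < 4; ++i) {
    for (std::size_t k = 0; k < 4; ++k) {
      for (std::size_t j = 0; j < 4; ++j) {
        c[4 * bi + i][4 * bj + j] +=
            a[4 * bi + i][4 * bk + k] * b[4 * bk + k][4 * bj + j];
      }
    }
  }
}

// Tiled product: C(bi, bj) = A(bi, 0) * B(0, bj) + A(bi, 1) * B(1, bj).
Mat8 tiledMult(const Mat8& a, const Mat8& b) {
  Mat8 c{};
  for (std::size_t bi = 0; bi < 2; ++bi) {
    for (std::size_t bj = 0; bj < 2; ++bj) {
      for (std::size_t bk = 0; bk < 2; ++bk) {
        addTileProduct(a, b, c, bi, bk, bj);
      }
    }
  }
  return c;
}

// Deterministic test data: entries are seed + 0.5 * row - 0.25 * column.
Mat8 makeMatrix(double seed) {
  Mat8 m{};
  for (std::size_t i = 0; i < 8; ++i) {
    for (std::size_t j = 0; j < 8; ++j) {
      m[i][j] = seed + 0.5 * static_cast<double>(i) -
                0.25 * static_cast<double>(j);
    }
  }
  return m;
}

// Largest element-wise absolute difference between two matrices.
double maxAbsDiff(const Mat8& x, const Mat8& y) {
  double worst = 0.0;
  for (std::size_t i = 0; i < 8; ++i) {
    for (std::size_t j = 0; j < 8; ++j) {
      worst = std::fmax(worst, std::abs(x[i][j] - y[i][j]));
    }
  }
  return worst;
}
```

Calling `tiledMult(a, b)` and `naiveMult(a, b)` on the same inputs should agree up to rounding. The sketch performs the same number of scalar multiplications as the naive loop; the benefit claimed in the commit message comes from Eigen's fixed-size 4×4 kernels and from skipping tiles that are known to be identity or zero.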


Co-authored-by: Andreas Stefl <487211+andiwand@users.noreply.github.com>
stephenswat and andiwand committed Mar 4, 2024
1 parent a1b40bc commit 6182aef
Showing 2 changed files with 57 additions and 24 deletions.
37 changes: 35 additions & 2 deletions Core/include/Acts/Propagator/EigenStepper.ipp
@@ -233,8 +233,41 @@ Acts::Result<double> Acts::EigenStepper<E, A>::step(
return EigenStepperError::StepInvalid;
}

-// for moment, only update the transport part
-state.stepping.jacTransport = D * state.stepping.jacTransport;
+// See the documentation of Acts::blockedMult for a description of blocked
+// matrix multiplication. However, we can go one further. Let's assume that
+// some of these sub-matrices are zero matrices 0₄ and identity matrices
+// I₄, namely:
+//
+// D₁₁ = I₄, J₁₁ = I₄, D₂₁ = 0₄, J₂₁ = 0₄
+//
+// Which gives:
+//
+// K₁₁ = I₄ * I₄ + D₁₂ * 0₄ = I₄
+// K₁₂ = I₄ * J₁₂ + D₁₂ * J₂₂ = J₁₂ + D₁₂ * J₂₂
+// K₂₁ = 0₄ * I₄ + D₂₂ * 0₄ = 0₄
+// K₂₂ = 0₄ * J₁₂ + D₂₂ * J₂₂ = D₂₂ * J₂₂
+//
+// Furthermore, we're constructing K in place of J, and since
+// K₁₁ = I₄ = J₁₁ and K₂₁ = 0₄ = J₂₁, we don't actually need to touch those
+// sub-matrices at all!
+if ((D.topLeftCorner<4, 4>().isIdentity()) &&
+    (D.bottomLeftCorner<4, 4>().isZero()) &&
+    (state.stepping.jacTransport.template topLeftCorner<4, 4>()
+         .isIdentity()) &&
+    (state.stepping.jacTransport.template bottomLeftCorner<4, 4>()
+         .isZero())) {
+  state.stepping.jacTransport.template topRightCorner<4, 4>() +=
+      D.topRightCorner<4, 4>() *
+      state.stepping.jacTransport.template bottomRightCorner<4, 4>();
+  state.stepping.jacTransport.template bottomRightCorner<4, 4>() =
+      (D.bottomRightCorner<4, 4>() *
+       state.stepping.jacTransport.template bottomRightCorner<4, 4>())
+          .eval();
+} else {
+  // For safety purposes, we provide a full matrix multiplication as a
+  // backup strategy.
+  state.stepping.jacTransport = D * state.stepping.jacTransport;
+}
} else {
if (!state.stepping.extension.finalize(state, *this, navigator, h)) {
return EigenStepperError::StepInvalid;
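The structured shortcut in the hunk above can be checked against a full product with a small standalone sketch. It assumes plain `std::array` matrices rather than Eigen, uses exact comparisons for the identity and zero checks for simplicity (Eigen's `isIdentity()` and `isZero()` accept a precision argument), and all names (`tile`, `mult4`, `hasIdentityZeroLeftColumn`, `structuredUpdate`, `fullMult`, `makeStructured`, `maxAbsDiff`) are hypothetical helpers, not Acts or Eigen API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Mat8 = std::array<std::array<double, 8>, 8>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Copy the 4x4 tile at block position (bi, bj) out of an 8x8 matrix.
Mat4 tile(const Mat8& m, std::size_t bi, std::size_t bj) {
  assert(bi < 2 && bj < 2);
  Mat4 t{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t c = 0; c < 4; ++c) t[r][c] = m[4 * bi + r][4 * bj + c];
  return t;
}

// Dense 4x4 product.
Mat4 mult4(const Mat4& a, const Mat4& b) {
  Mat4 p{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t k = 0; k < 4; ++k)
      for (std::size_t c = 0; c < 4; ++c) p[r][c] += a[r][k] * b[k][c];
  return p;
}

// True if the left block column is [I; 0] (exact comparison, a simplification).
bool hasIdentityZeroLeftColumn(const Mat8& m) {
  for (std::size_t r = 0; r < 8; ++r)
    for (std::size_t c = 0; c < 4; ++c)
      if (m[r][c] != ((r == c) ? 1.0 : 0.0)) return false;
  return true;
}

// Replace jac by d * jac using only two 4x4 products. Returns false and
// leaves jac untouched when the assumed block structure is absent.
bool structuredUpdate(const Mat8& d, Mat8& jac) {
  if (!hasIdentityZeroLeftColumn(d) || !hasIdentityZeroLeftColumn(jac)) {
    return false;
  }
  const Mat4 j22 = tile(jac, 1, 1);            // snapshot of the old J22
  const Mat4 k12 = mult4(tile(d, 0, 1), j22);  // D12 * J22
  const Mat4 k22 = mult4(tile(d, 1, 1), j22);  // D22 * J22
  for (std::size_t r = 0; r < 4; ++r) {
    for (std::size_t c = 0; c < 4; ++c) {
      jac[r][4 + c] += k12[r][c];     // K12 = J12 + D12 * J22
      jac[4 + r][4 + c] = k22[r][c];  // K22 = D22 * J22
    }
  }
  return true;
}

// Full 8x8 product, used only as the reference result.
Mat8 fullMult(const Mat8& a, const Mat8& b) {
  Mat8 p{};
  for (std::size_t r = 0; r < 8; ++r)
    for (std::size_t k = 0; k < 8; ++k)
      for (std::size_t c = 0; c < 8; ++c) p[r][c] += a[r][k] * b[k][c];
  return p;
}

// Matrix with left block column [I; 0] and an arbitrary right block column.
Mat8 makeStructured(double seed) {
  Mat8 m{};
  for (std::size_t r = 0; r < 8; ++r) {
    if (r < 4) m[r][r] = 1.0;
    for (std::size_t c = 4; c < 8; ++c)
      m[r][c] = seed + 0.5 * static_cast<double>(r) -
                0.25 * static_cast<double>(c);
  }
  return m;
}

// Largest element-wise absolute difference between two matrices.
double maxAbsDiff(const Mat8& x, const Mat8& y) {
  double worst = 0.0;
  for (std::size_t r = 0; r < 8; ++r)
    for (std::size_t c = 0; c < 8; ++c)
      worst = std::fmax(worst, std::abs(x[r][c] - y[r][c]));
  return worst;
}
```

Two details matter in this sketch. Both products read the old J₂₂, so it is copied before either block is overwritten, which plays the same role as the ordering of the two assignments and the `.eval()` call in the diff. And the structured path needs two 4×4 products, that is 2 × 64 = 128 scalar multiplications, against 8³ = 512 for the full 8×8 product.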
44 changes: 22 additions & 22 deletions Examples/Python/tests/root_file_hashes.txt
@@ -29,9 +29,9 @@ test_truth_tracking_kalman[odd-0.0]__performance_track_finder.root: 39aec6316cce
test_truth_tracking_kalman[odd-1000.0]__trackstates_fitter.root: 72c79be1458c4f9c9a1661778c900f0875d257f2d391c4183a698825448919a1
test_truth_tracking_kalman[odd-1000.0]__tracksummary_fitter.root: 3d424dec9b172f253c8c4ffbda470f678fd1081a3d36dcfea517ab0f94995ae4
test_truth_tracking_kalman[odd-1000.0]__performance_track_finder.root: 39aec6316cceb90e314e16b02947faa691c18f57c3a851a25e547a8fc05a4593
-test_truth_tracking_gsf[generic]__trackstates_gsf.root: b1ff88d67cb89cc76850e242aa5ad3bcfd041bf5e4a62e614a878bb974967242
+test_truth_tracking_gsf[generic]__trackstates_gsf.root: e971da3dcc1997bc9f8a74e0f75a46d4b92c9cf6286b0e05dc8a1bd34df75444
test_truth_tracking_gsf[generic]__tracksummary_gsf.root: ec3ef893f7138392bcdce96322e2dd31951737271b72ac9dd3282fc187700d03
-test_truth_tracking_gsf[odd]__trackstates_gsf.root: 4117e585b59e5332bc54d1928096d8d2477a9f5b9cacb2f010b4df25106ad768
+test_truth_tracking_gsf[odd]__trackstates_gsf.root: c28b23358cf88fb40eeb3c47db9af62462cc7cdd9d70dec13d0615f136065191
test_truth_tracking_gsf[odd]__tracksummary_gsf.root: 3425c0974943e0dac06dfedceaf1e9e3bec3bdc6ff73588d849a234de017e230
test_particle_gun__particles.root: 7eec62018b6944fea565dad75aa41ef87d1f2737b2a814fbab189817ac8180fe
test_material_mapping__material-map_tracks.root: 6e1441c418ff0b17983c2d0483248cc1dee6b77b09d0ca9d03c742c9d1373630
@@ -44,22 +44,22 @@ test_digitization_example_input[smeared]__particles.root: 7eec62018b6944fea565da
test_digitization_example_input[smeared]__measurements.root: 0c168d371d0130c68d1ee44bd77eeeb3cf702a77c2afbf12bed8354b61a29262
test_digitization_example_input[geometric]__particles.root: 7eec62018b6944fea565dad75aa41ef87d1f2737b2a814fbab189817ac8180fe
test_digitization_example_input[geometric]__measurements.root: 0c6d88b4de3ee7365103b8f0d6be6b4db3d7b7f2a59d3db58a1e5f89fa8130b3
-test_ckf_tracks_example[generic-full_seeding]__trackstates_ckf.root: aa727b8c2c30a3d2ea155917abad1a77071da16e361c6ada500405839b697310
-test_ckf_tracks_example[generic-full_seeding]__tracksummary_ckf.root: e122dd3afe20a1ebf870dab5a42ea154c3b5eedc13c60e2e99efc6bb8a417c21
+test_ckf_tracks_example[generic-full_seeding]__trackstates_ckf.root: 52fe86b966e4bfe05039d17025d15e360f25ef5041abe6729e02e8e64e0d2fb2
+test_ckf_tracks_example[generic-full_seeding]__tracksummary_ckf.root: 29d861eb98477288adebf347be91aeed63942351bfba59cf504cda4ebf6ced78
test_ckf_tracks_example[generic-full_seeding]__performance_seeding_trees.root: 0e0676ffafdb27112fbda50d1cf627859fa745760f98073261dcf6db3f2f991e
-test_ckf_tracks_example[generic-truth_estimated]__trackstates_ckf.root: 50eb0539a059b5e7c51ad4953629e82d8b08159d179181ca39570f910be47168
-test_ckf_tracks_example[generic-truth_estimated]__tracksummary_ckf.root: a474c6fbd5af9d43d973783288f980984df694a8eb8dd3ebe9e5c8df4bbd7924
+test_ckf_tracks_example[generic-truth_estimated]__trackstates_ckf.root: 40e52eb4b31997c6d2c41e13d10cddf2921224f5bcb1f58a16b47ea79b973234
+test_ckf_tracks_example[generic-truth_estimated]__tracksummary_ckf.root: 09a470d45be79b33910e81a39400d7bf617d05ddc33fc85ad3a6043597813143
test_ckf_tracks_example[generic-truth_estimated]__performance_seeding.root: 1facb05c066221f6361b61f015cdf0918e94d9f3fce2269ec7b6a4dffeb2bc7e
-test_ckf_tracks_example[generic-truth_smeared]__trackstates_ckf.root: 9c7dc2b10e4d52a565e2ecd72077c4c90862a0c6ef3b4bad0b5dbfb50d8b91df
-test_ckf_tracks_example[generic-truth_smeared]__tracksummary_ckf.root: 987c47fc59851e6307715ceea261aa650aef352876febdd228778c4d030a8653
-test_ckf_tracks_example[odd-full_seeding]__trackstates_ckf.root: 1340a741753a4a1d6fed0de1867282e055e76fc5762655698d50c9978ba87219
-test_ckf_tracks_example[odd-full_seeding]__tracksummary_ckf.root: a033bf22d4cc4c715fb948610151fcfeddc601a6ce427d21d4885fd7c1c971d4
+test_ckf_tracks_example[generic-truth_smeared]__trackstates_ckf.root: a94c3f7d2495f6bdb4259e051d35dc020ed81f42a98ecae4dec4513bc99aad09
+test_ckf_tracks_example[generic-truth_smeared]__tracksummary_ckf.root: ee5714ee81d5f3f6cf87e1fe792bfdd633566bab96fa82f2d65843193c463789
+test_ckf_tracks_example[odd-full_seeding]__trackstates_ckf.root: 7149db7a21da9dd51ccb11bf58002bcd3201989fef1483946ac4adeca42391ab
+test_ckf_tracks_example[odd-full_seeding]__tracksummary_ckf.root: 50e44670cc6691adf1a127e92bc79568bfb96cf272aafab473a990f88373a0db
test_ckf_tracks_example[odd-full_seeding]__performance_seeding_trees.root: 43c58577aafe07645e5660c4f43904efadf91d8cda45c5c04c248bbe0f59814f
-test_ckf_tracks_example[odd-truth_estimated]__trackstates_ckf.root: 991050c7e3a2bf83f129e5725eac5a8e00a21c162199b29b69db38405fea0a34
-test_ckf_tracks_example[odd-truth_estimated]__tracksummary_ckf.root: 5590db074051ab5726fcc2d30d9391f61d9beee2083a51375540d2cf9d38211c
+test_ckf_tracks_example[odd-truth_estimated]__trackstates_ckf.root: ca9be15cf58f36a4e43142dc093471bddf9ff115cd2465f51e8c3ed23cf2bc67
+test_ckf_tracks_example[odd-truth_estimated]__tracksummary_ckf.root: c69d5b9ba378d162df5726c2abc5c7159ef15de8c33e18575ace15cd66cfe2d2
test_ckf_tracks_example[odd-truth_estimated]__performance_seeding.root: 1a36b7017e59f1c08602ef3c2cb0483c51df248f112e3780c66594110719c575
-test_ckf_tracks_example[odd-truth_smeared]__trackstates_ckf.root: 3ac3b12d50c2882d5e03fadb82884b1897dddda6366cece6a3ac433c4e156d25
-test_ckf_tracks_example[odd-truth_smeared]__tracksummary_ckf.root: 3ec2e3dc3dc6dc2ed4dc112d668256f6ee0672c8e04baf7895ae473ad78b5870
+test_ckf_tracks_example[odd-truth_smeared]__trackstates_ckf.root: 03fdef333c5451ccecc129c6325367d7145f34a0788a6bbad148c844f3c3a835
+test_ckf_tracks_example[odd-truth_smeared]__tracksummary_ckf.root: 0aeb9a337af75bfb25509cb18e585e1567c185d8b16c4fbd8ff2eeec16cfdf7d
test_vertex_fitting_reading[Truth-False-100]__performance_vertexing.root: 76ef6084d758dfdfc0151ddec2170e12d73394424e3dac4ffe46f0f339ec8293
test_vertex_fitting_reading[Iterative-False-100]__performance_vertexing.root: 60372210c830a04f95ceb78c6c68a9b0de217746ff59e8e73053750c837b57eb
test_vertex_fitting_reading[Iterative-True-100]__performance_vertexing.root: e34f217d524a5051dbb04a811d3407df3ebe2cc4bb7f54f6bda0847dbd7b52c3
@@ -85,19 +85,19 @@ test_exatrkx[cpu-torch]__performance_track_finding.root: e0875db5eb3aa6b46ad4baf
test_exatrkx[gpu-onnx]__performance_track_finding.root: 4845dc9f62e287e20c80d479e246eb69d01b279964cfdff83543bea0ea9afbed
test_exatrkx[gpu-torch]__performance_track_finding.root: e0875db5eb3aa6b46ad4baf846d39b37b68f1efd3436448cc75919b637f8d8d9
test_ML_Ambiguity_Solver__performance_ambiML.root: 284ff5c3a08c0b810938e4ac2f8ba8fe2babb17d4c202b624ed69fff731a9006
-test_truth_tracking_kalman[generic-False-0.0]__trackstates_fitter.root: b598a07612c58866563ae862e651d9f42c13249d83d7176a103a565fa60f9cb3
-test_truth_tracking_kalman[generic-False-0.0]__tracksummary_fitter.root: 11b2e2a50343c636fa977175a30220805412d3200e164ae4c3f439fe2087fb88
+test_truth_tracking_kalman[generic-False-0.0]__trackstates_fitter.root: f258c7ae1cd64efd48b4840d19890f08afdc02a692ea392d050cdb818b6d70c2
+test_truth_tracking_kalman[generic-False-0.0]__tracksummary_fitter.root: 385a763e16f55d20570c5bfc15f51c87e39f4ca8e5b012a3333ce59a91cd5d34
test_truth_tracking_kalman[generic-False-1000.0]__trackstates_fitter.root: bdc70d86ba6e717307e32c147401dce80f76dc9f544e7ed0f26ebf2f9e5c3057
test_truth_tracking_kalman[generic-False-1000.0]__tracksummary_fitter.root: fc82abfc4e3016cda806e743a270bf78b6d4cc404cd52145ea1eabed85d32feb
-test_truth_tracking_kalman[generic-True-0.0]__trackstates_fitter.root: b598a07612c58866563ae862e651d9f42c13249d83d7176a103a565fa60f9cb3
-test_truth_tracking_kalman[generic-True-0.0]__tracksummary_fitter.root: 11b2e2a50343c636fa977175a30220805412d3200e164ae4c3f439fe2087fb88
+test_truth_tracking_kalman[generic-True-0.0]__trackstates_fitter.root: f258c7ae1cd64efd48b4840d19890f08afdc02a692ea392d050cdb818b6d70c2
+test_truth_tracking_kalman[generic-True-0.0]__tracksummary_fitter.root: 385a763e16f55d20570c5bfc15f51c87e39f4ca8e5b012a3333ce59a91cd5d34
test_truth_tracking_kalman[generic-True-1000.0]__trackstates_fitter.root: bdc70d86ba6e717307e32c147401dce80f76dc9f544e7ed0f26ebf2f9e5c3057
test_truth_tracking_kalman[generic-True-1000.0]__tracksummary_fitter.root: fc82abfc4e3016cda806e743a270bf78b6d4cc404cd52145ea1eabed85d32feb
-test_truth_tracking_kalman[odd-False-0.0]__trackstates_fitter.root: b9d96ea97b967c4fb40edc93b00fc307b80fc509309625c8e2ccc136c3e1a40a
-test_truth_tracking_kalman[odd-False-0.0]__tracksummary_fitter.root: f42d1cd850b78909b07cd6e412ae03c6e1ea8c99a02000892dd957319f0a825c
+test_truth_tracking_kalman[odd-False-0.0]__trackstates_fitter.root: d0a9203c917be9caacd1c6c4244538e2038e2472b4707b61bb5c0bb1c5e2a3ad
+test_truth_tracking_kalman[odd-False-0.0]__tracksummary_fitter.root: a5af8190f5981e7fa2556f20027af32b383bc07259927a58116d2676e5971dbe
test_truth_tracking_kalman[odd-False-1000.0]__trackstates_fitter.root: b858d09a75e852a7c23f2f3a512de8c334c593e700319d4f935e844ec00c379f
test_truth_tracking_kalman[odd-False-1000.0]__tracksummary_fitter.root: 3d424dec9b172f253c8c4ffbda470f678fd1081a3d36dcfea517ab0f94995ae4
-test_truth_tracking_kalman[odd-True-0.0]__trackstates_fitter.root: b9d96ea97b967c4fb40edc93b00fc307b80fc509309625c8e2ccc136c3e1a40a
-test_truth_tracking_kalman[odd-True-0.0]__tracksummary_fitter.root: f42d1cd850b78909b07cd6e412ae03c6e1ea8c99a02000892dd957319f0a825c
+test_truth_tracking_kalman[odd-True-0.0]__trackstates_fitter.root: d0a9203c917be9caacd1c6c4244538e2038e2472b4707b61bb5c0bb1c5e2a3ad
+test_truth_tracking_kalman[odd-True-0.0]__tracksummary_fitter.root: a5af8190f5981e7fa2556f20027af32b383bc07259927a58116d2676e5971dbe
test_truth_tracking_kalman[odd-True-1000.0]__trackstates_fitter.root: b858d09a75e852a7c23f2f3a512de8c334c593e700319d4f935e844ec00c379f
test_truth_tracking_kalman[odd-True-1000.0]__tracksummary_fitter.root: 3d424dec9b172f253c8c4ffbda470f678fd1081a3d36dcfea517ab0f94995ae4
