feat: Linear sum assignment by gcanat · Pull Request #85 · Smirkey/powerboxes

gcanat · 2026-03-28T23:18:45Z

I thought it would be cool to have some assignment functionality, for example when we need to match predictions to ground truths (GTs), or to associate boxes in the current frame with those in the previous frame.

Anyway, I started with Hungarian matching, but it ended up being too slow once the number of boxes goes above 1000. So I took inspiration from scipy.optimize.linear_sum_assignment and implemented the shortest augmenting path algorithm.
Side note/digression: it's funny how all these ML papers claim to use Hungarian matching when in fact their code just calls linear_sum_assignment, which is a different algorithm...
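For context on the approach described above, here is a minimal pure-Python sketch of a shortest augmenting path solver, modelled on the row-by-row scheme used by the solver behind scipy.optimize.linear_sum_assignment. It is an illustration under stated assumptions, not the powerboxes Rust or SIMD code: the name `lsap_sketch` is hypothetical, costs are plain Python lists of floats, and a matrix with more rows than columns is handled by transposing.

```python
import math


def lsap_sketch(cost):
    """Minimum-cost assignment via shortest augmenting paths (illustrative).

    `cost` is a list of rows (lists of floats). Returns (row_ind, col_ind)
    sorted by row, mirroring the output shape of scipy's function.
    """
    if not cost or not cost[0]:
        return [], []
    nr, nc = len(cost), len(cost[0])
    transposed = nr > nc
    if transposed:  # the search below assumes at least as many columns as rows
        cost = [list(col) for col in zip(*cost)]
        nr, nc = nc, nr

    u = [0.0] * nr        # row potentials
    v = [0.0] * nc        # column potentials
    col4row = [-1] * nr   # column currently assigned to each row
    row4col = [-1] * nc   # row currently assigned to each column
    path = [-1] * nc      # predecessor row of each column in the search tree

    for cur_row in range(nr):
        shortest = [math.inf] * nc  # reduced-cost distance to each column
        visited_rows = [False] * nr
        visited_cols = [False] * nc
        remaining = list(range(nc - 1, -1, -1))  # columns not yet settled
        min_val, i, sink = 0.0, cur_row, -1

        # Dijkstra-like search for the cheapest path to an unassigned column.
        while sink == -1:
            visited_rows[i] = True
            lowest, index = math.inf, -1
            for it, j in enumerate(remaining):
                r = min_val + cost[i][j] - u[i] - v[j]
                if r < shortest[j]:
                    path[j], shortest[j] = i, r
                # On ties, prefer a free column: it ends the search sooner.
                if shortest[j] < lowest or (shortest[j] == lowest and row4col[j] == -1):
                    lowest, index = shortest[j], it
            min_val = lowest
            if math.isinf(min_val):
                raise ValueError("cost matrix is infeasible")
            j = remaining.pop(index)
            visited_cols[j] = True
            if row4col[j] == -1:
                sink = j  # found a free column: the path is complete
            else:
                i = row4col[j]  # continue from the row that currently owns j

        # Update potentials so that all reduced costs stay non-negative.
        u[cur_row] += min_val
        for r in range(nr):
            if visited_rows[r] and r != cur_row:
                u[r] += min_val - shortest[col4row[r]]
        for c in range(nc):
            if visited_cols[c]:
                v[c] -= min_val - shortest[c]

        # Augment: flip assignments along the path back to cur_row.
        j = sink
        while True:
            i = path[j]
            row4col[j] = i
            col4row[i], j = j, col4row[i]
            if i == cur_row:
                break

    pairs = list(enumerate(col4row))
    if transposed:
        pairs = sorted((c, r) for r, c in pairs)
    return [r for r, _ in pairs], [c for _, c in pairs]
```

Each outer iteration assigns one more row, so the worst case is O(nr · nc²). The potentials `u` and `v` keep every reduced cost non-negative, which is what makes the Dijkstra-style search valid.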

To speed up the computation on dense cost matrices, I asked Claude to help me write a SIMD implementation (using the pulp crate). However, when the cost matrix is sparse, in the sense that most boxes don't overlap, the non-SIMD (aka scalar) function is much faster, because of the SIMD overhead of moving data back and forth between memory and vector registers.
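A minimal sketch of how a sparsity check could drive the choice between the two code paths, assuming the cost matrix holds IoU distances (so a non-overlapping pair costs exactly 1.0). The helpers `cost_sparsity` and `choose_backend` and the 0.5 threshold are hypothetical, and the function only returns a label instead of calling real kernels; it is not necessarily how powerboxes makes this decision.

```python
import numpy as np


def cost_sparsity(cost: np.ndarray) -> float:
    """Fraction of (box1, box2) pairs that do not overlap at all.

    Assumes `cost` holds IoU distances (1 - IoU), so a non-overlapping
    pair has a cost of exactly 1.0.
    """
    if cost.size == 0:
        return 0.0
    return float(np.count_nonzero(cost >= 1.0)) / cost.size


def choose_backend(cost: np.ndarray, threshold: float = 0.5) -> str:
    """Return "scalar" for mostly-empty matrices, "simd" otherwise.

    `threshold` is a hypothetical tuning knob, not a powerboxes parameter.
    """
    return "scalar" if cost_sparsity(cost) > threshold else "simd"
```

In practice the threshold would have to come from benchmarks like the ones in this PR, since the break-even point depends on the hardware and on the SIMD width.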

Also, I noticed that parallel_iou_distance_slice was missing: the parallel IoU distance was only implemented under the ndarray feature. So I added a slice version so that lsap_iou_slice can use it.
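For reference, the quantity these functions compute is the pairwise IoU distance (1 - IoU), which then serves as the assignment cost. A NumPy sketch, assuming (N, 4) xyxy boxes and areas of (x2 - x1) * (y2 - y1) with no +1 pixel convention; `iou_distance` here is a hypothetical stand-in, not the powerboxes binding:

```python
import numpy as np


def iou_distance(boxes1: np.ndarray, boxes2: np.ndarray) -> np.ndarray:
    """Pairwise 1 - IoU for (N, 4) and (M, 4) arrays of xyxy boxes."""
    # Cast first so integer inputs (e.g. uint8) cannot overflow in the areas.
    b1 = boxes1.astype(np.float64)
    b2 = boxes2.astype(np.float64)
    area1 = (b1[:, 2] - b1[:, 0]) * (b1[:, 3] - b1[:, 1])
    area2 = (b2[:, 2] - b2[:, 0]) * (b2[:, 3] - b2[:, 1])
    # Broadcast to (N, M): intersection rectangle of every pair.
    x1 = np.maximum(b1[:, None, 0], b2[None, :, 0])
    y1 = np.maximum(b1[:, None, 1], b2[None, :, 1])
    x2 = np.minimum(b1[:, None, 2], b2[None, :, 2])
    y2 = np.minimum(b1[:, None, 3], b2[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = area1[:, None] + area2[None, :] - inter
    # Degenerate pairs (zero union) get IoU 0, i.e. distance 1.
    iou = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    return 1.0 - iou
```

Each output row depends on a single box from `boxes1`, so rows can be computed independently; that row-wise independence is what makes the computation easy to split across threads.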

I did a lot of testing to verify correctness, and benchmarked against scipy as well as lapjv and lap: we are faster in many cases, and when we are not, the speed is similar.

Using the _random_xyxy_boxes function from test_torch.py, we can generate a very dense case:

```python
boxes1 = _random_xyxy_boxes(rng, 5000, 100)
boxes2 = _random_xyxy_boxes(rng, 5000, 100)
```

or a very sparse case:

```python
boxes1 = _random_xyxy_boxes(rng, 5000, 1000)
boxes2 = _random_xyxy_boxes(rng, 5000, 1000)
```

Then we can compare the results of the different libraries.
The bigger the image size, the sparser the cost matrix, i.e. the more box pairs don't overlap.

| method | N boxes | image_size | duration (s) |
|---|---|---|---|
| scipy | 5000 | 2500 | 0.236 |
| lapjv | 5000 | 2500 | 5.474 |
| lap | 5000 | 2500 | 12.391 |
| powerboxes | 5000 | 2500 | 0.236 |
| scipy | 5000 | 300 | 1.027 |
| lapjv | 5000 | 300 | 1.125 |
| lap | 5000 | 300 | 2.458 |
| powerboxes | 5000 | 300 | 0.522 |
| scipy | 5000 | 100 | 1.831 |
| lapjv | 5000 | 100 | 2.167 |
| lap | 5000 | 100 | 6.128 |
| powerboxes | 5000 | 100 | 0.843 |
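The link between image size and sparsity can be checked with a quick experiment. The generator below is a hypothetical stand-in for `_random_xyxy_boxes` (uniform top-left corners in `[0, image_size)`, widths and heights in `[1, max_wh)`, no clipping to the image), so its numbers will not match the benchmark setup above; it only illustrates the trend.

```python
import numpy as np


def random_xyxy_boxes(rng: np.random.Generator, n: int, image_size: int, max_wh: float = 50) -> np.ndarray:
    """Hypothetical generator: n xyxy boxes with random corners and sizes."""
    xy = rng.uniform(0, image_size, size=(n, 2))
    wh = rng.uniform(1, max_wh, size=(n, 2))
    return np.concatenate([xy, xy + wh], axis=1)


def non_overlap_fraction(boxes1: np.ndarray, boxes2: np.ndarray) -> float:
    """Fraction of (i, j) pairs whose boxes share no area."""
    w = np.minimum(boxes1[:, None, 2], boxes2[None, :, 2]) - np.maximum(boxes1[:, None, 0], boxes2[None, :, 0])
    h = np.minimum(boxes1[:, None, 3], boxes2[None, :, 3]) - np.maximum(boxes1[:, None, 1], boxes2[None, :, 1])
    return float(np.mean((w <= 0) | (h <= 0)))


rng = np.random.default_rng(0)
for image_size in (100, 300, 2500):
    b1 = random_xyxy_boxes(rng, 500, image_size)
    b2 = random_xyxy_boxes(rng, 500, image_size)
    print(image_size, round(non_overlap_fraction(b1, b2), 3))
```

With box sizes held fixed, the chance that two boxes overlap shrinks roughly with the square of the image side, which is why the 2500 px case is almost entirely non-overlapping pairs.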

codecov · 2026-03-28T23:23:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.61%. Comparing base (db09f08) to head (888edee).
⚠️ Report is 34 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
- Coverage   98.77%   93.61%   -5.17%
==========================================
  Files          17       15       -2
  Lines        2703     3414     +711
==========================================
+ Hits         2670     3196     +526
- Misses         33      218     +185
```


codspeed-hq · 2026-03-28T23:28:14Z

Merging this PR will improve performance by 22.57%

⚡ 15 improved benchmarks
❌ 4 regressed benchmarks
✅ 156 untouched benchmarks
⏩ 1 skipped benchmark¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|
| ❌ | `test_giou_distance[int16]` | 224.5 µs | 375.2 µs | -40.16% |
| ❌ | `test_giou_distance[uint16]` | 239.9 µs | 375.3 µs | -36.08% |
| ⚡ | `test_iou_distance[float32]` | 230.4 µs | 167.9 µs | +37.23% |
| ⚡ | `test_iou_distance[float64]` | 232.6 µs | 170 µs | +36.76% |
| ❌ | `test_parallel_giou_distance[int16]` | 224.4 µs | 375.2 µs | -40.18% |
| ❌ | `test_parallel_giou_distance[uint16]` | 239.9 µs | 375.2 µs | -36.07% |
| ⚡ | `test_parallel_iou_distance[int64]` | 356.6 µs | 293.7 µs | +21.43% |
| ⚡ | `test_parallel_iou_distance[uint16]` | 393.2 µs | 293.9 µs | +33.78% |
| ⚡ | `test_parallel_iou_distance[uint32]` | 342.8 µs | 280.5 µs | +22.2% |
| ⚡ | `test_parallel_iou_distance[uint64]` | 363.1 µs | 301.1 µs | +20.59% |
| ⚡ | `test_parallel_iou_distance[uint8]` | 365 µs | 296.5 µs | +23.1% |
| ⚡ | `test_rtree_nms_many_boxes[10000]` | 18.7 ms | 10 ms | +87.09% |
| ⚡ | `test_rtree_nms_many_boxes[1000]` | 1,422.1 µs | 842.1 µs | +68.86% |
| ⚡ | `test_rtree_nms_many_boxes[20000]` | 41.4 ms | 21.3 ms | +93.97% |
| ⚡ | `test_rtree_nms_many_boxes[5000]` | 8.7 ms | 4.6 ms | +87.88% |
| ⚡ | `test_rtree_rotated_nms_many_boxes[10000]` | 22.7 ms | 13.6 ms | +67.1% |
| ⚡ | `test_rtree_rotated_nms_many_boxes[1000]` | 1.7 ms | 1.1 ms | +49.8% |
| ⚡ | `test_rtree_rotated_nms_many_boxes[5000]` | 10.6 ms | 6.4 ms | +66.23% |
| ⚡ | `iou distance benchmark` | 241.2 µs | 195.5 µs | +23.35% |


Comparing gcanat:hungarian_matching (888edee) with main (bb94719)

¹ 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, archive it to remove it from the performance reports.

Smirkey · 2026-05-25T10:51:33Z

Hey,
sorry, I was busy with a bunch of work-related stuff...
Very nice speedup, thanks!
I'm just applying a few patches, but overall LGTM.
I think having an LSAP feature is indeed quite handy!

gcanat added 9 commits March 22, 2026 00:25

- add parallel_iou_distance_slice (f4cf351)
- add hungarian_matching (096625f)
- hungarian_matching_iou python bindings (e2c8c48)
- remove dead code (98cfef5)
- replace kuhn_munkres with lsap (7db3735)
- add simd implementation (5cf3f8f)
- add simple tests to lsap and lsap_simd (d7c6728)
- small typos (b939add)
- formatting (dd46b93)
- more formatting (7f551d9)

gcanat added 3 commits March 29, 2026 11:05

- fix sparsity calculation (8447f7d)
- improve simd implementation (07784e7)
- lsap core functions as a separate lib (6abba95)

Smirkey marked this pull request as ready for review May 25, 2026 10:50

Smirkey added 6 commits May 25, 2026 12:53

- review fixes: docstring, error propagation, python tests (297ff80)
- fix: seed lsap_iou test, bound coords for uint8 cast (324ee39)
- fix: tighter wh bound in lsap_iou test to avoid u8 area overflow (f17b0ae)
- test: cover parallel_iou_distance_slice directly (76c7f46)
- fmt (debde19)
- test: cover parallel iou branch in lsap_iou_slice (888edee)

Smirkey merged commit 6f1c650 into Smirkey:main May 25, 2026
17 of 18 checks passed

Smirkey mentioned this pull request May 25, 2026 in perf: apply init-and-skip pattern to ciou and diou distance #88 (draft, 5 tasks)

Smirkey merged 19 commits into Smirkey:main from gcanat:hungarian_matching.
