
feat: Linear sum assignment #85

Merged
Smirkey merged 19 commits into Smirkey:main from gcanat:hungarian_matching
May 25, 2026

Conversation

@gcanat
Collaborator

gcanat commented Mar 28, 2026

I thought it would be useful to have some assignment functionality, for example when matching predictions to ground-truth boxes (GTs), or when associating boxes in the current frame with those in the previous frame.
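To make the prediction-to-GT use case concrete, here is a minimal Python sketch of the objective: the cost of pairing a prediction with a GT box is 1 - IoU, and the assignment minimizes the total cost. The helper names (`iou`, `match_boxes`) and the `min_iou` threshold are hypothetical and not part of powerboxes; the "solver" is a brute-force search over permutations, which only works for a handful of boxes and is there purely to state the objective.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(preds, gts, min_iou=0.5):
    """Pair predictions with ground truths by minimizing the total (1 - IoU).

    Brute force over permutations: only viable for a handful of boxes, but it
    states the objective that a real LSAP solver optimizes efficiently.
    """
    assert len(preds) <= len(gts), "sketch assumes no more preds than GTs"
    cost = [[1.0 - iou(p, g) for g in gts] for p in preds]
    best_total, best_cols = float("inf"), None
    for cols in permutations(range(len(gts)), len(preds)):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    # Keep only pairs that overlap enough to count as a true match.
    return [(i, j) for i, j in enumerate(best_cols) if 1.0 - cost[i][j] >= min_iou]


preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
print(match_boxes(preds, gts))  # [(0, 1), (1, 0)]
```

The threshold step matters in practice: the assignment always pairs every prediction with some GT, so pairs with little or no overlap have to be filtered out afterwards.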

Anyway, I started with Hungarian matching, but it turned out to be too slow once the number of boxes exceeded 1000. So I took inspiration from scipy.optimize.linear_sum_assignment and implemented the shortest augmenting path algorithm instead.
Side note: it's funny how many ML papers claim to use Hungarian matching while their code actually calls linear_sum_assignment, which uses a different algorithm...
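For reference, here is a compact Python sketch of a shortest augmenting path solver with row and column dual potentials, in the classic textbook O(n²m) form for an n x m matrix with n <= m. It is not the scipy or powerboxes implementation (the function name `lsap` is only illustrative); it shows the idea: rows are added one at a time, a Dijkstra-like scan over reduced costs finds the cheapest path to a free column, and the potentials are updated so that reduced costs stay non-negative.

```python
import math


def lsap(cost):
    """Min-cost assignment for an n x m cost matrix (list of lists), n <= m.

    Returns col_for_row, where row i is assigned to column col_for_row[i].
    """
    n, m = len(cost), len(cost[0]) if cost else 0
    assert n <= m, "transpose the matrix if there are more rows than columns"
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a dummy)
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # previous column on the shortest augmenting path
    for i in range(1, n + 1):
        p[0] = i  # dummy column 0 temporarily holds the new row
        j0 = 0
        minv = [math.inf] * (m + 1)  # shortest reduced distance to each column
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials by delta: visited columns stay tight, others get closer.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:  # flip the matching along the path back to the dummy column
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col_for_row = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            col_for_row[p[j] - 1] = j - 1
    return col_for_row


print(lsap([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # [1, 0, 2], total cost 5
```

Unlike the textbook Hungarian method, which repeatedly covers zeros in a reduced matrix, each outer iteration here does a single shortest-path search, which is what keeps the per-row work to dense O(m) scans.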

To speed up the computation on dense cost matrices, I asked Claude to help me write a SIMD implementation (using the pulp crate). However, when the cost matrix is sparse, in the sense that most boxes don't overlap, the non-SIMD (scalar) function is much faster because of the SIMD overhead of moving data back and forth between memory and vector registers.

I also noticed that parallel_iou_distance_slice was missing: it was only implemented under the ndarray feature. So I modified the code so that it is also available for lsap_iou_slice.
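The `_slice` variants presumably work on flat buffers rather than ndarray views. As a hedged illustration of that layout (the function names and the flat `[x1, y1, x2, y2, ...]` input format are assumptions, not the crate's actual signatures), this Python sketch computes a row-major IoU distance matrix in which each row, one per box of the first set, is computed independently. That independence is what makes the computation easy to parallelize; Python threads will not give a real speedup here because of the GIL, so the thread pool only demonstrates the row-wise split.

```python
from concurrent.futures import ThreadPoolExecutor


def iou_distance_row(box, boxes2_flat):
    """1 - IoU between one (x1, y1, x2, y2) box and every box in a flat slice."""
    x1, y1, x2, y2 = box
    area1 = (x2 - x1) * (y2 - y1)
    row = []
    for k in range(0, len(boxes2_flat), 4):
        bx1, by1, bx2, by2 = boxes2_flat[k:k + 4]
        iw = min(x2, bx2) - max(x1, bx1)
        ih = min(y2, by2) - max(y1, by1)
        if iw <= 0 or ih <= 0:
            row.append(1.0)  # no overlap: maximum distance
            continue
        inter = iw * ih
        union = area1 + (bx2 - bx1) * (by2 - by1) - inter
        row.append(1.0 - inter / union)
    return row


def parallel_iou_distance_flat(boxes1_flat, boxes2_flat, workers=4):
    """Row-major flat (N*M) distance matrix; each row is computed independently."""
    rows = [boxes1_flat[k:k + 4] for k in range(0, len(boxes1_flat), 4)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        out_rows = list(pool.map(lambda b: iou_distance_row(b, boxes2_flat), rows))
    return [d for row in out_rows for d in row]


boxes1 = [0, 0, 10, 10, 100, 100, 110, 110]
boxes2 = [1, 1, 11, 11, 0, 0, 10, 10, 50, 50, 60, 60]
dist = parallel_iou_distance_flat(boxes1, boxes2)
print(len(dist))  # 6: 2 rows x 3 columns
```

A flat row-major buffer like this can be handed directly to an assignment solver that indexes entry (i, j) as `dist[i * M + j]`.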

I did a lot of testing to verify correctness, and benchmarked against scipy as well as lapjv and lap: we are faster in many cases, and when we are not, the speed is similar.

Using the _random_xyxy_boxes function from test_torch.py, we can generate a very dense case:

boxes1 = _random_xyxy_boxes(rng, 5000, 100)
boxes2 = _random_xyxy_boxes(rng, 5000, 100)

or a very sparse case:

boxes1 = _random_xyxy_boxes(rng, 5000, 1000)
boxes2 = _random_xyxy_boxes(rng, 5000, 1000)

And then compare the results of the different libraries.
The bigger the image size, the sparser the cost matrix, i.e. the more boxes don't overlap.
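To see why a larger image gives a sparser cost matrix, this sketch measures the fraction of box pairs that actually intersect (IoU distance below 1). `random_xyxy_boxes` is a hypothetical stand-in for `_random_xyxy_boxes` from test_torch.py, and its box distribution is an assumption: uniform top-left corners inside the image and side lengths between 1 and 50 pixels, with no clipping.

```python
import random


def random_xyxy_boxes(rng, n, image_size, max_side=50.0):
    """Hypothetical stand-in for _random_xyxy_boxes: uniform top-left corner in
    the image, side lengths in [1, max_side] (boxes are not clipped)."""
    boxes = []
    for _ in range(n):
        x1, y1 = rng.uniform(0, image_size), rng.uniform(0, image_size)
        boxes.append((x1, y1, x1 + rng.uniform(1, max_side), y1 + rng.uniform(1, max_side)))
    return boxes


def overlap_fraction(boxes1, boxes2):
    """Fraction of (i, j) pairs that intersect, i.e. cost entries below 1.0."""
    hits = sum(
        1
        for a in boxes1
        for b in boxes2
        if min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])
    )
    return hits / (len(boxes1) * len(boxes2))


rng = random.Random(0)
for image_size in (100, 300, 1000):
    b1 = random_xyxy_boxes(rng, 300, image_size)
    b2 = random_xyxy_boxes(rng, 300, image_size)
    print(image_size, round(overlap_fraction(b1, b2), 4))
```

With a fixed box size, the chance that two boxes intersect shrinks roughly with the square of the image side, so most cost entries become exactly 1.0 as the image grows.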

method      N boxes  image_size  duration (sec)
scipy       5000     2500        0.236
lapjv       5000     2500        5.474
lap         5000     2500        12.391
powerboxes  5000     2500        0.236
scipy       5000     300         1.027
lapjv       5000     300         1.125
lap         5000     300         2.458
powerboxes  5000     300         0.522
scipy       5000     100         1.831
lapjv       5000     100         2.167
lap         5000     100         6.128
powerboxes  5000     100         0.843
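The relative speedups follow from dividing each library's duration by the powerboxes duration at the same image size. The numbers below are copied from the benchmark table; only the ratio computation is added.

```python
# Durations in seconds from the benchmark table: image_size -> method -> seconds.
timings = {
    2500: {"scipy": 0.236, "lapjv": 5.474, "lap": 12.391, "powerboxes": 0.236},
    300: {"scipy": 1.027, "lapjv": 1.125, "lap": 2.458, "powerboxes": 0.522},
    100: {"scipy": 1.831, "lapjv": 2.167, "lap": 6.128, "powerboxes": 0.843},
}


def speedups(timings, ours="powerboxes"):
    """other_time / our_time per image size; a value > 1 means we are faster."""
    return {
        size: {m: round(t / row[ours], 2) for m, t in row.items() if m != ours}
        for size, row in timings.items()
    }


for size, ratios in speedups(timings).items():
    print(size, ratios)
```

This works out to roughly 2x over scipy at image sizes 300 and 100 (1.97 and 2.17), parity at 2500 (1.0), and larger margins over lapjv and lap at every size.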

@codecov

codecov Bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.61%. Comparing base (db09f08) to head (888edee).
⚠️ Report is 34 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
- Coverage   98.77%   93.61%   -5.17%     
==========================================
  Files          17       15       -2     
  Lines        2703     3414     +711     
==========================================
+ Hits         2670     3196     +526     
- Misses         33      218     +185     
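The coverage figures follow from hits / lines. Rounding down to two decimals reproduces all three reported numbers (98.77, 93.61 and -5.17), whereas ordinary rounding would give 98.78 and -5.16. That rounding mode is inferred from these numbers, not taken from Codecov's documentation; `pct_down` is a hypothetical helper.

```python
import math


def pct_down(num, den, precision=2):
    """Percentage num/den rounded down to `precision` decimals (assumed rounding mode)."""
    scale = 10 ** precision
    return math.floor(num / den * 100 * scale) / scale


main = {"lines": 2703, "hits": 2670, "misses": 33}
head = {"lines": 3414, "hits": 3196, "misses": 218}

# Every coverable line is either a hit or a miss.
for report in (main, head):
    assert report["hits"] + report["misses"] == report["lines"]

base_pct = pct_down(main["hits"], main["lines"])
head_pct = pct_down(head["hits"], head["lines"])
diff = head["hits"] / head["lines"] - main["hits"] / main["lines"]
delta_pct = math.floor(diff * 100 * 100) / 100  # rounded down, like the totals
print(base_pct, head_pct, delta_pct)
```

The drop is mostly a denominator effect: 711 new lines were added but only 526 of them are hit, so the 185 new misses pull the ratio down.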


@codspeed-hq

codspeed-hq Bot commented Mar 28, 2026

Merging this PR will improve performance by 22.57%

⚡ 15 improved benchmarks
❌ 4 regressed benchmarks
✅ 156 untouched benchmarks
⏩ 1 skipped benchmark[1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark                                 BASE        HEAD      Efficiency
test_giou_distance[int16]                 224.5 µs    375.2 µs  -40.16%
test_giou_distance[uint16]                239.9 µs    375.3 µs  -36.08%
test_iou_distance[float32]                230.4 µs    167.9 µs  +37.23%
test_iou_distance[float64]                232.6 µs    170 µs    +36.76%
test_parallel_giou_distance[int16]        224.4 µs    375.2 µs  -40.18%
test_parallel_giou_distance[uint16]       239.9 µs    375.2 µs  -36.07%
test_parallel_iou_distance[int64]         356.6 µs    293.7 µs  +21.43%
test_parallel_iou_distance[uint16]        393.2 µs    293.9 µs  +33.78%
test_parallel_iou_distance[uint32]        342.8 µs    280.5 µs  +22.2%
test_parallel_iou_distance[uint64]        363.1 µs    301.1 µs  +20.59%
test_parallel_iou_distance[uint8]         365 µs      296.5 µs  +23.1%
test_rtree_nms_many_boxes[10000]          18.7 ms     10 ms     +87.09%
test_rtree_nms_many_boxes[1000]           1,422.1 µs  842.1 µs  +68.86%
test_rtree_nms_many_boxes[20000]          41.4 ms     21.3 ms   +93.97%
test_rtree_nms_many_boxes[5000]           8.7 ms      4.6 ms    +87.88%
test_rtree_rotated_nms_many_boxes[10000]  22.7 ms     13.6 ms   +67.1%
test_rtree_rotated_nms_many_boxes[1000]   1.7 ms      1.1 ms    +49.8%
test_rtree_rotated_nms_many_boxes[5000]   10.6 ms     6.4 ms    +66.23%
iou distance benchmark                    241.2 µs    195.5 µs  +23.35%
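The Efficiency column appears consistent with BASE / HEAD - 1, expressed as a percentage (positive when HEAD is faster). This is inferred from the table's own numbers, not from CodSpeed's documentation, and small mismatches come from the displayed times being rounded, which is why the check below only uses rows shown with four significant digits.

```python
def efficiency(base, head):
    """Percent change implied by BASE and HEAD timings; > 0 means HEAD is faster."""
    return (base / head - 1.0) * 100.0


# (BASE, HEAD) in microseconds, copied from the CodSpeed table.
rows = {
    "test_giou_distance[int16]": (224.5, 375.2),
    "test_iou_distance[float32]": (230.4, 167.9),
    "test_rtree_nms_many_boxes[1000]": (1422.1, 842.1),
}
for name, (base, head) in rows.items():
    print(f"{name}: {efficiency(base, head):+.2f}%")
```

Read this way, the GIoU regressions mean the HEAD run takes about 1.67x as long as BASE (224.5 µs to 375.2 µs), while the rtree NMS benchmarks run in roughly half the time.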



Comparing gcanat:hungarian_matching (888edee) with main (bb94719)


Footnotes

  1. One benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, archive it on CodSpeed to remove it from the performance reports.

Smirkey marked this pull request as ready for review May 25, 2026 10:50
@Smirkey
Owner

Smirkey commented May 25, 2026

hey
sorry, was busy with a bunch of work-related stuff...
very nice speedup, thanks!
I'm just applying a few patches, but overall LGTM
I think having an LSAP feature is indeed quite handy!

Smirkey merged commit 6f1c650 into Smirkey:main May 25, 2026
17 of 18 checks passed

2 participants