RSDK-10765 - Let mlvision resize to mlmodel's input size #5002

kharijarrett · 2025-05-21T18:34:56Z

Hey, so this fix comes from a bug Nick ran into. Basically, if someone uses a custom mlmodel module with a fixed input size different from the original image's, we mess up the bounding boxes: any absolute bbox coordinates we read in were "normalized" using the original image dimensions, not the resized ones. I think we've been getting lucky so far, either with proportional bbox tensors or with mlmodels that have a [1 -1 -1 3] (variable) input size.

Here we just fix it by checking if we resized the image and using the appropriate image dims for normalization.
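To make the failure mode concrete, here is a minimal sketch of the arithmetic. The `box` type, the `normalize` helper, and the 640x480 / 320x320 sizes are all hypothetical illustrations, not code from the RDK.

```go
package main

import "fmt"

// box holds absolute pixel coordinates in the image the model actually saw.
// Hypothetical type for illustration only.
type box struct{ xMin, yMin, xMax, yMax float64 }

// normalize converts absolute pixel coordinates into proportions by dividing
// by the dimensions of the image those coordinates refer to.
func normalize(b box, w, h int) box {
	return box{
		xMin: b.xMin / float64(w),
		yMin: b.yMin / float64(h),
		xMax: b.xMax / float64(w),
		yMax: b.yMax / float64(h),
	}
}

func main() {
	// Assumed sizes: a 640x480 frame resized to a 320x320 model input.
	origW, origH := 640, 480
	resizeW, resizeH := 320, 320

	// The model reports a box covering the right half of its 320x320 input.
	abs := box{xMin: 160, yMin: 0, xMax: 320, yMax: 320}

	// Dividing by the original dims squeezes the box toward the top-left.
	fmt.Println("original dims:", normalize(abs, origW, origH))
	// Dividing by the resized dims recovers the right half of the image.
	fmt.Println("resized dims: ", normalize(abs, resizeW, resizeH))
}
```

With the original dimensions the box's right edge lands at 0.5 of the width instead of 1, which is the kind of shifted bbox described above.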

Tested via the Python SDK with Nick's funky ncnn-ml module (0.1.1-rc1) and with our classic EffDet mlmodel setup. Nick's module used to not work; now it does! And EffDet didn't break!

HipsterBrown · 2025-05-21T18:41:06Z

services/vision/mlvision/detector.go

@@ -110,7 +112,7 @@ func attemptToBuildDetector(mlm mlmodel.Service,
 			return nil, err
 		}

-		boundingBoxes, err := ml.FormatDetectionOutputs(outNameMap, outMap, origW, origH, boxOrder, labels)
+		boundingBoxes, err := ml.FormatDetectionOutputs(outNameMap, outMap, origW, origH, boxOrder, labels, wasResized, resizeW, resizeH)


Since resizeW and resizeH could be set to origW and origH, just passing resizeW and resizeH in place of origW and origH in this call should fix the issue without any change to the method.

Suggested change

-boundingBoxes, err := ml.FormatDetectionOutputs(outNameMap, outMap, origW, origH, boxOrder, labels, wasResized, resizeW, resizeH)
+boundingBoxes, err := ml.FormatDetectionOutputs(outNameMap, outMap, resizeW, resizeH, boxOrder, labels)
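One way to read this suggestion: compute the dimensions of the image that is actually handed to the model once, falling back to the original size when the model's input is variable, and pass those to the existing function. The sketch below assumes a hypothetical helper `inputDims` and assumes that a non-positive model dimension (such as -1) means "variable"; neither is taken from the RDK source.

```go
package main

import "fmt"

// inputDims returns the dimensions of the image the model will see.
// Assumption for this sketch: a non-positive model dimension (e.g. -1)
// means "variable", so the original dimension is kept and no resize occurs.
func inputDims(origW, origH, modelW, modelH int) (w, h int) {
	w, h = origW, origH
	if modelW > 0 {
		w = modelW
	}
	if modelH > 0 {
		h = modelH
	}
	return w, h
}

func main() {
	// Fixed-size model: boxes should be normalized by 320x320.
	fmt.Println(inputDims(640, 480, 320, 320))
	// Variable-size model: boxes should be normalized by the original 640x480.
	fmt.Println(inputDims(640, 480, -1, -1))
}
```

Because the fallback already yields the original size, the call site needs no separate wasResized flag and the callee's signature stays untouched.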

bhaney

Create a unit test with a fake ML Model service that does this exact thing:

- has a fixed input size which requires a resize
- outputs absolute bounding box coordinates relative to the resized image

Then check that the bounding boxes from the vision service are what you expect.
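The shape of such a test might look like the following self-contained sketch. `fakeModel` and `detect` are hypothetical stand-ins (not the RDK mlmodel.Service interface or the mlvision code), and a real test would live in a `_test.go` file and go through the vision service itself.

```go
package main

import "fmt"

// fakeModel is a hypothetical stand-in for an ML model with a fixed input size.
type fakeModel struct {
	inW, inH int
}

// infer returns one box, [xMin, yMin, xMax, yMax], in absolute pixels of the
// model's own (resized) input: the middle half of that input in each direction.
func (m fakeModel) infer() [4]float64 {
	w, h := float64(m.inW), float64(m.inH)
	return [4]float64{w / 4, h / 4, 3 * w / 4, 3 * h / 4}
}

// detect mimics the path under test: it runs the fake model and normalizes
// the absolute coordinates by whatever dimensions the caller supplies.
func detect(m fakeModel, normW, normH int) [4]float64 {
	raw := m.infer()
	w, h := float64(normW), float64(normH)
	return [4]float64{raw[0] / w, raw[1] / h, raw[2] / w, raw[3] / h}
}

func main() {
	m := fakeModel{inW: 320, inH: 320}
	want := [4]float64{0.25, 0.25, 0.75, 0.75}

	// Normalizing by the resized dims should give the expected box.
	if got := detect(m, m.inW, m.inH); got != want {
		panic(fmt.Sprintf("resized dims: got %v, want %v", got, want))
	}
	// Normalizing by an assumed 640x480 original should NOT match,
	// which is what lets the test catch the regression.
	if got := detect(m, 640, 480); got == want {
		panic("original dims unexpectedly produced the correct box")
	}
	fmt.Println("ok")
}
```

The second check matters as much as the first: a fake model whose input size equals the original image size would pass either way and hide the bug.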

bhaney

Another thing: public functions in the ml directory in RDK are used by ML for cloud inference. Do not change these public functions, as it will introduce a breaking change.

Instead, pass the resized dimensions as the origW and origH arguments of FormatDetectionOutputs.

bhaney · 2025-05-21T18:47:50Z

ml/detections.go

@@ -21,7 +21,7 @@ const (

 // FormatDetectionOutputs formats the output tensors from a model into detections.
 func FormatDetectionOutputs(outNameMap *sync.Map, outMap Tensors, origW, origH int,
-	boxOrder []int, labels []string,
+	boxOrder []int, labels []string, wasResized bool, resizedW, resizedH int,


You cannot change the signature of this public function. I believe it is used by ML @tahiyasalam

kharijarrett · 2025-05-21T19:53:47Z

Sure. Made the fix without changing the method. Still working on writing the test.

bhaney

Thanks a lot! LGTM!

kharijarrett requested a review from bhaney May 21, 2025 18:35

viambot added the safe to test label May 21, 2025

HipsterBrown reviewed May 21, 2025

bhaney requested changes May 21, 2025

kharijarrett added 2 commits May 21, 2025 15:57

send in resized for bbox formatting

89c26c3

requested changes

6a5ae1a

kharijarrett force-pushed the RSDK-10765 branch from f07b829 to 6a5ae1a May 21, 2025 21:08

viambot added and removed the safe to test label May 21, 2025

fix test

cb6bc6c

viambot added and removed the safe to test label May 27, 2025

kharijarrett requested a review from bhaney May 27, 2025 19:30

bhaney approved these changes May 27, 2025

kharijarrett merged commit fea8c59 into viamrobotics:main May 27, 2025
18 checks passed

kharijarrett deleted the RSDK-10765 branch May 27, 2025 21:13
