Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: fix nms & use views instead of owned arrays #53

Merged
merged 19 commits into from
Apr 30, 2024
Merged

Conversation

Smirkey
Copy link
Owner

@Smirkey Smirkey commented Apr 25, 2024

#51

Copy link

codspeed-hq bot commented Apr 28, 2024

CodSpeed Performance Report

Merging #53 will improve performances by 79.02%

Comparing start/using-views (2448561) with main (f52253e)

Summary

⚡ 67 improvements
✅ 82 untouched benchmarks

Benchmarks breakdown

Benchmark main start/using-views Change
test_box_convert_cxcywh_xywh[float32] 68.3 µs 56.5 µs +20.74%
test_box_convert_cxcywh_xywh[float64] 81.7 µs 65.3 µs +25.22%
test_box_convert_cxcywh_xywh[int16] 67.4 µs 53 µs +27.09%
test_box_convert_cxcywh_xywh[int32] 69.5 µs 56.1 µs +23.96%
test_box_convert_cxcywh_xywh[int64] 83.1 µs 65.1 µs +27.58%
test_box_convert_cxcywh_xywh[uint16] 65.5 µs 52.1 µs +25.62%
test_box_convert_cxcywh_xywh[uint32] 68.9 µs 55.6 µs +23.91%
test_box_convert_cxcywh_xywh[uint64] 82.1 µs 56.6 µs +45.04%
test_box_convert_cxcywh_xywh[uint8] 64.8 µs 51.8 µs +25.07%
test_box_convert_cxcywh_xyxy[float64] 81.8 µs 65.3 µs +25.31%
test_box_convert_cxcywh_xyxy[int16] 66.3 µs 53 µs +25.21%
test_box_convert_cxcywh_xyxy[int32] 68.7 µs 56.1 µs +22.47%
test_box_convert_cxcywh_xyxy[int64] 82.2 µs 57.1 µs +43.78%
test_box_convert_cxcywh_xyxy[uint16] 66.7 µs 52.3 µs +27.64%
test_box_convert_cxcywh_xyxy[uint32] 68.8 µs 55.6 µs +23.79%
test_box_convert_cxcywh_xyxy[uint64] 82.1 µs 57.2 µs +43.37%
test_box_convert_cxcywh_xyxy[uint8] 65.5 µs 51.8 µs +26.48%
test_box_convert_xywh_cxcywh[float32] 68 µs 56.3 µs +20.88%
test_box_convert_xywh_cxcywh[float64] 82.1 µs 58.4 µs +40.51%
test_box_convert_xywh_cxcywh[int16] 67.7 µs 52.9 µs +28.02%
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

@Smirkey Smirkey marked this pull request as ready for review April 28, 2024 19:13
@Smirkey
Copy link
Owner Author

Smirkey commented Apr 28, 2024

Hi @BTOdell I've extended a bit the use of array views and also fixed the python bindings. Overall things are well speed up especially NMS which is really cool !
I'm a bit puzzled by the IoUs functions results (iou/diou/giou...). In all of them a new block is taking a lot of time: <ndarray::ArrayBase<S,D> as numpy::convert::ToPyArray>::to_pyarray (37.99%) (new) (cf the benchmarks by codspeed in the profiling tab).

I've tried a bunch of things but couldn't figure out why things are slowed down there. A theory could be that in the case where our output is significantly bigger than our input (which is the case for ious functions since they return a matrix from 2 1d vectors) it's better to deal with OwneRepr ... do you have any idea ?

@BTOdell
Copy link
Contributor

BTOdell commented Apr 28, 2024

When you pass in an owned ndarray to a function, you give up ownership of that data and the caller won't use it anymore, so when it's converted to a PyArray, the internal data can simply be moved into the PyArray without doing an element-by-element copy. You can see in the new flamegraph, it's called __memcpy_avx_unaligned_erms that does the data copy instead of the old code just does a data move with PyArray::from_owned_array.

Also, I noticed you added back the EPS value to "fix" the divide by zero in the NMS algorithm. I don't think that the union value can ever be zero. Can you provide an example when this value would ever be zero?

@BTOdell
Copy link
Contributor

BTOdell commented Apr 28, 2024

To overcome the issue with the element-by-element copying, you could try accepting a CowArray instead of an ArrayView. That way, if the user has an owned array that they won't need after calling the function, they can pass in the owned array to regain the original performance. But if they need the array after the call, then they would pass in a view (and incur the cost of copying from the view into the PyArray).

@BTOdell
Copy link
Contributor

BTOdell commented Apr 29, 2024

Also, I noticed you added back the EPS value to "fix" the divide by zero in the NMS algorithm. I don't think that the union value can ever be zero. Can you provide an example when this value would ever be zero?

@Smirkey Can you comment on this?

Repository owner deleted a comment from Buillaume Apr 29, 2024
Repository owner deleted a comment from Buillaume Apr 29, 2024
@Smirkey
Copy link
Owner Author

Smirkey commented Apr 29, 2024

Also, I noticed you added back the EPS value to "fix" the divide by zero in the NMS algorithm. I don't think that the union value can ever be zero. Can you provide an example when this value would ever be zero?

@Smirkey Can you comment on this?

yeah sure sorry ! I added it back because it broke the test (division by zero errors), but will look further into what you've done for fixing nms later, my goal at first was to get everything working along with the bindings
At a first glance I don't see what was wrong with the NMS implementation nor why your changes fixed it, it might be linked to these zero division errors, do you have more insights on this ?

As for the slow down observed in codspeed benchmarks, it was linked to the fact that I use to_pyarray instead of into_pyarray, which is the function to use with OwnedRepr

@Smirkey
Copy link
Owner Author

Smirkey commented Apr 29, 2024

The zero division errors only occur whit integer and unsigned integers

@BTOdell
Copy link
Contributor

BTOdell commented Apr 29, 2024

At a first glance I don't see what was wrong with the NMS implementation nor why your changes fixed it, it might be linked to these zero division errors, do you have more insights on this ?

I believe it's either related to all of the f64::min operations you were doing between the area1/2 and the intersection. It was completely messing up the IOU calculation.

Also, I think it's possible that adding the EPS value to the denominator could also result in accuracy issues, which is why I've been concerned about it being added back.

@BTOdell
Copy link
Contributor

BTOdell commented Apr 29, 2024

But it looks like everything is passing even with the EPS removed! Congrats! 🎉

@Smirkey Smirkey merged commit 3d615a6 into main Apr 30, 2024
23 checks passed
@Smirkey Smirkey deleted the start/using-views branch April 30, 2024 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants