Keypoint data structures are too slow. #426
Comments
I actually was able to make a POC pretty quickly and I got a 77x speedup.
Results: […]

I will submit a PR with those modifications and then we can decide where to go from there.
I see the point of using vectorized implementations. So far I have avoided changing the codebase in that direction, as it is likely a lot of work and the keypoint augmentation felt fast enough to me. There is also the problem that users may want to associate information with each individual keypoint, such as the label ("left eye", "right hand", ...) or an identifier (e.g. a filename). That can become a significant headache to implement with an array-based approach if the number or order of keypoints changes during augmentation.

Regarding the approach that you proposed: as far as I can tell, it has the advantage of limiting the changes to the codebase while still gaining the speed increases. It has the disadvantage of inducing a lot of […].

An alternative (but fairly similar) way forward would be to introduce a parallel class to […]. I also wonder if there is a way to make the transformation from […].
Just an FYI: I'm experiencing similar slowdowns, with transforming keypoints taking about 5x longer than heavy augmentations for images... I was a bit horrified to find out that the keypoint augmentations are done inside a list comprehension for each point :D This is definitely a big blocker for me. Thanks @Erotemic for the vectorized implementation! I'll give that a try.

IMO all operations should be vectorized by default, and having extra info per keypoint should be the exception. Then again, using numpy string arrays should be pretty fast too (e.g. casting to […]).

Other than this issue, I really like imgaug. Thanks @aleju for a great library!
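A minimal sketch of that parallel-array idea (not code from the thread; the coordinates and labels are hypothetical):

```python
import numpy as np

# Keypoint coordinates as one (N, 2) float array...
xy = np.array([[10.5, 20.0], [30.0, 40.5], [50.0, 60.0]], dtype=np.float32)
# ...with per-keypoint metadata in a parallel numpy string array.
labels = np.array(["left_eye", "right_eye", "nose"])

# Any boolean mask (e.g. keypoints surviving a crop) keeps both aligned.
keep = xy[:, 0] > 15
xy, labels = xy[keep], labels[keep]
```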
@harpone Did you ever solve this? Did you manage to speed up keypoint augmentations, or find another library? Albumentations is OK, but they don't offer most augmentations for keypoints, e.g. elastic deformations.
Using albumentations nowadays most of the time, and haven't had much need for keypoint augs since.
But albumentations doesn't support elastic deformations for keypoints. Unless I fork/branch and add that feature myself, I think I'm stuck with imgaug.
I've been noticing massive slowdowns as I enable more augmenters in my training scripts. There is a lot of Python overhead incurred in the general architecture of the library.
Looking into it, I found that keypoint augmentation takes up most of the time. This is because each keypoint is augmented one at a time, even though all of the keypoints undergo the same transformation. Also, nearly every operation on keypoints seems to cause a deep copy, which incurs the overhead of constructing new Python objects. This may be fine for <10 keypoints, but it really starts to add up.
Demo of Speed Issue
Here is an MWE demonstrating that keypoint augmentation takes the majority of the time in a `CropAndPad` augmenter.
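(The original MWE code block was not preserved in this archive; below is a minimal sketch along the same lines, assuming a 256x256 image and 1000 keypoints and using imgaug's public API.)

```python
import timeit
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.kps import Keypoint, KeypointsOnImage

image = np.zeros((256, 256, 3), dtype=np.uint8)
xy = np.random.rand(1000, 2) * 256
kpsoi = KeypointsOnImage(
    [Keypoint(x=float(x), y=float(y)) for x, y in xy],
    shape=image.shape)

aug = iaa.CropAndPad(px=(-10, 10))
det = aug.to_deterministic()  # fix parameters so both calls share one transform

t_img = timeit.timeit(lambda: det.augment_image(image), number=100)
t_kps = timeit.timeit(lambda: det.augment_keypoints([kpsoi]), number=100)
print("image augmentation:    {:.4f}s".format(t_img))
print("keypoint augmentation: {:.4f}s".format(t_kps))
```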
The output is:
Augmenting the keypoints took 10x longer than augmenting an image!
Methods for Mitigation: numpy vectorization
This problem could be avoided if `KeypointsOnImage` had a better internal representation. Namely, instead of being a list of `Keypoint` objects, it could simply store a single 2D numpy array of the keypoint locations. (Note that this same optimization can be applied to bounding boxes.)
Even with the conversion overhead, I can demonstrate a 10x speedup by simply modifying the `_crop_and_pad_kpsoi` function. Instead of using the internal `KeypointsOnImage` methods, I convert to a numpy array, perform all operations, and then convert back. I time the old version against the new one and assert they produce the same outputs.
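(The modified function itself is not reproduced here; the sketch below illustrates the idea, assuming crop/pad amounts arrive as `(top, right, bottom, left)` tuples and ignoring the `keep_size` resizing that the real function also handles.)

```python
import numpy as np
from imgaug.augmentables.kps import KeypointsOnImage

def crop_and_pad_kpsoi_vectorized(kpsoi, croppings, paddings):
    # croppings/paddings: pixel amounts as (top, right, bottom, left).
    xy = kpsoi.to_xy_array()  # one (N, 2) array instead of N objects
    # Cropping on the left/top shifts keypoints toward the origin,
    # padding shifts them away; one addition handles all N points.
    xy = xy + np.array([paddings[3] - croppings[3],   # x shift
                        paddings[0] - croppings[0]])  # y shift
    height = kpsoi.shape[0] - croppings[0] - croppings[2] \
        + paddings[0] + paddings[2]
    width = kpsoi.shape[1] - croppings[1] - croppings[3] \
        + paddings[1] + paddings[3]
    return KeypointsOnImage.from_xy_array(
        xy, shape=(height, width) + kpsoi.shape[2:])
```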
This gives us a 10x speedup!
You might note that this example used 1000 keypoints, and you might think the vectorized solution would be slower for an item with only a few keypoints. It turns out this is untrue. Even for 2 keypoints the vectorized solution is faster (44us for the list-based version vs 33us vectorized), and for a single keypoint the times are effectively the same.
Conversions are costly!
And note that we can do MUCH better than the above implementation. If we take a look at the line-profile results, we see that the majority of the overhead is simply in the conversion from the `List[Keypoint]` backend to a numpy one. This means that if imgaug improved its backends, it wouldn't need to convert between formats as many times, and it would see massive speedups.
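(The profile output was not preserved; a quick way to observe the same effect, reusing `kpsoi` from the earlier sketch:)

```python
import timeit

xy = kpsoi.to_xy_array()

# Round-trip conversion between List[Keypoint] and a numpy array...
t_convert = timeit.timeit(
    lambda: KeypointsOnImage.from_xy_array(kpsoi.to_xy_array(),
                                           shape=kpsoi.shape),
    number=100)
# ...versus the coordinate arithmetic itself on an existing array.
t_math = timeit.timeit(lambda: xy + 3.0, number=100)
print("conversion round-trip: {:.4f}s".format(t_convert))
print("array arithmetic only: {:.4f}s".format(t_math))
```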
Proposal for Speedup: Vectorized objects with numpy backends
Ultimately, I think imgaug would benefit from moving to data structure primitives with vectorized backends (like numpy).
It may seem like this would come at the cost of code reuse, but that is not necessarily the case (although it would mean a codebase overhaul). Currently, single-item objects like `Keypoint`/`Box`/etc. are the data structure primitives, and `XsOnImage` reuses code by iterating over the single-item objects. However, if you were to use multi-item objects as primitives (with numpy backends, of course; here is an example of how I implemented something similar with boxes), then you could implement the single-item objects as special cases of the multi-item objects.
That being said, that's probably too big of a change to ask for. A more realistic goal would be to implement a `Keypoints` object that stores multiple keypoints as a numpy array but exposes a `__getitem__` that returns a `Keypoint` object, so existing `KeypointsOnImage` logic would work as-is. Incremental improvements could then be made to `KeypointsOnImage` such that when it sees that `isinstance(self.keypoints, Keypoints)` is `True`, it can use the faster methods of the `Keypoints` object instead of looping over each individual item and incurring so much Python overhead. For instance, `KeypointsOnImage.shift` might look like this:
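(The two snippets below are reconstructions; the originals were lost from this archive. They assume the hypothetical `Keypoints.shift(x, y)` shares the signature of `Keypoint.shift`.)

```python
# Method on KeypointsOnImage; `Keypoints` is the class sketched below.
def shift(self, x=0, y=0):
    """Shift all keypoints on this image by x/y pixels."""
    if isinstance(self.keypoints, Keypoints):
        # Fast path: one vectorized call for all keypoints at once.
        keypoints = self.keypoints.shift(x=x, y=y)
    else:
        # Slow path: the existing per-keypoint Python loop.
        keypoints = [kp.shift(x=x, y=y) for kp in self.keypoints]
    return KeypointsOnImage(keypoints, shape=self.shape)
```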
and `Keypoints` might have a `shift` method that looks like this:
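```python
import numpy as np
from imgaug.augmentables.kps import Keypoint

class Keypoints(object):
    """Hypothetical vectorized container: all coordinates in one array."""

    def __init__(self, xy):
        self.xy = np.asarray(xy, dtype=np.float32)  # (N, 2) array of x, y

    def __getitem__(self, index):
        # Returning single-item Keypoint objects keeps existing logic working.
        x, y = self.xy[index]
        return Keypoint(x=x, y=y)

    def shift(self, x=0, y=0):
        # A single numpy addition shifts every keypoint at once.
        return Keypoints(self.xy + np.array([x, y], dtype=np.float32))
```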
Implementing it this way would allow for a gradual shift to the new vectorized style and provide immediate speedups.