How can we optimize the ring buffer performance ? #123
Comments
Hi! I am curious how you measured these statistics. Not because I don't believe them; I'm certain there are still ways to optimise ringbuffer. Did you use our own benchmark and extrapolate that to a number of cycles, or did you measure it yourself? If so, what compiler flags did you use? Looking at dpdk, their rings are even atomic (i.e. they work in multithreaded contexts). Although I'm sure they optimised that very well, I'm surprised that it's faster (especially when congested) than our few bit operations, a single branch (of which they seem to have plenty as well) and a write. Though maybe I'm looking at the wrong code. Which code did you benchmark? About bulk insert APIs: possibly! We're developing ringbuffer extremely part time, so it's something I'd love to do if I find some time, but it could take another month or so and I can't promise much. I made an issue for it: #124
I measured the statistics by simply reading the time register value before and after the operations. Also, both rings were compared in the same environment, so I doubt anything is wrong there.
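For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the "read a timer before and after" approach. It is not the code used in this issue: `std::time::Instant` stands in for the raw time register, `VecDeque` stands in for the ring under test, and `bench_ns_per_op` is a hypothetical helper name. Averaging over many iterations and wrapping values in `std::hint::black_box` helps keep the optimiser from removing the work being timed.

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::time::Instant;

/// Returns the average nanoseconds per enqueue and per dequeue.
/// Assumption: `VecDeque` is only a stand-in for the ring being measured.
pub fn bench_ns_per_op(iters: u32) -> (f64, f64) {
    let mut ring: VecDeque<u64> = VecDeque::with_capacity(iters as usize);

    // Time all enqueues together so the timer overhead is amortised.
    let start = Instant::now();
    for i in 0..iters {
        ring.push_back(black_box(u64::from(i)));
    }
    let enqueue = start.elapsed();

    // Time all dequeues the same way.
    let start = Instant::now();
    for _ in 0..iters {
        black_box(ring.pop_front());
    }
    let dequeue = start.elapsed();

    let n = f64::from(iters);
    (enqueue.as_nanos() as f64 / n, dequeue.as_nanos() as f64 / n)
}

fn main() {
    let (enq, deq) = bench_ns_per_op(1_000_000);
    println!("enqueue: {enq:.2} ns/op, dequeue: {deq:.2} ns/op");
}
```

Numbers from a sketch like this are only comparable between two rings when both are built with the same optimisation settings and timed with the same loop.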
I see. Well, the first thing you can try is to compile with
Yes, I compiled with the --release flag for benchmarking. Apart from this, I did not use any optimization flags on the Rust side. The optimization settings are mostly the defaults.
--release optimises at optimisation level 3, so that's good. Another option you can try is enabling LTO (link-time optimisation).
OK, sure. I will try that. Also, is there any performance optimization available on the ring side, e.g. some ring configuration that could be used?
There isn't any configuration you can pass to a ringbuffer to make it faster. ConstGenericRingBuffer is a bit faster than AllocRingBuffer, but whether that's useful to you depends on your use case. One more thing to know is that RingBuffer stores full elements, not references to elements. If your elements are large, they're copied into and out of the RingBuffer. You could have a ringbuffer of 'static references (or whichever lifetime is available to you) and your performance may change.
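To make the by-value point concrete, here is a small sketch comparing the size of an element stored by value with the size of a `'static` reference to it. The `Packet` type, its 2 KiB payload, and the `element_size` helper are hypothetical, and `VecDeque` is used only as a stand-in for a ring buffer.

```rust
use std::collections::VecDeque;
use std::mem::size_of;

/// Hypothetical large element; the 2 KiB payload is an arbitrary choice.
pub struct Packet {
    pub payload: [u8; 2048],
}

/// A packet with static lifetime, so `&PACKET` is a `&'static Packet`.
static PACKET: Packet = Packet { payload: [0; 2048] };

/// Bytes a container has to move per element of type `T`.
pub fn element_size<T>() -> usize {
    size_of::<T>()
}

fn main() {
    // By value: each push may copy the whole struct into the buffer.
    let mut by_value: VecDeque<Packet> = VecDeque::with_capacity(4);
    by_value.push_back(Packet { payload: [0; 2048] });

    // By 'static reference: each push moves only a pointer.
    let mut by_ref: VecDeque<&'static Packet> = VecDeque::with_capacity(4);
    by_ref.push_back(&PACKET);

    println!("by value: {} bytes per element", element_size::<Packet>());
    println!(
        "by reference: {} bytes per element",
        element_size::<&'static Packet>()
    );
    println!("stored {} and {} elements", by_value.len(), by_ref.len());
}
```

Whether the smaller element actually shows up as fewer cycles depends on the workload, so it is worth measuring both variants.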
OK, sure. I will check whether ConstGenericRingBuffer could be useful. Thanks a lot for the info :) Also, I wanted to keep static references in the ring buffer, but since the ring needs to have ownership of the objects it contains (it's a requirement on my side), I don't think I can do much on that part. However, I would be open to hearing whether there is some way to transfer ownership of the objects to the ring while avoiding copying the complete data chunk back and forth.
How do you do that in your C version? If I read it correctly, it also mainly stores pointers, doesn't it?
With Box you can allocate first (which is expensive) and then pass only references around. If you're passing the same references around over and over again, that may be worth it. You can also allocate elsewhere: not on the heap, but perhaps in an arena/bump allocator.
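As a rough illustration of the `Box` suggestion: moving a `Box<T>` into and out of a container transfers ownership but moves only the pointer, so the heap data stays where it was allocated. In this sketch `Packet` and `round_trip` are hypothetical names, and `VecDeque` again stands in for the ring buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical large element; the 2 KiB payload is an arbitrary choice.
pub struct Packet {
    pub payload: [u8; 2048],
}

/// Pushes a boxed packet through a queue and returns the heap address of
/// the payload before the push and after the pop.
pub fn round_trip() -> (usize, usize) {
    // Allocate once, up front (the expensive part).
    let boxed = Box::new(Packet { payload: [7; 2048] });
    let before = &*boxed as *const Packet as usize;

    let mut ring: VecDeque<Box<Packet>> = VecDeque::with_capacity(4);
    // Ownership moves into the queue; only the pointer is copied.
    ring.push_back(boxed);

    // Ownership moves back out to the caller.
    let out = ring.pop_front().expect("one element was pushed");
    assert_eq!(out.payload[0], 7);
    let after = &*out as *const Packet as usize;

    (before, after)
}

fn main() {
    let (before, after) = round_trip();
    assert_eq!(before, after);
    println!("payload stayed at {before:#x}");
}
```

The container owns each element while it is queued, which fits a requirement that the ring own its objects, at the cost of one allocation per element unless boxes are reused.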
If your references are really static, ringbuffer doesn't need ownership. It just depends on the generic type you're using. If you make a
Yes, the C version also stores pointers, but my application is a bit different: in Rust I have wrapped objects around those pointers that are being passed around.
No worries!
Hi everyone,
I have been trying to compare the performance of dpdk rings to AllocRingBuffer and have noticed that the results are significantly different.
Here are some stats:
| Buffer | No. of cycles to enqueue | No. of cycles to dequeue |
|---|---|---|
| dpdk ring | 4 | 6 |
| rust ring | 41 | 68 |
So, as you can see, there is a difference of approximately 10 times, which makes the rust ring significantly slower. Hence, I wanted to know whether there is any way to optimize the ring buffer performance further, e.g. by using some flags.
Also, one difference between AllocRingBuffer and the dpdk ring is that AllocRingBuffer does not have any bulk enqueue or dequeue APIs. Is there any plan to implement such APIs?