
How can we optimize the ring buffer performance? #123

Closed
codehobbyist06 opened this issue Sep 15, 2023 · 13 comments

Comments

@codehobbyist06

Hi everyone,
I have been comparing the performance of DPDK rings with AllocRingBuffer and noticed that the results differ significantly.

Following are some stats:
| Buffer | Cycles to enqueue | Cycles to dequeue |
|---|---|---|
| DPDK ring | 4 | 6 |
| Rust ring | 41 | 68 |

As you can see, there is a difference of roughly 10x, which makes the Rust ring significantly slower. Is there any way to optimize the ring buffer performance further, for example with some flags?
Another difference between AllocRingBuffer and the DPDK ring is that AllocRingBuffer has no bulk enqueue or dequeue APIs. Are there any plans to implement such APIs?

@jdonszelmann
Collaborator

jdonszelmann commented Sep 15, 2023

Hi!

I am curious how you measured these statistics. Not at all because I don't believe them; I'm certain there are still ways to optimise ringbuffer. Did you use our own benchmark and extrapolate that to a number of cycles, or did you measure it yourself? If the latter, what compiler flags did you use?

Looking at DPDK, their rings are even atomic (i.e. they work in multithreaded contexts). Although I'm sure they optimised that very well, I'm surprised that it's faster (especially under contention) than our few bit operations, a single branch (DPDK's code seems to have plenty of branches as well) and a write. Maybe I'm looking at the wrong code, though. Which code did you benchmark?
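For reference, the "few bit operations, a single branch and a write" can be illustrated with a minimal sketch. This is not the ringbuffer crate's actual code: MaskRing and its methods are hypothetical names, and the sketch assumes a power-of-two capacity so that wrapping an index is a bitwise AND rather than a modulo.

```rust
/// Hypothetical single-threaded ring buffer (not the ringbuffer crate's code).
/// Capacity is a power of two, so wrapping an index is `index & mask`.
pub struct MaskRing<T> {
    buf: Vec<Option<T>>,
    mask: usize,  // capacity - 1
    read: usize,  // ever-increasing read counter (wraps on overflow)
    write: usize, // ever-increasing write counter (wraps on overflow)
}

impl<T> MaskRing<T> {
    pub fn with_capacity_pow2(cap: usize) -> Self {
        assert!(cap.is_power_of_two(), "capacity must be a power of two");
        let mut buf = Vec::with_capacity(cap);
        buf.resize_with(cap, || None);
        MaskRing { buf, mask: cap - 1, read: 0, write: 0 }
    }

    /// Pushes a value, overwriting the oldest element when the ring is full.
    pub fn push(&mut self, value: T) {
        // The single branch: when full, advance `read` to drop the oldest element.
        if self.write.wrapping_sub(self.read) == self.buf.len() {
            self.read = self.read.wrapping_add(1);
        }
        // A bit mask to find the slot, then one write (the old value is dropped).
        self.buf[self.write & self.mask] = Some(value);
        self.write = self.write.wrapping_add(1);
    }

    /// Removes and returns the oldest element, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.read == self.write {
            return None;
        }
        let value = self.buf[self.read & self.mask].take();
        self.read = self.read.wrapping_add(1);
        value
    }

    pub fn len(&self) -> usize {
        self.write.wrapping_sub(self.read)
    }
}
```

A tuned implementation would probably avoid the Option<T> wrapper (for example with MaybeUninit and unsafe code); the sketch keeps it to stay in safe Rust.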

About bulk insert APIs: possibly! We're developing ringbuffer extremely part-time, so it's something I'd love to do if I find some time, but it could take another month or so and I can't promise much. I made an issue for it: #124
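Until such APIs exist, a bulk enqueue/dequeue can be sketched roughly as follows. BulkRing, enqueue_bulk and dequeue_bulk are hypothetical names, not part of the ringbuffer crate. The sketch assumes Copy elements so that each call moves data with at most two contiguous slice copies (before and after the wrap-around point), and it transfers as many items as fit instead of failing when the ring is nearly full.

```rust
/// Hypothetical bounded ring with bulk operations (not a ringbuffer crate API).
pub struct BulkRing<T: Copy + Default> {
    buf: Vec<T>,
    head: usize, // index of the oldest element
    len: usize,  // number of stored elements
}

impl<T: Copy + Default> BulkRing<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        BulkRing { buf: vec![T::default(); capacity], head: 0, len: 0 }
    }

    /// Enqueues as many of `items` as fit; returns how many were written.
    pub fn enqueue_bulk(&mut self, items: &[T]) -> usize {
        let cap = self.buf.len();
        let n = items.len().min(cap - self.len);
        let tail = (self.head + self.len) % cap;
        // First chunk: from `tail` up to the end of the backing storage.
        let first = n.min(cap - tail);
        self.buf[tail..tail + first].copy_from_slice(&items[..first]);
        // Second chunk (empty unless we wrapped): from the start of the storage.
        self.buf[..n - first].copy_from_slice(&items[first..n]);
        self.len += n;
        n
    }

    /// Dequeues up to `out.len()` items into `out`; returns how many were read.
    pub fn dequeue_bulk(&mut self, out: &mut [T]) -> usize {
        let cap = self.buf.len();
        let n = out.len().min(self.len);
        let first = n.min(cap - self.head);
        out[..first].copy_from_slice(&self.buf[self.head..self.head + first]);
        out[first..n].copy_from_slice(&self.buf[..n - first]);
        self.head = (self.head + n) % cap;
        self.len -= n;
        n
    }
}
```

Whether this beats calling push in a loop depends on the element size and on how well the compiler already optimizes that loop, so it is worth benchmarking both.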

@codehobbyist06
Author

I measured the statistics by simply reading the time register before and after the operations. Both rings were compared in the same environment, so I doubt anything is wrong there.
Regarding flags: I'm relatively new to Rust, so I haven't tried any optimization flags yet. I'd like to know whether there are any flags I can use for better performance.
For the benchmark, the enqueue and dequeue operations run in different DPDK threads, so that both rings are tested under the same conditions.
I would really appreciate your input on how the ring performance could be optimized.
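Timing a single enqueue with a timer read on each side tends to be dominated by the cost of reading the timer itself, so a common approach is to time many operations and divide. The sketch below swaps in wall-clock nanoseconds from std::time::Instant instead of a cycle counter, uses std::collections::VecDeque as a stand-in for the ring under test, and ns_per_push_pop is a hypothetical helper name. It also runs on one thread, so it excludes the cross-core traffic present when enqueue and dequeue run on different threads.

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: average nanoseconds per push+pop pair over `iters`
/// iterations, amortizing the timer overhead over many operations.
pub fn ns_per_push_pop(iters: u64) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    // VecDeque stands in for the ring under test; swap in your own ring type.
    let mut ring: VecDeque<u64> = VecDeque::with_capacity(1024);
    let start = Instant::now();
    for i in 0..iters {
        // black_box keeps the optimizer from deleting or folding the work.
        ring.push_back(black_box(i));
        let _ = black_box(ring.pop_front());
    }
    let elapsed = start.elapsed();
    elapsed.as_nanos() as f64 / iters as f64
}
```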

@jdonszelmann
Collaborator

jdonszelmann commented Sep 15, 2023

I see. The first thing you can try is compiling with --release if you haven't done that yet. That might make a large difference, though what further advice I can give depends a bit on what you've already tried.

@codehobbyist06
Author

Yes, I compiled with --release for the benchmark. Apart from that, I didn't use any optimization flags on the Rust side; the optimization settings are the defaults.

@jdonszelmann
Collaborator

--release will optimise with optimisation level 3, so that's good. Another option you can try is enabling LTO (link-time optimisation).

@codehobbyist06
Copy link
Author

OK, sure, I will try that. Also, are there any performance optimizations available on the ring side, for example some ring configurations that could be used?

@jdonszelmann
Collaborator

There is no configuration you can pass to a ringbuffer to make it faster. ConstGenericRingBuffer is a bit faster than AllocRingBuffer, but whether that's useful to you depends on your use case. One more thing to know is that RingBuffer stores full elements, not references to elements. If your elements are large, they're copied into and out of the RingBuffer. You could use a ringbuffer of 'static references (or whichever lifetime is available to you) instead, and your performance may change.
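One likely reason a const-generic ring can be faster is that its capacity is a compile-time constant and its storage is an inline array, so there is no heap indirection. The following is a hypothetical sketch of that idea (FixedRing is not ConstGenericRingBuffer's actual implementation); like a circular buffer, it overwrites the oldest element when full.

```rust
/// Hypothetical fixed-capacity ring whose capacity is a const generic.
/// Storage is an inline array: no heap allocation, and `CAP` is a constant
/// the optimizer can see when computing indices.
pub struct FixedRing<T: Copy + Default, const CAP: usize> {
    buf: [T; CAP],
    head: usize, // index of the oldest element
    len: usize,  // number of stored elements
}

impl<T: Copy + Default, const CAP: usize> FixedRing<T, CAP> {
    pub fn new() -> Self {
        assert!(CAP > 0, "capacity must be non-zero");
        FixedRing { buf: [T::default(); CAP], head: 0, len: 0 }
    }

    /// Pushes a value, overwriting the oldest element when full.
    pub fn push(&mut self, value: T) {
        let tail = (self.head + self.len) % CAP;
        self.buf[tail] = value;
        if self.len == CAP {
            // Full: the slot we just wrote was the oldest, so move head past it.
            self.head = (self.head + 1) % CAP;
        } else {
            self.len += 1;
        }
    }

    /// Removes and returns the oldest element, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        let value = self.buf[self.head];
        self.head = (self.head + 1) % CAP;
        self.len -= 1;
        Some(value)
    }
}
```

With a power-of-two CAP, the compiler can typically turn `% CAP` into a bit mask, which is one of the things a runtime-sized buffer cannot assume.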

@codehobbyist06
Author

OK, sure. I will check whether ConstGenericRingBuffer could be useful. Thanks a lot for the info :)

I also considered keeping static references in the ring buffer, but since the ring needs to own the objects it contains (a requirement on my side), I don't think I can do much there. However, I'd like to know if there is some way to transfer ownership of the objects to the ring while also avoiding copying the complete data chunk back and forth.

@jdonszelmann
Collaborator

How do you do that in your C version? If I saw it correctly, it also mainly stores pointers, right?

@jdonszelmann
Collaborator

With Box you can allocate first (which is expensive) and then pass only the pointer around, while the ring still owns the object. If you're passing the same objects around over and over again, that may be worth it. You can also allocate them somewhere other than with individual heap allocations, for example in an arena/bump allocator.
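To illustrate the Box approach: moving a Box<T> into or out of a ring copies only one pointer, while ownership of the value moves along with it. In this sketch, Packet and roundtrip_boxed are hypothetical, and std::collections::VecDeque stands in for AllocRingBuffer.

```rust
use std::collections::VecDeque;

/// Hypothetical large element, e.g. a packet with an inline payload.
pub struct Packet {
    pub len: usize,
    pub payload: [u8; 2048],
}

/// Moves boxed packets through a ring. Each push/pop copies only the Box
/// (a single pointer); the ring owns each packet while it is queued.
pub fn roundtrip_boxed(n: usize) -> usize {
    let mut ring: VecDeque<Box<Packet>> = VecDeque::with_capacity(n);
    for i in 0..n {
        // The expensive part: one heap allocation per packet, done up front.
        ring.push_back(Box::new(Packet { len: i, payload: [0; 2048] }));
    }
    let mut total = 0;
    while let Some(packet) = ring.pop_front() {
        // `packet` is owned here and freed at the end of this iteration.
        total += packet.len;
    }
    total
}
```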

@jdonszelmann
Collaborator

If your references really are static, the ringbuffer doesn't need ownership; it just depends on the generic type you're using. If you make a RingBuffer::<&'static T>, the RingBuffer doesn't need ownership of the objects, only references to them.
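A minimal sketch of a ring of &'static T, assuming the objects can live for the rest of the program. Msg and static_ring are hypothetical; VecDeque stands in for the ringbuffer crate, and Box::leak is used here to obtain 'static references, which means that memory is never freed.

```rust
use std::collections::VecDeque;

/// Hypothetical message type.
#[derive(Debug)]
pub struct Msg {
    pub id: u32,
}

/// Creates `n` messages that live for the rest of the program and returns a
/// ring of `&'static Msg`. The ring stores only references (one pointer
/// each) and does not own the messages.
pub fn static_ring(n: u32) -> VecDeque<&'static Msg> {
    let mut ring = VecDeque::with_capacity(n as usize);
    for id in 0..n {
        // Box::leak gives up ownership of the allocation in exchange for a
        // 'static reference; only do this for objects that must live forever.
        let msg: &'static Msg = Box::leak(Box::new(Msg { id }));
        ring.push_back(msg);
    }
    ring
}
```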

@codehobbyist06
Author

Yes, the C version also stores pointers, but my application is a bit different: in Rust, the pointers being passed around are wrapped in objects.
I guess I can consider just passing references around and keeping ownership of the objects in some separate collection.
Thanks a lot for your input.
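If the objects can live in a separate owning collection, the ring can hold borrowed references with that collection's lifetime instead of 'static. In this sketch, a slice plays the role of the owning collection, Frame and process_in_order are hypothetical, and VecDeque again stands in for the ring; the borrow checker ensures the pool outlives every reference stored in the ring.

```rust
use std::collections::VecDeque;

/// Hypothetical object that is expensive to copy.
pub struct Frame {
    pub seq: u64,
    pub data: Vec<u8>,
}

/// The pool owns every Frame; the ring only borrows them, so each push/pop
/// moves a single reference instead of the whole Frame.
pub fn process_in_order<'a>(pool: &'a [Frame]) -> Vec<u64> {
    let mut ring: VecDeque<&'a Frame> = VecDeque::with_capacity(pool.len());
    for frame in pool {
        ring.push_back(frame);
    }
    let mut seen = Vec::with_capacity(pool.len());
    while let Some(frame) = ring.pop_front() {
        seen.push(frame.seq);
    }
    seen
}
```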

@jdonszelmann
Collaborator

No worries!
