OpenOnload on non-Solarflare adapter #28
Hi. Yes, Onload has an AF_XDP backend, which makes it possible to run it on any NIC. AF_XDP support comes in 3 flavours, in increasing performance: generic, in-driver, and zero-copy. Last time I looked (admittedly a while ago) the ena driver supported neither in-driver nor zero-copy, so you're stuck with the slowest option, provided generically by Linux. Onload should still be faster than native sockets, though.
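For reference, the three attach modes map onto iproute2's XDP flags; a rough illustration (prog.o is a placeholder XDP object file, and the NIC/driver must actually support the chosen mode):

```sh
ip link set dev eth0 xdpgeneric obj prog.o sec xdp   # generic (skb) mode, works on any driver
ip link set dev eth0 xdpdrv     obj prog.o sec xdp   # native, in-driver mode
ip link set dev eth0 xdp off                         # detach again
```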
I tried for a couple of hours today to get the setup working. I noticed that the default ENA driver provided by Amazon was a bit out of date. After updating it, compiling from this repository, and running it, I had onload_cp_server running. Adding my eth0 yielded the following issue in dmesg. I was under the impression that after updating both the kernel (Red Hat) and the ENA driver I should have XDP support. Also, I'm a bit unsure how to check whether my current setup supports XDP (I tried grepping for it). Any idea why it looks like my eth0 isn't supporting XDP? Any help is appreciated; I'm a bit stuck at the moment. Thanks in advance.
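One common way to check kernel-side XDP/AF_XDP support (not necessarily what was being grepped for above) is to look at the running kernel's config and at the interface itself:

```sh
grep -E 'CONFIG_BPF=|CONFIG_XDP_SOCKETS=' /boot/config-$(uname -r)   # AF_XDP needs CONFIG_XDP_SOCKETS
ip link show dev eth0                                                # lists any attached XDP program
```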
I'm sorry, it looks like I gave you bad info: re-checking the code, it looks like we do currently need in-driver support for XDP. As I see it you have a few options:
OK, perhaps I ought to stop talking, since every time I (figuratively) open my mouth I give you bad advice, but I'm going to try again: the current ena driver (in Linus's tree) does have in-driver support for XDP, so it should work with Onload. The specific error logging you gave (
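A quick way to confirm which ena driver build is actually loaded on the instance (eth0 is a placeholder interface name):

```sh
ethtool -i eth0                  # driver name, version and firmware in use
modinfo ena | grep -i version    # version of the ena module on disk
```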
Hi Richard, first of all thanks for the detailed response! Really appreciate you reaching out on this GH issue! Lots of information already for me to try out. Bit of background info on the previously mentioned points: for the ENA driver I'm currently running version 2.5.0g, which should have support for XDP. So I'm either doing something incorrect when compiling and configuring the newer ENA driver, or something went south with that specific version. I'll recheck on the Amazon driver repository for any directions. One last thing I noticed is the following when starting up OpenOnload and loading the drivers into the kernel:
The process appears to be running fine; I'm just not sure whether this
PS: in a different issue on the Amazon driver repository (amzn/amzn-drivers#173) I read the following:
So even though XDP is supported by the ENA driver, it doesn't support AF_XDP yet? That might be the reason why it's currently failing.
@hjastenger Fwiw (hi from amzn/amzn-drivers#173), the XDP-related APIs should work on EC2 assuming you have recent kernel and ENA versions (I have tested a 5.11 kernel recently with ENA 2.5.0). It's just that it's not particularly fast at this time due to the issue you quoted.
Hi @eugeneia, thanks for the info. I'm currently using RHEL with kernel 4.18, the latest kernel from RH. I've tried upgrading the kernel to 5.12.9 using something like ELRepo, but this results in not being able to build Onload properly. Any idea?
We really ought to find some way of hiding those messages. They confuse everybody and they're absolutely harmless. The
Onload builds and works with linux-5.12 if CONFIG_VDPA is not set. I've filed an internal bug for the CONFIG_VDPA issue. It usually takes a few weeks to fix such an issue. In the best case we'll get a fix next Tuesday,
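For anyone hitting the same build failure, a minimal sketch of turning CONFIG_VDPA off before building a 5.12 kernel (run in the kernel source tree):

```sh
scripts/config --disable VDPA    # clears CONFIG_VDPA in .config
make olddefconfig                # re-resolve any dependent options
```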
Thanks for the replies all! Super helpful. I reverted my kernel upgrade back to 4.18 and swapped the ENA out for an Intel NIC (Intel(R) 10 Gigabit Virtual Function Network Driver, ixgbevf). Adding my eth0 interface doesn't yield me the
I hope the hardware init failed because I'm currently not using one of the Solarflare NICs? Running something like
Have you typed
You are referring to the step 'adding your interface' with
Do you see
So I guess you only add
Cool. Is the interface up?
Yes, the interface is up
I'm still seeing a lot of context switches with
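One way to watch per-process context switches, in case it helps reproduce this (pidstat comes from the sysstat package; udpserver is a placeholder process name):

```sh
pidstat -w -p $(pidof udpserver) 1    # cswch/s and nvcswch/s, sampled every second
```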
The canonical way to determine whether Onload is functioning properly is with the
Note that
I was only running it with ping to verify that Onload was running, by checking the banner message. I read somewhere on the Onload forum that you could easily check that Onload was functioning by looking for the success info banner.
I have been trying out the code used by Cloudflare in one of their blog posts (https://blog.cloudflare.com/how-to-receive-a-million-packets), but I'll try to validate this with the
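For what it's worth, a sketch of how that blog post's receiver might be run under Onload (the binary name and port are placeholders from that repository, not something verified here):

```sh
onload ./udpreceiver1 0.0.0.0:4321
```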
This thread is really helpful, so thanks! Just a small question: where can I find
No, I meant
Thank you @rhughes-xilinx @hjastenger
Good day, gentlemen!
Does somebody know the reason why it could appear?
Hello again, following this thread. When I am running it, I am getting:
Am I missing anything?
Is the question “I definitely think I'm not running this command under Onload”, “I don't understand what this is saying” or “how can I tell whether I am or not?”? My first guess would be that somewhere way up your terminal you ran
Actually, I was trying to verify that the installation works, but I made a mistake and was trying to install it from
But when trying to build it from source, the build fails with this error:
Still trying to figure out why it fails to build. Any ideas?
Any ideas how I can download that Amazon kernel? NB: Onload is tested with Ubuntu's
Doesn't fully answer your question, as I am not sure 5.4 is available now, but you can download the kernel source from within a running instance:
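The exact command was not captured above; as a guess, on Amazon Linux 2 the kernel source package can be fetched from inside the instance roughly like this (requires the yum-utils package):

```sh
yumdownloader --source kernel
```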
@hjastenger have you had any success with this? Are you able to share an update of the current state of things and a rough guide on getting this up and running?
Ok, so I managed to compile, install and enable the latest code (commit
Onload is then seemingly accelerating applications. However, when inspecting
@rhughes-xilinx @hjastenger @eugeneia
PS:
Please share
@ol-alexandra please see below. There are no accelerated sockets... Comparatively, running the same application (just a udpsender) on a machine with an X2522 card shows an accelerated socket, and u_polls is increasing as expected.
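For context, one way to inspect those counters while the application is running is Onload's onload_stackdump utility (assuming it is installed and on PATH):

```sh
onload_stackdump lots | grep -E 'polls'   # shows u_polls / k_polls per stack
```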
@ol-alexandra OK, I managed to find something more. When I run the application with Onload, I can see the following on
@abower-xilinx @maciejj-xilinx I do not see n-tuple filtering being mentioned in https://github.com/Xilinx-CNS/onload/blob/master/README.md
@ol-alexandra thanks a lot for clarifying. Indeed, the driver doesn't seem to support n-tuple filtering, as per https://lore.kernel.org/lkml/1486646499-13682-2-git-send-email-netanel@annapurnalabs.com/
In principle, n-tuple support would not be required if the interface was configured with a single channel.
Currently, Onload strictly requires n-tuple support. However, it would be a pretty simple change to allow it to run without filters when a single channel is set on the NIC. Would this be your use case?
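Setting a single channel on the NIC is an ethtool operation; a sketch (eth0 is a placeholder interface name):

```sh
ethtool -L eth0 combined 1    # pin the NIC to one combined RX/TX channel
ethtool -l eth0               # verify current and maximum channel counts
```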
@maciejj-xilinx it's definitely worth a shot. Our goal is to optimize the EC2 network stack on top of ENA adapters by using kernel bypass. It's very difficult to assess how big of an improvement we'd be looking at without testing first. If this is an easy fix, I'm happy to test and report back.
@maciejj-xilinx is there any chance of dropping the hard n-tuple requirement when the interface is configured with a single channel?
Hi aneagoe, thanks for getting in touch. I have submitted a change that allows disabling the use of ntuple filters. Disabling the ntuple filter can be achieved with
We have not tested the change specifically with ENA, but we are hopeful that this should get you further. Hope that helps, and let us know whether this works for you. Maciej
@maciejj-xilinx thanks a lot for this, I'll test with it shortly.
That definitely did the trick, and now I see u_polls being ~10x k_polls, so acceleration definitely works. However, it seems to be slower than without Onload by ~1us (just observational testing with udpserver/udpclient and isolcpus/pinning). In both cases (with and without Onload) I've used the same channel configuration (1).
We have not characterized Onload on ENA specifically; however, there are some general tuning guidelines to optimize Onload for latency. On the Onload side, the most important latency-wise tunables are: EF_POLL_USEC=100000
From our observations, with Onload over AF_XDP latency was better than with the kernel stack, though this will depend largely on the internals of the particular network driver and its settings, such as interrupt moderation. In general we saw really good results with applications such as Redis or an Nginx proxy; with these the focus is throughput, or latency under load. Can I ask what sort of application you are working with?
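As an illustration of the tunable named above, launching an application with spinning enabled might look like this (udpserver is a placeholder binary; the bundled latency profile is an alternative that sets similar spinning options):

```sh
EF_POLL_USEC=100000 onload ./udpserver
onload --profile=latency ./udpserver
```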
The only testing that I've done (where I observed the difference in performance) was using udpclient/udpserver from here: https://github.com/majek/dump/tree/master/how-to-receive-a-packet. I haven't tested with
Both on AWS and on the physical servers, I've used isolcpus and kernel parameter optimizations.
@aneagoe I have been able to run Onload-on-AF_XDP on Mellanox NICs (ConnectX-4 Lx and beyond, with mlx5_core and AF_XDP ZC support) and Intel NICs (ixgbe/i40e) in OpenStack and on bare metal (Azure in progress), and the numbers are amazing for Redis and Memcached. Redis: 2X vis-a-vis the kernel.
One major issue is that Onload-on-AF_XDP by default does not use all the hidden queues which are mapped to Onload, like it does for Onload-on-ef_vi on Solarflare NICs. You can check that by:
and you will see just one queue with all packets being bounced into Onload. Auto-sensing of multi-threaded apps is a problem in AF_XDP by design. Someone from the Onload team, please correct me if I am wrong.
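A guess at the kind of check meant above: per-queue packet counters from ethtool, where the spread (or lack of it) across queues is visible; exact statistic names differ between drivers:

```sh
ethtool -S eth0 | grep -iE 'queue_[0-9]+.*packets'
```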
I'm trying the latest Onload with ENA, and this doesn't seem to be working for me.
queue_0_rx_xdp_redirect is growing when I run some traffic, and queue_0_rx_xdp_pass is growing as well. Does this imply that it works? dmesg still shows FIXME AF_XDP
My measurements with and without Onload are the same, so I strongly believe Onload is not working in my case. How can I troubleshoot this?
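The counters quoted above come from the driver's ethtool statistics; watching them while traffic flows is one way to see whether packets are actually hitting the XDP program:

```sh
watch -d "ethtool -S eth0 | grep xdp"
```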
Ignore those FIXME_AF_XDP messages. You need to see whether the XDP_REDIRECT clause is implemented in the ENA driver or not. This is the same problem I see in Azure. Onload works on all drivers where the XDP_REDIRECT clause is implemented, like Intel (ixgbe/i40e) and Mellanox (mlx5_core). You can try AWS EC2 instances which give PCI passthrough to Intel 82599 NICs.
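One way to check whether a given driver implements XDP_REDIRECT is simply to search its source in a kernel tree matching the running version (the path below is the mainline location of the ENA driver):

```sh
grep -rn XDP_REDIRECT drivers/net/ethernet/amazon/ena/
```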
EC2 with an Intel NIC is not working either.
And it's not just a log line to ignore; Onload reports the same error.
I would still prefer to sort out the issue with ENA. These Intel NIC instances don't look good for our needs.
Please use this setup in AWS with Ubuntu 21.04 and report. Maybe I can help from there.
Hi, there are three commands that give error messages
@Ventus5566 yes, it's ixgbevf and not ixgbe directly! You will need PCI passthrough access to Intel NICs at a minimum, which means ixgbe is required. I am not sure about ixgbevf. @rhughes-xilinx once said that ixgbevf needs patches. I have tested on:
Hi all. This post is a bit stale. I've run into a few issues with setting up Onload with ENA on AWS. I've created a new issue here. Any help would be much appreciated.
I feel like an idiot for asking this here, but I could not find a conclusive answer anywhere. Is it possible to run OpenOnload on non-Solarflare / Xilinx hardware? I.e., if I set up my hosting on AWS, is it possible to configure OpenOnload to work with the ENA driver that automatically comes with that machine?