Performance impact? #52
I've now tested this again, and the performance impact is indeed huge. On a gigabit LAN, running the passthru example slows the network speed down to about 30%! I'm losing almost 70% of the network speed on a fast machine (Intel i7). The other network packet filter, WinpkFilter, does not show this issue. It lets traffic pass at full speed with a fraction of the CPU load WinDivert uses... Is WinDivert really that slow?
Firstly, which version of WinDivert did you use? Some of the older versions have performance problems that have since been fixed. These are my test results for WinDivert 1.2.0-rc:
200 Mbps (about 25 MB/s) is less than your 90 MB/s, and I have not yet tested anything higher. There is a performance hit for outbound traffic (206 vs. 178 Mbps, roughly 14%). This is something I was aware of, but I have never found the exact cause. A possible culprit is checksum recalculation, which may also explain some of the CPU usage. Unfortunately, correct checksums are a requirement of the underlying WFP (Windows Filtering Platform), as far as I can tell. WinpkFilter is a lower-level NDIS intermediate driver and probably does not need checksum recalculation for a passthru-type example. The other thing is that WinDivert has always been a trade-off of convenience versus performance. For best performance, you are better off implementing a specialized filtering driver for your application.
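To give a sense of the per-packet work involved in checksum recalculation, here is a minimal sketch of the RFC 1071 Internet checksum, the one's-complement sum used by IPv4, TCP, and UDP headers. It is an illustration only, not WinDivert's own code (WinDivert provides `WinDivertHelperCalcChecksums` for this), and the name `inet_checksum` is hypothetical. Because every byte of the covered region is summed, recomputing a TCP/UDP checksum touches the whole payload, which is one plausible reason it shows up in CPU usage.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes in network byte order.
 * Returns the checksum as a host-order 16-bit value; store it big-endian.
 * Illustrative sketch only, not WinDivert's implementation. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add up 16-bit big-endian words. A 32-bit accumulator cannot overflow
     * for buffers up to 64 KiB (at most 32768 words of 0xFFFF). */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as if padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold carries back into the low 16 bits (one's-complement addition). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Applied to the 20-byte IPv4 header `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` (checksum field zeroed), this returns `0xB861`; recomputing over the same header with `b8 61` filled in returns 0, which is how a receiver validates it.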
I'm using WinDivert v1.18. It's a pity that it slows down gigabit networks, because otherwise it seems really great! Maybe you can test it on a gigabit LAN to see for yourself.
Did you ever find anything to improve the performance? I'm currently experiencing a similar performance drop and high CPU usage. In my case, my download speed drops from ~6 MB/s to 4.5 MB/s at 15% CPU usage (this probably depends on the CPU). The application spends most of its time in the WinDivertSendEx function. I have already increased both available parameters (WINDIVERT_PARAM_QUEUE_LEN and WINDIVERT_PARAM_QUEUE_TIME), but I'm afraid that does not make much difference.
Hi! Sorry, but no. This problem seems to be by design, and the developers don't seem to be interested in fixing it. Regards!
Hi @FCrane and @basil00,
Please comment on my scheme: would it work?
Generally, you want to divert as little traffic as possible to get the job done. Diverting only SYN packets is a good approach and should have minimal impact, although this will not affect established TCP connections. For UDP, which does not have a SYN equivalent, you'd be stuck with either diverting everything or implementing something more complex (e.g., updating the filter string to whitelist established UDP flows).
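A minimal sketch of the "update the filter string" idea, assuming the application keeps its own list of UDP flows it has already handled. The fields used (`outbound`, `tcp.Syn`, `udp`, `ip.DstAddr`, `udp.DstPort`) are from WinDivert's filter language; the `udp_flow` type and `build_filter` function are hypothetical, and the sketch only covers outbound traffic. Since the filter is given when the handle is opened, applying a new string presumably means opening a fresh handle with `WinDivertOpen`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of an outbound UDP flow that the application has
 * already handled and no longer needs to divert. */
typedef struct {
    const char *dst_addr; /* IPv4 address literal, e.g. "10.0.0.5" */
    unsigned dst_port;
} udp_flow;

/* Builds a filter that diverts outbound TCP SYNs plus any outbound UDP
 * packet not belonging to a whitelisted flow.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_filter(char *buf, size_t cap, const udp_flow *flows, size_t n)
{
    int w = snprintf(buf, cap, "outbound and (tcp.Syn == 1 or (udp");
    if (w < 0 || (size_t)w >= cap)
        return -1;

    /* Exclude each already-established UDP flow. */
    for (size_t i = 0; i < n; i++) {
        size_t used = strlen(buf);
        w = snprintf(buf + used, cap - used,
                     " and not (ip.DstAddr == %s and udp.DstPort == %u)",
                     flows[i].dst_addr, flows[i].dst_port);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
    }

    /* Close the "(udp" and "(tcp.Syn" groups. */
    size_t used = strlen(buf);
    w = snprintf(buf + used, cap - used, "))");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return 0;
}
```

With one whitelisted flow `{"10.0.0.5", 53}` this produces `outbound and (tcp.Syn == 1 or (udp and not (ip.DstAddr == 10.0.0.5 and udp.DstPort == 53)))`. The string grows with every whitelisted flow, which is part of why this approach gets complex.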
Nothing can be done until #53 is fixed anyway.
@basil00,
For latency, use ping, and for throughput, just use any file transfer tool ( The performance impact of WinDivert is usually minimal unless you are attempting to divert megabytes per second of data through a user application. This is especially true for latency, where the lag introduced by the user application is usually insignificant compared to normal network lag. One danger for throughput is the WinDivert packet queue getting overwhelmed, resulting in packet loss.
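When comparing file-transfer measurements like the ones in this thread, it helps to normalize units first. A trivial sketch with hypothetical helper names; it assumes decimal units (1 MB = 10^6 bytes, 1 Mbps = 10^6 bit/s), which is what the "200 Mbps = ~25 MB/s" conversion earlier in the thread implies.

```c
/* Converts a measured transfer into megabits per second
 * (decimal units: 1 MB = 10^6 bytes, 1 Mbps = 10^6 bit/s). */
double throughput_mbps(double bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1e6;
}

/* Percentage of throughput lost with diversion enabled,
 * relative to the undiverted baseline (same units for both). */
double percent_drop(double baseline, double diverted)
{
    return (baseline - diverted) / baseline * 100.0;
}
```

With these definitions, 25 MB transferred in one second is 200 Mbps; 90 MB/s falling to 25 MB/s is a drop of about 72%, and 206 vs. 178 Mbps is a drop of about 13.6%.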
It would be helpful to know whether the performance hit stays constant for rate-limited transfers with varying degrees of rate limiting.
The latest WinDivert source code seems to be about 4x faster than the older 1.1.x and 1.2.x versions, at least in my quick-and-dirty testing. This might not be quite gigabit speed, but it is a lot closer. The better performance is mainly due to internal driver optimizations, such as avoiding packet copies (where possible) and instant injection.
Nice @basil00!
@Areithus In my experience, the CPU load is nearly all on the user side; for the diversion process, CPU usage is nil when using overlapped functions and tracking TCP packet flows. Also, given the context of the thread, I believe it's about throughput. This is exciting news. @basil00, I've had word that the EV cert was granted and is in the mail to someone I'm working with, so we should be able to sign shortly.
Yes, I meant 4x throughput, although it was a very rough test. I was testing at 1 Gbps, and version 1.2.0 choked at about 170 Mbps, whereas the new version managed 630 Mbps (still not perfect, but much better). But this is just one quick test.
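For reference, the rough ratio behind the "4x" figure, worked out from the numbers above on the 1 Gbps test link:

$$
\frac{630\ \text{Mbps}}{170\ \text{Mbps}} \approx 3.7,
\qquad
\frac{630\ \text{Mbps}}{1000\ \text{Mbps}} = 63\%\ \text{of line rate (vs. } 17\%\ \text{for 1.2.0)}
$$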
Let me know when you are ready and I can assist. My other sponsor signed version 1.3.0; it was a long and painful process, but we gained a lot of experience. From the project's perspective, there is no harm in having more than one sponsor :)
What was the CPU, and what was its load when you tested it? Did you test
Yes
Some benchmarks for
Notes:
Is there no MSVC build for WinDivert 1.3.0?
@kelvinomolumo, did you check the releases page?
Nice test, @basil00. I did some testing here as well (just a little, with reading) and can confirm that 1.3.0 is faster than 1.4.0. Not just in throughput: CPU usage is also a little lower (about 1-2% less).
No; try linking against the MinGW version.
Version 1.4.0 has a more complicated pipeline, so it is probably a bit slower as a result. The details are somewhat technical, but version 1.3.0 queues packets (by deep copying) at A more optimal design (in terms of performance) would be not to use deep copying for queueing packets at all, but rather to keep a reference to the original packet (
@basil00
That might be something to look into. I also remembered that there are other complications to consider. Specifically, while deep copying sounds slow, it also has the benefit of freeing up the original buffer. This means that
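The deep-copy side of this trade-off can be illustrated in user-mode terms with a hypothetical fixed-capacity queue that copies each packet on enqueue: the producer's buffer is free again as soon as `queue_push_copy` returns, at the cost of one allocation and copy per packet. This is only an analogy; WinDivert's driver works with kernel packet structures rather than `malloc`, and every name below is invented for illustration. The reference-based alternative would store the caller's pointer instead and keep that buffer pinned until the packet is dequeued.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_CAP 64 /* hypothetical fixed capacity */

/* Ring buffer of deep-copied packets. */
typedef struct {
    unsigned char *data[QUEUE_CAP];
    size_t len[QUEUE_CAP];
    size_t head, count;
} packet_queue;

void queue_init(packet_queue *q) { memset(q, 0, sizeof *q); }

/* Deep-copies pkt into the queue so the caller may reuse its buffer.
 * Returns 0 on success, -1 if the queue is full or allocation fails. */
int queue_push_copy(packet_queue *q, const unsigned char *pkt, size_t len)
{
    if (q->count == QUEUE_CAP)
        return -1;
    unsigned char *copy = malloc(len ? len : 1);
    if (!copy)
        return -1;
    memcpy(copy, pkt, len);
    size_t tail = (q->head + q->count) % QUEUE_CAP;
    q->data[tail] = copy;
    q->len[tail] = len;
    q->count++;
    return 0;
}

/* Pops the oldest packet; the caller owns the returned buffer and must
 * free() it. Returns NULL if the queue is empty. */
unsigned char *queue_pop(packet_queue *q, size_t *len)
{
    if (q->count == 0)
        return NULL;
    unsigned char *p = q->data[q->head];
    *len = q->len[q->head];
    q->data[q->head] = NULL;
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return p;
}
```

Modifying the original buffer after `queue_push_copy` does not affect the queued copy, which is exactly the property that lets the original be released early.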
The latest WinDivert-1.4-dev has reverted to deep copying rather than referencing packets. It appears this mode is actually slightly faster:
So there is no reason not to continue using this mode for the immediate future. I hope to release version 1.4 shortly.
@basil00, is WinDivertSend also back to how it worked in v1.3, with no error if injection fails?
Since version 1.3.0 the
I had evaluated WinDivert 2.0 performance as part of testing, so it is probably worthwhile to make some quick notes here. One problem was that I was unable to replicate the previous performance numbers for older versions of WinDivert. It is possible that WinDivert performance took a hit from the Meltdown mitigation, especially since my test box uses older hardware. I was also unable to replicate the top speeds for the unfiltered connection, which may be related or may have been a temporary network issue. Nevertheless, we can evaluate the relative performance of WinDivert 2.0, and it essentially matches 1.4.3 using the same parameters (i.e., the same thread count), which is in line with expectations. WinDivert 2.0 also introduces "batch mode" using the
This suggests that "batch mode" is the most important factor in terms of performance improvement in recent versions of WinDivert.
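Batch mode amortizes the per-call overhead of receiving and sending over many packets, but the consumer then has to split each batch itself. Assuming the packets of a batch are laid out back-to-back in one buffer, a hypothetical helper could walk the IPv4 packets using each header's Total Length field. WinDivert ships its own parsing helpers for this; the function below is an illustrative sketch, not part of the WinDivert API, and it ignores IPv6.

```c
#include <stddef.h>
#include <stdint.h>

/* Optional per-packet callback (hypothetical). */
typedef void (*packet_fn)(const uint8_t *pkt, size_t len, void *ctx);

/* Walks IPv4 packets stored back-to-back in buf, calling fn (if non-NULL)
 * for each one. Returns the number of packets, or -1 if the buffer is
 * truncated or a header is not IPv4. Packets preceding a malformed one
 * have already been passed to fn when -1 is returned. */
int walk_ipv4_batch(const uint8_t *buf, size_t len, packet_fn fn, void *ctx)
{
    int count = 0;
    while (len > 0) {
        /* Need at least a minimal 20-byte header with version 4. */
        if (len < 20 || (buf[0] >> 4) != 4)
            return -1;
        /* Total Length (header + payload) is bytes 2-3, big-endian. */
        size_t total = ((size_t)buf[2] << 8) | buf[3];
        if (total < 20 || total > len)
            return -1;
        if (fn)
            fn(buf, total, ctx);
        buf += total;
        len -= total;
        count++;
    }
    return count;
}
```

One receive call followed by a loop like this replaces many individual receive calls, which is where the savings in system-call overhead would come from.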
Hi basil, I am running into this performance issue now. We are focused on SMB performance (i.e., network shares). Without WinDivert, the file copy speed can reach 150 MB/s (the virtual network adapter is 10 Gbps). Although this was tested on virtual machines, I got similar results on physical machines with a 1 Gbps connection. Could you please shed some light on this? How can I debug this issue? Thanks
Just so you know, it's a known issue to some of us that SMB and other Windows services suffer degraded performance. However, I haven't retested using batching. Personally, I exempt such traffic, but that may not be an option for your use case.
Try disabling throttling in Windows: https://serverfault.com/questions/4409/windows-networking-performance-smb-cifs One of the answers shows which registry key to set. Please let us know your results.
Thanks, I will try it.
It seems DisableBandwidthThrottling does not have much impact on this behavior. My observation is that more threads lead to a lower, unstable speed, while one thread gives the best and most stable speed.
Also check your CPU usage, e.g., is one core running at 100%? Also, how big is each transfer? What is the latency between the source and destination?
If more threads do not help, then the user application is probably not the bottleneck.
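One reason the latency question matters: a single TCP connection limited by its receive window cannot exceed roughly window / RTT, because at most one window of data can be in flight per round trip. Any delay the diversion path adds to the RTT therefore lowers the ceiling for window-limited transfers. A back-of-the-envelope helper (hypothetical name; it ignores loss and protocol overhead, and the window passed in must already include any window scaling):

```c
/* Upper bound on one window-limited TCP flow, in Mbps (10^6 bit/s):
 * one full window delivered per round-trip time. */
double window_limited_mbps(double window_bytes, double rtt_ms)
{
    return window_bytes * 8.0 / (rtt_ms / 1000.0) / 1e6;
}
```

For example, a 64 KiB window at a 1 ms RTT caps out near 524 Mbps; if the RTT grew to 2 ms, the cap would halve to about 262 Mbps.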
Hi basil, I am now trying to understand WinDivert and find the root cause. Could you please share the PDBs (public or private) when you release drivers in the future? It would help a lot when others try to investigate issues. (Although I can build it myself, it would be convenient anyway.)
For clarity, you want the PDB files for performance profiling, yes?
Yes, for the current situation. All kinds of debugging and profiling tools need symbols.
The PDB files are very large (relative to the rest of the binaries) and most users don't need them, so they are not included. I could put them in a separate package, but I have never gotten around to it.
Thanks! That would be great.
If you are using multiple threads, then it is likely that packet reordering is affecting performance. Packet reordering is generally allowed, but it hurts performance and should be avoided where possible. The only reason reordering was allowed back in the day was to let different packets travel over different links to the same destination: one takes the scenic route with increased latency and arrives some time later. That also allowed for stateless comparison and for links randomly going offline. These days, however, many core routers send all packets belonging to the same flow down the same path, even when multiple links are load-balanced round-robin, precisely to avoid packets being reordered.

There are simple ways to keep packets in order, and one is hashing the destination address. A really simple hashing scheme is to take the least significant bit (for two threads) or the two least significant bits (for four threads), etc., and only allow one particular thread to handle one particular flow. This gives a fairly uniform load across multiple flows toward different destinations. There are other strategies, such as using the source and destination port numbers or XORing in the source IP address too, but you need to keep it as simple as possible to keep the overhead low. Bottom line: don't let packets get reordered for no reason.
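A minimal sketch of the flow-hashing idea, XORing the source and destination addresses and ports as suggested and then keeping the low bits. The function name and the power-of-two thread count are assumptions; with `nthreads` of 2 or 4, the final mask keeps the low one or two bits, as described above. How packets are handed to the chosen thread (for example, per-thread queues fed by one receive loop) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a packet to a worker thread so that every packet of one flow goes
 * to the same thread and therefore stays in order. IPv4 addresses are in
 * host byte order. nthreads must be a power of two for the mask to work. */
unsigned pick_worker(uint32_t src_addr, uint32_t dst_addr,
                     uint16_t src_port, uint16_t dst_port,
                     unsigned nthreads)
{
    assert(nthreads != 0 && (nthreads & (nthreads - 1)) == 0);

    /* XOR is symmetric, so both directions of a flow hash identically. */
    uint32_t h = src_addr ^ dst_addr ^ (uint32_t)(src_port ^ dst_port);

    /* Fold the high bits down so the low-bit mask depends on all of them. */
    h ^= h >> 16;
    h ^= h >> 8;

    return h & (nthreads - 1);
}
```

Because swapping source and destination leaves the hash unchanged, inbound and outbound packets of the same connection land on the same thread.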
@majibow Thanks for sharing that info here. After reading it, it seems quite logical to me. Great comment.
I guess you could get 8 threads easily with 8 WinDivert handles Unfortunately, one thread will be hit a lot more than the others, mostly Since WinDivert already supports the >, <, >=, and <= operators, you could simply do combinations of It would be nice to have bitwise operations in the filter language (&, |, ^, ~); at minimum, bitwise AND would be super useful, far more efficient, and more uniformly distributed.
Note: 3 = 0x3 = 0b00000011
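Until bitwise operators exist, one way to use the comparison operators for partitioning is to give each handle a contiguous port range. A hypothetical generator (not part of WinDivert) is sketched below. Which field to split on is an assumption: for outbound traffic, `tcp.SrcPort` is the local port, which varies per connection; passing `lo = 49152, hi = 65535` (the Windows default dynamic port range) may spread load better than the full range. Range partitions are only as uniform as the ports actually in use.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes the filter for partition `index` of `parts`, covering an equal
 * share of the port range [lo, hi] of `field` (e.g. "tcp.SrcPort").
 * Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int port_range_filter(char *buf, size_t cap, const char *field,
                      unsigned lo, unsigned hi,
                      unsigned parts, unsigned index)
{
    if (field == NULL || strlen(field) == 0)
        return -1;
    if (lo > hi || hi > 65535 || parts == 0 || index >= parts)
        return -1;

    unsigned long long span = (unsigned long long)hi - lo + 1;
    if (parts > span)
        return -1; /* some partitions would be empty */

    /* Integer split: partition i covers [lo + span*i/parts,
     * lo + span*(i+1)/parts - 1], so the ranges tile [lo, hi] exactly. */
    unsigned long long start = lo + span * index / parts;
    unsigned long long end = lo + span * (index + 1) / parts - 1;

    int w = snprintf(buf, cap, "%s >= %llu and %s <= %llu",
                     field, start, field, end);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    return 0;
}
```

Each generated string (for example `tcp.SrcPort >= 0 and tcp.SrcPort <= 32767` and `tcp.SrcPort >= 32768 and tcp.SrcPort <= 65535` for two handles) would be combined with the rest of the filter and passed to that thread's own `WinDivertOpen` call.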
Hi!
I'm testing the passthru example with "true" and "8" as parameters (I also tried 1), and it works fine. However, copying a file over the network, which usually runs at about 90 MB/s, slows down to 25 MB/s. The CPU load of the passthru program is between 20% and 25%.
Does WinDivert really slow down network traffic that much? Can this be improved? Other solutions, like WinpkFilter, have a much smaller impact (e.g., just 5% CPU load and only a 10% drop in transfer rate).
Thanks!