BSOD. KMODE_EXCEPTION_NOT_HANDLED in FwppInjectComplete #129
Driver version is 1.4
I compiled the WebFilter sample and ran it. To check the driver's speed I wrote a tool that makes parallel web requests (100 threads, each request uses a new port) and measures the average response time. The system crashed (KMODE_EXCEPTION_NOT_HANDLED, EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s).
I started to investigate the cause of the BSOD and found that there are no crashes when the driver works in Sniff mode. In other words, it seems that something goes wrong during packet injection (perhaps only when callouts are active).
In most dump files the process that causes the BSOD is the process that works with the WinDivert driver; in some cases it is the System process.
Any chance something like VirtualBox is installed, or anything else that would create a virtual adapter? If I had to guess in my ignorance, I'd say you have a virtual adapter driver that is modifying its packet buffer chain or doing some other no-no like not properly releasing such resources.
Such drivers get an exclusive lock that they need to release, and there are other rules they need to follow. Mistakes there wouldn't become an issue in sniff mode, but would be an issue when WinDivert or any other similar driver is actively capturing, popping and then reinjecting.
If you can, use the dev console that afaik comes with WDDK to enumerate all drivers and bring that dump here. Then remove any such adapters, make sure their drivers are gone/unloaded, re-run your test and see whether the issue goes away.
Btw I'm just some dude not the driver author or anything so take my remarks in that context. :)
Thanks for the report. As you noted, the problem seems to occur only when other network-related (antivirus?) tools are active, which means the issue is likely related to #128. Strange that both issues appear so suddenly, so perhaps a recent Windows update has exposed a new bug?
The stack trace also indicates a similar problem (although the bug check code is different). It appears to be a use-after-free problem of some kind. The symptoms are also similar to #110 but that problem was fixed.
This might be difficult to track down. The bug can be in WinDivert, the antivirus driver, the NIC driver (in this case appears to be
Some follow-up questions:
WFP is built on top of NDIS so there is a lot of overlap. From the stack trace, it appears that a packet has been dereferenced, and this is invoking some cleanup code, which is crashing. The packet is probably one constructed by WinDivert, or the antivirus driver, so there is a connection. That said, it seems the crash itself occurs outside of any driver.
Assuming everything is working as it should, and the packet is from WinDivert, then I assume the
@TechnikEmpire, I'm not sure that the source code of the tool will help us understand what's going on. The tool is written in .NET and contains only high-level logic. In any case, the source code looks like this
So, as you can see, it just creates a lot of small requests in both directions.
I guess it is because the WinDivert driver uses the same structures that the NDIS level operates on. For example, WinDivert creates a NET_BUFFER_LIST in some cases and injects it into the lower level.
I reproduced this issue on a VM, but I don't think that is important because I know the issue also reproduces on a physical machine. The stack trace is almost the same, for instance
When HP Velocity (in my case) is disabled or unloaded, WinDivert works completely stably.
@basil00 I first noticed this behaviour some time ago (maybe months ago), on version 1.2. After that I started reading forums, the source code and tickets on GitHub. Since then I have also reproduced the issue on both 1.3 and 1.4. As far as I can see, the injection logic has not changed much between versions.
Of course, this is why I tried to understand the cause myself. Moreover, I didn't find anything in the literature suggesting that WinDivert performs injection incorrectly. On the other hand, the fact that many widespread tools have problems with the WinDivert driver suggests that the driver itself could have problems.
I haven't run this test yet. I'll try to do it in the near future.
I should also mention that I'm not a driver specialist; all my knowledge is based on the WinDivert source code and Microsoft documentation.
I'm really worried that WinDivert splits a NET_BUFFER_LIST into numerous new ones. I understand that the driver creates deep copies of each NET_BUFFER and injects only newly allocated structures. But why isn't the filter's (FWPM_FILTER0) condition used? Would that help prevent the splitting of NET_BUFFER_LIST structures?
Thanks, this is useful information to know. Actually, the mention of the
Eyeballing the code, there are no obvious issues. The purpose of the completion routine is to clean up packets that were allocated, so at least I think the design should be OK. But this doesn't mean there are no subtle bugs.
WinDivert has always used its own filter mechanism rather than the WFP version (otherwise would need to translate the filter language into WFP filters, which I am not sure if it is worth the complexity). Actually, I am not sure what WFP does if a
That said, it is difficult to trigger the path where WinDivert will split
@basil00, I've performed several tests; the results are listed below. To prevent any misunderstanding, I'm not adding any analysis of my own.
All tests were run under the following conditions: Windows Server 2012 R2 Essentials, WebFilter (+WinDivert64), HP Velocity and HttpMeter (the tool that generates requests, mentioned earlier). WinDivert64 was recompiled appropriately for each test case (the source code was based on WinDivert 1.4).
Test case 1: only inbound callouts were installed
Test case 1 and Test case 2 completed successfully in every iteration.
Test case 3 terminated with the following exceptions.
Test case 4 terminated with the following exceptions.
Thanks. There are actually a lot of different types of errors. Error 4.4 is interesting since it occurs during a call to
One possibility is that the state of injected packets is being corrupted somehow. This can happen if another driver is keeping a reference to the packet (without "officially" taking ownership of it) and modifying it out-of-band. This would explain why there are different kinds of crashes, since the modification can occur at random places. This can even be a use-after-free, e.g., the original packet is freed, then reallocated as a different packet, and only then modified via the old reference, causing a random crash.
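To make the "stale reference" scenario concrete, here is a minimal user-space sketch. It is a toy model and not WinDivert, WFP or NDIS code: the fixed-size pool, the `toy_*` names and the generation counter are all assumptions made for illustration. It replays the sequence described above (packet freed, slot reallocated for a different packet, late write through the old pointer) using a static array so the behaviour stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in WinDivert or NDIS. */
#define POOL_SIZE 2

typedef struct toy_packet {
    int in_use;          /* 1 while an owner holds the packet */
    unsigned generation; /* bumped each time the slot is handed out */
    char payload[16];
} toy_packet;

static toy_packet pool[POOL_SIZE];

/* Hand out the first free slot, so a freed slot is reused immediately. */
toy_packet *toy_alloc(const char *payload)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].generation++;
            strncpy(pool[i].payload, payload, sizeof pool[i].payload - 1);
            pool[i].payload[sizeof pool[i].payload - 1] = '\0';
            return &pool[i];
        }
    }
    return NULL;
}

void toy_free(toy_packet *p)
{
    assert(p->in_use); /* freeing an unowned slot is a bug in this model */
    p->in_use = 0;
}

/* A pointer remembered by "another driver" without taking ownership. */
typedef struct stale_ref {
    toy_packet *packet;
    unsigned generation; /* generation seen when the pointer was taken */
} stale_ref;

stale_ref toy_borrow(toy_packet *p)
{
    stale_ref r = { p, p->generation };
    return r;
}

int toy_ref_is_stale(stale_ref r)
{
    return !r.packet->in_use || r.packet->generation != r.generation;
}

/* Replays the scenario; returns 1 if the late write landed in a
 * different packet than the one the reference was taken from. */
int toy_replay_stale_write(void)
{
    toy_packet *a = toy_alloc("packet-A");
    stale_ref other_driver = toy_borrow(a);  /* out-of-band reference */
    toy_free(a);                             /* owner completes packet A */
    toy_packet *b = toy_alloc("packet-B");   /* same slot, new packet */
    int stale = toy_ref_is_stale(other_driver);
    other_driver.packet->payload[0] = 'X';   /* write via the old reference */
    int corrupted = (b->payload[0] == 'X');
    toy_free(b);
    return stale && corrupted;
}
```

In this model the write corrupts whichever packet happens to occupy the recycled slot, which is consistent with crashes showing up at unrelated places. The generation check is only a debugging aid for the sketch; it says nothing about how real NET_BUFFER_LIST pools behave.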
Otherwise, I cannot think of any other obvious explanation for the observed behavior.
It appears that the
The BSODs also stop if HP Velocity is merely disabled (it has three modes: active, monitoring, disabled).
I'd like to prove this theory somehow, so over the next several days I'm going to ask the LiveQoS specialists to join the conversation and help us find the cause of the issue.
As mentioned above, if you just search for
If the bug is something like "completing" packets twice, then this would explain the symptoms. Without WinDivert loaded, this is unlikely to be fatal since the buffer belongs to the user application, which is most likely blocking until the system call completes anyway. But with WinDivert loaded, this results in a double free, causing "random" crashes, which explains the diversity of the BSOD messages. But this is partly based on speculation, so it might be something else.
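The double-completion hypothesis can be illustrated with another small user-space sketch. Again this is a toy model under stated assumptions, not driver code: the free-list pool and the `toy_*` names are invented for the example, and the "completion routine" here simply returns a slot index to the pool, standing in for the cleanup that a real injection completion callback would perform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; not WinDivert, WFP or NDIS code. */
#define SLOTS 4

typedef struct toy_pool {
    int free_list[SLOTS * 2]; /* extra room for duplicates from double frees */
    size_t free_count;
} toy_pool;

void toy_pool_init(toy_pool *p)
{
    p->free_count = 0;
    for (int i = SLOTS - 1; i >= 0; i--)
        p->free_list[p->free_count++] = i;
}

/* Pop the most recently freed slot (LIFO, like many real allocators). */
int toy_pool_alloc(toy_pool *p)
{
    assert(p->free_count > 0);
    return p->free_list[--p->free_count];
}

/* The "completion routine": gives the slot back to the pool.
 * It has no guard against being invoked twice for the same slot. */
void toy_complete(toy_pool *p, int slot)
{
    assert(p->free_count < SLOTS * 2);
    p->free_list[p->free_count++] = slot;
}

/* Returns 1 if two later allocations end up sharing one slot. */
int toy_double_complete_aliases(void)
{
    toy_pool pool;
    toy_pool_init(&pool);

    int injected = toy_pool_alloc(&pool);
    toy_complete(&pool, injected); /* legitimate completion */
    toy_complete(&pool, injected); /* erroneous second completion */

    int first = toy_pool_alloc(&pool);
    int second = toy_pool_alloc(&pool);
    return first == second;        /* two owners, one buffer */
}
```

After the second completion, two unrelated allocations receive the same slot, so each owner can overwrite or free the other's data. Where the damage surfaces depends on who touches the slot next, which would fit the variety of bug check codes reported in this thread.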
On the other forums, it seems that the consensus is to uninstall or disable HP Velocity. I intend to close this issue unless further information comes to light.
@basil00, it's not good practice to close issues based on predictions. It's much better to wait for an answer from all participants and then make a decision. Otherwise the issues remain unresolved and customers will reopen them many times.
FYI, I nevertheless had a conversation with the LiveQoS team and they said they've started analysing the issue.
Alex from LiveQoS here. We are the ISV responsible for the HP Velocity component distributed by HP.
The problem seems to be related to the way our NDIS filter driver handles packets generated by the WFP, so any driver using WFP to inject packets will cause the BSOD some time after our filter sent or completed the NBL chain.
Turning HP Velocity off disables packet processing, and the packets are then sent as is without triggering the issue. Use that mode for the time being until we provide you guys with a solution.
In the meantime, any users affected by this issue are advised to:
Otherwise, disable HP Velocity in the meantime.