Exception caught in FormClosing / FormClosed lead to crash in coreclr.dll #41125
Comments
I've added a new empty form to our app with only one textbox, a BindingSource and a simple DataSource class with one double field. |
I assume the debugger just adds instrumentation which reports the issue, but it probably also exists without debugger, just nobody notices it. Is it possible to reproduce with WinDbg instead of VS? (Available as app in the store, no need to install any sdk.) Then it might be possible to record a TTD trace which was immensely helpful when looking at memory corruptions. |
@weltkante Thanks for the advice, but it doesn't repro with WinDbg. The app doesn't crash, and there are no exceptions in the log other than three |
That makes it hard to diagnose if it only happens in the VS debugger and at the point it happens you see a seemingly random stack trace. You still can look for similarities in the dumps, like were the random call sites always calling into the same infrastructure? is there another thread which is consistently in the same infrastructure? etc. If you want to self-diagnose more you could also try disabling or short-circuiting parts of your application which you think are unrelated to narrow down where the problem comes from - especially if you have unsafe/interop code of your own. Aside from that general advice I don't really have any more ideas, and I'm not sure anyone on the repo would have either if they aren't able to look at a repro themselves. Maybe you have more success asking in the dotnet runtime repo, they are working on a lower level and may be aware how to diagnose things like this, a stack corruption seems serious enough to warrant looking at, especially if you are somewhat confident you aren't causing it yourself. |
Without a repro it is hard to pinpoint the culprit, and direct the issue to the relevant team. It can be something broken in Windows Forms, or it can be something broken in CoreCLR, or .NET Runtime, or VS... |
Thanks guys, I will continue in this direction:
... |
Got it! It's related to NLog (a simple logger instance, no logging at all) and this line of code in the constructor of the main form:
public Form2()
{
    InitializeComponent();
    _BoldDefFont = new Font(DefaultFont, FontStyle.Bold);
}
and one more condition: |
Must have:
Repro proj: bindingSource.zip
So, for now, the candidates are:
What do you think: how dangerous is it to run such an app in production? |
Great work getting a repro, I've reduced it further. Any caught exception in
Sounds like a runtime issue to me.
Depends on the nature of the stack corruption. Considering that it's only noticeable when the debugger/runtime are specifically looking for stack corruptions, this might not be dangerous. Generally, stack corruptions can be security issues, so this should definitely be looked at to make sure. (After all, we don't know if this bug can also be triggered in an ASP.NET Core server application.)
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        bool first = true;
        var form = new Form { Text = "Try closing me", Width = 400 };
        form.FormClosing += (sender, e) =>
        {
            if (first)
            {
                first = false;
                form.Text = "Wait for crash (or close to exit)";
                // cancel the first close so the form stays open
                e.Cancel = true;
                // caught exception triggers the bug
                try { throw new InvalidOperationException(); }
                catch { }
                // together with a nontrivial workload
                Application.Idle += delegate
                {
                    object data;
                    for (int i = 0; i < 100; i++)
                        data = new byte[10000];
                };
            }
        };
        Application.Run(form);
        MessageBox.Show("Closed successfully (no crash).");
    }
} |
Not only in |
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Transferring to VM team per @RussKie |
Since this only repros with a debugger attached, moving to Diagnostics for now. @kirsan31 are you able to verify that it doesn't occur in .NET 5? |
Tagging subscribers to this area: @tommcdon |
@mangod9 I've tested the @weltkante's repro against 3.1.7 and the latest 5.0 RC1 (5.0.100-rc.1.20425.5) in 16.8.0 Preview 3.0 [30421.201.main]. |
Repro with:
Doesn't repro with:
Can't test .NET 5.0 with v16.7.2 :( |
erm, maybe I'm misunderstanding, but just because a debugger notifies you of a bug doesn't mean it's causing the bug, or that the bug isn't present when nobody is looking. This shouldn't be about fixing the diagnostic/debugger experience, this should be about investigating whether this is a security-relevant stack corruption present in a .NET Core 3 LTS release. |
Sorry, maybe I misunderstood the original issue. Does the original scenario (or the smaller repro) AV without the debugger attached? |
no, the diagnostic instrumentation detects a stack overrun, there is never any AV reported, not even with debugger attached
So something writes to the stack in some place that it's not supposed to write to, a typical stack corruption. Not all kinds of stack corruption lead to an AV. It's unclear what the consequences are, i.e. whether this can have any negative effect or just happens to always overwrite unused stack memory. IMHO the first step is to figure out what this "stack instrumentation" is: some CLR runtime feature enabled when the debugger is attached? Something VS instruments by itself? Can it be enabled with WinDbg attached instead of VS? Once it's clear what diagnostic feature is detecting the overrun, it may be easier to enable it in the context of a TTD time travel session and look at who does the bad write. |
Ok, thanks for clarifying. Will leave in the Diagnostics area for now, to get a better understanding of the stack instrumentation. @tommcdon assume this is Diagnostics or Tracing? |
Thanks for sending this. I’m able to reproduce the issue on 3.1.7 x86. Since the problem does not (edit) appear to reproduce on 5.0, I’ll change the milestone to 6.0 and we will investigate the root cause and potential fix for 3.1 |
Callstack of debuggee at point of failure:
|
I don't think it is, only 3.1 appears to be affected. |
I don't have answers yet, but it is not due to #40637. What's happening here is we expect that the linked list of Frames we keep will only have either valid frames or FRAME_TOP (-1) but we end up with a NULL Frame in the list. CrawlFrame::SetCurGSCookie will call DoJITFailFast (and therefore __report_gsfailure) if provided a NULL pGSCookie. |
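To make the invariant in the comment above concrete, here is a minimal sketch in C#. It is a hypothetical model only, not CoreCLR's actual code (the runtime's frame chain is native C++). The names `Frame`, `FrameChain`, `FrameTop` and `Walk` are invented for illustration, and the model throws an exception where the runtime, per the comment, fails fast via DoJITFailFast.

```csharp
using System;

// Hypothetical model of one entry in the chain; not the runtime's Frame type.
sealed class Frame
{
    public string Name { get; }
    public Frame Next { get; }

    public Frame(string name, Frame next)
    {
        Name = name;
        Next = next;
    }
}

static class FrameChain
{
    // Stand-in for the FRAME_TOP (-1) sentinel that terminates a valid chain.
    public static readonly Frame FrameTop = new Frame("FRAME_TOP", null);

    // Walks the chain and returns the number of frames before the sentinel.
    // A null entry means the chain is corrupted, so the walk stops with an error.
    public static int Walk(Frame head)
    {
        int count = 0;
        for (Frame f = head; !ReferenceEquals(f, FrameTop); f = f.Next)
        {
            if (f == null)
                throw new InvalidOperationException(
                    "Corrupted frame chain: NULL frame before FRAME_TOP");
            count++;
        }
        return count;
    }
}
```

In this model, a chain such as `new Frame("A", new Frame("B", FrameChain.FrameTop))` walks cleanly, while `new Frame("A", new Frame("B", null))` hits the error path, which corresponds to the NULL Frame described in the comment.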
This is caused by #2240. I was able to debug to the point where I saw that the frame chain was corrupted, and then by searching through issues that were fixed between 3.1 and 5.0 found this suspiciously similar issue. I don't know exactly why this only repros under the debugger, but I validated that applying the fix for #2240 makes the provided app no longer crash so I am pretty confident this is the same issue. @janvorli is there anything preventing us from porting the fix back to 3.1? |
It looks like the port should just work (only the change in |
@kirsan31 @weltkante would one of you be able to verify the fix for #2240 locally? I verified that the sample app provided no longer crashes, but it would be nice to make sure that it solves the original issue for you before going through the servicing process. I created a release on my fork of coreclr with binaries with the fix: https://github.com/davmason/coreclr/releases/tag/0.0.1 To test it out you would have to:
Please let me know if you run into any issues |
I've tested with the original and the reduced repro scenario, and both are fixed, but let's wait for @kirsan31 in case he wants to test in his actual environment. As far as I'm concerned the fix looks good, great work. |
@davmason |
Thank you both for validating the fix. I opened a PR to port it back to 3.1 in dotnet/coreclr#28090. |
Fixed in dotnet/coreclr#28090 |
---UPD---
Small repro and further investigations from this comment.
---UPD---
.NET Core Version: 3.1.7
Have you experienced this same bug with .NET Framework?: NO
Problem description:
Hi guys, I need some help here. I just found some very strange behavior in our app.
Prerequisites:
BindingSource with a data source that is some class, e.g. bindingSource1.DataSource = typeof(app.Params);
With all of the above, when we enter invalid data (for example, non-numeric text in an int field) into the textbox and then attempt to close the form, the app crashes.
Event log:
From crash dumps:
The error is always the same and occurs at ANY random place in our program:
It happens with all our forms where a BindingSource is present, no matter what code (if any) is in OnFormClosing. Without OnFormClosing or even with
Expected behavior:
No crash.
Minimal repro:
I spent about 2 hours trying to create a simple repro, but failed :(
bindingSource.zip: this does not reproduce the problem, but shows the structure.
I dunno whether this is a bug in our app, in WinForms, in .NET Core, or in VS.
Windows 10, 2004 x64.
VS 16.7.1
Any advice?
P.S.
When trying to create a repro, I found 2 more bugs in VS 16.7.1 with multi targeting and designer :( Now with data sources. I will update my issue tomorrow...