Dart hangs randomly on Windows #26400

Closed
kendfinger opened this issue May 4, 2016 · 21 comments

Comments

@kendfinger
Contributor

commented May 4, 2016

We have had really serious issues with Dart (1.15.0 and 1.16.0) on Windows.
On Windows 10 we are not having issues, and our HTTP server has been running for a whole week so far.
On Windows 7 and Windows Server 2012 R2 Datacenter, the Dart VM hangs every day without reporting any errors. In this case it is running as a Windows Service.

I filed this separately from #25582, as that seems to be more development-tools related.

@mezoni

commented May 4, 2016

Same problem here. The Atom.io plugin called dartlang almost always stops analyzing the code.
As far as I know, the plugin uses the Dart analysis server (which is where the problem shows up).
I don't think the bug is in the analysis server's own logic; rather, the server simply stops working after some time (I can't describe it in more detail because I don't understand what is happening).

@mezoni

commented May 4, 2016

What if the Dart developers compiled the Dart VM with debug symbols, ran some tools, and, when it "hangs", attached a .NET debugger to see where it loops?

P.S.
I like debuggers!
My favorites were these very cool tools:

  • Turbo Debugger
  • SoftIce
  • WinIce

The interactive disassembler IDA Pro was also one of my coolest and most useful tools!
...but this was a long time ago.

As the saying goes, "take a debugger in your hands" and the problem will be solved in the blink of an eye.

@kendfinger

Contributor Author

commented May 4, 2016

@mezoni That's probably the easiest way to figure it out.

@floitschG floitschG added the area-vm label May 4, 2016

@zengyun261

commented May 15, 2016

+1
I don't know how to solve the problem.

OS: Windows Server 2012, Windows 10
Dart version: 1.14,1.15,1.16...

@kendfinger

Contributor Author

commented May 15, 2016

@zengyun261 Is that a debugging session with a stacktrace when Dart froze?

@zengyun261

commented May 15, 2016

runtime/vm/os_thread_win.cc: void OSThread::Join(ThreadJoinId id)

Because the thread handle was closed, the thread ID can be reused once the thread exits, possibly by a thread in another process. OpenThread then opens whichever thread currently owns that ID, and WaitForSingleObject on it never returns.
By Google Translate :(
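
The hazard described above can be illustrated with a small, portable C++ model. This is only a sketch under stated assumptions: `ToyKernel`, `ThreadObject`, `OpenById`, and `WaitWouldReturn` are hypothetical names standing in for the OS thread table, `OpenThread`, and `WaitForSingleObject`. None of it is the Dart VM's code, and the lowest-free-ID reuse policy is chosen only to make recycling deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Toy stand-in for a kernel thread object. It stays alive for as long as
// any handle refers to it, even after the thread has exited.
struct ThreadObject {
  bool exited = false;
};

// Toy stand-in for a thread HANDLE.
using Handle = std::shared_ptr<ThreadObject>;

// Toy stand-in for the OS thread table (hypothetical, not a real API).
class ToyKernel {
 public:
  // Starts a "thread" and returns its ID. The lowest free ID is reused so
  // that recycling is deterministic in this model.
  std::uint32_t Spawn() {
    std::uint32_t id = 1;
    while (live_.count(id) != 0) ++id;
    live_[id] = std::make_shared<ThreadObject>();
    return id;
  }

  // Plays the role of OpenThread: resolves an ID to whichever thread owns
  // that ID right now.
  Handle OpenById(std::uint32_t id) const {
    auto it = live_.find(id);
    if (it == live_.end()) return nullptr;
    return it->second;
  }

  // Marks the thread as exited and releases its ID for reuse.
  void Exit(std::uint32_t id) {
    auto it = live_.find(id);
    if (it == live_.end()) return;
    it->second->exited = true;
    live_.erase(it);
  }

 private:
  std::map<std::uint32_t, Handle> live_;
};

// Plays the role of WaitForSingleObject: true if a wait would return.
bool WaitWouldReturn(const Handle& h) { return h != nullptr && h->exited; }

// Join by ID: the ID is resolved only at join time, after it was recycled.
bool JoinByIdWouldReturn() {
  ToyKernel kernel;
  std::uint32_t worker = kernel.Spawn();
  kernel.Exit(worker);                       // worker exits, ID is freed
  std::uint32_t unrelated = kernel.Spawn();  // new thread gets the same ID
  assert(unrelated == worker);
  (void)unrelated;
  return WaitWouldReturn(kernel.OpenById(worker));  // waits on wrong thread
}

// Join by handle: the handle is opened while the worker is still alive.
bool JoinByHandleWouldReturn() {
  ToyKernel kernel;
  std::uint32_t worker = kernel.Spawn();
  Handle handle = kernel.OpenById(worker);
  kernel.Exit(worker);
  kernel.Spawn();  // the ID is recycled, but the handle still names the worker
  return WaitWouldReturn(handle);
}
```

In this model `JoinByIdWouldReturn()` is false (the join would block on an unrelated live thread), while `JoinByHandleWouldReturn()` is true, which is the intuition behind keeping an open handle for the join rather than a bare thread ID.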

@zanderso

Member

commented May 15, 2016

@zengyun261 Many thanks for tracking this down. There are actually two problems here. First, the thread id recycling problem, and second: a worker thread shouldn't add itself to the idle list until after reaping exited threads. I will take a look at these.
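
The second problem is an ordering issue, and a single-threaded toy model may make it easier to see. The sketch below is an assumption-laden illustration, not the VM's thread pool: `Step`, `ToyWorker`, and `TaskCanStallOnIdleWorker` are hypothetical names, and the "join" is modeled as a window during which a new task may arrive.

```cpp
#include <cassert>
#include <vector>

// The two things a worker does after finishing a task, in some order.
enum class Step { kReapExited, kEnterIdleList };

// Toy worker state (hypothetical, for illustration only).
struct ToyWorker {
  bool on_idle_list = false;  // advertised to the pool as available
  bool joining = false;       // currently blocked joining exited threads
};

// Returns true if, for the given step order, a task could arrive while the
// worker is on the idle list but still blocked in a join. Such a task would
// be handed to a worker that cannot run it yet.
bool TaskCanStallOnIdleWorker(const std::vector<Step>& order) {
  ToyWorker worker;
  for (Step step : order) {
    if (step == Step::kEnterIdleList) {
      worker.on_idle_list = true;
    } else {
      worker.joining = true;
      // A new task may arrive at this point, while the join is in progress.
      if (worker.on_idle_list && worker.joining) return true;
      worker.joining = false;
    }
  }
  return false;
}

// Checks both orderings against the model.
void CheckOrdering() {
  // Going idle first leaves a window where a task stalls behind the join.
  assert(TaskCanStallOnIdleWorker({Step::kEnterIdleList, Step::kReapExited}));
  // Reaping first closes that window.
  assert(!TaskCanStallOnIdleWorker({Step::kReapExited, Step::kEnterIdleList}));
}
```

The model only captures the ordering argument: a worker that is visible as idle should already be free to take work, so any blocking cleanup belongs before the idle-list insertion.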

@kaendfinger If you could still verify that you aren't hitting a different problem, that would be helpful. Thanks!

@zanderso zanderso self-assigned this May 15, 2016

@kendfinger

Contributor Author

commented May 15, 2016

@zanderso Will do.

zanderso added a commit that referenced this issue May 17, 2016
Uses an open thread handle as the ThreadJoinId on Windows.
Also:
- Reaps exited threads in the thread pool before putting
a thread on the idle list so that a new arriving task
isn't blocked on a supposedly idle thread in the middle
of a join.
- Stops trying to join eventhandler threads on
Windows. Now that we're using the correct exit() call,
we probably don't have to worry about exit code pollution,
so joining the threads is unnecessary.

related #26400

R=asiva@google.com, iposva@google.com

Review URL: https://codereview.chromium.org/1978153002 .

@zanderso

Member

commented May 18, 2016

Update: The above change should fix @zengyun261's hang. @zengyun261, if you're able to verify the fix, that would be very helpful! @kaendfinger, we think it's likely that you were hitting the same problem, so it might be worthwhile to see if it's gone after the change. This change will be in the next dev release. You can also grab a recent bleeding-edge release, e.g. from here: https://gsdview.appspot.com/dart-archive/channels/be/raw/138232/sdk/dartsdk-windows-x64-release.zip

@kendfinger

Contributor Author

commented May 18, 2016

@zanderso Sounds good, we will do some testing and verify the issue is no longer present.

@julemand101

commented May 20, 2016

This seems to have fixed the problem I had with random hangs on my Windows 8 machine. My WebSocket server has now run for over 24 hours (before the fix, it hung after a few hours). I will let the server run over the weekend and see if it is still running on Monday.

@kendfinger

Contributor Author

commented May 20, 2016

@zanderso Things are looking better for us as well.

@zanderso

Member

commented May 20, 2016

Great! I filed an issue to get the fix merged into the stable channel.

@mit-mit

Member

commented May 23, 2016

The next stable release is roughly a week from now -- can the fix in stable wait until then?

whesse added a commit that referenced this issue May 24, 2016
Uses an open thread handle as the ThreadJoinId on Windows.
Also:
- Reaps exited threads in the thread pool before putting
a thread on the idle list so that a new arriving task
isn't blocked on a supposedly idle thread in the middle
of a join.
- Stops trying to join eventhandler threads on
Windows. Now that we're using the correct exit() call,
we probably don't have to worry about exit code pollution,
so joining the threads is unnecessary.

related #26400

R=asiva@google.com, iposva@google.com

Review URL: https://codereview.chromium.org/1978153002 .

@mit-mit

Member

commented May 25, 2016

Closing this since, per the above, the fix is in. It will be available in 1.17 stable, scheduled for next week.

@mit-mit mit-mit closed this May 25, 2016

@mit-mit mit-mit added this to the 1.17 milestone May 25, 2016

@zanderso

Member

commented May 25, 2016

Reopening to wait for customers to try the 1.16 patch release when it is available.

@zanderso zanderso reopened this May 25, 2016

@zanderso zanderso removed this from the 1.17 milestone May 25, 2016

@mit-mit

Member

commented May 25, 2016

@zanderso Per my comment two days ago, we are not expecting to do a 1.16 stable patch release, but rather to release this in the first 1.17 stable build scheduled for next week. Does that not work?

@zanderso

Member

commented May 25, 2016

@mit-mit no one responded to your question in the affirmative, so I assumed we were still doing a patch release. I don't believe this change was cherry-picked into the dev branch, so I don't believe it will show up in 1.17 stable unless we patch it into 1.16 stable. @whesse please confirm that you are still working on a 1.16 patch release?

@zanderso

Member

commented May 31, 2016

1.16.1 stable has been released with this fix. @kaendfinger if you could give it a shot that'd be great. Thanks!

@mit-mit mit-mit closed this Jun 1, 2016

@zanderso zanderso reopened this Jun 1, 2016

@zanderso

Member

commented Jun 1, 2016

@mit-mit Thanks for your concern about this issue. I will close it myself after @kaendfinger verifies that the 1.16.1 release does not exhibit the problem.

@zanderso

Member

commented Jun 8, 2016

@kaendfinger reported offline that there have been no hangs since switching to 1.16.1, so I will close.

@zanderso zanderso closed this Jun 8, 2016
