Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ctrl-C randomly breaks out of Julia #9544

Closed
ncnc opened this issue Jan 1, 2015 · 11 comments
Closed

Ctrl-C randomly breaks out of Julia #9544

ncnc opened this issue Jan 1, 2015 · 11 comments
Labels
kind:bug Indicates an unexpected problem or unintended behavior system:windows Affects only Windows

Comments

@ncnc
Copy link
Contributor

ncnc commented Jan 1, 2015

Windows signal handling seems to be slightly broken in that Ctrl-C randomly exits Julia without any messages, like this:

julia> A = rand(500,500);

julia> A*A
# Pressed CTRL-C
D:\[path]\usr\bin>

Based on some debugging this is related to the jl_defer_signal flag in the sense that non-deferred signals work whereas deferred signals cause an exit to the command prompt.

My take is that the code translates the Windows signal CTRL_C_EVENT to SIGINT and later, in case of deferred signals, calls libc raise() in the JL_SIGATOMIC_END macro with the translated signal, which for some reason causes a different behavior than the original signal would have.

The following patch seems to remove the symptoms, but since I guess it effectively removes the signal translation layer I'm not all that sure of the correctness:

diff --git a/src/init.c b/src/init.c
index ae0ded2..5bb15b6 100644
--- a/src/init.c
+++ b/src/init.c
@@ -267,7 +267,7 @@ static BOOL WINAPI sigint_handler(DWORD wsig) //This needs winapi types to guara
         default: sig = SIGTERM; break;
     }
     if (jl_defer_signal) {
-        jl_signal_pending = sig;
+        jl_signal_pending = wsig;
     }
     else {
         jl_signal_pending = 0;

Edit: I'm using master
Edit2: 32-bit build

@jiahao jiahao added the system:windows Affects only Windows label Jan 1, 2015
@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 1, 2015

thanks for that analysis!

we have no less than 3 independent mechanisms for trapping async signals on windows (unix has 1). but raise uses a different mechanism. also, it's documented that you can't recover from that one (http://msdn.microsoft.com/en-us/library/xdkz3x12.aspx), so the solution presumably must be to not call raise. but that should save a syscall anyways (and be better for multi-threading). i'm just waiting for my builds to finish to test and commit the following patch:

diff --git a/src/julia.h b/src/julia.h
index 5115517..567ef59 100644
--- a/src/julia.h
+++ b/src/julia.h
@@ -1112,8 +1112,10 @@ DLLEXPORT extern volatile sig_atomic_t jl_defer_signal;
 #define JL_SIGATOMIC_END()                                      \
     do {                                                        \
         jl_defer_signal--;                                      \
-        if (jl_defer_signal == 0 && jl_signal_pending != 0)     \
-            raise(jl_signal_pending);                           \
+        if (jl_defer_signal == 0 && jl_signal_pending != 0) {   \
+            jl_signal_pending = 0;                              \
+            jl_throw(jl_interrupt_exception);                   \
+        }                                                       \
     } while(0)

@vtjnash vtjnash added the kind:bug Indicates an unexpected problem or unintended behavior label Jan 1, 2015
@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 1, 2015

what terminal client are you using (e.g. mintty, cygwin, conemu, command prompt, powershell, etc)? they each have their own quirks about how they deliver signals

also, a more direct test:

begin; while true; Base.sigatomic_begin(); sleep(1); Base.sigatomic_end(); end; end
^C

@ncnc
Copy link
Contributor Author

ncnc commented Jan 1, 2015

That patch works equally well for me. Are there any potential problems related to not using the value of jl_signal_pending, which contains the actual signal code?

While we're at it, I also seem to be able to fairly consistently trigger OpenBLAS-related EXCEPTION_ACCESS_VIOLATION exceptions using the same approach (i.e. hacking Ctrl-C while doing things).

julia> A*A
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6dcee2c0 -- DCABS1 at D:\[path]\usr\bin\libopenblas.DLL (unknown line)
DCABS1 at D:\[path]\usr\bin\libopenblas.DLL (unknown line)
unknown function (ip: 161958304)
unknown function (ip: 161958304)

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x623ab09f -- utf8proc_NFKC at D:\[path]\usr\bin\libjulia.dll (unknown line)
utf8proc_NFKC at D:\[path]\usr\bin\libjulia.dll (unknown line)
utf8proc_NFKC at D:\[path]\usr\bin\libjulia.dll (unknown line)

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 1, 2015

the signal code is always set to SIGINT, so no it's not a concern (if it ever becomes a concern, we could write a stub function that dispatches as needed, and still avoids the need to call raise)

the second issue you mention is probably #2622

@ncnc
Copy link
Contributor Author

ncnc commented Jan 1, 2015

I'm running your direct test, but for the first few minutes I see nothing special with either patched or unpatched versions. Maybe deferred signals are rare under this setup?

BTW, I'm using the command prompt, but getting somewhat fed up with the copy-paste behavior.

@ncnc
Copy link
Contributor Author

ncnc commented Jan 1, 2015

Sorry, I missed the ^C part. Now I see a consistent different between patched and unpatched.

I agree about the second issue.

@tkelman
Copy link
Contributor

tkelman commented Jan 2, 2015

@vtjnash seems like this can cause a segfault even on linux https://travis-ci.org/JuliaLang/julia/jobs/45657098

@ncnc
Copy link
Contributor Author

ncnc commented Jan 2, 2015

Master works for me now, thanks for fixing.

@tkelman
Copy link
Contributor

tkelman commented Jan 3, 2015

though the intermittent codegen segfaults on CI are no good...

@tkelman
Copy link
Contributor

tkelman commented Jan 6, 2015

Given the CI instability this one has caused, I don't think backporting this for 0.3.5 is a good idea. We can give it more testing time for 0.3.6.

@tkelman tkelman added this to the 0.3.6 milestone Jan 6, 2015
@tkelman tkelman modified the milestones: 0.3.7, 0.3.6 Feb 18, 2015
@tkelman
Copy link
Contributor

tkelman commented Mar 11, 2015

Likewise here, I don't plan on backporting the fix for this myself because it has messy conflicts that I don't know how to resolve. If anyone would like to see this resolved on 0.3.7 or later, please prepare a PR against release-0.3.

@tkelman tkelman removed this from the 0.3.7 milestone Jul 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug Indicates an unexpected problem or unintended behavior system:windows Affects only Windows
Projects
None yet
Development

No branches or pull requests

4 participants