New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PhantomJS immediately crashing after launch through Karma #903

Closed
thedavidharris opened this Issue Aug 15, 2016 · 54 comments

Comments

Projects
None yet
@thedavidharris

thedavidharris commented Aug 15, 2016

Please use the following bug reporting template to help produce actionable and reproducible issues:

  • A brief description

PhantomJS seems to be able to be launched standalone, but cannot be used with Karma. This issue has also been posted to the karma-phantomjs-launcher issues page with no workaround found. karma-runner/karma-phantomjs-launcher#124

I'm using a workaround to get past #468 since Webpack requires os.interfaces. Part of me wonders if this could be the cause.

  • Expected results

Headless execution of tests with PhantomJS

  • Actual results (with terminal output if applicable)
    `> NODE_ENV=test ./node_modules/karma/bin/karma start karma.conf.js

Info: No mac address - uniqid() falls back to uniqid.process(). no interfaces found
15 08 2016 17:30:27.960:INFO [karma]: Karma v1.2.0 server started at http://localhost:9876/
15 08 2016 17:30:27.963:INFO [launcher]: Launching browser PhantomJS with unlimited concurrency
15 08 2016 17:30:27.977:INFO [launcher]: Starting browser PhantomJS
15 08 2016 17:30:29.554:ERROR [phantomjs.launcher]: PhantomJS has crashed. Please read the bug reporting guide at
http://phantomjs.org/bug-reporting.html and file a bug report.

15 08 2016 17:30:29.556:ERROR [launcher]: Cannot start PhantomJS

15 08 2016 17:30:29.563:INFO [launcher]: Trying to start PhantomJS again (1/2).
15 08 2016 17:30:30.035:ERROR [phantomjs.launcher]: PhantomJS has crashed. Please read the bug reporting guide at
http://phantomjs.org/bug-reporting.html and file a bug report.

15 08 2016 17:30:30.036:ERROR [launcher]: Cannot start PhantomJS

15 08 2016 17:30:30.039:INFO [launcher]: Trying to start PhantomJS again (2/2).
15 08 2016 17:30:30.507:ERROR [phantomjs.launcher]: PhantomJS has crashed. Please read the bug reporting guide at
http://phantomjs.org/bug-reporting.html and file a bug report.

15 08 2016 17:30:30.508:ERROR [launcher]: Cannot start PhantomJS

15 08 2016 17:30:30.511:ERROR [launcher]: PhantomJS failed 2 times (cannot start). Giving up.`

  • Your Windows build number

Anniversary Update

  • Steps / commands required to reproduce the error

Run Karma with PhantomJS

  • Strace of the failing command

I can't figure out how to get a strace on this, but if someone knows how to I'll get it up.

  • Required packages and commands to install

Webpack, Karma, PhantomJS-prebuilt through npm, karma-webpack, karma-phantomjs-launcher is I think all needed to reproduce. The use case for this project is using Chai and Mocha.

@abdelkrimdev

This comment has been minimized.

abdelkrimdev commented Sep 1, 2016

the same issue here

@lephyrus

This comment has been minimized.

lephyrus commented Dec 12, 2016

I'm running into this issue as well, using build 14965. It is one of very few remaining issues that prevent me from switching to Windows on my job. I'm not very experienced with strace, but to the best of my knowledge, this might be the relevant bit:

clock_gettime(CLOCK_MONOTONIC, {575, 503701000}) = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
setsockopt(8, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(9876), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(8, {sa_family=AF_INET, sin_port=htons(49794), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getpeername(8, {sa_family=AF_INET, sin_port=htons(9876), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getsockopt(8, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
gettimeofday({1481549413, 920730}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1072, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1072, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1072, ...}) = 0
gettimeofday({1481549413, 921534}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1072, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1072, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1072, ...}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0x400000077} ---
write(2, "PhantomJS has crashed. Please re"..., 127) = 127
tgkill(330, 330, SIGSEGV)               = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=330, si_uid=1000} ---
+++ killed by SIGSEGV (core dumped) +++

Then again, this might be a trace of the process that crashes because PhantomJS has crashed. Anyway, it's my best effort to hopefully drive this issue forward.

@lephyrus

This comment has been minimized.

lephyrus commented Dec 21, 2016

Following up on my own trace, my guess ist that SO_OOBINLINE is the most likely candidate for an unsupported socket option. Would be nice to get that acknowledged.

@moanrose

This comment has been minimized.

moanrose commented Dec 23, 2016

The same issue here

I am using build 14986. PhantomJS is working fine for both localhost and external hosts, but as soon as i point it to my karma webserver it crashes.

It could be related to the use of websockets, because karma uses socket.io

EDIT: It seems to be related to an underlying bug/shortcoming in WSL https://bugs.launchpad.net/keystoneauth/+bug/1614688

@aseering

This comment has been minimized.

Contributor

aseering commented Dec 23, 2016

To save folks a click-through: The underlying issue is WSL's lack of support for TCP_KEEPCNT as a socket flag.

@sunilmut

This comment has been minimized.

Member

sunilmut commented Dec 27, 2016

Thanks @aseering for chiming in. TCP_KEEPCNT will be available in the next insider build, whenever that is released.

@sunilmut sunilmut added the fixinbound label Dec 27, 2016

@lephyrus

This comment has been minimized.

lephyrus commented Dec 28, 2016

@moanrose I don't get how this is related to a socket flag used by keystoneauth. Seems random to me. I certainly don't see any use of TCP_KEEPCNT in my trace of PhantomJS crashing.

@sunilmut I'd be happy if a fix was inbound, but I'm not convinced TCP_KEEPCNT is the problem here. Can you clarify if the socket flags shown in the trace above (SO_OOBINLINE specifically) are supported at this time?

@moanrose

This comment has been minimized.

moanrose commented Dec 28, 2016

@lephyrus You are right, i have no empiric evidence whatsoever for my thesis. it was merely a suggestion. I stubled upon a issue about WSL & websockets, and i assumed it was related.

@sunilmut

This comment has been minimized.

Member

sunilmut commented Dec 28, 2016

@lephyrus - SO_OOBINLINE is supported, as you can also see from the strace setsockopt(8, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0 (return value of 0). I don't see anything else alarming in the trace. If someone can provide detailed repro steps (or a link for setup), I am happy to try it out at my end.

@lephyrus

This comment has been minimized.

lephyrus commented Dec 29, 2016

@sunilmut Thanks for the explanation - I didn't even realize that is the return value... 🙄 If you could take a look, that would be great.

Reproduction steps (assuming installed NodeJS):

  • mkdir karma-phantomjs-repro && cd karma-phantomjs-repro
  • npm install karma karma-phantomjs-launcher
  • ./node_modules/.bin/karma start --browsers PhantomJS

Expected output:
You need to include some adapter that implements __karma__.start method!

Actual output:
PhantomJS has crashed. Please [...] (three times)

@sunilmut sunilmut added the network label Jan 16, 2017

@ForsakenHarmony

This comment has been minimized.

ForsakenHarmony commented Jan 28, 2017

how long until the fix is in the insider build and how long until it's in a normal update?
(not an insider, because forced spying)

@bmayen

This comment has been minimized.

bmayen commented Feb 1, 2017

TCP_KEEPCNT went into 15002. Is this issue resolved now?

@ForsakenHarmony

This comment has been minimized.

ForsakenHarmony commented Feb 1, 2017

@bmayen test it

@moanrose

This comment has been minimized.

moanrose commented Feb 2, 2017

@sunilmut

This comment has been minimized.

Member

sunilmut commented Feb 2, 2017

@moanrose - thanks for trying it out. Can you share more details? Such as, which build did you try it on? And, repro steps? Preferably, also, a strace of the failing command.

@bmayen

This comment has been minimized.

bmayen commented Feb 2, 2017

I'll try to find some time today or tomorrow over my lunch break to kick the tires and report back

@lephyrus

This comment has been minimized.

lephyrus commented Feb 2, 2017

I'm not on 15002, but I've already provider (very simple) repro steps (#903 (comment)) and strace output (#903 (comment)). If that's not good enough, please be more specific.

@sunilmut

This comment has been minimized.

Member

sunilmut commented Feb 4, 2017

@lephyrus - Apologize for the confusion. My response was for @moanrose to make sure he was not seeing something else. I have tried your steps (thanks) locally, and, was able to repro the issue (crash) on 15022. But, I haven't gotten a chance to dig into the traces yet.

@sunilmut sunilmut removed the fixinbound label Feb 4, 2017

@lephyrus

This comment has been minimized.

lephyrus commented Feb 6, 2017

@sunilmut Sorry, my mistake. I just assumed that @moanrose is having the exact same issue, which I still think is probably the case.

@bmayen

This comment has been minimized.

bmayen commented Mar 10, 2017

@sunilmut, any progress on this? This bug is one of the main reasons I still have to keep a VM running :)

@rafaelubal

This comment has been minimized.

rafaelubal commented Apr 9, 2017

Hi all, just wanted to ping on this thread as well. The latests Windows 10 builds seem to have fixed a good number of issues in socket-related system calls, which have let me run most of my NodeJS web app on Windows bash. Except for my client-side unit tests based on PhantomJS, which seem to depend on this issue. Since I don't see much activity on here, I was wondering what the state is. Thanks all!

@aseering

This comment has been minimized.

Contributor

aseering commented Apr 9, 2017

Hi @rafaelubal -- I think the status is that we're waiting on someone who's experiencing this bug to reply to @sunilmut 's request here for an strace.

@sunilmut

This comment has been minimized.

Member

sunilmut commented Apr 10, 2017

I think there is enough information here for a repro. Just needs to be looked at and triaged.

@sebolio

This comment has been minimized.

sebolio commented Apr 11, 2017

so Creators Update was released but this problem persists, i can't run karma-phantomjs ! any ideas to fix this?

@aseering

This comment has been minimized.

Contributor

aseering commented Apr 11, 2017

@sebdelvalle -- that's a good question, but I think the answer is the same as it was yesterday :-) See the previous few posts.

Creators has actually been out for a.while now via Insider builds, which a bunch of us here are using.

If someone does come up with a workaround, it would be great if they posted it here.

@mbland

This comment has been minimized.

mbland commented Jun 3, 2017

Sorry for the thread spam (I spent so long editing that first reply!), but here's proof of my gdb run, FWIW:

In one tmux pane:

$ node_modules/.bin/live-server
Serving ".../wsl-websocket-crash-demo" at http://127.0.0.1:8080
GET /favicon.ico 404 34.852 ms - 24
Ready for changes

In another tmux pane:

$ gdb node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
...snip...
Reading symbols from node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs...(no debugging symbols found)...done.
(gdb) set args visit-site.js
(gdb) break getsockopt
Breakpoint 1 at 0x40e080
(gdb) r
Starting program: .../wsl-websocket-crash-demo/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs visit-site.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffffd5a0700 (LWP 23006)]
[New Thread 0x7fffbcc70700 (LWP 23007)]
[New Thread 0x7fffb7ff0700 (LWP 23008)]
[New Thread 0x7fffb77e0700 (LWP 23009)]
[New Thread 0x7fffb6be0700 (LWP 23010)]
[Switching to Thread 0x7fffb6be0700 (LWP 23010)]

Thread 6 "Qt HTTP thread" hit Breakpoint 1, getsockopt () at
../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) c
Continuing.
[Switching to Thread 0x7fffff760740 (LWP 23002)]

Thread 1 "phantomjs" hit Breakpoint 1, getsockopt () at
../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S
(gdb) c
Continuing.

Thread 1 "phantomjs" received signal SIGSEGV, Segmentation fault.
0x0000000000bea7e9 in ?? ()
@therealkenc

This comment has been minimized.

Collaborator

therealkenc commented Jun 3, 2017

Did you run node_modules/.bin/live-server before starting gdb

I did not. I should have paid closer attention to all your steps. [I was just doing a quick look-see.]

here's proof of my gdb run, FWIW:

That's good progress. And your pselect6 theory sounds entirely plausible. If you can do a repro test case that will certainly earn a gold star.

@mbland

This comment has been minimized.

mbland commented Jun 4, 2017

I've managed to build a debug PhantomJS 2.1.1 binary and reproduce the crash. I've added a new GDB Logs section to my demo README, included below for easy reference. I'll try tracing through the code some more over the next few days to see if any other insights emerge.

GDB logs from a debug build of the PhantomJS 2.1.1 binary are in the gdb/ directory. gdb/stack-trace.txt shows that the crash is definitely WebSocket-related, but the crash now happens due to a failing ASSERT, ostensibly because of a null pointer reference:

ASSERTION FAILED: handle == m_handle
Modules/websockets/WebSocketChannel.cpp(257) : virtual void WebCore::WebSocketChannel::didOpenSocketStream(WebCore::SocketStreamHandle*)
...snip...

Thread 1 "phantomjs" received signal SIGSEGV, Segmentation fault.
0x000000000a1edf70 in WTFCrash () at wtf/Assertions.cpp:345
345     wtf/Assertions.cpp: No such file or directory.

(gdb) where
#0  0x000000000a1edf70 in WTFCrash () at wtf/Assertions.cpp:345#1  0x000000000907f702 in WebCore::WebSocketChannel::didOpenSocketStream (this=0xdd396b0, handle=0xdd2dcb0) at Modules/websockets/WebSocketChannel.cpp:257
#2  0x0000000009098158 in WebCore::SocketStreamHandlePrivate::socketConnected (this=0xdd2dd50) at platform/network/qt/SocketStreamHandleQt.cpp:107
...snip...

(gdb) up
#1  0x000000000907f702 in WebCore::WebSocketChannel::didOpenSocketStream (this=0xdce4b90, handle=0xdcc0a50) at Modules/websockets/WebSocketChannel.cpp:257
257     Modules/websockets/WebSocketChannel.cpp: No such file or directory.
(gdb) print handle
$1 = (WebCore::SocketStreamHandle *) 0xdcc0a50
(gdb) print m_handle
$2 = {m_ptr = 0x0}

gdb/connect.txt shows the location of the connect() calls leading up to the crash. The first specifies QAbstractSocket::IPv4Protocol:

#0  connect () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000a6bdebd in qt_safe_connect (sockfd=7, addr=0x7fffb631f380, addrlen=16)
    at ../../include/QtNetwork/5.5.1/QtNetwork/private/../../../../../src/network/socket/qnet_unix_p.h:141
#2  0x000000000a6be856 in QNativeSocketEnginePrivate::nativeConnect (this=0x7fffa00048c0, addr=..., port=8080) at socket/qnativesocketengine_unix.cpp:389
#3  0x000000000a6bb68a in QNativeSocketEngine::connectToHost (this=0x7fffa00047c0, address=..., port=8080) at socket/qnativesocketengine.cpp:541
#4  0x000000000a6b1167 in QAbstractSocketPrivate::_q_connectToNextAddress (this=0x7fffa0003970) at socket/qabstractsocket.cpp:1130
#5  0x000000000a6b0ef9 in QAbstractSocketPrivate::_q_startConnecting (this=0x7fffa0003970, hostInfo=...) at socket/qabstractsocket.cpp:1067
#6  0x000000000a6b2711 in QAbstractSocket::connectToHost (this=0x7fffa00032c0, hostName=..., port=8080, openMode=..., protocol=QAbstractSocket::IPv4Protocol)
    at socket/qabstractsocket.cpp:1652
#7  0x000000000a7008a2 in QHttpNetworkConnectionChannel::ensureConnection (this=0x7fffa00017c8) at access/qhttpnetworkconnectionchannel.cpp:354

the second, QAbstractSocket::AnyIPProtocol:

#0  connect () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000a6bdebd in qt_safe_connect (sockfd=8, addr=0x7ffffffdb350, addrlen=16)
    at ../../include/QtNetwork/5.5.1/QtNetwork/private/../../../../../src/network/socket/qnet_unix_p.h:141
#2  0x000000000a6be856 in QNativeSocketEnginePrivate::nativeConnect (this=0xdcc6310, addr=..., port=8080) at socket/qnativesocketengine_unix.cpp:389
#3  0x000000000a6bb68a in QNativeSocketEngine::connectToHost (this=0xdd215b0, address=..., port=8080) at socket/qnativesocketengine.cpp:541
#4  0x000000000a6b1167 in QAbstractSocketPrivate::_q_connectToNextAddress (this=0xdd3e490) at socket/qabstractsocket.cpp:1130
#5  0x000000000a6b0ef9 in QAbstractSocketPrivate::_q_startConnecting (this=0xdd3e490, hostInfo=...) at socket/qabstractsocket.cpp:1067
#6  0x000000000a6b2711 in QAbstractSocket::connectToHost (this=0xdd3cb40, hostName=..., port=8080, openMode=..., protocol=QAbstractSocket::AnyIPProtocol)
    at socket/qabstractsocket.cpp:1652
#7  0x0000000009097aa6 in WebCore::SocketStreamHandlePrivate::SocketStreamHandlePrivate (this=0xdd29ba0, streamHandle=0xdd29b00, url=...)
    at platform/network/qt/SocketStreamHandleQt.cpp:70
#8  0x0000000009098cc1 in WebCore::SocketStreamHandle::SocketStreamHandle (this=0xdd29b00, url=..., client=0xdd452b0)
    at platform/network/qt/SocketStreamHandleQt.cpp:190
#9  0x000000000907e2c0 in WebCore::SocketStreamHandle::create (url=..., client=0xdd452b0) at platform/network/qt/SocketStreamHandle.h:58
#10 0x000000000907e91c in WebCore::WebSocketChannel::connect (this=0xdd452b0, url=..., protocol=...) at Modules/websockets/WebSocketChannel.cpp:114
#11 0x000000000907b899 in WebCore::WebSocket::connect (this=0xdd44f60, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:289
#12 0x000000000907abf7 in WebCore::WebSocket::create (context=0xdcc3830, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:186
#13 0x000000000907aaba in WebCore::WebSocket::create (context=0xdcc3830, url=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:173

gdb/pselect.txt shows that the program makes use of the pselect() system call, which may or may not be related to the issue, via qt_safe_select in qt/qtbase/src/corelib/kernel/qcore_unix.cpp.

@mbland

This comment has been minimized.

mbland commented Jun 4, 2017

Yet more: I've collected stack traces from connect() breakpoints on native Ubuntu and added them to my repo as gdb/connect-native.txt. Updated analysis from the README follows:

gdb/connect.txt shows the location of the connect() calls leading up to the crash on WSL, and gdb/connect-native.txt shows the connect() calls for a successful run on native Ubuntu. Of note:

  • Whereas WSL spawns one connection from clone(), native Ubuntu spawns two.
  • Both spawn a connection for a WebSocket.
  • Native Ubuntu appears to spawn one more connection from an incoming event,
    while WSL appears to crash while spawining the previous connection (see
    gdb/stack-trace.txt).
  • The WSL connect() and crash stacks begin to diverge in QAbstractSocketPrivate::_q_connectToNextAddress:
    • The connect() stack points to socket/qabstractsocket.cpp:1130
    • The crash stack points to socket/qabstractsocket.cpp:1132

Top of the connect() stack before the crash:

#0  connect () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000a6bdebd in qt_safe_connect (sockfd=8, addr=0x7ffffffdb350, addrlen=16)
    at ../../include/QtNetwork/5.5.1/QtNetwork/private/../../../../../src/network/socket/qnet_unix_p.h:141
#2  0x000000000a6be856 in QNativeSocketEnginePrivate::nativeConnect (this=0xdcc6310, addr=..., port=8080) at socket/qnativesocketengine_unix.cpp:389
#3  0x000000000a6bb68a in QNativeSocketEngine::connectToHost (this=0xdd215b0, address=..., port=8080) at socket/qnativesocketengine.cpp:541
#4  0x000000000a6b1167 in QAbstractSocketPrivate::_q_connectToNextAddress (this=0xdd3e490) at socket/qabstractsocket.cpp:1130
#5  0x000000000a6b0ef9 in QAbstractSocketPrivate::_q_startConnecting (this=0xdd3e490, hostInfo=...) at socket/qabstractsocket.cpp:1067
#6  0x000000000a6b2711 in QAbstractSocket::connectToHost (this=0xdd3cb40, hostName=..., port=8080, openMode=..., protocol=QAbstractSocket::AnyIPProtocol)
    at socket/qabstractsocket.cpp:1652
#7  0x0000000009097aa6 in WebCore::SocketStreamHandlePrivate::SocketStreamHandlePrivate (this=0xdd29ba0, streamHandle=0xdd29b00, url=...)
    at platform/network/qt/SocketStreamHandleQt.cpp:70
#8  0x0000000009098cc1 in WebCore::SocketStreamHandle::SocketStreamHandle (this=0xdd29b00, url=..., client=0xdd452b0)
    at platform/network/qt/SocketStreamHandleQt.cpp:190
#9  0x000000000907e2c0 in WebCore::SocketStreamHandle::create (url=..., client=0xdd452b0) at platform/network/qt/SocketStreamHandle.h:58
#10 0x000000000907e91c in WebCore::WebSocketChannel::connect (this=0xdd452b0, url=..., protocol=...) at Modules/websockets/WebSocketChannel.cpp:114
#11 0x000000000907b899 in WebCore::WebSocket::connect (this=0xdd44f60, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:289

Top of the stack trace from the crash:

#0  0x000000000a1edf70 in WTFCrash () at wtf/Assertions.cpp:345#1  0x000000000907f702 in WebCore::WebSocketChannel::didOpenSocketStream (this=0xdd396b0, handle=0xdd2dcb0) at Modules/websockets/WebSocketChannel.cpp:257
#2  0x0000000009098158 in WebCore::SocketStreamHandlePrivate::socketConnected (this=0xdd2dd50) at platform/network/qt/SocketStreamHandleQt.cpp:107
#3  0x000000000909931d in WebCore::SocketStreamHandlePrivate::qt_static_metacall (_o=0xdd2dd50, _c=QMetaObject::InvokeMetaMethod, _id=0, _a=0x7ffffffdb360)
    at .moc/moc_SocketStreamHandlePrivate.cpp:108
#4  0x000000000a94484e in QMetaObject::activate (sender=0xdce5270, signalOffset=7, local_signal_index=1, argv=0x0) at kernel/qobject.cpp:3713
#5  0x000000000a94403e in QMetaObject::activate (sender=0xdce5270, m=0xd731240 <QAbstractSocket::staticMetaObject>, local_signal_index=1, argv=0x0)
    at kernel/qobject.cpp:3578
#6  0x000000000a6b5b23 in QAbstractSocket::connected (this=0xdce5270) at .moc/moc_qabstractsocket.cpp:367
#7  0x000000000a6b1a10 in QAbstractSocketPrivate::fetchConnectionParameters (this=0xdcb9720) at socket/qabstractsocket.cpp:1321
#8  0x000000000a6b1177 in QAbstractSocketPrivate::_q_connectToNextAddress (this=0xdcb9720) at socket/qabstractsocket.cpp:1132
#9  0x000000000a6b0ef9 in QAbstractSocketPrivate::_q_startConnecting (this=0xdcb9720, hostInfo=...) at socket/qabstractsocket.cpp:1067
#10 0x000000000a6b2711 in QAbstractSocket::connectToHost (this=0xdce5270, hostName=..., port=8080, openMode=..., protocol=QAbstractSocket::AnyIPProtocol)
    at socket/qabstractsocket.cpp:1652
#11 0x0000000009097aa6 in WebCore::SocketStreamHandlePrivate::SocketStreamHandlePrivate (this=0xdd2dd50, streamHandle=0xdd2dcb0, url=...)
    at platform/network/qt/SocketStreamHandleQt.cpp:70
#12 0x0000000009098cc1 in WebCore::SocketStreamHandle::SocketStreamHandle (this=0xdd2dcb0, url=..., client=0xdd396b0)
    at platform/network/qt/SocketStreamHandleQt.cpp:190
#13 0x000000000907e2c0 in WebCore::SocketStreamHandle::create (url=..., client=0xdd396b0) at platform/network/qt/SocketStreamHandle.h:58
#14 0x000000000907e91c in WebCore::WebSocketChannel::connect (this=0xdd396b0, url=..., protocol=...) at Modules/websockets/WebSocketChannel.cpp:114
#15 0x000000000907b899 in WebCore::WebSocket::connect (this=0xdd20ae0, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:289
@mbland

This comment has been minimized.

mbland commented Jun 4, 2017

Musing some more on the fact the following appears in logs/Ubuntu-17.04-debug/strace.log, and not in the crashing WSL log:

connect(8, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16)
  = -1 EINPROGRESS (Operation now in progress)
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
pselect6(9, [3], [8], [], {0, 0}, {NULL, 8}) = 2 (in [3], out [8], left {0, 0})
read(3, "\1\0\0\0\0\0\0\0", 8)          = 8

I did a quick search for "WSL EINPROGRESS" and found #1584. Of particular interest is the comment that all connect() calls are nonblocking:

WSL currently does not support non-blocking connect. All connect request are treated as blocking.

Indeed, our WebSockets appear to be nonblocking:

socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8

Will dig deeper on the suspicion that the lack of EINPROGRESS support may be wreaking havoc with the low-level QT socket libs.

@mbland

This comment has been minimized.

mbland commented Jun 4, 2017

OK, pretty sure I nailed it: it is the lack of nonblocking connect() that's causing the crash. What follows is my analysis from the "Culprit" section of my demo repo.

cc: @sunilmut @therealkenc

At this point I'm reasonably certain that the failure is due to the present lack of support for nonblocking connect() calls on WSL (#1584) throwing off the low-level QT socket libraries. The presence of pselect() calls in the successful, native Ubuntu strace logs was a hint: nonblocking connect() calls typically return EINPROGRESS, after which clients should use select() (or pselect()) to wait for the socket to be writable.

Compare the system calls from logs/Ubuntu-17.04-debug/strace.log:

socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
setsockopt(8, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
pselect6(9, [3], [8], [], {0, 0}, {NULL, 8}) = 2 (in [3], out [8], left {0, 0})
read(3, "\1\0\0\0\0\0\0\0", 8)          = 8
connect(8, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(8, {sa_family=AF_INET, sin_port=htons(53438), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getpeername(8, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getsockopt(8, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
write(3, "\1\0\0\0\0\0\0\0", 8)         = 8
write(1, "STATUS: success\n", 16STATUS: success
)       = 16

to those from logs/wsl-debug/strace.log:

socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
setsockopt(8, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(8, {sa_family=AF_INET, sin_port=htons(62351), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getpeername(8, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getsockopt(8, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
write(2, "ASSERTION FAILED: handle == m_ha"..., 37ASSERTION FAILED: handle == m_handle
) = 37
write(2, "Modules/websockets/WebSocketChan"..., 137Modules/websockets/WebSocketChannel.cpp(257) : virtual void WebCore::WebSocketChannel::didOpenSocketStream(WebCore::SocketStreamHandle*)
) = 137

...snip...

--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xbbadbeef} ---
write(2, "PhantomJS has crashed. Please re"..., 127PhantomJS has crashed. Please read the bug reporting guide at
<http://phantomjs.org/bug-reporting.html> and file a bug report.
) = 127
tgkill(25568, 25568, SIGSEGV)           = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=25568, si_uid=1000} ---
+++ killed by SIGSEGV (core dumped) +++
.../wsl-websocket-crash-demo/run-demo: line 79: 25566 Segmentation fault      (core dumped) "${PHANTOMJS_CMD[@]}"

We'll piece together three related stack traces from gdb/socket.txt, gdb/connect.txt, and gdb/stack-trace.txt to demonstrate how this series of system calls were generated and led to the crash. All of them come from a call to WebCore::WebSocket::create, which results in a call to QAbstractSocket::connectToHost:

#10 0x000000000a6b2711 in QAbstractSocket::connectToHost (this=0xdce5270, hostName=..., port=8080, openMode=..., protocol=QAbstractSocket::AnyIPProtocol)
    at socket/qabstractsocket.cpp:1652
#11 0x0000000009097aa6 in WebCore::SocketStreamHandlePrivate::SocketStreamHandlePrivate (this=0xdd2dd50, streamHandle=0xdd2dcb0, url=...)
    at platform/network/qt/SocketStreamHandleQt.cpp:70
#12 0x0000000009098cc1 in WebCore::SocketStreamHandle::SocketStreamHandle (this=0xdd2dcb0, url=..., client=0xdd396b0)
    at platform/network/qt/SocketStreamHandleQt.cpp:190
#13 0x000000000907e2c0 in WebCore::SocketStreamHandle::create (url=..., client=0xdd396b0) at platform/network/qt/SocketStreamHandle.h:58
#14 0x000000000907e91c in WebCore::WebSocketChannel::connect (this=0xdd396b0, url=..., protocol=...) at Modules/websockets/WebSocketChannel.cpp:114
#15 0x000000000907b899 in WebCore::WebSocket::connect (this=0xdd20ae0, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:289
#16 0x000000000907abf7 in WebCore::WebSocket::create (context=0xdcc3610, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:186
#17 0x000000000907aaba in WebCore::WebSocket::create (context=0xdcc3610, url=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:173

The paths to QT source files that follow are relative to the root of the PhantomJS repo at tag 2.1.1 after running python ./build.py -d. The common starting point for each stack trace is this line from QAbstractSocket::connectToHost at src/qt/qtbase/src/network/socket/qabstractsocket.cpp:1652:

        d->_q_startConnecting(info);

The stack trace for the WSL socket() call (gdb/socket.txt) file shows that at this point, the the underlying socket is created by a call to QNativeSocketEnginePrivate::createNewSocket at src/qt/qtbase/src/network/socket/qnativesocketengine_unix.cpp:219. This line creates a socket with the nonblocking option hardcoded:

  int socket = qt_safe_socket(protocol, type, 0, O_NONBLOCK);

The stack trace for the WSL connect() call (gdb/connect.txt) shows a call to QAbstractSocketPrivate::_q_connectToNextAddress at src/qt/qtbase/src/network/socket/qabstractsocket.cpp:1130, which is the beginnning of this block:

        // Tries to connect to the address. If it succeeds immediately
        // (localhost address on BSD or any UDP connect), emit
        // connected() and return.
        if (socketEngine->connectToHost(host, port)) {
            //_q_testConnection();
            fetchConnectionParameters();
            return;
        }

This QNativeSocketEngine::connectToHost call (note: different from the QAbstractSocket::connectToHost call higher in the stack) in turn calls QNativeSocketEnginePrivate::nativeConnect at src/qt/qtbase/src/network/socket/qnativesocketengine_unix.cpp:389, which points to a call to qt_safe_connect:

    int connectResult = qt_safe_connect(socketDescriptor, sockAddrPtr, sockAddrSize);
#if defined (QNATIVESOCKETENGINE_DEBUG)
    int ecopy = errno;
#endif
    if (connectResult == -1) {
        switch (errno) {
        // ...snip ...
        case EINPROGRESS:
        case EALREADY:
            setError(QAbstractSocket::UnfinishedSocketOperationError,
InvalidSocketErrorString);
            socketState = QAbstractSocket::ConnectingState;
            break;
        // ...snip...
        default:
            break;
        }

        if (socketState != QAbstractSocket::ConnectedState) {
            // ...snip...
            return false;
        }
    }

    // ...snip...

    socketState = QAbstractSocket::ConnectedState;
    return true;

If the underlying connect() and qt_safe_connect had returned EINPROGRESS, the correct socketState would've been set, NativeSocketEnginePrivate::nativeConnect would've returned false, and QAbstractSocketPrivate::_q_connectToNextAddress would've set up the pselect() notification. Instead, connect() returns zero and the function returns true, leading to the QAbstractSocketPrivate::fetchConnectionParameters call at src/qt/qtbase/src/network/socket/qabstractsocket.cpp:1132:

        // Tries to connect to the address. If it succeeds immediately
        // (localhost address on BSD or any UDP connect), emit
        // connected() and return.
        if (socketEngine->connectToHost(host, port)) {
            //_q_testConnection();
            fetchConnectionParameters();
            return;
        }

This in turn produces our final stack trace:

#0  0x000000000a1edf70 in WTFCrash () at wtf/Assertions.cpp:345#1  0x000000000907f702 in WebCore::WebSocketChannel::didOpenSocketStream (this=0xdd396b0, handle=0xdd2dcb0) at Modules/websockets/WebSocketChannel.cpp:257
#2  0x0000000009098158 in WebCore::SocketStreamHandlePrivate::socketConnected (this=0xdd2dd50) at platform/network/qt/SocketStreamHandleQt.cpp:107
#3  0x000000000909931d in WebCore::SocketStreamHandlePrivate::qt_static_metacall (_o=0xdd2dd50, _c=QMetaObject::InvokeMetaMethod, _id=0, _a=0x7ffffffdb360)
    at .moc/moc_SocketStreamHandlePrivate.cpp:108
#4  0x000000000a94484e in QMetaObject::activate (sender=0xdce5270, signalOffset=7, local_signal_index=1, argv=0x0) at kernel/qobject.cpp:3713
#5  0x000000000a94403e in QMetaObject::activate (sender=0xdce5270, m=0xd731240 <QAbstractSocket::staticMetaObject>, local_signal_index=1, argv=0x0)
    at kernel/qobject.cpp:3578
#6  0x000000000a6b5b23 in QAbstractSocket::connected (this=0xdce5270) at .moc/moc_qabstractsocket.cpp:367
#7  0x000000000a6b1a10 in QAbstractSocketPrivate::fetchConnectionParameters (this=0xdcb9720) at socket/qabstractsocket.cpp:1321
#8  0x000000000a6b1177 in QAbstractSocketPrivate::_q_connectToNextAddress (this=0xdcb9720) at socket/qabstractsocket.cpp:1132
#9  0x000000000a6b0ef9 in QAbstractSocketPrivate::_q_startConnecting (this=0xdcb9720, hostInfo=...) at socket/qabstractsocket.cpp:1067
#10 0x000000000a6b2711 in QAbstractSocket::connectToHost (this=0xdce5270, hostName=..., port=8080, openMode=..., protocol=QAbstractSocket::AnyIPProtocol)
    at socket/qabstractsocket.cpp:1652
#11 0x0000000009097aa6 in WebCore::SocketStreamHandlePrivate::SocketStreamHandlePrivate (this=0xdd2dd50, streamHandle=0xdd2dcb0, url=...)
    at platform/network/qt/SocketStreamHandleQt.cpp:70
#12 0x0000000009098cc1 in WebCore::SocketStreamHandle::SocketStreamHandle (this=0xdd2dcb0, url=..., client=0xdd396b0)
    at platform/network/qt/SocketStreamHandleQt.cpp:190
#13 0x000000000907e2c0 in WebCore::SocketStreamHandle::create (url=..., client=0xdd396b0) at platform/network/qt/SocketStreamHandle.h:58
#14 0x000000000907e91c in WebCore::WebSocketChannel::connect (this=0xdd396b0, url=..., protocol=...) at Modules/websockets/WebSocketChannel.cpp:114
#15 0x000000000907b899 in WebCore::WebSocket::connect (this=0xdd20ae0, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:289
#16 0x000000000907abf7 in WebCore::WebSocket::create (context=0xdcc3610, url=..., protocols=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:186
#17 0x000000000907aaba in WebCore::WebSocket::create (context=0xdcc3610, url=..., ec=@0x7ffffffdb8cc: 0) at Modules/websockets/WebSocket.cpp:173

The assertion in WebCore::WebSocketChannel::didOpenSocketStream at src/qt/qtwebkit/Source/WebCore/Modules/websockets/WebSocketChannel.cpp:257 is:

void WebSocketChannel::didOpenSocketStream(SocketStreamHandle* handle)
{
    LOG(Network, "WebSocketChannel %p didOpenSocketStream()", this);
    ASSERT(handle == m_handle);

which under further inspection in GDB reveals:

(gdb) up
#1  0x000000000907f702 in WebCore::WebSocketChannel::didOpenSocketStream (this=0xdce4b90, handle=0xdcc0a50) at Modules/websockets/WebSocketChannel.cpp:257
257     Modules/websockets/WebSocketChannel.cpp: No such file or directory.
(gdb) print handle
$1 = (WebCore::SocketStreamHandle *) 0xdcc0a50
(gdb) print m_handle
$2 = {m_ptr = 0x0}
@sunilmut

This comment has been minimized.

Member

sunilmut commented Jun 6, 2017

@mbland (and @therealkenc) great debugging and details. Thanks for digging to the bottom of this. Yes, non-blocking connect is definitely in our backlog. So, far, it (the lack of non-blocking connect) was only causing marginal problems (such as warnings) etc. But, with this it seems like we are getting into functional issues without proper support for non-blocking connect. We will try to prioritize this. Marking this as dupe of #1584. Thanks again for the great investigation!

@bmayen

This comment has been minimized.

bmayen commented Jun 7, 2017

Amazing work everybody!

@kayakyakr

This comment has been minimized.

kayakyakr commented Jun 27, 2017

Good to find this. It's the first blocker I've run into in about 2 months since switching to using WSL for dev full time.

I mean, aside from vi being broken

@sunilmut

This comment has been minimized.

Member

sunilmut commented Aug 2, 2017

Just an update that non-blocking connect is on it's way and will also be available on Fall Creators Update.

@Bradleycorn

This comment has been minimized.

Bradleycorn commented Oct 10, 2017

@sunilmut any update on this? Looks like the Fall Creators Update is due to land a week from today. Is the fix for this issue ready?

@benhillis

This comment has been minimized.

Member

benhillis commented Oct 10, 2017

@Bradleycorn - As Sunil said above, this fix will be in Fall Creator's Update.

@Bradleycorn

This comment has been minimized.

Bradleycorn commented Oct 10, 2017

Awesome thanks @benhillis. The "also" in @sunilmut's comment threw me off. I was thinking a specific fix for this might have been released on it's own. I'll watch for the Creators Update.

@Bradleycorn

This comment has been minimized.

Bradleycorn commented Oct 17, 2017

I have the Fall Creators Update, and I'm still seeing this issue ... @benhillis @sunilmut anything we need to do to get non-blocking connect working after installing the update?

@mbland

This comment has been minimized.

mbland commented Oct 20, 2017

@Bradleycorn Are you sure you've still got the issue, or that you've actually got the FCU installed? Here's what I see in my WSL environment after installing the FCU (not sure what it was before):

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

$ uname -a
Linux win81 4.4.0-43-Microsoft #1-Microsoft Wed Dec 31 14:42:53 PST 2014 x86_64 x86_64 x86_64 GNU/Linux

The reason I ask is becuase I'm happy to report that this is what I see running in my mbland/custom-links project at commit 2fa5a50 (after having to do an explicit npm install phantomjs-prebuilt):

$ karma start
19 10 2017 21:35:54.966:INFO [framework.browserify]: registering rebuild (autoWatch=true)
19 10 2017 21:35:55.018:INFO [framework.detect-browsers]: The following browsers were detected on your system: [ 'PhantomJS' ]
19 10 2017 21:35:55.956:INFO [framework.browserify]: 55328 bytes written (0.18 seconds)
19 10 2017 21:35:55.957:INFO [framework.browserify]: bundle built
19 10 2017 21:35:55.959:WARN [karma]: No captured browser, open http://localhost:9876/
19 10 2017 21:35:55.969:INFO [karma]: Karma v1.7.0 server started at http://0.0.0.0:9876/
19 10 2017 21:35:55.969:INFO [launcher]: Launching browser PhantomJS with unlimited concurrency
19 10 2017 21:35:55.984:INFO [launcher]: Starting browser PhantomJS
19 10 2017 21:35:56.381:INFO [PhantomJS 2.1.1 (Linux 0.0.0)]: Connected on socket wTYvEJKagY0MDC0VAAAA with id 57737398

START:
  Custom Links
    showView
      ✔ does not show any view until called
      ✔ shows the landing page view when no other view set
      ✔ shows the landing page view when the hash ID is empty
      ✔ shows the landing page view when the ID is a hash only
      ✔ doesn't change the view when the hash ID is unknown
      ✔ passes the hash view parameter to the view function
      ✔ calls the done() callback if present
      ✔ shows the landing view when the container isn't empty
      ✔ logs to console.error without changing the view on error
...
[...snip...]
...
    completeEditLinkView
      ✔ fills in the link information
      the target URL form
        ✔ sets the active element to the target field
        ✔ sets the target URL value to the existing target
        ✔ selects the entire target URL value
        ✔ assigns a submit handler to call cl.updateTarget
      the change owner form
        ✔ assigns a submit handler to call cl.changeOwner

Finished in 1.098 secs / 0.578 secs @ 21:35:57 GMT-0400 (DST)

SUMMARY:
✔ 123 tests completed

Perhaps @sunilmut and @benhillis may want others to confirm as well, but from my perspective, this and #1584 can be closed. :-)

@sunilmut

This comment has been minimized.

Member

sunilmut commented Oct 30, 2017

@Bradleycorn - Could you share the output of the ver Windows command?

@sunilmut

This comment has been minimized.

Member

sunilmut commented Oct 30, 2017

@mbland - Thanks for the validation. This issue should be resolved. Once @Bradleycorn issue is resolved, I will close this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment