Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upcMojangAPI segmentation faults and asserts #2830
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
@Dinnerbone could you give us a hint on this? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 28, 2015
Member
That wasn't nice, there's a problem in our code, no need to bother Mojang people about it.
We clearly have a race condition in the networking code.
|
That wasn't nice, there's a problem in our code, no need to bother Mojang people about it. We clearly have a race condition in the networking code. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 28, 2015
Contributor
Looked with clang thread sanitizer. There is a data race on variable m_IsConnected BlockingSslClientSocket.cpp:341 (write) and BlockingSslClientSocket.cpp:255 (read). Not sure if that is the problem.
|
Looked with clang thread sanitizer. There is a data race on variable |
madmaxoft
self-assigned this
Dec 28, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tigerw
Dec 28, 2015
Member
Xoft, please try not to add too many queues and locks, I don't like queues and locks... :P
On 28 Dec 2015, at 08:27, "Lukas Pioch" notifications@github.com wrote:
Looked with clang thread sanitizer. There is a data race on variable m_IsConnected BlockingSslClientSocket.cpp:341 (write) and BlockingSslClientSocket.cpp:255 (read). Not sure if that is the problem.
—
Reply to this email directly or view it on GitHub.
|
Xoft, please try not to add too many queues and locks, I don't like queues and locks... :P On 28 Dec 2015, at 08:27, "Lukas Pioch" notifications@github.com wrote:
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
I made a joke... just a joke |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Just make it an |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 28, 2015
Contributor
Yes used atomic and fixed then another data race, but not the cause of this problem.
|
Yes used atomic and fixed then another data race, but not the cause of this problem. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 28, 2015
Member
I can't really pinpoint this, because I'm not getting a crash at all, no matter what I do. Has anyone run into this under Windows, or is it Linux- / Mac- specific?
I've identified one possible cause of trouble, m_IsConnected being set to true two times. And I'll make it an atomic bool, that can't hurt.
|
I can't really pinpoint this, because I'm not getting a crash at all, no matter what I do. Has anyone run into this under Windows, or is it Linux- / Mac- specific? I've identified one possible cause of trouble, |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 28, 2015
Member
I've added a uuid command to the Debuggers plugin that queries the cMojangAPI for players' UUIDs. It contacts the Mojang servers each time an unknown name is queried, so using it on some non-existent username provides a simple way of testing the cBlockingSslClientSocket implementation. Can anyone verify that it indeed invokes the crash on their side?
|
I've added a |
madmaxoft
referenced this issue
Dec 28, 2015
Merged
Changed m_IsConnected to an atomic bool, added logging. #2831
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tigerw
Dec 28, 2015
Member
You can still get a thread switch after resetting m_Server but before setting m_IsConnected though.
PS: I could only reproduce the crash with some manual thread delaying using cEvent and this_thread::sleep
|
You can still get a thread switch after resetting m_Server but before setting m_IsConnected though. PS: I could only reproduce the crash with some manual thread delaying using cEvent and this_thread::sleep |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 28, 2015
Member
That's why I swapped them in #2831. It should be okay if a thread sees m_IsConnected as false but m_Server still assigned.
Can you retry those tests that did break on the #2831 code?
I'm currently compiling on my Linux machine to see if I get any failures from the original code, then I'll test the new one, too.
|
That's why I swapped them in #2831. It should be okay if a thread sees Can you retry those tests that did break on the #2831 code? I'm currently compiling on my Linux machine to see if I get any failures from the original code, then I'll test the new one, too. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 28, 2015
Contributor
I changed m_TicksSinceLastPacket in ClientHandle.h to use atomic. Currently I get the assertion error or a null access to m_Socket in BlockingSslClientSocket.cpp:242.
With the command uuid, I have no crashes / problems.
|
I changed With the command uuid, I have no crashes / problems. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 28, 2015
Member
@Seadragon91 are you using the uuid command with a non-existent playername?
|
@Seadragon91 are you using the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 28, 2015
Contributor
Tested the uuid command with existent and non-existent player names. No crashes under windows and linux. But under windows I get messages from this issue #2418. I don't get them under linux.
|
Tested the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
psilohanf
Dec 29, 2015
I am experiencing this every single time even with the newest revision of cuberite. My system is: Linux vs1191233.vserver.de 2.6.32-042stab111.12 #1 SMP Thu Sep 17 11:38:20 MSK 2015 x86_64 GNU/Linux (Debian GNU/Linux 8)
As soon as a user logs in, a segmentation fault happens...
psilohanf
commented
Dec 29, 2015
|
I am experiencing this every single time even with the newest revision of cuberite. My system is: |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lstep
Dec 29, 2015
On Ubuntu, just cloned the latest revision, and compiled with debug activated. On my first run, just launching it (no client connected), I got the error:
[d68e6fdd5b532615|14:48:03] Cuberite local build build id: Unknown
[d68e6fdd5b532615|14:48:03] from commit id: approx: 7a778e291139624ad0733103d57b6e816c5a9ccc built at: approx: 2015-12-28 11:08:46 +0100
[d68e6fdd5b532615|14:48:03] Creating new server instance...
[...]
[8a0bca1c3ee8bb7a|14:48:04] Assertion failed: m_IsConnected, file /tmp/Cuberite2/cuberite/src/PolarSSL++/BlockingSslClientSocket.cpp, line 219
[4c7b68d3907bc990|14:48:04] BioGenCache: 8 hits, 2 misses, saved 80.00 %
[4c7b68d3907bc990|14:48:04] BioGenCache: Avg cache chain length: 0.12
./Cuberite_debug(_Z15PrintStackTracev+0x23)[0xadb417]
./Cuberite_debug(_ZN24cBlockingSslClientSocket7ReceiveEPvm+0x46)[0xdcef7a]
./Cuberite_debug(_ZN10cMojangAPI13SecureRequestERKSsS1_RSs+0x168)[0xc150de]
./Cuberite_debug(_ZN10cMojangAPI18QueryUUIDToProfileERKSs+0x215)[0xc17935]
./Cuberite_debug(_ZN10cMojangAPI6UpdateEv+0x2e8)[0xc18594]
./Cuberite_debug(_ZN10cMojangAPI13cUpdateThread7ExecuteEv+0x1f)[0xc18bf5]
./Cuberite_debug(_ZN9cIsThread9DoExecuteEv+0x2f)[0xad10f1]
./Cuberite_debug(_ZNKSt7_Mem_fnIM9cIsThreadFvvEEclIIEvEEvPS0_DpOT_+0x65)[0xad265d]
./Cuberite_debug(_ZNSt12_Bind_simpleIFSt7_Mem_fnIM9cIsThreadFvvEEPS1_EE9_M_invokeIILm0EEEEvSt12_Index_tupleIIXspT_EEE+0x43)[0xad25ad]
./Cuberite_debug(_ZNSt12_Bind_simpleIFSt7_Mem_fnIM9cIsThreadFvvEEPS1_EEclEv+0x1b)[0xad24b5]
./Cuberite_debug(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIM9cIsThreadFvvEEPS3_EEE6_M_runEv+0x1c)[0xad244e]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d390)[0x7f0073666390]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f00738ee182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0072df147d]
Cuberite_debug: /tmp/Cuberite2/cuberite/src/PolarSSL++/BlockingSslClientSocket.cpp:219: int cBlockingSslClientSocket::Receive(void*, size_t): Assertion `0' failed.
But now I can't generate it anymore.
lstep
commented
Dec 29, 2015
|
On Ubuntu, just cloned the latest revision, and compiled with debug activated. On my first run, just launching it (no client connected), I got the error:
But now I can't generate it anymore. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 29, 2015
Member
@lstep how about the uuid command with a non-existent player name, does that produce the assert for you? (note: you need to load the Debuggers plugin to have that command)
|
@lstep how about the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
lstep
Dec 29, 2015
Nope, no assert generated (first one, lstep, is me, second one, nosuchuser1434234, doesn't exist)
uuid lstep
[770a5bdf600590d7|15:54:41] Executing console command: "uuid lstep"
[770a5bdf600590d7|15:54:41] Player lstep:
[770a5bdf600590d7|15:54:41] - in the cache: ""
[770a5bdf600590d7|15:54:41] - online: "6375d80ec44c43028c598ab0e6d7fb8f"
uuid nosuchuser1434234
[770a5bdf600590d7|15:54:51] Executing console command: "uuid nosuchuser1434234"
[770a5bdf600590d7|15:54:51] Player nosuchuser1434234:
[770a5bdf600590d7|15:54:51] - in the cache: ""
[770a5bdf600590d7|15:54:52] - online: ""
lstep
commented
Dec 29, 2015
|
Nope, no assert generated (first one, lstep, is me, second one, nosuchuser1434234, doesn't exist)
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 29, 2015
Member
I'm baffled. It uses the same underlying function, cMojangAPI::SecureRequest, yet it fails in one case and doesn't in the other.
|
I'm baffled. It uses the same underlying function, |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 29, 2015
Contributor
Not sure if that is something.
https://gist.github.com/Seadragon91/c6f3c9aa28e0d9fb7e73
Ops: I have much more data races (cMojangAPI.cpp). Will create a gist, with the whole log file.
|
Not sure if that is something. Ops: I have much more data races (cMojangAPI.cpp). Will create a gist, with the whole log file. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 29, 2015
Contributor
Here is a whole file created with the clang threadsanitizer.
https://gist.github.com/Seadragon91/df3edf98b9aabc0a8ba5
|
Here is a whole file created with the clang threadsanitizer. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
worktycho
Dec 29, 2015
Member
@Seadragon91 What c++ standard lib are you using, because that looks like a race condition on shared_ptr ref counts?
|
@Seadragon91 What c++ standard lib are you using, because that looks like a race condition on shared_ptr ref counts? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
worktycho
Dec 29, 2015
Member
Is there any possiblity those race conditions are related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57507?
|
Is there any possiblity those race conditions are related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57507? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Seadragon91
Dec 29, 2015
Contributor
Yes, I think this is a false positive. Will look to use a newer version of clang. Have a few problems with linking. Currently using clang 3.6.
|
Yes, I think this is a false positive. Will look to use a newer version of clang. Have a few problems with linking. Currently using clang 3.6. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tigerw
Dec 29, 2015
Member
The assertion in debug builds is due to our apparently mistaken assumption that, if the server has closed the connection, no more data is forthcoming; LibEvent obviously says otherwise.
The segmentation fault in release builds is due to Mojang closing the connection sooner, exploiting, ahem, a threading problem.
|
The assertion in debug builds is due to our apparently mistaken assumption that, if the server has closed the connection, no more data is forthcoming; LibEvent obviously says otherwise. The segmentation fault in release builds is due to Mojang closing the connection sooner, exploiting, ahem, a threading problem. |
madmaxoft
changed the title from
Mojang sabotaged us :P (cMojangAPI segmentation faults)
to
cMojangAPI segmentation faults and asserts
Dec 29, 2015
madmaxoft
removed
status/needs confirmation
type/question
labels
Dec 29, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
madmaxoft
Dec 29, 2015
Member
@lstep, are you compiling your own executable? Could you try getting branch BeforeLibEventUpdate (don't forget to git submodule update), build and test it? I'm trying to find the cause of the sudden rise of this issue - whether we could attribute it to a change in LibEvent.
|
@lstep, are you compiling your own executable? Could you try getting branch |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
#2831 seems to have fixed this. |
tigerw commentedDec 27, 2015
We are currently experiencing technical difficulties with regards to connections to Mojang; references are as follows:
Potential analysis:
Problem 1
Mojang made their servers close the connection quicker. This means that
cBlockingSslClientSocket::OnDisconnectedis often called whilst cMojangAPI is still reading data from the SSL context (whatever that is :P). In Debug builds, this causes further calls tocBlockingSslClientSocket::Receiveto assert thatm_IsConnectedis unexpectedly false. This is known.Problem 2
OnDisconnectedruns on a separate thread from the thread cMojangAPI callsReceivefrom.OnDisconnectalso will reset them_Serverpointer of the above class. Thus, in Release builds, I suspect, but am not sure, that a concurrency issue occurs where afterm_Serverhas been reset but beforem_IsConnectedhas been set to false byOnDisconnect,cBlockingSslClientSocket::Disconnectis called automatically by the destruction of thecBlockingSslClientSocketobject (because cMojangAPI has completed its work super quickly, yeah) which attempts to dereference them_Serverto callcTCPLink::Close, but crashes sincem_Serverwas already reset.