Fix the bug that the MSYS-less client couldn't
connect to the freshly started server and had to
be started again.
Fixes #2672
See #2107
--
PiperOrigin-RevId: 150069285
MOS_MIGRATED_REVID=150069285
Add a missing header inclusion to string_win32.c.
The omission resulted in an implicit function
declaration with the wrong, but compatible type,
so a char* was converted to int and then back to
char*, corrupting the upper 32 bits and leading
to a segfault.
Fixes #2672
Change-Id: I805737c93c248f792b2c0f54fe15ab9a261575d2
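The conversion chain in the commit message above (char* to int and back to char*) can be modelled with plain integers. This is a minimal sketch, not code from gRPC or Bazel: the names `demo_pointer_through_int` and `demo_survives_round_trip` are hypothetical, and it assumes a 64-bit target where `int` is 32 bits and the caller widens the `int` back to pointer size by sign extension.

```c
#include <stdint.h>

/* Hypothetical helper (not part of gRPC): models a 64-bit pointer value
 * passing through the 32-bit int return value of an implicitly declared
 * function. */
uint64_t demo_pointer_through_int(uint64_t pointer_bits) {
    /* Only the low 32 bits fit into the int return value. */
    uint64_t low = pointer_bits & UINT64_C(0xFFFFFFFF);

    /* Assumption: the caller sign-extends the int when it converts the
     * value back to a 64-bit pointer. */
    if (low & UINT64_C(0x80000000)) {
        return low | UINT64_C(0xFFFFFFFF00000000);
    }
    return low;
}

/* Returns 1 if the pointer value survives the round trip unchanged. */
int demo_survives_round_trip(uint64_t pointer_bits) {
    return demo_pointer_through_int(pointer_bits) == pointer_bits;
}
```

Under these assumptions, a value such as 0x000001F4A2B3C4D0 comes back as 0xFFFFFFFFA2B3C4D0, while a value that fits in 31 bits comes back unchanged. That second case is one way a bug of this kind can stay hidden on some platforms.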
The MSVC (MSYS-less) version of Bazel crashes if you run it in non-batch mode and the server isn't running.
The culprit is pretty complex and took days to debug. It consists of the following parts:
I narrowed down the problem to the Bazel client trying to connect to the non-running server, failing to do so, and trying to print an error. The printing code was segfaulting.
The printing logic attempted to translate the `WSAGetLastError()` result to a string. This called `gpr_format_message`, which crashed while calling `gpr_tchar_to_char`. We built non-Unicode, so `gpr_tchar_to_char` is implemented as just a call to `gpr_strdup`.

On the surface, everything looks fine: `gpr_strdup` returns `char*`, `gpr_tchar_to_char` returns that, and `gpr_format_message` stores it in a `char*` as well. However, the two pointers (`dst` at the return site, `message` at the call site) were consistently different.

It turns out `gpr_strdup` was not declared before it was used. Thanks to the ancient C feature of implicit function declaration (which can be turned into an error with `/we4013`), the compiler implicitly declared it with an `int` return type, which is 4 bytes long on MSVC, while `char*` is 8 bytes long in a 64-bit build. So the upper 32 bits of `message` were corrupt.

So, due to a missing header inclusion, we ended up implicitly declaring a function with the wrong yet compatible signature, which corrupted the returned pointer and caused a segfault down the line.

We had no chance of catching it because the BUILD file of gRPC contains a `-Wno-implicit-function-declaration` line. @lberki once patched this in d699a28, but we overwrote that in the next commit, 8b3b918, and never noticed. Everything kept working with gcc, although not because of the size of `int`: gcc's `int` is also 4 bytes on x86-64, so the truncation was presumably just harmless there by luck. And even if we had noticed, the MSVC wrapper script ignores that warning.
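The shape of the fix can be sketched as follows: make a declaration with the real return type visible before the first call, so the compiler never falls back to an implicit `int`. The names `demo_strdup` and `demo_tchar_to_char` are hypothetical stand-ins for illustration, not gRPC's actual code. In the real fix the declaration comes from including the appropriate header, not from writing a prototype by hand.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the declaration that the missing header would
 * provide. With it in scope, the call below is known to return char*. */
char *demo_strdup(const char *src);

/* Hypothetical stand-in for a non-Unicode tchar-to-char conversion that
 * simply duplicates the string. */
char *demo_tchar_to_char(const char *input) {
    /* The full pointer is returned; nothing is narrowed to int. */
    return demo_strdup(input);
}

/* The definition appears after its first use, as it could in a separate
 * translation unit. Returns a heap copy the caller must free, or NULL. */
char *demo_strdup(const char *src) {
    size_t len;
    char *dst;

    if (src == NULL) {
        return NULL;
    }
    len = strlen(src) + 1;
    dst = malloc(len);
    if (dst != NULL) {
        memcpy(dst, src, len);
    }
    return dst;
}
```

To keep a missing declaration from slipping through again, the warning can be promoted to an error: `/we4013` on MSVC, or `-Werror=implicit-function-declaration` on gcc.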