Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel client (Windows/MSVC) crashes in server mode #2672

Closed
laszlocsomor opened this issue Mar 13, 2017 · 1 comment
Closed

Bazel client (Windows/MSVC) crashes in server mode #2672

laszlocsomor opened this issue Mar 13, 2017 · 1 comment
Assignees
Labels
P1 I'll work on this now. (Assignee required) platform: windows type: bug

Comments

@laszlocsomor
Copy link
Contributor

laszlocsomor commented Mar 13, 2017

The MSVC version (MSYS-less) bazel crashes if you run it in non-batch mode and the server isn't running.

The culprit is pretty complex and took days to debug. It consists of the following parts:

  1. I narrowed down the problem to the Bazel client trying to connect to the non-running server, failing to do so, and trying to print an error. The printing code was segfaulting.

    The printing logic attempted to translate the WSAGetLastError() result to a string. This was calling gpr_format_message which crashed while calling gpr_tchar_to_char. We built non-unicode, so gpr_tchar_to_char is implemented here as just gpr_strdup, which itself is implemented here.

    On the surface, everything looks fine: gpr_strdup returns char*, gpr_tchar_to_char returns that, and gpr_format_message stores that also in a char*, however the two pointers -- dst on the return site, message on the call site -- were consistently different.

    Turns out gpr_strdup was not declared before using it, and thanks to the ancient C feature of implicit function declaration (which can be turned into an error with /WE4013), the compiler implicitly declared it with int return type, which is apparently 4 bytes long on MSVC, while char* is of course 8 bytes long. So the upper 32 bits of message were corrupt.

    So due to a missing header inclusion, we ended up implicitly declaring a function with the wrong yet compatible signature, causing a corruption of the return value which is a pointer, causing a segfault down the line.

  2. We had no chance of catching it because the BUILD file of gRPC contains -Wnoimplicit-function-declaration line. Once @lberki patched this in d699a28, but we overwrote that in the next commit 8b3b918 and never noticed. All was working fine because gcc's int is 8 bytes long.

  3. And even if we did notice, the MSVC wrapper script ignores that warning.

@laszlocsomor laszlocsomor added platform: windows P1 I'll work on this now. (Assignee required) type: bug labels Mar 13, 2017
@laszlocsomor laszlocsomor self-assigned this Mar 13, 2017
bazel-io pushed a commit that referenced this issue Mar 14, 2017
See #2672

--
Change-Id: Idfbc1841cc4f448939000e58cc9712ab8daa1a2f
Reviewed-on: https://cr.bazel.build/9353
PiperOrigin-RevId: 150045801
MOS_MIGRATED_REVID=150045801
bazel-io pushed a commit that referenced this issue Mar 14, 2017
Fix the bug that the MSYS-less client couldn't
connect to the freshly started server and had to
be started again.

Fixes #2672
See #2107

--
PiperOrigin-RevId: 150069285
MOS_MIGRATED_REVID=150069285
@lberki
Copy link
Contributor

lberki commented Mar 14, 2017

Woahdude. That was one hell of a bug. Nice debugging! :)

bazel-io pushed a commit that referenced this issue Mar 15, 2017
Add a missing header inclusion to string_win32.c
which was resulting in an implicit function
declaration with the wrong, but compatible type,
causing a char* being converted to int, converted
to char*, corrupting the upper 32 bits, leading to
a segfault.

Fixes #2672

Change-Id: I805737c93c248f792b2c0f54fe15ab9a261575d2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) platform: windows type: bug
Projects
None yet
Development

No branches or pull requests

2 participants