-
Notifications
You must be signed in to change notification settings - Fork 10.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect initialization of global arrays that are 2^32 bytes or larger #57353
Comments
just a note, seems like this isnt a problem within clang but llvm codegen, https://godbolt.org/z/dEo36dW43 at least with bad_int, with bad_char it is initializing to zeros. I'll have to investigate deeper because I don't know the codebase yet, but this seems relevant. I don't know actually, you might want to ignore this comment, but that means somehow VectorType is doing something fishy, because all it does (understandably) is extending Type with methods to get the number of elements etc |
See
|
@llvm/issue-subscribers-good-first-issue |
Hello! Thanks for pointing that out, i had to do something else (if you're wondering). Currently I don't have an environment to actually fix this, but if this isn't in a very high priority, can I fix this as I actually set up my machine? |
Sure; this appears to be a longstanding issue, so no rush. |
Thanks! |
See also #54766. |
interestingly, both master and 14.0.6 fail on my machine in the parsing part, and seems like godbolt does too https://godbolt.org/z/59oqh7na7 (this is c++, not c) |
Hello! Today I found both problems. Of course that I just didn't pay attention to where exactly I was pointed to. |
I think this is related to #57317 |
This was originally reported as the bug for C code, not the issue for C++, so it should probably remain the bug for C code. The C++ crash should be a different bug (not sure if there's an existing report for it). |
Of course. Would you be willing to reply to my comment? |
I don't want to discuss the C++ bug too much on the "wrong" bug report. File that report if it doesn't exist, and CC me on it. But probably if we can, we want to split the initializer the same way we end up doing for C code. And in the case where we can't split, we probably run out of memory on most computers, so probably fine to just report an error. |
As an LLVM user, I'd find it clearest and least confusing to just reject any array that exceeds this 4GB size limit earlier on, in semantic analysis/type checking, regardless of how it's initialized. Right now, it looks like This seems appropriate especially because from the discussion on #57317 it looks like multiple spots in the code base assume that array sizes can fit into 32 bits. (I don't mean to extend the discussion of the C++ problem here but I think that fix would apply to both C and C++) |
Yea, on both occasions that made the problems appear (that assumption). And it would fix both |
I suspect enforcing a 32-bit limit on array sizes would break existing code. And "it's too much work to make LLVM clean for 64-bit array sizes" doesn't really feel like a compelling argument; we've only seen a few scattered issues with accidentally truncating 64-bit sizes to 32 bits. |
Makes sense, I agree. |
If you're interested in working on this, patches are welcome. IR generation in Clang will probably have to completely change how we emit constant initialization of uninitialized elements of large arrays in order to produce code with acceptable resource usage here. Right now, we have a special case for fully bitwise-zero initializers, but when that doesn't match we'll need to produce something like That's assuming that the AST is already reasonable, of course. IIRC, the AST doesn't make any effort to use a sparse representation in general, but it does have a special representation for trailing zero initializers. As a result, we'd probably blow out memory if you used a designator to initialize just the billionth element of an array like this. But we can tackle that separately. |
Hello, I have a branch that does compile (correctly) the example above! https://github.com/OfekShochat/llvm-project/blob/big-test/clang/lib/CodeGen/CGExprConstant.cpp#L1227 |
Opened a diff for this issue, it was just a unsigned/uint64_t issue, expectedly. Thanks, https://reviews.llvm.org/D133648 |
Hi @OfekShochat , can I maybe try to work on this issue? |
Current state:
|
This is part of the fixes to address llvm#57353 https://reviews.llvm.org/D133845
This is part of the fixes to address #57353 https://reviews.llvm.org/D133845 Pull Request: #92334
Based on @OfekShochat's https://reviews.llvm.org/D133648 init.c is the primary test for array initialization, but it uses a 32-bit triple. Add the new test to array-init.c instead. Fix llvm#57353
Based on @OfekShochat's https://reviews.llvm.org/D133648 init.c is the primary test for array initialization, but it uses a 32-bit triple, which would lead to an "array is too large" error. Add the new test to array-init.c instead. Fix llvm#57353
Based on @OfekShochat's https://reviews.llvm.org/D133648 init.c is the primary test for array initialization, but it uses a 32-bit triple, which would lead to an "array is too large" error. Add the new test to array-init.c instead. Fix #57353 Pull Request: #92473
@llvm/issue-subscribers-clang-codegen Author: None (nlsandler)
C programs with partially initialized arrays that are larger than 2^32 bytes aren't compiled correctly:
* `char` arrays get initialized entirely to zero, even if they have a non-zero initializer in the original source code
* when initializing arrays of other types, LLVM emits a `.space` or `.zero` directive with a negative byte count, which is rejected by the assembler.
char bad_char[4294967296] = {1};
char ok_char[4294967295u] = {1};
int bad_int[1073741824] = {1};
int ok_int[1073741823] = {1}; Compiling this with .section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15 sdk_version 10, 15
.globl _bad_char ## @<!-- -->bad_char
.zerofill __DATA,__common,_bad_char,4294967296,4
.section __DATA,__data
.globl _ok_char ## @<!-- -->ok_char
.p2align 4
_ok_char:
.byte 1 ## 0x1
.space 4294967294
.globl _bad_int ## @<!-- -->bad_int
.p2align 4
_bad_int:
.long 1 ## 0x1
.space 4294967292
.space -4294967296
.globl _ok_int ## @<!-- -->ok_int
.p2align 4
_ok_int:
.long 1 ## 0x1
.space 4294967288
.subsections_via_symbols
That's from compiling with Clang 11.0, but it looks like the issue exists in the most recent version too: https://godbolt.org/z/eobvhbf7s |
C programs with partially initialized arrays that are larger than 2^32 bytes aren't compiled correctly:
char
arrays get initialized entirely to zero, even if they have a non-zero initializer in the original source code.space
or.zero
directive with a negative byte count, which is rejected by the assembler.Compiling this with
clang -S -arch x86_64 bad_static_arrays.c
produces:That's from compiling with Clang 11.0, but it looks like the issue exists in the most recent version too: https://godbolt.org/z/eobvhbf7s
The text was updated successfully, but these errors were encountered: