New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CStruct bind_attribute can write beyond malloc'd buffer. #1369
base: main
Are you sure you want to change the base?
CStruct bind_attribute can write beyond malloc'd buffer. #1369
Conversation
…rray. `bind_attribute` had been calling `set_ptr_at_offset` for all `CArray`s, whether inlined or not. That writes a pointer at the slot offset. For the corner case of the inlined array being both the last member in the structure and smaller than the size of a pointer (eg `HAS int32 @.b[1] is CArray;` on a 64 bit system), this causes a pointer write beyond the end of the allocated storage. More generally, on a system which faults unaligned pointer writes, this write faults in the more realistic case where the previous structure member(s) don't add up to pointer-sized alignment. This is clearly a bug. It seems that the write to the slot is not needed in this case. This commit takes the same approach as commit e4ff694 from May 2020: Setup inlined CStruct assigned during construction Fixes rakudo/rakudo#3687 by `break`ing out of the switch statement for the inlined case. The bug itself seems to date back to the code first being added in Aug 2018 by commit ac3d3c7: Fix alignment/mem issue of inlined CArrays Now CArrays that are inlined in CStructs are handled correctly regarding alignment of the CArray within the struct. Also accessing the slots of the inlined array works.
= (char *)body->cstruct + repr_data->struct_offsets[slot]; | ||
if (repr_data->attribute_locations[slot] & MVM_CSTRUCT_ATTR_INLINED) { | ||
((MVMCArray *)value)->body.storage | ||
= (char *)body->cstruct + repr_data->struct_offsets[slot]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sets up sharing of the storage between the array inlined into the CStruct and the array that's bound to that attribute. It does so by replacing the array's storage pointer with a pointer to the appropriate place in the CStruct. If we do just that, we lose the actual values in the array and I think we even leak the array's storage memory (it doesn't get freed and will not be referenced anymore). This will work for a constructor that initializes the attribute with a zero'ed out array, but not when doing an actual $!my-inlined-array-attribute := CArray.new(1, 2, 3);
(except for the memory leak).
That's the point of the memcpy in the inlined CStruct case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. At this point, I don't know what the correct fix is, nor do I have a good feel for how to write a regression test to confirm this.
I also) think that here you're commenting on code that effectively I didn't change - this bug is present already in master
?
To be clear - I am now out of my depth. I can't fix this bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nwc10 same here: when I fixed the CStruct case, I felt like the CArray case was more complex and out of my depth: an Array has a dynamic storage while a struct is very fixed and defined. Clearly the code is leaky and buggy but I don't feel like there is a simple solution for the leak while there might easily be a simple fixup for the bug. Right?
@niner I am not sure that we should free the storage by assigning. I would stick with an interpretation where binding corresponds to copy for inlined storages. I would not free for instance memory that has been allocated in a C library otherwise that will cause an error. The user (and third party libraries) can take care of managing the memory and not do anything inherently leaky.
That is why I am for replacing the (in my opinion) weird HAS
syntax for inlined storages in NativeCall with a more clear is copy
trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback.
Clearly the code is leaky and buggy but I don't feel like there is a simple solution for the leak while there might easily be a simple fixup for the bug. Right?
I hope so, but I don't know. I'm confident that I'm way out of my depth on how this code is supposed to work - ie I'm not going to get to the point of understanding it well enough to comment with any accuracy any time soon. (I am confident that both manifestations of the bug are correctly described)
That is why I am for replacing the (in my opinion) weird
HAS
syntax for inlined storages
I was totally unfamiliar with any of this (I started down this road with a SIGBUS
on sparc64 in a test) and somewhat lazily asked about HAS
instead of attempting search the docs. So I can't say authoritatively on this either, but how easy is it to search English language documentation case-sensitively with most tools easily available? Most search engines searches for HAS are going to show results for has, if they don't decide that it is a stop word and refuse to search for it at all. So for reasons other than your (understandable) "weird", I'm not sure if it is the best syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @niner is commenting on is a bug already present in master - it's #1294
My patch to remove the assignment to cobj
is orthogonal to that bug.
However, I think I'm making more sense of what the other bug is about. However, sigh, there's far too much encapsulation leakage between CArray, CStruct, CUnion and CPPStruct
Does anything take the code blocks in the official docs (eg in https://docs.raku.org/language/nativecall) and attempt to run those as regression tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anything take the code blocks in the official docs (eg in https://docs.raku.org/language/nativecall) and attempt to run those as regression tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test that ensures all the code blocks don't have any compile time errors. We're not checking to see if the code runs and produces the expected output. I think there was a ticket for checking the output, but it would require a bit of effort to standardize expected output (if any) and skip testing for the large number of cases where there's nothing to check. nativecall in particular would be very hard as we'd need to manage the external dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anything take the code blocks in the official docs (eg in https://docs.raku.org/language/nativecall) and attempt to run those as regression tests?
Not really. They're just syntax-checked
Please review this commit with the assumption that my proposed fix is incorrect.
bind_attribute
had been callingset_ptr_at_offset
for allCArray
s, whether inlined or not. That writes a pointer at the slot offset.For the corner case of the inlined array being both the last member in the structure and smaller than the size of a pointer
(eg
HAS int32 @.b[1] is CArray;
on a 64 bit system), this causes a pointer write beyond the end of the allocated storage.More generally, on a system which faults unaligned pointer writes, this write faults in the more realistic case where the previous structure member(s) don't add up to pointer-sized alignment.
This is clearly a bug.
Minimum test case I can find is
with this result:
ASAN doesn't seem to spot this.
Run this on sparc64 and it will
SIGBUS
, because that write is not properly aligned. This is how I found the problem.It seems that the write to the slot is not needed in this case. The next case statement down was changed by commit e4ff694 to
break
ing out of the switch statement for the inlined case.I am assuming that this is the correct fix here. Please could someone who actually understands this code double check this.
Looking further through the code, it seems to me that the logic of commit e4ff694 for the
type == MVM_CSTRUCT_ATTR_CSTRUCT
case really ought to be duplicated further down for theMVM_CSTRUCT_ATTR_CPPSTRUCT
andMVM_CSTRUCT_ATTR_CUNION
cases. If I take the test case from rakudo/rakudo#3687 and changerepr('CStruct')
torepr('CPPStruct')
then it no longer works. And then the layout ofCPPstruct.c
suggests that it too needs all of this...There seems to be a lot of repetition here. It's unclear to me how much is unavoidable.