Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CStruct bind_attribute can write beyond malloc'd buffer. #1369

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nwc10
Copy link
Contributor

@nwc10 nwc10 commented Oct 26, 2020

Please review this commit with the assumption that my proposed fix is incorrect.

bind_attribute had been calling set_ptr_at_offset for all CArrays, whether inlined or not. That writes a pointer at the slot offset.

For the corner case of the inlined array being both the last member in the structure and smaller than the size of a pointer
(eg HAS int32 @.b[1] is CArray; on a 64 bit system), this causes a pointer write beyond the end of the allocated storage.
More generally, on a system which faults unaligned pointer writes, this write faults in the more realistic case where the previous structure member(s) don't add up to pointer-sized alignment.

This is clearly a bug.

Minimum test case I can find is

use v6;

use NativeCall;

class InlinedArrayInStruct is repr('CStruct') {
    has int32 $.a is rw;
    HAS int32 @.b[1] is CArray;
}

my $iais = InlinedArrayInStruct.new();

with this result:

==31060== Invalid write of size 8
==31060==    at 0x51126B6: set_ptr_at_offset (CStruct.c:362)
==31060==    by 0x5113814: bind_attribute (CStruct.c:651)
==31060==    by 0x503B798: MVM_interp_run (interp.c:2018)
==31060==    by 0x51D646D: MVM_vm_run_file (moar.c:486)
==31060==    by 0x109784: main (main.c:305)
==31060==  Address 0xe1ef4b4 is 4 bytes inside a block of size 8 alloc'd
==31060==    at 0x4C2DBC5: calloc (vg_replace_malloc.c:711)
==31060==    by 0x51105AA: MVM_calloc (alloc.h:11)
==31060==    by 0x51129CA: initialize (CStruct.c:413)
==31060==    by 0x503A4AA: MVM_interp_run (interp.c:1880)
==31060==    by 0x51D646D: MVM_vm_run_file (moar.c:486)
==31060==    by 0x109784: main (main.c:305)

ASAN doesn't seem to spot this.

Run this on sparc64 and it will SIGBUS, because that write is not properly aligned. This is how I found the problem.

It seems that the write to the slot is not needed in this case. The next case statement down was changed by commit e4ff694 to breaking out of the switch statement for the inlined case.

I am assuming that this is the correct fix here. Please could someone who actually understands this code double check this.

Looking further through the code, it seems to me that the logic of commit e4ff694 for the type == MVM_CSTRUCT_ATTR_CSTRUCT case really ought to be duplicated further down for the MVM_CSTRUCT_ATTR_CPPSTRUCT and MVM_CSTRUCT_ATTR_CUNION cases. If I take the test case from rakudo/rakudo#3687 and change repr('CStruct') to repr('CPPStruct') then it no longer works. And then the layout of CPPstruct.c suggests that it too needs all of this...

There seems to be a lot of repetition here. It's unclear to me how much is unavoidable.

…rray.

`bind_attribute` had been calling `set_ptr_at_offset` for all `CArray`s,
whether inlined or not. That writes a pointer at the slot offset.

For the corner case of the inlined array being both the last member in the
structure and smaller than the size of a pointer
(eg `HAS int32 @.b[1] is CArray;` on a 64 bit system),
this causes a pointer write beyond the end of the allocated storage.
More generally, on a system which faults unaligned pointer writes, this
write faults in the more realistic case where the previous structure
member(s) don't add up to pointer-sized alignment.

This is clearly a bug.

It seems that the write to the slot is not needed in this case. This commit
takes the same approach as commit e4ff694 from May 2020:
    Setup inlined CStruct assigned during construction

    Fixes rakudo/rakudo#3687

by `break`ing out of the switch statement for the inlined case.

The bug itself seems to date back to the code first being added in Aug 2018
by commit ac3d3c7:
    Fix alignment/mem issue of inlined CArrays

    Now CArrays that are inlined in CStructs are handled correctly
    regarding alignment of the CArray within the struct. Also accessing
    the slots of the inlined array works.
@nwc10 nwc10 added the bug label Oct 26, 2020
@nwc10 nwc10 requested a review from niner October 26, 2020 10:21
= (char *)body->cstruct + repr_data->struct_offsets[slot];
if (repr_data->attribute_locations[slot] & MVM_CSTRUCT_ATTR_INLINED) {
((MVMCArray *)value)->body.storage
= (char *)body->cstruct + repr_data->struct_offsets[slot];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets up sharing of the storage between the array inlined into the CStruct and the array that's bound to that attribute. It does so by replacing the array's storage pointer with a pointer to the appropriate place in the CStruct. If we do just that, we lose the actual values in the array and I think we even leak the array's storage memory (it doesn't get freed and will not be referenced anymore). This will work for a constructor that initializes the attribute with a zero'ed out array, but not when doing an actual $!my-inlined-array-attribute := CArray.new(1, 2, 3); (except for the memory leak).
That's the point of the memcpy in the inlined CStruct case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. At this point, I don't know what the correct fix is, nor do I have a good feel for how to write a regression test to confirm this.

I also) think that here you're commenting on code that effectively I didn't change - this bug is present already in master?

To be clear - I am now out of my depth. I can't fix this bug.

Copy link
Contributor

@scovit scovit Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nwc10 same here: when I fixed the CStruct case, I felt like the CArray case was more complex and out of my depth: an Array has a dynamic storage while a struct is very fixed and defined. Clearly the code is leaky and buggy but I don't feel like there is a simple solution for the leak while there might easily be a simple fixup for the bug. Right?

@niner I am not sure that we should free the storage by assigning. I would stick with an interpretation where binding corresponds to copy for inlined storages. I would not free for instance memory that has been allocated in a C library otherwise that will cause an error. The user (and third party libraries) can take care of managing the memory and not do anything inherently leaky.

That is why I am for replacing the (in my opinion) weird HAS syntax for inlined storages in NativeCall with a more clear is copy trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback.

Clearly the code is leaky and buggy but I don't feel like there is a simple solution for the leak while there might easily be a simple fixup for the bug. Right?

I hope so, but I don't know. I'm confident that I'm way out of my depth on how this code is supposed to work - ie I'm not going to get to the point of understanding it well enough to comment with any accuracy any time soon. (I am confident that both manifestations of the bug are correctly described)

That is why I am for replacing the (in my opinion) weird HAS syntax for inlined storages

I was totally unfamiliar with any of this (I started down this road with a SIGBUS on sparc64 in a test) and somewhat lazily asked about HAS instead of attempting search the docs. So I can't say authoritatively on this either, but how easy is it to search English language documentation case-sensitively with most tools easily available? Most search engines searches for HAS are going to show results for has, if they don't decide that it is a stop word and refuse to search for it at all. So for reasons other than your (understandable) "weird", I'm not sure if it is the best syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @niner is commenting on is a bug already present in master - it's #1294

My patch to remove the assignment to cobj is orthogonal to that bug.

However, I think I'm making more sense of what the other bug is about. However, sigh, there's far too much encapsulation leakage between CArray, CStruct, CUnion and CPPStruct

Does anything take the code blocks in the official docs (eg in https://docs.raku.org/language/nativecall) and attempt to run those as regression tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything take the code blocks in the official docs (eg in https://docs.raku.org/language/nativecall) and attempt to run those as regression tests?

I think there is. @JJ or @coke probably know the details.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test that ensures all the code blocks don't have any compile time errors. We're not checking to see if the code runs and produces the expected output. I think there was a ticket for checking the output, but it would require a bit of effort to standardize expected output (if any) and skip testing for the large number of cases where there's nothing to check. nativecall in particular would be very hard as we'd need to manage the external dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything take the code blocks in the official docs (eg in https://docs.raku.org/language/nativecall) and attempt to run those as regression tests?

I think there is. @JJ or @coke probably know the details.

Not really. They're just syntax-checked

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants