Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C impl: Prevent memcpy undefined behavior #4

Merged
merged 1 commit into from
Jan 10, 2020

Conversation

guidovranken
Copy link
Contributor

This prevents a (potential, depending on the caller) case of undefined behavior in the call to memcpy in chunk_state_fill_buf. This can happen when blake3_hasher_update is called with input=NULL and input_len=0. Passing a null pointer to memcpy is undefined behavior even if the n argument is 0.

Empy C++ vectors (can) behave this way, eg.:

std::vector<uint8_t> v;
blake3_hasher_update(&hasher, v.data(), v.size());

@oconnor663
Copy link
Member

Thanks for catching this. I probably won't have time to merge it today, but I should get to it tomorrow. For now, I'll note that this code is currently only used to test the other C code; the Rust crate does not build this file.

@cemeyer
Copy link

cemeyer commented Jan 9, 2020

Passing a null pointer to memcpy is undefined behavior even if the n argument is 0.

I'm not sure that's undefined behavior (but I'm also not confident it isn't). From C18, § 7.24.1 "String handling" > "String function conventions", (2):

Where an argument declared as size_t n specifies the length of the array for a function, n can have
the value zero on a call to that function. … On such a call, ... a function that copies characters copies zero characters.

A few paragraphs later, § 7.24.2.1, "The memcpy function", (2):

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1 .

I don't really understand how any valid implementation of memcpy could produce UB regardless of s1 and s2, if n is 0.

@oconnor663
Copy link
Member

@cemeyer
Copy link

cemeyer commented Jan 10, 2020

@oconnor663 It's a great write-up.

I think perhaps Langley and definitely the glibc authors have interpreted §7.1.4 differently than I have. The language only says,

If an argument to a function has an invalid value (such as ... or a null pointer, or ….) …, the behavior is undefined.

The sentence includes other examples clearly irrelevant to string functions, such as mathematical values out of domain. So I believe the "such as" language is intended to provide only examples of possible invalid argument values, rather than a list of values which are always invalid. Indeed, some of the string.h functions explicitly permit NULL arguments, so it cannot be the case that the entire list is always invalid and therefore use in arguments is always UB.

So the question I think is, is NULL an invalid value for s1 or s2 of any valid implementation of memcpy, with n=0? And I don't see any reason to believe that. If n is zero, it is completely bogus for memcpy to dereference s1 or s2, and thus it does not matter if they are valid object pointers.

Similarly, it is valid for malloc(0) to return a non-null pointer. But it is UB to attempt to access the object pointed to. Such a pointer would be a valid input to memcpy(,, 0), but not a valid input with any non-zero size, despite being non-NULL. NULL is not what makes a memcpy argument valid or invalid; it is a combination of the value of n, as well as the size of the object pointed to by s1 and s2 (or NULL, which is essentially the same as a theoretical non-null pointer returned by malloc(0)).

In response to Langley's example of glibc, nonnull, and compiler optimizations, I think basically glibc was wrong to annotate memcpy with the nonnull attribute. (And perhaps GCC was wrong to provide the attribute in the first place — the name is misleading and leads to accidental misuse, the optimizations from the annotation that are not provided by other analysis are extremely limited, and the downside is large.) The optimization bugs are fallout from that erroneous annotation.

As Langley goes on to describe, there is negligible to zero program code benefit from making this invocation UB, and potentially large downside. I think I'm largely on the same page with his Conclusions section.

Although I've little hope of it happening, I'd suggest that GCC and glibc remove these assumptions and that the next revision of the C standard change 7.24.1(2) to clarify that when a length is zero, pointers can be NULL.

💯

In summary: I am still not persuaded this is actually UB per the standard. However, please go ahead and make this change to defend against the glibc/GCC implementation-specific foot-gun.

@oconnor663 oconnor663 merged commit 253e830 into BLAKE3-team:master Jan 10, 2020
@oconnor663
Copy link
Member

Merged and followed on with d7d71b2.

@lulcat lulcat mentioned this pull request May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants