Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upToo much undefined behavior #23
Comments
This comment has been minimized.
This comment has been minimized.
|
This uninitialized memory is stack allocated, we always fully fill it and never read uninitialized values from it. I don't see how it can cause UB. Essentially we just ask compiler to reserve N bytes on stack for us to fill later. Also I don't quite get why you think that As for out-of-bound read Intel AES-NI white-paper (p. 32) uses essentially the same code. I knew about out-of-bound access (see comment), but I don't see how it can cause problems in practice, as we always overwrite garbage data. Theoretically it may cause problems if key pointer has address As for migration to |
This comment has been minimized.
This comment has been minimized.
As mentioned, the undefined behavior in the library is not of type "read of uninitialized memory", but "reference to uninitialized memory". This code never reads uninitialized memory: let x: i32 = mem::uninitialized();
let y = &x; // UByet it has undefined behavior because it creates a reference to memory that is uninitialized - raw pointers can point to uninitialized memory, but references always must point to initialized memory and creating a reference that points to uninitialized memory is undefined behavior. In the code, What the code probably wants to do here is: ptr::write(x.as_mut_ptr().offset(i), 2);which creates a raw pointer instead of a reference, and writes to it using
The code is using
Yeah, I just wanted to submit a PR with a small prototype, so that you could see what the changes look like, so that we have something concrete to talk about. I think that would be more productive than just talking about this in an abstract way without looking at any code. |
This comment has been minimized.
This comment has been minimized.
|
FYI the
Using ptr::write everywhere sucks, but that's the most common way. Because union U {
x: i32,
y: (),
}
let mut u = U { y: () }; // "uninitialized" memory
u.x = 3; // initializes the memory
// this should always work though:
ptr::write(&mut u as *mut i32, 3);
// note that u is initialized,
// so &mut u can never become undefined behaviorIn the |
This comment has been minimized.
This comment has been minimized.
It's a bit pedantic to my taste, as for me when we work with plain bytes one is just a sugar around another. But, ok, I'll change it, though it will make code a bit noisier.
Note, that your link talks only about types which implement
I know about it. Intel code example does absolutely the same. I don't see how it can cause problems in practice, as we never use out-of-bound bits. But if we are being pedantic, we can change it, luckily this part of the code can tolerate one or two redundant instructions. |
This comment has been minimized.
This comment has been minimized.
|
I'll write a PR with the discussed changes. Do you have any other places which you think cause UB? |
This comment has been minimized.
This comment has been minimized.
Then Intel code example
I think it can be changed without redundant instructions if you do something like this (I haven't fully checked this code though): let key: GenericArray<u8, 24>;
let mut x: [u8; 32] = mem::uninitialized();
ptr::write(x.as_mut_ptr() as *mut GenericArray<u8, 24>, key);
encrypt(x.as_ptr());
// note that we can't pass x by & because that would
// create a reference to uninitialized memory :(
fn encrypt(key: *const [u8; 32]) {
let u : __m128i = _mm_loadu_si128((key as *const u8).offset(16) as *const __m128i);
// ^^^ the last 8 bytes of u are uninitialized
// as long as we overwrite them before we use them everything should be ok
// but in the mean time one cannot create references to u :(
}
It makes the code a lot noisier sadly, I have no good ideas about how to make it less noisy here :/ Maybe just use raw pointers instead? Or use
None of the
For my taste too, but undefined behavior is undefined. I'd like to think that the Rust from the future might use this to optimize in some cases. Those cases are not the one in this crate, but if the behavior is undefined other programs might behave incorrectly as a consequence of the optimization. So Rust from the future decides to So even though your program doesn't benefit from the optimization, because of the UB, it might some day panic. I don't know if that motivates you to fix these, but that's what I tell myself to motivate myself. FWIW the root issue here is that the code uses Before you start fixing these, I'd recommend you to read on these issues, and play with I haven't removed all uses of
I've just looked at a tiny portion of the library, sorry. I can go through the PR if you wish, and I will try to prototype the |
This comment has been minimized.
This comment has been minimized.
Can you, please, show how |
This comment has been minimized.
This comment has been minimized.
Not really, I've never used What I currently always do in union U {
x: i32,
y: (), // uninitialized variant
}
// create uninitialized value:
let u = U { y: () };
// initialize it to some value:
let value = 42_i32;
ptr::write(&mut u /* always ok: u is initialized */ as *mut i32, value);What you want here is to have an array that contains partially initialized values, so I would do: union U {
x: u8,
y: (), // uninitialized variant
}
let array: [U; 32] = [U { y: () }; 32]; // uninitialized values
// note: array itself is initialized, and so are the elements of it
ptr::write(&mut array[3] /* ok: the array, array[3] are initialized */ as *mut u8, some_value);Does that make sense? IIRC, to use The point is that the array object is initialized, and each element is initialized to the These is all this complicated because of the uninitialized memory. In |
This comment has been minimized.
This comment has been minimized.
|
While I see the motivation behind making creation of uninitialized data safe and moving unsafety to final conversion from union to "plain" type, currently I think that for small cases like this where you essentially deal with one
I've checked your approach on godbolt, and surprisingly it has produced a good result. ( |
This comment has been minimized.
This comment has been minimized.
Yeah, you should really check the code generation for all these changes. You don't want to inadvertently produce a regression.
FWIW, I started with that opinion, and it took me about 2 weeks to change it because I started discovering more and more undefined behavior due to For example, for the array indexing case ( I started to like the If you want to keep the |
newpavlov
closed this
in
#24
Aug 8, 2018
This comment has been minimized.
This comment has been minimized.
|
Nice, this was really quick! |
gnzlbg commentedAug 7, 2018
•
edited
I started to write a PR to just show how using
#[target_feature]would look like but I encountered too much undefined behavior in the process (fixing it was taking more time than actually dealing with target feature).I've only taken a look at a tiny fraction of the
aesnicrate, but it seems that references to uninitialized memory are created pretty much everywhere (all references must point to initialized memory [0]):* https://github.com/RustCrypto/block-ciphers/blob/master/aes/aesni/src/aes128/expand.rs#L21
* https://github.com/RustCrypto/block-ciphers/blob/master/aes/aesni/src/aes128/expand.rs#L36
The way I fixed these was by using
ptr::writeto write to uninitialized memory without creating a reference.There are also reads memory out-of-bounds: https://github.com/RustCrypto/block-ciphers/blob/master/aes/aesni/src/aes192/expand.rs#L51
The way I fixed this was by creating a stack buffer of appropriate size in the
mod.rsfile which I initialized withmem::uninitialized, and then usingptr::writeto write theGenericArrayinto the head of the buffer, so that the vector load never reads memory out of bounds - the memory read is only partially initialized though, and reading vector lanes with uninitialized memory should be done with care (so that no UB is introduced).[0]
&T as *const Tand&mut T as *mut Tmight not end up being undefined behavior because they might end up being atomic operations in the memory model that create a raw pointer directly without creating a reference (so that no reference to uninitialized memory is ever created by these expressions, preventing them from invoking undefined behavior).