Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UncasedStr::new is undefined behavior #494

Closed
dtolnay opened this issue Dec 6, 2017 · 8 comments
Closed

UncasedStr::new is undefined behavior #494

dtolnay opened this issue Dec 6, 2017 · 8 comments
Labels
triage A bug report being investigated

Comments

@dtolnay
Copy link

dtolnay commented Dec 6, 2017

pub struct UncasedStr(str);

impl UncasedStr {
    pub fn new(string: &str) -> &UncasedStr {
        unsafe { &*(string as *const str as *const UncasedStr) }
    }
}

I believe there is no guarantee that str and UncasedStr have the same layout on all targets. From a discussion with @arielb1 at Rust Belt Rust about ref-cast, the behavior is only defined if UncasedStr is #[repr(C)].

@dtolnay
Copy link
Author

dtolnay commented Dec 6, 2017

Same thing for these transmutes, which all assume #[repr(C)].

  • UncasedStr::into_uncased
  • Uncased::into_boxed_uncased
  • <&RawStr as From<&str>>::from

@SergioBenitez
Copy link
Member

@dtolnay Is there definitive documentation about this? If the layout isn't guaranteed to be the same, why does Rust allow the *const str as *const UncasedStr cast?

@dtolnay
Copy link
Author

dtolnay commented Dec 6, 2017

I believe nomicon's Alternative representations is definitive.

#[repr(C)] is also necessary to soundly do more elaborate tricks with data layout such as reinterpreting values as a different type.

@dtolnay
Copy link
Author

dtolnay commented Dec 6, 2017

Pointer casting is allowed across types that have a different layout but dereferencing the pointer is normally undefined.

// Does not have the same layout as str.
struct Sergio {
    repr: char,
}

fn main() {
    let p: *const str = "C";
    println!("{:p}", p as *const Sergio);
}

@tesaguri
Copy link
Contributor

tesaguri commented Dec 6, 2017

FYI, transmute::<T, U> requires T and U to have the same size, which is more strict than pointer casting.

struct Sergio {
    repr: char,
}

fn main() {
    let p = "C";
    unsafe {
        println!("{:p}", std::mem::transmute::<&str, &Sergio>(p)); // error[E0512]: transmute called with types of different sizes
    }
}

@SergioBenitez
Copy link
Member

@dtolnay, @dmizuk Isn't the transmute in <&RawStr as From<&str>>::from okay given that transmute checks that the size of its two type parameters are identical?

Looking at the std. library, it doesn't seem like #[repr(C)] is used in similar situations. I'm not even sure what #[repr(C)] would mean on DSTs like struct S(str).

In any case, here are the set of changes I have planned:

--- a/lib/src/http/uncased.rs
+++ b/lib/src/http/uncased.rs
@@ -36,7 +36,7 @@ impl UncasedStr {
     /// ```
     #[inline(always)]
     pub fn new(string: &str) -> &UncasedStr {
-        unsafe { &*(string as *const str as *const UncasedStr) }
+        unsafe { ::std::mem::transmute(string) }
     }

     /// Returns `self` as an `&str`.
@@ -69,7 +69,7 @@ impl UncasedStr {
     #[inline(always)]
     pub fn into_uncased(self: Box<UncasedStr>) -> Uncased<'static> {
         unsafe {
-            let raw_str = Box::into_raw(self) as *mut str;
+            let raw_str: *mut str = ::std::mem::transmute(Box::into_raw(self));
             Uncased::from(Box::from_raw(raw_str).into_string())
         }
     }
@@ -203,8 +203,9 @@ impl<'s> Uncased<'s> {
     #[inline(always)]
     pub fn into_boxed_uncased(self) -> Box<UncasedStr> {
         unsafe {
-            let raw_str = Box::into_raw(self.string.into_owned().into_boxed_str());
-            Box::from_raw(raw_str as *mut UncasedStr)
+            let box_str = self.string.into_owned().into_boxed_str();
+            let raw: *mut UncasedStr = ::std::mem::transmute(Box::into_raw(box_str));
+            Box::from_raw(raw)
         }
     }

Do these resolve the issue?

@SergioBenitez SergioBenitez added the triage A bug report being investigated label Dec 14, 2017
@dtolnay
Copy link
Author

dtolnay commented Dec 14, 2017

Isn't the transmute in <&RawStr as From<&str>>::from okay given that transmute checks that the size of its two type parameters are identical?

The transmute checks that &RawStr and &str are the same number of bytes, but does not check that RawStr and str have the same representation, which is not guaranteed. So I believe that is still undefined behavior.

Looking at the std. library, it doesn't seem like #[repr(C)] is used in similar situations.

My understanding is that the standard library is not beholden to the same soundness considerations that other libraries are. If the compiler makes a change that changes the layout of struct RawStr(str), they would just fix all the transmutes in the standard library in the same release. But in crates that same change would cause memory unsafety.

I'm not even sure what #[repr(C)] would mean on DSTs like struct S(str).

I believe that is well defined. S will be guaranteed to have the same memory representation as str.

Note that without #[repr(transparent)] it may still not be ABI compatible for the purpose of FFI, but that is not required here. It wouldn't be sound to declare an extern fn with argument &str and implement it in a different compilation unit with a function that takes &MyStr or vice versa. That is being tracked in rust-lang/rust#43036.

Do these resolve the issue?

Don't think so. Transmute vs pointer cast is a matter of preference I think, but whether the behavior is well defined is the same. I think the fix is adding #[repr(C)] to any newtype wrappers that are involved in transmute / pointer casting.

@dtolnay
Copy link
Author

dtolnay commented Dec 14, 2017

For example I think it would be legal for a target to use a layout like this, to support better debugging or to compile for JVM or some other VM or any other reason. Here T and Newtype<T> may have different alignment requirements and casting *const T to *const Newtype<T> is not meaningful.

struct Newtype<T>(T);
 Newtype<T>                 Typeinfo
+----------+          +------------------+
| typeinfo | -------> | name = "Newtype" |
+----------+          +------------------+
| T        |          | ...              |
+----------+          +------------------+

But with #[repr(C)] struct Newtype<T>(T) the target would be required to use the same memory layout for T and Newtype<T>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage A bug report being investigated
Projects
None yet
Development

No branches or pull requests

3 participants