Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STATUS_ACCESS_VIOLATION on Windows MSVC #67

Closed
stphnt opened this issue Jul 14, 2023 · 12 comments
Closed

STATUS_ACCESS_VIOLATION on Windows MSVC #67

stphnt opened this issue Jul 14, 2023 · 12 comments

Comments

@stphnt
Copy link
Contributor

stphnt commented Jul 14, 2023

I am able to trigger a STATUS_ACCESS_VIOLATION or panic in some cases on my Windows machine. It is highly dependent upon the surrounding code. I have only been able to trigger it in release mode so far, not debug mode. The simplest reproduction case I've found so far was to add the following to a file in this repo's tests directory and run cargo test --release. If I find a simpler, stand alone case, I'll post it.

use linkme::distributed_slice;

pub struct Item {
    pub name: &'static str,
}

#[distributed_slice]
pub static ITEMS: [Item] = [..];

macro_rules! make_items {
    ($($name:ident = $str:literal), +) => {
        $(
            #[distributed_slice(ITEMS)]
            static $name: Item = Item {
                name: $str,
            };
        )+
    }
}
make_items! {
    ITEM1 = "item1",
    ITEM2 = "item2",
    ITEM3 = "item3"
}

#[test]
fn test_failure() {
    let mut last_address = None;
    for item in ITEMS {
        let address = item as *const Item as usize;

        // If these 3 lines are commented out there is no STATUS_ACCESS_VIOLATION
        // below. Instead you get an invalid utf8 error.
        use std::io::Write as _;
        println!("{:x} {}", address, item.name);
        std::io::stdout().flush().unwrap();

        if let Some(last) = last_address {
            assert_eq!(address - last, std::mem::size_of::<Item>());
        }
        last_address = Some(address);

        // When run with `cargo test --release`, this causes either a
        // STATUS_ACCESS_VIOLATION or panic depending on what lines are commented
        // out above. Note, that if it panics, it prints out the name properly. 
        // When this is commented out, everything runs fine and the test passes.
        let _ = std::str::from_utf8(item.name.as_bytes())
            .unwrap_or_else(|err| panic!("invalid utf8: {:?}: {:?}", item.name, err));
    }
}

The errors I get when running cargo test --release look like either of the following:

running 1 test
error: test failed, to rerun pass `--test status_access_violation`

Caused by:
  process didn't exit successfully: `target\release\deps\status_access_violation-0071020cc9b8bc0c.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
running 1 test
test test_failure ... FAILED

failures:

---- test_failure stdout ----
thread 'test_failure' panicked at 'invalid utf8: "item3": Utf8Error { valid_up_to: 6, error_len: Some(1) }', tests\status_access_violation.rs:47:33
stack backtrace:
...

I've seen it with linkme versions 0.3.9 and master and Rust versions 1.70 and 1.71.0. I haven't looked at any other versions.

Information about my machine:

OS Name:                   Microsoft Windows 11 Home
OS Version:                10.0.22621 N/A Build 22621
System Model:              ZenBook UX434IQ_UM433IQ
System Type:               x64-based PC
Processor(s):              AMD Ryzen 7 4000 series

I am happy to help debug/test things.

@stphnt
Copy link
Contributor Author

stphnt commented Jul 14, 2023

Although not a stand-alone case, this one is slightly simpler and I think gets at the issue better

use linkme::distributed_slice;

pub struct Item {
    pub name: &'static str,
}

#[distributed_slice]
pub static ITEMS: [Item] = [..];

#[distributed_slice(ITEMS)]
static ITEM: Item = Item {
     name: "item1",
};

#[test]
fn test_failure() {
    let mut last_address = None;
    for item in ITEMS {
        // Removing these four lines, causes the problem to go away.
        let address = item as *const Item as usize;
        if let Some(last) = last_address {
            assert_eq!(address - last, std::mem::size_of::<Item>());
        }
        last_address = Some(address);

        let _ = std::str::from_utf8(item.name.as_bytes())
            .unwrap_or_else(|err| {
                panic!("invalid utf8: {:?}: {:?}", item.name, err);  // panics and prints "item1" as the name
                // panic!("invalid utf8: {:?}: {:?}", item.name.as_bytes(), err);  // STATUS_ACCESS_VIOLATION fault
                // panic!("invalid utf8: {:?}: {:?}", item.name.len(), err);  // Panics and prints 16 as the length
            });
    }
}

My current theory is that in some cases the number of bytes included in the item's name is incorrectly inflated (in this case to 16 from 5 bytes). This means that when the code iterates over it, depending on where in memory it is and the data around it, it may or may not access memory it isn't allowed to or read invalid utf8 when it goes beyond the 5 bytes of the original field. Why it would be inflating the length, I'm not sure.

@dtolnay
Copy link
Owner

dtolnay commented Jul 14, 2023

I don't have access to Windows. It would be good if you could narrow this down to a linker bug vs compiler bug using https://github.com/pacak/cargo-show-asm. Does the assembly code for test_failure look like it is doing the wrong thing, or is the code correct while wrong data is stored in the .linkme_ITEMS symbol that it loads from.

@stphnt
Copy link
Contributor Author

stphnt commented Jul 14, 2023

I think the assembly looks OK (I'm not terribly familiar with assembly).

ITEMS has the following assembly, which looks to me like it lines up with the definition of DistributedSlice:

.section .rdata,"dr",one_only,__unnamed_10
__unnamed_10:
	.ascii	"ITEMS"

.section .rdata,"dr",one_only,status_access_violation::ITEMS
	.p2align	3, 0x0
status_access_violation::ITEMS:
	.quad	__unnamed_10
	.asciz	"\005\000\000\000\000\000\000"
	.quad	status_access_violation::ITEMS::LINKME_START
	.quad	status_access_violation::ITEMS::LINKME_STOP
	.quad	status_access_violation::ITEMS::DUPCHECK_START
	.quad	status_access_violation::ITEMS::DUPCHECK_STOP

.section .linkme_ITEMS$a,"dr"
	.p2align	3, 0x0
status_access_violation::ITEMS::LINKME_START:

.section .linkme_ITEMS$c,"dr"
	.p2align	3, 0x0
status_access_violation::ITEMS::LINKME_STOP:

.section .linkm2_ITEMS$a,"dr"
status_access_violation::ITEMS::DUPCHECK_START:

.section .linkm2_ITEMS$c,"dr"
status_access_violation::ITEMS::DUPCHECK_STOP:

.section .linkm2_ITEMS$b,"dr"
	.p2align	3, 0x0
status_access_violation::ITEMS::DUPCHECK:
	.asciz	"\001\000\000\000\000\000\000"

The assembly for each item (I had three in this case) is further down and also makes sense, I think; each points to the correct string and has the correct length (5):

.section .rdata,"dr",one_only,__unnamed_14
__unnamed_14:
	.ascii	"item3"

.section .linkme_ITEMS$b,"dr"
	.p2align	3, 0x0
status_access_violation::ITEM3:
	.quad	__unnamed_14
	.asciz	"\005\000\000\000\000\000\000"

.section .rdata,"dr",one_only,__unnamed_15
__unnamed_15:
	.ascii	"item2"

.section .linkme_ITEMS$b,"dr"
	.p2align	3, 0x0
status_access_violation::ITEM2:
	.quad	__unnamed_15
	.asciz	"\005\000\000\000\000\000\000"

.section .rdata,"dr",one_only,__unnamed_16
__unnamed_16:
	.ascii	"item1"

.section .linkme_ITEMS$b,"dr"
	.p2align	3, 0x0
status_access_violation::ITEM1:
	.quad	__unnamed_16
	.asciz	"\005\000\000\000\000\000\000"

The full assembly is available here: https://gist.github.com/stphnt/72a877754761ac4e1c98ff5ec6860b2f. I briefly skimmed the rest of the assembly, but I would need more time than I have today to dig into that.

While testing I also noticed that item.name.len() returns different values depending on how many items are in the list. For example, for 1, 2, and 3 items, it returns 16, 32, or 48, respectively. If I add an assert_eq!(item.name.len(), 5); in the loop it always passes but the final line

        let _ = std::str::from_utf8(item.name.as_bytes())
            .unwrap_or_else(|err| panic!("invalid utf8: {:?}: {:?}", item.name.len(), err));

still panics with an error like the following (note the length is now correctly reported as 5 but it checked 9 bytes when trying to parse it as utf8):

thread 'test_failure' panicked at 'invalid utf8: 5: Utf8Error { valid_up_to: 8, error_len: Some(1) }', tests\status_access_violation.rs:41:35

@stphnt
Copy link
Contributor Author

stphnt commented Jul 14, 2023

Oh, I also meant to say "Thanks!" for the suggestion to use cargo-show-asm. I hadn't heard of it before - it's super useful!

@stphnt
Copy link
Contributor Author

stphnt commented Jul 14, 2023

It is notable that the size returned by item.name.len() matches the number of bytes in the contents of the distributed slice: 16 bytes (8 bytes for the pointer + 8 bytes for the length) for each &str in the array. I checked up to 10 items in the list and the pattern holds (provided I omit the assertion on the length). Why the number of bytes in the slice is being confused for the number of bytes in each str, I'm not sure.

@stphnt
Copy link
Contributor Author

stphnt commented Jul 15, 2023

This problem seems to be linker-related. The bug disappears if I completely disable LTO by adding the following to the Cargo.toml

[profile.release]
lto = "off"

It shows up with all other levels of LTO (true, false, and "thin") for exactly the same code.

@stphnt
Copy link
Contributor Author

stphnt commented Jul 16, 2023

It is notable that the size returned by item.name.len() matches the number of bytes in the contents of the distributed slice: 16 bytes (8 bytes for the pointer + 8 bytes for the length) for each &str in the array. I checked up to 10 items in the list and the pattern holds (provided I omit the assertion on the length). Why the number of bytes in the slice is being confused for the number of bytes in each str, I'm not sure.

Although this pattern was true with that code, it does not appear to be always true. I can get the value returned by item.name.len() to be "random" in some cases. For example, I modified the earlier test to the following

use linkme::distributed_slice;

pub struct Item {
    pub name: &'static str,
}

#[distributed_slice]
pub static ITEMS: [Item] = [..];

macro_rules! make_items {
    ($($name:ident = $str:literal),* $(,)?) => {
        $(
            #[distributed_slice(ITEMS)]
            static $name: Item = Item {
                name: $str,
            };
        )*
    }
}
make_items! {
    ITEM1 = "item1",
    ITEM2 = "item2",
    ITEM3 = "item3",
}

#[test]
fn test_failure() {
    let mut last_address = None;
    for item in ITEMS {
        let address = item as *const Item as usize;
        if let Some(last) = last_address {
            assert_eq!(address - last, std::mem::size_of::<Item>());
        }
        last_address = Some(address);

        println!("{} {}", item.name.len(), item.name);
    }
    panic!("end");  // So we can see what is printed
}

I tried different numbers of elements in the list but the print out is always something like this, with the length of the first item always being wrong:

---- test_failure stdout ----
2266525401088 item3
5 item2
5 item1

The length of "item3" is random, returning a different value each time the executable is run (no recompilation in between runs). The length of the other element names always seems to be correct. Unsurprisingly, adding println!("{:?}", item.name.as_bytes()); triggers an STATUS_ACCESS_VIOLATION.

The assembly for that code is available here. The length stored for ITEM3 is correct (5) as before.

@stphnt
Copy link
Contributor Author

stphnt commented Jul 19, 2023

I've had a little more time to look into the assembly of code similar to the last example and, if I've read it properly, I think I know what is going on. Here is the new

The code is nearly identical to the previous example except I added

impl Item {
    #[inline(never)]
    fn len(&self) -> usize {
        self.name.len()
    }
}

and replaced item.name.len() with item.len(). Like the previous example code, item.len() returns an incorrect value for the first item, but correct values for the remaining items. The main reason for adding this never-inlined method is that it adds a symbol to the assembly which makes it easier to see where the length calculation is happening.

The first relevant parts of the assembly are lines 180-194:

	lea rax, [rip + status_access_violation::ITEMS::DUPCHECK_STOP::h74e1724df292ac16]
	lea rcx, [rip + status_access_violation::ITEMS::DUPCHECK_START::h870b2f241a9cc5fe+8]
	cmp rcx, rax
	jb .LBB5_6

	.cv_inline_site_id 14 within 11 inlined_at 7 36 0
	lea rsi, [rip + status_access_violation::ITEMS::LINKME_START::he0b5d265ab5a1f75]

	lea rbx, [rip + status_access_violation::ITEMS::LINKME_STOP::h4ea3bbd812e95522]
	sub rbx, rsi
	cmp rbx, 16

	jb .LBB5_5

	call status_access_violation::Item::len::h8e6cfeac92d3d77b

Here we see the check for duplicate distributed slices and the creation of the slice from the section's start and stop pointers, which comes from the DistributedSlice::static_slice method, and then the first call to item.len() (status_access_violation::Item::len::h8e6cfeac92d3d77b) at the end. It then does some printing and the pointer math, before looping over the next items in the list in lines 224-255:

	lea rsi, [rip + status_access_violation::ITEMS::LINKME_START::he0b5d265ab5a1f75+16]
	lea rdi, [rsp + 48]
	.p2align	4, 0x90
.LBB5_4:

	.cv_inline_site_id 16 within 14 inlined_at 8 59 0
	.cv_inline_site_id 17 within 16 inlined_at 8 90 0
	.cv_inline_site_id 18 within 17 inlined_at 10 463 0
	lea r14, [rsi + 16]

	mov rcx, qword ptr [rsi + 8]
	call status_access_violation::Item::len::h8e6cfeac92d3d77b
	mov qword ptr [rsp + 32], rax
	lea rax, [rsp + 32]
	mov qword ptr [rsp + 96], rax
	mov qword ptr [rsp + 104], r15
	mov qword ptr [rsp + 112], rsi
	mov qword ptr [rsp + 120], r12

	mov qword ptr [rsp + 48], r13
	mov qword ptr [rsp + 56], 3
	mov qword ptr [rsp + 80], 0
	mov qword ptr [rsp + 64], rbp
	mov qword ptr [rsp + 72], 2

	mov rcx, rdi
	call std::io::stdio::_print::hbd5b543e3c754daa

	mov rsi, r14

	add rbx, -16
	jne .LBB5_4

Notably, the first call to item.len() is separate from the code that loops through the rest of the items.

Now status_access_violation::Item::len::h8e6cfeac92d3d77b is defined further down in lines 317-322 as

status_access_violation::Item::len::h8e6cfeac92d3d77b:
	.cv_func_id 22
	mov rax, rcx
	ret

So the length of the string is expected to be in rcx.

In the loop, rcx appears to be properly set:

  • the loops increments rsi by 16 bytes (the size of Item) relative to LINKME_START
  • rcx is set to the second 8 bytes of each item, which, looking at the definitions of ITEM1, ITEM2, and ITEM3 further down, do indeed contain the length of the string and are properly set to 5.

However, for the first item in the slice the code is different. rcx is instead set to the value at [rip + status_access_violation::ITEMS::DUPCHECK_START::h870b2f241a9cc5fe+8] at line 181. Now this is only correct if DUPCHECK_START and LINKME_START are the same address. If they are not then rcx will have whatever is in memory at 8 bytes past DUPCHECK_START.

So I think the problem is that one or both of the following are happening:

  1. The MSVC linker is not always setting DUPCHECK_START and LINKME_START to the same address, causing the code to fail sporadically.
    • The fact that turning LTO off causes the bug to go away with effectively identical assembly, suggests this is happening.
    • Given that LINKME_START and DUPCHECK_START are placed in separate sections, .linkme_ITEMS$a and .linkm2_ITEMS$a, respectively, it seems reasonable that they would have different addresses.
  2. The compiler is incorrectly using DUPCHECK_START instead of LINKME_START to find the first item in the slice.
    • My assumption is that the compiler assumes DUPCHECK_START and LINKME_START are the same address and rather than calculating the offset relative to LINKME_START again, it just reuses the value it already calculated when doing the duplication check. This seems like a reasonable optimization to do but I'm not sure the compiler can safely make the assumption that they are the same.

It is not clear to me whether

  1. The linker's behaviour is incorrect
  2. The compiler's behaviour is incorrect

Looking over the code in the linkme crate, it doesn't seem like DUPCHECK_START and LINKME_START were ever intended to be the same value. If they were, I would expect a DistributedSlice would only store it once instead of twice. Furthermore, they appear to be serving separate and unrelated purposes.

@stphnt
Copy link
Contributor Author

stphnt commented Aug 5, 2023

I'm just coming back to his after vacation. If you agree with the above analysis, @dtolnay, then I think the next step is for me to see if I can find a related issue in the rust-lang/rust repo that could be a cause for this or submit a new issue if I can't find one.

@stphnt
Copy link
Contributor Author

stphnt commented Sep 29, 2023

It's been a while since my last post here. I haven't done anything - I'm waiting on confirmation of what you would like to do next. Happy to continue looking into this.

@jrose-signal
Copy link
Contributor

I don't know if #72 fixes your issue, since yours seemed potentially related to DUPCHECK as well, but it may be worth trying!

stphnt added a commit to stphnt/linkme that referenced this issue Oct 7, 2023
stphnt added a commit to stphnt/linkme that referenced this issue Oct 7, 2023
stphnt added a commit to stphnt/linkme that referenced this issue Oct 7, 2023
stphnt added a commit to stphnt/linkme that referenced this issue Oct 7, 2023
@stphnt
Copy link
Contributor Author

stphnt commented Oct 7, 2023

I don't know if #72 fixes your issue, since yours seemed potentially related to DUPCHECK as well, but it may be worth trying!

Thanks, @jrose-signal! That does indeed fix my issue.

@stphnt stphnt closed this as completed Oct 7, 2023
dtolnay added a commit that referenced this issue Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants