Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation error on Windows #142

Closed
nre-ableton opened this issue Jan 13, 2021 · 18 comments
Closed

Compilation error on Windows #142

nre-ableton opened this issue Jan 13, 2021 · 18 comments

Comments

@nre-ableton
Copy link

@nre-ableton nre-ableton commented Jan 13, 2021

Similar to issue #79, I'm getting compilation errors when building blake3 v0.3.7 on Windows:

[2021-01-13T09:05:58.008Z] The following warnings were emitted during compilation:
[2021-01-13T09:05:58.008Z] 
[2021-01-13T09:05:58.008Z] warning: The C compiler "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\bin\\amd64\\cl.exe" does not support /arch:AVX512.
[2021-01-13T09:05:58.008Z] 
[2021-01-13T09:05:58.008Z] error: failed to run custom build command for `blake3 v0.3.7`
[2021-01-13T09:05:58.008Z] 
[2021-01-13T09:05:58.008Z] Caused by:
[2021-01-13T09:05:58.008Z]   process didn't exit successfully: `C:\Users\ci\AppData\Local\Temp\cargo-installMihItV\release\build\blake3-98f585247d671863\build-script-build` (exit code: 1)
[2021-01-13T09:05:58.008Z]   --- stdout
[2021-01-13T09:05:58.008Z]   cargo:rerun-if-env-changed=CARGO_FEATURE_PURE
[2021-01-13T09:05:58.008Z]   TARGET = Some("x86_64-pc-windows-msvc")
[2021-01-13T09:05:58.008Z]   HOST = Some("x86_64-pc-windows-msvc")
[2021-01-13T09:05:58.008Z]   CC_x86_64-pc-windows-msvc = None
[2021-01-13T09:05:58.008Z]   CC_x86_64_pc_windows_msvc = None
[2021-01-13T09:05:58.008Z]   HOST_CC = None
[2021-01-13T09:05:58.008Z]   CC = None
[2021-01-13T09:05:58.008Z]   CFLAGS_x86_64-pc-windows-msvc = None
[2021-01-13T09:05:58.008Z]   CFLAGS_x86_64_pc_windows_msvc = None
[2021-01-13T09:05:58.008Z]   HOST_CFLAGS = None
[2021-01-13T09:05:58.008Z]   CFLAGS = None
[2021-01-13T09:05:58.008Z]   CRATE_CC_NO_DEFAULTS = None
[2021-01-13T09:05:58.008Z]   CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
[2021-01-13T09:05:58.008Z]   OPT_LEVEL = Some("3")
[2021-01-13T09:05:58.008Z]   CC_x86_64-pc-windows-msvc = None
[2021-01-13T09:05:58.008Z]   CC_x86_64_pc_windows_msvc = None
[2021-01-13T09:05:58.008Z]   HOST_CC = None
[2021-01-13T09:05:58.008Z]   CC = None
[2021-01-13T09:05:58.008Z]   CFLAGS_x86_64-pc-windows-msvc = None
[2021-01-13T09:05:58.008Z]   CFLAGS_x86_64_pc_windows_msvc = None
[2021-01-13T09:05:58.008Z]   HOST_CFLAGS = None
[2021-01-13T09:05:58.008Z]   CFLAGS = None
[2021-01-13T09:05:58.008Z]   CRATE_CC_NO_DEFAULTS = None
[2021-01-13T09:05:58.009Z]   CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
[2021-01-13T09:05:58.009Z]   DEBUG = Some("false")
[2021-01-13T09:05:58.009Z]   cargo:warning=The C compiler "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\bin\\amd64\\cl.exe" does not support /arch:AVX512.
[2021-01-13T09:05:58.009Z]   cargo:rerun-if-env-changed=BLAKE3_CI
[2021-01-13T09:05:58.009Z]   cargo:rerun-if-env-changed=CARGO_FEATURE_PREFER_INTRINSICS
[2021-01-13T09:05:58.009Z]   cargo:rerun-if-env-changed=CARGO_FEATURE_PURE
[2021-01-13T09:05:58.009Z]   cargo:rustc-cfg=blake3_sse2_ffi
[2021-01-13T09:05:58.009Z]   cargo:rustc-cfg=blake3_sse41_ffi
[2021-01-13T09:05:58.009Z]   cargo:rustc-cfg=blake3_avx2_ffi
[2021-01-13T09:05:58.009Z]   TARGET = Some("x86_64-pc-windows-msvc")
[2021-01-13T09:05:58.009Z]   OPT_LEVEL = Some("3")
[2021-01-13T09:05:58.009Z]   HOST = Some("x86_64-pc-windows-msvc")
[2021-01-13T09:05:58.009Z]   CC_x86_64-pc-windows-msvc = None
[2021-01-13T09:05:58.009Z]   CC_x86_64_pc_windows_msvc = None
[2021-01-13T09:05:58.009Z]   HOST_CC = None
[2021-01-13T09:05:58.009Z]   CC = None
[2021-01-13T09:05:58.009Z]   CFLAGS_x86_64-pc-windows-msvc = None
[2021-01-13T09:05:58.009Z]   CFLAGS_x86_64_pc_windows_msvc = None
[2021-01-13T09:05:58.009Z]   HOST_CFLAGS = None
[2021-01-13T09:05:58.009Z]   CFLAGS = None
[2021-01-13T09:05:58.009Z]   CRATE_CC_NO_DEFAULTS = None
[2021-01-13T09:05:58.009Z]   CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
[2021-01-13T09:05:58.009Z]   DEBUG = Some("false")
[2021-01-13T09:05:58.009Z]   running: "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\bin\\amd64\\ml64.exe" "-nologo" "-FoC:\\Users\\ci\\AppData\\Local\\Temp\\cargo-installMihItV\\release\\build\\blake3-86dd5a2a633273e8\\out\\c/blake3_sse2_x86-64_windows_msvc.o" "-c" "c/blake3_sse2_x86-64_windows_msvc.asm"
[2021-01-13T09:05:58.009Z]    Assembling: c/blake3_sse2_x86-64_windows_msvc.asm
[2021-01-13T09:05:58.009Z]   c/blake3_sse2_x86-64_windows_msvc.asm(2057)pecified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse2_x86-64_windows_msvc.asm(2057) : error A2070:invalid instrucc/blake3_sse2_x86-64_windows_msvc.asm(2058)pecified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse2_x86-64_windows_msvc.asm(2058) : error A2070:invalid instrucc/blake3_sse2_x86-64_windows_msvc.asm(2189)pecified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse2_x86-64_windows_msvc.asm(2189) : error A2070:invalid instrucc/blake3_sse2_x86-64_windows_msvc.asm(2190)pecified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse2_x86-64_windows_msvc.asm(2190) : error A2070:invalid instrucexit code: 1
[2021-01-13T09:05:58.009Z]   running: "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\bin\\amd64\\ml64.exe" "-nologo" "-FoC:\\Users\\ci\\AppData\\Local\\Temp\\cargo-installMihItV\\release\\build\\blake3-86dd5a2a633273e8\\out\\c/blake3_sse41_x86-64_windows_msvc.o" "-c" "c/blake3_sse41_x86-64_windows_msvc.asm"
[2021-01-13T09:05:58.009Z]    Assembling: c/blake3_sse41_x86-64_windows_msvc.asm
[2021-01-13T09:05:58.009Z]   c/blake3_sse41_x86-64_windows_msvc.asm(1820)specified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse41_x86-64_windows_msvc.asm(1820) : error A2070:invalid instruc/blake3_sse41_x86-64_windows_msvc.asm(1821)specified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse41_x86-64_windows_msvc.asm(1821) : error A2070:invalid instruc/blake3_sse41_x86-64_windows_msvc.asm(1941)specified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse41_x86-64_windows_msvc.asm(1941) : error A2070:invalid instruc/blake3_sse41_x86-64_windows_msvc.asm(1942)specified size
[2021-01-13T09:05:58.009Z]   c/blake3_sse41_x86-64_windows_msvc.asm(1942) : error A2070:invalid instruexit code: 1

I'm actually trying to build sccache 0.2.15, and blake3 is a dependency (or transient dependency?) of that, so forgive me for my unfamiliarity with this project. I've checked the open issues/PRs and didn't see this reported yet.

@sneves
Copy link
Collaborator

@sneves sneves commented Jan 13, 2021

This doesn't look like it has much to do with AVX-512 at all; the build script detects that VS2015 doesn't support it, and doesn't include it in the build.

What is ultimately failing is VS2015's MASM inability to recognize movq xmm, r64 as a valid instruction; it only recognizes movd. A quick fix would be to mechanically replace movq by movd in the affected lines.

@nre-ableton
Copy link
Author

@nre-ableton nre-ableton commented Jan 13, 2021

Ah sorry, that wasn't obvious to me from the compiler output. I'll rename this issue to something less specific.

@nre-ableton nre-ableton changed the title AVX-512 compilation error on Windows Compilation error on Windows Jan 13, 2021
@sneves
Copy link
Collaborator

@sneves sneves commented Jan 13, 2021

3a8204f should have made this not fail, but I do not have a VS 2015 handy to check.

@nre-ableton
Copy link
Author

@nre-ableton nre-ableton commented Jan 13, 2021

@sneves Thanks for the quick fix! I will try to setup a VM to test your change.

@nre-ableton
Copy link
Author

@nre-ableton nre-ableton commented Jan 13, 2021

@sneves Ok, I have verified that 3a8204f now compiles with VS2015. Thanks again!

@Boscop
Copy link

@Boscop Boscop commented May 5, 2021

I was just bitten by this, because I'm also on VS2015 and any blake3 version >= 0.3.1 doesn't build for me.
Could you please release a new version on crates.io that includes this fix? :)

@oconnor663
Copy link
Member

@oconnor663 oconnor663 commented May 5, 2021

@Boscop sorry for the delay. The next release is likely to be 1.0, and I've been waiting until I have a chunk of time to go over everything and make sure the whole API is in good shape. If this is a major blocker for you, though, we could cherry-pick it onto a patch release branch as a workaround?

@Boscop
Copy link

@Boscop Boscop commented May 5, 2021

It's fine, I just depend on the git version for now :)

@Boscop
Copy link

@Boscop Boscop commented May 14, 2021

@oconnor663 Actually, it is a problem, because I cannot cargo install trunk because it fails with the same error.
If there was a new blake3 version with this fix, cargo would fetch that instead.
So could you please release a new version with this fix? :)

oconnor663 added a commit that referenced this issue May 18, 2021
Changes since 0.3.7:
- Fix a build break under Visual Studio 2015:
  #142
- Add Hash::from_hex().
- Implement PartialEq<[u8]> for Hash, using constant_time_eq.
- Implement Display for Hash, equivalent to Hash::to_hex().
- Change derive_key() to return a 32-byte array. As with hash() and
  keyed_hash(), callers who want a non-default output length can use
  Hasher::finalize_xof().
- Replace Hasher::update_with_join() with Hasher::update_rayon(). The
  former was excessively generic, and the Join trait leaked
  implementation details. As part of this change, the Join trait is no
  longer public.
- Upgraded arrayvec to 0.7.0, which uses const generics. This bumps the
  minimum supported Rust compiler version to 1.51.
- Gate the digest and crypto-mac trait implementations behind an
  unstable feature, "traits-preview". As part of this change upgrade
  crypto-mac to 0.11.0.
@oconnor663
Copy link
Member

@oconnor663 oconnor663 commented May 18, 2021

@Boscop ok I'm going to stop dragging my feet and just release 1.0. Could you help me test that this branch (this PR) works for you?

@Boscop
Copy link

@Boscop Boscop commented May 18, 2021

Thanks, I'll test it soon.
But it would be nice to have a semver-compatible version with the fix, so that cargo will automatically pull them it I depend on crates that transitively depend on blake3 (which is the case with several crates).
Otherwise it requires waiting more time until the other crates bump blake3 to 1.0.
But I understand if it's not easy to create a branch that only contains this fix and no API changes..

@oconnor663
Copy link
Member

@oconnor663 oconnor663 commented May 20, 2021

@Boscop I think it's doable. I just put up a new branch, where I've cherry-picked just the changes from master that fix bugs in C and assembly files. (These are all build bugs, not get-the-wrong-answer bugs, as far as I know.) Could you see if this branch works for you: https://github.com/BLAKE3-team/BLAKE3/tree/release_0.3.8

@Boscop
Copy link

@Boscop Boscop commented May 22, 2021

@oconnor663 Thanks a lot, I just tested that branch, and it works!

I tested it by building this (which didn't build with the crates-io version of blake3 0.3) with:

[patch.crates-io]
blake3 = { git = "https://github.com/BLAKE3-team/BLAKE3", branch = "release_0.3.8" }

on Win 8.1 with Visual Studio 2015.

oconnor663 pushed a commit that referenced this issue May 25, 2021
Changes since 0.3.7:
- This is a backport release of bugfixes from master. The next release
  of master will be version 1.0.
- Fix a build break under Visual Studio 2015:
  #142
@oconnor663
Copy link
Member

@oconnor663 oconnor663 commented May 25, 2021

@Boscop I just released 0.3.8. Please let me know if cargo install trunk looks good under VS2015, when you get a chance.

@Boscop
Copy link

@Boscop Boscop commented May 25, 2021

@oconnor663 Yes, it worked, thanks!

Btw, I'm also using blake3 directly in a project, and because of VS2015 and because of my old CPU it won't be able to use AVX-512.
I'm regularly hashing thousands of small files (avg size 45 kb) for indexing purposes, is there a way to speed it up?
The docs say that enabling rayon usage doesn't make sense below 128 kb.
Would it help to memory map each file, or to have more threads than cores for parallel hash computation?
I'm using this function:

fn hash_file(path: impl AsRef<Path>) -> io::Result<Vec<u8>> {
	use blake3::Hasher;

	const BLOCKSIZE: usize = 16 * 1024; // 16 KiB supports all SIMD instruction sets

	let mut buf = [0u8; BLOCKSIZE];
	let mut file = File::open(path)?;
	let mut hasher = Hasher::new();
	loop {
		match file.read(&mut buf)? {
			0 => break,
			n => {
				hasher.update(&buf[.. n]);
			}
		}
	}
	Ok(hasher.finalize().as_bytes().to_vec())
}

Also, is there a way to compile my exe so that it will use runtime detection of SIMD features, so that it uses AVX-512 when running on a CPU that supports that? (If I compile it with the latest VS2019)

Btw, would it still run on CPUs that don't support AVX-512 if I compile it on a CPU that supports it?

@oconnor663
Copy link
Member

@oconnor663 oconnor663 commented May 26, 2021

Would it help to memory map each file, or to have more threads than cores for parallel hash computation?

I haven't done much work with lots of small files. My instinct is that hashing speed is less likely to be your bottleneck than disk read or directory traversal speed. You might want to look into some of the strategies that ripgrep uses to read lots of small files quickly. (I think the walkdir crate might be involved one way or another.) On the other hand, having a ton of threads reading lots of different directories might thrash a spinning disk if nothing is in cache. You might want to decide up-front about whether you care about benchmarking the cold cache case, or exclusively the hot cache case. (Notably I think burntsushi focused on hot cache benchmarks in his famous ripgrep post.)

b3sum uses 16 KiB as a hard cutoff (based on only a little bit of by-hand testing) below which it doesn't even attempt to memory-map files. At some point the overhead of mapping files (which is larger than that of a single read) starts to outweigh the gains. If you're at ~45 KiB, my wild guess is that you might see some mild-but-probably-not-impressive gains from memory mapping. Another thing I'd try is using a larger read buffer, to cut down on the number of read system calls you execute. (Maybe even just read_to_end into a Vec, if you can reuse the Vec.)

Also, is there a way to compile my exe so that it will use runtime detection of SIMD features, so that it uses AVX-512 when running on a CPU that supports that? (If I compile it with the latest VS2019)

Btw, would it still run on CPUs that don't support AVX-512 if I compile it on a CPU that supports it?

BLAKE3 does CPU feature detection automatically in both C and Rust. So unless you use special flags to turn it off, you'll get what you want here by default.

@Boscop
Copy link

@Boscop Boscop commented May 26, 2021

Thanks for the detailed response!

Yeah, I used to use walkdir but now I'm using jwalk because it scans the fs in parallel using rayon.

BLAKE3 does CPU feature detection automatically in both C and Rust. So unless you use special flags to turn it off, you'll get what you want here by default.

Even when compiling with VS2015 which doesn't support the flag for AVX-512?

@oconnor663
Copy link
Member

@oconnor663 oconnor663 commented May 26, 2021

Even when compiling with VS2015 which doesn't support the flag for AVX-512?

Ah good question, this slipped my mind above. If you take a look at our build.rs, there's a place where we explicitly check whether the C compiler supports AVX-512. If we find that the C compiler doesn't support the necessary flags, we skip building that implementation, and we set some cfg flags that suppress the corresponding Rust wrapper code. (Support for AVX2/SSE4.1/SSE2 is just assumed, and trying to build with a compiler so old that it's missing AVX2 will simply fail.)

What that means is, if you compile BLAKE3 with a modern C compiler that supports AVX-512 flags, then the AVX-512 implementation will be available for the dispatch functions to choose, based on processor support detected at runtime. But if you compile BLAKE3 with an old C compiler that doesn't support AVX-512, then the dispatch functions won't have the AVX-512 implementation available, even if you run the binary on a CPU that could have supported it.

If Rust adds support for AVX-512 in the future (either through intrinsics, like it has for SSE2/4.1 and AVX2 today, or through stable inline assembly), we'll be able to get rid of this reliance on the C compiler.

And to be clear, in all of these cases, the built binary is fully compatible with any x86 CPU. It will never dispatch to an implementation that the CPU doesn't support. That only changes if you set something like RUSTFLAGS="-C target-cpu=native". Then you get a non-portable binary, and you have to be careful where you run it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants