Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ninja + Clang on Windows Failure #270

Closed
All8Up opened this issue Dec 9, 2022 · 20 comments · Fixed by #271
Closed

Ninja + Clang on Windows Failure #270

All8Up opened this issue Dec 9, 2022 · 20 comments · Fixed by #271

Comments

@All8Up
Copy link
Contributor

All8Up commented Dec 9, 2022

Error:
"
CMake Error at b/_deps/corrosion-src/cmake/Corrosion.cmake:525 (message):
Unknown windows environment - Can't determine implib name
"

I recently switched our CI builds to use corrosion from a custom import script and all the combinations of msbuild vs ninja and clang vs msvc work except this one. It is only on Windows though as our linux ninja+clang works just fine.

I will dig into it in the morning but figured I'd open this up and see if there were any thoughts or existing issues related to this before I go spelunking?

@jschwe
Copy link
Collaborator

jschwe commented Dec 9, 2022

Are you targeting the windows msvc or gnu Abi?

Edit:
Does the issue also appear on master? It sounds similar to #255, but I don't think I've released a new version yet which includes the fix.

@All8Up
Copy link
Contributor Author

All8Up commented Dec 9, 2022

Are you targeting the windows msvc or gnu Abi?

In this case it is all msvc I believe, the command line in question specifies clang and clang++ for the compiler while everything else is left alone.

Edit: Does the issue also appear on master? It sounds similar to #255, but I don't think I've released a new version yet which includes the fix.

Hmm, I'll try changing the fetch first thing and figure out how to change it over. That would be wonderful if it was real simple. :D

@All8Up
Copy link
Contributor Author

All8Up commented Dec 9, 2022

So, I changed to using the master branch and then ended up with the same results as #255. Now, having read that bug though, something is not making a lot of sense to me. Up until I switched to Corrosion, the same command line arguments used to create the CMake functioned and calling Cargo from CMake did not cause this problem. So, I think the issue isn't where suspected or described. This seems much more like something Corrosion is doing that is throwing things off based on these results?

@All8Up
Copy link
Contributor Author

All8Up commented Dec 9, 2022

Ok, one more follow up for now. I added verbose to the flags for the import and received this:

C:<redacted>/stable-x86_64-pc-windows-msvc/bin/rustc.exe --crate-name build_script_build --edition=2018 C:\<redacted>cargo\registry\src\github.com-1ecc6299db9ec823\quote-1.0.21\build.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 --cfg "feature=\"default\"" --cfg "feature=\"proc-macro\"" -C metadata=6b15d0cb383ab226 -C extra-filename=-6b15d0cb383ab226 --out-dir C:/<redacted>build\quote-6b15d0cb383ab226 -C "linker=C:/Program Files/LLVM/bin/clang++.exe" -L dependency=C:/<redacted>/Debug/cargo/build\debug\deps --cap-lints allow

Corrosion is explicitly setting the linker to the incorrect item. i.e. the "linker=.../clang++.exe".
I see I can set per target linker commands, is there a way to do the entire import?

@All8Up
Copy link
Contributor Author

All8Up commented Dec 9, 2022

Ok, think I know what happened here. Basically we are setting the CMAKE_LINKER but according to CMake documentation that does not propagate to the CMAKE_LINKER_PREFERENCE internal variable. As such, when Corrosion goes to figure out the linker to pass to each package target, it is overriding it with the wrong linker.

At least for my needs in this particular case (i.e. not a cross plat build), I have no need of a per package linker flag being set. So, I added the CORROSION_NO_LINKER_OVERRIDE flag and used that to disable the string which sets the linker per package in the crate. Would you be interested in a PR for that, it's maybe a 5-10 line change? The only issue is VSCode decided to reformat everything when I saved it and I didn't notice, so everything in my fork has different spaces.. Could fix that up if you are interested though?

@jschwe
Copy link
Collaborator

jschwe commented Dec 10, 2022

Basically we are setting the CMAKE_LINKER but according to CMake documentation that does not propagate to the CMAKE_LINKER_PREFERENCE internal variable. As such, when Corrosion goes to figure out the linker to pass to each package target, it is overriding it with the wrong linker.

Actually corrosion is determining the linker by analyzing the CMAKE_LINKER_PREFERENCE internal variables. But for some reason this does not work with msvc and is disabled here for the msvc Generator. I guess we may need to disable it for all MSVC abi targets, instead of just when using the MSVC generator. The condition would need to be changed to NOT is_windows_msvc, which potentially is not in scope at the point where the condition is evaluated. Maybe you could first check by deleting the whole block manually, to confirm that it is the responsible location?

@All8Up
Copy link
Contributor Author

All8Up commented Dec 10, 2022

Basically we are setting the CMAKE_LINKER but according to CMake documentation that does not propagate to the CMAKE_LINKER_PREFERENCE internal variable. As such, when Corrosion goes to figure out the linker to pass to each package target, it is overriding it with the wrong linker.

Actually corrosion is determining the linker by analyzing the CMAKE_LINKER_PREFERENCE internal variables. But for some reason this does not work with msvc

Yup, that's the problem child. As https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_LINKER_PREFERENCE.html mentioned, it is an internal thing and subject to change, so I'd be rather hesitant to trust it given how fast CMake changes. Additionally, there are several references which I found which mention that when the external user does something like '-DCMAKE_C_LINKER="blah"', that does not propagate to the 'CMAKE_LANG_LINKER_PREFERENCE'. Also in that link it points out that for 'CMAKE_LANG_LINKER_PREFERENCE_PROPAGATES' "A language compiled into static libraries linked by the target is considered if this variable is true", which I found was never true for the libraries we were creating.

I guess we may need to disable it for all MSVC abi targets, instead of just when using the MSVC generator. The condition would need to be changed to NOT is_windows_msvc, which potentially is not in scope at the point where the condition is evaluated. Maybe you could first check by deleting the whole block manually, to confirm that it is the responsible location?

I'm pretty sure the change I made is doing what you are saying. Specifically I wrapped up where Corrosion is creating the string for "cargo_target_linker " with a new flag fixed my builds. I added an option and when enabled, it later forces "cargo_target_linker " used in the "add_custom_target" for the package to be an empty string, so there is no 'linker=.." added to the command.

You can see the changes at: master...grafxtechnology:corrosion:master but unfortunately VSCode reformatted the entire thing so the diff is a huge mess. Basically though, there is a new option 'CORROSION_NO_LINKER_OVERRIDE' added at line 41 which disables the entire bit about selecting a linker. That may not be what you want to do as a final fix but I "think"(?) it is the confirmation you are looking for? I know for us it turned 24 failing CI builds to successes at least.

@jschwe
Copy link
Collaborator

jschwe commented Dec 10, 2022

Yup, that's the problem child. As https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_LINKER_PREFERENCE.html mentioned, it is an internal thing and subject to change, so I'd be rather hesitant to trust it given how fast CMake changes.

This particular code has been here since before I started maintaining the project, but I think it is necessary and provides reasonable defaults. CMAKE_<LANG>_LINKER_PREFERENCE essentially tells us what CMake thinks is the most important language for linking. For example, if both C and C++ are used in a project, then the C++ compiler will be used as the linker since the CXX linker preference is higher than the C linker preference. I'm not very familiar with Windows, but I guess the problem with msvc is that the linker should be cl.exe`, which is different from the compiler. So if we changed the check to check for MSVC ABI instead of the MSVC Generator, then your immediate issue should be fixed.

Also in that link it points out that for 'CMAKE_LANG_LINKER_PREFERENCE_PROPAGATES' "A language compiled into static libraries linked by the target is considered if this variable is true", which I found was never true for the libraries we were creating.

This would be used internally by CMake to determine which linker to use by choosing the language with the highest linker preference value considering the languages used for a target. Corrosion can't emulate this when linking is done via rust, since Corrosion does not know which languages are used for a target. In general, I'm not a huge fan of the current situation, but to my knowledge, CMake doesn't really provide anything to get the actual linker for a certain language.

Basically though, there is a new option 'CORROSION_NO_LINKER_OVERRIDE' added at line 41 which disables the entire bit about selecting a linker.

If this were an option on corrosion_import_crate instead of a global option, I'd be fine with it.

@jschwe jschwe linked a pull request Dec 10, 2022 that will close this issue
@jschwe
Copy link
Collaborator

jschwe commented Dec 10, 2022

Reopening issue to

@jschwe jschwe reopened this Dec 10, 2022
@All8Up
Copy link
Contributor Author

All8Up commented Dec 11, 2022

since the CXX linker preference is higher than the C linker preference. I'm not very familiar with Windows, but I guess the problem with msvc is that the linker should be cl.exe, which is different from the compiler. So if we changed the check to check for MSVC ABI instead of the MSVC Generator, then your immediate issue should be fixed.

Unfortunately Windows is such a mess when it comes to development environment, I don't think there is going to be a good 100% solution in this area. That's basically what I meant when I said I wouldn't really trust that linker preference variable, it's just another "guess" system which unfortunately on Windows is likely to make mistakes and need to be overridden for special cases.

Also in that link it points out that for 'CMAKE_LANG_LINKER_PREFERENCE_PROPAGATES' "A language compiled into static libraries linked by the target is considered if this variable is true", which I found was never true for the libraries we were creating.

This would be used internally by CMake to determine which linker to use by choosing the language with the highest linker preference value considering the languages used for a target. Corrosion can't emulate this when linking is done via rust, since Corrosion does not know which languages are used for a target. In general, I'm not a huge fan of the current situation, but to my knowledge, CMake doesn't really provide anything to get the actual linker for a certain language.

This is where I'm a bit confused. CMake, at least in my case, knows to use the correct linker which is cl. It is only the Rust libraries where it goes wrong. I.e. the compiler is overridden with clang/clang++, but initially we didn't bother to set the linker explicitly because CMake still did the correct thing and used cl for linking. Spelunking the CMake source doesn't help a lot since it looks like this behavior is a bit different in each generator, but the general behavior seemed to be: if the 'CMAKE_lang_LINKER' is not set, only then read the 'CMAKE_lang_LINKER_PREFERENCE'. I could be wrong in my interpretation but that was my reading from a quick look at the ninja and msvc generators.

Of course, prior to explicitly overriding the the linker in our builds, it was still doing the correct thing and using cl in both builds for everything "except" Rust. Bit of a head scratcher really.. It makes me think that the behavior we are seeing is due to an incorrect interpretation of how the values are computed but I can't really make sense of how that is happening without doing a deep dive into CMake sources which is something I actively avoid because it is scary in there...

Basically though, there is a new option 'CORROSION_NO_LINKER_OVERRIDE' added at line 41 which disables the entire bit about selecting a linker.

If this were an option on corrosion_import_crate instead of a global option, I'd be fine with it.

Yup, making this a flag on the import rather than as a global was my initial inclination, what I did was a quick hack to just prove it would work. But, given the above and some further thought, I'm leaning towards a different approach. Basically the problem, as I see it, is that Corrosion is trying to override the linker by default and take it out of Rust's hands even in common case builds. But, Rust has the information to do the right thing in "most" cases between the target toolchain and the host/target tuples. I.e. for most uses based on target and environment the override should be redundant should it not?

So, I was considering reversing things would be the better and more general purpose solution. Disable the linker setup in the custom target by default and add two new flags to the import: one to use the current "preference" approach and another to just set it explicitly. My current guess/reading of everything suggests that would be the much more robust solution at least until CMake itself starts integrating Rust for the linker preferences, if they bother.

What are your thoughts on such a change? I suspect the risk is actually pretty low based on my current use cases. Basically my CI dashboard has 37 entries covering a wide range of mixed C/C++ and Rust usage for Linux and Windows. It covers using rust in C/C++ libraries/executables and the other way around along with 4+ different ninja+vc, msbuild+clang etc combinations. So, pretty broad coverage. Disabling the linker override in all of those gives me a green board, all tests pass and we've been running the results without issue since the other day. I "think" that this is a fairly strong indicator that the proposal is more likely to fix things than cause issues while leaving the oddball outliers fixable with an explicit override?

@jschwe
Copy link
Collaborator

jschwe commented Dec 11, 2022

But, Rust has the information to do the right thing in "most" cases between the target toolchain and the host/target tuples. I.e. for most uses based on target and environment the override should be redundant should it not?

If you have a C++ static library and let Rust do the linking, by default Rust would just use the C-Compiler to link, which results in errors if the C++ code uses the C++ standard library.

It covers using rust in C/C++ libraries/executables and the other way around along with 4+ different ninja+vc, msbuild+clang etc combinations.

Corrosion only messes with the linker if Rust does the final linking, so the important tests are linking C++ into rust. Corrosion has a test that fails without setting the C++ linker, since Rust will by default use cc. Potentially corrosion could tell rust/the linker that it should also link the C++ standard library, but we don't know which linker rust would pick, so using the correct flags may also be difficult.

@All8Up
Copy link
Contributor Author

All8Up commented Dec 11, 2022

But, Rust has the information to do the right thing in "most" cases between the target toolchain and the host/target tuples. I.e. for most uses based on target and environment the override should be redundant should it not?

If you have a C++ static library and let Rust do the linking, by default Rust would just use the C-Compiler to link, which results in errors if the C++ code uses the C++ standard library.

Hmmmmmmm. ... I have two cases of doing this, so I'm not sure how/why it is functioning then. I have a circular case where I'm building a sys crate from a C++ static library which internally links against a number of Rust libraries. It uses "build.rs" to call CMake which internally calls Corrosion and Cargo again, yes I know it's evil.... :D On Windows I reset a bunch of the CMake default link libs and flags because we use the static multi thread runtime with utf16 support specifically for Rust linkage, so perhaps because we've narrowed our build parameters down to a very specific case that avoids this issue.

This of course goes right back to the horrible mess that is Windows development environments, some stupid number of possible combinations of the runtime and standard libraries because they have static/dynamic, single/multi thread, utf8, utf16, blah blah blah. Rust only supports a minimal set of those combinations and ignores all the rest. Worse, in a lot of cases mixing "works", it's unsafe and likely to cause other bugs, but unlike a nice safe Rusty thing, c/c++/linker won't bitch most of the time about mixing incompatible stuff.

I'll have to investigate this further next week. I suspect this might be masking a larger issue. I'd be suspicious of CMake defaulting to the incorrect unicode settings to link with Rust in C++ based on what you are saying. Just a guess based on too many years of futzing with all the horrible crap involved with making Windows work... I much prefer targeting consoles or linux, they don't have 40+ years of legacy crap getting in the way, but unfortunately part of our stuff has to run on Winblows.

It covers using rust in C/C++ libraries/executables and the other way around along with 4+ different ninja+vc, msbuild+clang etc combinations.

Corrosion only messes with the linker if Rust does the final linking, so the important tests are linking C++ into rust. Corrosion has a test that fails without setting the C++ linker, since Rust will by default use cc. Potentially corrosion could tell rust/the linker that it should also link the C++ standard library, but we don't know which linker rust would pick, so using the correct flags may also be difficult.

I'll have to re-read it again because when I went through the code it looked to me like it was inserting a link override in every generated custom target no matter what the type was. Very easy to say I might have missed something since I was in a bit of a hurry to get the red wall of CI failures fixed. :D

@jschwe
Copy link
Collaborator

jschwe commented Dec 11, 2022

Corrosion only messes with the linker if Rust does the final linking [...]

I'll have to re-read it again because when I went through the code it looked to me like it was inserting a link override in every generated custom target no matter what the type was. Very easy to say I might have missed something since I was in a bit of a hurry to get the red wall of CI failures fixed. :D

Yes, corrosion sets this unconditionally. It's just that when creating static libraries the linker is never called. So If you create Rust static libraries and then in CMake link them into your C/C++ target, then CMake will call the linker and corrosion has nothing to do with that. On the other hand if you have a Rust executable and link in a C /C++ library (via corrosion_link_libraries), then Rust will call the linker and the overrides corrosion sets will do something.

@All8Up
Copy link
Contributor Author

All8Up commented Dec 11, 2022

Yes, corrosion sets this unconditionally. It's just that when creating static libraries the linker is never called. So If you create Rust static libraries and then in CMake link them into your C/C++ target, then CMake will call the linker and corrosion has nothing to do with that. On the other hand if you have a Rust executable and link in a C /C++ library (via corrosion_link_libraries), then Rust will call the linker and the overrides corrosion sets will do something.

Ah.... Gotcha. That makes me wonder about the other case then. Specifically, even in my static library, if I use Winapi it will fail, suggesting that they are doing something behind the scenes that invokes the linker for some reason...

@jschwe
Copy link
Collaborator

jschwe commented Dec 11, 2022

After some more thought I have to partially retract my previous statement. In the traditional Cmake model it is indeed that the case that the linker is never invoked for static libraries, and for a rust only static library this should also hold true.

If the rust static library has a build script and links in other (non-rust) static libraries, then the linker does become involved, since cargo will have to link in the other static libraries. It would probably be preferable to defer the linking to the stage of the final executable, but then you would need to define the dependencies in CMake and not in the build script.
There is also the build script itself that is built and linked, which is affected by the linker corrosion sets (except for cross-compiling).

@All8Up
Copy link
Contributor Author

All8Up commented Dec 12, 2022

After some more thought I have to partially retract my previous statement. In the traditional Cmake model it is indeed that the case that the linker is never invoked for static libraries, and for a rust only static library this should also hold true.

If the rust static library has a build script and links in other (non-rust) static libraries, then the linker does become involved, since cargo will have to link in the other static libraries. It would probably be preferable to defer the linking to the stage of the final executable, but then you would need to define the dependencies in CMake and not in the build script. There is also the build script itself that is built and linked, which is affected by the linker corrosion sets (except for cross-compiling).

Ok, things are starting to clarify in my head about all of this now.. At least in the case of static libs, I would absolutely suggest disabling the linker override by default since whatever libraries are doing in the build.rs is their responsibility and not really something I think Corrosion should get in the way of. That's not to say that Winapi or other libraries should be doing something that fails with a linker override, but it does seems like something that should be left to them to decide on?

As to the issue with linking C++ into Rust items. At least on Windows, disabling the linker overrides still allows all the tests to succeed. So, I'm assuming it is a Linux issue or at least a non ms ABI issue. (Or I'm not running the tests correctly since I get a number that are skipped?) I tried it with VC2019 just for fun, not the oldest, not the newest, but at least after the 2015 runtime library split that made what was already a mess a bigger mess.

As to setting flags/libraries required for C++ on Linux, that's something a bit different and my linker-fu on Linux is not particularly strong. But, as a general rule of thumb, Rust itself is basically a C variant, it doesn't need/want the C++ standard libraries and I'm not sure that I would consider it Corrosion's responsibility to deal with that case. It's one of those things that if you were linking a C main file and it links against some library you don't have source for, CMake/make whatever is going to assume C linkage. If that other library contains C++ it would be up to the user to add the appropriate runtime as there is no way to detect such a case. Not to mention that the clang versus gnu C++ runtimes are different so you would have to figure out which one the library needs.

Given that, while it does not impact my current work, my $0.02 on this is that supporting the C++ case is out of scope of corrosion. It sounds much more like something that should just be documented with pointers to the most common libraries and then left up to the user. Given one of my targets is the Steam Deck (native Linux, not the emulation layer), I'm pretty sure I'll be running into this area soonish, so, we'll see when I get there.

I'll clean up adding an option to the import which disables the linker override and get a PR up for that. Seems like the minimal change for the moment which will fix the two current bugs caused by the Winapi crate among others it would impact. I just have to get a clean copy and not use VSCode or the stupid thing will reformat the entire file again... :D

@jschwe
Copy link
Collaborator

jschwe commented Dec 12, 2022

As to the issue with linking C++ into Rust items. At least on Windows, disabling the linker overrides still allows all the tests to succeed. So, I'm assuming it is a Linux issue or at least a non ms ABI issue

Ah, I guess with MSVC the linker driver is cl regardless of the language being C or C++. For gcc/g++ and clang/clang++ this does make a difference.

But, as a general rule of thumb, Rust itself is basically a C variant, it doesn't need/want the C++ standard libraries and I'm not sure that I would consider it Corrosion's responsibility to deal with that case. It's one of those things that if you were linking a C main file and it links against some library you don't have source for, CMake/make whatever is going to assume C linkage. If that other library contains C++ it would be up to the user to add the appropriate runtime as there is no way to detect such a case.

I tend to agree and I myself would probably have not added the linker preference code, but as a matter of fact we do have the logic in corrosion now and removing it would be a breaking change. I'll consider removing it in the next major release.

jschwe added a commit to jschwe/corrosion that referenced this issue Dec 18, 2022
As discussed in issue corrosion-rs#270 we at least should not set the linker for
Rust static libraries. In the general case the linker is not needed for
static libraries, and if it is invoked,
Rust should take care of what to choose
(e.g. for building build scripts) instead
of us forcing the choice.
jschwe added a commit to jschwe/corrosion that referenced this issue Dec 18, 2022
As discussed in issue corrosion-rs#270 we at least should not set the linker for
Rust static libraries. In the general case the linker is not needed for
static libraries, and if it is invoked,
Rust should take care of what to choose
(e.g. for building build scripts) instead
of us forcing the choice.
jschwe added a commit to jschwe/corrosion that referenced this issue Dec 18, 2022
As discussed in issue corrosion-rs#270 we at least should not set the linker for
Rust static libraries. In the general case the linker is not needed for
static libraries, and if it is invoked,
Rust should take care of what to choose
(e.g. for building build scripts) instead
of us forcing the choice.
jschwe added a commit that referenced this issue Dec 18, 2022
As discussed in issue #270 we at least should not set the linker for
Rust static libraries. In the general case the linker is not needed for
static libraries, and if it is invoked,
Rust should take care of what to choose
(e.g. for building build scripts) instead
of us forcing the choice.
jschwe added a commit that referenced this issue Jan 10, 2023
As discussed in issue #270 we at least should not set the linker for
Rust static libraries. In the general case the linker is not needed for
static libraries, and if it is invoked,
Rust should take care of what to choose
(e.g. for building build scripts) instead
of us forcing the choice.
@jschwe
Copy link
Collaborator

jschwe commented Jan 11, 2023

@All8Up Short reminder for me, was the remaining issue here also that linking of C++ libraries failed (with clang++)?

@All8Up
Copy link
Contributor Author

All8Up commented Jan 11, 2023

@All8Up Short reminder for me, was the remaining issue here also that linking of C++ libraries failed (with clang++)?

Actually, I'm not sure anymore. I've been working on smoothing the build process out and found that the current tagged version works fine with or without the flag in question but the head of main fails. Based on the different issues, I've moved forward with smoothing out our build process for the time being then I'll have to get back to this.

In other news, I'll have to go figure out the install process since I have to work around the lack of the EXPORT set flag. I'm going to have to get to that pretty soon, I need the export flag working since we use the auto generated configuration files from those.

@jschwe
Copy link
Collaborator

jschwe commented Jan 13, 2023

Okay - I added a CI job that tests ninja + clang + msvc abi, so I'm going to go ahead and close this issue for now. If you encounter another issue / related issues please open a new issue.

In other news, I'll have to go figure out the install process since I have to work around the lack of the EXPORT set flag.

This is probably something where corrosion is still quite lacking and PRs to improve install functionality would be very welcome.

@jschwe jschwe closed this as completed Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants