Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM 16 and 17 support #1730

Merged
merged 16 commits into from
Oct 4, 2023
Merged

Conversation

brechtvl
Copy link
Contributor

@brechtvl brechtvl commented Sep 16, 2023

Description

  • Bump maximum supported LLVM version to 17.
  • Rename makeArrayRef to toArrayRef to avoid llvm::makeArrayRef deprecation warnings.
  • Minor required changes related to llvm::Align.
  • Disable initializeInstrumentation, which was removed and appears to serve no purpose now.
  • Add i128 to OptiX data layout. This was already in the NVPTX backend since LLVM 6, the docs this was taken from seem outdated. Unclear why it started failing now, but the data layout seems clearly incomplete without this.
  • Since llvm/llvm-project@2c3f82b, FRem generates an optix.ptx.testp.infinite.f32 intrinsic that OptiX does not currently implement. Work around with custom implementation.
  • LLVM 16 was generating global variables with characters not supported by ptx. Add manual renaming matching the NVPTXAssignValidGlobalNames LLVM pass.
  • These global variables were actually meant to be string hashes instead of strings for performance, so also ensure they are evaluated to hashes at compile time.

This depends on both opaque pointers (#1728) and new pass manager support (#1729), and should not be merged before them.

Tests

CI jobs updated to use LLVM 16 for bleeding edge.

Checklist:

  • I have read the contribution guidelines.
  • I have previously submitted a Contributor License Agreement.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

* Bump minimum supported LLVM version to 16.
* Rename `makeArrayRef` to `toArrayRef` to avoid `llvm::makeArrayRef`
  deprecation warnings.
* Minor required changes related to `llvm::Align`.
* Disable `initializeInstrumentation` and `createPruneEHPass` which
  were remove and appear to serve no purpose now.
* Add assert when using default optimization levels with legacy pass
  manager, which is no longer supported.

This depends on both opaque pointers and new pass manager support.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl brechtvl marked this pull request as draft September 16, 2023 22:49
@brechtvl brechtvl marked this pull request as ready for review September 16, 2023 23:57
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@lgritz
Copy link
Collaborator

lgritz commented Sep 18, 2023

If it helps you, this little patch will modify the CI to run the bleeding edge test on llvm 16:

--------------------------- .github/workflows/ci.yml ---------------------------
index c9b8470c..be7778c7 100644
@@ -343,7 +343,7 @@ jobs:
             python_ver: "3.10"
             simd: avx2,f16c
             batched: b8_AVX2,b8_AVX512,b16_AVX512
-            setenvs: export LLVM_VERSION=14.0.0
+            setenvs: export LLVM_VERSION=15.0.6
                             LLVM_DISTRO_NAME=ubuntu-18.04
                             OPENCOLORIO_VERSION=v2.2.0
                             PUGIXML_VERSION=v1.13
@@ -359,8 +359,8 @@ jobs:
             python_ver: "3.10"
             simd: avx2,f16c
             batched: b8_AVX2,b8_AVX512,b16_AVX512
-            setenvs: export LLVM_VERSION=15.0.6
-                            LLVM_DISTRO_NAME=ubuntu-18.04
+            setenvs: export LLVM_VERSION=16.0.4
+                            LLVM_DISTRO_NAME=ubuntu-22.04
                             OPENCOLORIO_VERSION=main
                             PUGIXML_VERSION=master
           - desc: clang14/C++17 llvm14 boost1.71 exr3.1 py3.8 avx2 batch-b16avx512

Co-authored-by: Larry Gritz <lg@larrygritz.com>
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl
Copy link
Contributor Author

brechtvl commented Sep 18, 2023

If it helps you, this little patch will modify the CI to run the bleeding edge test on llvm 16:

Thanks. It's going to fail to build with LLVM 16 right now, but once the other PRs are merged I can try it.

@brechtvl
Copy link
Contributor Author

brechtvl commented Sep 18, 2023

LLVM 17 (just released) actually works as well, I needed to do a minor change the new pass manager PR.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl brechtvl changed the title LLVM 16 support LLVM 16 and 17 support Sep 20, 2023
i128:128 was already in the NVPTX backend since LLVM 6, the docs
this was taken from seem outdated. Unclear it started failing now,
but the data layout seems clearly incomplete without this.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl
Copy link
Contributor Author

OptiX almost works with LLVM 16 but there is one problem, optixModuleCreateFn reports the following:

New backend is missing implementation for PTX intrinsic optix.ptx.testp.infinite.f32

What seems to happen is that LLVM is now generating an intrinsic that OptiX does not support. A custom LLVM 16 build with the following commit reverted works: llvm/llvm-project@2c3f82b

@tgrant-nv, what would be the solution here? A fix in the OptiX driver? Somehow patching the generated ptx in OSL to avoid this instruction?

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl
Copy link
Contributor Author

I found a workaround, though it doesn't seem like a great long term solution.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl
Copy link
Contributor Author

I encountered a problem with the testoptix-noise.optix:

Message: Creating module for PTX group group: COMPILE ERROR: Invalid PTX input: ptx2llvm-module-001: error: Failed to parse input PTX string
ptx2llvm-module-001, line 46; fatal   : Parsing error near '.str6': syntax error
Cannot parse input PTX string

I couldn't spot anything suspicious relating to str6 in shadeops_cuda.ptx. It's a global string usimplexnoise. Shuffling around the code that generates this worked around the problem.

@tgrant-nv
Copy link
Contributor

Thanks, Brecht.

If there is a variable named .str6, that is not a valid identifier in PTX since names cannot contain . (see section 4.4 in the CUDA docs). There is an opt flag to emit valid identifiers (--nvptx-assign-valid-global-names), which we are already using in cuda_macros.cmake. opt is being run on the shadeops IR with that flag when shadeops_cuda.ptx is being generated, so it's not clear to me why it's not having the intended effect.

Regarding the fmod issue, that looks like something that should be handled in the OptiX driver. Unfortunately we do stumble upon these coverage gaps from time to time. We should be able to support these testp instructions in an upcoming driver release, but your workaround will be necessary for existing drivers.

@brechtvl
Copy link
Contributor Author

brechtvl commented Sep 28, 2023

Ah, it's not actually using that --nvptx-assign-valid-global-names flag with the new pass manager for LLVM 16, so that would explain it. Those changes are in #1729, specifically c83afef.

The opt command with the legacy pass manager no longer supports default -O levels, so I made it use the new pass manager. But I could not find a way to use the --nvptx-assign-valid-global-names pass in either opt or llc then.

My understanding is that this type of target specific codegen pass should be done in llc instead of opt now. And there are some --nvptx flags for llc and various pass flags for other targets, but not this specific one. Maybe this was an oversight and the flag could be implemented in llc upstream, but that would be for LLVM 18 I guess.

@brechtvl
Copy link
Contributor Author

This is what it looks like in PTX, there's not actually a . in the name?

.visible .global .align 1 .b8 _$_str6[14] = {117, 115, 105, 109, 112, 108, 101, 120, 110, 111, 105, 115, 101, 0};

@brechtvl brechtvl mentioned this pull request Sep 28, 2023
5 tasks
@tgrant-nv
Copy link
Contributor

I've built your branch and confirmed that the issue is not in shadeops_cuda.ptx, which contains the valid name as seen in the snippet you posted, but in the JIT'd shader PTX for group2. It contains two declarations for strings with invalid identifiers:

.extern .global .align 1 .b8 .str6[14];
.extern .global .align 1 .b8 .str7[9];

We will need to somehow convince LLVM to generate valid names, or perhaps write a small pass to sanitize invalid names.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
src/liboslexec/opnoise.cpp Outdated Show resolved Hide resolved
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
And revert previous workaround.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl
Copy link
Contributor Author

It occurred to me that there should be no strings in the ptx code in the first place, there should only be string hashes. And it appears that there is nothing strictly requiring the hash to be evaluated at compile time. So I implemented a fix based on this:
https://artificial-mind.net/blog/2020/11/14/cpp17-consteval

The issue of JIT potentially generating invalid global variable names remains. But I suspect it's not new and we've just been lucky that the optimizer decided to evaluate the hash at compile time in earlier LLVM versions.

With this latest fix I think this PR is ready,

@AlexMWells
Copy link
Contributor

@brechtvl , what is the minimum CUDA version required? I thought constexpr might have problems compiling with CUDA < 12?

@brechtvl
Copy link
Contributor Author

brechtvl commented Oct 2, 2023

This fix should only use C++14, which has been supported since CUDA 9 (released in 2017). I couldn't find a minimum CUDA version in the OSL code, but that seems old enough.

Maybe you are thinking of if constexpr which is a C++17 feature?

I have been testing with CUDA 11.8.

@lgritz
Copy link
Collaborator

lgritz commented Oct 2, 2023

I think we use 11.6 currently at SPI? (I think, I'm not in front of it at the moment.)

The INSTALL.md says that OSL supports Cuda >= 8.0, but (a) we definitely don't test it, so I don't know what the actual oldest version is that works properly, and (b) I have no particular objection to picking a higher minimum, though I don't have a lot of data about how high we can go before people will be inconvenienced. (Anecdotal evidence: according to the vfx industry build matrix, the major studios are all on 9.2 and higher and the DCCs say they want 10 or higher.)

@brechtvl
Copy link
Contributor Author

brechtvl commented Oct 2, 2023

I ran some tests:

  • CUDA 8.0: fails to build in main because nvcc --std=c++14 is unsupported.
  • CUDA 9.2 and 10.2: builds and most test pass, but testoptix-noise.optix.opt fails with illegal memory access was encountered. both in main and this PR.
  • CUDA 11.8: builds and tests pass.

I'm not sure what is going on with that failing test, if that's a real problem that someone using CUDA 9 or 10 might encounter in production.

But for this PR at least, the usage of C++14 seems ok since it's already using --std=c++14 for nvcc commands.

@tgrant-nv
Copy link
Contributor

tgrant-nv commented Oct 2, 2023

Regarding the invalid variable name issue, I'm not sure if there is a way to trigger the existing pass (NVPTXAssignValidGlobalNames) with the new pass manager. Perhaps there are some target options that could be set before the PTX is emitted that would do the trick, but I'm not sure. But it would be easy to duplicate the existing pass in liboslexec. It doesn't need to be a full-fledged pass. Something like this would do the trick (cribbed from the LLVM source):

diff --git a/src/liboslexec/llvm_instance.cpp b/src/liboslexec/llvm_instance.cpp
index 532ba40b..82a0857d 100644
--- a/src/liboslexec/llvm_instance.cpp
+++ b/src/liboslexec/llvm_instance.cpp
@@ -1871,12 +1871,28 @@ BackendLLVM::run()
                 fn.addFnAttr("osl-lib-function", "true");
             }
 
+            auto cleanupName = [&](llvm::StringRef Name) {
+                std::string ValidName;
+                llvm::raw_string_ostream ValidNameStream(ValidName);
+                for (char C : Name) {
+                    if (C == '.' || C == '@') {
+                        ValidNameStream << "_$_";
+                    } else {
+                        ValidNameStream << C;
+                    }
+                }
+                return ValidNameStream.str();
+            };
+
             // Mark all global variables extern and discard their initializers.
             // Global variables are defined in the shadeops PTX file.
             for (llvm::GlobalVariable& global : ll.module()->globals()) {
                 global.setLinkage(llvm::GlobalValue::ExternalLinkage);
                 global.setExternallyInitialized(true);
                 global.setInitializer(nullptr);
+                std::string validName = cleanupName(global.getName());
+                if(validName != global.getName())
+                    global.setName(validName);
             }
         }
         OSL_ASSERT(ll.module());

@AlexMWells
Copy link
Contributor

CUDA 8.0: fails to build in main because nvcc --std=c++14 is unsupported.
I think the current INSTALL.md states CUDA 8 as minimum, so perhaps it needs to be bumped up.

https://developer.nvidia.com/blog/programming-efficiently-with-the-cuda-11-3-compiler-toolchain/

@lgritz
Copy link
Collaborator

lgritz commented Oct 2, 2023

Thanks for checking on the Cuda versions, Brecht. It sounds like we should change the advertised set of versions for Cuda to be 9.0 minimum and 11.0+ recommended. I'm happy to quickly whip that up and submit it.

Patch by Tim with minor code style changes.

Co-authored-by: Tim Grant <tgrant@nvidia.com>
Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@brechtvl
Copy link
Contributor Author

brechtvl commented Oct 2, 2023

Regarding the invalid variable name issue, I'm not sure if there is a way to trigger the existing pass (NVPTXAssignValidGlobalNames) with the new pass manager. Perhaps there are some target options that could be set before the PTX is emitted that would do the trick, but I'm not sure. But it would be easy to duplicate the existing pass in liboslexec. It doesn't need to be a full-fledged pass. Something like this would do the trick ...

I can confirm this also fixes the issue. So included it, in case such global variables are generated for a good reason in the future.

@brechtvl
Copy link
Contributor Author

brechtvl commented Oct 2, 2023

Thanks for checking on the Cuda versions, Brecht. It sounds like we should change the advertised set of versions for Cuda to be 9.0 minimum and 11.0+ recommended. I'm happy to quickly whip that up and submit it.

Sounds good.

@lgritz
Copy link
Collaborator

lgritz commented Oct 2, 2023

#1737

Signed-off-by: Brecht Van Lommel <brecht@blender.org>
@lgritz
Copy link
Collaborator

lgritz commented Oct 3, 2023

Is this ready to merge?

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and passes CI except for the Intel compiler tests which are broken for other reasons (on their end).

@brechtvl
Copy link
Contributor Author

brechtvl commented Oct 3, 2023

Yes, it's ready.

@lgritz lgritz merged commit cdfec4f into AcademySoftwareFoundation:main Oct 4, 2023
21 of 23 checks passed
@spelufo
Copy link

spelufo commented Oct 16, 2023

Hi, just wanted to let you know that OSL is currently disabled on Archlinux's blender since it was waiting for LLVM 16 support. This PR seems to indicate that it is now ready, but I don't know if there's some work left on the blender side. I've emailed the arch package mantainers, haven't heard back yet.

@brechtvl If I compile the latest blender from source with the same cmake command as arch's without the disable osl flag, should it work? I was relying on OSL for this.

Thanks

@brechtvl
Copy link
Contributor Author

It does not work currently. The OSL API is still undergoing changes in the main branch, only when there is a stable OSL release can we really ensure it's compatible.

I suggest to use either the Blender binary download or the build instructions from blender.org, instead of the Arch package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants