Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

floor function missing from libm offloaded on OpenMP (nvptx) #132456

Closed
KaruroChori opened this issue Mar 21, 2025 · 16 comments
Closed

floor function missing from libm offloaded on OpenMP (nvptx) #132456

KaruroChori opened this issue Mar 21, 2025 · 16 comments
Labels
openmp question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@KaruroChori
Copy link

The function is missing, however __builtin_elementwise_floor works fine with nvptx as target.
Not sure what else is missing, I just found out this one because glm::fract was failing while linking.

@llvmbot
Copy link
Member

llvmbot commented Mar 21, 2025

@llvm/issue-subscribers-openmp

Author: None (KaruroChori)

The function is missing, however `__builtin_elementwise_floor` works fine with nvptx as target. Not sure what else is missing, I just found out this one because glm::fract was failing while linking.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 21, 2025

This normally comes from

__DEVICE__ double floor(double __f) { return __nv_floor(__f); }
which should be included in OpenMP unless you pass -nogpuinc. It's also provided as a symbol in the GPU libm from the libc project. How are you compiling this?

@KaruroChori
Copy link
Author

KaruroChori commented Mar 22, 2025

This should be the typical command, sorry for the "noise", but I am not writing it manually as it is generated via meson/ninjia

/archive/shared/apps/cross-clang/install3/usr/local/bin/clang++ -Isrc/app/sdl.p -Isrc/app -I../src/app -I../subprojects/magic_enum/include -Isubprojects/magic_enum/__CMake_build -I../subprojects/magic_enum/__CMake_build -Isubprojects/magic_enum -I../subprojects/magic_enum -I../include -Isubprojects/imgui-1.91.6 -I../subprojects/imgui-1.91.6 -I../subprojects/imgui-1.91.6/backends -I/usr/include/libpng16 -I/usr/local/include -fdiagnostics-color=always -D_GLIBCXX_ASSERTIONS=1 -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Wpedantic -std=c++23 -O3 -Wno-gnu-case-range -Wno-zero-length-array '-DIMGUI_API=__attribute__((visibility(\"default\")))' -fopenmp -g -fopenmp-targets=nvptx64 -fopenmp-cuda-mode -MD -MQ src/app/sdl.p/main.cpp.o -MF src/app/sdl.p/main.cpp.o.d -o src/app/sdl.p/main.cpp.o -c ../src/app/main.cpp

I have not manually added that flag, and I cannot find it anywhere.

@KaruroChori
Copy link
Author

KaruroChori commented Mar 22, 2025

@jhuber6 I think I was able to find a bit more of the issue.
(yes "%d" is intentional or some of the magic does not let the linker fail with this exact example):

#pragma omp target map(tofrom: fn) 
{
    printf("%d",floor(10.0));
    printf("%d",floor(10.0f));
}

The first is valid, the second is not. If run on the host that code is fine. If in the second entry floorf is used it works. Basically the macro floor which should switch between types according to https://en.cppreference.com/w/c/numeric/math/floor does not do that on the offloaded libm.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 22, 2025

@jhuber6 I think I was able to find a bit more of the issue. (yes "%d" is intentional or some of the magic does not let the linker fail with this exact example):

#pragma omp target map(tofrom: fn) 
{
    printf("%d",floor(10.0));
    printf("%d",floor(10.0f));
}

The first is valid, the second is not. If run on the host that code is fine. If in the second entry floorf is used it works. Basically the macro floor which should switch between types according to https://en.cppreference.com/w/c/numeric/math/floor does not do that on the offloaded libm.

Ah, yeah that's probably not handled correctly in the GPU wrapper headers. They're extremely hacky. I believe the floor macro is a C++ thing, so it's not provided in libc either.

@KaruroChori
Copy link
Author

KaruroChori commented Mar 22, 2025

In theory, according to the reference it should be there since C99.
To be sure I also tested it in a normal C file, and when run on the CPU the macro is there working as expected with no C++ involvement.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 22, 2025

In theory, according to the reference it should be there since C99. To be sure I also tested it in a normal C file, and when run on the CPU the macro is there working as expected with no C++ involvement.

Hm, weird. I was looking at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf and couldn't find it. Either way, it's probably something we can fix in the GPU headers by adding https://godbolt.org/z/obzr7z6c8 or something.

@KaruroChori
Copy link
Author

KaruroChori commented Mar 22, 2025

It actually does, around page 499 when it talks about tgmath.h but don't quote me on that. I can describe myself as a language user at best, I really don't know anything about standards :D.

So to be precise precise, according to the document you posted, it is tgmath which should be responsible for those macros. Not basic math. Which is true, including it in place of math solved the linking issue.

To be honest it is the first time I even heard about that header, I always worked under the assumption math was responsible for that.

Not sure on what the best approach would be. On one hand "as it is" follows the standard it seems. Yet, de facto everywhere else it is implemented differently.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 22, 2025

Right, totally forgot about that. It's a resource directory header too. it lowers to the C library calls, wondering if those shouldn't use builtins where possible. Anyways, for standards related questions I usually defer to @AaronBallman, but to me this looks like it's 'supposed' to work like that.

@AaronBallman
Copy link
Collaborator

I've always been a bit surprised that we supply tgmath.h as part of Clang's headers. To me, that's the responsibility of the C Standard Library rather than the compiler and C Standard Libraries do actually provide it:

https://codebrowser.dev/glibc/glibc/include/tgmath.h.html
https://elixir.bootlin.com/musl/v1.2.5/source/include/tgmath.h

But regardless, in terms of implementation, the standard requires that the macros in tgmath.h call the library functions defined in math.h. C23 Clause 7.27 is what specifies this.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 24, 2025

I've always been a bit surprised that we supply tgmath.h as part of Clang's headers. To me, that's the responsibility of the C Standard Library rather than the compiler and C Standard Libraries do actually provide it:

https://codebrowser.dev/glibc/glibc/include/tgmath.h.html https://elixir.bootlin.com/musl/v1.2.5/source/include/tgmath.h

But regardless, in terms of implementation, the standard requires that the macros in tgmath.h call the library functions defined in math.h. C23 Clause 7.27 is what specifies this.

Thanks for the input. Speaking of, we should probably copy the clang header into the LLVM libc to provide it there as well.

Should we close this issue? Since this seems to be the intended behavior when feeding a float constant to floor.

@AaronBallman
Copy link
Collaborator

Should we close this issue? Since this seems to be the intended behavior when feeding a float constant to floor.

Answer a bit unclear. The command line was executing clang++ ... ../src/app/main.cpp so this shouldn't really be involving tgmath.h I would expect?

If @KaruroChori can provide a small reproducer, that would be helpful.

@KaruroChori
Copy link
Author

KaruroChori commented Mar 25, 2025

Yeah it seems like tgmath.h is unrelated, somehow I got my tests wrong last time as well as my guess on what was faulty.

#include <glm/glm.hpp>
#include <cstdio>

int main(){
    #pragma omp target
    {
        printf("%f",glm::fract(1.0f));
    }
    return 0;
}

This works. Replacing 1.0f with 1.0 it does not.

nvlink error   : Undefined reference to 'floor' in '/tmp/test-compile-54f9c3-nvptx64-nvidia-cuda-sm_86-50de99-53c22f.cubin'

However if removing the #pragma omp target it compiles in both cases. Something in the offloaded version is preventing the automatic cast from double to float.
I checked the source of glm and I was unable able to spot anything wrong in there, but that could also be the case.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 25, 2025

Weird, I don't have any issues with 1.0f or 1.0 off the main branch.

@KaruroChori
Copy link
Author

KaruroChori commented Mar 25, 2025

Which version of nvlink do you have? Mine is a bit old on this workstation

nvlink: NVIDIA (R) Cuda linker
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

I am also rebuilding clang as I am a bit behind with commits and I will check again.

@KaruroChori
Copy link
Author

KaruroChori commented Mar 26, 2025

I can confirm it is a bug because of the older cuda sdk. Just tested on the latest version and it works. Clang had no fault of its own.

@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

No branches or pull requests

5 participants