[New Package] Add rocBLAS 4.2.0 #4255
Conversation
|
I'm guessing the glibc dlopen failure might be due to usage of AVX2? |
|
amdci7 should support AVX2, and an ISA issue should probably throw a SIGILL error, not a segmentation fault |
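One way to rule the AVX2 theory in or out is to look at the CPU feature flags the kernel reports. This is a minimal sketch, assuming an x86 Linux worker where `/proc/cpuinfo` has a `flags` line; the helper names `cpu_flags` and `supports_avx2` are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features on a line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text):
    """True if the flags line advertises AVX2."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", supports_avx2(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

If the flag is present, an unsupported-instruction explanation becomes unlikely, which matches the point above that an ISA problem would normally surface as SIGILL.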
|
Ok will try to track this down on amdci2 |
Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
|
Ah shoot, the issue was not being able to dlopen |
|
gdb points to a crash during library initialization, so I guess I should be on the lookout for "fancy" things that rocBLAS is trying to do during init. |
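To see which signal kills the process while the library's initializers run, the load can be attempted in a child process so the crash does not take down the caller. This is a rough sketch, assuming a POSIX system and a Python interpreter on the worker; `try_dlopen` and `describe_exit` are hypothetical helpers, and the library path is whatever the build produced.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "loaded cleanly"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def try_dlopen(path):
    """dlopen `path` in a child interpreter and report how the child ended."""
    child = subprocess.run(
        [sys.executable, "-c", "import ctypes, sys; ctypes.CDLL(sys.argv[1])", path],
        capture_output=True,
        text=True,
    )
    return describe_exit(child.returncode)
```

A result of "killed by SIGSEGV" rather than "killed by SIGILL" would point at the initialization code itself instead of an unsupported instruction.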
|
In case this is familiar to anyone: |
|
The latest commit enables building with Tensile for two reasons:
@haampie if you get the chance, I would appreciate it if you could give some insight into why this build is failing. The ASM being compiled looks valid to me, even though the compiler disagrees. |
|
I have never dlopen'ed rocblas.so, so I'm afraid I can't help out :( Isn't Tensile required to actually get BLAS 3 kernels at all? |
|
By the way, hipcc inlines everything by default, but that can be disabled: https://github.com/ROCm-Developer-Tools/HIP/blob/37cb3a34938af39303b73aceb2d7803f5c7ca7ca/bin/hipcc#L522-L525. Maybe worth trying? |
|
Somehow, this PR has processes that are still running on the Yggdrasil workers. They all look like: They aren't dying properly. I've restarted the agents, but you should be aware that this is causing problems. |
|
Running There are a couple of errors like this, although I'm not sure how important they are: But the whole process ends in a bit after |
|
Here's also readelf output. |
|
@pxl-th dumping the |
|
For some reason, when dumping |
|
Backtrace of |
|
@jpsamaroo does |
|
@pxl-th I believe that is the case, it tries to jump to the |
|
Ok, I think I have an idea of what the issue is. It appears that we're mixing up some conventions for how musl vs. glibc do constructors, where musl appears to use -1 as a sentinel for "end of ctors list", while glibc uses 0 for the same purpose. I have no idea why a -1 got inserted when there are ctors to run, but it must be related to link ordering, where somehow the -1 (which should be at the end to signal completion) ended up at the front. I would guess that we accidentally linked both the ctor implementation for musl and glibc (in that order probably). This is probably an issue with how I patched hipcc in What's odd is that I still see this behavior in the musl build, where I wouldn't expect to see the terminator be 0 (I would expect -1). |
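The hypothesis above can be illustrated with a toy model. This is only a sketch of the idea as described in the comment, not the actual glibc or musl startup code: the addresses are invented, and `run_ctors` is a hypothetical stand-in for a loader loop that calls each entry until it reaches its terminator.

```python
def run_ctors(entries, terminator):
    """Walk a constructor list and return the 'addresses' that would be called.

    The walk stops at the first entry equal to `terminator`.
    """
    called = []
    for entry in entries:
        if entry == terminator:
            break
        called.append(entry)
    return called


WORD_MASK = 0xFFFFFFFFFFFFFFFF  # view entries as 64-bit pointers

# Hypothetical list with a stray -1 at the front and a 0 at the end.
ctor_list = [-1, 0x401000, 0x401040, 0]

# A walker that stops at 0 treats the leading -1 as a function pointer.
calls = run_ctors(ctor_list, terminator=0)
print([hex(a & WORD_MASK) for a in calls])
# prints ['0xffffffffffffffff', '0x401000', '0x401040']

# A walker that stops at -1 sees an empty list and runs no constructors.
print(run_ctors(ctor_list, terminator=-1))
# prints []
```

In the first case the walker would jump to address `0xffffffffffffffff`, which is consistent with a segmentation fault during library initialization; in the second, the real constructors would silently never run.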
|
Superseded by #5441 |
No description provided.