Closed
Description
Simple reproducer; I'm not sure whether this specific use case is supported. CPU and GPU versions are included for comparison. Setup: MI300X GPU, Ubuntu 22.04, ROCm 6.1 pre-release.
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:
Note: This is an unofficial build, please report bugs to the project
responsible for this build and not to the Julia project unless you can
reproduce the issue using official builds available at https://julialang.org/downloads
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 9354 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 128 virtual cores)
Environment:
LD_LIBRARY_PATH = /home/amd/local/lib::/home/amd/local/lib:/home/amd/.npm_modules/lib
using AMDGPU
julia> AMDGPU.devices()
┌────┬─────────────────────┬────────────────────────┬───────────┬─────────────┐
│ Id │ Name │ GCN arch │ Wavefront │ Memory │
├────┼─────────────────────┼────────────────────────┼───────────┼─────────────┤
│ 1 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 2 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 3 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 4 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 5 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 6 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 7 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
│ 8 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │ 64 │ 191.984 GiB │
└────┴─────────────────────┴────────────────────────┴───────────┴─────────────┘
# CPU version
a_h = rand(Float16,5,5)
z_h = a_h .- Float16(0.5)
# GPU version 1
a_d = ROCMatrix(rand(Float16,5,5))
z_d = a_d .- Float16(0.5)
# GPU version 2
b_d = AMDGPU.rand(Float16,5,5)
y_d = b_d .- Float16(0.5)
a_h and z_h are as expected:
julia> # CPU version
a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
0.0796 0.5674 0.3735 0.588 0.1387
0.3408 0.747 0.1177 0.01953 0.165
0.962 0.4517 0.1626 0.834 0.1772
0.1313 0.248 0.0947 0.311 0.46
0.51 0.6123 0.593 0.1958 0.356
julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
-0.4204 0.0674 -0.1265 0.0879 -0.3613
-0.1592 0.2471 -0.3823 -0.4805 -0.335
0.462 -0.04834 -0.3374 0.334 -0.3228
-0.3687 -0.252 -0.4053 -0.189 -0.04004
0.009766 0.1123 0.0928 -0.3042 -0.144
a_d and b_d are set correctly, but the subtraction fails with the following error:
julia> # GPU version 1
a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
0.4282 0.3154 0.796 0.391 0.6763
0.413 0.9087 0.791 0.613 0.5547
0.768 0.004883 0.09033 0.12305 0.9023
0.6484 0.4707 0.827 0.9595 0.8643
0.3164 0.2783 0.4043 0.2222 0.9355
julia> z_d = a_d .- Float16(0.5)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
ERROR: LLVM error: Cannot select: 0x55d229b85998: i32,ch = load<(dereferenceable invariant load (s8) from %ir..kernarg.offset7.cast + 33, basealign 8, addrspace 4), zext from i8> 0x55d22a1a9d88, 0x55d228b82c20, undef:i64
0x55d228b82c20: i64 = add 0x55d22e9b76d0, Constant:i64<153>
0x55d22e9b76d0: i64,ch = CopyFromReg 0x55d22a1a9d88, Register:i64 %0
0x55d22e9b7390: i64 = Register %0
0x55d228b829b0: i64 = Constant<153>
0x55d22a33edf0: i64 = undef
In function: _Z3_3516ROCKernelContext14ROCDeviceArrayI7Float16Li2ELi1EE11BroadcastedI13ROCArrayStyleILi2E9HIPBufferE5TupleI5OneToI5Int64ES6_IS7_EE1_S5_I8ExtrudedIS0_IS1_Li2ELi1EES5_I4BoolS10_ES5_IS7_S7_EES1_EES7_
Stacktrace:
[1] handle_error(reason::Cstring)
@ LLVM ~/.julia/packages/LLVM/bzSzE/src/core/context.jl:168
[2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
@ LLVM.API ~/.julia/packages/LLVM/bzSzE/lib/15/libLLVM.jl:4241
[3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
@ LLVM ~/.julia/packages/LLVM/bzSzE/src/targetmachine.jl:45
[4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/mcgen.jl:84
[5] macro expansion
@ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
[6] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:466 [inlined]
[7] macro expansion
@ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
[8] macro expansion
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:463 [inlined]
[9] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
[10] emit_asm
@ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
[11]
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:154
[12] codegen
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
[13]
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
[14] compile
@ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
[15] #40
@ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:172 [inlined]
[16] JuliaContext(f::AMDGPU.Compiler.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[17] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[18] hipcompile(job::GPUCompiler.CompilerJob)
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:171
[19] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
[20] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
[21] macro expansion
@ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:139 [inlined]
[22] macro expansion
@ ./lock.jl:267 [inlined]
[23] hipfunction(f::GPUArrays.var"#35#37", tt::Type{Tuple{…}}; kwargs::@Kwargs{name::Nothing})
@ AMDGPU.Compiler ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:133
[24] hipfunction
@ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:132 [inlined]
[25] macro expansion
@ ~/.julia/packages/AMDGPU/gtxsf/src/highlevel.jl:172 [inlined]
[26] #gpu_call#48
@ ~/.julia/packages/AMDGPU/gtxsf/src/gpuarrays.jl:8 [inlined]
[27] gpu_call
@ ~/.julia/packages/AMDGPU/gtxsf/src/gpuarrays.jl:5 [inlined]
[28] gpu_call(::GPUArrays.var"#35#37", ::ROCArray{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64; target::ROCArray{…}, elements::Nothing, threads::Int64, blocks::Int64, name::Nothing)
@ GPUArrays ~/.julia/packages/GPUArrays/OKkAu/src/device/execution.jl:69
[29] gpu_call
@ ~/.julia/packages/GPUArrays/OKkAu/src/device/execution.jl:34 [inlined]
[30] _copyto!
@ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:82 [inlined]
[31] copyto!
@ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:44 [inlined]
[32] copy
@ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:29 [inlined]
[33] materialize(bc::Base.Broadcast.Broadcasted{AMDGPU.ROCArrayStyle{2, AMDGPU.Runtime.Mem.HIPBuffer}, Nothing, typeof(-), Tuple{ROCArray{…}, Float16}})
@ Base.Broadcast ./broadcast.jl:903
[34] top-level scope
@ REPL[77]:1
[35] top-level scope
@ ~/.julia/packages/AMDGPU/gtxsf/src/tls.jl:200
Some type information was truncated. Use `show(err)` to see complete types.
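For a like-for-like comparison, the CPU and GPU versions above could be folded into one check that feeds the same input to both paths. This is only a sketch: `check_sub` is a hypothetical helper, not part of AMDGPU.jl, and it assumes the device array type can be constructed from a host `Array` and copied back with `Array(...)`, as `ROCArray` is used in the reproducer. On the setup described here the GPU call would presumably hit the same LLVM error instead of returning.

```julia
# Hypothetical helper (not part of AMDGPU.jl): run the same Float16 broadcast
# on the host and on a device array, then compare the results on the host.
function check_sub(to_device; T=Float16, dims=(5, 5))
    a_h = rand(T, dims...)
    z_h = a_h .- T(0.5)               # CPU reference result
    z_d = to_device(a_h) .- T(0.5)    # same broadcast on the target array type
    return isapprox(Array(z_d), z_h)  # copy back to the host and compare
end

# Host-only sanity check: `identity` keeps the data in a plain Array.
@assert check_sub(identity)

# With AMDGPU.jl loaded (`using AMDGPU`), the GPU path would be exercised as:
#   check_sub(ROCArray)
```

Passing the array constructor as an argument keeps the broadcast expression identical on both sides, so any difference in behavior comes from the array type and its compilation path rather than from the input data.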