MI300X (gfx942) support for broadcast operations #621

@joelandman

Simple reproducer below; I am not sure whether this specific use case is supported. CPU and GPU versions are included for comparison. Setup: MI300X GPU, Ubuntu 22.04, ROCm 6.1 pre-release.

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8* (2024-03-01 10:14 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 9354 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 128 virtual cores)
Environment:
  LD_LIBRARY_PATH = /home/amd/local/lib::/home/amd/local/lib:/home/amd/.npm_modules/lib

julia> using AMDGPU

julia> AMDGPU.devices()
┌────┬─────────────────────┬────────────────────────┬───────────┬─────────────┐
│ Id │                Name │               GCN arch │ Wavefront │      Memory │
├────┼─────────────────────┼────────────────────────┼───────────┼─────────────┤
│  1 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  2 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  3 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  4 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  5 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  6 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  7 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
│  8 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │
└────┴─────────────────────┴────────────────────────┴───────────┴─────────────┘


# CPU version
a_h = rand(Float16,5,5)
z_h = a_h .- Float16(0.5)

# GPU version 1
a_d = ROCMatrix(rand(Float16,5,5))
z_d = a_d .- Float16(0.5)

# GPU version 2
b_d = AMDGPU.rand(Float16,5,5)
y_d = b_d .- Float16(0.5)

On the CPU, a_h and z_h are as expected:

julia> # CPU version
       a_h = rand(Float16,5,5)
5×5 Matrix{Float16}:
 0.0796  0.5674  0.3735  0.588    0.1387
 0.3408  0.747   0.1177  0.01953  0.165
 0.962   0.4517  0.1626  0.834    0.1772
 0.1313  0.248   0.0947  0.311    0.46
 0.51    0.6123  0.593   0.1958   0.356

julia> z_h = a_h .- Float16(0.5)
5×5 Matrix{Float16}:
 -0.4204     0.0674   -0.1265   0.0879  -0.3613
 -0.1592     0.2471   -0.3823  -0.4805  -0.335
  0.462     -0.04834  -0.3374   0.334   -0.3228
 -0.3687    -0.252    -0.4053  -0.189   -0.04004
  0.009766   0.1123    0.0928  -0.3042  -0.144

On the GPU, a_d and b_d are set properly, but the broadcast subtraction fails to compile with the following error:

julia> # GPU version 1
       a_d = ROCMatrix(rand(Float16,5,5))
5×5 ROCArray{Float16, 2, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.4282  0.3154    0.796    0.391    0.6763
 0.413   0.9087    0.791    0.613    0.5547
 0.768   0.004883  0.09033  0.12305  0.9023
 0.6484  0.4707    0.827    0.9595   0.8643
 0.3164  0.2783    0.4043   0.2222   0.9355

julia> z_d = a_d .- Float16(0.5)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
'gfx942' is not a recognized processor for this target (ignoring processor)
warning: sramecc 'On' was requested for a processor that does not support it!
ERROR: LLVM error: Cannot select: 0x55d229b85998: i32,ch = load<(dereferenceable invariant load (s8) from %ir..kernarg.offset7.cast + 33, basealign 8, addrspace 4), zext from i8> 0x55d22a1a9d88, 0x55d228b82c20, undef:i64
  0x55d228b82c20: i64 = add 0x55d22e9b76d0, Constant:i64<153>
    0x55d22e9b76d0: i64,ch = CopyFromReg 0x55d22a1a9d88, Register:i64 %0
      0x55d22e9b7390: i64 = Register %0
    0x55d228b829b0: i64 = Constant<153>
  0x55d22a33edf0: i64 = undef
In function: _Z3_3516ROCKernelContext14ROCDeviceArrayI7Float16Li2ELi1EE11BroadcastedI13ROCArrayStyleILi2E9HIPBufferE5TupleI5OneToI5Int64ES6_IS7_EE1_S5_I8ExtrudedIS0_IS1_Li2ELi1EES5_I4BoolS10_ES5_IS7_S7_EES1_EES7_
Stacktrace:
  [1] handle_error(reason::Cstring)
    @ LLVM ~/.julia/packages/LLVM/bzSzE/src/core/context.jl:168
  [2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
    @ LLVM.API ~/.julia/packages/LLVM/bzSzE/lib/15/libLLVM.jl:4241
  [3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
    @ LLVM ~/.julia/packages/LLVM/bzSzE/src/targetmachine.jl:45
  [4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/mcgen.jl:84
  [5] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:466 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [8] macro expansion
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:463 [inlined]
  [9] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:92
 [10] emit_asm
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/utils.jl:86 [inlined]
 [11]
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:154
 [12] codegen
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:115 [inlined]
 [13]
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:111
 [14] compile
    @ ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:103 [inlined]
 [15] #40
    @ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:172 [inlined]
 [16] JuliaContext(f::AMDGPU.Compiler.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.HIPCompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
 [17] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
 [18] hipcompile(job::GPUCompiler.CompilerJob)
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:171
 [19] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(AMDGPU.Compiler.hipcompile), linker::typeof(AMDGPU.Compiler.hiplink))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:128
 [20] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/execution.jl:103
 [21] macro expansion
    @ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:139 [inlined]
 [22] macro expansion
    @ ./lock.jl:267 [inlined]
 [23] hipfunction(f::GPUArrays.var"#35#37", tt::Type{Tuple{…}}; kwargs::@Kwargs{name::Nothing})
    @ AMDGPU.Compiler ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:133
 [24] hipfunction
    @ ~/.julia/packages/AMDGPU/gtxsf/src/compiler/codegen.jl:132 [inlined]
 [25] macro expansion
    @ ~/.julia/packages/AMDGPU/gtxsf/src/highlevel.jl:172 [inlined]
 [26] #gpu_call#48
    @ ~/.julia/packages/AMDGPU/gtxsf/src/gpuarrays.jl:8 [inlined]
 [27] gpu_call
    @ ~/.julia/packages/AMDGPU/gtxsf/src/gpuarrays.jl:5 [inlined]
 [28] gpu_call(::GPUArrays.var"#35#37", ::ROCArray{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64; target::ROCArray{…}, elements::Nothing, threads::Int64, blocks::Int64, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/OKkAu/src/device/execution.jl:69
 [29] gpu_call
    @ ~/.julia/packages/GPUArrays/OKkAu/src/device/execution.jl:34 [inlined]
 [30] _copyto!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:82 [inlined]
 [31] copyto!
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:44 [inlined]
 [32] copy
    @ ~/.julia/packages/GPUArrays/OKkAu/src/host/broadcast.jl:29 [inlined]
 [33] materialize(bc::Base.Broadcast.Broadcasted{AMDGPU.ROCArrayStyle{2, AMDGPU.Runtime.Mem.HIPBuffer}, Nothing, typeof(-), Tuple{ROCArray{…}, Float16}})
    @ Base.Broadcast ./broadcast.jl:903
 [34] top-level scope
    @ REPL[77]:1
 [35] top-level scope
    @ ~/.julia/packages/AMDGPU/gtxsf/src/tls.jl:200
Some type information was truncated. Use `show(err)` to see complete types.
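
For anyone retesting on other setups, the reproducer above can be collapsed into one script that compares the device result against the CPU result. This is only a sketch: `center` is a hypothetical helper name, not part of AMDGPU.jl, and the guard around `using AMDGPU` is an assumption so that the CPU half still runs where the package is missing or not functional. On the configuration reported here, the GPU branch is expected to hit the same LLVM error during kernel compilation.

```julia
# Hypothetical helper (not from AMDGPU.jl): the broadcast under test.
center(a) = a .- Float16(0.5)

# CPU reference
a_h = rand(Float16, 5, 5)
z_h = center(a_h)

# Load AMDGPU.jl only if it is installed, so the CPU part runs everywhere.
has_pkg = try
    using AMDGPU
    true
catch
    false
end

# AMDGPU.functional() reports whether the ROCm stack was found and is usable.
has_gpu = has_pkg && AMDGPU.functional()

if has_gpu
    a_d = ROCArray(a_h)        # copy the same input to the device
    z_d = center(a_d)          # the step that fails to compile in this report
    @assert Array(z_d) ≈ z_h   # copy back and compare with the CPU result
    println("GPU broadcast matches CPU result")
else
    println("AMDGPU.jl not functional here; only the CPU path ran")
end
```

Using the same input array on both sides makes the comparison direct, rather than comparing two independent random matrices as in the interactive session.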
