Fails to respect local CUDA_Runtime_jll's request to compile with local ptxas #2852

@eford

Description

I'm using a system that was recently upgraded to CUDA v13 drivers, but my LocalPreferences.toml is set to:

[CUDA_Runtime_jll]
local = "true"
version = "12.9"

Setting JULIA_DEBUG=CUDA_Runtime_Discovery shows that the local CUDA installation is being found:

┌ Debug: Looking for binary compute-sanitizer in /storage/icds/RISE/sw8/cuda/cuda-12.9.1 or /storage/icds/RISE/sw8/cuda/cuda-12.9.1/extras/compute-sanitizer or /storage/icds/RISE/sw8/cuda/cuda-12.9.1/compute-sanitizer
│   all_locations =
│    6-element Vector{String}:
│     "/storage/icds/RISE/sw8/cuda/cuda-12.9.1"
│     "/storage/icds/RISE/sw8/cuda/cuda-12.9.1/bin"
│     "/storage/icds/RISE/sw8/cuda/cuda-12.9.1/extras/compute-sanitizer"
│     "/storage/icds/RISE/sw8/cuda/cuda-12.9.1/extras/compute-sanitizer/bin"
│     "/storage/icds/RISE/sw8/cuda/cuda-12.9.1/compute-sanitizer"
│     "/storage/icds/RISE/sw8/cuda/cuda-12.9.1/compute-sanitizer/bin"
└ @ CUDA_Runtime_Discovery ~/.julia/packages/CUDA_Runtime_Discovery/8SKfu/src/CUDA_Runtime_Discovery.jl:164
...
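
For completeness, the obvious way to double-check which toolkit CUDA.jl actually selected would be the standard query functions (a sketch; output omitted):

using CUDA
CUDA.versioninfo()        # reports driver/runtime versions and whether a local toolkit is in use
CUDA.runtime_version()    # I would expect v"12.9" here if the local toolkit were respected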

But for some reason, it still tries to compile with ptxas from CUDA 13. For example, if I run:

using CUDA
N=16
x_d = CUDA.fill(1.0f0, N)
y_d = CUDA.fill(2.0f0, N)
y_d .+= x_d

I get:

ERROR: Failed to compile PTX code (ptxas exited with code 255)
Invocation arguments: --generate-line-info --verbose --gpu-name sm_60 --output-file /tmp/jl_bmLkRs1kT2.cubin /tmp/jl_BQFczQL2fg.ptx
ptxas fatal   : Value 'sm_60' is not defined for option 'gpu-name'
If you think this is a bug, please file an issue and attach /tmp/jl_BQFczQL2fg.ptx
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/Wfi8S/src/compiler/compilation.jl:356
  [3] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Ecaql/src/execution.jl:245
  [4] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Ecaql/src/execution.jl:159
  [5] macro expansion
    @ ~/.julia/packages/CUDA/Wfi8S/src/compiler/execution.jl:373 [inlined]
  [6] macro expansion
    @ ./lock.jl:273 [inlined]
  [7] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#39", tt::Type{…}; kwargs::@Kwargs{…})
    @ CUDA ~/.julia/packages/CUDA/Wfi8S/src/compiler/execution.jl:368
  [8] macro expansion
    @ ~/.julia/packages/CUDA/Wfi8S/src/compiler/execution.jl:112 [inlined]
  [9] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/Wfi8S/src/CUDAKernels.jl:124
 [10] Kernel
    @ ~/.julia/packages/CUDA/Wfi8S/src/CUDAKernels.jl:110 [inlined]
 [11] _copyto!
    @ ~/.julia/packages/GPUArrays/u6tui/src/host/broadcast.jl:71 [inlined]
 [12] materialize!
    @ ~/.julia/packages/GPUArrays/u6tui/src/host/broadcast.jl:38 [inlined]
 [13] materialize!(dest::CuArray{…}, bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast ./broadcast.jl:875
 [14] top-level scope
    @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

I confirmed that I can successfully compile that PTX file from the command line using the 12.9.1 toolkit.
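
(Roughly equivalent to the following, run from within Julia; the arguments and temp-file names are copied from the failed invocation above, and the ptxas path comes from the local toolkit found in the debug log:)

ptxas = "/storage/icds/RISE/sw8/cuda/cuda-12.9.1/bin/ptxas"
run(`$ptxas --generate-line-info --verbose --gpu-name sm_60 --output-file /tmp/jl_bmLkRs1kT2.cubin /tmp/jl_BQFczQL2fg.ptx`)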

So somehow CUDA.jl and GPUCompiler.jl are trying to compile with ptxas from CUDA 13 rather than 12.9.1, as I've requested.
Any ideas?

Thanks.
