Feature 'cvt with .bf16' requires .target sm_80 or higher Error #3947 #18070

Closed
lk1983823 opened this issue Jul 13, 2023 · 2 comments
Labels
bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.0.x

Comments

lk1983823 commented Jul 13, 2023

Bug description

I am running code that uses Lightning and DeepSpeed. The optimizer is set as:
optimizer = deepspeed.ops.adam.DeepSpeedCPUAdam(model.parameters(), lr=1e-3)

When the code runs for the first time, it builds the cpu_adam extension as follows:

Using /home/lk/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Creating extension directory /home/lk/.cache/torch_extensions/py310_cu117/cpu_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/lk/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/lk/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
[1/3] /usr/local/cuda-11.7/bin/nvcc  -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.7/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/TH -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /home/lk/anaconda3/envs/weather/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
[2/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.7/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/TH -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /home/lk/anaconda3/envs/weather/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++14 -g -Wno-reorder -L/usr/local/cuda-11.7/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -c /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
[3/3] c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-11.7/lib64 -lcudart -o cpu_adam.so

In step 1/3, it sets -gencode=arch=compute_75,code=sm_75, which does not satisfy the sm_80 requirement reported in the later steps, and the following error is shown:

Loading extension module cpu_adam...
Time to load cpu_adam op: 33.08566975593567 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 32.0780827999115 seconds
lk:26404:26404 [0] NCCL INFO Bootstrap : Using enp34s0:192.168.1.3<0>
lk:26404:26404 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
lk:26404:26404 [0] NCCL INFO cudaDriverVersion 11070
NCCL version 2.14.3+cuda11.7
lk:26404:26902 [0] NCCL INFO NET/IB : No device found.
lk:26404:26902 [0] NCCL INFO NET/Socket : Using [0]enp34s0:192.168.1.3<0>
lk:26404:26902 [0] NCCL INFO Using network Socket
lk:26515:26515 [1] NCCL INFO cudaDriverVersion 11070
lk:26515:26515 [1] NCCL INFO Bootstrap : Using enp34s0:192.168.1.3<0>
lk:26515:26515 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
lk:26515:26903 [1] NCCL INFO NET/IB : No device found.
lk:26515:26903 [1] NCCL INFO NET/Socket : Using [0]enp34s0:192.168.1.3<0>
lk:26515:26903 [1] NCCL INFO Using network Socket
lk:26515:26903 [1] NCCL INFO Setting affinity for GPU 1 to ff,c00ffc00
lk:26404:26902 [0] NCCL INFO Setting affinity for GPU 0 to 3ff003ff
lk:26404:26902 [0] NCCL INFO Channel 00/02 :    0   1
lk:26404:26902 [0] NCCL INFO Channel 01/02 :    0   1
lk:26515:26903 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
lk:26404:26902 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
lk:26404:26902 [0] NCCL INFO Channel 00 : 0[2d000] -> 1[99000] via SHM/direct/direct
lk:26404:26902 [0] NCCL INFO Channel 01 : 0[2d000] -> 1[99000] via SHM/direct/direct
lk:26515:26903 [1] NCCL INFO Channel 00 : 1[99000] -> 0[2d000] via SHM/direct/direct
lk:26515:26903 [1] NCCL INFO Channel 01 : 1[99000] -> 0[2d000] via SHM/direct/direct
lk:26515:26903 [1] NCCL INFO Connected all rings
lk:26515:26903 [1] NCCL INFO Connected all trees
lk:26515:26903 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
lk:26404:26902 [0] NCCL INFO Connected all rings
lk:26515:26903 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
lk:26404:26902 [0] NCCL INFO Connected all trees
lk:26404:26902 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
lk:26404:26902 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
lk:26404:26902 [0] NCCL INFO comm 0x6e3bcde0 rank 0 nranks 2 cudaDev 0 busId 2d000 - Init COMPLETE
lk:26515:26903 [1] NCCL INFO comm 0x6e299d50 rank 1 nranks 2 cudaDev 1 busId 99000 - Init COMPLETE
Using /home/lk/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Creating extension directory /home/lk/.cache/torch_extensions/py310_cu117/utils...
Emitting ninja build file /home/lk/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/lk/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/TH -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/THC -isystem /home/lk/anaconda3/envs/weather/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o 
[2/2] c++ flatten_unflatten.o -shared -L/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so
Loading extension module utils...
Time to load utils op: 16.351833820343018 seconds
Loading extension module utils...
Time to load utils op: 16.23490309715271 seconds
Parameter Offload: Total persistent parameters: 3570141 in 191 params
lk:26404:27043 [0] NCCL INFO Using network Socket
lk:26515:27044 [1] NCCL INFO Using network Socket
lk:26404:27043 [0] NCCL INFO Setting affinity for GPU 0 to 3ff003ff
lk:26515:27044 [1] NCCL INFO Setting affinity for GPU 1 to ff,c00ffc00
lk:26404:27043 [0] NCCL INFO Channel 00/02 :    0   1
lk:26515:27044 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
lk:26404:27043 [0] NCCL INFO Channel 01/02 :    0   1
lk:26404:27043 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
lk:26515:27044 [1] NCCL INFO Channel 00 : 1[99000] -> 0[2d000] via SHM/direct/direct
lk:26515:27044 [1] NCCL INFO Channel 01 : 1[99000] -> 0[2d000] via SHM/direct/direct
lk:26404:27043 [0] NCCL INFO Channel 00 : 0[2d000] -> 1[99000] via SHM/direct/direct
lk:26404:27043 [0] NCCL INFO Channel 01 : 0[2d000] -> 1[99000] via SHM/direct/direct
lk:26515:27044 [1] NCCL INFO Connected all rings
lk:26404:27043 [0] NCCL INFO Connected all rings
lk:26515:27044 [1] NCCL INFO Connected all trees
lk:26404:27043 [0] NCCL INFO Connected all trees
lk:26515:27044 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
lk:26515:27044 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
lk:26404:27043 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
lk:26404:27043 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
lk:26515:27044 [1] NCCL INFO comm 0x6d12f400 rank 1 nranks 2 cudaDev 1 busId 99000 - Init COMPLETE
lk:26404:27043 [0] NCCL INFO comm 0x6e4a87a0 rank 0 nranks 2 cudaDev 0 busId 2d000 - Init COMPLETE
Using /home/lk/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0005307197570800781 seconds
Epoch:  0 dataloader_length:  10
  0%|                                                                                                                                             | 0/10 [00:00<?, ?it/s]Using /home/lk/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003745555877685547 seconds
Epoch:  0 dataloader_length:  10
  0%|                                                                                                                                             | 0/10 [00:03<?, ?it/s]
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 549, in _worker_compile
    kernel.precompile(warm_cache_only_with_cc=cc)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 69, in precompile
    self.launchers = [
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 70, in <listcomp>
    self._precompile_config(c, warm_cache_only_with_cc)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 83, in _precompile_config
    triton.compile(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/triton/compiler.py", line 1621, in compile
    next_module = compile(module)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/triton/compiler.py", line 1558, in <lambda>
    lambda src: ptx_to_cubin(src, capability))
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/triton/compiler.py", line 1031, in ptx_to_cubin
    return _triton.compile_ptx_to_cubin(ptx, ptxas, compute_capability)
RuntimeError: Internal Triton PTX codegen error: 
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 60; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 60; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 61; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 61; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 62; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 62; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 63; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 63; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 64; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 64; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 65; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 65; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 66; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 72; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 73; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 73; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 74; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 74; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 75; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 76; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 76; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 77; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 77; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 78; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.fake_example_inputs())
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 1055, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/__init__.py", line 1390, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 455, in compile_fx
    return aot_autograd(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 48, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2805, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2498, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1713, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2133, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 430, in fw_compiler
    return inner_compile(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 595, in debug_wrapper
    compiled_fn = compiler_fn(gm, example_inputs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/debug.py", line 239, in inner
    return fn(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 177, in compile_fx_inner
    compiled_fn = graph.compile_to_fn()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/graph.py", line 586, in compile_to_fn
    return self.compile_to_module().call
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/graph.py", line 575, in compile_to_module
    mod = PyCodeCache.load(code)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 528, in load
    exec(code, mod.__dict__, mod.__dict__)
  File "/tmp/torchinductor_lk/b6/cb6stltbl4wnxngvc7tc77igbp6oojp2sqjc4gmrs5kkanhkqhdp.py", line 41, in <module>
    async_compile.wait(globals())
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 715, in wait
    scope[key] = result.result()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 573, in result
    self.future.result()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
RuntimeError: Internal Triton PTX codegen error: 
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 60; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 60; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 61; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 61; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 62; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 62; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 63; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 63; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 64; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 64; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 65; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 65; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 66; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 72; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 73; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 73; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 74; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 74; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 75; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 76; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 76; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 77; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 77; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 78; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/train/deepspeed_graph_fabric.py", line 209, in <module>
    CLI(main)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/jsonargparse/cli.py", line 85, in CLI
    return _run_component(component, cfg_init)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/jsonargparse/cli.py", line 147, in _run_component
    return component(**cfg)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/train/deepspeed_graph_fabric.py", line 110, in main
    train(fabric, model, optimizer, train_data_loader, train_data_sampler)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/train/deepspeed_graph_fabric.py", line 143, in train
    train_step(batch, fabric, model, optimizer, step_count)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/train/deepspeed_graph_fabric.py", line 188, in train_step
    logits = model(x)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 114, in forward
    output = self._forward_module(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1724, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
    return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/graph_weather/models/forecast.py", line 109, in forward
    x, edge_idx, edge_attr = self.encoder(features)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/graph_weather/models/layers/encoder.py", line 164, in forward
    self.graph = self.graph.to(features.device)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/graph_weather/models/layers/encoder.py", line 165, in <graph break in forward>
    self.latent_graph = self.latent_graph.to(features.device)
  File "/media/lk/lksgcc/lk_git/21_RenewablePower/WeatherForecast/graph_weather/graph_weather/models/layers/encoder.py", line 167, in <graph break in forward>
    [features, einops.repeat(self.h3_nodes, "n f -> b n f", b=batch_size)], dim=1
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/einops-0.6.1-py3.10.egg/einops/einops.py", line 533, in repeat
    return reduce(tensor, pattern, reduction='repeat', **axes_lengths)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/einops-0.6.1-py3.10.egg/einops/einops.py", line 412, in reduce
    return _apply_recipe(recipe, tensor, reduction_type=reduction)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/einops-0.6.1-py3.10.egg/einops/einops.py", line 235, in _apply_recipe
    _reconstruct_from_shape(recipe, backend.shape(tensor))
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
    return callback(frame, cache_size, hooks)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
    result = inner_convert(frame, cache_size, hooks)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
    return fn(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
    return _compile(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
    out_code = transform_code_object(code, transform)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
    transformations(instructions, code_options)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
    tracer.run()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
    super().run()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
    getattr(self, inst.opname)(inst)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 517, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised RuntimeError: Internal Triton PTX codegen error: 
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 60; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 60; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 61; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 61; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 62; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 62; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 63; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 63; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 64; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 64; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 65; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 65; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 66; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 66; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 72; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 72; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 73; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 73; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 74; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 74; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 75; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 75; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 76; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 76; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 77; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 77; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 78; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors


Set torch._dynamo.config.verbose=True for more information

However, on another computer with the same hardware, there is no such problem, and the following step 1/3
[1/3] /usr/local/cuda-11.7/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.7/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/TH -isystem /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /home/lk/anaconda3/envs/weather/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/lk/anaconda3/envs/weather/lib/python3.10/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
doesn't appear.
I have reinstalled the CUDA toolkit, but that didn't help.
Does anyone know how to solve this? Thanks.
Ubuntu 18.04
Python 3.10
GPU RTX 2080 Ti
CUDA 11.7
NVCC 2.14.3
lightning 2.0.2
deepspeed 0.9.2
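
The long ptxas output above repeats the same few complaints for many PTX lines. A minimal sketch, assuming the log lines keep the exact shape shown in this report, that collapses them into one entry per feature and required target. The helper name `summarize_ptxas_errors` and the shortened `sample_log` are illustrative only and not part of Lightning, DeepSpeed, or Triton.

```python
import re
from collections import Counter

# Matches lines such as:
#   ptxas /tmp/x, line 59; error   : Feature '.bf16' requires .target sm_80 or higher
PTXAS_ERROR = re.compile(
    r"line (?P<line>\d+); error\s*: Feature '(?P<feature>[^']+)' "
    r"requires \.target (?P<target>sm_\d+) or higher"
)


def summarize_ptxas_errors(log_text):
    """Count how often each (feature, required target) pair occurs in a log."""
    counts = Counter()
    for raw_line in log_text.splitlines():
        match = PTXAS_ERROR.search(raw_line)
        if match:
            counts[(match["feature"], match["target"])] += 1
    return counts


# A shortened excerpt of the log from this report (hypothetical selection of lines).
sample_log = """\
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 59; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-9e79cc, line 71; error   : Feature 'cvt with .bf16' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors
"""

summary = summarize_ptxas_errors(sample_log)
for (feature, target), count in sorted(summary.items()):
    print(f"{feature!r} needs {target}: {count} occurrence(s)")
```

Every entry names the same target, sm_80, which is higher than the `compute_75` / `sm_75` architecture that appears in the nvcc command for this machine.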

What version are you seeing the problem on?

v2.0


lk1983823 added the bug and needs triage labels on Jul 13, 2023

sid-kap commented Dec 23, 2023

@lk1983823 What was the solution? I'm getting the same error when using torch.compile with Lightning on a model that uses bfloat16:

RuntimeError: Internal Triton PTX codegen error:
ptxas /var/tmp/compile-ptx-src-ce8fce, line 77; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 77; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 78; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 78; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 79; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 79; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 80; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 80; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 81; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 81; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 82; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 82; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 83; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 83; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 84; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 84; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 114; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 114; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 115; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 115; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 116; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 116; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 117; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 117; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 118; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 118; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 119; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 119; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 120; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 120; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 121; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /var/tmp/compile-ptx-src-ce8fce, line 121; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher

I'm running this on a T4. According to https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ sm_80 corresponds to the Ampere generation (A100), while the T4 is a Turing card (sm_75). So maybe bfloat16 doesn't work on T4s?

(I'm confused because the same model works in eager mode on the T4, but adding model = torch.compile(model) breaks it.)
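
The errors say that the PTX generated for the compiled model uses bf16 conversion instructions that need sm_80, while a T4 reports compute capability 7.5. A minimal sketch of a guard that picks a precision setting from the compute capability before compiling. The function names are hypothetical, the `"bf16-mixed"` / `"16-mixed"` strings assume Lightning 2.x precision names, and it is an assumption that fp16 mixed precision is an acceptable substitute for the model in question.

```python
# sm_80 is the minimum target named in the ptxas errors.
MIN_BF16_CAPABILITY = (8, 0)


def bf16_kernels_supported(capability):
    """True if a (major, minor) compute capability is at least sm_80."""
    return tuple(capability) >= MIN_BF16_CAPABILITY


def choose_precision(capability):
    """Pick a Lightning-style precision string (assumed names) for a GPU."""
    return "bf16-mixed" if bf16_kernels_supported(capability) else "16-mixed"


def current_capability():
    """Query the active GPU; returns None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


# Compute capabilities of the cards mentioned in this thread, plus an A100.
for name, capability in [("V100", (7, 0)), ("T4", (7, 5)),
                         ("RTX 2080 Ti", (7, 5)), ("A100", (8, 0))]:
    print(f"{name}: {choose_precision(capability)}")

detected = current_capability()
if detected is not None:
    print(f"this machine: {choose_precision(detected)}")
```

The returned string could then be passed as the `precision` argument of the Lightning `Trainer`. Whether to skip `torch.compile` entirely on pre-sm_80 cards instead is a separate choice that this sketch does not make.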

@jingyeyang95

I also hit the same error when running my model with the Triton package, in my case on a V100. Are there any solutions so far? Thank you for the help.
