ImportError undefined symbol of fused_layer_norm_cuda #1533

Open
JamesZhutheThird opened this issue Nov 5, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@JamesZhutheThird

Describe the Bug

Minimal Steps/Code to Reproduce the Bug

I've followed the installation instructions in the README, but importing fused_layer_norm_cuda fails with an ImportError. I suspect a version conflict between CUDA, PyTorch, and GCC, but I can't find any documented version requirements. 😵‍💫

gcc --version
python -c "import torch; print(torch.__version__); print(torch.version.cuda); import fused_layer_norm_cuda"
gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

1.11.0+cu113
11.3

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /mnt/lustre/sjtu/home/zcz72/anaconda3/envs/OFA3.9New/lib/python3.9/site-packages/fused_layer_norm_cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEEC1Ev
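The missing symbol _ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEEC1Ev demangles (for example with c++filt) to std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(), the default constructor of a C++ standard library class. The unresolved symbol therefore belongs to libstdc++, not to CUDA. One plausible reading, not confirmed in this thread, is that the extension was compiled against a newer libstdc++ than the one loaded at import time. The sketch below pulls the library path and symbol out of the error message and checks whether a candidate libstdc++ file contains that symbol name. The helper names are hypothetical, and the choice of which libstdc++ to inspect (for instance the copy inside the conda environment versus the system one) is an assumption you would have to make for your own setup.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches "<path>.so[.N]: undefined symbol: <mangled name>" inside an ImportError message.
UNDEFINED_RE = re.compile(r"(?P<lib>\S+\.so[^:\s]*): undefined symbol: (?P<symbol>\S+)")


def parse_undefined_symbol(message: str) -> Optional[Tuple[str, str]]:
    """Return (shared_library_path, mangled_symbol), or None if the message does not match."""
    match = UNDEFINED_RE.search(message)
    if match is None:
        return None
    return match.group("lib"), match.group("symbol")


def library_mentions_symbol(library: Path, symbol: str) -> bool:
    """Heuristic check: do the library's raw bytes contain the symbol name?

    Exported symbol names are stored as plain strings in the dynamic string table,
    so a miss suggests the library does not provide the symbol. A hit is only a
    hint; `nm -D --defined-only <library>` gives a definitive answer.
    """
    return symbol.encode("ascii") in Path(library).read_bytes()
```

In practice you would pass str(exc) from a caught ImportError to parse_undefined_symbol and then test each libstdc++.so.6 the process might load.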

Expected Behavior

Importing fused_layer_norm_cuda should succeed without an ImportError.

Environment

PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)  [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-3.10.0-693.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.3.58
GPU models and configuration: GPU 0: Tesla V100-PCIE-32GB
Nvidia driver version: 460.73.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] pytorch-lightning==1.0.8
[pip3] torch==1.11.0+cu113
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.12.0+cu113
[conda] numpy                     1.23.1           py39hba7629e_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
[conda] pytorch-lightning         1.0.8                    pypi_0    pypi
[conda] torch                     1.11.0+cu113             pypi_0    pypi
[conda] torchmetrics              0.9.3                    pypi_0    pypi
[conda] torchvision               0.12.0+cu113             pypi_0    pypi
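Since the reporter suspects a CUDA/PyTorch/GCC conflict, the environment report above (which appears to come from PyTorch's collect_env script) can be parsed so the relevant versions line up side by side. This is a minimal sketch, assuming the "Key: value" layout shown above; the function names are hypothetical and the choice of fields is an editorial assumption about what matters for building a CUDA extension.

```python
import re


def parse_env_report(text: str) -> dict:
    """Parse 'Key: value' lines from a collect_env-style report into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '[pip3] ...' / '[conda] ...' package lines.
        if not line or line.startswith("["):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value.strip()
    return info


def version_tuple(text: str) -> tuple:
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def summarize(info: dict) -> dict:
    """Collect the versions most relevant to building and loading a CUDA extension."""
    return {
        "torch": version_tuple(info["PyTorch version"]),
        "torch_cuda": version_tuple(info["CUDA used to build PyTorch"]),
        "cuda_runtime": version_tuple(info["CUDA runtime version"]),
        "system_gcc": version_tuple(info["GCC version"]),
        # The compiler that built the Python interpreter appears after '[GCC'.
        "python_built_with_gcc": version_tuple(info["Python version"].split("[GCC", 1)[1]),
    }
```

Applied to this report, the CUDA versions agree (11.3 for both the PyTorch build and the runtime), while the system GCC (9.3.0) differs from the GCC 10.3.0 that built the conda-forge Python. Whether that difference matters here is not established in the thread.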
JamesZhutheThird added the bug label ("Something isn't working") on Nov 5, 2022
@avivbrokman

I'm having this issue too.

@XiaohanZhangCMU

Has anyone found a solution/workaround to this bug?

@avivbrokman

@XiaohanZhangCMU I hit this bug when using a Singularity (now Apptainer) container with CUDA 11.7 and PyTorch built for CUDA 11.7 on my university's cluster, which has CUDA 11.1. I fixed it by building a container with CUDA 11.1 instead, which is obviously not a great solution.
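The workaround above amounts to making the CUDA version PyTorch was built for agree with the CUDA installation actually used on the machine. A minimal pre-flight check along those lines might look like this, assuming the usual nvcc --version output format ("Cuda compilation tools, release X.Y, ...") and that the nvcc on PATH is the toolkit you build against. The function names are hypothetical, and requiring an exact major.minor match is a deliberately strict assumption.

```python
import re


def cuda_release(nvcc_output: str) -> tuple:
    """Extract (major, minor) from `nvcc --version` output ('... release 11.3, V11.3.58')."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def torch_cuda_release(torch_cuda: str) -> tuple:
    """Convert a torch.version.cuda string such as '11.7' into (major, minor)."""
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)


def check_cuda_match(torch_cuda: str, nvcc_output: str) -> None:
    """Raise if PyTorch's CUDA build and the local toolkit report different major.minor versions."""
    built = torch_cuda_release(torch_cuda)
    local = cuda_release(nvcc_output)
    if built != local:
        raise RuntimeError(
            f"PyTorch was built with CUDA {built[0]}.{built[1]}, "
            f"but nvcc reports CUDA {local[0]}.{local[1]}"
        )
```

In practice the inputs would be torch.version.cuda and the stdout of subprocess.run(["nvcc", "--version"], capture_output=True, text=True). With the versions described in this comment (PyTorch for 11.7, cluster toolkit 11.1) the check would raise.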
