Compile error when building extension 'deform_conv' #15

Closed
syfbme opened this issue Jul 7, 2021 · 11 comments

Comments

@syfbme

syfbme commented Jul 7, 2021

I followed the "Installation" instructions but ran into the error below. Details are attached. Any help would be appreciated.
When running the command: CUDA_HOME=/usr/local/cuda-10.2 BASICSR_EXT=True pip install basicsr
the following error appears:
ERROR: Command errored out with exit status 1:
 command: /data/anaconda3/envs/pytorch18/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-uoof9gcv
     cwd: /tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/
Complete output (260 lines):
Traceback (most recent call last):
  File "/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/basicsr/ops/dcn/deform_conv.py", line 10, in <module>
    from . import deform_conv_ext
ImportError: cannot import name 'deform_conv_ext' from partially initialized module 'basicsr.ops.dcn' (most likely due to a circular import) (/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/basicsr/ops/dcn/__init__.py)

The error shows that deform_conv_ext can't be imported. I think this means the deform_conv_ext module was not built successfully, but I don't know how to check that when using pip install. So I tried installing basicsr from a git clone and compiling it, following this link. But when I ran the command:
CUDA_HOME=/usr/local/cuda-10.2 BASICSR_EXT=True python setup.py develop
the build of the 'basicsr.ops.dcn.deform_conv_ext' extension failed.

I have googled it and found no useful information. Here is my environment:
gcc: 7.5.0
cuda: 10.2
torch.__config__.show()
'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n

@xinntao
Member

xinntao commented Jul 7, 2021

I do not know the exact reason.

  1. We do not actually need to use dcn, so you may remove the dcn compile part when you run CUDA_HOME=/usr/local/cuda-10.2 BASICSR_EXT=True python setup.py develop.

Comment out the following part (it was shown in a screenshot that is not preserved here).

  2. You can also use BASICSR_JIT=True:
     1. Uninstall every installed copy of basicsr.
     2. Just run pip install basicsr.
     3. Test with the environment variable BASICSR_JIT=True, which compiles the necessary packages just in time.

@syfbme
Author

syfbme commented Jul 8, 2021

Hi @xinntao
Thanks for your quick reply. I have tried both of the approaches you suggested.
For the first, there are still errors, now about building 'fused_act_ext'...
For the second, the installation is fine since it doesn't build extensions. However, when testing with BASICSR_JIT=True, it shows "No module named deform_env", which I think is because the just-in-time build failed...
I don't understand your remark "As we do not need to use dcn". The script output shows that we need to import deform_conv_ext, which comes from basicsr.ops.dcn...

@tachikoma777

BASICSR_JIT=True python inference_gfpgan_full.py --model_path experiments/pretrained_models/GFPGANv1.pth --test_path inputs/whole_imgs
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/deform_conv.py", line 10, in <module>
    from . import deform_conv_ext
ImportError: cannot import name 'deform_conv_ext' from 'basicsr.ops.dcn' (/opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1673, in _run_ninja_build
    env=env)
  File "/opt/conda/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

I got a similar error...

@syfbme
Author

syfbme commented Jul 8, 2021

Hi @tachikoma777
Which gcc version was your PyTorch built with, and what is your current gcc version?
You can run the command 'torch.__config__.show()' in Python to see the gcc version PyTorch was built with.
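
The comparison asked for here can be scripted. The following is only a minimal sketch: the helper names (parse_built_gcc, parse_system_gcc, same_major) are made up for illustration, and it assumes that `gcc --version` prints the version number at the end of its first line. The thread never confirms that a compiler mismatch caused the build failure, so treat a reported difference as a hint, not a diagnosis.

```python
import re
import subprocess


def parse_built_gcc(config_text):
    """Return the GCC version found in torch.__config__.show() output, or None."""
    match = re.search(r"GCC (\d+(?:\.\d+)*)", config_text)
    return match.group(1) if match else None


def parse_system_gcc(version_text):
    """Return the last dotted version number on the first line of `gcc --version`."""
    lines = version_text.splitlines()
    first_line = lines[0] if lines else ""
    versions = re.findall(r"\d+\.\d+(?:\.\d+)?", first_line)
    return versions[-1] if versions else None


def same_major(version_a, version_b):
    """Compare only the major component, e.g. '7.3' and '7.5.0' both give '7'."""
    return version_a.split(".")[0] == version_b.split(".")[0]


def main():
    try:
        import torch  # imported lazily so the helpers work without PyTorch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    built = parse_built_gcc(torch.__config__.show())
    try:
        output = subprocess.run(
            ["gcc", "--version"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        print("gcc was not found on PATH.")
        return
    system = parse_system_gcc(output)
    print(f"PyTorch built with GCC {built}; system gcc is {system}")
    if built and system and not same_major(built, system):
        print("Major versions differ; this may be worth investigating.")


if __name__ == "__main__":
    main()
```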

@tachikoma777

@syfbme

torch.__config__.show()
'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 10.2\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n - CuDNN 7.6.5\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON,USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n'

Did you fix this?

@syfbme
Author

syfbme commented Jul 8, 2021

Hi @tachikoma777
What is your current gcc version? You can check it with the command "gcc -v".
No, I haven't fixed it... I am still trying...

@xinntao
Member

xinntao commented Jul 8, 2021

  1. Regarding "I don't understand that you mentioned ...": BasicSR will import dcn. But I think that if the dcn compilation fails, the fused_act_ext compilation fails too.
  2. If you set BASICSR_JIT=True, a lot of information is printed to the screen. Could you attach it?
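
One way to collect that output for attaching is to run the script from Python and write everything it prints to a file. This is a rough sketch, not part of BasicSR: run_with_jit and the log file name are hypothetical, and the example arguments are copied from the inference command posted earlier in this thread.

```python
import os
import subprocess
import sys


def run_with_jit(args, log_path):
    """Run `python <args>` with BASICSR_JIT=True; save stdout and stderr to log_path."""
    env = dict(os.environ)
    env["BASICSR_JIT"] = "True"  # same value as used on the command line in this thread
    result = subprocess.run(
        [sys.executable, *args],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so compiler errors land in the log
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log_file:
        log_file.write(result.stdout)
    return result.returncode


# Example call (hypothetical log name):
# run_with_jit(
#     ["inference_gfpgan_full.py",
#      "--model_path", "experiments/pretrained_models/GFPGANv1.pth",
#      "--test_path", "inputs/whole_imgs"],
#     "jit_build.log",
# )
```

Merging stderr into stdout keeps the compiler messages in the same order as the rest of the output, which is usually what a maintainer needs to see.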

@xinntao
Member

xinntao commented Jul 8, 2021

@tachikoma777
Could you show the output of ls /opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/?
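
The same directory listing can be produced without knowing the install path in advance. The sketch below assumes that a successfully built extension shows up as a file whose name starts with deform_conv_ext inside basicsr/ops/dcn; the function names are invented for illustration. It locates the package with importlib.util.find_spec, which does not import a top-level package, so it still works when importing basicsr itself fails.

```python
import importlib.util
from pathlib import Path


def find_ext_files(package_dir, stem="deform_conv_ext"):
    """Return sorted names of files in package_dir whose name starts with stem."""
    directory = Path(package_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.name.startswith(stem))


def dcn_dir():
    """Locate basicsr/ops/dcn without importing basicsr; None if not installed."""
    spec = importlib.util.find_spec("basicsr")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "ops" / "dcn"


if __name__ == "__main__":
    location = dcn_dir()
    if location is None:
        print("basicsr is not installed in this environment.")
    else:
        found = find_ext_files(location)
        print(f"{location}: {found if found else 'no compiled deform_conv_ext file found'}")
```

An empty result would be consistent with the extension never having been built, though it does not say why the build failed.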

@xinntao
Member

xinntao commented Jul 12, 2021

@syfbme Hi,
have you solved this issue?

@syfbme
Author

syfbme commented Jul 12, 2021

Yes, by reinstalling the OS... I lost the original environment, so I can't reproduce the issue. Below is my guess:
I had updated gcc to version 9.3 for some reason. Then I found that PyTorch was built with gcc 7.3.0. I installed gcc through apt, but it could only provide v7.5.0, and there was still a gcc compatibility issue. So I compiled and installed gcc 7.3.0, and the issue still existed. At that point there should not have been a gcc version issue... I don't know why; maybe the gcc installation was messed up. So I reinstalled the whole operating system, and this time there was no issue.
Sorry for not being able to help...

@syfbme
Author

syfbme commented Jul 12, 2021

Closing this since I can't reproduce the issue.

@syfbme syfbme closed this as completed Jul 12, 2021