VMamba Environment Setup and Troubleshooting Manual
Below is the VMamba Environment Setup and Troubleshooting Manual (Directly Reusable Version) that I have organized for you based on the actual issues encountered this time.
The structure is consistent:
Symptom → Cause → Solution → How to Quickly Identify in the Future
You can refer directly to this guide if you encounter similar problems again.
1. Final Stable Environment
The stable combination that ultimately worked this time is:
Python 3.10
PyTorch 2.1.0
CUDA 11.8
gcc/g++ 11
mmengine 0.10.1
mmcv 2.1.0
mmsegmentation 1.2.2
mmdet 3.3.0
mmpretrain 1.2.0
numpy 1.26.4
opencv-python-headless 4.10.0
This combination is stable because:
- PyTorch officially supports 2.1.0 + CUDA 11.8.
- MMCV installation depends on wheels that precisely match the current PyTorch/CUDA version; using versions that are too new often falls back to source builds.
- NumPy 2.x is incompatible with many packages compiled against the NumPy 1.x ABI. Downgrading to numpy<2 is the standard fix.
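To confirm that an environment still matches the pinned combination, a small script can compare installed package metadata against the list above. This is a minimal sketch, not part of VMamba: the helper names (installed_versions, find_mismatches) are made up for illustration, and it assumes the pip/conda distribution names match the names used here.

```python
from importlib import metadata

# Pins copied from the stable environment listed above.
EXPECTED = {
    "torch": "2.1.0",
    "mmengine": "0.10.1",
    "mmcv": "2.1.0",
    "mmsegmentation": "1.2.2",
    "mmdet": "3.3.0",
    "mmpretrain": "1.2.0",
    "numpy": "1.26.4",
}


def installed_versions(names):
    """Return {name: version or None} from package metadata (nothing is imported)."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """List (name, expected, actual) for every pin that is missing or different.

    A local build tag such as '+cu118' is ignored when comparing.
    """
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        base = have.split("+")[0] if have else None
        if base != want:
            problems.append((name, want, have))
    return problems


if __name__ == "__main__":
    for name, want, have in find_mismatches(EXPECTED, installed_versions(EXPECTED)):
        print(f"{name}: expected {want}, found {have}")
```

If the script prints nothing, every pinned package is present at the expected version.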
2. Problem 1: pip install . Reports No module named 'torch'
Symptom
When executing in the kernels/selective_scan directory:
pip install .
The error occurs:
ModuleNotFoundError: No module named 'torch'
This appeared clearly in your logs.
Cause
pip install . enables build isolation by default.
This temporary build environment does not have torch from your current conda environment, but selective_scan needs to import torch during the build process, resulting in an immediate error.
Solution
Use instead:
pip install . --no-build-isolation
How to Quickly Identify in the Future
If both of the following are true:
- python -c "import torch" succeeds in the current environment
- But pip install . fails with "No module named 'torch'"
It's almost certainly caused by build isolation.
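The two-condition rule above can be written down as a tiny check. This is only a sketch: the function names are hypothetical, and pip_output is assumed to be the text you copied from the failed pip install . run.

```python
import importlib.util


def torch_visible():
    """True if 'torch' can be located in the current environment (without importing it)."""
    return importlib.util.find_spec("torch") is not None


def looks_like_build_isolation(torch_is_visible, pip_output):
    """Both conditions hold: torch is installed here, yet the build could not see it."""
    return torch_is_visible and "No module named 'torch'" in pip_output


def suggested_command(torch_is_visible, pip_output):
    """Return the pip command to try next, as an argument list."""
    cmd = ["pip", "install", "."]
    if looks_like_build_isolation(torch_is_visible, pip_output):
        cmd.append("--no-build-isolation")
    return cmd
```

For example, suggested_command(torch_visible(), log_text) returns the command with --no-build-isolation only when both conditions are met; if torch is not installed at all, the plain command is returned because the real fix is to install torch first.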
3. Problem 2: CUDA Compilation Reports unsupported GNU version
Symptom
When compiling selective_scan, nvcc reports:
#error -- unsupported GNU version! gcc versions later than 11 are not supported!
This appeared clearly in the logs.
Cause
You are using CUDA 11.8.
CUDA 11.8 restricts which host GCC versions it accepts: as the error message states, versions later than 11 are not supported, so nvcc refuses to compile.
This is not an issue with VMamba code or Python packages, but a CUDA toolchain compatibility issue.
Solution
Install and switch to gcc-11 / g++-11:
sudo apt update
sudo apt install gcc-11 g++-11 -y
export CC=gcc-11
export CXX=g++-11
Then recompile:
cd ~/projects/VMamba-main/kernels/selective_scan
pip install . --no-build-isolation
This is exactly how you succeeded later.
How to Quickly Identify in the Future
Whenever you see:
- nvcc
- unsupported GNU version
- gcc versions later than 11 are not supported
Immediately think: Switch to gcc-11.
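Before starting a long build, you can check the host compiler version up front. The sketch below is an illustration with hypothetical helper names; it relies on the standard gcc -dumpversion flag and takes the limit of 11 from the nvcc error message quoted above.

```python
import os
import re
import subprocess

MAX_GCC_MAJOR = 11  # limit quoted in the nvcc error for CUDA 11.8


def parse_major(dumpversion_output):
    """Extract the major version from `gcc -dumpversion` output such as '11' or '11.4.0'."""
    match = re.match(r"\s*(\d+)", dumpversion_output)
    return int(match.group(1)) if match else None


def gcc_major(compiler="gcc"):
    """Ask the compiler for its version; return None if it cannot be run."""
    try:
        result = subprocess.run(
            [compiler, "-dumpversion"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_major(result.stdout)


def acceptable_for_nvcc(major, limit=MAX_GCC_MAJOR):
    """True when a compiler was found and its major version is within the limit."""
    return major is not None and major <= limit


if __name__ == "__main__":
    compiler = os.environ.get("CC", "gcc")  # honor the CC override used above
    major = gcc_major(compiler)
    print(compiler, major, "ok" if acceptable_for_nvcc(major) else "switch to gcc-11")
```

Running it after export CC=gcc-11 should report the compiler that the build will actually use.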
4. Problem 3: Compilation Succeeds but Import Fails with libc10.so Not Found
Symptom
selective_scan installed successfully, but import fails with:
ImportError: libc10.so: cannot open shared object file
This also appeared in your logs.
Cause
This is a runtime dynamic-linking failure, not a compilation failure.
selective_scan_cuda_oflex.so depends on PyTorch shared libraries during import, such as:
- libc10.so
- libtorch_cpu.so
These libraries are typically located at:
.../site-packages/torch/lib
And not necessarily in:
$CONDA_PREFIX/lib
Therefore, adding only $CONDA_PREFIX/lib is insufficient.
Solution
First locate the actual torch lib directory:
python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))"
Then add it to LD_LIBRARY_PATH:
export TORCH_LIB=$(python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))")
export LD_LIBRARY_PATH=$TORCH_LIB:$LD_LIBRARY_PATH
Test again:
python -c "import torch; import selective_scan_cuda_oflex; print('OK')"
This later printed OK for you.
How to Quickly Identify in the Future
If:
- The extension .so compiled successfully
- But import fails with libc10.so / libtorch_cpu.so not found
First check the torch/lib path, rather than reinstalling torch.
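The same lookup can be done from Python without importing torch, which helps when import torch itself is misbehaving. This is a sketch with hypothetical helper names. It prints an export line for you to run in the shell, because the dynamic loader generally reads LD_LIBRARY_PATH when a process starts, so changing it inside an already running Python process would not help.

```python
import importlib.util
import os


def torch_lib_dir():
    """Return <site-packages>/torch/lib, or None if torch is not installed.

    find_spec locates the package without importing it.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.origin:
        return None
    return os.path.join(os.path.dirname(spec.origin), "lib")


def prepend_path(new_dir, existing):
    """Put new_dir first in a colon-separated search path.

    Empty entries are dropped (an empty entry in LD_LIBRARY_PATH is treated
    as the current directory) and new_dir is not listed twice.
    """
    parts = [p for p in (existing or "").split(":") if p and p != new_dir]
    return ":".join([new_dir] + parts)


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        new_value = prepend_path(lib_dir, os.environ.get("LD_LIBRARY_PATH"))
        print(f'export LD_LIBRARY_PATH="{new_value}"')
```

Compared with plain string concatenation, prepend_path avoids a trailing colon when LD_LIBRARY_PATH was previously unset.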
5. Problem 4: import torch Fails with _OutOfMemoryError Attribute Error
Symptom
When verifying torch in a new environment:
AttributeError: module 'torch._C' has no attribute '_OutOfMemoryError'
Cause
This indicates that the PyTorch installation itself is corrupted, or there is a conflict between the Python layer and the underlying binary layer.
It is not an issue with mmengine/mmcv, but rather torch itself is broken.
Solution
Do not continue installing mm packages.
Rebuild the environment directly and reinstall PyTorch using an officially supported version combination.
One stable combination officially provided by PyTorch is:
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
This command is listed on PyTorch's official previous-versions page.
How to Quickly Identify in the Future
If even:
python -c "import torch"
fails, and the error occurs near torch._C, fix torch first before installing anything else.
6. Problem 5: libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
Symptom
When verifying torch again:
ImportError: .../libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
Cause
This is a typical version conflict between PyTorch and MKL / intel-openmp.
Reports on the official PyTorch issue tracker indicate that newer MKL releases can trigger this missing-symbol error.
Solution
Downgrade MKL and intel-openmp in the current conda environment:
conda install "mkl<2024.1" "intel-openmp<2024.1" -c conda-forge -y
Then re-verify:
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"
This check later succeeded for you.
How to Quickly Identify in the Future
If you see:
undefined symbol: iJIT_NotifyEvent
First check:
- mkl
- intel-openmp
Rather than reinstalling CUDA right away.
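Checking those two packages can be automated by parsing the output of conda list --json. The sketch below is illustrative: the function names are hypothetical, the 2024.1 threshold is taken from the downgrade command above, and it assumes each JSON entry carries "name" and "version" fields.

```python
import json


def version_tuple(version):
    """Turn '2024.1.0' into (2024, 1, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_downgrade(version, limit=(2024, 1)):
    """True when the version is at or above the limit used in the fix (<2024.1)."""
    return version_tuple(version) >= limit


def versions_from_conda_json(text):
    """Map package name -> version from the JSON printed by `conda list --json`."""
    return {pkg["name"]: pkg["version"] for pkg in json.loads(text)}


def packages_to_downgrade(versions, names=("mkl", "intel-openmp")):
    """Return the watched packages whose installed version is too new."""
    return [n for n in names if n in versions and needs_downgrade(versions[n])]
```

One way to use it is to save the output of conda list --json to a file, read the file into a string, and pass the result of versions_from_conda_json to packages_to_downgrade.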
7. Problem 6: mim install mmcv Falls Back to Source Build
Symptom
When you installed mmcv, mim did not fetch a prebuilt wheel. It downloaded a source archive (.tar.gz) and then started a source build, which reported various build issues.
Cause
The official MMCV installation mechanism matches wheels based on:
- CUDA version
- PyTorch version
- MMCV version
If the current combination is too new and no matching wheel exists, it falls back to a source build. Source builds are more prone to failure.
Solution
Avoid compiling from source; instead, switch to a more stable combination, such as:
torch 2.1.0 + cu118
mmcv 2.1.0
This is also the combination that eventually worked for you.
How to Quickly Identify in the Future
If mim install mmcv downloads a .tar.gz instead of a .whl, be cautious.
This usually indicates that the current version combination is not ideal.
8. Problem 7: NumPy 2.x ABI Incompatibility with Current Binary Packages
Symptom
When importing mmcv or torch, a message similar to the following appears:
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x
But the mm package versions were printed successfully.
Cause
Many current binary packages are still compiled against the NumPy 1.x ABI.
If the environment has NumPy 2.x, compatibility issues may arise. This is a known issue in the PyTorch community.
Solution
Pin NumPy to 1.26.4:
python -m pip install numpy==1.26.4
How to Quickly Identify in the Future
Whenever you see:
- compiled using NumPy 1.x
- And the current numpy.__version__ is 2.x
Prioritize downgrading to:
numpy==1.26.4
9. Problem 8: OpenCV, Conversely, Requires numpy>=2
Symptom
After downgrading NumPy to 1.26.4, you saw:
opencv-python 4.13.0.92 requires numpy>=2
opencv-python-headless 4.13.0.92 requires numpy>=2
Cause
The current OpenCV version is too new and requires NumPy 2.x.
This conflicts with the newly downgraded numpy 1.26.4.
Solution
Uninstall conflicting versions, keep only headless, and pin to a compatible version:
python -m pip uninstall -y opencv-python opencv-python-headless
python -m pip install --no-cache-dir opencv-python-headless==4.10.0.84
This was later verified to work.
How to Quickly Identify in the Future
In server/WSL environments, prioritize installing only:
opencv-python-headless
Do not install both:
- opencv-python
- opencv-python-headless
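Both NumPy/OpenCV rules (NumPy must stay on 1.x, and only one OpenCV variant should be installed) can be checked from package metadata. This is a sketch with hypothetical function names; it only looks at the two OpenCV distributions named above.

```python
from importlib import metadata

OPENCV_VARIANTS = ("opencv-python", "opencv-python-headless")


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_numpy_opencv(versions):
    """versions maps distribution name -> version string or None; returns warnings."""
    warnings = []
    numpy_version = versions.get("numpy")
    if numpy_version and int(numpy_version.split(".")[0]) >= 2:
        warnings.append(f"numpy {numpy_version} is 2.x; pin numpy==1.26.4")
    present = [name for name in OPENCV_VARIANTS if versions.get(name)]
    if len(present) > 1:
        warnings.append("multiple OpenCV packages installed: " + ", ".join(present))
    return warnings


if __name__ == "__main__":
    names = ("numpy",) + OPENCV_VARIANTS
    for line in check_numpy_opencv({n: installed_version(n) for n in names}):
        print("WARNING:", line)
```

An empty result means neither of the two conflicts described in Problems 7 and 8 is present.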
10. Final Correct Installation Order
For future VMamba environment setup from scratch, the recommended order is as follows.
1) Create Environment
conda create -n vmamba_seg python=3.10 -y
conda activate vmamba_seg
2) Install PyTorch
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia -y
conda install "mkl<2024.1" "intel-openmp<2024.1" -c conda-forge -y
3) Verify torch
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"
4) Install OpenMMLab Dependencies
python -m pip install --upgrade pip
python -m pip install setuptools==81.0.0 wheel
python -m pip install -U openmim
mim install mmengine==0.10.1
mim install mmcv==2.1.0
python -m pip install mmsegmentation==1.2.2 mmdet==3.3.0 mmpretrain==1.2.0
5) Fix NumPy/OpenCV
python -m pip install numpy==1.26.4
python -m pip uninstall -y opencv-python opencv-python-headless
python -m pip install --no-cache-dir opencv-python-headless==4.10.0.84
6) Verify mm Packages
python -c "import mmengine, mmcv, mmseg, mmdet, mmpretrain; print(mmengine.__version__, mmcv.__version__, mmseg.__version__, mmdet.__version__, mmpretrain.__version__)"
7) Compile selective_scan
export CC=gcc-11
export CXX=g++-11
cd ~/projects/VMamba-main/kernels/selective_scan
pip install . --no-build-isolation
8) Verify selective_scan
python -c "import torch; import selective_scan_cuda_oflex; print('selective_scan ok')"
If libc10.so is reported missing here, also run:
export TORCH_LIB=$(python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))")
export LD_LIBRARY_PATH=$TORCH_LIB:$LD_LIBRARY_PATH
python -c "import torch; import selective_scan_cuda_oflex; print('selective_scan ok')"
11. General Troubleshooting Order for Future Issues
No matter what environment problem you encounter in the future, diagnose it in these three layers first:
Layer 1: PyTorch Core Layer
First check:
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"
If this step fails, do not proceed further.
Layer 2: Binary Compatibility Layer
Check:
- Whether NumPy is 2.x
- Whether OpenCV requires numpy>=2
- Whether libc10.so is missing
Layer 3: Custom Extension Layer
Check:
- Whether --no-build-isolation was used
- Whether gcc is version 11
- Whether selective_scan can actually be imported
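The layer-by-layer order can be sketched as a small runner that stops at the first failing layer, mirroring the advice not to proceed once a lower layer fails. The names run_layers, try_import, and LAYERS are hypothetical, and the import checks are a simplification of the manual checks listed above.

```python
import importlib


def try_import(module_name):
    """Return (ok, detail): whether the module imports, plus the error text if not."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # e.g. ImportError, or AttributeError from a broken torch
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


def run_layers(layers):
    """Run (name, check) pairs in order and stop at the first failing layer.

    Each check is a zero-argument callable returning (ok, detail).
    Returns the (name, ok, detail) results gathered up to that point.
    """
    results = []
    for name, check in layers:
        ok, detail = check()
        results.append((name, ok, detail))
        if not ok:
            break
    return results


# torch is imported first, so its shared libraries are already loaded
# by the time the custom extension is tried.
LAYERS = [
    ("Layer 1: torch", lambda: try_import("torch")),
    ("Layer 2: numpy", lambda: try_import("numpy")),
    ("Layer 2: mmcv", lambda: try_import("mmcv")),
    ("Layer 3: selective_scan", lambda: try_import("selective_scan_cuda_oflex")),
]

if __name__ == "__main__":
    for name, ok, detail in run_layers(LAYERS):
        print(f"{name}: {'PASS' if ok else 'FAIL'} ({detail})")
```

The last line printed is either the final layer passing or the first layer that needs fixing.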
12. One-Sentence Summary
All the issues you encountered this time boil down to three categories:
- Build Issues: e.g., missing --no-build-isolation, wrong gcc version
- Dynamic Linking Issues: e.g., libc10.so not found
- Version Compatibility Issues: e.g., conflicts between PyTorch + MKL, NumPy 2.x, OpenCV, and MMCV combinations
As long as you categorize problems into these three types in the future, you won't get lost.