VMamba Environment Setup and Troubleshooting Guide Created: April 2026 #397

@jiangkaiqi2005

VMamba Environment Setup and Troubleshooting Manual

Below is a reusable VMamba environment setup and troubleshooting manual, compiled from the actual issues encountered during this setup.
Every problem follows the same structure:

Symptom → Cause → Solution → How to Quickly Identify in the Future

You can refer directly to this guide if you encounter similar problems again.


1. Final Stable Environment

The stable combination that ultimately worked this time is:

Python 3.10
PyTorch 2.1.0
CUDA 11.8
gcc/g++ 11
mmengine 0.10.1
mmcv 2.1.0
mmsegmentation 1.2.2
mmdet 3.3.0
mmpretrain 1.2.0
numpy 1.26.4
opencv-python-headless 4.10.0.84

This combination is stable because:

  • PyTorch officially supports 2.1.0 + CUDA 11.8.
  • MMCV installation depends on wheels that precisely match the current PyTorch/CUDA version; using versions that are too new often falls back to source builds.
  • NumPy 2.x is incompatible with many packages compiled against the NumPy 1.x ABI. Downgrading to numpy<2 is the standard fix.
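
To confirm that an environment actually matches the list above, a small check along the following lines may help. It is only a sketch: the distribution names are the usual PyPI names (an assumption for packages installed through conda), and the helper names find_mismatches and matches_pin are made up for illustration.

```python
from importlib import metadata

# Pins copied from the list above. The keys are PyPI distribution names,
# which is an assumption for anything installed through conda.
EXPECTED = {
    "torch": "2.1.0",
    "mmengine": "0.10.1",
    "mmcv": "2.1.0",
    "mmsegmentation": "1.2.2",
    "mmdet": "3.3.0",
    "mmpretrain": "1.2.0",
    "numpy": "1.26.4",
    "opencv-python-headless": "4.10.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(found, wanted):
    """Exact match, or the pin followed by a further component or local tag
    (for example "2.1.0+cu118" or "4.10.0.84")."""
    return found == wanted or found.startswith((wanted + ".", wanted + "+"))


def find_mismatches(expected, lookup=installed_version):
    """Map each package that is missing or off-pin to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found is None or not matches_pin(found, wanted):
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling find_mismatches(EXPECTED) in the target environment returns an empty dict when every pin is satisfied.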

2. Problem 1: pip install . Reports No module named 'torch'

Symptom

When executing in the kernels/selective_scan directory:

pip install .

The error occurs:

ModuleNotFoundError: No module named 'torch'

This appeared clearly in your logs.

Cause

pip install . enables build isolation by default.
This temporary build environment does not have torch from your current conda environment, but selective_scan needs to import torch during the build process, resulting in an immediate error.

Solution

Use instead:

pip install . --no-build-isolation

How to Quickly Identify in the Future

If both of the following are true:

  • python -c "import torch" succeeds in the current environment
  • But pip install . fails with "No module named 'torch'"

It's almost certainly caused by build isolation.
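
The two conditions above can be written down as a tiny check. This is a minimal sketch; the function names are hypothetical, and the pip output is assumed to be available as a string.

```python
import importlib.util


def torch_visible_here():
    """True if torch can be found by the interpreter running this script.
    find_spec only locates the package; it does not import it."""
    return importlib.util.find_spec("torch") is not None


def looks_like_build_isolation(torch_visible, pip_output):
    """Apply both conditions: torch is importable in the current environment,
    yet the pip build reports it as missing."""
    torch_missing_in_build = "No module named 'torch'" in pip_output
    return torch_visible and torch_missing_in_build
```

If looks_like_build_isolation(torch_visible_here(), log_text) is True, rerunning with --no-build-isolation is the first thing to try.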


3. Problem 2: CUDA Compilation Reports unsupported GNU version

Symptom

When compiling selective_scan, nvcc reports:

#error -- unsupported GNU version! gcc versions later than 11 are not supported!

This appeared clearly in the logs.

Cause

You are using CUDA 11.8.
CUDA 11.8 has restrictions on the GCC version; nvcc will refuse to compile if GCC is too new.

This is not an issue with VMamba code or Python packages, but a CUDA toolchain compatibility issue.

Solution

Install and switch to gcc-11 / g++-11:

sudo apt update
sudo apt install gcc-11 g++-11 -y

export CC=gcc-11
export CXX=g++-11

Then recompile:

cd ~/projects/VMamba-main/kernels/selective_scan
pip install . --no-build-isolation

This is exactly how you succeeded later.

How to Quickly Identify in the Future

Whenever you see:

  • nvcc
  • unsupported GNU version
  • gcc versions later than 11 are not supported

Immediately think: Switch to gcc-11.
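
Before starting a long compile, the host compiler version can be checked up front. The sketch below assumes the compiler answers to -dumpversion (gcc does) and takes the limit of 11 from the nvcc error message quoted above; the function names are illustrative only.

```python
import re
import shutil
import subprocess

MAX_GCC_MAJOR = 11  # limit quoted in the nvcc error for CUDA 11.8


def parse_gcc_major(dumpversion_output):
    """Extract the major version from `gcc -dumpversion` output such as
    "11" or "13.2.0"; return None if no leading number is present."""
    match = re.match(r"\s*(\d+)", dumpversion_output)
    return int(match.group(1)) if match else None


def host_compiler_major(compiler="gcc"):
    """Ask the compiler for its version; None if it is not on PATH."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        [compiler, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_gcc_major(result.stdout)


def accepted_by_nvcc(major, limit=MAX_GCC_MAJOR):
    """True when the major version is known and not newer than the limit."""
    return major is not None and major <= limit
```

Passing the value of the CC environment variable (for example "gcc-11") to host_compiler_major checks the compiler the build will actually use rather than the system default.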


4. Problem 3: Compilation Succeeds but Import Fails with libc10.so Not Found

Symptom

selective_scan installed successfully, but import fails with:

ImportError: libc10.so: cannot open shared object file

This also appeared in your logs.

Cause

This is a runtime dynamic linking failure, not a compilation failure.

selective_scan_cuda_oflex.so depends on PyTorch shared libraries during import, such as:

  • libc10.so
  • libtorch_cpu.so

These libraries are typically located at:

.../site-packages/torch/lib

And not necessarily in:

$CONDA_PREFIX/lib

Therefore, adding only $CONDA_PREFIX/lib is insufficient.

Solution

First locate the actual torch lib directory:

python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))"

Then add it to LD_LIBRARY_PATH:

export TORCH_LIB=$(python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))")
export LD_LIBRARY_PATH="$TORCH_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Test again:

python -c "import torch; import selective_scan_cuda_oflex; print('OK')"

This later printed OK successfully.

How to Quickly Identify in the Future

If:

  • The extension .so compiled successfully
  • But import fails with libc10.so / libtorch_cpu.so not found

First check the torch/lib path, rather than reinstalling torch.
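
The torch/lib check can also be done from Python without importing torch (useful when the import itself is what fails). This is a sketch under the assumption that the libraries live in a lib directory next to torch/__init__.py, as described above; the helper names are hypothetical.

```python
import importlib.util
import os


def torch_lib_dir():
    """Locate torch/lib without importing torch; None if torch is absent."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), "lib")


def on_library_path(lib_dir, ld_library_path):
    """True if lib_dir is already one of the LD_LIBRARY_PATH entries."""
    wanted = os.path.normpath(lib_dir)
    entries = [e for e in ld_library_path.split(os.pathsep) if e]
    return any(os.path.normpath(e) == wanted for e in entries)


def missing_libs(lib_dir, names=("libc10.so", "libtorch_cpu.so")):
    """Return the expected shared libraries that are not present in lib_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(lib_dir, n))]
```

If missing_libs returns an empty list but on_library_path is False for os.environ.get("LD_LIBRARY_PATH", ""), the libraries exist and only the search path needs fixing.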


5. Problem 4: import torch Fails with _OutOfMemoryError Attribute Error

Symptom

When verifying torch in a new environment:

AttributeError: module 'torch._C' has no attribute '_OutOfMemoryError'

Cause

This indicates that the PyTorch installation itself is corrupted, or that its Python layer and its underlying binary layer come from mismatched versions.

The problem is not in mmengine/mmcv; torch itself is broken.

Solution

Do not continue installing mm packages.
Rebuild the environment directly and reinstall PyTorch using an officially supported version combination.

One stable combination officially provided by PyTorch is:

conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia

This command is listed on PyTorch's official previous-versions page.

How to Quickly Identify in the Future

If even:

python -c "import torch"

fails, and the error occurs near torch._C, fix torch first before installing anything else.


6. Problem 5: libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

Symptom

When verifying torch again:

ImportError: .../libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

Cause

This is a typical version conflict between PyTorch and MKL / intel-openmp.
Issues on the PyTorch tracker report that newer MKL releases (2024.1 and later) trigger this missing-symbol error.

Solution

Downgrade MKL and intel-openmp in the current conda environment:

conda install "mkl<2024.1" "intel-openmp<2024.1" -c conda-forge -y

Then re-verify:

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"

You later successfully obtained:

2.1.0
True
11.8

How to Quickly Identify in the Future

If you see:

undefined symbol: iJIT_NotifyEvent

First check:

  • mkl
  • intel-openmp

Rather than reinstalling CUDA right away.
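
Whether the installed mkl and intel-openmp versions fall under the "<2024.1" pin can be checked with a short comparison. This is a sketch only: the versions are assumed to be read off conda list by hand or by another script, and the function names are made up.

```python
import re

THRESHOLD = (2024, 1)  # from the pins "mkl<2024.1" and "intel-openmp<2024.1"


def version_tuple(version):
    """Turn a version string such as "2024.0.0" into (2024, 0, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def satisfies_pin(version, threshold=THRESHOLD):
    """True if the version parses and is strictly below the threshold."""
    parsed = version_tuple(version)
    return bool(parsed) and parsed[: len(threshold)] < threshold


def packages_to_downgrade(installed):
    """Given {name: version}, list the packages that violate the pin."""
    return sorted(n for n, v in installed.items() if not satisfies_pin(v))
```

For example, packages_to_downgrade({"mkl": "2024.2.1", "intel-openmp": "2023.1.0"}) names only mkl.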


7. Problem 6: mim install mmcv Falls Back to Source Build

Symptom

When you installed mmcv, mim did not fetch a wheel but downloaded:

mmcv-2.2.0.tar.gz

And then started a source build, reporting various build issues.

Cause

The official MMCV installation mechanism matches wheels based on:

  • CUDA version
  • PyTorch version
  • MMCV version

If the current combination is too new and no matching wheel exists, it falls back to a source build. Source builds are more prone to failure.

Solution

Avoid compiling from source; instead, switch to a more stable combination, such as:

torch 2.1.0 + cu118
mmcv 2.1.0

This is also the combination that eventually worked for you.

How to Quickly Identify in the Future

If mim install mmcv downloads a .tar.gz instead of a .whl, be cautious.
This usually indicates that the current version combination is not ideal.
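
The .tar.gz versus .whl check can be automated over captured installer output. The sketch below assumes the artifact file name appears somewhere in the output text; the exact log format of mim/pip is not relied on, and the function names are hypothetical.

```python
import re


def downloaded_artifacts(output, package="mmcv"):
    """Collect file names like "mmcv-<version>....whl" or "....tar.gz"
    that appear anywhere in the captured output."""
    pattern = re.compile(rf"\b({re.escape(package)}-\S+?\.(?:whl|tar\.gz))")
    return pattern.findall(output)


def artifact_kind(filename):
    """Classify an artifact by extension: prebuilt wheel or source archive."""
    if filename.endswith(".whl"):
        return "wheel"
    if filename.endswith(".tar.gz"):
        return "sdist"
    return "unknown"


def fell_back_to_source(output, package="mmcv"):
    """True if any artifact for the package is a source archive."""
    return any(
        artifact_kind(name) == "sdist"
        for name in downloaded_artifacts(output, package)
    )
```

A True result is the cue to stop the build and pick a PyTorch/CUDA/MMCV combination that has a prebuilt wheel.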


8. Problem 7: NumPy 2.x ABI Incompatibility with Current Binary Packages

Symptom

When importing mmcv or torch, a message similar to the following appears:

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.x

The mm package versions still printed successfully despite this message.

Cause

Many current binary packages are still compiled against the NumPy 1.x ABI.
If the environment has NumPy 2.x, compatibility issues may arise. This is a known issue in the PyTorch community.

Solution

Pin NumPy to 1.26.4:

python -m pip install numpy==1.26.4

How to Quickly Identify in the Future

Whenever you see:

  • compiled using NumPy 1.x
  • And the current numpy.__version__ is 2.x

Prioritize downgrading to:

numpy==1.26.4

9. Problem 8: OpenCV in Turn Requires numpy>=2

Symptom

After downgrading NumPy to 1.26.4, you saw:

opencv-python 4.13.0.92 requires numpy>=2
opencv-python-headless 4.13.0.92 requires numpy>=2

Cause

The current OpenCV version is too new and requires NumPy 2.x.
This conflicts with the newly downgraded numpy 1.26.4.

Solution

Uninstall conflicting versions, keep only headless, and pin to a compatible version:

python -m pip uninstall -y opencv-python opencv-python-headless
python -m pip install --no-cache-dir opencv-python-headless==4.10.0.84

You later verified successfully:

numpy 1.26.4
cv2 4.10.0

How to Quickly Identify in the Future

In server/WSL environments, prioritize installing only:

opencv-python-headless

Do not install both:

  • opencv-python
  • opencv-python-headless
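
Whether both OpenCV variants ended up installed can be detected from package metadata. A minimal sketch, assuming both were installed with pip under their usual distribution names; the helper names are illustrative.

```python
from importlib import metadata

OPENCV_DISTS = ("opencv-python", "opencv-python-headless")


def installed_versions(names, lookup=metadata.version):
    """Return {name: version} for the distributions that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, leave it out of the result
    return found


def opencv_problems(opencv_found):
    """Describe deviations from the headless-only recommendation above."""
    problems = []
    if len(opencv_found) > 1:
        problems.append("both OpenCV variants installed")
    elif "opencv-python" in opencv_found:
        problems.append("opencv-python installed instead of opencv-python-headless")
    return problems
```

Running opencv_problems(installed_versions(OPENCV_DISTS)) returns an empty list when only the headless package is present.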

10. Final Correct Installation Order

For future VMamba environment setup from scratch, the recommended order is as follows.

1) Create Environment

conda create -n vmamba_seg python=3.10 -y
conda activate vmamba_seg

2) Install PyTorch

conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia -y
conda install "mkl<2024.1" "intel-openmp<2024.1" -c conda-forge -y

3) Verify torch

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"

4) Install OpenMMLab Dependencies

python -m pip install --upgrade pip
python -m pip install setuptools==81.0.0 wheel
python -m pip install -U openmim

mim install mmengine==0.10.1
mim install mmcv==2.1.0
python -m pip install mmsegmentation==1.2.2 mmdet==3.3.0 mmpretrain==1.2.0

5) Fix NumPy/OpenCV

python -m pip install numpy==1.26.4
python -m pip uninstall -y opencv-python opencv-python-headless
python -m pip install --no-cache-dir opencv-python-headless==4.10.0.84

6) Verify mm Packages

python -c "import mmengine, mmcv, mmseg, mmdet, mmpretrain; print(mmengine.__version__, mmcv.__version__, mmseg.__version__, mmdet.__version__, mmpretrain.__version__)"

7) Compile selective_scan

export CC=gcc-11
export CXX=g++-11

cd ~/projects/VMamba-main/kernels/selective_scan
pip install . --no-build-isolation

8) Verify selective_scan

python -c "import torch; import selective_scan_cuda_oflex; print('selective_scan ok')"

If this step reports that libc10.so cannot be found, add the torch lib directory to LD_LIBRARY_PATH and retry:

export TORCH_LIB=$(python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))")
export LD_LIBRARY_PATH="$TORCH_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
python -c "import torch; import selective_scan_cuda_oflex; print('selective_scan ok')"

11. General Troubleshooting Order for Future Issues

No matter what environment problem you encounter in the future, diagnose it in these three layers first:

Layer 1: PyTorch Core Layer

First check:

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.version.cuda)"

If this step fails, do not proceed further.

Layer 2: Binary Compatibility Layer

Check:

  • Whether NumPy is 2.x
  • Whether OpenCV requires numpy>=2
  • Whether libc10.so is missing

Layer 3: Custom Extension Layer

Check:

  • Whether --no-build-isolation was used
  • Whether gcc is version 11
  • Whether selective_scan can actually be imported
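
The layer-by-layer order, including the rule of not proceeding once a layer fails, could be scripted roughly as follows. This is a sketch with made-up names (run_layers, can_import, LAYERS), and the example checks only test importability, which covers part of each layer.

```python
import importlib


def can_import(module_name):
    """Build a check that succeeds if the module imports without error."""
    def check():
        importlib.import_module(module_name)
        return True
    return check


def run_layers(layers):
    """Run checks layer by layer and stop at the first layer with a failure.

    `layers` is a list of (layer_name, checks) pairs; `checks` is a list of
    (description, callable) pairs whose callables return True on success.
    Returns (failed_layer_name, failed_descriptions), or (None, []) if all pass.
    """
    for layer_name, checks in layers:
        failed = []
        for description, check in checks:
            try:
                ok = bool(check())
            except Exception:  # any error raised by a check counts as failure
                ok = False
            if not ok:
                failed.append(description)
        if failed:
            return layer_name, failed
    return None, []


# Example layout following the three layers above (import checks only).
LAYERS = [
    ("PyTorch core", [("import torch", can_import("torch"))]),
    ("Binary compatibility", [
        ("import numpy", can_import("numpy")),
        ("import cv2", can_import("cv2")),
    ]),
    ("Custom extension", [
        ("import selective_scan_cuda_oflex", can_import("selective_scan_cuda_oflex")),
    ]),
]
```

run_layers(LAYERS) reports the first failing layer, so a broken torch is never mistaken for an extension problem.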

12. One-Sentence Summary

All the issues you encountered this time boil down to three categories:

  1. Build Issues
    e.g., missing --no-build-isolation, wrong gcc version

  2. Dynamic Linking Issues
    e.g., libc10.so not found

  3. Version Compatibility Issues
    e.g., conflicts between PyTorch + MKL, NumPy 2.x, OpenCV, and MMCV combinations

As long as you categorize problems into these three types in the future, you won't get lost.
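
The three categories can be turned into a simple lookup over error text. The sketch below uses only message fragments quoted earlier in this guide; the table and the classify function are illustrative and far from exhaustive.

```python
# (fragment of the error message, category) pairs, taken from the
# messages quoted in the problems above.
SIGNATURES = [
    ("No module named 'torch'", "build"),
    ("unsupported GNU version", "build"),
    ("cannot open shared object file", "dynamic linking"),
    ("undefined symbol: iJIT_NotifyEvent", "version compatibility"),
    ("compiled using NumPy 1.x", "version compatibility"),
    ("requires numpy>=2", "version compatibility"),
]


def classify(message):
    """Return the category of the first known fragment found in the message,
    or "unknown" when none matches."""
    for fragment, category in SIGNATURES:
        if fragment in message:
            return category
    return "unknown"
```

For instance, classify("ImportError: libc10.so: cannot open shared object file") yields "dynamic linking", which points at the torch/lib path rather than at a reinstall.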
