mac m1: trainer.train: ImportError: incompatible architecture (have 'x86_64', need 'arm64') #145

Closed
yiershi opened this issue Feb 19, 2024 · 2 comments

yiershi commented Feb 19, 2024

System Info

Apple M1 Pro 32 GB
version: 13.6 (22G120)

Reproduction

The error is raised by the following cell in examples/02-basic_training.ipynb:

trainer.train(batch_size=32,
              nbits=4, # How many bits will the trained model use when compressing indexes
              maxsteps=500000, # Maximum steps hard stop
              use_ib_negatives=True, # Use in-batch negative to calculate loss
              dim=128, # How many dimensions per embedding. 128 is the default and works well.
              learning_rate=5e-6, # Learning rate, small values ([3e-6,3e-5] work best if the base model is BERT-like, 5e-6 is often the sweet spot)
              doc_maxlen=256, # Maximum document length. Because of how ColBERT works, smaller chunks (128-256) work very well.
              use_relu=False, # Disable ReLU -- doesn't improve performance
              warmup_steps="auto", # Defaults to 10%
             )

Executing the cell above produces the following error:

[Feb 19, 13:28:20] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Process Process-1:
Traceback (most recent call last):
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
    return_val = callee(config, *args)
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/colbert/training/training.py", line 48, in train
    colbert = ColBERT(name=config.checkpoint, colbert_config=config)
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/colbert/modeling/colbert.py", line 24, in __init__
    ColBERT.try_load_torch_extensions(self.use_gpu)
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/colbert/modeling/colbert.py", line 39, in try_load_torch_extensions
    segmented_maxsim_cpp = load(
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/Users/hao/miniforge3/envs/colbert/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2132, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1173, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: dlopen(/Users/hao/Library/Caches/torch_extensions/py39_cpu/segmented_maxsim_cpp/segmented_maxsim_cpp.so, 0x0002): tried: '/Users/hao/Library/Caches/torch_extensions/py39_cpu/segmented_maxsim_cpp/segmented_maxsim_cpp.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/hao/Library/Caches/torch_extensions/py39_cpu/segmented_maxsim_cpp/segmented_maxsim_cpp.so' (no such file), '/Users/hao/Library/Caches/torch_extensions/py39_cpu/segmented_maxsim_cpp/segmented_maxsim_cpp.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
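
The key detail in the dlopen message is the pair of architectures it reports: the cached segmented_maxsim_cpp.so was built for x86_64, while the running interpreter needs arm64. Below is a minimal sketch for pulling that pair out of such a message and printing the architecture of the current interpreter for comparison. The helper name parse_arch_mismatch is hypothetical and is not part of ColBERT, RAGatouille, or PyTorch; only the standard library is used.

```python
import platform
import re
from typing import Optional, Tuple

# Matches the "(have 'x86_64', need 'arm64')" fragment of a macOS dlopen error.
_ARCH_RE = re.compile(r"have '([^']+)', need '([^']+)'")


def parse_arch_mismatch(message: str) -> Optional[Tuple[str, str]]:
    """Return (have, need) from a dlopen error message, or None if absent.

    Hypothetical helper, for illustration only.
    """
    match = _ARCH_RE.search(message)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    sample = (
        "mach-o file, but is an incompatible architecture "
        "(have 'x86_64', need 'arm64')"
    )
    print("library vs. required:", parse_arch_mismatch(sample))
    # platform.machine() reports the architecture of the running interpreter,
    # e.g. "arm64" for a native Apple Silicon Python.
    print("interpreter architecture:", platform.machine())
```

If the interpreter reports arm64 but the freshly built library is still x86_64, the mismatch is probably introduced at compile time rather than by a stale cache, though that is an inference and not something this thread confirms.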

Solutions Tried But Not Working

Referenced solution: #102

I either renamed /Users/hao/Library/Caches/torch_extensions/py39_cpu/segmented_maxsim_cpp (e.g. to segmented_maxsim_cpp_20240219) or removed the ~/Library/Caches/torch_extensions directory entirely, and then reran trainer.train(). The problem remains in both cases.

# solution1
cd ~/Library/Caches/torch_extensions/py39_cpu/
mv segmented_maxsim_cpp segmented_maxsim_cpp_20240218

# solution2
rm -r ~/Library/Caches/torch_extensions

Python environment

$ python --version
Python 3.9.18

$ pip list
Package               Version
--------------------- -----------
aiohttp               3.9.1
aiosignal             1.3.1
annotated-types       0.6.0
anyio                 4.2.0
appnope               0.1.4
asttokens             2.4.1
async-timeout         4.0.3
attrs                 23.2.0
bitarray              2.9.2
blinker               1.7.0
catalogue             2.0.10
certifi               2024.2.2
charset-normalizer    3.3.2
click                 8.1.7
colbert-ai            0.2.19
comm                  0.2.1
dataclasses-json      0.6.4
datasets              2.17.0
debugpy               1.8.1
decorator             5.1.1
Deprecated            1.2.14
dill                  0.3.8
dirtyjson             1.0.8
distro                1.9.0
exceptiongroup        1.2.0
executing             2.0.1
faiss-cpu             1.7.4
filelock              3.13.1
Flask                 3.0.2
frozenlist            1.4.1
fsspec                2023.10.0
git-python            1.0.3
gitdb                 4.0.11
GitPython             3.1.42
greenlet              3.0.3
h11                   0.14.0
httpcore              1.0.3
httpx                 0.26.0
huggingface-hub       0.20.3
idna                  3.6
importlib-metadata    7.0.1
ipykernel             6.29.2
ipython               8.18.1
itsdangerous          2.1.2
jedi                  0.19.1
Jinja2                3.1.3
joblib                1.3.2
jsonpatch             1.33
jsonpointer           2.4
jupyter_client        8.6.0
jupyter_core          5.7.1
langchain             0.1.7
langchain-community   0.0.20
langchain-core        0.1.23
langsmith             0.0.87
llama-index           0.9.48
MarkupSafe            2.1.5
marshmallow           3.20.2
matplotlib-inline     0.1.6
mpmath                1.3.0
multidict             6.0.5
multiprocess          0.70.16
mypy-extensions       1.0.0
nest-asyncio          1.6.0
networkx              3.2.1
ninja                 1.11.1.1
nltk                  3.8.1
numpy                 1.26.4
onnx                  1.15.0
openai                1.12.0
packaging             23.2
pandas                2.2.0
parso                 0.8.3
pexpect               4.9.0
pillow                10.2.0
pip                   24.0
platformdirs          4.2.0
prompt-toolkit        3.0.43
protobuf              4.25.3
psutil                5.9.8
ptyprocess            0.7.0
pure-eval             0.2.2
pyarrow               15.0.0
pyarrow-hotfix        0.6
pydantic              2.6.1
pydantic_core         2.16.2
Pygments              2.17.2
pyspark               2.4.7
python-dateutil       2.8.2
python-dotenv         1.0.1
pytz                  2024.1
PyYAML                6.0.1
pyzmq                 25.1.2
RAGatouille           0.0.7.post3
regex                 2023.12.25
requests              2.31.0
ruff                  0.1.15
safetensors           0.4.2
scikit-learn          1.4.1.post1
scipy                 1.12.0
sentence-transformers 2.3.1
sentencepiece         0.1.99
setuptools            69.1.0
six                   1.16.0
smmap                 5.0.1
sniffio               1.3.0
SQLAlchemy            2.0.27
srsly                 2.4.8
stack-data            0.6.3
sympy                 1.12
tenacity              8.2.3
threadpoolctl         3.3.0
tiktoken              0.6.0
tokenizers            0.15.2
torch                 2.2.0
tornado               6.4
tqdm                  4.66.2
traitlets             5.14.1
transformers          4.37.2
typing_extensions     4.9.0
typing-inspect        0.9.0
tzdata                2024.1
ujson                 5.9.0
urllib3               2.2.1
voyager               2.0.2
wcwidth               0.2.13
Werkzeug              3.0.1
wheel                 0.42.0
wrapt                 1.16.0
xxhash                3.4.1
yarl                  1.9.4
zipp                  3.17.0

bclavie (Owner) commented Feb 20, 2024

Hi there!

Thank you for reporting this. Training is currently GPU/CUDA-only (this should be documented better) (cc @Anmol6 again), but this isn't quite the issue that I'd expect to pop up!

Do you have any issues loading the custom cpp code when running indexing?
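
Since training is described here as GPU/CUDA-only, a notebook could check for CUDA before calling trainer.train(). The following is only a sketch: cuda_is_available is a hypothetical wrapper, not part of RAGatouille. It relies on torch.cuda.is_available(), which is a real PyTorch call, and returns False when PyTorch is not installed.

```python
import importlib.util


def cuda_is_available() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    if cuda_is_available():
        print("CUDA detected: training can be attempted.")
    else:
        print("No CUDA device detected: training is not expected to work here.")
```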

yiershi (Author) commented Feb 22, 2024

Thanks for the reply. I now have no issues loading the custom cpp code when running indexing.

yiershi closed this as completed Feb 22, 2024