Hi,

Trying for the first time to fine-tune colbert-ir/colbertv2.0 on my Mac, I get this error in the log:
ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}
My code is just the following, mostly copy/pasted from the README (I removed the all_documents arg):
trainer = RAGTrainer(model_name="MyFineTunedColBERT",
                     pretrained_model_name="colbert-ir/colbertv2.0")  # In this example, we run fine-tuning

# This step handles all the data processing, check the examples for more details!
trainer.prepare_training_data(raw_data=pairs,
                              data_out_path="../data/",
                              # all_documents=my_full_corpus
                              )
(pairs is a list of 3581 tuples such as:
('1234yf', "Le 1234yf est un bla bla..."))
Here is the trace:
...
Using config.bsize = 32 (per process) and config.accumsteps = 1
[Jan 28, 14:42:04] #> Loading the queries from ../data/queries.train.colbert.tsv ...
[Jan 28, 14:42:04] #> Got 3581 queries. All QIDs are unique.
[Jan 28, 14:42:04] #> Loading collection...
0M
[Jan 28, 14:42:05] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Process Process-5:
Traceback (most recent call last):
File "/Users/fps/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/Users/fps/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/fps/.pyenv/versions/fps_env/lib/python3.10/site-packages/colbert/infra/launcher.py", line 128, in setup_new_process
return_val = callee(config, *args)
File "/Users/fps/.pyenv/versions/fps_env/lib/python3.10/site-packages/colbert/training/training.py", line 55, in train
colbert = torch.nn.parallel.DistributedDataParallel(colbert, device_ids=[config.rank],
File "/Users/fps/.pyenv/versions/fps_env/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 603, in __init__
self._log_and_throw(
File "/Users/fps/.pyenv/versions/fps_env/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 769, in _log_and_throw
raise err_type(err_msg)
ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}.
This didn't stop the execution, which seems to continue (without producing any new output).
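For what it's worth, the error can be reproduced outside RAGatouille/ColBERT entirely: DistributedDataParallel only accepts `device_ids`/`output_device` when the wrapped module's parameters are on a GPU, and on a Mac (no CUDA) the model stays on the CPU. A minimal standalone sketch (the `MASTER_ADDR`/`MASTER_PORT` values and the tiny `Linear` module are arbitrary, just for illustration):

```python
import os
import torch
import torch.distributed as dist

# Single-process "gloo" group so DDP can be constructed at all.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)  # parameters live on the CPU, as on a Mac

# Passing device_ids/output_device with a CPU module raises the same ValueError
# seen in the trace above.
try:
    torch.nn.parallel.DistributedDataParallel(model, device_ids=[0], output_device=0)
except ValueError as e:
    print("reproduced:", e)

# Constructing DDP without device_ids is valid for CPU modules:
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

dist.destroy_process_group()
```

So presumably the training code would need to skip `device_ids=[config.rank]` when no GPU is available, rather than the caller doing anything differently.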