Trying to fine-tune a model of type `float64` (e.g. the MP medium model with an appropriate change to `mace.tools.load_foundations` to accommodate the different `max_L`, or the old MP large model after it's been converted to `float64`) fails with the error below.

Is this an issue with some fine-tuning-specific code that implicitly assumes some other dtype, is it related to this known issue when training with PyTorch with `torch.set_default_dtype(torch.float64)`, or is it something else?
```
Traceback (most recent call last):
  File "/home/cluster2/bernstei/src/work/MACE/mace_github/mace/cli/run_train.py", line 584, in <module>
    main()
  File "/home/cluster2/bernstei/src/work/MACE/mace_github/mace/cli/run_train.py", line 510, in main
    tools.train(
  File "/home/cluster2/bernstei/src/work/MACE/mace_github/mace/tools/train.py", line 92, in train
    _, opt_metrics = take_step(
  File "/home/cluster2/bernstei/src/work/MACE/mace_github/mace/tools/train.py", line 253, in take_step
    optimizer.step()
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/adam.py", line 163, in step
    adam(
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/adam.py", line 311, in adam
    func(params,
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/adam.py", line 474, in _multi_tensor_adam
    grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/optim/optimizer.py", line 397, in _group_tensors_by_device_and_dtype
    return _group_tensors_by_device_and_dtype(tensorlistlist, with_indices)
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/Software/python/system/torch/gpu/lib64/python3.9/site-packages/torch/utils/_foreach_utils.py", line 42, in _group_tensors_by_device_and_dtype
    torch._C._group_tensors_by_device_and_dtype(tensorlistlist, with_indices).items()
RuntimeError: Tensors of the same index must be on the same device and the same dtype except `step` tensors that can be CPU and float32 notwithstanding
```
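For reference, the same error can be reproduced in plain PyTorch, outside MACE, if the Adam state is allocated at a different dtype than the parameters. This is a minimal sketch assuming that is the root cause here (it is not the actual MACE code path; `foreach=True` just forces the `_multi_tensor_adam` path that appears in the traceback):

```python
import torch

# Minimal sketch, assuming the mismatch is between Adam's state tensors
# and the parameters (illustrative, not the MACE fine-tuning code path).
model = torch.nn.Linear(4, 4)  # parameters are float32 by default
opt = torch.optim.Adam(model.parameters(), foreach=True)  # force _multi_tensor_adam

model(torch.randn(2, 4)).sum().backward()
opt.step()       # exp_avg / exp_avg_sq are allocated as float32 here
opt.zero_grad()

model.double()   # parameters become float64; the optimizer state stays float32
model(torch.randn(2, 4, dtype=torch.float64)).sum().backward()
opt.step()       # RuntimeError: Tensors of the same index must be on the same device and the same dtype ...
```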
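If that diagnosis is right, one possible workaround (again a sketch only, not tested against the MACE training loop) would be to cast the existing non-`step` optimizer state to each parameter's dtype before resuming training; `opt` here is the optimizer from the reproduction above:

```python
# Sketch only: align Adam state dtypes with the (now float64) parameters.
# The `step` counters are deliberately left alone, since the optimizer
# explicitly allows them to be CPU float32.
for param, state in opt.state.items():
    for key, value in state.items():
        if key != "step" and torch.is_tensor(value):
            state[key] = value.to(dtype=param.dtype, device=param.device)
```

Alternatively, constructing the optimizer only after the model has been converted to `float64` avoids the mismatch entirely.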