
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat) #575

Closed
jmikedupont2 opened this issue Apr 11, 2024 · 2 comments

@jmikedupont2

The seq_len argument is deprecated and unused. It will be removed in v4.39.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in run_code
exec(code, run_globals)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/cli/run_server.py", line 235, in
main()
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/cli/run_server.py", line 219, in main
server = Server(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/server.py", line 237, in init
throughput_info = get_server_throughput(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 83, in get_server_throughput
cache[cache_key] = measure_throughput_info(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 123, in measure_throughput_info
"inference_rps": measure_compute_rps(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 218, in measure_compute_rps
cache = step(cache)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 215, in step
outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache if inference else None)
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/models/llama/block.py", line 264, in forward
outputs = super().forward(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/models/llama/block.py", line 193, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/models/llama/block.py", line 103, in forward
key_states = torch.cat([past_key_value[0], key_states], dim=2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
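For context, here is a minimal sketch of the failure mode and one possible workaround. It assumes a CUDA device is available; the tensor names and shapes are illustrative only and do not reflect the actual Petals KV-cache layout.

```python
import torch

# Minimal reproduction: the cached keys are left on CPU while the new key
# states come from the GPU shard, so torch.cat sees two devices and raises.
past_key = torch.randn(1, 8, 4, 64)                   # cached keys, still on CPU
new_key = torch.randn(1, 8, 1, 64, device="cuda:0")   # fresh keys from the block

try:
    torch.cat([past_key, new_key], dim=2)
except RuntimeError as e:
    print(e)  # "Expected all tensors to be on the same device ..."

# Possible workaround sketch: move the cached tensor onto the device of the
# new states before concatenating.
merged = torch.cat([past_key.to(new_key.device), new_key], dim=2)
print(merged.device)  # cuda:0
```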

@igorrafael

I have also encountered this issue.
I was using Docker Desktop (CE) on Windows 11. I first validated my setup by running ollama/ollama successfully.

@jmikedupont2
Author

OK, I fixed this in my branch by rolling back the version:
meta-introspector@64e1361

The latest version of Petals does not work for me.
