
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat) #575

Closed
jmikedupont2 opened this issue Apr 11, 2024 · 2 comments

@jmikedupont2

The seq_len argument is deprecated and unused. It will be removed in v4.39.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in run_code
exec(code, run_globals)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/cli/run_server.py", line 235, in
main()
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/cli/run_server.py", line 219, in main
server = Server(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/server.py", line 237, in init
throughput_info = get_server_throughput(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 83, in get_server_throughput
cache[cache_key] = measure_throughput_info(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 123, in measure_throughput_info
"inference_rps": measure_compute_rps(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 218, in measure_compute_rps
cache = step(cache)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/server/throughput.py", line 215, in step
outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache if inference else None)
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/models/llama/block.py", line 264, in forward
outputs = super().forward(
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/models/llama/block.py", line 193, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/.venv-omain/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/data1/nix/time/2023/09/22/petals/src/petals/models/llama/block.py", line 103, in forward
key_states = torch.cat([past_key_value[0], key_states], dim=2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
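For context, here is a minimal sketch of the failure mode and one possible workaround. It assumes a CUDA device is available; the tensor names and shapes are illustrative only and do not reflect the actual Petals KV-cache layout.

```python
import torch

# Minimal reproduction: the cached keys are left on CPU while the new key
# states come from the GPU shard, so torch.cat sees two devices and raises.
past_key = torch.randn(1, 8, 4, 64)                   # cached keys, still on CPU
new_key = torch.randn(1, 8, 1, 64, device="cuda:0")   # fresh keys from the block

try:
    torch.cat([past_key, new_key], dim=2)
except RuntimeError as e:
    print(e)  # "Expected all tensors to be on the same device ..."

# Possible workaround sketch: move the cached tensor onto the device of the
# new states before concatenating.
merged = torch.cat([past_key.to(new_key.device), new_key], dim=2)
print(merged.device)  # cuda:0
```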

@igorrafael

I have also encountered this issue.
I was using Docker Desktop (CE) on Windows 11. I first validated my setup by running ollama/ollama successfully.

@jmikedupont2
Author

OK, I fixed this in my branch by rolling back the version:
meta-introspector@64e1361

The latest version of Petals does not work for me.
