Fix dummy cache allocation #574

artek0chumak · 2024-04-11T08:56:28Z

Fix allocation of the dummy key/value cache. The cache is not used in actual computations, but torch's checks require its tensors to be on the correct device.

mryab · 2024-04-11T09:32:46Z

src/petals/server/throughput.py

@@ -206,7 +206,7 @@ def measure_compute_rps(
        block = block.to(dtype)
        block = convert_block(block, 0, config, tensor_parallel_devices, device, quant_type=quant_type, freeze=True)

-        cache = (DUMMY_KEY_PAST.to(dtype), DUMMY_KEY_PAST.to(dtype))
+        cache = (DUMMY_KEY_PAST.to(dtype).to(device), DUMMY_KEY_PAST.to(dtype).to(device))


Let's replace the chained calls with a single call to .to
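For illustration, a minimal sketch of what the single-call form could look like. `make_dummy_cache` is a hypothetical helper, and the `DUMMY_KEY_PAST` definition here is an assumed stand-in, since the real one is not shown in this diff. `torch.Tensor.to` accepts both `device` and `dtype` in one call, so the cast and the move do not need to be chained:

```python
import torch

# Assumption: a stand-in for petals' DUMMY_KEY_PAST; the real definition is not part of this diff.
DUMMY_KEY_PAST = torch.empty((0, 0, 0))


def make_dummy_cache(dtype: torch.dtype, device: torch.device):
    """Hypothetical helper: build the dummy (key, value) cache with one .to() call per tensor."""
    return (
        DUMMY_KEY_PAST.to(device=device, dtype=dtype),  # dtype cast and device move together
        DUMMY_KEY_PAST.to(device=device, dtype=dtype),
    )


# Usage on CPU; on a server, `device` would be the one the block was placed on.
cache = make_dummy_cache(torch.float16, torch.device("cpu"))
```

Passing both arguments by keyword keeps the call unambiguous and leaves the result equivalent to `.to(dtype).to(device)`.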

justheuristic

LGTM, sorry I took so long

Fix dummy cache allocation

f02bd57

artek0chumak mentioned this pull request Apr 11, 2024

Error trying to raise Mixtral private swarm server #569

Closed

artek0chumak self-assigned this Apr 11, 2024

artek0chumak requested a review from justheuristic April 11, 2024 09:07

mryab reviewed Apr 11, 2024

artek0chumak added 2 commits April 11, 2024 11:37

Try mps device selecting

e5dddfe

Rechain reloc

0ca54a5

justheuristic approved these changes Apr 16, 2024

artek0chumak merged commit 30f522d into main Apr 16, 2024
11 checks passed

artek0chumak deleted the fix_cache_alloc branch April 16, 2024 17:25
