
Fix dtype- and device-related client issues #98

Merged 8 commits into main on Nov 29, 2022
Conversation

@borzunov (Collaborator) commented on Nov 29, 2022

This PR:

  1. Makes inference/forward/backward calls on the client remember the dtype and device of the source tensors, then move and cast the outputs back to that same dtype/device (see the sketch after this list). This way:

    • Users don't need to change the code that launches RemoteSequential in order to run it on a different device.
    • model.generate() now supports both CPU and GPU.
    • See the draft GPU-based Colab notebook for running inference/generate/forward/backward through the public swarm with BLOOM-176B.
  2. Sets low_cpu_mem_usage=True by default and the client's request timeout to 20 seconds.

  3. Removes redundant casts to float32 left over in Dmitry's code.

  4. (minor) Improves error messages.

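A minimal sketch of the client-side behavior described in item 1, assuming a PyTorch client. The helper name `forward_with_source_placement`, the `remote_forward` callable, and the float32-on-CPU wire format are illustrative assumptions, not the actual Petals implementation:

```python
import torch

def forward_with_source_placement(remote_forward, hidden_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper (not the actual Petals code): remember the dtype and
    device of the user's input tensors, run the remote call, and cast the
    outputs back so the caller gets tensors matching what they passed in."""
    src_dtype, src_device = hidden_states.dtype, hidden_states.device

    # Assume the swarm expects a fixed dtype/device (here: float32 on CPU);
    # cast the inputs accordingly before sending them out.
    outputs = remote_forward(hidden_states.to(dtype=torch.float32, device="cpu"))

    # Move/cast the outputs back to the source dtype/device, so e.g. GPU
    # inputs produce GPU outputs without any changes to the calling code.
    return outputs.to(dtype=src_dtype, device=src_device)
```

With this pattern, the same client code works whether its tensors live on CPU or a CUDA device, which is what lets model.generate() run on either device without modification.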
@borzunov merged commit ab41223 into main on Nov 29, 2022