To support larger-scale model deployment requirements, PyTorchEngine provides multi-node deployment support. Below are the detailed steps for deploying a tp=16
model across two 8-GPU nodes.
To ensure consistency across the cluster environment, it is recommended to use Docker to set up the cluster. Create containers on each node as follows:
docker run -it \
--network host \
-v $MODEL_PATH:$CONTAINER_MODEL_PATH \
openmmlab/lmdeploy:latest
Important
Ensure that the model is placed in the same directory on all node containers.
Select one node as the head node and run the following command in its container:
ray start --head --port=$DRIVER_PORT
On the other nodes, use the following command in their containers to join the cluster created by the head node:
ray start --address=$DRIVER_NODE_ADDR:$DRIVER_PORT
run ray status
on head node to check the cluster.
Important
Ensure that DRIVER_NODE_ADDR
is the address of the head node and DRIVER_PORT
matches the port number used during the head node initialization.
In the head node's container, you can use all functionalities of PyTorchEngine as usual.
lmdeploy serve api_server \
$CONTAINER_MODEL_PATH \
--backend pytorch \
--tp 16
from lmdeploy import pipeline, PytorchEngineConfig
if __name__ == '__main__':
model_path = '/path/to/model'
backend_config = PytorchEngineConfig(tp=16)
with pipeline(model_path, backend_config=backend_config) as pipe:
outputs = pipe('Hakuna Matata')
Note
PyTorchEngine will automatically choose the appropriate launch method (single-node/multi-node) based on the tp
parameter and the number of devices available in the cluster. If you want to enforce the use of the Ray cluster, you can configure distributed_executor_backend='ray'
in PytorchEngineConfig
or use the environment variable LMDEPLOY_EXECUTOR_BACKEND=ray
.
By following the steps above, you can successfully deploy PyTorchEngine in a multi-node environment and leverage the Ray cluster for distributed computing.
Warning
To achieve better performance, we recommend users to configure a higher-quality network environment (such as using InfiniBand) to improve engine efficiency.