Running BytePS

BytePS follows the same running model as MXNet's PS implementation and provides a script, launcher/launcher.py, to help you start the individual processes.

Let's say you have two worker machines (or Docker containers) with GPUs, one machine or container acting as the server, and one acting as the scheduler. The scheduler binds to 10.0.0.1 on port 9000. The workers and the server must be able to reach the scheduler at that IP and port over TCP.

To use launcher/launcher.py, the NVIDIA_VISIBLE_DEVICES environment variable should be set, either automatically by nvidia-docker or manually by you.
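
If you are not running under nvidia-docker, one way to satisfy this is to set the variable yourself before calling the launcher. The following is a minimal sketch, assuming a comma-separated list of GPU indices (a format nvidia-docker accepts) is what you want to expose. The helper name `ensure_visible_devices` and the fallback list `0,1` are hypothetical and not part of BytePS.

```shell
# Hypothetical helper, not part of BytePS: keep an existing
# NVIDIA_VISIBLE_DEVICES value, or fall back to the list given as $1.
ensure_visible_devices() {
    if [ -z "${NVIDIA_VISIBLE_DEVICES:-}" ]; then
        NVIDIA_VISIBLE_DEVICES="$1"
    fi
    export NVIDIA_VISIBLE_DEVICES
}

# Assumption for illustration: this machine exposes GPUs 0 and 1.
ensure_visible_devices "0,1"
echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}"
```

A value already set by nvidia-docker is left untouched, so the same snippet can be used inside and outside a container.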

On worker 0, run:

DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 \
DMLC_WORKER_ID=0 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 \
python launcher/launcher.py YOUR_COMMAND

On worker 1, run (only DMLC_WORKER_ID is different from above):

DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 \
DMLC_WORKER_ID=1 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 \
python launcher/launcher.py YOUR_COMMAND

On the server, run (remove DMLC_WORKER_ID, and set role to server):

DMLC_ROLE=server DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 \
DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 python launcher/launcher.py

On the scheduler, run (remove DMLC_WORKER_ID, and set role to scheduler):

DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 \
DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 python launcher/launcher.py

The order in which you run the above commands does not matter.
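
The four commands above share the scheduler address and the worker/server counts; only DMLC_ROLE and, for workers, DMLC_WORKER_ID change. As a rough sketch of that pattern, the hypothetical helper `byteps_cmd` below (not part of BytePS) prints the command line for a given role without running it. The address and counts are the example values from this page, and YOUR_COMMAND remains a placeholder.

```shell
# Hypothetical helper, not part of BytePS: print (do not run) the launch
# command for one role, using the example values from this page.
byteps_cmd() {
    role="$1"            # worker | server | scheduler
    worker_id="${2:-}"   # only meaningful for workers
    common="DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1"
    case "$role" in
        worker)
            # Workers carry an ID and the training command.
            echo "DMLC_ROLE=worker DMLC_WORKER_ID=${worker_id} ${common} python launcher/launcher.py YOUR_COMMAND"
            ;;
        server|scheduler)
            # No DMLC_WORKER_ID and no training command for these roles.
            echo "DMLC_ROLE=${role} ${common} python launcher/launcher.py"
            ;;
        *)
            echo "unknown role: ${role}" >&2
            return 1
            ;;
    esac
}

byteps_cmd worker 0
byteps_cmd worker 1
byteps_cmd server
byteps_cmd scheduler
```

Printing the commands first makes it easy to check on each machine that the scheduler address and the worker/server counts agree before anything is started.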