inference without horovod

dmlc · Jul 10, 2020 · 4074a26
1 parent 31cb953

Showing 2 changed files with 13 additions and 1 deletion.
diff --git a/scripts/question_answering/README.md b/scripts/question_answering/README.md
@@ -70,6 +70,13 @@ python run_squad.py \
     --overwrite_cache \
 ```
 
+We support multi-GPU training via Horovod:
+
+```bash
+mpirun -np 4 -H localhost:4 python run_squad.py \
+    --comm_backend horovod \
+    ...
+```
 As for ELECTRA model, we fine-tune it with layer-wise learning rate decay as
 
 ```bash

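The README command launches four processes on one host (`-np 4 -H localhost:4`), and each worker normally drives a single GPU chosen by its local rank. The sketch below shows that mapping under stated assumptions. `local_gpu` is a hypothetical helper, not part of `run_squad.py`. It assumes `--gpus` is a comma-separated list such as `0,1,2,3`.

```python
def local_gpu(gpus, local_rank):
    """Return the GPU id a worker with `local_rank` should use.

    Hypothetical helper: assumes `gpus` is a comma-separated string like the
    --gpus value (e.g. '0,1,2,3') and that each process started by
    `mpirun -np 4 -H localhost:4` owns exactly one GPU.
    """
    # Parse the GPU list, ignoring empty entries such as a trailing comma.
    gpu_ids = [int(g) for g in gpus.split(',') if g.strip()]
    # A local rank beyond the list means more processes than GPUs on this host.
    if not 0 <= local_rank < len(gpu_ids):
        raise ValueError(f'local_rank {local_rank} has no GPU in {gpus!r}')
    return gpu_ids[local_rank]
```

For example, with `--gpus 4,5` the worker with local rank 1 would pin GPU 5.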
diff --git a/scripts/question_answering/run_squad.py b/scripts/question_answering/run_squad.py
@@ -806,8 +806,13 @@ def predict_extended(original_feature,
 
 
 def evaluate(args, last=True):
+    store, num_workers, rank, local_rank, is_master_node, ctx_l = init_comm(
+        args.comm_backend, args.gpus)
+    # only evaluate once
+    if rank != 0:
+        return
     ctx_l = parse_ctx(args.gpus)
-    logging.info('Srarting inference without horovod')
+    logging.info('Starting inference without horovod on the first node')
 
     cfg, tokenizer, qa_net, use_segmentation = get_network(
         args.model_name, ctx_l, args.classifier_dropout)
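The change makes `evaluate` return early on every worker except rank 0, so inference runs once instead of once per Horovod process. A minimal, self-contained sketch of that gating pattern follows. `get_rank` and `evaluate_once` are hypothetical stand-ins for `init_comm` and `evaluate`. The sketch assumes the job was started with Open MPI's `mpirun`, which exports `OMPI_COMM_WORLD_RANK` to each process, and it falls back to rank 0 for a plain single-process run.

```python
import logging
import os


def get_rank(env=None):
    """Return this process's global rank (0 when not launched by mpirun).

    Hypothetical stand-in for init_comm: reads OMPI_COMM_WORLD_RANK, which
    Open MPI's mpirun sets for each launched process.
    """
    env = os.environ if env is None else env
    return int(env.get('OMPI_COMM_WORLD_RANK', '0'))


def evaluate_once(rank, run_inference):
    """Run `run_inference` only on rank 0; other ranks return None."""
    if rank != 0:
        # Non-master workers skip evaluation so predictions are computed once.
        return None
    logging.info('Starting inference without horovod on the first node')
    return run_inference()
```

Simulating four workers, `[evaluate_once(r, lambda: 'preds') for r in range(4)]` yields `['preds', None, None, None]`. Rank 0 still re-parses `args.gpus` with `parse_ctx`, so it can use every GPU listed rather than only the one assigned to its Horovod process.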