PyTorch Distributed Data Parallel for Transformer Models

A minimal example of distributed training and evaluation of a Transformer-based text classification model on a single node with multiple GPUs, using PyTorch DistributedDataParallel (DDP). The code is based on https://theaisummer.com/distributed-training-pytorch/

Train:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 text_cls_ddp.py --batch_size 8
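The launch command above starts four worker processes, one per visible GPU, and each worker has to know which GPU it owns. A minimal sketch of how a script can pick that up is shown below. It is not taken from text_cls_ddp.py: the helper names parse_args and global_batch_size are hypothetical, and it assumes --batch_size is the per-process batch size, which the README does not state. torch.distributed.launch passes the local rank as a command-line argument (spelled --local_rank or --local-rank depending on the PyTorch version), while the newer torchrun launcher exports it as the LOCAL_RANK environment variable, so the sketch accepts both.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments a launcher-started worker process would receive."""
    parser = argparse.ArgumentParser()
    # Assumption: this is the batch size per process (per GPU), not the global one.
    parser.add_argument("--batch_size", type=int, default=8)
    # torch.distributed.launch appends the local rank to each worker's command line;
    # torchrun sets the LOCAL_RANK environment variable instead, so use it as the default.
    parser.add_argument(
        "--local_rank", "--local-rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser.parse_args(argv)


def global_batch_size(per_process_batch_size, world_size):
    """Number of samples consumed per optimizer step across all processes."""
    return per_process_batch_size * world_size


# Simulate what the worker with local rank 2 would see in a 4-process launch.
args = parse_args(["--batch_size", "8", "--local_rank=2"])
print(args.local_rank, global_batch_size(args.batch_size, world_size=4))
```

Under that assumption, --batch_size 8 with --nproc_per_node=4 corresponds to 32 samples per step overall, which is worth keeping in mind when comparing against a single-GPU run.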

Test only:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 text_cls_test_ddp.py --batch_size 8
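When evaluation runs across four processes, each process typically scores only a slice of the test set, and the per-process results have to be combined afterwards. The README does not say how text_cls_test_ddp.py splits the data, so the following is only a sketch: it assumes sharding in the style of torch.utils.data.DistributedSampler without shuffling, reproduced in plain Python so it can be run without GPUs. The helper shard_indices is hypothetical.

```python
import math


def shard_indices(dataset_size, world_size, rank):
    """Return the sample indices one rank would evaluate (no shuffling).

    Assumes dataset_size is at least world_size - 1, so that the wrap-around
    padding below can always be filled from the start of the dataset.
    """
    per_rank = math.ceil(dataset_size / world_size)
    total = per_rank * world_size
    indices = list(range(dataset_size))
    # Pad by wrapping around so that every rank receives the same number of samples.
    indices += indices[: total - dataset_size]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


# 10 test samples split across 4 processes.
shards = [shard_indices(10, 4, rank) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

With 10 samples and 4 processes, the shards cover 12 indices in total, so samples 0 and 1 are scored twice. If a test script sums per-process counts under this kind of sharding, the reported accuracy can differ slightly from a single-GPU evaluation unless the padded duplicates are dropped.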

Further reading (PyTorch DDP notes): https://pytorch.org/docs/master/notes/ddp.html

