What? To train large DNNs on GPUs with limited memory, the model must be split across multiple devices - Model Parallelism. Similarly, training time can be reduced by distributing parallel branches of the model across the devices.
Why? Currently, this splitting is done manually and is largely based on heuristics, as we demonstrate here (Section 1.2).
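To illustrate, here is a minimal sketch of what manual model parallelism looks like in plain PyTorch: the user decides by hand which layers live on which GPU and moves activations between devices (the module, layer sizes, and two-way split below are illustrative only, not taken from Baechi):

```python
import torch
import torch.nn as nn

# Manual model parallelism: each part of the network is pinned to a GPU,
# and activations are moved between devices by hand.
class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))  # activations cross the GPU boundary

model = TwoGPUNet()
out = model(torch.randn(32, 1024))  # requires two visible GPUs
```

Deciding these split points by hand, and keeping the result within each GPU's memory budget, is exactly the part that becomes tedious and error-prone for larger models.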
How? In Baechi, we adopt an algorithmic approach to the placement problem for running DNN training graphs on a small cluster of memory-constrained devices. Baechi-PyTorch automatically and optimally splits the model, given the number of GPU devices and their memory capacities.
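As a rough illustration of the shape of the placement problem (this toy greedy placer is not Baechi's algorithm, and the operator names, memory footprints, and capacities are made up):

```python
# Toy memory-aware placement sketch -- NOT Baechi's algorithm, only an
# illustration of the inputs (operators with memory footprints, devices
# with capacities) and the output (an operator-to-device assignment).
ops = {"conv1": 900, "conv2": 700, "branch_a": 500, "branch_b": 500, "fc": 300}  # MB
capacity = {"cuda:0": 1500, "cuda:1": 1500}  # MB per GPU

placement, free = {}, dict(capacity)
for name, mem in sorted(ops.items(), key=lambda kv: -kv[1]):
    device = max(free, key=free.get)  # device with the most free memory
    if free[device] < mem:
        raise RuntimeError(f"{name} ({mem} MB) does not fit on any device")
    placement[name] = device
    free[device] -= mem

print(placement)  # e.g. {'conv1': 'cuda:0', 'conv2': 'cuda:1', ...}
```

Baechi additionally accounts for the structure and timing of the training graph so that the resulting placement is fast to train, not just memory-feasible; see the design and usage documentation linked below.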
Please find the design and usage information for Baechi-PyTorch here: link
The TensorFlow implementation of Baechi can be found here: Baechi
The corresponding paper was presented at SoCC 2020.
A draft of the extended version of the Baechi paper is here (currently under review).
For any queries, suggestions, etc., please feel free to reach out at cshetty2@illinois.edu