
Optimising the Training of Stable Deep Neural Network Architectures using Synthetic Gradients in PyTorch

This work is an extension of the thesis component of the MSc in Computing Science that I studied at Imperial College London. The original project focused on implementing stable deep neural network architectures and optimising the computational efficiency of the training process using the parallel variable distribution (PVD) algorithm.

The focus has shifted to achieving parallelisation using synthetic gradients, as introduced in the paper Decoupled Neural Interfaces using Synthetic Gradients. When training a neural network, the input data is first propagated in the forward direction, then the error is calculated, and finally the error gradients are propagated backwards through the network and the weights are updated. This process is locked: layers (which can be grouped into "modules") must wait both for the input features to flow through earlier sections of the network and for the error gradient to propagate backwards through the layers ahead. This results in forward and backward locking.
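
As a point of reference, a minimal standard (locked) training step in PyTorch might look like the sketch below; the layer sizes, optimiser and dummy data are illustrative placeholders, not the settings used in this repository.

```python
import torch
import torch.nn as nn

# Two groups of layers ("modules") that could, in principle, live on
# separate processors.
module1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
module2 = nn.Sequential(nn.Linear(256, 10))
optimiser = torch.optim.SGD(
    list(module1.parameters()) + list(module2.parameters()), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)          # dummy input batch
y = torch.randint(0, 10, (32,))   # dummy labels

# Forward pass: module2 must wait for module1 to finish (forward locking).
h = module1(x)
out = module2(h)

# Backward pass: module1 must wait for the error gradient to flow back
# through module2 before it can update (backward locking).
loss = criterion(out, y)
optimiser.zero_grad()
loss.backward()
optimiser.step()
```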

Synthetic gradients speed up backpropagation by approximating the error gradient between modules of layers; this removes the backward lock and allows training to be distributed across multiple processors.
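
A minimal sketch of the idea is shown below, assuming a single synthetic-gradient model sitting between two modules. It illustrates the technique from the paper rather than the exact code in distSg.py.

```python
import torch
import torch.nn as nn

module1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
module2 = nn.Sequential(nn.Linear(256, 10))
# Synthetic gradient model: predicts dLoss/dh from the activations h.
sg_model = nn.Linear(256, 256)

opt1 = torch.optim.SGD(module1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)
opt_sg = torch.optim.SGD(sg_model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

# Module 1 updates immediately using the *predicted* gradient, without
# waiting for the backward pass from module 2 (the backward lock is removed).
h = module1(x)
synthetic_grad = sg_model(h.detach())
opt1.zero_grad()
h.backward(synthetic_grad.detach())
opt1.step()

# Module 2 trains as usual on the (detached) activations, producing the
# true gradient of the loss with respect to h.
h_in = h.detach().requires_grad_()
loss = criterion(module2(h_in), y)
opt2.zero_grad()
loss.backward()
opt2.step()

# The synthetic gradient model is regressed towards the true gradient.
sg_loss = nn.functional.mse_loss(synthetic_grad, h_in.grad.detach())
opt_sg.zero_grad()
sg_loss.backward()
opt_sg.step()
```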

For this project you will need Python 3 and the following libraries:

  • PyTorch 0.4.1 (torch)
  • torchvision
  • autograd
  • multiprocessing
  • h5py
  • scikit-learn (sklearn)
  • numpy
  • scipy
  • os

Scripts for synthetic gradients:

The scripts below build, train and test a DNN. The user is free to specify the DNN architecture, hyperparameters and dataset within the script.

  • fullmodel.py – runs a standard DNN with no synthetic gradients
  • distSg.py – distributes training over multiple processes (a simplified multiprocessing sketch follows this list)
  • distMulti.py – distributes training over multiple processes and additionally uses a multilevel learning scheme
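
As a rough illustration of how training might be split across processes, the sketch below hands a module's activations to a second process through a queue using torch.multiprocessing. It is a simplified, hypothetical layout, not the partitioning used in distSg.py or distMulti.py.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


def downstream_worker(queue):
    """Trains the second module on activations received from the main process."""
    module2 = nn.Sequential(nn.Linear(256, 10))
    opt2 = torch.optim.SGD(module2.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    while True:
        item = queue.get()
        if item is None:                  # sentinel: no more batches
            break
        h, y = item
        loss = criterion(module2(h), y)
        opt2.zero_grad()
        loss.backward()
        opt2.step()


if __name__ == "__main__":
    queue = mp.Queue()
    worker = mp.Process(target=downstream_worker, args=(queue,))
    worker.start()

    module1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
    for _ in range(10):                   # dummy training loop
        x = torch.randn(32, 784)
        y = torch.randint(0, 10, (32,))
        h = module1(x)
        # Pass activations to the other process; module1 itself would be
        # updated here with a synthetic gradient, as sketched above.
        queue.put((h.detach(), y))

    queue.put(None)                       # tell the worker to stop
    worker.join()
```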

The PVD code was not distributed across multiple processors, but theoretical training times were calculated.

  • PVD.py – trains the DNN using the PVD algorithm
