PipeLearn: Pipeline Parallelism for Collaborative Machine Learning

Introduction

Collaborative machine learning (CML) techniques, such as federated learning, were proposed to collaboratively train deep learning models using multiple end-user devices and a server. CML techniques preserve the privacy of end users, as they do not require user data to be transferred to the server; instead, models are trained locally and shared with the server. However, the low resource utilisation of CML techniques makes the training process inefficient, thereby limiting the use of CML in the real world. Resources idling on both the server and the devices due to sequential computation and communication are the principal cause of low resource utilisation. PipeLearn, a novel framework that leverages pipeline parallelism for CML techniques, is developed to substantially improve resource utilisation. A new training pipeline is designed to parallelise computation on different hardware resources and communication on different bandwidth resources, thereby accelerating the training process in CML.

PipeLearn redesigns the training process of DNNs. Traditionally, training a DNN involves a forward propagation pass (or forward pass) and a backward propagation pass (or backward pass). In the forward pass, one batch of input data (also known as a mini-batch) is fed into the first layer, and the output of each layer is passed on to subsequent layers to compute the loss function. In the backward pass, the loss is propagated from the last DNN layer back to the first layer to compute the gradients of the DNN model parameters.
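As a point of reference, the conventional training loop described above can be sketched as follows (a minimal PyTorch-style example, not taken from the PipeLearn codebase; the model, optimiser and function names are purely illustrative):

```python
import torch
import torch.nn as nn

# Illustrative model, loss and optimiser (not PipeLearn code).
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(inputs, labels):
    optimizer.zero_grad()
    outputs = model(inputs)            # forward pass: layer outputs flow towards the loss
    loss = criterion(outputs, labels)
    loss.backward()                    # backward pass: gradients flow from the loss back to the first layer
    optimizer.step()
    return loss.item()
```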

PipeLearn divides the DNN into two parts and deploys them on the server and the devices. The forward and backward passes are then reordered for multiple mini-batches. Each device executes the forward pass for multiple mini-batches in sequence. The intermediate result of each forward pass (the smashed data, or activations) is transmitted to the server, which runs the forward and backward passes for the remaining layers and sends the gradients of the activations back to the device. The device then performs the backward passes for the mini-batches in sequence. The devices operate in parallel, and the local models are aggregated at a set frequency. Since many forward passes occur sequentially on the device, the communication for each forward pass overlaps with the computation of the following forward passes. In addition, the server-side and device-side computations occur simultaneously for different mini-batches. Thus, PipeLearn reduces the idle time of devices and the server by overlapping server- and device-side computation with server-device communication.
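The reordered pipeline on the device side can be sketched conceptually as follows (illustrative Python only, not the PipeLearn API; device_model, send_to_server and receive_from_server are hypothetical placeholders for the device-side sub-model and the communication layer):

```python
def device_training_round(device_model, mini_batches, send_to_server, receive_from_server):
    """Conceptual sketch of one PipeLearn-style training round on a device."""
    activations = []
    for batch in mini_batches:                 # forward passes for several mini-batches, back to back
        act = device_model.forward(batch)      # compute activations (smashed data) for the device-side layers
        send_to_server(act)                    # transmission overlaps with the next forward pass
        activations.append(act)

    for act in activations:                    # backward passes, performed in sequence afterwards
        grad = receive_from_server()           # gradients of the activations, computed by the server
        device_model.backward(act, grad)       # update the device-side layers
```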

[Figure: training pipeline of conventional split learning compared with PipeLearn]

The above figure shows a training iteration in conventional split learning and PipeLearn. Computation and communication are overlapped in the training pipeline in PipeLearn. Compared with conventional split learning, the idle time in PipeLearn is significantly reduced.

Code Instruction

Environment Setup

This is a Python project and the recommended Python version is 3.6. The dependencies required for this project are listed in requirements.txt. You can install them by running

pip install -r requirements.txt

Collaborative machine learning requires a server and multiple devices. They all need the above environment. The dataset used is CIFAR-10, which will be downloaded automatically the first time the code is run.

Configuration

Before running the code, you need to personalise the config.json file. The file is in JSON format, and the meaning of each item is listed below (an example configuration is shown after the table).

| Config Item | Type | Description |
| --- | --- | --- |
| experiment_name | string | The name of this experiment, e.g. "test01". |
| server_address | string | The server IP address. |
| port | int | The port of the server. |
| client_num | int | The number of devices involved. |
| dataset_name | string | The dataset on which the model is trained. Available datasets include "CIFAR-10", "MNIST" and "SVHN". |
| model_name | string | The deep learning model. Available models include "VGG5", "VGG11", "VGG16", "ResNet18", "ResNet50", "MobileNetSmall" and "MobileNetLarge". |
| data_size | int | The size of the training data for each device. |
| test_data_size | int | The size of the test data. |
| epoch_num | int | The number of training epochs. |
| aggregation_frequency | int | The number of training rounds after which aggregation happens. -1 means aggregation happens at the end of each epoch. |
| batch_size | int | The size of the data batch used in each training round. |
| enable_val | bool | Whether to calculate validation accuracy during training. |
| enable_test | bool | Whether to calculate test accuracy at the end of training. |
| layer_num_on_client | int | The number of layers deployed on the device side. |
| pipe_batch_num | int | The number of batches used in parallel in each training round. |
| profile_iter_num | int | The number of training iterations the profiler uses to determine layer_num_on_client and pipe_batch_num. Set it to 0 to disable the profiler. |
| uplink_bandwidth | int | The uplink network bandwidth (Mbps). |
| downlink_bandwidth | int | The downlink network bandwidth (Mbps). |
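Below is an illustrative config.json (not shipped with the repository): the field names follow the table above, while the values are placeholders that must be adapted to your own setup.

```json
{
  "experiment_name": "test01",
  "server_address": "192.168.1.100",
  "port": 8000,
  "client_num": 2,
  "dataset_name": "CIFAR-10",
  "model_name": "VGG11",
  "data_size": 10000,
  "test_data_size": 2000,
  "epoch_num": 10,
  "aggregation_frequency": -1,
  "batch_size": 64,
  "enable_val": true,
  "enable_test": true,
  "layer_num_on_client": 2,
  "pipe_batch_num": 4,
  "profile_iter_num": 0,
  "uplink_bandwidth": 100,
  "downlink_bandwidth": 100
}
```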

Running the Code

You need to start the program on the server first, and then start it on the devices.

Running on server:

python run.py -s

where -s means the code is running on the server.

Running on device:

python run.py -i {device_index}

where -i specifies the index of the current device, starting from 0.
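For example, assuming client_num is set to 2, a training run could be launched as follows (the server must be started before the devices):

```
# on the server
python run.py -s

# on device 0
python run.py -i 0

# on device 1
python run.py -i 1
```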

Results

The results, including model accuracy, training time, idle time, etc., are saved in results/results.csv and also printed on screen.

Citation

TBD
