
Unifying data transfer with numpy array #121

Closed
wants to merge 16 commits into from

Conversation

hcngac
Contributor

@hcngac hcngac commented Nov 28, 2021

All data transfers are now in numpy arrays.

Description

For model weights, the transfer type is an OrderedDict{name: numpy.ndarray}.
For features, the transfer type is a list[(numpy.ndarray, numpy.ndarray)], where the first value is the feature and the second is the target.
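A minimal sketch of this transfer format (the helper name `weights_to_numpy` is illustrative, not the PR's actual API; for a PyTorch state_dict the conversion would go through `tensor.detach().cpu().numpy()` rather than `np.asarray`):

```python
import pickle
from collections import OrderedDict

import numpy as np

# Hypothetical helper: convert a framework-specific state dict into the
# unified transfer format (OrderedDict{name: numpy.ndarray}).
def weights_to_numpy(state_dict):
    return OrderedDict((name, np.asarray(value)) for name, value in state_dict.items())

# Features travel as list[(numpy.ndarray, numpy.ndarray)]: (feature, target).
features = [(np.zeros((1, 84), dtype=np.float32), np.array([3]))]

weights = weights_to_numpy(OrderedDict(fc=np.ones((10, 84), dtype=np.float32)))
payload = pickle.dumps((weights, features))  # framework-agnostic wire format
restored_weights, restored_features = pickle.loads(payload)
```

Because both sides of the wire only see numpy arrays, the receiving framework can rebuild its own tensor types from the restored payload.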

How has this been tested?

Tested with the following configs:

'configs/MNIST/fedavg_lenet5_noniid'
'configs/MNIST/fedavg_lenet5'
'configs/MNIST/fedprox_lenet5'
'configs/MNIST/mistnet_lenet5'
'configs/MNIST/mistnet_pretrain_lenet5'

Please help test with MindSpore and TensorFlow; I don't have a proper machine for testing at the moment.

Types of changes

  • Bug fix (non-breaking change which fixes an issue) Fixes #
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@baochunli
Collaborator

baochunli commented Nov 28, 2021

There is no reason why numpy arrays need to be used at all — except when it is absolutely necessary. See my comments from the closed issue. In my opinion, it adds to the overhead of computation. If Tensors can be pickled, they should be used for communication.

@baochunli
Collaborator

baochunli commented Nov 28, 2021

Also, this "feature" of always converting to numpy made the code much less elegant and much harder to follow, which is in my opinion not a good design. One should focus more on DataProcessors that actually do something with the tensors, not converting them to numpy arrays.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

This is to prepare for #119 to avoid implementing the same local differential privacy method for different framework.

@baochunli
Collaborator

I don't think we should worry too much about having the same differential privacy across frameworks. Most of the code on differential privacy would most likely be framework-agnostic anyway — see servers/fedavg.py, which uses the same code to do federated averaging for all the frameworks. At least, such a worry should not lead to adding more computation overhead because of this idea of converting to numpy arrays -- it's just too high a price to pay for something that's not necessarily an issue at all.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

The code is overly complicated because the current server/client and algorithm base classes seem to imply that framework-specific implementations should live within the algorithm class and should not leak into the server/client classes. However, some of the inherited client/server classes do have strong assumptions about the framework used. I'm not sure about the design choice, but I opted to keep them separated as much as possible.

@baochunli
Collaborator

Instead, we should work on differential privacy methods on PyTorch, make it work first, and then see what it takes to bring it over to TensorFlow and other frameworks. This avoids all the boilerplate code and the overhead of converting to numpy.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

> Instead, we should work on differential privacy methods on PyTorch, make it work first, and then see what it takes to bring it over to TensorFlow and other frameworks. This avoids all the boilerplate code and the overhead of converting to numpy.

That's understandable.

@baochunli
Collaborator

MindSpore and TensorFlow tests are now part of our continuous integration (CI) tests. These tests will be run and results will be shown for all PRs.

@baochunli
Collaborator

> The code is overly complicated because the current server/client and algorithm base classes seem to imply that framework-specific implementations should live within the algorithm class and should not leak into the server/client classes. However, some of the inherited client/server classes do have strong assumptions about the framework used. I'm not sure about the design choice, but I opted to keep them separated as much as possible.

They should indeed be separate — if some of the classes (excluding code in examples/) in clients or servers are framework-specific, these issues need to be eventually fixed. All framework-specific code should reside in Algorithms, Datasources, Samplers, and Trainers. Clients and servers should be framework-agnostic.

@baochunli
Collaborator

Perhaps a good starting point is to try adding differential privacy using random response (in utils/). It was previously used only in MistNet, and not through the new DataProcessors.
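A DataProcessor along those lines could look like this minimal sketch of local differential privacy via randomized response (hypothetical, numpy-only; not the existing utils/ implementation):

```python
import numpy as np

# Randomized response: each binary value is kept with probability
# p = e^eps / (e^eps + 1) and flipped with probability 1 - p, giving
# epsilon-local differential privacy for a single bit.
def randomized_response(bits, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)  # probability of keeping the true bit
    flip = rng.random(bits.shape) >= p
    return np.where(flip, 1 - bits, bits)

noisy = randomized_response(np.array([0, 1, 1, 0]), epsilon=1.0)
```

Because the input and output are both numpy arrays, the same processor would apply to payloads from any framework once they are in the unified transfer format.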

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

> The code is overly complicated because the current server/client and algorithm base classes seem to imply that framework-specific implementations should live within the algorithm class and should not leak into the server/client classes. However, some of the inherited client/server classes do have strong assumptions about the framework used. I'm not sure about the design choice, but I opted to keep them separated as much as possible.

> They should indeed be separate — if some of the classes (excluding code in examples/) in clients or servers are framework-specific, these issues need to be eventually fixed. All framework-specific code should reside in Algorithms, Datasources, Samplers, and Trainers. Clients and servers should be framework-agnostic.

That clears things up.

Continuing on #119, DataProcessors are also framework-specific for now. Next step for the issue would be to implement the DataProcessor pipeline onto server/client, and a sample DataProcessor for LDP with random response.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

There is also a customise_server_payload that should be replaced with a DataProcessor.

@baochunli
Collaborator

One more comment about the new Serializers: I think they are boilerplate code that is not useful enough, because as far as I know pickle is the only way to do it anyway.

@baochunli
Collaborator

> There is also a customise_server_payload that should be replaced with a DataProcessor.

This makes sense.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

One point in favour of using numpy for data transfer: for ./run --config=configs/MNIST/mistnet_lenet5.yml, the feature transfer size was 311.3 MB before and is 14.08 MB after. I have not verified correctness or anything else for now, just this observation.

@baochunli
Collaborator

That's great to hear.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

pytorch/pytorch#19408

Could be related, since the features transferred are a list of (torch.Tensor, torch.Tensor) tuples.
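The size gap is consistent with that issue and can be illustrated without torch: numpy pickles only a view's own elements, whereas (per pytorch/pytorch#19408) pickling a torch.Tensor view may serialize the tensor's entire backing storage.

```python
import pickle

import numpy as np

# numpy serializes only the data a view actually holds, which would explain
# the payload shrinking after converting the transferred features to numpy.
base = np.zeros(1_000_000, dtype=np.float32)  # ~4 MB of backing data
view = base[:10]                              # small slice of the big buffer

small = len(pickle.dumps(view))
full = len(pickle.dumps(base))
assert small < 1_000       # only the 10 viewed elements are serialized
assert full > 4_000_000    # the whole array's data is serialized
```

An alternative fix that keeps tensors on the wire would be cloning each view before pickling, so the clone owns only its own storage.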

@baochunli
Collaborator

baochunli commented Nov 28, 2021

The PR failed the YOLOv5 PyTorch test (./run --config=configs/YOLO/fedavg_yolov5.yml), showing the perils of changing too much of a codebase that has already been "field-tested."

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

Closing this PR and moving to the next step now

@hcngac hcngac closed this Nov 28, 2021