
Unifying data transfer with numpy array #121

Closed
wants to merge 16 commits into from

Conversation

hcngac
Contributor

@hcngac hcngac commented Nov 28, 2021

All data transfers are now in numpy arrays.

Description

For model weights, the transfer type is an OrderedDict{name: numpy.ndarray}.
For features, the transfer type is a list[(numpy.ndarray, numpy.ndarray)], where the first value is the feature and the second is the target.
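A minimal sketch of this transfer format (the helper name `weights_to_numpy` is illustrative, not the PR's actual API; for a PyTorch state_dict the conversion would go through `tensor.detach().cpu().numpy()` rather than `np.asarray`):

```python
import pickle
from collections import OrderedDict

import numpy as np

# Hypothetical helper: convert a framework-specific state dict into the
# unified transfer format (OrderedDict{name: numpy.ndarray}).
def weights_to_numpy(state_dict):
    return OrderedDict((name, np.asarray(value)) for name, value in state_dict.items())

# Features travel as list[(numpy.ndarray, numpy.ndarray)]: (feature, target).
features = [(np.zeros((1, 84), dtype=np.float32), np.array([3]))]

weights = weights_to_numpy(OrderedDict(fc=np.ones((10, 84), dtype=np.float32)))
payload = pickle.dumps((weights, features))  # framework-agnostic wire format
restored_weights, restored_features = pickle.loads(payload)
```

Because both sides of the wire only see numpy arrays, the receiving framework can rebuild its own tensor types from the restored payload.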

How has this been tested?

Tested with the following configs:

'configs/MNIST/fedavg_lenet5_noniid'
'configs/MNIST/fedavg_lenet5'
'configs/MNIST/fedprox_lenet5'
'configs/MNIST/mistnet_lenet5'
'configs/MNIST/mistnet_pretrain_lenet5'

Please help test with MindSpore and TensorFlow; I don't have a proper machine for testing at the moment.

Types of changes

  • Bug fix (non-breaking change which fixes an issue) Fixes #
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@baochunli
Collaborator

baochunli commented Nov 28, 2021

There is no reason why numpy arrays need to be used at all — except when it is absolutely necessary. See my comments from the closed issue. In my opinion, it adds to the overhead of computation. If Tensors can be pickled, they should be used for communication.

@baochunli
Collaborator

baochunli commented Nov 28, 2021

Also, this "feature" of always converting to numpy made the code much less elegant and much harder to follow, which is in my opinion not a good design. One should focus more on DataProcessors that actually do something with the tensors, not converting them to numpy arrays.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

This is to prepare for #119 to avoid implementing the same local differential privacy method for different framework.

@baochunli
Collaborator

I don't think we should worry too much about having the same differential privacy across frameworks. Most of the code on differential privacy would most likely be framework-agnostic anyway — see servers/fedavg.py, which uses the same code to do federated averaging for all the frameworks. At least, such a worry should not lead to adding more computation overhead because of this idea of converting to numpy arrays -- it's just too high a price to pay for something that's not necessarily an issue at all.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

The code is overly complicated because the current server/client and algorithm base classes seem to imply that framework-specific implementations should live within the algorithm class and should not leak into the server/client classes. However, some of the inherited client/server classes do have strong assumptions about the framework used. I'm not sure about the design choice, but I opted to keep them separated as much as possible.

@baochunli
Collaborator

Instead, we should work on differential privacy methods on PyTorch, make it work first, and then see what it takes to bring it over to TensorFlow and other frameworks. This avoids all the boilerplate code and the overhead of converting to numpy.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

> Instead, we should work on differential privacy methods on PyTorch, make it work first, and then see what it takes to bring it over to TensorFlow and other frameworks. This avoids all the boilerplate code and the overhead of converting to numpy.

That's understandable.

@baochunli
Collaborator

MindSpore and TensorFlow tests are now part of our continuous integration (CI) tests. These tests will be run and results will be shown for all PRs.

@baochunli
Collaborator

> The code is overly complicated because the current server/client and algorithm base classes seem to imply that framework-specific implementations should live within the algorithm class and should not leak into the server/client classes. However, some of the inherited client/server classes do have strong assumptions about the framework used. I'm not sure about the design choice, but I opted to keep them separated as much as possible.

They should indeed be separate — if some of the classes (excluding code in examples/) in clients or servers are framework-specific, these issues need to be eventually fixed. All framework-specific code should reside in Algorithms, Datasources, Samplers, and Trainers. Clients and servers should be framework-agnostic.

@baochunli
Collaborator

Perhaps a good starting point is to try adding differential privacy using random response (in utils/). It was previously used only in MistNet, and not through the new DataProcessors.
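A DataProcessor along those lines could look like this minimal sketch of local differential privacy via randomized response (hypothetical, numpy-only; not the existing utils/ implementation):

```python
import numpy as np

# Randomized response: each binary value is kept with probability
# p = e^eps / (e^eps + 1) and flipped with probability 1 - p, giving
# epsilon-local differential privacy for a single bit.
def randomized_response(bits, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)  # probability of keeping the true bit
    flip = rng.random(bits.shape) >= p
    return np.where(flip, 1 - bits, bits)

noisy = randomized_response(np.array([0, 1, 1, 0]), epsilon=1.0)
```

Because the input and output are both numpy arrays, the same processor would apply to payloads from any framework once they are in the unified transfer format.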

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

> The code is overly complicated because the current server/client and algorithm base classes seem to imply that framework-specific implementations should live within the algorithm class and should not leak into the server/client classes. However, some of the inherited client/server classes do have strong assumptions about the framework used. I'm not sure about the design choice, but I opted to keep them separated as much as possible.

> They should indeed be separate — if some of the classes (excluding code in examples/) in clients or servers are framework-specific, these issues need to be eventually fixed. All framework-specific code should reside in Algorithms, Datasources, Samplers, and Trainers. Clients and servers should be framework-agnostic.

That clears things up.

Continuing on #119, DataProcessors are also framework-specific for now. Next step for the issue would be to implement the DataProcessor pipeline onto server/client, and a sample DataProcessor for LDP with random response.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

There is also a customise_server_payload that should be replaced with a DataProcessor.

@baochunli
Collaborator

One more comment about the new Serializers: I think they are boilerplate code that is not useful enough, because as far as I know pickle is the only way to do it anyway.

@baochunli
Collaborator

> There is also a customise_server_payload that should be replaced with a DataProcessor.

This makes sense.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

One point in favour of using numpy for data transfer: for ./run --config=configs/MNIST/mistnet_lenet5.yml, the feature transfer size was 311.3 MB before and is 14.08 MB after. I have not verified correctness or anything else for now, just this observation.

@baochunli
Collaborator

That's great to hear.

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

pytorch/pytorch#19408

Could be related, since the features transferred are a list of (torch.Tensor, torch.Tensor) tuples.
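The size gap is consistent with that issue and can be illustrated without torch: numpy pickles only a view's own elements, whereas (per pytorch/pytorch#19408) pickling a torch.Tensor view may serialize the tensor's entire backing storage.

```python
import pickle

import numpy as np

# numpy serializes only the data a view actually holds, which would explain
# the payload shrinking after converting the transferred features to numpy.
base = np.zeros(1_000_000, dtype=np.float32)  # ~4 MB of backing data
view = base[:10]                              # small slice of the big buffer

small = len(pickle.dumps(view))
full = len(pickle.dumps(base))
assert small < 1_000       # only the 10 viewed elements are serialized
assert full > 4_000_000    # the whole array's data is serialized
```

An alternative fix that keeps tensors on the wire would be cloning each view before pickling, so the clone owns only its own storage.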

@baochunli
Collaborator

baochunli commented Nov 28, 2021

The PR failed the YOLOv5 PyTorch test (./run --config=configs/YOLO/fedavg_yolov5.yml), showing the perils of changing too much of a codebase that has already been "field-tested."

@hcngac
Contributor Author

hcngac commented Nov 28, 2021

Closing this PR and moving to the next step now

@hcngac hcngac closed this Nov 28, 2021