[FR] Local Differential Privacy Methods #119
Comments
To combine with model/feature compression methods to be added later, I also propose implementing this as a middleman or preprocessor interface that can accept a number of data preprocessing modules before data is sent between clients and the server. It also calls for unifying the data transfer format between clients and the server, so that we don't have to implement the same LDP algorithm for each data format, e.g. PyTorch/TensorFlow/NumPy.
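To make the idea concrete, here is a minimal sketch of such a preprocessor interface. The names `Processor` and `Pipeline` are hypothetical, not existing Plato classes:

```python
from abc import ABC, abstractmethod

class Processor(ABC):
    """A data preprocessing module (e.g. LDP noise, compression) that
    sits between a client and the server."""

    @abstractmethod
    def process(self, data):
        """Transform an outbound or inbound payload and return the result."""

class Pipeline(Processor):
    """Chains a list of processors, applying them in order."""

    def __init__(self, processors):
        self.processors = list(processors)

    def process(self, data):
        for processor in self.processors:
            data = processor.process(data)
        return data
```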
Challenging to do, but it makes sense.
In investigating the data transferred, we can standardise two types of data format: the …
A new class called … can be introduced. Two new config entries can be added for both client and server, one for data receiving and one for sending. Each entry should be a list of the names of such processing modules. For example, when a client receives a piece of data from the server, the data is first processed by the modules listed in the receiving entry; when a client is ready to send a piece of data, the data is first processed by the modules listed in the sending entry. The same can be said for the server.
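A minimal, self-contained sketch of what this config-driven design could look like; the entry names (`outbound_processors`, `inbound_processors`) and the registry are illustrative, not Plato's actual configuration keys:

```python
import random

# Stand-in processors; real ones would implement LDP, compression, etc.
def add_noise(data):
    """Illustrative LDP-style processor: perturb each value."""
    return [x + random.gauss(0.0, 0.1) for x in data]

def scale_up(data):
    """Illustrative inbound processor, e.g. undoing quantization."""
    return [x * 2.0 for x in data]

REGISTRY = {"add_noise": add_noise, "scale_up": scale_up}

# Hypothetical config entries: one list for sending, one for receiving.
CONFIG = {
    "client": {
        "outbound_processors": ["add_noise"],  # applied before sending
        "inbound_processors": ["scale_up"],    # applied after receiving
    },
}

def run_pipeline(names, data):
    """Apply the named processors in the order listed in the config."""
    for name in names:
        data = REGISTRY[name](data)
    return data

# When a client is about to send a payload to the server:
payload = run_pipeline(CONFIG["client"]["outbound_processors"], [1.0, 2.0, 3.0])
```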
A serializer/deserializer …
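Assuming NumPy arrays as the wire format, a serializer/deserializer pair could be as simple as the following sketch (the function names are illustrative):

```python
import io
import numpy as np

def serialize(array: np.ndarray) -> bytes:
    """Encode a NumPy array as bytes for transfer over the network."""
    buffer = io.BytesIO()
    np.save(buffer, array, allow_pickle=False)
    return buffer.getvalue()

def deserialize(blob: bytes) -> np.ndarray:
    """Decode bytes received from the network back into a NumPy array."""
    return np.load(io.BytesIO(blob), allow_pickle=False)
```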
I like the design of the …
Individual clients can have different privacy requirements and might opt to set their own LDP parameter, i.e. epsilon. Is the performance of a federated learning model under varying privacy parameters a valid research question? Should we support a variable privacy parameter for individual clients, or a randomised privacy parameter distribution across clients?
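For instance, a randomised per-client privacy parameter could be as simple as drawing each client's epsilon from a distribution; this is just a sketch of the idea, with arbitrary example bounds:

```python
import numpy as np

# Each client draws its own epsilon, reflecting heterogeneous
# privacy requirements; the bounds here are arbitrary examples.
rng = np.random.default_rng()
num_clients = 100
client_epsilons = rng.uniform(low=0.5, high=4.0, size=num_clients)
```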
The fact is, even the performance of federated learning with the same set of parameters across all clients is not well understood.
I briefly studied the other two LDP methods; they seem to be applied to statistical data, e.g. counters and frequencies. I cannot determine whether they are suitable, or how they can be applied to the FL scenario.
Beyond randomized response (using unary encoding, which is already implemented in Plato), one can implement other, more mainstream mechanisms of differential privacy. These include (but are not limited to) the Laplace mechanism, the Gaussian mechanism, the exponential mechanism, and the sparse vector technique. More detailed descriptions of these differential privacy mechanisms can be found here: Programming Differential Privacy. These mechanisms should all be fairly easy to implement. For example, the Laplace mechanism involves simply calling the function … For more detailed coverage, you can also find a comparison between randomized response and the Laplace mechanism in this research paper: …
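For instance, a minimal sketch of the Laplace mechanism with NumPy (the function name and parameters are illustrative, not an existing Plato API):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon):
    """Add Laplace noise with scale sensitivity/epsilon, which satisfies
    epsilon-differential privacy for a query with the given L1 sensitivity."""
    scale = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=scale, size=np.shape(value))

# Privatize a feature vector, assuming values have been clipped
# so that the L1 sensitivity is 1.0.
features = np.array([0.2, -0.5, 0.9])
noisy_features = laplace_mechanism(features, sensitivity=1.0, epsilon=1.0)
```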
I am not sure about the applicability of the exponential mechanism and the sparse vector technique: the exponential mechanism is more suitable for queries with a discrete/finite result, e.g. actual inferencing, and the sparse vector technique seems to be aimed at finding a query whose result is above a threshold. Neither is exactly part of the process of feeding data into model training. Also, does applying clipping to features and model parameters make sense? I assume it is fine for features, but I am not sure about model parameters.
No, I agree that the exponential mechanism and the sparse vector technique do not make much sense. We actually don't apply the Gaussian or Laplace mechanisms to model parameters; we apply them to gradients during training, inside the training loop (where clipping is done as well). So some of your recent code needs to be redone. I am working on this in the 'gradient_dp' branch.
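For illustration only (this is not the code in the 'gradient_dp' branch): a simplified sketch of clipping and noising gradients inside the training loop. Note that proper DP-SGD clips per-sample gradients, whereas this sketch clips the aggregate batch gradient:

```python
import torch

def privatize_gradients(model, clip_norm, noise_std):
    """Clip the overall gradient norm, then add Gaussian noise.

    Call between loss.backward() and optimizer.step().
    """
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for param in model.parameters():
        if param.grad is not None:
            param.grad += torch.randn_like(param.grad) * noise_std
```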
Is your feature request related to a problem? Please describe.
Currently there is only one implementation of local differential privacy (LDP): RAPPOR [1], implemented in https://github.com/TL-System/plato/blob/main/plato/utils/unary_encoding.py (the core randomized response step is sketched after the list below), and it is not decoupled from the algorithm implementations:
plato/plato/algorithms/mistnet.py, lines 52 to 64 in fac44a6
plato/plato/algorithms/mindspore/mistnet.py, lines 44 to 48 in fac44a6
plato/examples/nnrt/nnrt_algorithms/mistnet.py, lines 60 to 65 in fac44a6
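For reference, the randomized response step underlying RAPPOR can be sketched as follows; this illustrates the technique and is not the actual contents of unary_encoding.py:

```python
import numpy as np

def randomized_response(bits, epsilon):
    """Keep each bit of a unary-encoded vector with probability
    e^epsilon / (1 + e^epsilon) and flip it otherwise, satisfying
    epsilon-LDP for each bit."""
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = np.random.random(bits.shape) < p_keep
    return np.where(keep, bits, 1 - bits)

encoded = np.array([0, 1, 0, 0, 1])  # a unary-encoded value
privatized = randomized_response(encoded, epsilon=2.0)
```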
This feature request calls for a modular LDP plugin interface and a number of other methods, e.g. [2][3].
Describe the solution you'd like
Unified data exchange format between clients and server.
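A minimal sketch of what such a unified format could look like, assuming NumPy as the common interchange representation (the helper name to_numpy is illustrative):

```python
import numpy as np

def to_numpy(tensor):
    """Convert a framework-specific tensor to a NumPy array before transfer,
    so LDP processors only ever need to handle one data format."""
    if isinstance(tensor, np.ndarray):
        return tensor
    if hasattr(tensor, "detach"):  # PyTorch tensors
        return tensor.detach().cpu().numpy()
    if hasattr(tensor, "numpy"):   # TensorFlow eager tensors
        return tensor.numpy()
    raise TypeError(f"Unsupported tensor type: {type(tensor)!r}")
```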
Describe alternatives you've considered
To be filled.
Additional context
[1] Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067. ACM, 2014.
[2] Differential Privacy Team, Apple. Learning with privacy at scale. 2017.
[3] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, December 2017.