
[FR] Local Differential Privacy Methods #119

Closed
3 of 5 tasks
hcngac opened this issue Nov 24, 2021 · 12 comments
Labels: enhancement (New feature or request)

Comments

hcngac (Contributor) commented Nov 24, 2021

Is your feature request related to a problem? Please describe.
Currently there is only one implementation of local differential privacy (LDP): RAPPOR [1], implemented in https://github.com/TL-System/plato/blob/main/plato/utils/unary_encoding.py, and it is not decoupled from the algorithm implementations. For example:

# PyTorch variant: features are extracted, unary-encoded and randomized in-line.
_randomize = getattr(self.trainer, "randomize", None)
for inputs, targets, *__ in data_loader:
    with torch.no_grad():
        logits = self.model.forward_to(inputs, cut_layer)
    if epsilon is not None:
        logits = logits.detach().numpy()
        logits = unary_encoding.encode(logits)
        if callable(_randomize):
            logits = self.trainer.randomize(logits, targets, epsilon)
        else:
            logits = unary_encoding.randomize(logits, epsilon)

# MindSpore variant: the same encode/randomize steps, duplicated for a different tensor type.
if epsilon is not None:
    logits = logits.asnumpy()
    logits = unary_encoding.encode(logits)
    logits = unary_encoding.randomize(logits, epsilon)
    logits = mindspore.Tensor(logits.astype('float32'))

# A third framework-specific variant of the same encode/randomize logic.
if epsilon is not None:
    logits = unary_encoding.encode(logits)
    if callable(_randomize):
        logits = self.trainer.randomize(logits, targets, epsilon)
    else:
        logits = unary_encoding.randomize(logits, epsilon)

This feature request calls for a modular LDP plugin interface and implementations of a number of other LDP methods, e.g. [2][3].

Describe the solution you'd like

  • Unified data exchange format between clients and server.
  • A modular interface for plugging in data processing modules into the server-client data exchange.
  • A config entry for enabling specific data processing modules.
  • Implementations of LDP modules.
  • Tests of the theoretical properties of the modules, i.e. the ε-LDP guarantee (see the sketch after this list).
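
For the last item, a minimal sketch of what such a test could look like, using plain binary randomized response rather than Plato's unary_encoding API (whose exact signatures are not relied on here): it empirically estimates the worst-case output probability ratio between two inputs and compares it against e^ε.

import numpy as np

def randomized_response(bit, epsilon, rng):
    # Keep the true bit with probability e^eps / (1 + e^eps), otherwise flip it.
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if rng.random() < p_keep else 1 - bit

def empirical_ldp_ratio(epsilon, trials=200_000, seed=1):
    # Estimate max over outputs y of P(y | x=1) / P(y | x=0); eps-LDP requires this <= e^eps.
    rng = np.random.default_rng(seed)
    out1 = np.array([randomized_response(1, epsilon, rng) for _ in range(trials)])
    out0 = np.array([randomized_response(0, epsilon, rng) for _ in range(trials)])
    worst = 0.0
    for y in (0, 1):
        p1, p0 = (out1 == y).mean(), (out0 == y).mean()
        worst = max(worst, p1 / p0, p0 / p1)
    return worst

epsilon = 1.0
print(empirical_ldp_ratio(epsilon), np.exp(epsilon))  # both should be close: the exact worst-case ratio is e^eps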

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
To be filled.

Additional context
Add any other context or screenshots about the feature request here.
[1] Ú. Erlingsson, V. Pihur, and A. Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067. ACM, 2014.
[2] Differential Privacy Team, Apple. Learning with privacy at scale. 2017.
[3] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, December 2017.

hcngac added the enhancement (New feature or request) label Nov 24, 2021
hcngac (Contributor, Author) commented Nov 24, 2021

To combine with adding model/feature compression methods later, I also propose implementing this as a middleman or preprocessor interface that can accept a number of data processing modules applied before data is sent between clients and the server. It also calls for unifying the data transfer format between clients and the server, so that we don't have to implement the same LDP algorithm for each framework's data format, e.g. PyTorch/TensorFlow/NumPy.

baochunli (Collaborator) commented

Challenging to do but makes sense.

hcngac (Contributor, Author) commented Nov 26, 2021

Investigating the data being transferred, we can standardise two data formats:

  • features, in the form of a list of numpy arrays, with each numpy array representing a feature extracted from an input;
  • model parameters, in the form of an ordered dict mapping each layer name to a numpy array of that layer's parameters.

The Algorithm class should be responsible for converting between the framework-specific format and this standard format, in the methods extract_weights, load_weights and extract_features. Loading features on the server can be implemented in the feature.DataSource class.
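
As an illustration of the two standard formats (the layer names, shapes, and the extract_weights sketch below are placeholders, not anything prescribed by Plato):

from collections import OrderedDict
import numpy as np

# Format 1: features, as a list of numpy arrays, one array per feature extracted from an input.
features = [np.random.randn(256).astype(np.float32) for _ in range(32)]

# Format 2: model parameters, as an ordered dict mapping layer name -> numpy array of that layer's parameters.
weights = OrderedDict([
    ("conv1.weight", np.zeros((16, 3, 3, 3), dtype=np.float32)),
    ("conv1.bias", np.zeros(16, dtype=np.float32)),
    ("fc.weight", np.zeros((10, 256), dtype=np.float32)),
])

def extract_weights(model):
    # Hypothetical framework-to-standard conversion for a PyTorch model (the Algorithm class's job).
    return OrderedDict((name, param.detach().cpu().numpy()) for name, param in model.state_dict().items())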

hcngac (Contributor, Author) commented Nov 26, 2021

A new class called DataProcessor is proposed. Each DataProcessor class should have a process method that processes the data.
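
A minimal sketch of what that interface could look like (only the class name DataProcessor and the process method come from the proposal above; everything else is an illustrative assumption):

from abc import ABC, abstractmethod
from typing import Any

import numpy as np

class DataProcessor(ABC):
    """A module that transforms data exchanged between clients and the server."""

    @abstractmethod
    def process(self, data: Any) -> Any:
        """Return the processed data (e.g. randomized, compressed, or serialized)."""

class LaplaceNoiseProcessor(DataProcessor):
    """Illustrative LDP processor: add Laplace noise to a list of numpy feature arrays."""

    def __init__(self, epsilon: float, sensitivity: float = 1.0):
        self.scale = sensitivity / epsilon

    def process(self, data):
        return [x + np.random.laplace(0.0, self.scale, x.shape) for x in data]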

Two new config entries can be added for both the client and the server, one for receiving data and one for sending data. Each entry should be a list of DataProcessor class names, in the order the processors are applied.

For example, when a client receives a piece of data from the server, the data is first processed by the listed DataProcessors, in the order given in the config, and then passed on to the client's remaining handling.

Likewise, when a client is ready to send a piece of data, the data is first processed by the listed DataProcessors, in the order given in the config, and then sent to the server.

The same applies to the server; a sketch of the config entries and the processing chain follows.
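
A sketch of how the config entries and the processing order could fit together (the key names and the registry are assumptions, not Plato's actual config schema):

# Hypothetical config: one list per direction, applied in order.
client_config = {
    "outbound_processors": ["laplace_noise", "serializer"],
    "inbound_processors": ["deserializer"],
}

# name -> DataProcessor factory; assumed to be populated at startup.
registry = {}

def apply_processors(names, data):
    # Apply the configured DataProcessors to the data, in the listed order.
    for name in names:
        data = registry[name]().process(data)
    return data

# On send:    payload = apply_processors(client_config["outbound_processors"], payload)
# On receive: payload = apply_processors(client_config["inbound_processors"], payload)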

hcngac (Contributor, Author) commented Nov 26, 2021

A serializer/deserializer DataProcessor can also be mandated as the last processor on the sending side and the first on the receiving side, to support transfer encodings other than Python pickle.
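
Under the DataProcessor interface sketched above, a pickle-based serializer/deserializer pair (kept as the default transfer encoding) could look like this:

import pickle

class PickleSerializer(DataProcessor):
    """Mandated last processor on the sending side: turn the payload into bytes."""

    def process(self, data):
        return pickle.dumps(data)

class PickleDeserializer(DataProcessor):
    """Mandated first processor on the receiving side: restore the payload from bytes."""

    def process(self, data):
        return pickle.loads(data)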

baochunli (Collaborator) commented Nov 26, 2021

I like the design of the DataProcessor class. For now, however, I think we should continue to use pickle for data transfers, as this is challenging to replace -- it would require changing a lot of existing data transfer code that is field tested.

hcngac (Contributor, Author) commented Nov 29, 2021

Individual clients can have different privacy requirements and might opt to set their own LDP parameter, i.e. epsilon. Is the performance of a federated learning model under varying privacy parameters a valid research question?

Should we support per-client privacy parameters, or a randomised distribution of privacy parameters across clients?

baochunli (Collaborator) commented

Fact is, even the performance of federated learning with the same set of parameters across all clients is not well understood.

hcngac (Contributor, Author) commented Dec 1, 2021

I briefly studied the other two LDP methods [2][3]; they seem to be designed for statistical data, e.g. counters and frequencies.

I cannot determine whether they are suitable for the FL scenario, or how they could be applied to it.

baochunli (Collaborator) commented

Beyond randomized response (using unary encoding, which is already implemented in Plato), one can implement other, more mainstream mechanisms of differential privacy. These include (but are not limited to) the Laplace mechanism, the Gaussian mechanism, the exponential mechanism, and the sparse vector technique. More detailed descriptions of these differential privacy mechanisms can be found here:

Programming Differential Privacy

These mechanisms should all be fairly easy to implement. For example, the Laplace mechanism simply involves calling np.random.laplace() from NumPy.
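
For instance, a minimal sketch of the Laplace mechanism applied to a numpy array (the clipping range and sensitivity below are assumptions that would have to be justified for the actual features being released):

import numpy as np

def laplace_mechanism(values: np.ndarray, epsilon: float, sensitivity: float) -> np.ndarray:
    # Add Laplace noise with scale sensitivity / epsilon to every entry.
    return values + np.random.laplace(loc=0.0, scale=sensitivity / epsilon, size=values.shape)

features = np.random.randn(4, 8)
clipped = np.clip(features, -1.0, 1.0)  # clipping bounds each entry's contribution
noisy = laplace_mechanism(clipped, epsilon=1.0, sensitivity=2.0)  # entries range over [-1, 1]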

For more detailed coverage, you can also find a comparison between randomized response and the Laplace mechanism in this research paper:

http://ceur-ws.org/Vol-1558/paper35.pdf

hcngac (Contributor, Author) commented Dec 9, 2021

I am not sure about the applicability of the exponential mechanism and the sparse vector technique: the exponential mechanism is better suited to queries with a discrete/finite answer set, e.g. actual inference, and the sparse vector technique seems to target finding queries whose results exceed a threshold. Neither is exactly a step in feeding data into model training.

Also, does applying clipping to features and model parameters make sense? I assume it is fine for features, but I am not sure about model parameters.

baochunli (Collaborator) commented

No, I agree that the exponential mechanism and the sparse vector technique do not make much sense here.

We actually don't apply the Gaussian or Laplace mechanisms to model parameters; we apply them to gradients during training, inside the training loop (where clipping is done as well). So some of your recent code needs to be redone. I am working on this in the 'gradient_dp' branch.
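
As a rough sketch of that approach in PyTorch (not the actual gradient_dp implementation), a clip-then-noise step inside the training loop could look like this; note that a rigorous DP-SGD treatment clips per-sample gradients before averaging, whereas this only clips the batch gradient:

import torch

def clip_and_noise_gradients(model, max_norm: float, noise_multiplier: float):
    # Clip the overall gradient norm, then add Gaussian noise scaled to the clipping bound.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    for param in model.parameters():
        if param.grad is not None:
            param.grad.add_(noise_multiplier * max_norm * torch.randn_like(param.grad))

# Inside the training loop (sketch):
#     loss.backward()
#     clip_and_noise_gradients(model, max_norm=1.0, noise_multiplier=1.1)
#     optimizer.step()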
