
Conversation

@jewelltaylor
Contributor

PR Type

Feature

Short Description

This PR adds DP-SCAFFOLD, a variant of the SCAFFOLD method with instance-level differential privacy guarantees against the server or a third party with access to the final model. As part of this, I also extended SCAFFOLD (client and server) to include an option for warm initialization of the control variates, to stay consistent with the DP-SCAFFOLD paper and its official implementation. In both cases, with or without warm initialization, DP-SCAFFOLD offers the same privacy guarantees as DP-FedAvg. For details of the privacy analysis, refer to Section 4 of the paper and Section B of the supplementary materials.
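
For readers less familiar with SCAFFOLD, here is a minimal sketch of the control-variate-corrected local step and of the zero vs. warm initialization option. The function names and the exact warm-start rule are illustrative assumptions, not the FL4Health API.

```python
import torch


def init_control_variates(model, loader, criterion, warm_start=False):
    # Cold start: zeros. Warm start (roughly in the spirit of the DP-SCAFFOLD
    # reference code): a gradient estimate computed on one local batch.
    if not warm_start:
        return [torch.zeros_like(p) for p in model.parameters()]
    x, y = next(iter(loader))
    model.zero_grad()
    criterion(model(x), y).backward()
    return [p.grad.detach().clone() for p in model.parameters()]


def scaffold_local_step(model, optimizer, batch, criterion, c_server, c_client):
    # One SCAFFOLD-corrected local step: the local gradient is shifted by
    # (c - c_i) before the optimizer update.
    x, y = batch
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    with torch.no_grad():
        for p, c, c_i in zip(model.parameters(), c_server, c_client):
            if p.grad is not None:
                p.grad.add_(c - c_i)
    optimizer.step()
```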

I also created an instance-level privacy client that takes care of the Opacus setup under the hood. Right now, computing the privacy loss requires manually specifying the number of samples per client and the total dataset size. I have added a ticket to leverage the functionality we have already built to fetch client sample counts automatically.
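
For context, the under-the-hood setup is roughly the standard Opacus wrapping shown below; the function name, parameter names, and default values are placeholders rather than the exact client configuration in this repo.

```python
from opacus import PrivacyEngine


def make_model_private(model, optimizer, train_loader, noise_multiplier=1.0, clipping_bound=1.0):
    # Wrap the model, optimizer, and loader so that per-sample gradients are
    # clipped to `clipping_bound` and Gaussian noise scaled by
    # `noise_multiplier` is added, giving instance-level DP on local updates.
    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=noise_multiplier,
        max_grad_norm=clipping_bound,
    )
    return model, optimizer, train_loader, privacy_engine


# The accumulated privacy loss for a chosen delta can then be read off the
# engine's accountant, e.g. epsilon = privacy_engine.get_epsilon(delta=1e-5).
```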

Tests Added

  • A few tests for ScaffoldClient, DPScaffoldClient, Scaffold Strategy and InstanceLevelPrivacyClient methods

@emersodb
Collaborator

I have what is probably a silly question, because I haven't read the DP-SCAFFOLD paper in a lot of detail. Do they also use Opacus to do their DP optimization? If so, that's great; I just want to make sure our implementation matches theirs. Computing the control variates using Opacus is a bit cloudy for me, in the sense that I understand what's stored in the parameter grads after a single batch backward pass, but is the same accumulation stored in them after Opacus computes per-sample gradients for all items in a batch? I would say probably, but I'm not 100% sure.

@jewelltaylor
Contributor Author

They do not use Opacus; they implement the DP by hand: https://github.com/maxencenoble/Differential-Privacy-for-Heterogeneous-Federated-Learning/blob/ecad8acb687b974ee917c2cb27515e913ace4d47/flearn/users/user_avg.py#L103. I opted to use Opacus to stay consistent with our existing implementation. My assumption is that Opacus accumulates across the per-sample gradients, and the only difference from a regular optimizer is that per-sample clipping and noise are applied. I just read through a Medium article put out by Opacus with some additional insight into what they are doing under the hood. What do you think?
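
If it helps, the behaviour in question can be poked at directly: after a backward pass through an Opacus-wrapped model, each parameter carries per-sample gradients in `grad_sample`, and the DP optimizer's `step()` is where clipping, noising, and aggregation into the usual `grad` happen. A rough, self-contained check (illustrative only, not code from this PR):

```python
import torch
from torch import nn
from opacus import PrivacyEngine

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(torch.randn(32, 10), torch.randn(32, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

x, y = next(iter(loader))
nn.functional.mse_loss(model(x), y).backward()

for p in model.parameters():
    # One gradient per sample in the (Poisson-sampled) batch.
    print(p.grad_sample.shape)

# Clipping, noising, and averaging into p.grad happen inside the DPOptimizer's
# step(), so code that reads p.grad after the step sees the privatized,
# batch-aggregated gradient.
optimizer.step()
```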

@emersodb
Collaborator

If you're confident in it, I'm good with that! Implementing our own would not be fun, and Opacus allows for the enforcement of a lot of things, including replacing layers that do not admit DP, like batch norms. I just wanted to double-check that we'd thought it through 🙂
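
For reference, the batch-norm replacement mentioned above is handled by Opacus's `ModuleValidator`; a minimal example (the torchvision model is just a stand-in for whatever architecture a client uses):

```python
from opacus.validators import ModuleValidator
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # contains BatchNorm layers, which Opacus rejects

# List modules that are incompatible with per-sample gradient computation / DP.
print(ModuleValidator.validate(model, strict=False))

# Swap unsupported layers for DP-friendly equivalents (e.g. BatchNorm -> GroupNorm).
model = ModuleValidator.fix(model)
assert not ModuleValidator.validate(model, strict=False)
```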

@jewelltaylor
Contributor Author

Yeah, I am pretty confident! Just to be sure, I added (and took on) a small ticket in the backlog to explore this further. I thought it would be better to explore outside of this PR, because being absolutely sure will involve a bit of an Opacus deep dive.

@emersodb self-requested a review on August 15, 2023 17:28
Collaborator

@emersodb left a comment

All the changes look good to me. The tests were a great addition, along with some of the modularity improvements.

@jewelltaylor merged commit 2382cf5 into main on August 15, 2023
@jewelltaylor deleted the dp-scaffold branch on August 15, 2023 17:32