
Conversation

@jewelltaylor
Contributor

PR Type

Feature

Short Description

This PR adds DP-SCAFFOLD, a variant of the SCAFFOLD method with instance-level differential privacy guarantees against the server or a third party with access to the final model. As part of this, I also extended SCAFFOLD (client and server) to include an option for warm initialization of the control variates, to stay consistent with the DP-SCAFFOLD paper and its official implementation. In both cases, with or without warm initialization, DP-SCAFFOLD offers the same privacy guarantees as DP-FedAvg. For details of the privacy analysis, refer to Section 4 of the paper and Section B of the supplementary materials.
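
For readers less familiar with SCAFFOLD, here is a minimal sketch of the control-variate-corrected local step and of the zero vs. warm initialization option. The function names and the exact warm-start rule are illustrative assumptions, not the FL4Health API.

```python
import torch


def init_control_variates(model, loader, criterion, warm_start=False):
    # Cold start: zeros. Warm start (roughly in the spirit of the DP-SCAFFOLD
    # reference code): a gradient estimate computed on one local batch.
    if not warm_start:
        return [torch.zeros_like(p) for p in model.parameters()]
    x, y = next(iter(loader))
    model.zero_grad()
    criterion(model(x), y).backward()
    return [p.grad.detach().clone() for p in model.parameters()]


def scaffold_local_step(model, optimizer, batch, criterion, c_server, c_client):
    # One SCAFFOLD-corrected local step: the local gradient is shifted by
    # (c - c_i) before the optimizer update.
    x, y = batch
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    with torch.no_grad():
        for p, c, c_i in zip(model.parameters(), c_server, c_client):
            if p.grad is not None:
                p.grad.add_(c - c_i)
    optimizer.step()
```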

I also created an instance-level privacy client that takes care of the Opacus setup under the hood. Right now, computing the privacy loss requires manually specifying the number of samples per client and the total dataset size. I have added a ticket to leverage the functionality we have already built to fetch client sample counts automatically.
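
For context, the under-the-hood setup is roughly the standard Opacus wrapping shown below; the function name, parameter names, and default values are placeholders rather than the exact client configuration in this repo.

```python
from opacus import PrivacyEngine


def make_model_private(model, optimizer, train_loader, noise_multiplier=1.0, clipping_bound=1.0):
    # Wrap the model, optimizer, and loader so that per-sample gradients are
    # clipped to `clipping_bound` and Gaussian noise scaled by
    # `noise_multiplier` is added, giving instance-level DP on local updates.
    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=noise_multiplier,
        max_grad_norm=clipping_bound,
    )
    return model, optimizer, train_loader, privacy_engine


# The accumulated privacy loss for a chosen delta can then be read off the
# engine's accountant, e.g. epsilon = privacy_engine.get_epsilon(delta=1e-5).
```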

Tests Added

  • A few tests for ScaffoldClient, DPScaffoldClient, Scaffold Strategy and InstanceLevelPrivacyClient methods

@emersodb
Collaborator

I have what is probably a silly question, because I haven't read the DP-SCAFFOLD paper in a lot of detail. Do they also use Opacus to do their DP optimization? If so, that's great; I just want to make sure our implementation matches theirs. Computing the control variates using Opacus is a bit cloudy for me, in the sense that I understand what's stored in the parameter grads after a single batch backward pass, but is the same accumulation stored in them after Opacus computes per-sample gradients for all items in a batch? I would say probably, but I'm not 100% sure.

@jewelltaylor
Contributor Author

They do not use Opacus; they implement the DP by hand: https://github.com/maxencenoble/Differential-Privacy-for-Heterogeneous-Federated-Learning/blob/ecad8acb687b974ee917c2cb27515e913ace4d47/flearn/users/user_avg.py#L103. I opted to use Opacus to stay consistent with our existing implementation. My assumption is that Opacus accumulates across the per-sample gradients, and the only difference from a regular optimizer is that per-sample clipping and noise are applied. I just read through a Medium article put out by Opacus with some additional insight into what they are doing under the hood. What do you think?
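
If it helps, the behaviour in question can be poked at directly: after a backward pass through an Opacus-wrapped model, each parameter carries per-sample gradients in `grad_sample`, and the DP optimizer's `step()` is where clipping, noising, and aggregation into the usual `grad` happen. A rough, self-contained check (illustrative only, not code from this PR):

```python
import torch
from torch import nn
from opacus import PrivacyEngine

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(torch.randn(32, 10), torch.randn(32, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

x, y = next(iter(loader))
nn.functional.mse_loss(model(x), y).backward()

for p in model.parameters():
    # One gradient per sample in the (Poisson-sampled) batch.
    print(p.grad_sample.shape)

# Clipping, noising, and averaging into p.grad happen inside the DPOptimizer's
# step(), so code that reads p.grad after the step sees the privatized,
# batch-aggregated gradient.
optimizer.step()
```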

@emersodb
Collaborator

If you're confident in it, I'm good with that! Implementing our own would not be fun, and Opacus allows for the enforcement of a lot of things, including replacing layers that do not admit DP, like batch norms. I just wanted to double-check that we'd thought it through 🙂
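
For reference, the batch-norm replacement mentioned above is handled by Opacus's `ModuleValidator`; a minimal example (the torchvision model is just a stand-in for whatever architecture a client uses):

```python
from opacus.validators import ModuleValidator
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # contains BatchNorm layers, which Opacus rejects

# List modules that are incompatible with per-sample gradient computation / DP.
print(ModuleValidator.validate(model, strict=False))

# Swap unsupported layers for DP-friendly equivalents (e.g. BatchNorm -> GroupNorm).
model = ModuleValidator.fix(model)
assert not ModuleValidator.validate(model, strict=False)
```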

@jewelltaylor
Contributor Author

Yeah, I am pretty confident! Just to be sure, I added (and took on) a small ticket in the backlog to explore this further. I thought it would be better to explore outside of this PR, because being absolutely sure will involve a bit of an Opacus deep dive.

@emersodb self-requested a review on August 15, 2023 17:28
Collaborator

@emersodb left a comment

All the changes look good to me. The tests were a great addition, along with some of the modularity improvements.

@jewelltaylor merged commit 2382cf5 into main on August 15, 2023
@jewelltaylor deleted the dp-scaffold branch on August 15, 2023 17:32