About the server state h? #3

Closed
wizard1203 opened this issue Sep 12, 2022 · 1 comment

Comments

@wizard1203

Thanks for your awesome work. I'm trying to re-implement your algorithm, but when I read the source code I cannot find where the server state $h^t$ is computed.

If I understand correctly, in https://github.com/alpemreacar/FedDyn/blob/48a19fac440ef079ce563da8e0c2896f8256fef9/utils_methods.py#L389, local_param_list_curr is the local gradient $\nabla L_k(\theta_k^t)$ and cld_mdl_param_tensor is the global model parameter $\theta^{t-1}$.
In https://github.com/alpemreacar/FedDyn/blob/48a19fac440ef079ce563da8e0c2896f8256fef9/utils_methods.py#L397, cld_mdl_param is the new global model parameter $\theta^t$, and it seems that np.mean(local_param_list, axis=0) is $-\frac{1}{\alpha} h^t$.

Thus, the code implies $h^t = -\frac{\alpha}{m} \sum_{k \in [m]} \nabla L_k(\theta_k)$,
where I drop the superscript $t$ because the summation runs over all clients: due to random client selection, we cannot know at which round each $\nabla L_k(\theta_k)$ was last updated.

So the actual $h^t$ here is not strictly calculated as
$h^t = h^{t-1} - \frac{\alpha}{m} \sum_{k \in \mathcal{P}_t} \left( \theta_k^t - \theta^{t-1} \right)$.
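To make my reading concrete, here is a minimal NumPy sketch of what I think the two lines do (a hypothetical simplification, not the repo code: local training is faked with random perturbations, and the variable names only mimic the repo):

```python
import numpy as np

# Hypothetical sketch of my reading of utils_methods.py L389/L397.
rng = np.random.default_rng(0)
alpha, n_clnt, dim = 0.1, 10, 5

cld_mdl_param = rng.normal(size=dim)        # global model theta^{t-1}
local_param_list = np.zeros((n_clnt, dim))  # one state vector per client

for rnd in range(5):
    selected = rng.choice(n_clnt, size=4, replace=False)  # random client selection
    # Fake local solutions theta_k^t for the selected clients.
    clnt_params = cld_mdl_param + rng.normal(size=(len(selected), dim))
    # ~L389: only the SELECTED clients refresh their state this round.
    local_param_list[selected] += clnt_params - cld_mdl_param
    # ~L397: the mean runs over ALL clients, so an unselected client
    # contributes a state from whatever round it last participated in.
    cld_mdl_param = clnt_params.mean(axis=0) + np.mean(local_param_list, axis=0)
```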

@alpemreacar
Owner

Hi,

local_param_list corresponds to $-\frac{1}{\alpha}\nabla L_k(\theta_k^{t})$ as stated in L392.

When calculating the server model we simply average them, which gives $-\frac{1}{\alpha}\frac{1}{m}\sum_k \nabla L_k(\theta_k^{t})$, as expected.

When updating the individual terms we do not need $\alpha$ explicitly: since the local first-order condition gives $\nabla L_k(\theta_k^{t}) = \nabla L_k(\theta_k^{t-1}) - \alpha(\theta_k^t - \theta^{t-1})$, the scaled term updates as $-\frac{1}{\alpha}\nabla L_k(\theta_k^{t}) = -\frac{1}{\alpha}\nabla L_k(\theta_k^{t-1}) + (\theta_k^t - \theta^{t-1})$.

In short, it is just the same state scaled by $-\frac{1}{\alpha}$.
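As a quick numerical check (a hypothetical sketch with fake local solutions and full participation, not the repository code), keeping the scaled state and updating it with plain model differences reproduces the unscaled $h$ recursion exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_clients, dim = 0.1, 4, 5

theta_server = rng.normal(size=dim)   # theta^{t-1}
h = np.zeros((n_clients, dim))        # unscaled state h_k = grad L_k(theta_k)
s = np.zeros((n_clients, dim))        # scaled state s_k = -(1/alpha) * h_k

for _ in range(3):
    theta_locals = theta_server + rng.normal(size=(n_clients, dim))  # fake theta_k^t
    # Unscaled recursion: h_k^t = h_k^{t-1} - alpha * (theta_k^t - theta^{t-1}).
    h -= alpha * (theta_locals - theta_server)
    # Scaled recursion used in the code: s_k^t = s_k^{t-1} + (theta_k^t - theta^{t-1});
    # note that alpha does not appear in the per-client update.
    s += theta_locals - theta_server
    avg_theta = theta_locals.mean(axis=0)
    # Server model: theta^t = avg_k theta_k^t - (1/alpha) * mean_k h_k = avg + mean(s).
    assert np.allclose(avg_theta - h.mean(axis=0) / alpha,
                       avg_theta + s.mean(axis=0))
    theta_server = avg_theta + s.mean(axis=0)
```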

Feel free to follow up if something is unclear.
