(New data partition strategy) Extended Dirichlet strategy by combining Pathological heterogeneous setting and Practical heterogeneous setting in pFL. #139

Closed
liyipeng00 opened this issue Nov 3, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@liyipeng00

liyipeng00 commented Nov 3, 2023

Recently, I came across a new data partition strategy called the Extended Dirichlet strategy (it is ours :)), which could be added to this repo.

It combines the two common partition strategies (i.e., Quantity-based class imbalance and Distribution-based class imbalance in Li et al. (2022), or the Pathological heterogeneous setting and the Practical heterogeneous setting in Zhang et al. (2023)) to generate arbitrarily heterogeneous data. The difference from the usual Dirichlet partition is an added step that allocates classes (labels) to clients, which fixes the number of classes per client (denoted by $C$), before the samples are allocated via a Dirichlet distribution (with concentration parameter $\alpha$).
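For readers who want to see the two steps concretely, here is a minimal sketch in plain Python. It is not the implementation referenced in this issue. The function names `dirichlet` and `exdir_partition`, the round-robin class allocation, and the use of normalized Gamma draws from the standard library to sample the Dirichlet proportions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def dirichlet(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension n
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / n] * n
    return [d / total for d in draws]


def exdir_partition(labels, num_clients, num_classes, C, alpha, seed=0):
    """Hypothetical sketch: returns (classes per client, sample indices per client)."""
    assert 1 <= C <= num_classes, "each client needs C distinct classes"
    assert C * num_clients >= num_classes, "every class needs at least one client"
    rng = random.Random(seed)

    # Step 1 (assumed scheme): give each client C distinct classes by walking
    # round-robin over a shuffled class order, so every class gets a holder.
    order = list(range(num_classes))
    rng.shuffle(order)
    client_classes = [
        [order[(i * C + j) % num_classes] for j in range(C)]
        for i in range(num_clients)
    ]

    # Group sample indices by their label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Step 2: for each class, split its samples among the clients that hold it,
    # using proportions drawn from Dirichlet(alpha).
    client_indices = [[] for _ in range(num_clients)]
    for k in range(num_classes):
        holders = [i for i in range(num_clients) if k in client_classes[i]]
        idxs = by_class[k]
        rng.shuffle(idxs)
        props = dirichlet(alpha, len(holders), rng)
        start, cum = 0, 0.0
        for pos, client in enumerate(holders):
            cum += props[pos]
            if pos == len(holders) - 1:
                end = len(idxs)  # last holder takes the remainder
            else:
                end = min(int(cum * len(idxs)), len(idxs))
            client_indices[client].extend(idxs[start:end])
            start = end
    return client_classes, client_indices


# Toy usage: 1000 samples with balanced labels 0..9, 10 clients, C=5, alpha=1.0.
toy_labels = [i % 10 for i in range(1000)]
classes, indices = exdir_partition(toy_labels, 10, 10, C=5, alpha=1.0)
```

In this sketch, setting $C$ equal to the number of classes reduces step 1 to a no-op and leaves a plain Dirichlet split, while a large $\alpha$ with small $C$ approaches an even split over a fixed number of classes per client.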

This issue was first raised in FedLab. The implementation is in convergence. You can find more details in "Convergence Analysis of Sequential Federated Learning on Heterogeneous Data".
[Figure:
Row 1: $C=2$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$;
Row 2: $C=5$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$;
Row 3: $C=10$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$; ]

Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.

Zhang, J., Hua, Y., Wang, H., Song, T., Xue, Z., Ma, R., & Guan, H. (2023, June). FedALA: Adaptive local aggregation for personalized federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 9, pp. 11237-11244).

@TsingZ0
Owner

TsingZ0 commented Nov 4, 2023

You can contribute to our project by submitting a pull request that adds the Extended Dirichlet strategy. We may add it when we have free time.

@liyipeng00
Author

liyipeng00 commented Nov 4, 2023

Thanks for your approval. I'm happy to contribute to this repo. Since I'm not familiar with how to submit pull requests, it may take some time. By the way, we found that the first implementation of Dir-Partition comes from "Bayesian nonparametric federated learning of neural networks", which could be clarified in the README.md.

@liyipeng00
Author

liyipeng00 commented Nov 6, 2023

^o^/, I have added ExDir successfully. I only added new code and did not change the existing code, so it is safe to add this strategy to the original code.

One example: MNIST, num_clients=10, num_classes=10, C=5 and alpha=100.0

Note that here we set min_require_size_per_label = max(C * num_clients // num_classes // 5, 1), so you can expect some clients to end up with only 4 labels (fewer than 5). You can set it larger to meet your requirements, which may increase the search time in some cases.
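As a quick check of the arithmetic, the snippet below evaluates the formula for the example settings. The second function is only one plausible reading of the "search" mentioned above, namely redrawing the random class allocation until every label is held by at least that many clients. That reading, and the names `search_class_allocation` and `max_tries`, are assumptions for illustration and are not taken from the pull request.

```python
import random


def min_require_size_per_label(C, num_clients, num_classes):
    # Formula quoted from the comment above (integer division throughout).
    return max(C * num_clients // num_classes // 5, 1)


def search_class_allocation(C, num_clients, num_classes, threshold,
                            seed=0, max_tries=10000):
    """Hypothetical sketch: redraw C random classes per client until every
    class is held by at least `threshold` clients."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        alloc = [rng.sample(range(num_classes), C) for _ in range(num_clients)]
        holders = [0] * num_classes
        for classes in alloc:
            for k in classes:
                holders[k] += 1
        if min(holders) >= threshold:
            return alloc
    raise RuntimeError("no allocation found; try a smaller threshold")


# Example settings from this comment: C=5, num_clients=10, num_classes=10.
threshold = min_require_size_per_label(5, 10, 10)
print(threshold)  # 1
allocation = search_class_allocation(5, 10, 10, threshold)
```

Under this reading, a larger threshold rejects more random draws before one is accepted, which would be consistent with the longer search time mentioned above.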

image

@TsingZ0
Owner

TsingZ0 commented Nov 12, 2023

Nice work! We will review it several weeks later, after the CVPR deadline.

@liyipeng00
Author

Best of luck with your CVPR paper!

@TsingZ0
Owner

TsingZ0 commented Apr 18, 2024

Sorry for the late reply; my schedule has been busy, and I have only recently had time to check PRs. Since PFLlib has moved forward with massive changes, your original PR can no longer be merged directly. Could you please update your PR to match the latest version? Thanks for your time!

@TsingZ0 TsingZ0 added the enhancement New feature or request label Apr 18, 2024
@liyipeng00
Author

liyipeng00 commented Apr 19, 2024

Thanks for your approval. I have updated the pull request, with the Extended Dirichlet strategy added. Feel free to change the code to match the style of PFLlib, and just let me know if any issues appear.

python generate_MNIST.py noniid - exdir

I would be very grateful if you could add a few sentences introducing exdir in the README.md.

One simple example

This strategy combines the popular Dirichlet-based data partition strategy with Quantity-based class imbalance.

Thanks for your approval again.

@TsingZ0
Owner

TsingZ0 commented Apr 19, 2024

Thank you for the update. I'll check it as soon as possible.

@TsingZ0
Owner

TsingZ0 commented Apr 23, 2024

All done, please check it.

@liyipeng00
Author

Thanks for your patience and kindness. I have checked it and have no further problems.

@TsingZ0 TsingZ0 closed this as completed Apr 25, 2024