(New data partition strategy) Extended Dirichlet strategy by combining Pathological heterogeneous setting and Practical heterogeneous setting in pFL. #139

liyipeng00 · 2023-11-03T06:35:46Z

I'd like to suggest a new data partition strategy, the Extended Dirichlet strategy (ours :)), which could be added to this repo.

It combines the two common partition strategies (i.e., quantity-based class imbalance and distribution-based class imbalance in Li et al. (2022), or the pathological heterogeneous setting and the practical heterogeneous setting in Zhang et al. (2023)) to generate arbitrarily heterogeneous data. The difference is an added step that allocates classes (labels) to clients, which determines the number of classes per client (denoted by $C$), before samples are allocated via a Dirichlet distribution (with concentration parameter $\alpha$).
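The two steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the code from the pull request: the function name `exdir_partition`, its arguments, and the "resample until every class has a holder" rule are assumptions made for the example.

```python
import numpy as np


def exdir_partition(labels, num_clients, C, alpha, seed=0):
    """Hypothetical helper: return one array of sample indices per client."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    num_classes = len(classes)
    if C > num_classes or C * num_clients < num_classes:
        raise ValueError("need C <= num_classes <= C * num_clients")

    # Step 1: allocate C distinct classes to each client. Resample until
    # every class is held by at least one client (assumed stopping rule).
    while True:
        client_classes = [rng.choice(num_classes, size=C, replace=False)
                          for _ in range(num_clients)]
        holders = [[i for i in range(num_clients) if k in client_classes[i]]
                   for k in range(num_classes)]
        if all(len(h) > 0 for h in holders):
            break

    # Step 2: for each class, split its samples among the clients holding it,
    # using proportions drawn from a Dirichlet distribution with parameter alpha.
    client_indices = [[] for _ in range(num_clients)]
    for k in range(num_classes):
        idx_k = np.flatnonzero(labels == classes[k])
        rng.shuffle(idx_k)
        proportions = rng.dirichlet(np.full(len(holders[k]), alpha))
        cuts = (np.cumsum(proportions)[:-1] * len(idx_k)).astype(int)
        for client, part in zip(holders[k], np.split(idx_k, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]


# Toy usage: 10 classes with 100 samples each, 10 clients, C=5, alpha=1.0.
toy_labels = np.repeat(np.arange(10), 100)
parts = exdir_partition(toy_labels, num_clients=10, C=5, alpha=1.0)
```

In this sketch a smaller `alpha` makes the per-class split more uneven, while `C` caps how many distinct labels any client can receive. A client may end up with fewer than `C` labels if its Dirichlet share of some class rounds down to zero samples.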

This issue was originally raised in FedLab. The implementation is in convergence. You can find more details in the paper "Convergence Analysis of Sequential Federated Learning on Heterogeneous Data".
[Figure:
Row 1: $C=2$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$;
Row 2: $C=5$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$;
Row 3: $C=10$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$; ]

Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.

Zhang, J., Hua, Y., Wang, H., Song, T., Xue, Z., Ma, R., & Guan, H. (2023, June). FedALA: Adaptive local aggregation for personalized federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 9, pp. 11237-11244).

TsingZ0 · 2023-11-04T11:50:38Z

You can contribute to our project by submitting a pull request that adds the Extended Dirichlet strategy. We may add it when we have free time.

liyipeng00 · 2023-11-04T12:54:43Z

Thanks for your approval. I'm happy to contribute to this repo. Since I'm not familiar with how to submit pull requests, it may take some time. By the way, we found that the first implementation of Dir-Partition comes from "Bayesian nonparametric federated learning of neural networks", which could be clarified in the README.md.

liyipeng00 · 2023-11-06T07:43:35Z

^o^/, I have added ExDir successfully. I have only added some code without changing the existing logic, so it is safe to add this strategy to the original code.

One example: MNIST, num_clients=10, num_classes=10, C=5 and alpha=100.0

Note that here we set min_require_size_per_label = max(C * num_clients // num_classes // 5, 1), so some clients can be expected to have only 4 labels (fewer than 5). You can set it larger to meet your requirements, which may increase the search time in some cases.
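For the MNIST example above, the threshold formula can be evaluated directly. The snippet below only reproduces the arithmetic of the expression quoted in this comment. The function wrapper and the `divisor` parameter name are assumptions for illustration, not part of the PR.

```python
def min_require_size_per_label(C, num_clients, num_classes, divisor=5):
    # Same expression as in the comment; `divisor` is a hypothetical name
    # for the constant 5, exposed here so the effect of changing it is visible.
    return max(C * num_clients // num_classes // divisor, 1)


# MNIST example from this thread: num_clients=10, num_classes=10, C=5
threshold = min_require_size_per_label(C=5, num_clients=10, num_classes=10)
print(threshold)  # 1

# A smaller divisor gives a larger (stricter) threshold
stricter = min_require_size_per_label(C=5, num_clients=10, num_classes=10, divisor=1)
print(stricter)  # 5
```

With the default constant the threshold is 1 for this setting. Raising it, as suggested above, makes the requirement stricter and may lengthen the search.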

TsingZ0 · 2023-11-12T05:38:03Z

Nice work! We will review it in a few weeks, after the CVPR deadline.

liyipeng00 · 2023-11-12T08:48:53Z

Best of luck with your CVPR paper!

TsingZ0 · 2024-04-18T09:35:20Z

Sorry for the late reply; my schedule has been busy, and I have only recently had time to check PRs. Since PFLlib has moved forward with massive changes, your original PR cannot be merged directly. Could you please update your PR to match the latest version? Thanks for your time!

liyipeng00 · 2024-04-19T07:39:43Z

Thanks for your approval. I have updated the pull request to add the Extended Dirichlet strategy. Feel free to change the code to match PFLlib's style, and just let me know if any issues come up.

python generate_MNIST.py noniid - exdir

I would be very grateful if you could add a few sentences introducing exdir to the README.md.

One simple example

This strategy combines the popular Dirichlet-based data partition strategy with Quantity-based class imbalance.

Thanks again for your approval.

TsingZ0 · 2024-04-19T09:06:19Z

Thank you for your update, I'll check it as soon as possible.

TsingZ0 · 2024-04-23T08:17:28Z

All done, please check it.

liyipeng00 · 2024-04-24T03:13:44Z

Thanks for your patience and kindness. I have checked it and have no further problems.

liyipeng00 mentioned this issue Nov 6, 2023

Update dataset_utils.py with Extended Dirichlet strategy added #141

Closed

TsingZ0 added the enhancement label Apr 18, 2024

liyipeng00 mentioned this issue Apr 19, 2024

Add Extended Dirichlet strategy #185

Merged

TsingZ0 closed this as completed Apr 25, 2024
