
Demo code for creating non-IID datasets based on the Dirichlet distribution, using the PyTorch framework.


dixiyao/Create-Non-IID-dataset-torch


Create Non-IID Dataset

Introduction

This is demo code showing how to create non-IID datasets for federated learning using the Dirichlet distribution. Sampling from a Dirichlet distribution is a common way to construct non-IID client datasets from a centralized dataset such as CIFAR-10. With Dir(α), a smaller α yields a more strongly non-IID partition.
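The idea can be sketched as follows: for each class k, draw a proportion vector p_k ~ Dir(α, ..., α) over the N clients and give client j roughly the fraction p_k[j] of class k's samples. The helper below, dirichlet_partition, is a hypothetical illustration written with NumPy; it is not the repository's partition_data, and its names and parameters are assumptions.

```python
import numpy as np


def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices among clients with per-class Dir(alpha) proportions.

    Hypothetical helper for illustration; not the repository's partition_data.
    Returns a dict mapping client id -> array of sample indices.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        # Shuffle the indices of all samples that belong to this class.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Draw this class's share for every client: p ~ Dir(alpha, ..., alpha).
        proportions = rng.dirichlet(np.full(n_clients, alpha))
        # Convert the shares into cut points and split the class indices there.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return {c: np.array(ix, dtype=int) for c, ix in enumerate(client_indices)}


# Toy usage: 10 classes with 100 samples each, split across 5 clients.
toy_labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(toy_labels, n_clients=5, alpha=0.5)
print({c: len(ix) for c, ix in parts.items()})
```

Every sample is assigned to exactly one client. Lowering alpha (for example to 0.1) concentrates each class on fewer clients. This sketch does not guarantee that every client receives at least one sample; one common remedy is to resample until each client reaches a minimum size.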

This code was used to generate the non-IID datasets in the paper:

Dixi Yao*, Lingdong Wang*, Jiayu Xu, Liyao Xiang, Shuo Shao, Yingqi Chen, Yanjun Tong.
Federated Model Search via Reinforcement Learning
International Conference on Distributed Computing Systems 2021.

Paper

Requirements

numpy
torch

Usage

The function partition_data in the file noniid.py takes the location of the data directory directly and generates a list of dataloaders, one for each client in FL. Sample usage:

X_train, y_train, X_test, y_test, net_dataidx_map, traindata_cls_counts = partition_data('./data',10)
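To check how skewed a partition is, it helps to count the classes each client holds. The sketch below assumes that a map like net_dataidx_map sends each client id to an array of indices into the training labels; that layout, and the helper name class_counts_per_client, are assumptions for illustration and are not confirmed by this README. The toy data stands in for real outputs of partition_data.

```python
import numpy as np


def class_counts_per_client(y, dataidx_map):
    """Count how many samples of each class every client holds.

    Hypothetical helper. Assumes dataidx_map maps client id -> integer array
    of indices into y (an assumed layout for net_dataidx_map).
    """
    y = np.asarray(y)
    counts = {}
    for client, idx in dataidx_map.items():
        # Labels of this client's samples, then a per-class histogram.
        classes, freq = np.unique(y[np.asarray(idx, dtype=int)], return_counts=True)
        counts[client] = {int(c): int(n) for c, n in zip(classes, freq)}
    return counts


# Toy data standing in for y_train and net_dataidx_map.
toy_y = np.array([0, 0, 1, 2, 2, 2])
toy_map = {0: np.array([0, 1, 2]), 1: np.array([3, 4, 5])}
print(class_counts_per_client(toy_y, toy_map))
# {0: {0: 2, 1: 1}, 1: {2: 3}}
```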

Further Improvement

Currently, the dataset size on each client is either random or equal across clients. In actual federated learning settings, clients sometimes hold datasets of different sizes. A further improvement would be to let the user specify the dataset size for each client.
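One possible way to approach this improvement (not implemented in this repository) is to fix each client's size up front and draw only its label mix from Dir(α). The helper partition_with_sizes below is a hypothetical sketch: it draws class proportions per client, samples labels one at a time without replacement, and, when a class runs out, falls back to classes that still have samples. It requires the requested sizes to sum to no more than the dataset size.

```python
import numpy as np


def partition_with_sizes(labels, client_sizes, alpha, seed=0):
    """Give client j exactly client_sizes[j] samples with a Dir(alpha) label mix.

    Hypothetical sketch, not part of the repository.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if sum(client_sizes) > len(labels):
        raise ValueError("requested more samples than the dataset holds")
    # Shuffled pool of still-unused sample indices for every class.
    pools = {c: list(rng.permutation(np.flatnonzero(labels == c))) for c in classes}
    result = {}
    for client, size in enumerate(client_sizes):
        # This client's preferred class mix: p ~ Dir(alpha, ..., alpha).
        proportions = rng.dirichlet(np.full(len(classes), alpha))
        chosen = []
        while len(chosen) < size:
            # Only classes with samples left can be drawn.
            available = np.array([len(pools[c]) > 0 for c in classes])
            p = proportions * available
            if p.sum() == 0:  # the client's preferred classes are exhausted
                p = available.astype(float)
            cls = classes[rng.choice(len(classes), p=p / p.sum())]
            chosen.append(pools[cls].pop())
        result[client] = np.array(chosen, dtype=int)
    return result


# Toy usage: 10 classes x 50 samples, three clients of very different sizes.
toy_labels = np.repeat(np.arange(10), 50)
sized = partition_with_sizes(toy_labels, [10, 100, 250], alpha=0.5)
print({c: len(ix) for c, ix in sized.items()})
```

Sampling one label at a time keeps the code short but is slow for large datasets; drawing per-client class counts with a multinomial and then correcting for exhausted classes would be a faster variant.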
