# CIFAR100

Loads a federated version of the CIFAR-100 dataset, derived from the [CIFAR-100 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). The dataset is downloaded and cached locally; if it has already been downloaded, it is loaded from the cache.

The training and testing examples are partitioned across 500 and 100 clients, respectively. No two clients share a data sample: the train clients form a true partition of the CIFAR-100 training split, and the test clients form a true partition of the CIFAR-100 testing split. The train clients have string client IDs in the range [0-499], while the test clients have string client IDs in the range [0-99].

The data is partitioned using a hierarchical Latent Dirichlet Allocation (LDA) process, referred to as the [Pachinko Allocation Method](https://people.cs.umass.edu/~mccallum/papers/pam-icml06.pdf) (PAM). This is a two-stage LDA process: each client has a multinomial distribution over the coarse labels of CIFAR-100 and, for each coarse label, a coarse-to-fine multinomial distribution over the fine labels under that coarse label. The coarse label multinomial is drawn from a symmetric Dirichlet with parameter 0.1, and each coarse-to-fine multinomial is drawn from a symmetric Dirichlet with parameter 10.

Each client has 100 samples. To generate a sample for a client, we first select a coarse label by drawing from the client's coarse label multinomial, and then draw a fine label from the corresponding coarse-to-fine multinomial. We then randomly draw a sample from CIFAR-100 with that fine label (without replacement). If this exhausts the set of samples with that label, we remove the label from the coarse-to-fine multinomial and re-normalize the distribution.
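The two-stage draw described above can be sketched in NumPy as follows. This is an illustrative sketch, not the code that produced the published partition: the function `pam_partition`, its arguments, and the helper `draw` are hypothetical names. Two behaviours are assumptions beyond the description: a coarse label is treated as unavailable once all of its fine labels are exhausted, and the draw falls back to a uniform choice if the remaining labels carry no probability mass.

```python
import numpy as np

def pam_partition(fine_labels, coarse_of_fine, num_clients, samples_per_client,
                  coarse_alpha=0.1, fine_alpha=10.0, seed=0):
    """Assign example indices to clients via a coarse-then-fine label draw.

    fine_labels[i]    -> fine label of example i
    coarse_of_fine[f] -> coarse label that fine label f belongs to
    """
    rng = np.random.default_rng(seed)
    fine_labels = np.asarray(fine_labels)
    coarse_of_fine = np.asarray(coarse_of_fine)
    num_fine = len(coarse_of_fine)
    num_coarse = int(coarse_of_fine.max()) + 1
    if num_clients * samples_per_client > len(fine_labels):
        raise ValueError("not enough examples to sample without replacement")

    # Unused example indices per fine label, shuffled once so pop() is a random draw.
    pools = [list(rng.permutation(np.flatnonzero(fine_labels == f)))
             for f in range(num_fine)]
    fine_left = np.array([len(p) > 0 for p in pools])

    def draw(probs, allowed):
        # Drop exhausted labels and re-normalize what remains.
        p = np.where(allowed, probs, 0.0)
        total = p.sum()
        # Assumption: fall back to uniform if the remaining labels carry no mass.
        p = p / total if total > 0 else allowed / allowed.sum()
        return int(rng.choice(len(p), p=p))

    clients = []
    for _ in range(num_clients):
        # Per-client multinomial over coarse labels.
        coarse_probs = rng.dirichlet(np.full(num_coarse, coarse_alpha))
        # Per-client coarse-to-fine multinomials, stored in one flat array.
        fine_probs = np.zeros(num_fine)
        for c in range(num_coarse):
            children = np.flatnonzero(coarse_of_fine == c)
            fine_probs[children] = rng.dirichlet(np.full(len(children), fine_alpha))

        picked = []
        for _ in range(samples_per_client):
            # Assumption: a coarse label is available while any child has samples left.
            coarse_left = np.bincount(coarse_of_fine[fine_left],
                                      minlength=num_coarse) > 0
            c = draw(coarse_probs, coarse_left)
            f = draw(fine_probs, fine_left & (coarse_of_fine == c))
            picked.append(int(pools[f].pop()))
            fine_left[f] = len(pools[f]) > 0
        clients.append(picked)
    return clients
```

For the CIFAR-100 training split this would be called with 20 coarse labels of 5 fine labels each, `num_clients=500`, and `samples_per_client=100`, which consumes all 50,000 examples exactly once.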

Dataset sizes:

- train: 50,000 examples
- test: 10,000 examples

## Data Download

In [None]:
!cd ../benchmark/datasets/cifar100 && mkdir -pv data/raw 
!cd ../benchmark/datasets/cifar100/data/raw && wget --no-check-certificate --no-proxy https://fedml.s3-us-west-1.amazonaws.com/fed_cifar100.tar.bz2
!cd ../benchmark/datasets/cifar100/data/raw && tar -xvf fed_cifar100.tar.bz2 && rm fed_cifar100.tar.bz2

### Validate Dataset

In [3]:
from benchmark.datasets.cifar100 import get_cifar100
dataset = get_cifar100('../benchmark/datasets/cifar100/data')
print(dataset)
x, y = dataset[0]
print(x.shape, y.shape)

CIFAR100(total_parts: 500, total_samples: <bound method CIFAR100.total_samples of <benchmark.datasets.cifar100.cifar100.CIFAR100 object at 0x7fa8d8a266d0>>, current_parts: 0)
torch.Size([3, 24, 24]) torch.Size([])


## FedAvg
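
For orientation, FedAvg is generally understood to aggregate client updates as an average of client parameters weighted by each client's number of samples. The snippet below is a generic NumPy sketch of that step only; it is not OpenFed's implementation, and `fedavg_aggregate` and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Sample-count-weighted average of per-client parameter dicts.

    client_params: list of {name: array} dicts, one per client (hypothetical layout)
    client_sizes:  number of local samples behind each client's parameters
    """
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * np.asarray(params[name], dtype=float)
                  for params, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }

# Two toy clients: the one with more samples pulls the average toward its weights.
merged = fedavg_aggregate(
    [{"w": np.array([1.0, 1.0])}, {"w": np.array([3.0, 3.0])}],
    [100, 300],
)
```

Because every client in this dataset holds 100 samples, the weights are equal here and the aggregate reduces to a plain mean over the activated clients.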

In [2]:
!python -m openfed.tools.simulator --nproc 11  --logdir /tmp ../main.py\
    --task cifar100\
    --data_root ../benchmark/datasets/cifar100/data\
    --epochs 1\
    --rounds 4000\
    --act_clts 10\
    --tst_act_clts 10\
    --max_acg_step -1\
    --optim fedavg\
    --optim_args momentum:0.9 weight_decay:1e-4\
    --co_lr 0.05\
    --ag_lr 1.0\
    --bz 10\
    --gpu\
    --log_dir logs\
    --seed 0

Note: Stdout and stderr for collaborator-1 will be written to /tmp/openfed_node_collaborator-1_stdout, /tmp/openfed_node_collaborator-1_stderr respectively.
Note: Stdout and stderr for collaborator-2 will be written to /tmp/openfed_node_collaborator-2_stdout, /tmp/openfed_node_collaborator-2_stderr respectively.
Note: Stdout and stderr for collaborator-3 will be written to /tmp/openfed_node_collaborator-3_stdout, /tmp/openfed_node_collaborator-3_stderr respectively.
Note: Stdout and stderr for collaborator-4 will be written to /tmp/openfed_node_collaborator-4_stdout, /tmp/openfed_node_collaborator-4_stderr respectively.
Note: Stdout and stderr for collaborator-5 will be written to /tmp/openfed_node_collaborator-5_stdout, /tmp/openfed_node_collaborator-5_stderr respectively.
Note: Stdout and stderr for collaborator-6 will be written to /tmp/openfed_node_collaborator-6_stdout, /tmp/openfed_node_collaborator-6_stderr respectively.
Note: Stdout and stderr for collaborator-7 will be written

>>> Register hooks...
	Train Part: 500
	Activated Train Part: 10
	Test Part: 100
	Activated Test Part: 10
train: (4.58, 9.10%) test: (6.69, 0.20%):   0%| | 2/4000 [05:19<176:51:09, 159.2^C
Traceback (most recent call last):
  File "../main.py", line 367, in <module>
Killing subprocess 49860
Killing subprocess 49861
Killing subprocess 49862
Killing subprocess 49863
Killing subprocess 49864
Killing subprocess 49865
Killing subprocess 49866
Killing subprocess 49867
Killing subprocess 49868
Killing subprocess 49869
Killing subprocess 49870
Main process received SIGINT, exiting
