# **Sharing a Fixed Partition of Participants' Data**

Let us assume that fellow participants *cannot be trusted*. In this setup, each participant has to share exactly the same data each time.

To model this, we create the datasets as follows:
1. We start from the `non-iid50` data.
2. In this dataset, each participant has approx. 5420 samples.
3. Our goal: every participant shall have at least approx. 542 samples of each of the 10 digits.
4. Originally, they have approx. 301 samples per digit, so with $x$ samples per digit received from each of the 9 other participants:

> $301+9x = 542$
>
> $x \approxeq 27$

5. We have to share at least 27 samples per digit from each participant to create the uniform distribution.
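The arithmetic in the steps above can be checked with a few lines of Python. This is only a sketch: the constant names are illustrative and do not appear in the notebook, and the sample counts are the approximate figures quoted in the list.

```python
import math

SAMPLES_PER_PARTICIPANT = 5420   # approx. size of one non-iid50 participant file
NUM_DIGITS = 10
ORIGINAL_PER_DIGIT = 301         # approx. samples per digit before sharing
OTHER_PARTICIPANTS = 9           # the 9 in 301 + 9x = 542

# Target: an equal share of the participant's data for every digit.
target_per_digit = SAMPLES_PER_PARTICIPANT // NUM_DIGITS

# Samples still missing per digit, spread over the other participants.
missing_per_digit = target_per_digit - ORIGINAL_PER_DIGIT
samples_to_share = math.ceil(missing_per_digit / OTHER_PARTICIPANTS)

print(target_per_digit, missing_per_digit, samples_to_share)
```

Rounding up rather than down keeps every digit at or above the 542-sample target.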

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
NON_IID50_PATH = "../../data/participants/non_iid50/"  # input directory
FIXED_SHARED_FILE = "../../data/participants/fix_shared/shared.csv"  # output file
SAMPLES_TO_SHARE = 27  # number of samples shared per digit per participant

## Shared data portion

In [3]:
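shared_parts = []  # one sampled frame per (participant, digit) pair

for partip in range(10):
    participant_df = pd.read_csv(NON_IID50_PATH + "participant%d.csv" % partip)
    for digit in range(10):
        # draw SAMPLES_TO_SHARE random rows of this digit from the participant
        digit_df = participant_df[participant_df["label"] == digit].sample(n=SAMPLES_TO_SHARE)
        shared_parts.append(digit_df)

# concatenate once and renumber the rows 0..N-1
shared_df = pd.concat(shared_parts, ignore_index=True)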
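Because the shared portion is meant to be identical every time, it may help to make the random draw itself repeatable. The sketch below is not part of the original notebook: the helper `sample_fixed_share`, the `SEED` value and the synthetic `toy_df` are all assumptions made for illustration. It uses `DataFrameGroupBy.sample` with a fixed `random_state`, so that two calls select the same rows.

```python
import numpy as np
import pandas as pd

SAMPLES_TO_SHARE = 27  # samples per digit per participant, as above
SEED = 42              # hypothetical seed, not from the original notebook


def sample_fixed_share(participant_df, n_per_digit, seed):
    """Draw the same n_per_digit rows of every label on each call."""
    return (
        participant_df
        .groupby("label", group_keys=False)
        .sample(n=n_per_digit, random_state=seed)
        .reset_index(drop=True)
    )


# Synthetic stand-in for one participant file: 10 digits x 300 rows each,
# with a single placeholder pixel column.
rng = np.random.default_rng(0)
toy_df = pd.DataFrame({
    "label": np.repeat(np.arange(10), 300),
    "1x1": rng.integers(0, 256, size=3000),
})

first = sample_fixed_share(toy_df, SAMPLES_TO_SHARE, SEED)
second = sample_fixed_share(toy_df, SAMPLES_TO_SHARE, SEED)

print(first.equals(second))                  # same rows in both draws
print(first["label"].value_counts().min())   # rows drawn per digit
```

Without a fixed seed, rerunning the cell would write a different `shared.csv` each time, which may or may not matter depending on how the file is reused.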

In [11]:
shared_df.head()

Unnamed: 0,label,1x1,1x2,1x3,1x4,1x5,1x6,1x7,1x8,1x9,...,28x19,28x20,28x21,28x22,28x23,28x24,28x25,28x26,28x27,28x28
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [5]:
shared_df.to_csv(FIXED_SHARED_FILE, index=False)
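A quick round-trip check can confirm that the written file has the expected shape: 10 participants times 27 samples gives 270 rows per digit, 2700 in total. The sketch below is an illustration only. It uses a synthetic `toy_shared` frame and an in-memory buffer instead of `FIXED_SHARED_FILE`, both of which are assumptions and not part of the original notebook.

```python
import io

import pandas as pd

# Toy stand-in for shared_df: 10 participants x 10 digits x 27 rows.
labels = [digit for _ in range(10) for digit in range(10) for _ in range(27)]
toy_shared = pd.DataFrame({"label": labels})
toy_shared["1x1"] = 0  # placeholder pixel column

# Write without the index, as in the cell above, then read the data back.
buffer = io.StringIO()  # in-memory file used instead of FIXED_SHARED_FILE
toy_shared.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

per_digit = reloaded["label"].value_counts().sort_index()
print(len(reloaded))            # total number of shared rows
print(per_digit.unique())       # rows per digit
print(list(reloaded.columns))   # no extra index column is added
```

Passing `index=False` matters here: writing the index would add one more unnamed column on every save and reload.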