# How to use `Partitioner`s

This tutorial introduces the `Partitioner`s that `Flower Datasets` provides out of the box.

# What is a `Partitioner`?

A `Partitioner` is an object responsible for dividing a dataset according to a chosen strategy. There are many `Partitioner`s to choose from, and all of them inherit from `Partitioner`, an abstract class that provides the basic structure and declares the methods any new `Partitioner` must implement to integrate with the rest of the `Flower Datasets` code. Different `Partitioner`s are created with different parameters, but they behave the same way: they all produce the same type of object.



## IidPartitioner Creation

Let's create (instantiate) the most basic partitioner, `IidPartitioner`, and learn how it interacts with `FederatedDataset`.

In [None]:
from flwr_datasets.partitioner import IidPartitioner

partitioner = IidPartitioner(num_partitions=10)

Right now the partitioner does not have access to any data, so it has nothing to partition. `FederatedDataset` is responsible for assigning data to the partitioner(s).
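The pattern of configuring a partitioner first and giving it data later can be illustrated with a small stand-alone sketch. The classes `ToyPartitioner` and `ToyChunkPartitioner` below are hypothetical and use only the Python standard library; they mimic the idea described above and are not the `flwr_datasets` implementation.

```python
from abc import ABC, abstractmethod


class ToyPartitioner(ABC):
    """Hypothetical base class: the dataset is assigned after construction."""

    def __init__(self):
        self._dataset = None  # no data at construction time

    @property
    def dataset(self):
        if self._dataset is None:
            raise AttributeError("No dataset has been assigned yet.")
        return self._dataset

    @dataset.setter
    def dataset(self, value):
        self._dataset = value

    def is_dataset_assigned(self):
        return self._dataset is not None

    @abstractmethod
    def load_partition(self, partition_id):
        """Return the samples that belong to `partition_id`."""


class ToyChunkPartitioner(ToyPartitioner):
    """Hypothetical strategy: contiguous chunks of (nearly) equal size."""

    def __init__(self, num_partitions):
        super().__init__()
        self.num_partitions = num_partitions

    def load_partition(self, partition_id):
        data = self.dataset
        size, remainder = divmod(len(data), self.num_partitions)
        # The first `remainder` partitions receive one extra sample.
        start = partition_id * size + min(partition_id, remainder)
        stop = start + size + (1 if partition_id < remainder else 0)
        return data[start:stop]


toy_partitioner = ToyChunkPartitioner(num_partitions=3)
assigned_before = toy_partitioner.is_dataset_assigned()  # nothing to partition yet

toy_partitioner.dataset = list(range(10))  # the role FederatedDataset plays
toy_partitions = [toy_partitioner.load_partition(i) for i in range(3)]
```

In this toy version the strategy lives entirely in `load_partition`, so swapping strategies means swapping the subclass while the calling code stays the same.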

What **part** of the data is assigned to the partitioner?

In centralized (traditional) ML, datasets usually come with well-established splits, typically train/valid/test. In FL research, if a dataset is not already divided (e.g. by `user_id`), we simulate such a division of a centralized dataset. The goal is to resemble an FL scenario in which the data is spread across clients. In Flower Datasets, you decide which split of the dataset will be partitioned. You can also resplit the dataset to use a custom split, or merge the whole train and test splits into a single dataset, but that is outside the scope of this tutorial.

Let's see how you specify the split for partitioning.

## How do you specify the split to partition?

You specify the split through the `partitioners` argument of `FederatedDataset`. It is a dictionary that maps a split name (a `str`, e.g. `"train"`) to the partitioner that will be used for that split of the data. In the example below, we partition the `train` split of the `cifar10` dataset.

(If you're unsure why/how we chose the name of the `dataset` and how to customize it, see the first tutorial.)

In [None]:
from flwr_datasets import FederatedDataset

fds = FederatedDataset(dataset="cifar10", partitioners={"train": partitioner})
iid_partition = fds.load_partition(partition_id=0)
iid_partition

Dataset({
    features: ['img', 'label'],
    num_rows: 5000
})

In [None]:
# Let's take a look at the first three samples
iid_partition[:3]

{'img': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,
  <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,
  <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>],
 'label': [1, 2, 6]}
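The `num_rows: 5000` above follows from dividing the 50,000 samples of the CIFAR-10 `train` split among 10 partitions. The sketch below reproduces only that arithmetic with the standard library. The helper `toy_iid_indices` is hypothetical, and the shuffle followed by contiguous, equally sized chunks is an assumption made for illustration, not a description of how `IidPartitioner` works internally.

```python
import random


def toy_iid_indices(num_samples, num_partitions, partition_id, seed=42):
    """Return the sample indices of one partition (illustrative only)."""
    indices = list(range(num_samples))
    # The same seed gives the same shuffle on every call,
    # so the partitions never overlap.
    random.Random(seed).shuffle(indices)
    # Assumes the samples divide evenly, as with 50,000 / 10.
    size = num_samples // num_partitions
    return indices[partition_id * size : (partition_id + 1) * size]


# 50,000 training samples split into 10 partitions of 5,000 each.
partition_sizes = [len(toy_iid_indices(50_000, 10, pid)) for pid in range(10)]

# Together, the partitions cover every sample exactly once.
all_indices = set()
for pid in range(10):
    all_indices.update(toy_iid_indices(50_000, 10, pid))
```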

## Using Different `Partitioners`

**Why would you need to use different `Partitioner`s?**

Data partitioning is simulated in several ways in the literature, and `Flower Datasets` lets you work with the different approaches that have been proposed so far. It enables you to simulate partitions with different properties and different levels of heterogeneity.


**How to use different `Partitioner`s?**

To use a different `Partitioner`, you just need to create a different object (note that it typically has different parameters that you need to specify). Then you pass it to the `FederatedDataset` as before.

<div style="max-width:80%; margin-left: auto; margin-right: auto">
    <img src="./_static/tutorial-quickstart/partitioner-flexibility.png" alt="Partitioner flexibility display">
</div>
The only part that changes is highlighted in yellow.


### `PathologicalPartitioner`

Now, we are going to create partitions that each contain only a subset of the labels.

In [None]:
from flwr_datasets.partitioner import PathologicalPartitioner

pathological_partitioner = PathologicalPartitioner(
    num_partitions=10, partition_by="label", num_classes_per_partition=2
)

fds = FederatedDataset(
    dataset="cifar10", partitioners={"train": pathological_partitioner}
)
partition_pathological = fds.load_partition(partition_id=0)
partition_pathological

Dataset({
    features: ['img', 'label'],
    num_rows: 2501
})

In [None]:
# Let's take a look at the first three samples
partition_pathological[:3]

{'img': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,
  <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>,
  <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32>],
 'label': [0, 0, 7]}

In [None]:
import numpy as np

np.unique(partition_pathological["label"])

array([0, 7])
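To build intuition for how a partition can end up with only two labels, here is a minimal, standard-library sketch of label-restricted partitioning. The function `toy_label_partitions` and its rules (consecutive classes per partition, samples dealt round-robin among the partitions that share a class) are assumptions chosen for simplicity. `PathologicalPartitioner` may assign classes and samples differently, so do not expect the same labels or partition sizes as in the output above.

```python
from collections import defaultdict


def toy_label_partitions(labels, num_partitions, num_classes_per_partition):
    """Map partition_id -> sample indices, restricting labels per partition."""
    classes = sorted(set(labels))

    # Assumed rule: partition p takes consecutive classes, wrapping around.
    partition_classes = {
        p: [
            classes[(p * num_classes_per_partition + j) % len(classes)]
            for j in range(num_classes_per_partition)
        ]
        for p in range(num_partitions)
    }

    # Record which partitions share each class.
    owners = defaultdict(list)
    for p, assigned in partition_classes.items():
        for c in assigned:
            owners[c].append(p)

    # Deal each class's samples round-robin among the partitions that own it.
    partitions = defaultdict(list)
    seen_per_class = defaultdict(int)
    for index, label in enumerate(labels):
        class_owners = owners.get(label)
        if not class_owners:
            continue  # this class was not assigned to any partition
        target = class_owners[seen_per_class[label] % len(class_owners)]
        partitions[target].append(index)
        seen_per_class[label] += 1
    return dict(partitions)


toy_labels = [i % 10 for i in range(100)]  # 10 classes, 10 samples each
label_partitions = toy_label_partitions(
    toy_labels, num_partitions=10, num_classes_per_partition=2
)
labels_in_first = sorted({toy_labels[i] for i in label_partitions[0]})
```

Fewer classes per partition means more label skew between clients, which is the kind of heterogeneity this family of partitioners is meant to simulate.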

# Final remarks
Congratulations, you now know how to use different `Partitioner`s with `FederatedDataset` in Flower Datasets.

# Next Steps
This is the second quickstart tutorial in the Flower Datasets series. Next tutorials:

* Visualize Label Distribution [link](https://flower.ai/docs/datasets/how-to-visualize-label-distribution.html).

Previous tutorials:
* Quickstart Basics [link](https://flower.ai/docs/datasets/quickstart-tutorial.html)