# Set-up and Preparation

In this chapter, all our examples center on training a [PyTorch](https://pytorch.org/)-based image classifier on the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.

Our goal is to showcase how you can easily convert and run these examples using the Federated Averaging ([FedAvg](https://arxiv.org/abs/1602.05629)) algorithm with [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html)'s APIs.

Let's first go ahead and set everything up.

## Install NVIDIA FLARE and dependencies

Install the `nvflare` package and the requirements for the examples:


In [None]:
! pip install nvflare

In [None]:
! pip install -r code/requirements.txt

## Prepare data

The CIFAR-10 dataset has the following classes: `airplane`, `automobile`, `bird`, `cat`, `deer`, `dog`, `frog`, `horse`, `ship`, `truck`.

The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

![image](code/img/cifar10.png)
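As a quick sanity check of the numbers above, the following minimal sketch maps integer labels to class names and counts the values stored per image. It assumes that labels follow the class order listed above (indices 0 to 9), which matches the ordering used by torchvision's `CIFAR10` dataset. The names `label_to_name` and `values_per_image` are hypothetical helpers introduced only for illustration.

```python
# Class names in the order listed above; index i is the integer label
# (assumed to match the ordering used by torchvision's CIFAR10 dataset).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# Each image is channels x height x width.
IMAGE_SHAPE = (3, 32, 32)


def label_to_name(label: int) -> str:
    """Hypothetical helper: translate an integer label into its class name."""
    return CIFAR10_CLASSES[label]


def values_per_image(shape=IMAGE_SHAPE) -> int:
    """Number of scalar values in one image (3 * 32 * 32)."""
    count = 1
    for dim in shape:
        count *= dim
    return count


if __name__ == "__main__":
    print(label_to_name(3))    # cat
    print(values_per_image())  # 3072
```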

Before we start training, we need to prepare the data.

### Download the data

We've added a script in [code/data/download.py](code/data/download.py) that downloads the CIFAR-10 dataset to a common location, so that all federated jobs can access it. The content of the script is displayed below:

```python
import argparse

import torchvision.datasets as datasets

# Default dataset path
CIFAR10_ROOT = "/tmp/nvflare/data/cifar10"


def define_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", type=str, default=CIFAR10_ROOT, nargs="?")
    args = parser.parse_args()
    return args


def main(args):
    # Download the training and test splits into the same root directory
    datasets.CIFAR10(root=args.dataset_path, train=True, download=True)
    datasets.CIFAR10(root=args.dataset_path, train=False, download=True)


# Entry point so that running the script actually triggers the download
if __name__ == "__main__":
    main(define_parser())
```

The script takes an optional `--dataset_path` argument (default: `/tmp/nvflare/data/cifar10`) and uses torchvision's `CIFAR10` dataset class to download the training and test sets into that root directory.

Let's run the script:

In [None]:
! python3 code/data/download.py

We can examine the downloaded data:

In [None]:
!tree /tmp/nvflare/data/cifar10/cifar-10-batches-py/
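If the `tree` command is not installed on your system, a small Python helper can produce a similar listing. This is a minimal sketch using only the standard library. `list_tree` is a hypothetical name, and the path below is the script's default download location.

```python
import os


def list_tree(root: str) -> list[str]:
    """Return the relative paths of all files under root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full, root))
    return sorted(paths)


if __name__ == "__main__":
    # Default location used by the download script above.
    for path in list_tree("/tmp/nvflare/data/cifar10/cifar-10-batches-py"):
        print(path)
```

If the directory does not exist, the helper returns an empty list, which is a hint that the download step did not run or used a different `--dataset_path`.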

### Split the data

In real-world scenarios, the data is distributed among different clients/sites. To simulate that setting, we would normally split the data across the clients/sites. How to split the data depends on the type of problem and the type of data.

For simplicity, in this example we assume that all clients have the same data, as a stand-in for the horizontal federated learning case.
We therefore do not split the data, but rather point all clients to the same data location.
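For reference, here is one simple way a split could look if you wanted each client to hold a disjoint subset. This is only a sketch of a uniform random (IID) split over sample indices, not what this chapter's examples do. `split_indices`, its parameters, and the choice of two clients are assumptions made for illustration.

```python
import random


def split_indices(num_samples: int, num_clients: int, seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices and deal them out into near-equal, disjoint shards."""
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Client i takes every num_clients-th index starting at offset i.
    return [indices[i::num_clients] for i in range(num_clients)]


if __name__ == "__main__":
    # CIFAR-10 has 50,000 training images; two clients is an arbitrary choice.
    shards = split_indices(num_samples=50_000, num_clients=2)
    print([len(s) for s in shards])  # [25000, 25000]
```

Each shard could then be wrapped with `torch.utils.data.Subset(dataset, shard)` so that a client only sees its own portion. Non-IID splits, such as label-skewed partitions, would need a different strategy.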

Next, we will run our first federated application! Let's jump to: [Run a PyTorch federated learning job](../01.1_running_federated_learning_job/running_pytorch_fl_job.ipynb).
