<img src="./images/DLI_Header.png">

# Workshop Assessment

Congratulations on all your work thus far in the workshop! You have covered a lot of ground, and now is your chance to demonstrate what you've learned on a novel problem. If you do so successfully, you will earn a certificate of competency in the course.

## Assessment Duration

You may or may not have enough time to complete this assessment within the time allotted for today's workshop. Don't worry: if you are unable to complete it before the workshop ends, you can return to this interactive session at your leisure and try the assessment again.

Your work will **not be saved** between interactive sessions. If you would like to continue where you left off, **save any relevant files to your local machine** before exiting a session, then drag and drop them back into JupyterLab's file browser in a new session.

Check the browser tab from which you launched this interactive session to see how much time remains before your session times out. You can use that same page to relaunch the session at your leisure.

## The Assessment Prompt

For the assessment, you will refactor [`assessment.py`](assessment.py), which already runs successfully on a single GPU, so that it instead runs on all 4 GPUs available in this environment using `DistributedDataParallel` (DDP). Open [the file now](assessment.py) and spend several minutes familiarizing yourself with the code. You'll notice that it trains on [the CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), which consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images.

**Your goal is to achieve a validation accuracy of at least 0.75 for at least two consecutive epochs in under 240 seconds.**

As it stands, `assessment.py` can achieve a validation accuracy of 0.75 for at least two consecutive epochs; however, it takes well over the allotted time to do so. Immediately below is the output of an earlier run of `assessment.py`, so that you do not have to spend time running the script yourself:

```
Epoch =  1: Cumulative Time = 88.982, Epoch Time = 88.982, Images/sec = 561.0093580791115, Training Accuracy = 0.505, Validation Loss = 1.126, Validation Accuracy = 0.611
Epoch =  2: Cumulative Time = 177.335, Epoch Time = 88.353, Images/sec = 565.0089655045808, Training Accuracy = 0.685, Validation Loss = 0.885, Validation Accuracy = 0.708
Epoch =  3: Cumulative Time = 264.656, Epoch Time = 87.321, Images/sec = 571.6856192674654, Training Accuracy = 0.747, Validation Loss = 0.604, Validation Accuracy = 0.800
Epoch =  4: Cumulative Time = 352.513, Epoch Time = 87.857, Images/sec = 568.1936135215041, Training Accuracy = 0.781, Validation Loss = 0.566, Validation Accuracy = 0.808
Early stopping after epoch 4
```
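As a rough sanity check on the goal, the sketch below replays the logged numbers through the stopping rule we assume `assessment.py` uses (stop once `--patience` consecutive epochs reach `--target-accuracy`) and estimates the minimum speedup required. The rule and the helper name `stopping_epoch` are assumptions, not code taken from the script.

```python
# Epoch-by-epoch numbers copied from the single-GPU log above
epoch_times = [88.982, 88.353, 87.321, 87.857]   # seconds
val_accuracy = [0.611, 0.708, 0.800, 0.808]

TARGET_ACCURACY = 0.75   # --target-accuracy
PATIENCE = 2             # --patience
TIME_LIMIT = 240.0       # seconds allowed by the assessment


def stopping_epoch(accuracies, target, patience):
    """Hypothetical helper: return the 1-based epoch at which `patience`
    consecutive epochs have reached `target` (assumed rule), else None."""
    streak = 0
    for epoch, acc in enumerate(accuracies, start=1):
        # Extend the streak on a hit, reset it on a miss
        streak = streak + 1 if acc >= target else 0
        if streak >= patience:
            return epoch
    return None


stop = stopping_epoch(val_accuracy, TARGET_ACCURACY, PATIENCE)
cumulative = sum(epoch_times[:stop])
required_speedup = cumulative / TIME_LIMIT

print(f"stops after epoch {stop}, cumulative time {cumulative:.3f} s")
print(f"needs at least {required_speedup:.2f}x speedup to finish in {TIME_LIMIT:.0f} s")
# stops after epoch 4, cumulative time 352.513 s
# needs at least 1.47x speedup to finish in 240 s
```

If the DDP version still needs four epochs to reach the target, each epoch would have to average under 60 seconds. A larger effective batch size can change how many epochs are needed, so treat this as an estimate only.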

However, if you would like to run the script yourself, feel free to execute the cell below:

```
!python3 assessment.py --batch-size 128 --target-accuracy 0.75 --patience 2
```

## Guidelines for the Assessment

For the sake of your own learning, we challenge you to work from scratch as much as you can. However, should you need them, copies of both [the notebook](99_Reference_DDP.ipynb) and [the solution script](lab-2_fashion_mnist_solution.py) from lab 2 have been provided to serve as a refresher.

Run the cell below to check your work. Note that you will need to update `assessment.py`, at a minimum so that it accepts the additional command-line arguments, before the cell below will run without error.

```
!python3 assessment.py --node-id 0 --num-gpus 4 --num-nodes 1 --batch-size 128 --target-accuracy 0.75 --patience 2
```

Once the cell above runs successfully and reports a `Cumulative Time` of less than 240 seconds, return to the browser tab where you launched this interactive environment and click the **ASSESS** button. This starts the assessment harness, which runs your version of `assessment.py` in its entirety and performs several checks to make sure you have completed the objectives. The assessment takes as long to run as your script takes to complete, so please be patient.

Once your code has run and been evaluated, you will receive a pop-up message indicating whether or not you have completed the assessment. If you have, you will receive a link to your certificate by email. If not, the message will indicate what you still need to do.

## Steps

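```python
# Import the distributed package (and `os`, used below to set environment variables)
import os
import torch.distributed as dist
```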

```python
# Allow input args for num_nodes, node_id, num_gpus
parser.add_argument('--num-nodes', type=int, default=1,
                    help='Number of available nodes/hosts')
parser.add_argument('--node-id', type=int, default=0,
                    help='Unique ID to identify the current node/host')
parser.add_argument('--num-gpus', type=int, default=1,
                    help='Number of GPUs in each node')
```

```python
# Establish world size
WORLD_SIZE = args.num_gpus * args.num_nodes
```
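To see how the new flags reach the code, the sketch below builds a hypothetical stand-in for the parser in `assessment.py` and feeds it the flags used by the check cell. The `--batch-size`, `--target-accuracy`, and `--patience` definitions and their defaults are assumptions about the original script.

```python
import argparse

# Hypothetical stand-in for the parser in assessment.py: the three new flags
# plus the three the assessment command already passes (assumed definitions)
parser = argparse.ArgumentParser()
parser.add_argument('--num-nodes', type=int, default=1)
parser.add_argument('--node-id', type=int, default=0)
parser.add_argument('--num-gpus', type=int, default=1)
parser.add_argument('--batch-size', type=int, default=128)
parser.add_argument('--target-accuracy', type=float, default=0.75)
parser.add_argument('--patience', type=int, default=2)

# The same flags the check cell passes on the command line
args = parser.parse_args(['--node-id', '0', '--num-gpus', '4', '--num-nodes', '1',
                          '--batch-size', '128', '--target-accuracy', '0.75',
                          '--patience', '2'])

# argparse converts dashes to underscores: --num-gpus -> args.num_gpus
WORLD_SIZE = args.num_gpus * args.num_nodes
print(WORLD_SIZE)  # 4
```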

```python
# Set `MASTER_ADDR` and `MASTER_PORT` environment variables
os.environ['MASTER_ADDR'] = 'localhost' 
os.environ['MASTER_PORT'] = '9956'
```

```python
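# Move single-GPU `main` functionality into a `worker` function; torch.multiprocessing.spawn
# passes each process its `local_rank` as the first argument, ahead of the passed-in `args`
def worker(local_rank, args):
    ...  # body of the former `main` goes here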
```

```python
# Establish this worker's global rank
global_rank = args.node_id * args.num_gpus + local_rank
```
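Each process therefore carries two ranks: `local_rank` selects a GPU on its own node, while `global_rank` identifies it across the whole job. The hypothetical helper below evaluates the formula above for this single-node, 4-GPU environment and for an imagined two-node job.

```python
def global_rank(node_id, num_gpus, local_rank):
    # Same formula as the step above: node 0's ranks come first, then node 1's, ...
    return node_id * num_gpus + local_rank


# Single node with 4 GPUs (this environment): global ranks 0..3
single_node = [global_rank(0, 4, lr) for lr in range(4)]

# Hypothetical 2-node job with 4 GPUs per node: node 1 owns global ranks 4..7
two_nodes = {node: [global_rank(node, 4, lr) for lr in range(4)] for node in range(2)}

print(single_node)  # [0, 1, 2, 3]
print(two_nodes)    # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```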

```python
# Create distributed process group
dist.init_process_group(
    backend='nccl',
    world_size=WORLD_SIZE,
    rank=global_rank
)
```

```python
# Only download data if local_rank is 0
download = local_rank == 0
if local_rank == 0:
    ...  # create train_set and test_set here, passing download=download
```

```python
# Wait until data is downloaded before loading it on non-0 ranks
dist.barrier()
if local_rank != 0:
    ...  # create train_set and test_set here, reading the already-downloaded files
```

```python
# Create distributed samplers for train and test data
train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_set,
    num_replicas=WORLD_SIZE,
    rank=global_rank
)

test_sampler = torch.utils.data.distributed.DistributedSampler(
    test_set,
    num_replicas=WORLD_SIZE,
    rank=global_rank
)
```
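The samplers are what give each process a different slice of the data. The pure-Python sketch below mirrors our understanding of how PyTorch's `DistributedSampler` assigns indices when shuffling is off (the real sampler shuffles by default); `shard_indices` is a hypothetical helper written for illustration, not part of PyTorch.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Hypothetical sketch of DistributedSampler with shuffle=False, drop_last=False."""
    num_samples = math.ceil(dataset_len / num_replicas)   # samples per rank
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by repeating indices from the start so every rank gets num_samples items
    padding = total_size - dataset_len
    indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Interleave: rank r takes positions r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank:total_size:num_replicas]


WORLD_SIZE = 4
train_shards = [shard_indices(50000, WORLD_SIZE, r) for r in range(WORLD_SIZE)]
test_shards = [shard_indices(10000, WORLD_SIZE, r) for r in range(WORLD_SIZE)]

print([len(s) for s in train_shards])  # [12500, 12500, 12500, 12500]
print([len(s) for s in test_shards])   # [2500, 2500, 2500, 2500]
print(train_shards[1][:3])             # [1, 5, 9]
print(shard_indices(10, 4, 2))         # [2, 6, 0]  (index 0 reused as padding)
```

Because 50,000 and 10,000 both divide evenly by 4, no padding occurs for CIFAR-10; the 10-item case shows how the sampler would repeat indices otherwise.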

```python
# Use distributed samplers for train and test data loader
# Training data loader
train_loader = torch.utils.data.DataLoader(train_set, 
                                           batch_size=args.batch_size, drop_last=True, sampler=train_sampler)
# Validation data loader
test_loader = torch.utils.data.DataLoader(test_set,
                                          batch_size=args.batch_size, drop_last=True, sampler=test_sampler)
```
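Since `--batch-size` is applied per process, the loaders above change how much work each step does. The arithmetic below, assuming 4 processes and the CIFAR-10 split sizes, shows the per-rank batch counts, the effective global batch, and how many validation images remain in use with `drop_last=True`.

```python
import math

NUM_TRAIN, NUM_TEST = 50000, 10000   # CIFAR-10 split sizes
WORLD_SIZE = 4
BATCH_SIZE = 128                     # --batch-size, applied per process

# Single-GPU baseline: full batches per epoch with drop_last=True
single_gpu_batches = NUM_TRAIN // BATCH_SIZE

# DistributedSampler gives each rank ceil(N / WORLD_SIZE) samples
train_per_rank = math.ceil(NUM_TRAIN / WORLD_SIZE)
test_per_rank = math.ceil(NUM_TEST / WORLD_SIZE)

# drop_last=True discards each rank's final partial batch
train_batches = train_per_rank // BATCH_SIZE
test_batches = test_per_rank // BATCH_SIZE

global_batch = BATCH_SIZE * WORLD_SIZE                   # images per optimizer step
val_images_used = test_batches * BATCH_SIZE * WORLD_SIZE

print(single_gpu_batches, train_batches, test_batches, global_batch, val_images_used)
# 390 97 19 512 9728
```

Each process runs roughly a quarter of the single-GPU steps per epoch. Assuming validation results are combined across ranks, `drop_last=True` means they cover 9,728 rather than all 10,000 test images.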

```python
# Set device to appropriate GPU for this rank
device = torch.device("cuda:" + str(local_rank) if torch.cuda.is_available() else "cpu")
```

```python
# Move the model to this rank's GPU, then wrap it with nn.parallel.DistributedDataParallel
model = model.to(device)
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```
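Wrapping the model in DDP keeps the replicas in sync by averaging gradients across processes during the backward pass. The toy example below is a conceptual sketch, not DDP's implementation: it uses a one-parameter model (`grad_mse` is a hypothetical helper) to show that averaging per-rank gradients over equal-sized shards gives the gradient over the whole global batch.

```python
def grad_mse(w, xs, ys):
    """Gradient d/dw of mean((w * x - y) ** 2) for a 1-parameter linear model."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [float(i) for i in range(16)]        # a toy global batch of 16 samples
ys = [3.0 * x + 1.0 for x in xs]

world_size = 4
shard = len(xs) // world_size             # each "rank" sees 4 samples

# Each rank computes the gradient of the mean loss on its own shard only
per_rank = [grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
            for r in range(world_size)]

averaged = sum(per_rank) / world_size      # what every rank holds after averaging
full_batch = grad_mse(w, xs, ys)           # gradient over the whole global batch

print(abs(averaged - full_batch) < 1e-9)   # True
```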

```python
# Change distributed shuffling pattern for each epoch
train_sampler.set_epoch(epoch)
```
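`set_epoch` matters because the sampler seeds its shuffle with its seed plus the current epoch. The sketch below imitates that behavior in pure Python, with `random.Random` standing in for the `torch.Generator` the real sampler uses; `epoch_shard` is a hypothetical helper and avoids padding by using a dataset size divisible by the number of ranks.

```python
import random


def epoch_shard(dataset_len, num_replicas, rank, epoch, seed=0):
    """Hypothetical sketch of DistributedSampler's shuffle=True path."""
    indices = list(range(dataset_len))
    # Every rank uses the same seed + epoch, so all ranks agree on one permutation
    random.Random(seed + epoch).shuffle(indices)
    # Each rank then takes its interleaved slice of that shared permutation
    return indices[rank::num_replicas]


N, WORLD = 1000, 4
epoch0 = [epoch_shard(N, WORLD, r, epoch=0) for r in range(WORLD)]
epoch1 = [epoch_shard(N, WORLD, r, epoch=1) for r in range(WORLD)]
# Without set_epoch the sampler's epoch stays at 0 on every pass
stuck = [epoch_shard(N, WORLD, r, epoch=0) for r in range(WORLD)]

# Within one epoch the four shards are disjoint and together cover the dataset
assert sorted(i for s in epoch0 for i in s) == list(range(N))

print("epoch 0 and epoch 1 orders differ:", epoch0 != epoch1)
print("epoch 0 order repeats without set_epoch:", stuck == epoch0)  # True
```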
