# Loading Image Data

So far we have been working with fairly artificial datasets that you would not typically use in real projects. Instead, you will likely be dealing with full-sized images like those you would get from smartphone cameras. In this notebook, we will look at how to load images and use them to train neural networks.

We will be using a [dataset of cat and dog photos](https://www.kaggle.com/c/dogs-vs-cats) available from Kaggle. Here are a couple of example images:

<img src='https://github.com/udacity/deep-learning-v2-pytorch/raw/master/intro-to-pytorch/assets/dog_cat.png'>

We will use this dataset to train a neural network that can differentiate between cats and dogs. This used to be a serious challenge for computer vision systems.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torchvision import datasets, transforms

import helper

The easiest way to load image data is with `datasets.ImageFolder` from `torchvision` ([documentation](http://pytorch.org/docs/master/torchvision/datasets.html#imagefolder)). In general you will use `ImageFolder` like so:

```python
dataset = datasets.ImageFolder('path/to/data/', transform=transform)
```

where `'path/to/data/'` is the file path to the data directory and `transform` is a sequence of processing steps built with the [`transforms`](http://pytorch.org/docs/master/torchvision/transforms.html) module from `torchvision`. `ImageFolder` expects the files and directories to be structured like so:
```
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
```

where each class has its own directory (`cat` and `dog`) for the images. The images are then labeled with the class taken from the directory name. So here, the image `123.png` would be loaded with the class label `cat`. You can download the dataset already structured like this [from here](https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip). This is already split into a training set and test set.
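To make the labeling rule concrete, here is a small standard-library sketch of how labels can be derived from a directory layout like the one above. It builds a throwaway directory tree with empty placeholder files, then assigns each class directory an integer index in sorted name order, which should mirror the ordering `ImageFolder` exposes through its `class_to_idx` attribute. The helper name `label_from_directories` is hypothetical and not part of `torchvision`, and the sketch does not decode any images.

```python
import os
import tempfile


def label_from_directories(root):
    """Hypothetical helper: map each file under root/<class>/ to a class index."""
    # Class names are the immediate subdirectories of root, in sorted order.
    classes = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for filename in sorted(os.listdir(class_dir)):
            # Store the path relative to root together with its integer label.
            samples.append((os.path.join(name, filename), class_to_idx[name]))
    return class_to_idx, samples


# Build a throwaway tree that mimics root/dog/... and root/cat/...
with tempfile.TemporaryDirectory() as root:
    for class_name, filename in [("dog", "xxx.png"), ("dog", "xxy.png"), ("cat", "123.png")]:
        os.makedirs(os.path.join(root, class_name), exist_ok=True)
        # Empty placeholder files; real data would be actual images.
        open(os.path.join(root, class_name, filename), "w").close()

    class_to_idx, samples = label_from_directories(root)

print(class_to_idx)  # {'cat': 0, 'dog': 1}
for path, label in samples:
    print(path, label)
```

Note that the integer labels follow the sorted directory names, so `cat` maps to 0 and `dog` to 1 regardless of the order in which the folders were created.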

### Transforms

#### [fill in here...]

### Data Loaders

With the `ImageFolder` loaded, you have to pass it to a [`DataLoader`](http://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader). The `DataLoader` takes a dataset (such as you would get from `ImageFolder`) and returns batches of images and the corresponding labels. You can set various parameters, such as the batch size and whether the data is reshuffled at every epoch.

```python
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
```

Here `dataloader` is an iterable that behaves much like a [generator](https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/). To get data out of it, you need to loop through it or convert it to an iterator with `iter()` and call `next()` on the result.

```python
# Looping through it, get one batch on each loop
for images, labels in dataloader:
    pass

# Get one batch
images, labels = next(iter(dataloader))
```
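The two access patterns above work for any Python iterable. As a rough, standard-library-only illustration of what a data loader does conceptually, the sketch below defines a hypothetical `SimpleLoader` class (not part of PyTorch) that shuffles sample indices on every pass and yields fixed-size batches. The real `DataLoader` additionally collates samples into tensors and can load data in parallel worker processes, none of which is modeled here.

```python
import random


class SimpleLoader:
    """Hypothetical stand-in for a data loader; yields lists, not tensors."""

    def __init__(self, dataset, batch_size, shuffle=False, seed=None):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        # A fresh index order is drawn each time iteration starts (each epoch).
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
            # Split (sample, label) pairs into two parallel lists.
            samples, labels = zip(*batch)
            yield list(samples), list(labels)


# Toy dataset: 10 (sample, label) pairs, where the label is the sample's parity.
dataset = [(i, i % 2) for i in range(10)]
loader = SimpleLoader(dataset, batch_size=4, shuffle=True, seed=0)

# Looping consumes one full pass: batches of 4, 4, and 2 samples.
batch_sizes = [len(samples) for samples, labels in loader]
print(batch_sizes)  # [4, 4, 2]

# next(iter(loader)) starts a new pass and returns only its first batch.
first_samples, first_labels = next(iter(loader))
print(len(first_samples), len(first_labels))  # 4 4
```

Because `next(iter(...))` builds a new iterator every time, calling it repeatedly returns the first batch of a fresh pass rather than stepping through the dataset. To walk through successive batches, keep the iterator in a variable and call `next()` on that.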

>**Exercise:** Load images from the `Cat_Dog_data/train` folder, define a few transforms, then build the dataloader.

In [5]:
# Uncomment these lines to download and unzip the dataset
#!wget https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip
#!unzip Cat_Dog_data.zip

In [None]:
data_dir = 'Cat_Dog_data/train'