In [None]:
import fastbook
fastbook.setup_book()

from fastbook import *
from fastai.vision.widgets import *

# Preparation

Set the path to the gathered images and display a single image to validate whether we are in the correct location.

In [None]:
from PIL import Image
path = 'images/'
im = Image.open(path + 'grape/grape-0.jpg')
im.to_thumb(128,128)

Create a variable "filenames" that lists the paths of all image files in the data folder.

In [None]:
from fastai.vision.all import *
filenames = get_image_files(path)
filenames

Verify all images, to make sure none of them are corrupt files.

In [None]:
failed = verify_images(filenames)
failed
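A common follow-up, if `failed` is not empty, is to delete the broken files so they never reach the model. The sketch below uses only the standard library; `remove_files` is a hypothetical helper name, not part of fastai, and it assumes `failed` is an iterable of file paths.

```python
from pathlib import Path

def remove_files(paths):
    """Delete every existing file in `paths` and return the paths that were removed."""
    removed = []
    for p in paths:
        p = Path(p)
        if p.is_file():
            p.unlink()          # delete the corrupt image from disk
            removed.append(p)
    return removed

# In this notebook the call would be (assumption: `failed` holds file paths):
# remove_files(failed)
```

After removing files, run get_image_files again so that "filenames" no longer contains the deleted paths.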

Now we have to read in the data for our modelling. First, let's create our own DataLoaders object. In FastAI this is done with the 'data block API'.

### Class DataBlock
- blocks
    - Tuple where we specify what types we want for the independent (= what we are using to make predictions from) and dependent (= the categories for each image) variables.
- get_items
    - We have to tell FastAI how to get a list of images, this is our dataset. The get_image_files function takes a path, and returns a list of all of the images in that path.
- splitter
    - We want to split our data randomly into a training set and a validation set. The seed makes sure we get the same split each time we run this code block.
- get_y
    - y = the dependent variable = our labels. We tell FastAI to call the function parent_label to create the labels in our dataset based on the directory names (in our case, for example, "apple", "avocado", "grape" & "orange").
- item_tfms
    - We have to resize all images to the same size, so that we can feed several images at once to the model. The function Resize(128) will resize all the images to 128x128 pixels.
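To make get_y and splitter more concrete, here is a plain-Python sketch of the two ideas: labelling a file by its parent folder and making a seeded random train/validation split. It is only an illustration, not FastAI's implementation; `label_from_parent`, `random_split` and the example file names are made up for this sketch.

```python
import random
from pathlib import Path

def label_from_parent(path):
    """Use the name of the folder that contains the file as its label."""
    return Path(path).parent.name

def random_split(items, valid_pct=0.2, seed=42):
    """Return (train_idxs, valid_idxs) from a seeded shuffle of the indices."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)   # same seed -> same shuffle every run
    cut = int(valid_pct * len(items))
    return idxs[cut:], idxs[:cut]

# Hypothetical file listing, following the images/<label>/<file> layout
files = [f"images/{fruit}/{fruit}-{i}.jpg"
         for fruit in ("grape", "orange") for i in range(5)]

labels = [label_from_parent(f) for f in files]
train_idxs, valid_idxs = random_split(files)
print(len(train_idxs), len(valid_idxs))
```

Because the shuffle is seeded, rerunning the cell gives the same split, which keeps validation results comparable between runs.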

In [None]:
fruits = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

Now that we have our DataBlock object, we have to give it the path where it can find the data. In our case this is 'images/'.

FastAI will do all the rest: loading the data into train/valid sets.

In [None]:
dls = fruits.dataloaders(path)

FastAI uses a default batch_size of 64.

### Random crop size
It is usually best to use a random resized crop: every epoch, a different random section of each image is selected, so the network focuses on a different part of the image each time. We set a min_scale, because otherwise the crop could randomly end up being a very small piece of the image.
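As a rough illustration of the role of min_scale, the sketch below picks one random crop box whose area is at least about min_scale of the original image. It is a simplified model of the idea, not FastAI's actual code; the function name, the aspect-ratio range and the fallback are assumptions made for this example.

```python
import math
import random

def random_crop_box(width, height, min_scale=0.5, ratio=(3/4, 4/3),
                    rng=None, max_tries=10):
    """Return (left, top, crop_w, crop_h) for one random crop of the image."""
    rng = rng or random.Random()
    full_area = width * height
    for _ in range(max_tries):
        # Crop area: somewhere between min_scale and 100% of the image area
        area = rng.uniform(min_scale, 1.0) * full_area
        # Aspect ratio sampled on a log scale, so 3/4 and 4/3 are equally likely
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        crop_w = int(round(math.sqrt(area * aspect)))
        crop_h = int(round(math.sqrt(area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    # Fallback (assumption): use the whole image if no box fits
    return 0, 0, width, height

box = random_crop_box(640, 480, min_scale=0.5, rng=random.Random(0))
print(box)
```

The selected box would then be resized to the target size (128x128 here). A lower min_scale gives more variety but also more crops that miss the fruit entirely.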

### Other augmentations
We also rotate, flip and adjust the brightness to have more unique images to train the model.

In [None]:
fruits = fruits.new(item_tfms=RandomResizedCrop(128, min_scale=0.5), batch_tfms=[Rotate(), Flip(), Brightness()])
dls = fruits.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)

## Transfer learning
We use the vgg19_bn network to do transfer learning.
We tried several pretrained CNN networks and concluded that vgg19_bn gave the best result.

We run 4 fine-tune epochs. Calling fine_tune(4) first trains the randomly initialised, newly added layers for one epoch with all other layers frozen, then unfreezes all of the layers and trains them together for the 4 requested epochs.

(We don't specify a learning rate, so fine_tune uses its default base_lr, which is 2e-3 = 0.002 in current fastai releases.)
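The two phases can be written out by hand. The sketch below is an approximation of what fine_tune does, based on the public Learner methods freeze, unfreeze and fit_one_cycle; `manual_fine_tune` is a hypothetical name, the defaults shown (base_lr=2e-3, lr_mult=100) are what we understand fastai to use, and some scheduling details are left out. Check the source of your installed version before relying on it.

```python
def manual_fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    """Two-phase schedule: train the new head first, then the whole network."""
    # Phase 1: pretrained layers frozen, only the newly added layers are trained
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr))

    # Phase 2: everything trainable; earlier layers get a smaller learning rate
    base_lr /= 2
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr))

# In this notebook the call would be (assumption: the learner already exists):
# manual_fine_tune(our_out_of_the_box_model, 4)
```

Writing the phases out like this is mainly useful when you want to change one of them, for example to train the frozen phase for more than one epoch.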

In [None]:
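# Track both accuracy and its complement, the error rate
metrics = [accuracy, error_rate]
# Note: newer fastai releases call this function vision_learner; cnn_learner remains as a deprecated alias
our_out_of_the_box_model = cnn_learner(dls, vgg19_bn, loss_func=CrossEntropyLossFlat(), metrics=metrics)
our_out_of_the_box_model.fine_tune(4)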

# Model performance
## Confusion matrix

In [None]:
interp = ClassificationInterpretation.from_learner(our_out_of_the_box_model)
interp.plot_confusion_matrix()
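To read the plot: each row is the actual class and each column is the predicted class, so the diagonal holds the correct predictions. The plain-Python sketch below builds the same kind of table from two label lists; the labels and predictions are made up purely for illustration and are not results from our model.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

# Made-up example data, not output of the trained model
classes = ["apple", "grape", "orange"]
actual = ["apple", "apple", "grape", "grape", "orange", "orange"]
predicted = ["apple", "grape", "grape", "grape", "orange", "apple"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>6}: {row}")

# Accuracy is the sum of the diagonal divided by the number of samples
correct = sum(matrix[i][i] for i in range(len(classes)))
print(correct / len(actual))
```

Off-diagonal cells show which pairs of fruits the model mixes up, which is usually more informative than the accuracy figure alone.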

In [None]:
our_out_of_the_box_model.save('model')
our_out_of_the_box_model.export()