# Fruit classifier

This notebook trains a classifier for fruits. We use the [Fruits 360](https://www.kaggle.com/moltean/fruits) dataset from Kaggle.

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import os
from fastai.vision import *
from fastai.metrics import error_rate

In [None]:
bs = 64

## Downloading the data

You need a Kaggle account to download the dataset; create one if you don't already have one. Then download your API credentials and paste them below.

In [None]:
!pip install kaggle

In [None]:
from kaggle.api.kaggle_api_extended import KaggleApi

# Set environment variables.
os.environ['KAGGLE_USERNAME'] = "x" # change this line to match your credentials
os.environ['KAGGLE_KEY'] = "x" # change this line to match your credentials

# Authenticate and download dataset
api = KaggleApi()
api.authenticate()
api.dataset_download_files("moltean/fruits", unzip=True)
path = Path('./fruits-360/Training')

Once we have the dataset downloaded, we can verify the images using `verify_images` ([Docs](https://docs.fast.ai/vision.data.html#verify_images)). This function will remove any images that cannot be used.

In [None]:
for c in path.ls():
    print(c)
    # path.ls() already yields paths that include `path`, so pass c directly
    verify_images(c, delete=True, max_size=500)

Next we should convert the data to the correct format (an `ImageDataBunch`).

In [None]:
np.random.seed(2)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

## Looking at the data

If everything went OK, you should have 114 classes loaded.

In [None]:
data.c

In [None]:
data.show_batch(rows=3, figsize=(7,8))

## Training: resnet34

Next we'll train the model using a resnet34 architecture pretrained on ImageNet.

In [None]:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

In [None]:
learn.fit_one_cycle(4)

In [None]:
learn.save('stage-1')

## Results

We inspect the results using `ClassificationInterpretation`.

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)

Next we can inspect the top losses, the validation examples on which the model's loss was highest. With some domain knowledge, you can often work out why the model misclassified these examples and determine which, if any, features to add.
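
To make "top losses" concrete, here is a minimal sketch of the idea in plain Python. It is not fastai's implementation: the helper `top_losses` and the probabilities are made up for illustration, and we assume the loss is the usual cross-entropy, i.e. the negative log of the probability assigned to the true class.

```python
import math

def top_losses(probs, targets, k):
    """Return (loss, index) pairs for the k examples with the highest cross-entropy loss."""
    # Cross-entropy per example: -log(probability assigned to the true class).
    losses = [-math.log(p[t]) for p, t in zip(probs, targets)]
    # Sort example indices by loss, highest first.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(losses[i], i) for i in order[:k]]

# Hypothetical predicted probabilities for three images over three classes.
probs = [
    [0.90, 0.05, 0.05],  # confident and correct
    [0.10, 0.20, 0.70],  # confident and wrong (true class is 0)
    [0.40, 0.35, 0.25],  # unsure, but correct
]
targets = [0, 0, 0]

worst = top_losses(probs, targets, k=2)
for loss, idx in worst:
    print(f"example {idx}: loss {loss:.3f}")
```

The confidently wrong example ranks first, which is why the top losses are a good place to look for mislabeled or ambiguous images.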

In [None]:
interp.plot_top_losses(9, figsize=(15,11), heatmap=False)

Finally, you should look at the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix). A few scattered mistakes are normal; what you should watch for is a category that the model consistently misclassifies as another.
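
With over a hundred classes the matrix is hard to read by eye, so it can help to list only the off-diagonal cells with the highest counts. The sketch below shows that idea in plain Python; the helper name and the labels are hypothetical. fastai's `ClassificationInterpretation` offers a similar `most_confused(min_val=...)` method that you can call on `interp` directly.

```python
from collections import Counter

def most_confused(actual, predicted, min_count=1):
    """List (actual, predicted, count) for misclassified pairs, most frequent first."""
    # Count only the off-diagonal cells of the confusion matrix.
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_count]

# Hypothetical labels for six validation images.
actual    = ["apple", "apple", "apple", "pear", "pear", "plum"]
predicted = ["apple", "pear",  "pear",  "pear", "plum", "plum"]

confused = most_confused(actual, predicted, min_count=2)
print(confused)
```

Here only the apple/pear pair passes the threshold, which is the kind of consistent confusion worth investigating.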

In [None]:
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

## Fine-tuning learning rates

By plotting the loss across a range of learning rates, we can choose a learning rate that better fits our model.

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,10 ** -5.5))
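
Passing a `slice` as `max_lr` gives the earliest layers the smallest learning rate and the final layers the largest. As far as we understand fastai v1, the rates in between are spaced geometrically across the layer groups. The sketch below reproduces that spacing in plain Python; the helper `spread_learning_rates` is hypothetical, and the three layer groups are an assumption about how the resnet34 learner is split.

```python
def spread_learning_rates(start, stop, n_groups):
    """Geometrically space learning rates from start to stop over n_groups layer groups."""
    if n_groups == 1:
        return [stop]
    # Constant multiplier between consecutive groups.
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

# Same bounds as the cell above; n_groups=3 is an assumption.
lrs = spread_learning_rates(1e-6, 10 ** -5.5, n_groups=3)
for i, lr in enumerate(lrs):
    print(f"layer group {i}: lr = {lr:.2e}")
```

The pretrained early layers already detect general features, so they only need small updates, while the newly added head can change faster.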

By [Rick Wierenga](https://twitter.com/rickwierenga).