# Transfer learning application using `fastai`

Libraries built on top of `pytorch`, such as `fastai`, provide easy ways to use pre-trained models. The following code snippets take a pre-trained `mobilenet` network and re-train its last layer to classify our fossil dataset.

In [None]:
from fastai.vision import ImageDataBunch, models, accuracy, cnn_learner, ClassificationInterpretation

Load the data using the `ImageDataBunch` object. Its `from_folder` method takes the parent directory containing the images as an argument. It infers the classes from the folder names and processes all images so that they are consistent for training. You can also specify the location of a validation dataset within that directory. Various other `from_` methods are available; check them out in the `fastai` [docs](https://docs.fast.ai/vision.data.html#ImageDataBunch).

## Data loading - Local

You can download the data here: https://swung-data.s3.amazonaws.com/fossilnet/fossilnet-png-224px.zip

In [None]:
import glob

# Count the training images across all class folders
len(glob.glob("../data/fossils/fossilnet-png-224px/train/*/*"))

In [None]:
from PIL import Image

Image.open('../data/fossils/fossilnet-png-224px/train/ammonites/00001.png')

In [None]:
data_dir = '../data/fossils/fossilnet-png-224px/'

## Data loading - GDrive

You will need to add this folder to your Google Drive: https://drive.google.com/drive/folders/14lrVjhBQP7z4eGP4mAQetPC-KD0a0RMo?usp=sharing

The next cell mounts Google Drive and counts the training images stored there:

In [None]:
import glob
from google.colab import drive

drive.mount('/content/gdrive')

# Count the training images across all class folders
len(glob.glob("/content/gdrive/My Drive/fossilnet/train/*/*.png"))

This target directory contains:

In [None]:
ls "/content/gdrive/My Drive/fossilnet/"

Inside each of those there's a per-class folder containing all the image files:

In [None]:
ls "/content/gdrive/My Drive/fossilnet/train"

In [None]:
from PIL import Image

img = Image.open('/content/gdrive/My Drive/fossilnet/train/ammonites/00001.png')
img

In [None]:
data_dir = "/content/gdrive/My Drive/fossilnet/"

## Make a `fastai` dataset

We can instantiate the `ImageDataBunch` object to collect all images like this:

In [None]:
data = ImageDataBunch.from_folder(data_dir,
                                  valid='val',
                                  size=128,
                                  classes=['ammonites', 'plants', 'fishes', 'trilobites'],
                                 )
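As a sanity check on the folder layout that `from_folder` relies on, the following sketch counts the files in each class folder of each split. It assumes the `<root>/<split>/<class>/<file>` layout described above, with splits named `train` and `val`; the helper name `count_images_per_class` is hypothetical and not part of `fastai`.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(root, splits=("train", "val")):
    """Return {split: {class_name: n_files}} for a root/<split>/<class>/<file> layout.

    Hypothetical helper for illustration; it only inspects the file system.
    """
    root = Path(root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # skip splits that are not present on disk
        # The class label is the name of the folder that holds each file
        counts[split] = dict(Counter(
            p.parent.name for p in split_dir.glob("*/*") if p.is_file()
        ))
    return counts


# In this notebook one might call: count_images_per_class(data_dir)
```

Strongly unbalanced counts, or a class that is missing from one split, are worth knowing about before training.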

We can create a "learner" object from a pre-trained network with the `cnn_learner` function. This is where we specify which pre-trained network to use and which metrics to track. Check out the other parameters and options in the `fastai` [docs](https://docs.fast.ai/vision.learner.html).

In [None]:
# List what the `models` module exposes, including the available architectures
dir(models)

In [None]:
cnn = cnn_learner(data, models.mobilenet_v2, metrics=accuracy)

Finally, train the learner. It'll take a few minutes.

In [None]:
cnn.fit(5)

In [None]:
from fastai.basics import DatasetType

cnn.show_results(DatasetType.Train)

## Inference

We can make a prediction on a single image. The learner 'knows' how to resize and otherwise preprocess the image so that it matches the training data.

In [None]:
from fastai.vision import open_image
import requests
import io

In [None]:
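# Download an example fossil fish image
r = requests.get("https://geology.com/articles/green-river-fossils/fossil-fish-lg.jpg")

# Wrap the raw bytes in a file-like object so that fastai can open it
item = open_image(io.BytesIO(r.content))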

cnn.predict(item)

In [None]:
cnn.data.classes
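To our understanding, `predict` in `fastai` v1 returns the predicted category, its class index, and one probability per class, ordered like `cnn.data.classes`. Under that assumption, a minimal sketch for ranking the classes by probability could look like this. The helper name `rank_predictions` and the example probabilities are hypothetical, not real model output.

```python
def rank_predictions(classes, probs):
    """Pair each class name with its probability, most likely class first.

    `probs` can be any sequence of numbers (or 0-d tensors) with one entry
    per class, in the same order as `classes`.
    """
    if len(classes) != len(probs):
        raise ValueError("classes and probs must have the same length")
    pairs = zip(classes, (float(p) for p in probs))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical probabilities, for illustration only
example_classes = ['ammonites', 'plants', 'fishes', 'trilobites']
example_probs = [0.05, 0.10, 0.80, 0.05]
ranked = rank_predictions(example_classes, example_probs)

# In this notebook one might write:
#   pred_class, pred_idx, probs = cnn.predict(item)
#   rank_predictions(cnn.data.classes, probs)
```

Looking at the full ranking rather than only the top class shows how confident the model is and which class is the runner-up.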

## More evaluation tools

`fastai` also provides supporting objects to assess the quality of the predictions. Below, we use `ClassificationInterpretation` to visualize the worst-performing classifications from the model we trained above.

In [None]:
interp = ClassificationInterpretation.from_learner(cnn)

In [None]:
interp.confusion_matrix()
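The confusion matrix can also be summarized per class. The sketch below assumes that rows are actual classes and columns are predicted classes (which we believe matches `fastai`'s layout) and computes the recall of each class, that is, the fraction of its images that were classified correctly. The helper name `per_class_recall` and the example counts are hypothetical.

```python
def per_class_recall(matrix, classes):
    """Return {class_name: recall} from a square confusion matrix.

    Assumes matrix[i][j] counts items of actual class i predicted as class j.
    Works with nested lists or a 2-D NumPy array.
    """
    recalls = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])  # number of items whose actual class is i
        # Diagonal entry = correctly classified items of class i
        recalls[name] = matrix[i][i] / row_total if row_total else float("nan")
    return recalls


# Hypothetical counts, for illustration only
example_matrix = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
example_recall = per_class_recall(example_matrix, ["a", "b", "c"])

# In this notebook one might write:
#   per_class_recall(interp.confusion_matrix(), cnn.data.classes)
```

Per-class recall complements the overall accuracy metric, which can hide a class that the model handles poorly.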

In [None]:
interp.plot_top_losses(9, figsize=(12,12))

# 'Probability' is the predicted probability of the actual class.
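To see why a low probability for the actual class goes together with a high loss, here is a minimal sketch that assumes a cross-entropy loss, where the loss for one image is the negative logarithm of the probability assigned to its actual class. The helper name `top_losses` and the example probabilities are hypothetical; this is not how `fastai` implements `plot_top_losses`.

```python
import math


def top_losses(probs_of_actual, k, eps=1e-12):
    """Return the k items with the largest cross-entropy loss.

    `probs_of_actual[i]` is the probability the model gave to the actual
    class of item i. Each result is (index, loss, probability).
    """
    scored = [
        (i, -math.log(max(p, eps)), p)  # clamp to avoid log(0)
        for i, p in enumerate(probs_of_actual)
    ]
    scored.sort(key=lambda row: row[1], reverse=True)  # largest loss first
    return scored[:k]


# Hypothetical probabilities of the actual class, for illustration only
example_probs = [0.90, 0.05, 0.60, 0.20]
worst = top_losses(example_probs, k=2)
```

The items ranked first are the ones where the model gave the actual class the least probability, which is what makes them worth inspecting.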