# Swimming Mammal Recognition

I will try to build an ML algorithm capable of identifying almost 28k unique individual marine mammals belonging to 30 species.

The dataset can be downloaded from [here](https://www.kaggle.com/bdsaglam/happy-whale-512?select=sample_submission.csv). This is the "simplified version" of the [original dataset](https://www.kaggle.com/c/happy-whale-and-dolphin/data), in that all the images have the same size (512x512); smaller images have been padded.

The files are already divided into train and test folders, although there is no `species` label associated with each test image. *** Review this ***

My idea is:
1. train a non-pretrained resnet34 network from scratch (`xresnet34` in the fastai library);
1. train a pretrained resnet34 network;
1. compare the results and comment on that.

## Initial imports and setups

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
# import fastai
# fastai.__version__ 
# 2.5.3

from fastai.vision.all import *
from fastai.callback import *
from fastai.metrics import error_rate
import pandas as pd
import os, csv

#! Fixing issues with parallel processes (or at least suppressing Warnings...)
# os.environ["OMP_NUM_THREADS"] = "1"

## Creating dataloader

Defining all the paths and reading the csv file containing the labels (i.e. the mammal species) associated to each image.

In [3]:
path = os.getcwd()
path +='/archive'
path_images = path+'/train_images/'
path_img = path+'/train.csv'
#* Read csv file using pandas
df =pd.read_csv(path_img)
#df.head	#* printing datafile names

Creating the dictionary with the structure:

```
	image_name: species
```

and the function to retrieve the species starting from the filename.

In [4]:
species = {}
with open(path_img) as file:
	reader = csv.DictReader(file)
	for row in reader:
		species[row['image']] = row['species']

def label_func(fname):
    return species[str(fname).split('/')[-1]]
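
As a quick sanity check of the lookup idea, here is a minimal, self-contained sketch. The two-row CSV and the names `toy_species` and `label_from_path` are made up for the example; it also uses `Path(...).name` instead of splitting on `/`, which should behave the same on paths with other separators.

```python
import csv
import io
from pathlib import Path

# Tiny stand-in for train.csv (hypothetical rows, same column names as above).
toy_csv = io.StringIO("image,species\na1.jpg,beluga\nb2.jpg,blue_whale\n")

# Build the image_name -> species dictionary.
toy_species = {row["image"]: row["species"] for row in csv.DictReader(toy_csv)}

def label_from_path(fname):
    """Return the species for an image path, keyed on the file name only."""
    return toy_species[Path(fname).name]

print(label_from_path("archive/train_images/a1.jpg"))
```

Keying on the file name alone means the lookup does not depend on where the images folder lives.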

Given our ***dataset*** we want to create a dataloader for it that splits the dataset into a *validation* and a *training* set. Images are resized to 64x64 to make the process quicker on a laptop, and are padded if necessary (this should not be needed, since the images in the dataset linked above are all larger).

I load the images from the folder specified in `path/train_images` using the labels loaded from the csv file, saved in `df`:

In [6]:
dls = ImageDataLoaders.from_df(df, 
	path,
	folder = 'train_images',
	item_tfms=Resize(64,method=ResizeMethod.Pad),
	batch_tfms=Normalize.from_stats(*imagenet_stats))

# dls.show_batch()

In [7]:
# getting all images in subfolders
fnames = get_image_files(path)
#fnames[:10]

I now create the DataBlock, i.e. the recipe for assembling the data, as follows:

1. grab the images with the built-in function `get_image_files`;
1. grab the species associated with each image using the function `label_func` defined above;
1. transform the images, resizing them to 64x64 and padding them;
1. split the images into a validation and a training set, fixing the `seed` of the random number generator;
1. normalise the images within the batch.

In [None]:
dblock = DataBlock(blocks = (ImageBlock, CategoryBlock),
	get_items = get_image_files,
	get_y = label_func,
	item_tfms=Resize(64,method=ResizeMethod.Pad),
	splitter=RandomSplitter(valid_pct=.1, seed=42),
	# batch_tfms=[*aug_transforms(size=64, max_warp=0), Normalize.from_stats(*imagenet_stats)]
	batch_tfms=Normalize.from_stats(*imagenet_stats)
	)
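
To make the role of `RandomSplitter(valid_pct=.1, seed=42)` concrete, here is a minimal sketch of a seeded random split in plain Python. It is not fastai's implementation (the helper name `seeded_split` is made up and the shuffling differs), but it illustrates the same idea: a fixed seed gives the same train/validation indices on every run.

```python
import random

def seeded_split(n_items, valid_pct=0.1, seed=42):
    """Return (train_idxs, valid_idxs) for a reproducible random split."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)   # private RNG, so global state is untouched
    cut = int(valid_pct * n_items)      # size of the validation set
    return idxs[cut:], idxs[:cut]

train_idxs, valid_idxs = seeded_split(1000)
print(len(train_idxs), len(valid_idxs))
```

Fixing the seed matters here because the models below are compared on the same validation set.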


In [None]:
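# use path_images: `path` also contains the unlabelled test images, for which label_func has no entry
dsets = dblock.datasets(path_images)
no_species = len(dsets.vocab) # returns the total number of species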
# dsets.train[:10]

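We don't need to do anything else, since the source images are already of size 512x512 (smaller ones have been padded).

We can now transform the DataBlock into DataLoaders. No `bs` is passed in the call below, so fastai's default batch size of 64 applies; pass `bs=16` to use a batch size of 16.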

In [None]:
dls = dblock.dataloaders(path_images)
# dls.show_batch(max_n=16)

## Start the learning process

We want to train a model from scratch, since no network pre-trained on this kind of data is available.

We might use a CNN learner with the `pretrained` option set to `False`; alternatively, we can use a set of resnet models that have "all the tricks from modern research incorporated". I use the xresnet34 and xresnet50 models, specifying only the number of classes we expect as output. I will go through and comment on all the steps for xresnet34, and then repeat them for xresnet50.

### xresnet34

I define here the first model to use for fitting, xresnet34, with no particular activation function or any further specification.

In [None]:
netw34_subopt = xresnet34(n_out=no_species, pretrained = False)
# netw34_subopt[0]

Launching the learning process. In order to do so, I specify the metrics, `accuracy` and `error_rate`.

In [None]:
learn34_subopt = Learner(dls,model = netw34_subopt, metrics =[accuracy,error_rate])
#  metrics=[partial(accuracy_multi, thresh=0.85)], loss_func=BCEWithLogitsLossFlat(thresh=0.5))
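
The two metrics are complementary: `error_rate` is simply `1 - accuracy`, so each pair of values in the training tables should sum to one. The sketch below shows this on a handful of made-up predictions; the helper names and the species labels are illustrative only, not fastai's implementation.

```python
def accuracy_score(preds, targets):
    """Fraction of predictions that match the targets."""
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

def error_rate_score(preds, targets):
    """Fraction of predictions that do not match the targets."""
    return 1.0 - accuracy_score(preds, targets)

# Made-up predictions for four images: one of them is wrong.
preds = ["beluga", "beluga", "blue_whale", "killer_whale"]
targets = ["beluga", "blue_whale", "blue_whale", "killer_whale"]

print(accuracy_score(preds, targets), error_rate_score(preds, targets))
```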

First, I launch the learning process without any suggestion about the learning rate, and see how it goes:

In [None]:
learn34_subopt.fit_one_cycle(5)

This yields:

| epoch | train_loss | valid_loss | accuracy | error_rate | time |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.095772 | 1.174001 | 0.635509 | 0.364491 | 29:32 |
| 1 | 0.738535 | 0.781627 | 0.756026 | 0.243974 | 29:50 |
| 2 | 0.505455 | 0.569351 | 0.820694 | 0.179306 | 31:07 |
| 3 | 0.301352 | 0.420873 | 0.866745 | 0.133255 | 29:24 |
| 4 | 0.186283 | 0.377023 | 0.881246 | 0.118754 | 29:47 |
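
For reference, `fit_one_cycle` does not keep the learning rate constant: it warms it up and then anneals it. The sketch below reproduces the shape of that schedule in plain Python. The default values (`lr_max=1e-3`, `div=25`, `div_final=1e5`, `pct_start=0.25`) and the cosine interpolation are my understanding of fastai v2's defaults and should be treated as assumptions; the function names are made up.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos in [0, 1]."""
    if pos < pct_start:
        # warm-up: from lr_max/div up to lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # annealing: from lr_max down to lr_max/div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

for pos in (0.0, 0.125, 0.25, 0.5, 0.75, 1.0):
    print(f"{pos:5.3f} -> {one_cycle_lr(pos):.2e}")
```

Under these assumptions the rate peaks a quarter of the way through training and is very small by the end.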

I now try and optimise the learning rate, re-launching the `fit_one_cycle` procedure with the optimal lr obtained with `lr_find()`:

In [None]:
learn34_subopt.save('dorsal_xresnet34_v1')
# learn34_subopt.summary()

In [None]:
learn34_subopt.lr_find()
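
The idea behind `lr_find()` is a learning rate range test: train for a few iterations while increasing the learning rate exponentially, record the loss, and pick a rate somewhat below the point where the loss is lowest. The sketch below illustrates this with a synthetic loss curve. The range (`1e-7` to `10` over 100 steps) matches what I believe are fastai's defaults, and "minimum divided by 10" is only one common heuristic; both are assumptions, as are the helper names.

```python
import math

def lr_range(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates from start_lr to end_lr."""
    return [start_lr * (end_lr / start_lr) ** (i / (num_it - 1)) for i in range(num_it)]

def suggest_lr(lrs, losses):
    """Heuristic: a tenth of the learning rate at which the loss was lowest."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / 10

lrs = lr_range()
# Synthetic loss curve with its minimum near lr = 1e-2 (not real training data).
losses = [(math.log10(lr) + 2) ** 2 + 1 for lr in lrs]

print(f"suggested lr: {suggest_lr(lrs, losses):.1e}")
```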

The following cell yields:

| epoch | train_loss | valid_loss | accuracy | error_rate | time |
| --- | --- | --- | --- | --- | --- |
| 0 | 0.180385 | 0.379558 | 0.881246 | 0.118754 | 29:57 |

In [None]:
learn34_subopt.fit_one_cycle(1,4e-5)

Checking the results:

In [None]:
learn34_subopt.show_results()

In [None]:
interp34 = Interpretation.from_learner(learn34_subopt)
interp34.plot_top_losses(16, figsize=(15,10))

In [None]:
learn34_subopt.save('dorsal_xresnet34_v2')

### xresnet50

In [None]:
netw50_subopt = xresnet50(n_out=no_species, pretrained = False)
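# Precision/Recall default to average='binary', which fails with more than two classes
learn50_subopt = Learner(dls,model = netw50_subopt, metrics=[accuracy,error_rate,Precision(average='macro'),Recall(average='macro')])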
# netw50_subopt[0]

In [None]:
learn50_subopt.fit_one_cycle(5)

In [None]:
learn50_subopt.lr_find()

In [None]:
learn50_subopt.fit_one_cycle(3,1e-3)

In [None]:
learn50_subopt.show_results()

In [None]:
interp50 = Interpretation.from_learner(learn50_subopt)
interp50.plot_top_losses(16, figsize=(15,10))

In [None]:
learn50_subopt.save('dorsal_xresnet50')

In [None]:
learn50_subopt.save('dorsal_fin_resnet50_1')

In [None]:
learn50_subopt.show_results()

## Second: xresnet34 with larger images and augmentation

In [None]:
dblock2 = DataBlock(blocks = (ImageBlock, CategoryBlock),
	get_items = get_image_files,
	get_y = label_func,
	item_tfms=Resize(256,method=ResizeMethod.Pad),
	splitter=RandomSplitter(valid_pct=.1, seed=42),
	batch_tfms=[*aug_transforms(size=128, max_warp=0), Normalize.from_stats(*imagenet_stats)]
	# batch_tfms=Normalize.from_stats(*imagenet_stats)
	)


In [None]:
dls2 = dblock2.dataloaders(
	path_images
	# multiprocessing_context=torch.multiprocessing.get_context('spawn')
	)

In [None]:
netw34_opt = xresnet34(n_out=no_species, pretrained = False)
learn34_opt = Learner(dls2,model = netw34_opt, metrics =[accuracy,error_rate])
learn34_opt.lr_find()

## Third: pretrained resnet34

In [None]:
learn = cnn_learner(dls2, resnet34, metrics=error_rate)
learn.lr_find()

## Results

Let's see what results we have got. 

We will first see which were the categories that the model most confused with one another. We will try to see if what the model predicted was reasonable or not. In this case the mistakes look reasonable (none of the mistakes seems obviously naive). This is an indicator that our classifier is working correctly. 

Furthermore, when we plot the confusion matrix, we can see that the distribution is heavily skewed: the model makes the same mistakes over and over again, but it rarely confuses other categories. This suggests that it just finds it difficult to distinguish some specific categories from each other; this is normal behaviour.
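
The "most confused" ranking boils down to counting the off-diagonal entries of the confusion matrix. Here is a minimal plain-Python sketch of that counting; the function name `most_confused_pairs` and the labels are made up for the example, and it is only an analogue of what `most_confused(min_val=2)` reports, not fastai's code.

```python
from collections import Counter

def most_confused_pairs(actual, predicted, min_val=1):
    """Return (actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_val]

# Made-up validation labels and predictions.
actual = ["beluga"] * 4 + ["blue_whale"] * 3 + ["fin_whale"] * 3
predicted = ["beluga", "beluga", "blue_whale", "beluga",
             "fin_whale", "fin_whale", "blue_whale",
             "fin_whale", "fin_whale", "blue_whale"]

print(most_confused_pairs(actual, predicted, min_val=2))
```

A skewed result like this one, where a single pair dominates, is the pattern described above.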

In [None]:
# load the last xresnet34 checkpoint saved above
learn34_subopt.load('dorsal_xresnet34_v2')
#learn50_subopt.load('dorsal_fin_resnet50_1')

In [None]:
interp = ClassificationInterpretation.from_learner(learn34_subopt)
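# needed below: plot_confusion_matrix and most_confused require a ClassificationInterpretation
interp50 = ClassificationInterpretation.from_learner(learn50_subopt)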

losses,idxs = interp.top_losses()
# losses50,idxs50 = interp50.top_losses()

len(dls.valid_ds)==len(losses)==len(idxs)
# len(dls.valid_ds)==len(losses50)==len(idxs50)

In [None]:
interp.plot_top_losses(9, figsize=(15,11))

In [None]:
interp50.plot_top_losses(9, figsize=(15,11))

In [None]:
doc(interp.plot_top_losses)
doc(interp50.plot_top_losses)

In [None]:
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp50.plot_confusion_matrix(figsize=(12,12), dpi=60)

In [None]:
interp50.most_confused(min_val=2)
interp.most_confused(min_val=2)

## Unfreezing, fine-tuning, and learning rates

Since our model is working as we expect it to, we will *unfreeze* our model and train some more.

In [None]:
learn.unfreeze()

In [None]:
learn.fit_one_cycle(4)

Hm, that didn't really help. Good lesson to learn: just increasing the number of epochs may not always be a good strategy. Let's try something else. The "learning rate" is a *hyperparameter* that determines how quickly we make changes to the weights of the NN.
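
To see why the learning rate matters so much, here is a toy sketch of gradient descent on the one-parameter loss `f(w) = w**2`. It is unrelated to the network above and the function name `descend` is made up; it only shows that a small rate converges towards the minimum at zero, while a rate that is too large makes the weight blow up.

```python
def descend(lr, w=1.0, steps=20):
    """Run plain gradient descent on f(w) = w**2 and return the final weight."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # the update that the learning rate scales
    return w

print("lr=0.1 ->", descend(0.1))   # shrinks towards 0
print("lr=1.1 ->", descend(1.1))   # overshoots further at every step
```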

In [None]:
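# NOTE: no 'pets-stage-1' checkpoint is saved earlier in this notebook; save one before running this cell
learn.load('pets-stage-1');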

In [None]:
learn.lr_find()

In [None]:
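# fastai v2: Recorder.plot() no longer exists; lr_find() already plots, this redraws the curve
learn.recorder.plot_lr_find()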

In [None]:
learn.unfreeze()
learn.fit_one_cycle(2, lr_max=slice(1e-6,1e-4))  # fastai v2 names this argument lr_max

That's a pretty accurate model!
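
The `slice(1e-6,1e-4)` passed to `fit_one_cycle` above requests discriminative learning rates: the earliest layers get the smallest rate and the head gets the largest. As far as I understand, fastai spreads the rates geometrically between the two ends across the model's parameter groups. The sketch below shows that spacing; the helper name and the choice of three groups are assumptions made for the example.

```python
def geometric_lrs(start, stop, n_groups):
    """Learning rates spaced by a constant multiplicative step from start to stop."""
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

# Assumed: three parameter groups (early layers, later layers, head).
for group, lr in enumerate(geometric_lrs(1e-6, 1e-4, 3)):
    print(f"group {group}: lr = {lr:.1e}")
```

The intent is that pretrained early layers, which already encode generic features, change less than the freshly initialised head.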

## Training: resnet50

Now we will train in the same way as before, with one change: instead of using resnet34 as our backbone we will use resnet50 (resnet34 is a 34-layer residual network, while resnet50 has 50 layers; the details are in the [resnet paper](https://arxiv.org/pdf/1512.03385.pdf)).

Basically, resnet50 usually performs better because it is a deeper network with more parameters. Let's see if we can achieve higher performance here. To help it along, let's use larger images too, since that way the network can see more detail. We reduce the batch size a bit, since otherwise this larger network will require more GPU memory.

In [None]:
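bs = 64  # assumed base batch size (fastai's default); `bs` is not defined earlier in this notebook
data = ImageDataLoaders.from_df(df, path, folder='train_images', bs=bs//2,
	item_tfms=Resize(299, method=ResizeMethod.Pad),  # fastai v2 replacement for ImageDataBunch
	batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)])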

In [None]:
learn = cnn_learner(data, models.resnet50, metrics=error_rate)

In [None]:
learn.lr_find()
learn.recorder.plot_lr_find()  # fastai v2 name for the old recorder.plot()

In [None]:
learn.fit_one_cycle(8)

In [None]:
learn.save('pets-stage-1-50')

In [None]:
learn.load('pets-stage-1-50');

It's astonishing that it's possible to recognize pet breeds so accurately! Let's see if full fine-tuning helps:

In [None]:
learn.lr_find()
learn.recorder.plot_lr_find()  # fastai v2 name for the old recorder.plot()

In [None]:
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-6,1e-4))  # fastai v2 names this argument lr_max

If it doesn't, you can always go back to your previous model.

In [None]:
learn.load('pets-stage-1-50');

In [None]:
interp = ClassificationInterpretation.from_learner(learn)

In [None]:
interp.plot_top_losses(9, figsize=(15,11))

In [None]:
interp.most_confused(min_val=2)


In [None]:
path = untar_data(URLs.MNIST_SAMPLE); path

In [None]:
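# fastai v2: aug_transforms replaces get_transforms, ImageDataLoaders replaces ImageDataBunch
tfms = aug_transforms(do_flip=False)
data = ImageDataLoaders.from_folder(path, batch_tfms=tfms, item_tfms=Resize(26))

In [None]:
data.show_batch(max_n=9, figsize=(5,5))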

In [None]:
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit(2)

In [None]:
df = pd.read_csv(path/'labels.csv')
df.head()

In [None]:
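data = ImageDataLoaders.from_csv(path, batch_tfms=tfms, item_tfms=Resize(28))

In [None]:
data.show_batch(max_n=9, figsize=(5,5))
data.vocab  # fastai v2 name for the list of classes

In [None]:
# note the v2 argument order: DataFrame first, then path
data = ImageDataLoaders.from_df(df, path, batch_tfms=tfms, item_tfms=Resize(24))
data.vocab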

In [None]:
fn_paths = [path/name for name in df['name']]; fn_paths[:2]

In [None]:
pat = r"/(\d)/\d+\.png$"
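# from_path_re (not from_name_re): the label is in the parent folder, not in the file name
data = ImageDataLoaders.from_path_re(path, fn_paths, pat, batch_tfms=tfms, item_tfms=Resize(24))
data.vocab

In [None]:
# from_path_func passes the full path to label_func, so the '/3/' test works
data = ImageDataLoaders.from_path_func(path, fn_paths, batch_tfms=tfms, item_tfms=Resize(24),
        label_func = lambda x: '3' if '/3/' in str(x) else '7')
data.vocab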

In [None]:
labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
labels[:5]

In [None]:
data = ImageDataLoaders.from_lists(path, fn_paths, labels=labels, batch_tfms=tfms, item_tfms=Resize(24))
data.vocab

In [None]:
doc(ImageDataLoaders.from_path_re)