## Image classification with Convolutional Neural Networks

Welcome to the first week of the second deep learning certificate! We're going to use convolutional neural networks (CNNs) to allow our computer to see - something that is only possible thanks to deep learning.

## Dogbreeds Kaggle Competition

We're going to try to create a model to enter the Dogs vs Cats competition on Kaggle. There are 25,000 labelled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle website, when this competition was launched (end of 2013): "State of the art: The current literature suggests machine classifiers can score above 80% accuracy on this task". So if we can beat 80%, then we will be at the cutting edge as of 2013!

In [1]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

Here we import the libraries we need. We'll learn about what each does during the course.

In [2]:
# This file contains all the main external libs we'll use
from fastai.imports import *

In [3]:
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *
import pandas as pd

In [48]:
#torch.cuda.set_device(1)
PATH = "data/dogbreeds/"

In [58]:
sz=224
#arch=resnext101_64
arch=resnet50
bs=58

In [59]:
label_csv=f'{PATH}labels.csv'
type(label_csv)
print(label_csv)

data/dogbreeds/labels.csv


In [60]:
n=len(list(open(label_csv)))-1
print(n)
val_idxs=get_cv_idxs(n)
print(len(val_idxs))

10222
2044


## Let's look at some dog pictures

In [61]:
!ls {PATH}

labels.csv	sample_submission.csv	   test      tmp    train.zip
labels.csv.zip	sample_submission.csv.zip  test.zip  train


In [62]:
label_df=pd.read_csv(label_csv)

In [63]:
label_df.head()

|   | id | breed |
|---|----|-------|
| 0 | 000bec180eb18c7604dcecc8fe0dba07 | boston_bull |
| 1 | 001513dfcb2ffafc82cccf4d8bbaba97 | dingo |
| 2 | 001cdf01b096e06d78e9e5112d419397 | pekinese |
| 3 | 00214f311d5d2247d5dfe4fe24b2303d | bluetick |
| 4 | 0021f9ceb3235effd7fcde7f7538ed62 | golden_retriever |


In [64]:
tfms=tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test',val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)

In [65]:
fn=PATH+data.trn_ds.fnames[0]
fn

'data/dogbreeds/train/001513dfcb2ffafc82cccf4d8bbaba97.jpg'

Here is what the raw data looks like:

In [66]:
img=PIL.Image.open(fn); 

In [67]:
size_d={k:PIL.Image.open(PATH+k).size for k in data.trn_ds.fnames}

In [68]:
row_sz,col_sz=list(zip(*size_d.values()))
row_sz=np.array(row_sz)
col_sz=np.array(col_sz)
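Note that PIL's `Image.size` is a `(width, height)` tuple, so `row_sz` above holds the image widths and `col_sz` the heights. A quick summary of these values helps when choosing a training size. The following is only a sketch on made-up data: `size_d_example` and `summarize` are hypothetical stand-ins, and the three sizes are invented for illustration rather than taken from the dataset.

```python
from statistics import median

# Hypothetical stand-in for size_d: file name -> (width, height), as PIL's Image.size reports it
size_d_example = {
    "a.jpg": (500, 375),
    "b.jpg": (200, 200),
    "c.jpg": (640, 480),
}

# zip(*...) turns a sequence of (width, height) pairs into one tuple of widths and one of heights
widths, heights = zip(*size_d_example.values())

def summarize(values):
    """Return (min, median, max) of a sequence of pixel dimensions."""
    return min(values), median(values), max(values)

print("width :", summarize(widths))
print("height:", summarize(heights))
```

If most images turn out to be much larger than the training size, pre-resizing them once can save a lot of time per epoch.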

In [69]:
len(data.test_ds)


10357

In [70]:
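def get_data(sz,bs):
    # Build the transforms and data object for image size sz and batch size bs
    tfms=tfms_from_model(arch,sz,aug_tfms=transforms_side_on,max_zoom=1.1)
    data= ImageClassifierData.from_csv(PATH,'train',f'{PATH}labels.csv',test_name='test',val_idxs=val_idxs,suffix='.jpg',tfms=tfms, bs=bs)
    # For smaller sizes, work from copies pre-resized to 340px (cached under 'tmp') to speed up loading
    return data if sz>300 else data.resize(340,'tmp')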

In [71]:
data=get_data(sz,bs)




In [72]:
data.trn_ds

<fastai.dataset.FilesIndexArrayDataset at 0x7f413dda8470>

In [79]:

learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)
learn.fit(0.1,5, cycle_len=1)

[0.      0.99056 0.51493 0.84907]                            
[1.      0.68134 0.51108 0.86056]                            
[2.      0.55632 0.52291 0.85769]                            
[3.      0.45987 0.48901 0.86344]                            
[4.      0.41669 0.52042 0.86097]                            



In [80]:
learn.save('224_pre')
learn.load('224_pre')

In [81]:
learn.set_data(get_data(299,bs))




In [83]:
learn.freeze()

In [None]:
learn.fit(0.01,3, cycle_len=1, cycle_mult=2)

[0.      0.42166 0.40125 0.88355]                            
[1.      0.40729 0.39414 0.88068]                            
[2.      0.37906 0.38799 0.88355]                            
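With `cycle_len=1` and `cycle_mult=2`, the three cycles requested last 1, 2, and 4 epochs, so the full run covers 7 epochs. The sketch below illustrates a cosine-annealed learning rate with warm restarts, the kind of schedule these parameters refer to. It is not fastai's own code: `sgdr_schedule` and its parameters are hypothetical names, and the 141 mini-batches per epoch is an assumption derived from 8,178 training images (10,222 minus 2,044) at batch size 58.

```python
import math

def sgdr_schedule(lr_max, n_cycles, cycle_len, cycle_mult, iters_per_epoch):
    """Return the per-iteration learning rates for cosine annealing with restarts."""
    lrs = []
    epochs = cycle_len
    for _ in range(n_cycles):
        n_iter = epochs * iters_per_epoch
        for i in range(n_iter):
            # Decay from lr_max towards 0 along half a cosine wave within each cycle
            lrs.append(lr_max / 2 * (1 + math.cos(math.pi * i / n_iter)))
        epochs *= cycle_mult      # each new cycle is cycle_mult times longer
    return lrs

lrs = sgdr_schedule(lr_max=0.01, n_cycles=3, cycle_len=1, cycle_mult=2, iters_per_epoch=141)
print(len(lrs) // 141, "epochs in total")
```

Each restart jumps the learning rate back to `lr_max`, which can help the optimizer leave sharp minima, while the lengthening cycles give it progressively more time to settle.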