# Dog Breeds Image Classifier
### This model can recognize three different dog breeds: Siberian Husky, Shih Tzu and Pug!


First we set up the notebook and import the libraries we need.

In [None]:
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

Mounted at /content/gdrive


In [None]:
from fastbook import *
from fastai.vision.widgets import *
from zipfile import ZipFile
import requests
import urllib.request

### Downloading and extracting the dataset

The `dog_breeds` variable lists the categories that our model can predict. We then download the dataset, which contains images of the three dog breeds, directly from the [GitHub repository of our project](https://github.com/OhLK/Project-LE530). The dataset arrives as a `.zip` file, so we extract it after downloading.

In [None]:
dog_breeds = 'siberian_husky','shih_tzu','pug'

path = URLs.path('dataset')
if not path.exists():
    path.mkdir(parents=True)

url = 'https://codeload.github.com/OhLK/Project-LE530/zip/refs/heads/main'
r = requests.get(url)
r.raise_for_status()  # stop early if the download failed

# Save the downloaded archive to disk
with open(path/'dataset.zip', 'wb') as f:
    f.write(r.content)

# Extract the archive into the same folder
with ZipFile(path/'dataset.zip', 'r') as zip_obj:
    zip_obj.extractall(path)

path = path/'Project-LE530-main/dataset'
path.ls()
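
GitHub branch archives usually unpack into a single top-level folder named `<repository>-<branch>`, which is why the dataset path above includes an extra folder level. As a minimal sketch (the archive below is a tiny in-memory stand-in with made-up file names, not the real dataset, and `top_level_dirs` is a hypothetical helper), the top-level folder can be read from the archive itself rather than hard-coded:

```python
import io
from zipfile import ZipFile

def top_level_dirs(zip_bytes):
    """Return the sorted top-level folder names found inside a zip archive."""
    with ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sorted({name.split('/')[0] for name in zf.namelist() if '/' in name})

# Build a tiny stand-in archive in memory (hypothetical contents).
buf = io.BytesIO()
with ZipFile(buf, 'w') as zf:
    zf.writestr('Project-LE530-main/dataset/pug/0001.jpg', b'fake')
    zf.writestr('Project-LE530-main/dataset/shih_tzu/0001.jpg', b'fake')

roots = top_level_dirs(buf.getvalue())
print(roots)
```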

Next we check the dataset for corrupted images and remove any that we find.

In [None]:
fns = get_image_files(path)
fns

In [None]:
# Register the filter first so it is active while the images are being verified
warnings.filterwarnings("ignore", "(Possibly )?Corrupt EXIF data", UserWarning)
failed = verify_images(fns)

In [None]:
failed.map(Path.unlink)
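
The pattern above is "collect the files that fail a check, then delete them". The sketch below reproduces that pattern with the standard library only. Note that it swaps in a much cruder check than `verify_images`, which tries to open each image: here we only look at the first bytes of each file, and the file names and contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path

# Leading bytes that identify JPEG and PNG files (not an exhaustive list).
SIGNATURES = (b'\xff\xd8\xff', b'\x89PNG\r\n\x1a\n')

def looks_like_image(fn):
    """Crude check: does the file start with a known JPEG or PNG signature?"""
    with open(fn, 'rb') as f:
        head = f.read(8)
    return head.startswith(SIGNATURES)

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp/'good.jpg').write_bytes(b'\xff\xd8\xff\xe0' + b'\x00' * 16)  # JPEG-like header
    (tmp/'bad.jpg').write_bytes(b'not an image')                      # stand-in for a corrupted file

    files = sorted(tmp.glob('*.jpg'))
    failed_demo = [fn for fn in files if not looks_like_image(fn)]
    for fn in failed_demo:
        fn.unlink()  # same clean-up step as failed.map(Path.unlink)

    remaining = sorted(p.name for p in tmp.glob('*.jpg'))

print(remaining)
```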

### Creating a DataBlock and DataLoaders

In [None]:
dogs = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.3, seed=42), #30% to the validation set, random image distribution
    get_y=parent_label,
    item_tfms=Resize(128))
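
To make the `splitter` and `get_y` arguments more concrete, here is a rough standard-library sketch of the two ideas: a seeded random split that holds out 30% of the items, and a label taken from the parent folder name. The helper names `random_split` and `label_from_parent` are hypothetical, and the sketch is not fastai's implementation (the exact indices fastai picks for a given seed will differ).

```python
import random
from pathlib import Path

def random_split(n_items, valid_pct=0.3, seed=42):
    """Shuffle indices with a fixed seed and hold out the first valid_pct as validation."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)

def label_from_parent(fn):
    """Use the name of the folder that contains the file as its label."""
    return Path(fn).parent.name

train_idx, valid_idx = random_split(10)
label = label_from_parent('dataset/shih_tzu/00000001.jpg')
print(len(train_idx), len(valid_idx), label)
```

Fixing the seed means the same images land in the validation set on every run, which keeps results comparable between training sessions.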

After creating the `DataBlock` above, we create a `DataLoaders` object to hold our data and supply us with useful functions.

In [None]:
dls = dogs.dataloaders(path)

Now we can check what is inside our `DataLoaders`. Let's look at a batch of fifteen images from the validation set.

In [None]:
dls.valid.show_batch(max_n=15, nrows=3)

### Doing image augmentation

Here is an example of image augmentation using `aug_transforms`.

In [None]:
dogs = dogs.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
dls = dogs.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)

Below we apply `aug_transforms` to our whole dataset, together with `RandomResizedCrop`. This helps because the model trains on images that vary more in position, quality, and angle.

In [None]:
dogs = dogs.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())
dls = dogs.dataloaders(path)
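
As an illustration of what `RandomResizedCrop(224, min_scale=0.5)` is doing conceptually, the sketch below picks a random rectangle that targets between 50% and 100% of the image area; the crop would then be resized to 224 pixels. This is a simplified stand-in written with the standard library, not fastai's code: the function name, the aspect-ratio range, and the clamping are assumptions made for the example.

```python
import math
import random

def random_crop_box(width, height, min_scale=0.5, ratio=(3/4, 4/3), rng=random):
    """Pick a random crop rectangle targeting min_scale..100% of the image area."""
    area = width * height * rng.uniform(min_scale, 1.0)
    # Sample the aspect ratio on a log scale so wide and tall crops are equally likely
    aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
    # Clamp so the crop never exceeds the image
    w = min(width, max(1, round(math.sqrt(area * aspect))))
    h = min(height, max(1, round(math.sqrt(area / aspect))))
    left = rng.randint(0, width - w)
    top = rng.randint(0, height - h)
    return left, top, w, h

rng = random.Random(0)  # fixed seed so the demo is repeatable
boxes = [random_crop_box(640, 480, rng=rng) for _ in range(5)]
for box in boxes:
    print(box)
```

Because a different box is drawn each time an image is loaded, the model sees a slightly different view of the same photo in every epoch.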

### Training our model

We have prepared our data, and now we're ready to train our model!

In [None]:
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
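
The `error_rate` metric reported during training is the fraction of validation images that the model labels incorrectly. A minimal sketch of that calculation, using made-up predictions and a hypothetical helper name:

```python
def error_rate_sketch(preds, targets):
    """Fraction of predictions that do not match the true labels."""
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

# Hypothetical predictions and true labels for four images
preds   = ['pug', 'pug',      'shih_tzu', 'siberian_husky']
targets = ['pug', 'shih_tzu', 'shih_tzu', 'siberian_husky']

rate = error_rate_sketch(preds, targets)
print(rate)
```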

### Checking the results after the training process

We use a confusion matrix to review the results and see where the model made wrong predictions.

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
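
A confusion matrix counts, for each actual breed (rows), how often the model predicted each breed (columns); correct predictions sit on the diagonal. The sketch below builds one from made-up labels with the standard library. The helper name and data are hypothetical, and this is not how `ClassificationInterpretation` computes it internally.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

classes   = ['pug', 'shih_tzu', 'siberian_husky']
actual    = ['pug', 'pug',      'shih_tzu', 'shih_tzu', 'siberian_husky']
predicted = ['pug', 'shih_tzu', 'shih_tzu', 'shih_tzu', 'siberian_husky']

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f'{name:>15}', row)
```

In this toy example the only off-diagonal count is one pug that was mistaken for a Shih Tzu.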

Now we can plot the predictions with the highest loss and see whether any images have wrong labels or simply shouldn't be in our dataset.

In [None]:
interp.plot_top_losses(3, nrows=3)
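
To show what "top losses" means, the sketch below assumes a cross-entropy loss, where the loss for one image is the negative log of the probability the model gave to the correct breed. A confident wrong answer therefore produces a large loss. The probabilities and the helper name are made up for the example.

```python
import math

def top_losses(probs_per_item, true_idx, k=3):
    """Return (item index, loss) pairs for the k items with the highest loss."""
    losses = [-math.log(p[t]) for p, t in zip(probs_per_item, true_idx)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(i, losses[i]) for i in order[:k]]

# Hypothetical predicted probabilities for four images over three breeds
probs = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.60, 0.30, 0.10],
]
true_idx = [0, 0, 2, 1]  # index of the correct breed for each image

worst = top_losses(probs, true_idx, k=3)
for i, loss in worst:
    print(i, round(loss, 3))
```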

Above, `interp.plot_top_losses` helped us spot images that may not belong in our dataset. If we want, we can delete or relabel these images using the widget below.

In [None]:
cleaner = ImageClassifierCleaner(learn)
cleaner

In [None]:
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

After cleaning our dataset using our model to help us, we can retrain our model and increase its accuracy.

### Exporting the trained model to a '.pkl' file

In [None]:
learn.export()

In [None]:
path = Path()
path.ls(file_exts='.pkl')

In [None]:
learn_inf = load_learner(path/'export.pkl')

After exporting the trained model to a '.pkl' file, we use it to predict the breed of the dog in the image below:

In [None]:
path = URLs.path('dataset')
path = path/'Project-LE530-main/dataset'

im = Image.open(path/'siberian_husky/00000117.jpg')
im.to_thumb(300,300)

In [None]:
learn_inf.predict(path/'siberian_husky/00000117.jpg')

It predicted correctly: it is a Siberian Husky. The tensor above shows the probability that the image belongs to each of the three breeds.

In [None]:
learn_inf.dls.vocab

The vocab is simply the list of categories that our model can recognize.
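
The probabilities returned by `predict` line up with the entries of the vocab, so the two can be paired to read the result by name. A minimal sketch with plain Python lists follows; the probability values are made up, and the alphabetical vocab order is an assumption about how the categories are stored.

```python
# Assumed vocab order (alphabetical) and hypothetical probabilities
vocab = ['pug', 'shih_tzu', 'siberian_husky']
probs = [0.01, 0.02, 0.97]

# Pair each breed with its probability
by_breed = dict(zip(vocab, probs))

# The predicted breed is the one with the highest probability
best = max(by_breed, key=by_breed.get)
print(best, by_breed[best])
```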